In January 2021, the artificial intelligence research lab OpenAI released a limited version of a piece of software called Dall-E. The software allowed users to type in a simple description of an image they had in mind and, after a brief pause, returned an almost eerily good interpretation of their suggestion, worthy of a commissioned illustrator or Adobe-savvy designer, but much faster and free. Typing in, for example, "a pig with wings flying over the moon, illustrated by Antoine de Saint-Exupéry" produced, after a minute or two of processing, something reminiscent of the blotchy but recognizable watercolor brushwork of the creator of The Little Prince.
The skill of writing such descriptions has come to be known as "prompt engineering": the technique of phrasing one's instructions in the terms the system understands most clearly, so that it returns results that most closely match, or perhaps exceed, expectations. Tech commentators were quick to predict that in a code-free future, prompt engineering would become a coveted and well-paid job, and that the most powerful way of interacting with intelligent systems would be through the medium of human language. We would no longer need to be able to draw or write computer code: we would simply whisper our wishes into the machine and it would do the rest. The limits of AI creations would be the limits of our own imagination.
Dall-E imitators and further developments quickly followed. Dall-E mini (later renamed Craiyon) gave those not invited to OpenAI's private services a chance to play around with a similar, less powerful, but still very impressive tool. Meanwhile, the independent commercial company Midjourney and the open-source program Stable Diffusion used a different approach to classifying and generating images to achieve essentially the same goals. Within months, the field had expanded into short video and 3D model generation, with new tools emerging daily from academic departments and hobbyist programmers, as well as from the established giants of social media and now AI: Facebook (aka Meta), Google, Microsoft and others. A new field of research, software and competition had opened up.
The name Dall-E combines the robot protagonist of Disney's Wall-E with the Spanish surrealist Salvador Dalí. On the one hand you have the character of a bold, autonomous and lovable little machine sweeping up the debris of a collapsed human civilization, and on the other a man whose most repeated quips include "Those who do not want to imitate anything produce nothing" and "What is important is to spread confusion, not eliminate it." Both are admirable namesakes for the wide array of tools that have come to be known as AI image generators.
Over the past year, this new wave of consumer AI, encompassing both image generation and tools like ChatGPT, has captured everyone's imagination. It has also boosted the fortunes of large tech companies that, despite many efforts, have failed to convince most of us that either blockchain or virtual reality ("the metaverse") is the future we all want. At least this one feels fun for five minutes or so; and "AI" still has that scintillating sci-fi quality, reminiscent of giant robots and superhuman brains, that offers a little contact with the genuinely new. Of course, what's happening under the hood is anything but new.
There have been no major breakthroughs in the academic discipline of artificial intelligence for several decades. The underlying technology of neural networks, a method of machine learning loosely based on how the physical brain works, was theorized and even put into practice as early as the 1990s. Even then, you could create images with it, but they were mostly formless abstractions, blobs of color with little emotional or aesthetic resonance. The first convincing AI chatbots date back even further. In 1964, Joseph Weizenbaum, a computer scientist at the Massachusetts Institute of Technology, developed a chatbot called Eliza. Eliza was modeled on a "person-centered" psychotherapist: whatever you said would be reflected back to you.
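To make the "reflected back to you" trick concrete, here is a minimal sketch in Python. It is not Weizenbaum's program: Eliza's scripts matched keywords against ranked decomposition and reassembly rules, whereas this toy keeps only the pronoun-swapping step. The `REFLECTIONS` table, the `reflect` and `respond` functions and the single response template are all hypothetical illustrations.

```python
import re

# Hypothetical reflection table: first-person words become second-person
# and vice versa, so a statement can be mirrored back to the speaker.
REFLECTIONS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "myself": "yourself",
    "am": "are",
    "i'm": "you're",
    "you": "I",
    "your": "my",
}


def reflect(statement):
    """Lower-case the statement and swap pronouns word by word."""
    words = re.findall(r"[a-z']+", statement.lower())
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(statement):
    """Mirror the statement back inside a fixed, therapist-style template."""
    return f"Why do you say {reflect(statement)}?"


print(respond("I am worried about my exams"))
# Why do you say you are worried about your exams?
```

Nothing in the sketch understands worry or exams; it only rearranges the user's own words, which is much of why Eliza could seem attentive while knowing nothing about the world.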
Early AIs didn't know much about the world, and academic departments lacked the computing power to exploit them at scale. The difference today is not in intelligence, but in data and power. The big tech companies have spent 20 years collecting vast amounts of data from culture and everyday life, and building huge, power-hungry data centers filled with ever more powerful computers to process it. What were once creaky old neural networks have become superpowered, and the surge of AI we're seeing is the result.
AI image generation relies on the compilation and analysis of millions upon millions of tagged images; that is, images that already come with some kind of description of their content. These images and descriptions are then processed by neural networks, which learn to associate specific and deeply nuanced qualities of the image (shapes, colors, compositions) with specific words and phrases. These qualities are then layered on top of each other to create new arrangements of form, color and composition, based on the billions of differently weighted associations triggered by a simple prompt. But where did all these original images come from?
The datasets published by LAION, a German non-profit organization, are a good example of the kind of image-text collections used to train large AI models (they formed the basis of Stable Diffusion and Google's Imagen). For more than a decade, another non-profit web organization, Common Crawl, has indexed and stored as much of the public World Wide Web as it can access, archiving up to 3 billion pages every month. LAION researchers took a portion of the Common Crawl data and pulled out every image with an "alt" tag, a line or so of text meant to describe images on web pages.
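The core idea of pulling image-text pairs out of crawled pages can be sketched in a few lines of Python using the standard library's `html.parser`. This is a minimal illustration, not LAION's actual pipeline, which operated on Common Crawl archives at vast scale and applied further filtering; the class and function names below are hypothetical.

```python
from html.parser import HTMLParser


class AltTextExtractor(HTMLParser):
    """Collects (image URL, alt text) pairs from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # An attribute written without a value arrives as None, so default to "".
        alt = (attributes.get("alt") or "").strip()
        # Keep only images that carry a non-empty description.
        if src and alt:
            self.pairs.append((src, alt))


def extract_image_text_pairs(html):
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


page = (
    '<p>Hi</p><img src="pig.jpg" alt="a pig with wings">'
    '<img src="spacer.gif"><img src="x.png" alt="  ">'
    '<img src="a.png" alt="cat" />'
)
print(extract_image_text_pairs(page))
# [('pig.jpg', 'a pig with wings'), ('a.png', 'cat')]
```

Note what the filter does and does not check: any image with a description is kept, regardless of who took it, where it came from, or whether its owner ever agreed to it being used.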
"It's the digital equivalent of receiving stolen property. Someone stole the image from my deceased doctor's files and it ended up somewhere online, and then it was scraped into this dataset," Lapine told Ars Technica. "It's bad enough to have a photo leaked, but now it's part of a product. And this goes for anyone's photos, medical record or not. And the future abuse potential is really high." (According to her Twitter account, Lapine continues to use tools like Dall-E to make her own art.)
All of this publicly available AI, whether it works with images or words, along with the many data-driven applications like it, is based on this vast appropriation of existing culture, the scope of which we can scarcely comprehend. Public or private, legal or otherwise, most of the text and images scraped up by these systems exist in the nebulous realm of "fair use" (permitted in the US, but questionable if not outright illegal in the EU). As with most of what goes on inside advanced neural networks, it is almost impossible to understand how they work from the outside, save for rare encounters like Lapine's. But we can be sure of one thing: the results of this type of AI are far from the magical,
AI image and text generation is pure primitive accumulation: expropriating the labor of the many to enrich and advance a few Silicon Valley technology companies and their billionaire owners. These companies made their money by inserting themselves into every aspect of everyday life, including the most personal and creative areas of our lives: our secret passions, our private conversations, our likenesses and our dreams. They enclosed our imaginations in much the same way that landowners and robber barons once enclosed common lands. They promised that in doing so they would open up new realms of human experience, give us access to all human knowledge, and create new kinds of human connection.
The weirdness of AI image generation lies in the output as well as the input. One user tried typing in nonsense phrases and was confused and somewhat unsettled to discover that Dall-E mini seemed to have a very good idea of what a "Crungus" was: an otherwise unknown word that consistently produced images of a snarling, naked, ogre-like figure. Crungus was so clearly defined in the program's imagination that he could easily be manipulated: other users were quick to offer images of ancient Crungus tapestries, Roman-style Crungus mosaics, oil paintings of Crungus, photos of Crungus hugging various celebrities and, this being the internet, "sexy" Crungus.
So who or what is Crungus? Twitter users were quick to dub him "the first AI cryptid," a creature like Bigfoot that, in this case, lives in the underexplored terrain of the AI's imagination. And that is about the clearest answer we can get at this point, given our limited understanding of how the system works. We cannot peer into its decision-making processes because the way these neural networks "think" is inherently inhuman. It is the product of an incredibly complex, mathematical ordering of the world, as opposed to the historical, emotional way in which humans order their thinking. The Crungus is a dream emerging from the AI's model of the world, composed of billions of references that have escaped their origins and coalesced into a mythological figure detached from human experience. Which is fine, even amazing, but it raises the question: whose dreams are being drawn on here? What composition of human culture, what perspective on it, produced this nightmare?
Another digital artist had a similar experience while experimenting with negative prompts, a technique for generating what the system sees as the polar opposite of whatever is described. When the artist entered "Brando::-1", the system returned something that looked a bit like a logo for a video game company called DIGITA PNTICS. That this might be the opposite of Marlon Brando across the many dimensions of the system's worldview seems reasonable enough. But when they checked whether it worked the other way by typing "DIGITA PNTICS skyline logo::-1", something much stranger happened: all of the images showed a sinister-looking woman with sunken eyes and reddened cheeks, whom the artist dubbed Loab. Once discovered, Loab seemed unusually and disturbingly persistent. Feeding the image back into the program, combined with ever more different text prompts, brought Loab back again and again, in increasingly nightmarish forms in which blood, gore and violence reigned supreme.
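The geometric intuition behind a weight of -1 can be sketched with a toy model in Python. Everything here is an assumption for illustration: the artist did not say which system they used, real generators learn embeddings with hundreds of dimensions and condition an image-generation process on them rather than looking up a nearest stored concept, and the four `CONCEPTS` vectors are invented. The sketch only shows how flipping the sign of a prompt's vector points the query at whatever lies most nearly "opposite" it.

```python
import math

# Hypothetical 3-D "concept" vectors. Real generators learn embeddings with
# hundreds of dimensions; these numbers are invented purely for illustration.
CONCEPTS = {
    "Brando": (0.9, 0.3, 0.1),
    "portrait": (0.7, 0.5, 0.2),
    "logo": (-0.8, -0.2, 0.3),
    "skyline": (-0.3, 0.9, 0.1),
}


def combine(weighted_prompt):
    """Sum each concept vector scaled by its weight, e.g. [("Brando", -1)]."""
    total = [0.0, 0.0, 0.0]
    for name, weight in weighted_prompt:
        for i, component in enumerate(CONCEPTS[name]):
            total[i] += weight * component
    return total


def cosine(a, b):
    """Cosine similarity: 1 means same direction, -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(vector):
    """Return the known concept pointing most nearly in the same direction."""
    return max(CONCEPTS, key=lambda name: cosine(vector, CONCEPTS[name]))


print(nearest(combine([("Brando", 1)])))   # Brando
print(nearest(combine([("Brando", -1)])))  # logo
```

In this toy the "opposite of Brando" is simply whichever stored concept happens to point furthest away from him; with a different, equally arbitrary set of vectors it would be something else entirely, which is part of why such results feel so strange.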
Here is one explanation for Loab, and possibly Crungus: although it is very, very hard to picture how the machine's imagination works, it is possible to imagine it having a shape. This shape will never be smooth or neatly rounded, but will have troughs and peaks, mountains and valleys, areas rich in information and areas with hardly any features at all. The areas of high information correspond to networks of associations the system "knows" a lot about. One can imagine that the regions related to human faces, cars and cats, for example, are quite dense, given the distribution of images found by surveying the entire internet.
An AI image generator will rely most heavily on these regions when creating its images. But there are other, less-visited places that come into play when negative prompts, or indeed nonsense phrases, are used. To answer such queries, the machine must fall back on more esoteric, less certain connections, perhaps even inferring an opposite from the totality of its knowledge. Here, in the backcountry, Loab and Crungus are to be found.
That is a satisfying theory, but it raises certain uncomfortable questions about why Crungus and Loab look the way they do; why they tilt towards horror and violence, why they hint at nightmares. In their attempt to understand and replicate the whole of human visual culture, AI image generators seem to have recreated our darkest fears as well. Perhaps this is just a sign that these systems are very good indeed at aping human consciousness, right down to the horrors that lurk in the depths of existence: our fears of filth, death and corruption. And if so, we must acknowledge that these will be permanent components of the machines we build in our own image. There is no escaping such obsessions and dangers, no moderating or engineering away the reality of the human condition. The filth and disgust of living and dying will stay with us and need to be addressed, just as hope, love, joy and discovery will.
This matters because AI image generators will do what all previous technologies have done, but they will also go further. They will reproduce the biases and prejudices of those who create them, like the webcams that only recognize white faces, or the predictive policing systems that lay siege to low-income neighborhoods. And they will also up the game: the benchmark of AI performance is shifting from the narrow domain of puzzles and challenges (playing chess or Go, or obeying traffic laws) to the much broader territory of imagination and creativity.
While claims of AI's "creativity" may be exaggerated - there is no true originality in image generation, only very skilled imitation and mimicry - this does not mean that it is unable to undertake many common "artistic" tasks, long considered the preserve of skilled workers, from illustrators and graphic designers to musicians, videographers and even writers. That's a huge shift. AI is now engaging with the underlying experience of feelings, emotions and moods, and this will allow it to shape and affect the world at ever deeper and more compelling levels.
Introduced by OpenAI in November 2022, ChatGPT has continued to transform our understanding of how AI and human creativity might interact. Structured as a chatbot - a program that mimics human conversation - ChatGPT can do much more than conversation. When prompted, it is able to write working computer code, solve math problems, and mimic common writing tasks, from book reviews to academic papers, wedding speeches, and legal contracts.
It was immediately clear how the program could be a boon to those who find it difficult to write emails or essays, for example, but also how it could be used, like image generators, to replace those who make a living from these tasks. Many schools and universities have already introduced policies banning the use of ChatGPT amid fears that students will use it to write their essays, while the academic journal Nature has had to publish policies explaining why the program cannot be listed as a research author (it cannot give consent and cannot be held accountable). But institutions themselves are not immune to inappropriate uses of the tool: in February, Peabody College of Education and Human Development, part of Vanderbilt University in Tennessee, shocked students when it issued a letter of condolence and advice following a school shooting in Michigan. While the letter spoke of the value of community, mutual respect and togetherness, a note at the end stated that it had been written by ChatGPT, which struck many as both morally wrong and somehow false or uncanny. There seem to be many areas of life where machine intervention requires deeper thought.
If it is inappropriate to replace all of our communication with ChatGPT, one clear trend is for it to become a kind of clever assistant, guiding us through the morass of available knowledge to the information we're looking for. Microsoft has been a pioneer in this direction, reconfiguring its often-derided Bing search engine as a ChatGPT-powered chatbot and massively boosting its popularity in the process. But despite the online (and journalistic) rush to consult ChatGPT on almost every issue imaginable, its relationship with knowledge itself is somewhat shaky.
A recent one-on-one interaction I had with ChatGPT went like this. I asked it to suggest some books for me to read on a new area of interest: multispecies democracy, the idea of including non-human creatures in political decision-making. This is pretty much the tool's most useful application: "Hey, here's something I'm thinking about, can you tell me more?" And ChatGPT obliged. It gave me a list of several books that explored this novel area of interest in depth, and explained in persuasive human language why I should read them. This was brilliant! Except it turned out that only one of the four books listed actually existed, and several of the concepts ChatGPT thought I should explore further,
Now, this didn't happen because ChatGPT is inherently right-wing. It happened because it is inherently stupid. It has read most of the internet and knows what human language is supposed to sound like, but it has no relation to reality whatsoever. Its sentences are dream phrases that sound about right, and listening to it talk is, frankly, about as interesting as listening to someone describe their dreams. It is very good at producing what sounds like sense, and best of all at producing the clichés and banalities that make up the bulk of its diet, but it remains incapable of relating meaningfully to the world as it actually is. Distrust anyone who pretends this is an echo, or even an approximation, of consciousness. (When this piece was about to be released,
Believing this type of AI to be genuinely knowledgeable or meaningful is actively dangerous. It risks poisoning the well of collective thought, and of our ability to think at all. If, as tech companies propose, the results of ChatGPT queries are served up as answers to people searching for knowledge online, and if, as some commentators suggest, ChatGPT is used in the classroom as a teaching aid, then its hallucinations will enter the permanent record, effectively inserting themselves between us and more legitimate, verifiable sources of information, until the line between the two becomes so blurred as to be invisible. Moreover, our ability as individuals to explore and critically evaluate knowledge on our own behalf has never been more necessary, not least because of the damage tech companies have already done to the way information is disseminated. To place our full trust in the dreams of badly programmed machines would be to abandon such critical thinking altogether.
AI technologies are also bad for the planet. According to a study published in 2019, training a single AI model could emit the equivalent of more than 284 tons of carbon dioxide, nearly five times the lifetime emissions of the average American car, including its manufacture. These emissions are projected to grow by nearly 50% over the next five years, all while the planet continues to heat up, acidifying the oceans, igniting wildfires, triggering superstorms and driving species to extinction. It is hard to imagine anything more utterly stupid than artificial intelligence as it is practiced today.
So, let's take a step back. If these current incarnations of "artificial" "intelligence" are so boring, what are the alternatives? Can we imagine powerful technologies for sorting and communicating information that don't exploit, abuse, mislead and replace us? Yes, we can - once we step out of the corporate power networks that define the current AI wave.
In fact, there are already examples of AI being used to benefit specific communities by bypassing the entrenched power of corporations. Indigenous languages are under threat around the world. The UN estimates that one disappears every two weeks, and with each disappearance goes generations of knowledge and experience. This problem, the result of centuries of colonialism and racist assimilation policies, is compounded by the rising dominance of machine-learning language models, which ensure that popular languages increase their power while lesser-known languages are drained of visibility and expertise.
In Aotearoa New Zealand, a small non-profit Māori-language radio station called Te Hiku Media decided to address this disparity in how different languages are represented in technology. Its vast archive of more than 20 years of broadcasts, representing a wide range of idioms, colloquialisms and unique phrases, many of which are no longer spoken by anyone, had been digitized but needed to be transcribed to be of use to language researchers and the Māori community. In response, the radio station decided to train its own speech recognition model so that it could "listen" to its archive and produce transcriptions.
Over the next few years, using open-source technologies as well as systems developed in-house, Te Hiku Media achieved the almost impossible: a highly accurate speech recognition system for the Māori language, built and owned by its own language community. It was more than a software effort. The station contacted every Māori community group it could find and asked them to record themselves speaking pre-written sentences, in order to provide a corpus of annotated speech, a prerequisite for training its model.
There was a cash prize for whoever submitted the most sentences (one activist, Te Mihinga Komene, recorded 4,000 sentences alone), but the organizers found that the greatest motivation for contributors was a shared vision of reviving the language while keeping it in the community's ownership. Within a few weeks, the station had created a model that recognized recorded speech with 86% accuracy, more than enough to begin transcribing its entire archive.
Te Hiku Media's achievement paved the way for other Indigenous groups, with similar projects now being undertaken by Mohawk peoples in southeastern Canada and by Native Hawaiians. It also established the principle of data sovereignty for Indigenous languages, and by extension for other forms of Indigenous knowledge. When international for-profit companies began approaching Māori speakers to help build their own models, Te Hiku Media campaigned against these efforts, arguing: "They suppressed our languages and physically beat it out of our grandparents, and now they want to sell our language back to us as a service."
"Data is the last frontier of colonization," wrote Keoni Mahelona, a Native Hawaiian and one of the co-founders of Te Hiku Media. All of Te Hiku's work is released under what is known as the Kaitiakitanga License, a legal guarantee of guardianship and custodianship which ensures that all the data that went into the language model and other projects remains the property of the community that created it (in this case, the Māori speakers who offered their help), and that it is theirs to license, or not, as they see fit according to their tikanga (Māori customs and protocols). In this way, the Māori language is being revived while defying and transforming the systems of digital colonialism,
I think the lesson of the current wave of "artificial" "intelligence" is that intelligence is a poor thing when it is imagined by corporations. If your worldview is one in which profit maximization is the king of virtues, and all things must be measured against the standard of shareholder value, then of course your artistic, imaginative, aesthetic and emotional expressions will be woefully impoverished. We deserve better from the tools we use, the media we consume and the communities we live in, and we will only get what we deserve when we are capable of participating in them fully. And don't be intimidated by them either: they really aren't that complicated. As Ursula K. Le Guin wrote: "Technology is what we can learn to do."