Can generative AI create art? Two years ago I took my first swing at answering that, at least from my perspective.1 As AI systems become more advanced, this question, and the issues surrounding it, have become more pressing. With a new release from OpenAI, it’s become a topic of great passion, and one ripe for me to revisit.
I would like to explore this topic more deeply than I did previously, both in terms of cultural impact and historical context. It’s easy to take an emotional position on this, but as someone who considers themselves an artist, I think it’s worthy of more nuanced examination. I’ll be touching on art broadly, photography specifically, and, unlike my last essay on the topic, I’ll spend some time discussing writing.
Let’s start with the recent events that brought this to the fore. OpenAI released an update to their image generation feature, using their newer 4o model, to create images that substantially improve on the results of prior models.
Shortly after this release, users discovered that they could direct the system to recreate existing images in the style of the beloved-by-many Studio Ghibli. This became a viral trend, used for everything from creating cute family photos to the cruelest of political machinations. To say this prompted considerable debate would be an understatement.
In my prior post on this topic, I spent some time discussing the definition of art, and how it may apply to generative AI. I broke down each element of the definition, providing arguments for and against. In the end, I settled on something simpler than a technical definition: given that art depends so heavily on the eye, opinion, and perspective of the beholder, it may be best to paraphrase Forrest Gump and say simply that “art is as art does.”
There is a great unbroken chain in art, one that starts with the first people who painted on cave walls and connects over the millennia to the artists of today. Each and every one of them builds on the work of others. No artist exists in isolation.
Every work of art, starting with those crude lines of the first cave paintings, has built on the knowledge, experience, style, and design of others. Some mimic, some contrast, some defy, and some refine, but all artists build on the work of others.
Even those art forms that create ex nihilo, starting with a blank canvas and an idea, draw from the knowledge, ideas, and visions of others. Each artist collects this knowledge, and distills it into their works.
I am a photographer, and I don’t mean that in terms of it being a hobby I’ve dabbled in. I was a photojournalist, work that won awards (and death threats); I’ve paid my bills through sports and portrait photography; I’ve been a professional wedding photographer; and I now focus on fine art photography.
I’ve invested a non-trivial portion of my life into this art, and learning everything I can about it. The physics of light, the mechanical design of lenses, the chemistry of film, the electrical engineering of digital sensors. I’ve studied countless images to learn what works and what doesn’t. I’ve studied the psychology of how people perceive photos, model posing, facial expressions, and perspective. I’ve refined my style over decades of effort.
Every image I create is a distillation of that knowledge. It’s the result of everything that I’ve learned, applied to a specific scene. When a person generates an image using AI, the AI system understands none of that. Yet, critically, it benefits from that knowledge.
Generative AI is trained on massive troves of data, including millions of photographs.2 Through these photographs, this knowledge is effectively transferred by proxy; the system may have no understanding of why, but it does gain an understanding of how. Thanks to this training, generative AI draws on this distilled knowledge.
In the last essay, I shared an image I created using generative AI; today I include a revised version. The prior version was created in Stable Diffusion after several hours of effort; this version was recreated with OpenAI’s 4o model.
As was the case previously, this image follows my style, my choices of lenses, my affinity for cityscapes, and of course my love of sharp contrasts. It was the result of substantial effort, and more time than I would invest in most of the photographs that I create. Of course, it’s not perfect, though reality is also rarely perfect.
One could argue that this is nothing but “AI slop” - cheapening the work of photographers, models, tattoo artists, and others. Or one could argue that it’s a new way to create photos with greater flexibility and freedom.
Today, as a photographer, I am as much of a purist as possible while using a digital camera. I use film emulation on the camera to achieve the high-contrast black and white style, and otherwise my photos are untouched. No edits, not even cropping.
While not intended to be a slight against those with a different creative process, I have said many times that “my art is photography, not Photoshop.” The idea of altering the moments I capture after the fact is entirely anathema to me.
To me, an AI generated photo is no different than a real photo that has been heavily edited in Photoshop: it’s the product of a tool, and the artistic value is found in the intent and vision of the person controlling the tool.
Is photography art? Today, I would say that this question is absurd, though that hasn’t always been the case.
while the photograph is the mere mechanical reproduction of the physical features or outlines of some object animate or inanimate, and involves no originality of thought or any novelty in the intellectual operation connected with its visible reproduction in shape of a picture
On March 3rd, 1865, President Abraham Lincoln signed an amendment to the existing copyright laws that extended copyright protection to photographs in the US, a move that was later tested before the Supreme Court. There was a time when there was a real argument that photographs were devoid of creativity.
Charles Baudelaire, in a letter titled “Le Public Moderne et la Photographie”, wrote in a rather famous passage:
As the photographic industry was the refuge of all failed painters, too ill-equipped or too lazy to complete their studies, this universal infatuation bore not only the character of blindness and imbecility, but also the color of vengeance. […] it is obvious that this industry, by invading the territories of art, has become art’s most mortal enemy […]
Baudelaire was far from alone in attacking photography, and denying its value as a form of art. For painters especially, photography was a problem. Not only did this new technology intrude on their domain of capturing moments and the beauty of the world, it severely impacted their incomes.
Portraiture rapidly changed from painting, where an artist would charge for weeks or months of work, to photography, where the entire process was down to a couple of hours. The steady and reliable incomes that artists counted on to survive simply vanished thanks to a new technology. The parallels here should be obvious.
In my last essay, I shared an AI generated “painting” based on an idea I had. You see, despite quite a bit of effort, I have no ability to draw or paint. I understand the techniques, I’ve practiced, but I’m terrible at it. I simply don’t have the ability to translate what I can imagine into something that my hands can produce.
One could argue that, lacking that ability, I have no right to create art that requires those abilities based on the vision in my mind. One could argue that using a tool to execute that vision renders the result hollow and meaningless, because my hands are unable to execute the details.3 One could argue that only those with certain abilities should be able to execute their vision.
I would not make these arguments personally; as noted earlier, to me, the artistic value is in the intent and vision of the creator, regardless of the tools they use to execute it.
There is, quite obviously, the question of whether a person is just generating something, versus executing on a clear vision. I’ll talk more about that later.
The social media post that pushed me to write this referred to the text generated by AI as “usually a blathering, vacuous, and erroneous essay disrespectful of the reader’s time.” I’m not going to disagree with that, but I feel like there’s both nuance and context that can be added.
Upon reading that line, the first thing that struck me is how often the same can be said of text written by humans. This is especially true of anything labeled as “thought leadership” or an annoying percentage of LinkedIn posts. Humans are prone to error, too often fail to check facts, and too often focus on grabbing attention over providing useful information or a perspective that’s actually thought provoking.
A common theme in my career has been serving as a ghost writer for others, providing the first draft of an article that will eventually be published under their name. These articles go through editing processes, they go through PR firms, and along the way, much tends to disappear. Too many of those first drafts, touched only by humans, have been exactly the kind of “blathering, vacuous, and erroneous essay disrespectful of the reader’s time” that AI is criticised for.
Generative AI is highly effective at distilling what humans create, including both the good and the bad. If it has distilled this tendency in how we write, that may say more about us than it.
In addition to my own work as a ghost writer, I’ve worked with very talented writers doing the same. I’ve seen their early drafts. I’ve worked with them to prepare articles. For articles that broadly fit into the “thought leadership” category, there’s not much difference between their work and what’s generated by the more advanced chain-of-thought LLMs, when used with a well-crafted prompt.
I spend some of my free time writing fiction, some of which is published here. Mostly dystopian, which I’m guessing isn’t a surprise to anyone reading this.
One of my favourite stories about LLM generated text, and one that started the process of changing how I see AI systems more broadly, came from an idea I had for a short story. It was a novel idea, a bit funny, but a little too absurd to justify the time it would take me to write. I quickly decided that I wasn’t going to pursue it.
Before discarding the idea entirely, I wrote up a detailed prompt for ChatGPT to see what it could do with it.
The result was funny, witty, absurdist in just the right measure, and creative in ways I hadn’t anticipated. Quite frankly, I was shocked by the quality. It was good enough that I stopped writing fiction for nearly a year, because it felt like a waste of time. It wasn’t perfect, but it was at least as good as the first drafts of the fiction I’ve written.
I had to wonder if there was a point to writing at all, if an LLM could do in 30 seconds what took me 40 hours.
The power (and arguably terror) of generative AI is the fact that it is the distillation of human creativity. It’s the sum total of what humans have created for centuries.
It reflects us, sometimes in ways that we don’t like, because we don’t like what humans have created. It is, in many ways, a mirror. Why is it good at writing vacuous essays? Because humans have written a lot of them.
Just as we learn from each other, and create from what we’ve learned, so too do these systems. They generate from what they’ve seen, from what we have created. For good and ill.
One of the oldest acronyms in computing, GIGO, applies today as much as ever. If you use a tool like Photoshop and randomly click buttons, you may end up with an image, but it will certainly be garbage. If you use generative AI the same way, you’ll get the same result. Garbage in, garbage out.
In my experiments, I’ve found that it’s possible to articulate a vision, a structure, a point, in enough detail that you produce something that at least resembles what you wanted. When using chain-of-thought (CoT) LLMs4, the results can be quite impressive. They are obviously still far from perfect, but in many cases I’ve seen results on par with what a person would produce.
From my perspective, the prompts are where you go from garbage to something that may become art. It’s the intent and vision expressed in the prompt, and the way the tool is used and guided, that imbue the result with meaning.
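To make that concrete for the technically inclined, here is a minimal sketch of what the difference looks like when calling an LLM programmatically. It assumes the OpenAI Python client; the model name and both prompts are placeholders of my own invention, not a recommendation. The point is only that the detailed prompt carries intent and structure, while the vague one carries none.

```python
# A minimal sketch of "garbage in, garbage out" with an LLM.
# Assumes the OpenAI Python client; the model name below is a placeholder
# for whatever chain-of-thought model you actually have access to.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low-effort input: no vision, no structure, no point.
vague_prompt = "Write an essay about AI and art."

# The same request, with the intent and constraints spelled out.
detailed_prompt = """Write a 600-word essay arguing that the artistic value of
an AI-generated image lies in the intent and vision of the person directing
the tool, not in the tool itself.

Structure:
1. Open with the 1860s debate over whether photographs deserved copyright.
2. Draw the parallel to today's debate over generative AI.
3. Close on the claim that the prompt is where intent enters the work.

Tone: first person, reflective, no marketing language, no bullet points."""


def generate(prompt: str) -> str:
    """Send a single prompt and return the model's text response."""
    response = client.chat.completions.create(
        model="o1",  # placeholder: substitute the CoT model you use
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(generate(vague_prompt))     # typically generic, interchangeable text
    print(generate(detailed_prompt))  # typically closer to a draft worth editing
```

The code is trivial on purpose; everything that separates the two outputs was typed by a human before the model ever ran.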
The difference between Jackson Pollock and a random person throwing paint at a canvas is intent and vision. I think there’s an argument that the same applies to generative AI.
While this goes beyond the core scope of this article, I would be remiss to end without touching on it more directly. I recently wrote about the likely impact of AI on jobs, which, as noted there, will likely fall disproportionately on creative and knowledge-based work. Just as photography put many painters out of work, so too will AI. It seems to be an unavoidable reality at this point. For many companies, AI will be “good enough” for jobs to be eliminated.
Generative AI has the ability to allow those without certain talents and abilities to create in ways they never could before, and at the same time to destroy the livelihoods of the artists that would have been responsible for that creation in the past. Unlike what happened with the introduction of photography, this isn’t a shift towards fewer people being paid less to achieve the same thing, but may represent a broad elimination of jobs.
As AI systems continue to evolve, the impact will only grow, and I believe that we, as a society, need to make broad changes to avoid the worst outcomes.
Generative AI is like many other tools, in that the quality and value of the results depend on the quality and value of the inputs. To me, at least, it is simply another tool, another means by which people who are willing to put effort into it can express a vision that may not be practical, or even possible, otherwise.
It’s also a tool that, when given input of little effort or value, will produce output of little value.
For those who have made it this far, may I offer you something amusing that you may find thought provoking in unexpected ways. While we talk about the errors and potentially worthless nature of what generative AI creates, may I present something of purely human creation: English as She Is Spoke. The section on “Idiotisms and Proverbs” (p. 58) is quite worth the read.
I suggest reading the original post before continuing with this one, as it lays out my general perspective on the artistic value of generative AI, which this post builds upon. ↩︎
I will not be touching on the issue of copyright here; it is a complex area of law, and the fair use exception is even more complex due to its nature. Determining whether the companies that build AI systems have violated copyright laws is a matter for the courts to decide. Personally, based on my understanding of the caselaw in play, I believe there is a solid argument for fair use; I also believe there is a reasonable argument against it. This is a novel topic, and due to the nature of the fair use exception, the answer to where the line sits can only be found through considerable litigation. ↩︎
My family shares a genetic condition, a rare degenerative neurological disorder, that is most easily compared to Parkinson’s, though it progresses more slowly and eventually stops progressing. One result is a small tremor in my hands and limitations on fine motor control. In practice this has, so far, had minimal impact on me, beyond the inability to draw and the need to use camera lenses with some form of optical image stabilisation to compensate for the constant movement of the camera. In the case of arts such as painting and drawing, the limitation goes beyond a lack of talent. ↩︎
Chain-of-thought (CoT) models outperform non-CoT models to such a degree, especially in terms of accuracy, that I don’t use the non-CoT models, except for the most basic of tasks. Using CoT models has fundamentally changed how I view LLMs. ↩︎