No, they won’t get permission first

Cover of "Everywhere Man" by Jim Nelson

In 2011, I wrote a novella about a Silicon Valley startup that trains its virtual reality software from tourist photos it scrapes off the Internet. Millions of these photos are stitched together to create a virtual cable car ride across San Francisco.

This story became Everywhere Man, which was also recorded as an audiobook that you listened to while riding the actual, real-life cable cars. It was one of several literary tours that Oakland-based Invisible City Audio Tours offered. Their idea was to see cities through the lens of literature, and not as a mere collection of landmarks and commercial sights.

During the development of the book, Invisible City’s publisher asked me: “Wouldn’t the startup need to get permission from the people who took the photos?”

Fourteen years later, we’ve received the answer to her question: “Yes, they should get copyright permission. No, they won’t do that, though.”

Of course, today the startups in question are not producing virtual reality tours. They are AI companies feeding massive amounts of copyrighted data into their large language models (LLMs), which in turn power their artificial intelligence behemoths—ChatGPT, Claude, Grok, and so forth.

And just like my fictional Silicon Valley startup, these AI companies are being challenged over their use of intellectual property. Shouldn’t these companies have to get copyright permission before using creative works to build their software?

The answer, predictably, is that they don’t believe they need to:

  • Nick Clegg, former Meta (Facebook) executive, says that requiring AI companies to ask artists’ permission before scraping copyrighted content would “basically kill the AI industry in this country overnight.”
  • OpenAI, the creator of ChatGPT, asserts in a statement that “the federal government can both secure Americans’ freedom to learn from AI and avoid forfeiting our AI lead to the [People’s Republic of China] by preserving American AI models’ ability to learn from copyrighted material.”
  • Meanwhile, “OpenAI and Google are pushing the US government to allow their AI models to train on copyrighted material. Both companies outlined their stances in proposals published this week, with OpenAI arguing that applying fair use protections to AI ‘is a matter of national security.’”

It’s not even a matter of asking permission at this point—these companies have already trained their LLMs on copyrighted material and made their AI available to the public. Their earliest defense was that training on copyrighted material was “transformative” and therefore covered by fair use. Later they began to frame the argument as a matter of national security. (OpenAI is particularly prone to this claim.) At some point, they’ve all stated publicly, in so many words, that needing to obtain permission from content creators would destroy their business model. (In Nick Clegg’s case, it’s not even so many words. He came right out and said it.)

Facebook went so far as to use a massive database of flagrantly pirated texts (called LibGen) to train its AI. An internal company document reveals that they kept the source of their texts secret because “if there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

The Atlantic has helpfully produced a searchable database for checking whether an author’s work is included in LibGen’s trove of pirated texts. Sure enough, three of my books are in the set: Bridge Daughter, Hagar’s Mother, and—you guessed it—Everywhere Man, a book about a Silicon Valley company using stolen intellectual property to train its software.

There is a familiar smell about all of this. In a recent video on typeface piracy (a practice that goes back hundreds of years), designer Linus Boman observes that “every time there is a massive technological shift, intellectual property rights suddenly, and very conveniently, become a blind spot. … Is it only considered piracy if the people who do it lack resources and respectability?” Apparently so.


Maybe it’s time to stop telling ourselves that AI will never produce a passable novel, song, or movie—that AI lacks the fiery human spirit to produce creative work of value. Maybe we should concede that AI is more than capable of producing better-than-mediocre works of art.

The open market tells us this is the case. Writers have been caught using AI to produce tens, even hundreds, of novels, all of them profitable bestsellers with attentive and loyal fan bases. That’s because AI is a master of imitating others’ work.

Couldn’t an AI be trained only on public domain texts published in 1929 or earlier, the current cut-off for copyright protection in the United States? Well, it could, but then all those romance novels it produced would read like Jane Eyre and Wuthering Heights. That’s not going to sell many copies.

AI has practical, world-bettering applications in the sciences, healthcare and medicine, mathematics, and beyond. I’m not arguing against AI as a general technology. But it seems every avenue for creating works with AI leads to less-than-optimal market conditions for AI companies and their users if copyright protections are upheld. Why, though, should their short-term profit margins suddenly erase basic copyright law, a legal concept that goes back to the time of Shakespeare?

Ask yourself: Are you better off reading AI-generated novels? Or listening to AI-produced music, or watching AI-generated movies? I see no evidence that AI-produced work is being sold at a lower price than human-created work, or that it offers a better experience. What’s in it for me? Lower-quality, mass-produced books sold at the same price as before, or higher? How is this progress?

Alternate cover of "Everywhere Man" by Jim Nelson
