No, they won’t get permission first

Cover of "Everywhere Man" by Jim Nelson

In 2011, I wrote a novella about a Silicon Valley startup that trains its virtual reality software from tourist photos it scrapes off the Internet. Millions of these photos are stitched together to create a virtual cable car ride across San Francisco.

This story became Everywhere Man, which was also recorded as an audiobook that you listened to while riding the actual, real-life cable cars. It was one of several literary tours that Oakland-based Invisible City Audio Tours offered. Their idea was to see cities through the lens of literature, and not as a mere collection of landmarks and commercial sights.

During the development of the book, Invisible City’s publisher asked me: “Wouldn’t the startup need to get permission from the people who took the photos?”

Fourteen years later, we have the answer to her question: “Yes, they should get copyright permission. No, they won’t do that, though.”

Of course, today the startups in question are not producing virtual reality tours. They are AI companies feeding massive amounts of copyrighted data into their large language models (LLMs), which in turn power their artificial intelligence behemoths: ChatGPT, Claude, Grok, and so forth.

And just like my fictional Silicon Valley startup, these AI companies are being challenged over their use of intellectual property. Shouldn’t these companies have to get copyright permission before using creative works to build their software?

The answer, predictably, is that they don’t believe they need to:

  • Nick Clegg, former executive for Meta (Facebook): Asking artists’ permission before AI companies scrape copyrighted content will “basically kill the AI industry in this country overnight.”
  • OpenAI, the creator of ChatGPT, asserts in a statement that “the federal government can both secure Americans’ freedom to learn from AI and avoid forfeiting our AI lead to the [People’s Republic of China] by preserving American AI models’ ability to learn from copyrighted material.”
  • Meanwhile, “OpenAI and Google are pushing the US government to allow their AI models to train on copyrighted material. Both companies outlined their stances in proposals published this week, with OpenAI arguing that applying fair use protections to AI ‘is a matter of national security.’”

It’s not even a matter of asking permission at this point: these companies have already trained their LLMs on copyrighted material and made their AI available to the public. Their earliest defense was that training on copyrighted material was “transformative” and therefore covered by fair use. Later they began to frame the argument as a matter of national security. (OpenAI is particularly prone to this claim.) At some point, they’ve all stated publicly, in so many words, that needing to obtain permission from content creators would destroy their business model. (In Nick Clegg’s case, it’s not even so many words. He came right out and said it.)

Facebook went so far as to use a massive database of flagrantly pirated texts (called LibGen) to train its AI. An internal company document reveals that they kept the source of their texts secret because “if there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

The Atlantic has helpfully produced a searchable database to see if an author’s work was included among LibGen’s trove of pirated texts. Sure enough, three of my books are in the set: Bridge Daughter, Hagar’s Mother, and—you guessed it—Everywhere Man, a book about a Silicon Valley company using stolen intellectual property to train their software.

There is a familiar smell about all of this. In a recent video on typeface piracy (a practice which goes back hundreds of years), designer Linus Boman observes “every time there is a massive technological shift, intellectual property rights suddenly, and very conveniently, become a blind spot. … Is it only considered piracy if the people who do it lack resources and respectability?” Apparently so.


Maybe it’s time to stop telling ourselves that AI will never produce a passable novel, song, or movie—that AI lacks the fiery human spirit to produce creative work of value. Maybe we should concede that AI is more than capable of producing better-than-mediocre works of art.

The open market tells us this is the case. Writers have been caught using AI to produce tens, even hundreds, of novels, many of them profitable bestsellers with attentive and loyal fan bases. That’s because AI is a master of imitating others’ work.

Couldn’t an AI be trained only with public domain texts published on or before 1929, the current cut-off point for copyright protection in the United States? Well, it could, but then all those romance novels it produced would read like Jane Eyre and Wuthering Heights. That’s not going to sell many copies.

AI has practical, world-bettering applications in the sciences, healthcare/medicine, mathematics, and beyond. I’m not arguing against AI as a general technology. But it seems all avenues of creating works with AI lead to less-than-optimal market conditions for AI companies and their users if copyright protections are upheld. Why, though, do their short-term profit margins suddenly erase basic copyright law, a legal concept that goes back to the time of Shakespeare?

Ask yourself: Are you better off reading AI-generated novels? Or listening to AI-produced music, or watching AI-generated movies? I see no evidence that AI-produced work is being sold at a lower price than human-generated content, or offering a better experience. What’s in it for me? Lower-quality mass-produced books sold at the same or higher price than before? How is this progress?

Alternate cover of "Everywhere Man" by Jim Nelson

Why won’t Google include my Sherlock Holmes copyright post?

Sherlock Holmes

In January, I posted about my research into the history of the copyright status on Sherlock Holmes. Although many news outlets rang in the New Year with proclamations that Sherlock Holmes was now free of copyright and in the public domain (“Now anybody can write a Sherlock Holmes story”), I pointed out that they’d made similar proclamations in 2013 (“Finally, Sherlock Holmes is now in the public domain”) after a 7th Circuit court decision castigated the Doyle literary estate.

Indeed, Sherlock Holmes, Dr. Watson, and the bulk of the Holmes canon have been in the public domain for decades now (in the United States, at least). My conclusion was that the Doyle literary estate has been using fear tactics to con creators, from movie studios down to independent authors, into paying bogus licensing fees.

Something strange happened after posting that entry, though. I check in with Google Search Console now and then to see how my web site is being indexed and discovered by users. My post on the history of Sherlock Holmes’ copyright status has been indexed by Google but is not available via search. In other words, Google’s servers have seen the post, they’ve analyzed the content, but they refuse to add it to their search engine for users to discover. (Google has indexed pages on my web site that link to the page, but not the page itself.)

It’s been over three months. Posts I made after the Sherlock Holmes entry were indexed and made available on Google immediately, usually within a week. Almost all my other blog posts are available on Google (so far as I can tell). Not the entry on Sherlock Holmes’ copyright situation, though. I’ve made repeated attempts to get the page indexed. I ran a Google Search Console tool to find any problems on the page. I’ve gone through Google’s help system to find any valid reason the page may be excluded. The result: Zilch, and my page remains unavailable on Google search.
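For anyone doing similar detective work on their own site, one thing worth ruling out early is a stray noindex directive buried in the page’s own markup, which tells search engines to skip the page entirely. Here’s a minimal sketch of such a check in Python; the sample HTML is hypothetical, and this is only a generic sanity test, not Google’s actual exclusion logic:

```python
# Scan an HTML document for robots meta directives that would block indexing.
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> or <meta name="googlebot"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() in ("robots", "googlebot"):
                self.directives.append((a.get("content") or "").lower())

def is_blocked(html: str) -> bool:
    """Return True if the page explicitly opts out of indexing via a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in d for d in finder.directives)

# Hypothetical example: a page that opts out of search indexing
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_blocked(page))  # True
```

In my case, checks like this turned up nothing, which is what made the exclusion so puzzling.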

This isn’t a problem on alternative search engines like DuckDuckGo or Bing. It’s only Google.

Google is free to present or exclude any pages it wants to. I’m not even arguing they owe me an explanation, although I’d appreciate one.

But just as Google is in control of their web site, I’m in control of mine. I’ve tried my best to navigate their systems and understand why they’ve excluded my page, to no avail. So I’ll use my final option—my voice, however small—to let others know.

Update: Several weeks after posting this, Google Search began returning the page as a result.