I recently learned that three of my books (The New Rules of Marketing and PR, Fanocracy, and Marketing Lessons from the Grateful Dead) are part of the Books3 database, which was used by Meta, Bloomberg, and others to train generative Artificial Intelligence systems.
My three books, along with 183,000 books by other authors, were used without permission, without compensation, and without citation credit. I've been thinking about this for the past week and wanted to share how conflicted I am about it.
On one hand, I am an enthusiastic and vocal supporter of the power of Artificial Intelligence to transform business and life. I’ve written often about AI and delivered talks to thousands of people on the topic. I’ve also invested in AI companies including Lately.ai and the Marketing AI Institute. And I advise other AI companies including Kai.ai and GlobalPros.ai. So yes, I am all in on AI.
At the same time, for more than twenty years I’ve pioneered the idea of using content as a form of marketing. I’ve advocated putting content out there for free to educate and inform potential customers, the media, investors, and others. I’ve published nearly 2,000 blog posts like this one, all free. I’ve got a bunch of videos of my talks available for anyone to watch. I’ve published a bunch of free ebooks and hundreds of LinkedIn articles.
All that content also serves as Search Engine Marketing, leading people who find me but don't yet know who I am to my website or social feeds. I'm showing people my ideas, and some readers and viewers may buy a book or hire me to speak.
I’m totally cool with Generative AI tools training on all content that I put out there for free.
However, the Books3 database is very different. It's stealing my paid content without my permission (or my publishers' permission) and using it in ways I didn't authorize. Perhaps worse, my books aren't cited in the answers that these generative AI systems deliver to users. The systems use my hard work, and I derive no benefit.
Some authors, including Sarah Silverman, Michael Chabon, and Paul Tremblay, are suing Meta, claiming that using their books to train generative AI amounts to copyright infringement. The law around this is murky, so perhaps this will go all the way to the U.S. Supreme Court.
The Atlantic has published a searchable database of the books in Books3. If you're an author, you can check whether your own books are included.
My wife, Yukari Watanabe Scott, is also an author, but she writes in Japanese. We got to talking about how our work is used in these generative AI systems and then expanded the discussion to include other creators whose works are included in AI training sets.
Talented illustrators, painters, photographers, filmmakers, and musicians have had their work used just like my books were. These artists' creations are treated simply as raw material by huge, powerful, and wealthy corporations.
Yukari’s beautiful quote “A Frankenstein Monster of Stolen Art” sums up our discussion of these systems.
Yes, generative AI systems are getting better by the day. However, the output often feels wooden, odd, or just plain wrong, much like the amalgamation of body parts that make up Frankenstein’s Monster.
Earlier this week, the White House issued an Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence. The Executive Order covers many areas; here is the section relevant to content:
“Protect Americans from AI-enabled fraud and deception by establishing standards and best practices for detecting AI-generated content and authenticating official content. The Department of Commerce will develop guidance for content authentication and watermarking to clearly label AI-generated content. Federal agencies will use these tools to make it easy for Americans to know that the communications they receive from their government are authentic—and set an example for the private sector and governments around the world.”
Watermarking would be a good start. If that kind of labeling were extended to the sources an AI system draws on, then whenever my book content is used in an AI system's answers, my work would be cited.
Yukari and I will continue to create our content and publish it for free on our blogs and social sites. And we will continue to write books.
The AI revolution is here. It’s not going away.
What do you think? Are books a different kind of content from what people publish for free? Should authors just ignore the problem as an externality of technological progress?
Image via Dall-E 2 from the prompt: “A Frankenstein style character inside of an art museum with framed paintings under one arm that he has stolen off the walls.”
Book database output via The Atlantic.