
Big tech is stealing books to train AI. Authors and publishers are fighting back.

  • Writer: Brianne Moore
  • 1 day ago
  • 3 min read

Anthropic, OpenAI, and now Meta have all come under fire for stealing authors' and artists' copyrighted work to train their AI models. Creators (and, now, publishers) are getting fed up.


Another day, another news story about a powerful tech company ripping off artists in the name of training artificial intelligence. This time, it's Meta, which is being sued for copyright infringement by five major publishers (Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill), as well as by author Scott Turow. The class-action complaint alleges that Meta pirated millions of books and articles to train its large language model, Llama.


I have feelings about this, and they aren't good. Turns out, a few years ago, Anthropic used a pirated copy of my debut novel, All Stirred Up (and many other works), to train its LLM, Claude. That, too, resulted in a class-action lawsuit and a record-breaking settlement.

Tech firms have long been famous for their 'move fast and break things' philosophy. They've been celebrated for it. But it simply is not acceptable. They shouldn't get a pass just because they create platforms a lot of people like to spend time on. I spent a lot of time on my book. All the authors who had their work stolen did. We deserve to be paid for our work, and it's deeply offensive for companies worth billions to simply steal it, to train models that threaten our livelihoods and very existence.

Now, look, I'm not a luddite. I'm not anti-AI; I've used it here and there (not in my writing, of course!). But, like many other authors, I do worry about its creep into the world of publishing, and what that means for writers. We've already seen at least one suspected AI work make its way through the editorial process at a major publishing house.

The better these models get, the harder it'll be to tell what's been written by a human, and what's come out of a machine. Will publishers, in 10 or 20 years, decide they don't need to pay human writers at all anymore and just start feeding prompts into an LLM and publishing the results?

And if an LLM has been trained on my own work, does that mean my genuinely written-by-a-human future books might get flagged as AI?

It's a sticky place we're in, as is always the case with new tech and the change it brings. That's why we need to start establishing rules and making it clear what is and is not acceptable.

We can't stop tech companies from training their AI on our work, but if they're going to do it, they need to do it legally. The judge in the Anthropic case found that training an AI on published work is fair use. Yes, fair enough. But, the judge went on to say, the companies have to obtain those works legally. They have to pay for them, just as anyone else would. They can't just scrape piracy sites, which is what Anthropic (and, it's alleged, Meta) did. That's where they went wrong. They stole, and that's not okay.

I'm glad and grateful that authors and, now, publishers are taking a stand. Big thanks to everyone who's filed and fought these lawsuits, so we can establish how artists' works can be used in AI. Going up against big tech is no small thing, but it has to be done, because they can't just be allowed to keep breaking things. They aren't toddlers; they need to abide by the rules, just like the rest of us.

You want to use my work to train your products? Ok, fine, I can't stop you. But you need to pay for a copy of my book. I'm sure Zuck can cough up $13.99.
