Good Luck with the Training Data, Sucker! 

by Barry Magrill, Ph.D. 

Do you know whether a Large Language Model like Claude or ChatGPT has used your published scholarship as training data without your permission?  

Consider using the following real case study with your students to begin a conversation about the ethics of generative AI. It works as a case in law, academic integrity, economics, and ethics.  

The ethics issues swirling around Generative AI were theoretical, for me. Until they weren’t. In December 2025, I received notification that I was part of a $1.5 billion class action lawsuit against Anthropic, the company behind the Claude large language models. I quickly learned that my book, A Commerce of Taste: Church Architecture in Canada 1867-1914 (MQUP, 2012), had been used in the training data for the Large Language Model, without my permission. Is the issue simply one of training GenAI on data that includes academic scholarship like mine, or is it about wealthy investors in GenAI profiting from the work of others without acknowledgment? Isn’t that the definition of plagiarism? Hence, the class action lawsuit.  

I offer this as a case study for CapU faculty to begin conversations about the ethics of GenAI. At what point does the experiment of training an artificial intelligence on academic scholarship become plagiarism for profit? And, more importantly, what does resistance look like?   

What follows is the story of my small resistance. My writing is obtuse. My publication on how architectural pattern book designs were part of the spread of churches across Canada was quite idiosyncratic. In fact, if you’ve been paying attention to my writing style in this blog post, you’ll see what I mean. It’s hard to think in a linear fashion when you have seven thoughts occurring all at once. As a work of academic scholarship, A Commerce of Taste is adequate. But, as the judge of my own writing style, which is complex and dense at best, I cannot understand how my confusing style of writing could be an asset for GenAI training data. Of the thousands of books and journals used to create that training data, how many are like mine? Imperfectly written.  

Here is an example from A Commerce of Taste:  

‘The social and commercial realm around the pattern books was characterized by mobility and the constitution of historical references in architecture. Illustrated print-based books of church architecture signalled history’s deployment as a stabilizing force on modernity. History was treated, like taste, as a commodity.’ 

I consulted Copilot and gave it two tasks:

  1. critique the passage, and
  2. evaluate whether the writing represented good training data.  

This is what it returned:

  • Precise language but heavily packed with abstract nouns 
  • Hard for readers to unpack without rereading.  
  • Abrupt transitions between ideas.  
  • Lacks rhythm and variation.  

None of this surprised me. It even guessed I was the author based on the topic – architectural pattern books. Or it knows its own training data. Did the algorithm think that the passage represented good training data? Nope. It wrote: “Models might learn to produce overly complex or jargon-heavy text without clarity.”  

Gotcha, sucker! 

All this time I have been using ChatGPT while getting odd responses, hallucinations, and incorrect answers. I can only say this: I thought I recognized something familiar.  

To look up your publications and learn whether they are part of the class action lawsuit, visit:  

https://secure.anthropiccopyrightsettlement.com/lookup 

The deadline to submit a claim is March 30, 2026; the deadline to opt out is March 2, 2026.  

Learn more about the lawsuit and settlement at:  

https://copyrightalliance.org/participating-bartz-v-anthropic-settlement/