Fair use and ChatGPT

A quick note about the copyright lawsuits centered on fair use and ChatGPT’s use of copyrighted texts. I think the lawsuits are looking at the wrong moment in time if they want to demolish the fair use arguments.

My understanding is that there are four factors for determining fair use. The Stanford Libraries explain them as:

  • The purpose and character of the use
  • The nature of the copyrighted work
  • How much of the work was copied and how substantial that portion was
  • How the use affects the potential market for, or value of, the original

Unfortunately, these criteria are fuzzy, and reaching a definite conclusion requires a legal proceeding.

Most discussions of fair use in the OpenAI copyright lawsuits focus on the final product: the large language model developed from the documents OpenAI accessed. But the finished ChatGPT product is the wrong moment in the process to look for infringement.

Before the large language model was created, the text was downloaded to OpenAI’s servers. At that moment, the claim of fair use is at its most tenuous. The information has been copied in full, and the copying can potentially destroy the market for the copied text. That is precisely the situation in which fair use claims are denied and conventional copyright statutes apply.

At this moment of text ingestion, the purpose of the use is to transform the information, with an unpredictable degree of fidelity. The work is textual, and its value is the information it contains. The majority, if not all, of the information has been taken, and the taking could harm the potential market by making the information available without anyone accessing the original source.

For an argument for fair use to be convincing, the fair use evaluation should be made before the information is taken. One can always say, after the fact, “this is fair use,” but unless the analysis was made up front, OpenAI can throw out every rationale they can think of and see if any stands up in court. It is similar to the situation where the police identify a criminal using techniques they can’t bring up in court, for example a Stingray. The police can backtrack and launder the information they already know, hiding the improper methods from the defendant. OpenAI can likewise backtrack to any rationale they can think of, no matter how disingenuous.

Another aspect of fair use is that the user of the information should be able to produce a list of the materials that were accessed. If I take a painting and claim my use is fair, I need to be able to show where the work came from. If OpenAI can’t identify what they have taken, they can deny a practice of violating copyrights by hiding the “low-hanging fruit” of implausible fair use claims.

If OpenAI could show these secondary properties of fair use, it would bolster their claim. Did they make their analysis before accessing the data? Can they completely identify what information they used? Did they consider the different fair use situations for the different kinds of sources?

They took the text verbatim when they acquired it. They intended to use it in a way that they cannot convincingly argue would not harm the existing market for the text. Those are strong strikes against fair use; the problematic behavior simply happened at an earlier point in time than the lawsuits examine.

News organizations such as the New York Times can argue that the infringement happened when the text was acquired. Whether or not the original text can be retrieved from the resulting system is irrelevant. The infringement happened long before OpenAI was brought to court to defend itself.

My First ChatGPT

I dialed into ChatGPT today for the first time, and I was pretty happy with my experience. I didn’t intend to bang on my shiny new toy with a mallet but rather to use it as something to enhance my life. Thus, I didn’t set out to expose its political biases, trick it into acting foolish or inappropriate, or provoke controversy.

However, it’s easy to want to probe the boundaries. One thing I did in that vein was to ask for its favorite tongue twister. It gave me one that was pretty easy. It used phrasing that I took to mean that this was a “classic” tongue twister that might be well known. The tool also mentioned that since it was just a language model, it didn’t have a personal preference. (That’s the gist of its disclaimer, not the exact wording.)

Going back to my first experiences with Siri, I wondered whether there were Easter eggs hidden in ChatGPT. Siri was obviously programmed with clever answers to certain stock questions. Why would I admire this tool if it relied on the same tricks? I would be disappointed rather than impressed to find them there.

After getting used to it, I decided to put it to work on something practical. I’ve been writing some JavaScript code lately, and I wondered whether it could help me learn more about the language.

I directed the conversation to JavaScript generator functions and async functions, and it gave useful information. For one question, the code it generated didn’t match the explanation; however, the description of the concept was accurate, and the coding error was obvious. As I continued asking about other JavaScript features, I opened a tab and used DuckDuckGo to find a developer.mozilla.org article to confirm ChatGPT’s description. The MDN article also gave me more thoroughly vetted information.
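To keep the two features straight, here is a minimal sketch of what each one does. This is my own illustration for this post, not the code ChatGPT produced.

```javascript
// A generator function: calling it returns an iterator, and each
// yield hands back one value, pausing until the next one is requested.
function* countTo(limit) {
  for (let i = 1; i <= limit; i++) {
    yield i;
  }
}

for (const n of countTo(3)) {
  console.log(n); // 1, then 2, then 3
}

// An async function: it implicitly returns a promise, and await
// pauses inside it until the awaited promise settles.
async function delayedGreeting(name) {
  await new Promise((resolve) => setTimeout(resolve, 100));
  return `Hello, ${name}!`;
}

delayedGreeting("world").then((message) => console.log(message));
```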

I used that additional info to direct the conversation further. I also turned to caniuse.com to see whether one language feature was commonly available across browsers. I was impressed that ChatGPT was explicit in reporting the different versions of ECMAScript involved; that helped me identify when certain syntax was added to the language standard. Some of what it explained went over my head, and I need to use those features in real code and read the explanations again. It also assumed I understood a related feature better than I actually do.
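For the record, the version history goes like this: generator functions arrived in ES2015, async functions in ES2017, and async generators with for await...of in ES2018. Here is a small sketch combining them, again my own illustration rather than ChatGPT’s output.

```javascript
// An async generator (ES2018) combines the two earlier features:
// generator syntax from ES2015 and await from ES2017's async functions.
async function* numberStream() {
  for (let i = 1; i <= 3; i++) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield i;
  }
}

(async () => {
  // for await...of (also ES2018) consumes an async iterable.
  for await (const n of numberStream()) {
    console.log(n); // 1, 2, 3, with a short pause before each
  }
})();
```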

To me, ChatGPT is not a toy. It can be played with, certainly. But so can a can of whipped cream. I don’t learn much playing with either; eventually I’ll knock off the valve or make a mess to clean up. I’d rather learn how to use ChatGPT efficiently. The language model is not god-like or without flaws. A word processor’s spell checker is not without flaws either, yet I learned how to use it efficiently and moved on from there.

When I was done, I tried to capture the discussion by copying the text and pasting it into Word, but the code ChatGPT generated didn’t paste properly: the line breaks in the code were lost. Another problem was that the copied text was white, invisible against Word’s white background. When I went to Acrobat Pro and pasted with the Edit Text tool, the text came out correctly, except that the font changed for the code.

If I represent something that ChatGPT produces as if it were my own work, that’s simple plagiarism. It might appear to pass in a blog post or two, but a professional writer shouldn’t use it as a substitute for his or her own work. Even using it to improve a paragraph needs more effort than that: I should use my own words instead of mimicking ChatGPT’s robotic syntax.

I asked it to improve three of the paragraphs here. It didn’t understand what I was trying to emphasize. One “improvement” was such a mess that I couldn’t use it at all. The other two were far from my normal style, too formal with a stilted vocabulary, and had to be overhauled. Mostly, I used them as inspiration for further editing.

In the end, my attitude is to treat it more as a thesaurus than as a copywriter-for-hire. And you have to know what you’re doing when you use a thesaurus.