Fair use and ChatGPT

A quick note about the copyright lawsuits centered on fair use and ChatGPT’s use of copyrighted texts. I think the lawsuits are looking at the wrong moment in time if they want to demolish the fair use arguments.

My understanding is that there are four factors for determining fair use. The Stanford libraries explain them as:

  • The purpose and character of the use
  • The nature of the copyrighted work
  • The amount and substantiality of the portion taken
  • The effect of the use on the potential market for the original work

Unfortunately, these criteria are fuzzy, and it takes a legal proceeding to reach a definite conclusion.

Most discussions of fair use in the OpenAI copyright lawsuits focus on the final product: the large language model developed from the documents OpenAI accessed. But the finished ChatGPT product is the wrong moment in the process to look for infringement.

Before the large language model was created, the text was downloaded to OpenAI’s servers. At that moment, the claim of fair use is most tenuous: the information has been copied in full, and the copy can potentially destroy the market for the original text. That is the situation in which to deny fair use claims and apply conventional copyright protection statutes.

At this moment of text ingestion, the purpose of the use is to transform the information, with an unpredictable level of fidelity. The work is textual, and its value is the information it contains. Most, if not all, of the information has been taken, and the taking could harm the potential market by making the information available without anyone accessing the original source.

To make an argument for fair use convincing, the fair use evaluation should be made before the information is taken. One can always say, after the fact, “this is fair use,” but unless the analysis was made up front, OpenAI can throw out every justification they can think of and see if any stand up in court. It is similar to the situation where the police find a suspect using techniques they can’t bring up in court, for example a Stingray. The police can backtrack and launder the information they already have, hiding the improper methods from the defendant. OpenAI can likewise backtrack and offer any justification they can think of, no matter how disingenuous.

Another aspect of fair use is that the user of the information should be able to produce a listing of the materials that were accessed. If I take a painting and claim my use is fair use, I need to be able to show where the work came from. If OpenAI can’t identify what they have taken, they can deny any practice of violating copyrights by hiding the “low hanging fruit” of implausible fair use claims.

If OpenAI could show these secondary properties of fair use access, it would bolster their claim of fair use. Did they make their analysis before accessing the data? Can they completely identify what information they used? Did they consider the different fair use situations for the different kinds of sources?

They took the text verbatim when they acquired it. They intended to use it in a way that they cannot convincingly argue would not harm the existing market for the text. Those are strong violations of the fair use factors; the problematic behavior simply happened at a different point in time.

News organizations such as the New York Times can argue that the infringement happened when the text was acquired. Whether or not the original text can be retrieved from the resulting system is irrelevant. The infringement happened long before OpenAI was brought to court to defend itself.

Windy Winter and Artificial Intelligence

I’ve noticed recently that it seems to be windier here than in the past. Over the weekend, we had a strong wind storm with 60+ mph gusts. Yet I don’t think the storm is proof. Just as a bitterly cold week doesn’t invalidate climate change, a single wind storm doesn’t prove it’s getting windier. But I had suspected changes before the weekend.

I mentioned my observation to a friend and he suggested I could use an AI tool to analyze the situation. I’m not convinced that artificial intelligence will help me get a better result.

The research in “Global trends in wind speed and wave height” by I. R. Young, S. Zieger, and A. V. Babanin indicates that this is an interesting topic. That paper could help me identify the most useful questions. It also indicates that analyzing climate is not easy.

For my own situation, I need a source of historical wind speeds. Weather Underground has some data starting around 1940, but it isn’t easy to access. windfinder.com sells hourly data going back to 1999; I didn’t check their pricing.

Another issue is what to measure. Weather Underground reports the maximum speed each day, which is a good start but may not answer my question. Windfinder has hourly data, which is a finer granularity and might be more useful.
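
As a sketch of the kind of reduction I’d need, here is how hourly data could be collapsed to daily maxima with pandas. The file name and column names are my assumptions, since I don’t know the actual format Windfinder delivers:

    import pandas as pd

    # Hypothetical input: hourly observations with a timestamp column
    # and a wind speed column. Both names are assumptions, not
    # Windfinder's actual export format.
    hourly = pd.read_csv(
        "hourly_wind.csv",
        parse_dates=["timestamp"],
        index_col="timestamp",
    )

    # Collapse to the maximum speed per day, comparable to the daily
    # figure Weather Underground reports.
    daily_max = hourly["wind_speed_mph"].resample("D").max()
    print(daily_max.describe())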

There are a few more decisions to make before I can get an answer. Where do I draw the break between a historical baseline and the recent data I compare against it? Or should I instead look for a trend across the whole record? The number of data points in each group can affect the statistical validity of my results.
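
A minimal sketch of both approaches, assuming the daily_max series from above; the cutoff year, the test, and the annual averaging are all choices I would have to defend, not givens:

    from scipy import stats

    # Split the record into a historical baseline and a recent period
    # at an arbitrary cutoff; picking this break point is one of the
    # decisions no tool can make for me.
    baseline = daily_max.loc[:"2009-12-31"].dropna()
    recent = daily_max.loc["2010-01-01":].dropna()

    # Option 1: compare the two groups directly (Welch's t-test).
    t_stat, p_value = stats.ttest_ind(baseline, recent, equal_var=False)
    print(f"baseline mean {baseline.mean():.1f} mph, "
          f"recent mean {recent.mean():.1f} mph, p = {p_value:.3f}")

    # Option 2: look for a trend across the whole record instead,
    # using annual means to smooth out the daily noise.
    annual = daily_max.resample("YS").mean().dropna()
    trend = stats.linregress(annual.index.year, annual.values)
    print(f"trend {trend.slope:+.3f} mph/year, p = {trend.pvalue:.3f}")

Even this sketch bakes in the cutoff year, the aggregation, and the choice of test, which is exactly the point: those decisions come before any tool.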

Before I start looking for tools to confirm my observation, I need to make a lot of decisions. I don’t think an artificial intelligence tool will help me make them.

Artificial Intelligence is a trendy hammer, but not every problem is a nail.