Grokipedia v0.2 observations

I spent some time on Grokipedia v0.2 on Nov. 25, 2025, comparing it with Wikipedia. I was not evaluating its political orientation; media coverage treats its political attitude as the primary point of interest, but that isn't my focus here.

Here are some observations.

  • GP tends toward long sentences that could be refactored into better prose, with many comma splices and a style that can be stilted and unnatural. The articles seem consistently too long; they include redundant content and repeat the same information in different sections of an article. This contrasts with Wikipedia pages, which often have a flat outline and make judicious use of subtopics and, rarely, sub-sub-topics; Grokipedia articles are always subdivided into two levels, with each topic split into two or more subtopics. The text of each Wikipedia article has been individually organized in a logical fashion, while Grokipedia’s organizing principle is usually hidden. In addition, the section headings are long, which makes them hard to scan quickly as you review a page.
  • GP includes only bare URLs as its citations. Wikipedia includes the date the reference was accessed, along with other information such as title, source, and publisher when it’s available. Wikipedia can include citations that are not available online or are paywalled (for example, in books or bibliographic databases) when the information is not freely available, and it makes heavy use of archive.org’s Wayback Machine to preserve access to references that are gone or have been altered. A bare URL doesn’t help me grade the quality of a source or decide whether it is relevant to my purpose for visiting the site. When I did follow Grokipedia citations, it was difficult to find where the referenced information was located. It is also a valid question whether a citation actually supports the statement being cited or is being misinterpreted.
  • With only a direct URL for each citation, Grokipedia will be brittle as the internet churns and pages come and go. In theory, a Grokipedia page could regenerate with new material, but that fragility inhibits consistent content over time: when source material is deleted, Grokipedia can’t stay current.
  • GP has some formatting issues. One ugly example was the mathematics page Brauer Group, which had red errors in its equations. When you mouse over a citation index ([15], for example), the domain-name popup is centered under and hidden behind the pointer, and because the target is small, you can’t move the pointer aside to uncover the domain. Even with a large pointer, Wikipedia’s placement works: it aligns the popup for an internal link to the right of the pointer, leaving enough room to move the pointer out of the way, and when you mouse over an actual citation, the entire reference is presented directly above the index number. In the body of an article, GP’s headings and subheadings are too similar in size and font weight, so they blend together. Although there is a line between top-level outline elements, it is such low contrast that it isn’t actually visible, which defeats the goal of giving a visual cue that a new section has begun.
  • GP has sparse internal links. Where Wikipedia has many cross-links to other articles, GP has dramatically fewer; for example, I didn’t see any cross-links in the article about Tasha Yar from Star Trek: The Next Generation. Most articles have many obvious places for an internal link that go unused.
  • Although GP keeps an edit history for each page, the feature is not very useful. You can’t put edits in context, and if an edit is complicated, you can’t interpret it. Of the several pages I visited, only one had an edit, and it was still under review. Letting users thumbs-up or thumbs-down an edit, as Grokipedia does, might be fine for Reddit, but an encyclopedia requires more detailed evaluation. It’s also unclear what happens when a page with edits regenerates.
  • The pages are white text on a black background. Since many pages are very long, that makes them hard to follow; as I scroll, I lose track of where I am. The page map on the left renders all of its text in the same font size and weight, which makes it hard to identify the top-level outline entries, and because the entries are long, they usually fold onto two lines, making the page map even harder to scan.

Although I’m sure Grokipedia v0.3 will bring improvements, right now I wouldn’t use it as a replacement for Wikipedia. Grokipedia is written by an anonymous, evolving blob, and there is no one to intervene. I can visualize the Wikipedia community because I know there are people behind the scenes, and I can find Wikipedia projects to contribute to, making me a contributor as well as a user.

Wikipedia is organic and uses principles compatible with Douglas Engelbart’s Language, Artifacts and Methodology to augment human intellect. Why reinvent the wheel?

Fair use and ChatGPT

A quick note about the copyright lawsuits centered on fair use and ChatGPT’s use of copyrighted texts: I think the lawsuits are looking at the wrong moment in time to demolish the fair use arguments.

My understanding is that there are four factors for determining fair use. The Stanford Libraries explain them as:

  • The purpose for which the material is being used
  • The nature of the copyrighted work
  • How much of it was copied and how substantial that portion was
  • How the use affects the value of the original work

Unfortunately, these criteria are fuzzy and require a legal proceeding to reach a definite conclusion.

Most discussions of fair use in the OpenAI copyright lawsuits focus on the final product: the large language model developed from the documents OpenAI accessed. Looking at the ChatGPT product is the wrong moment in the process to identify infringement.

Before the large language model was created, the text was downloaded to OpenAI’s servers. At that moment, the claim of fair use is most tenuous: the information has been copied in full and can potentially destroy the market for the copied text. It is a situation in which to deny fair use claims and apply conventional copyright statutes.

At this moment of text ingestion, the purpose of the use is to transform the information, with an unpredictable level of fidelity. The work is textual, and its value is the information it contains. Most, if not all, of that information has been taken, and it could harm the potential market by making the information available without accessing the original source.

To make an argument for fair use convincing, the fair use evaluation should be made before the information is taken. One can always say, after the fact, “this is fair use,” but unless the analysis was made up front, OpenAI can throw out every justification it can think of and see if any stands up in court. It is similar to the situation where the police find a criminal using techniques they can’t bring up in court, for example a Stingray: the police can backtrack and launder the information they already have, hiding the improper methods from the defendant. OpenAI can likewise backtrack to any rationale it can think of, no matter how disingenuous.

Another issue is that a user claiming fair use should be able to produce a listing of the materials that were accessed. If I take a painting and claim my use is fair, I need to be able to say where the work came from. If OpenAI can’t identify what it has taken, it can deny any systematic practice of violating copyrights, hiding the “low-hanging fruit” of implausible fair use claims.

If OpenAI could show these secondary properties of a fair use access, it would bolster its claim of fair use. Did it make its analysis before accessing the data? Can it completely identify what information it used? Did it consider the different fair use situations for the different kinds of sources?

They took the text verbatim when they acquired it, and they intended a use they cannot convincingly argue would not harm the existing market for the text. Those are strong violations of the fair use factors; the problematic behavior simply happened at a different point in time.

News organizations such as the New York Times can argue that the infringement happened when the text was acquired. Whether or not the original text can be retrieved from the resulting system is irrelevant; the infringement happened long before OpenAI was brought in to defend itself.