The Unix Heritage Society mailing list
From: "Charles H Sauer (he/him)" <sauer@technologists.com>
To: COFF <coff@tuhs.org>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum)
Date: Mon, 26 May 2025 13:45:44 -0500
Message-ID: <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com>
In-Reply-To: <CAEdTPBcr2ajyAQh24LPtiQLBjfe2G2MYwoq8x_3bt6TzOT1_BA@mail.gmail.com>

TUHS->COFF
>     > It's like Wikipedia.
> 
>     No, Wikipedia has (at least historically) human editors who
>     supposedly have some knowledge of reality and history.
> 
>     An LLM response is going to be a series of tokens predicted based on
>     probabilities from its training data. ...
> 
>     Assuming the sources it cites are real works, it seems fine as a
>     search engine, but the text that it outputs should absolutely not be
>     thought of as something arrived at by similar means as text produced
>     by supposedly knowledgeable and well-intentioned humans.
> 
> An LLM can weigh sources, but it has to be taught to do that.  A human 
> can weigh sources, but it has to be taught to do that.

Before LLMs, Wikipedia, the World Wide Web, ... adages such as "Trust, but 
verify" and "Inspect what you expect" were appropriate, and still are.

Dabbling in editing and creating Wikipedia articles has reinforced those 
notions. A few anecdotes here -- I could cite others.

1. I think my first experience was trying in 2008 to fix what is now at 
https://en.wikipedia.org/wiki/Vulcan_Gas_Company_(1967%E2%80%931970), 
because the article had so much erroneous content, and because I had 
worked/performed at that venue 1969-70. Much of what I did in 2008 was 
accepted without anyone else verifying. But others broke things/changed 
things, even renamed the original article and replaced it with an 
article about a newer club that adopted the name. A few years ago, I 
tried to make corrections, citing poster images at 
https://concerts.fandom.com/wiki/Vulcan_Gas_Company. Those changes were 
vetoed because fandom.com was considered unreliable. I copied the images 
from fandom to https://technologists.com/VGC/, and citing those 
images was then accepted by the editors involved. (The article has been 
changed dramatically and is still seriously deficient, IMO, but I'm not 
interested in fixing it.)

2. Last year, I created https://en.wikipedia.org/wiki/Hub_City_Movers, 
citing sources I considered reliable. Citations to images at discogs.com 
were vetoed as unreliable, based on an analogous bias against that site. 
Partly to see what was possible, I engaged with editors, found citations 
they found acceptable, and ultimately produced a better article.

3. Later last year, I edited https://en.wikipedia.org/wiki/IBM_AIX to 
fix obviously erroneous discussion of AIX 1/2/3. Even though I used my 
own writings as references, the changes were accepted.

I still use the Web, Wikipedia, and even LLMs, but cautiously.

Charlie
-- 
voice: +1.512.784.7526       e-mail: sauer@technologists.com
fax: +1.512.346.5240         Web: https://technologists.com/sauer/
Facebook/Google/LinkedIn/mas.to: CharlesHSauer

