this is interesting, but i don't quite agree. i don't think this is model collapse, per se. i believe when you do "search" with an LLM, what you are actually doing is RAG: they are not constantly re-training their models on the online content they added to their index over the last 48 hours; they are querying their vectorized index of that content with your vectorized search terms, dumping that context into the LLM, and returning a long, chatty result. https://www.theregister.com/2025/05/27/opinion_column_ai_model_collapse/
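to make the distinction concrete, here's a minimal sketch of that RAG pattern. everything in it is illustrative (the toy embed() function, the page list, search_with_llm() are all made-up names, not any vendor's actual pipeline); real systems use a trained embedding model and a proper vector database, but the shape is the same: embed the query, rank the index, stuff the top hits into the prompt.

```python
# minimal, illustrative sketch of the RAG pattern described above.
# none of these names come from a real search product.
import numpy as np

def embed(text: str) -> np.ndarray:
    """toy stand-in for a real embedding model: hash words into a fixed vector."""
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# the "index": recently crawled pages, embedded once at ingest time.
# note that no model retraining happens at this step.
pages = [
    "press release about a new phone, lightly rewritten by an LLM",
    "decade-old forum thread with a working fix for the bug",
    "freshly generated listicle of facts about sheep cloning",
]
index = np.stack([embed(p) for p in pages])

def search_with_llm(query: str, k: int = 2) -> str:
    """vectorize the query, rank the index, stuff the top hits into a prompt."""
    scores = index @ embed(query)        # cosine similarity (vectors are unit length)
    top = np.argsort(scores)[::-1][:k]   # indices of the best-matching pages
    context = "\n".join(pages[i] for i in top)
    # the LLM never saw these pages during training; they arrive as prompt context
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(search_with_llm("how do i fix the bug"))
```

the point being: the freshness comes from the index, not the model weights. so when search results get worse, the mechanism isn't the model collapsing, it's the index filling up with slop.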
-
what we're seeing is actually far worse. it's a general **epistemological collapse**. the open web is getting filled up with garbage LLM content to the point that it is becoming difficult to find useful results, regardless of whether it's an LLM search or a regular search.
-
the original task of internet search engines was to help users find a needle in a haystack. that was a solvable problem, and they were very good at it **because the haystack was finite**. but now LLMs are generating an infinite haystack of slop and deliberate misinformation.
-
when biologists cloned sheep in the 1990s, it was a red alert for bio-ethics, and some pretty strict rules were set for what you can do with a human genome, because everyone knew that once some lab-created horror made it out into the population, there was no way to fix it.
i feel like we are at a similar moment with the ecosystem of human knowledge and no one is talking about it like the civilizational emergency it is.
-
if you have a library of 10,000 precious books containing thousands of years of human knowledge, and that library burns down, it's a tragedy. and it's the **exact same effect** if, instead of burning them, you mix those 10,000 books into a sea of 1,000,000,000 books that look exactly like them but contain fabricated content with no traceable record of who created them or why.
-
this is what's at risk with LLM-generated content, and we badly need some kind of guidelines, and a project to archive original, human-produced knowledge, before it becomes impossible to extract it from an ocean of random language.
-
@peter I am as alarmed as you are; however, I think we need to avoid alarmism. It is not so much that knowledge and technological advances are being erased; rather, we are being sent back to the time of card catalogs, in a way. It is a potential (which we only glimpsed) that has been lost: the potential for faster retrieval of reliable information, for synthesis and collaboration. We are losing what seemed like a glorious shortcut.
-
Perhaps this observation about lost potential goes a long way to explain why some of us feel the destructive nature of LLMs more acutely than others. I spoke to a coworker recently and lamented the terrible state of Internet search. To my surprise he said “I never could find anything anyway.” People can’t miss what they never had.
-
@futurebird @peter
part of me thinks about how most of my experience of card catalog systems was using them to find things for people who had never learned to use them. When local libraries replaced the card catalog with a computerized version of the same thing, the pattern repeated itself. But then I remember: the first stage of grief is denial. Maybe people dismiss the loss by pretending they were never able to find things.