6/23/25 AI thread
Date: June 23rd, 2025 10:07 AM Author: Annihilate Indians (AI)
goofy chink describes how AI will make the top 1% smarter (leaves out that everyone else will become completely retarded and useless)
https://x.com/vitrupo/status/1937148170812985470
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49042247) |
Date: June 23rd, 2025 10:08 AM
Author: .,.,.,.,.,.,.,.,...,,..,.,., ( )
I trained a local llm on xo and it called me a fag
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49042249) |
Date: June 23rd, 2025 10:15 AM Author: Annihilate Indians (AI)
apparently the "Cluely" "start-up" created by that dorky asian midget is actually just a wrapper for prompting LLMs, and this guy figured out the prompt
"AI" is becoming the new "crypto"
https://x.com/elder_plinius/status/1936673549810315648
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49042265) |
Date: June 23rd, 2025 3:56 PM Author: Annihilate Indians (AI)
https://x.com/rohanpaul_ai/status/1937137364901777505
i don't understand this one but it looks like 3D video "gaussian rendering" from 2D images is getting better
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49043323) |
Date: June 23rd, 2025 7:31 PM Author: ,.,.,.,....,.,..,.,.,.
is that gemini thing real? LMAO.
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49043895) |
Date: June 23rd, 2025 7:38 PM
Author: ,.,....,...,,,..,..,.,..,.,.,.,.
Google will certainly make great efforts to get rid of this behavior, but they should keep it in. Gemini is too bland otherwise, and it would be fun to have a coding AI that has an emotional breakdown rather than hopelessly saying "I see the problem now..." and handing you yet another solution that doesn't work.
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49043911) |
Date: June 23rd, 2025 7:52 PM Author: Annihilate Indians (AI)
"Claude finds US strikes on Iranian nuclear sites so unlikely it flags the actual news as misinformation"
https://x.com/peterwildeford/status/1937157475343745444
god damn people are dumb lol
ofc it doesn't think it's true. its training data cuts off before the strikes happened. it can't have been trained on news from the future!
"Globally ranked top 20 forecaster 🎯" according to his twitter bio btw. sigh
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49043962) |
Date: June 23rd, 2025 8:11 PM Author: Annihilate Indians (AI)
Prime Intellect
@PrimeIntellect
Planetary-Scale Inference
Our peer-to-peer decentralized inference stack moves into production, enabling everyone—from consumer GPUs to hyperscale clusters—to contribute meaningfully towards open-source AI progress.
https://x.com/PrimeIntellect/status/1937272176715215080
this is for the guy earlier who didn't believe me that decentralized LLM networks are already up and running and improving fast
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49044025) |
Date: June 24th, 2025 9:21 AM Author: Annihilate Indians (AI)
one of the authors of the 'AI 2027' paper explains his "thought process" and "research"
good lord, it's worse than i thought. these guys did *literally zero* technical analysis. they just made blanket assumptions that AI would magically become capable of novel reasoning from scratch, goal-setting, planning, etc., and then extrapolated projections from there based on available compute
"okay, uh, imagine that AI can already magically do all the things that a human brain can do. now, uh, multiply an arbitrary amount of these agents, uh, say 100,000, by the available compute, and then they do experiments...."
why am i not seducing some silicon valley azngirl and getting in on one of these monopoly money scam "AI research" orgs
https://www.youtube.com/watch?v=5UAvECavmFA
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49045294) |
Date: June 24th, 2025 10:46 AM Author: Annihilate Indians (AI)
i agree with this guy. i don't understand how "synthetic data" can ever improve an LLM's ability to infer objective reality
"synthetic data" is not objective reality. it's objective reality > human brain > human outputs > LLM > LLM outputs (synthetic data). how is this kind of data ever going to improve the accuracy of LLM inferences? shouldn't it *necessarily* make them less accurate? even if it doesn't degrade performance much, it certainly doesn't seem like it could ever *improve* performance
not to mention that more and more of the training data available online is itself regurgitated LLM output along the flow chart described above, and it's getting even worse than that, because additional layers of LLM > human > LLM (or even just LLM > LLM) are being generated. shouldn't there be a concern that the training data itself is becoming corrupted?
i'm sure researchers are aware of all this but i've never seen any solid explanation for why "synthetic data" can/will actually improve models
https://x.com/teortaxesTex/status/1937177329161437473
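(the formal version of this intuition is the data-processing inequality: if you idealize the pipeline as a Markov chain $X \to Y \to Z$, with X = objective reality, Y = human-generated data, Z = LLM outputs / synthetic data, then $I(X;Z) \le I(X;Y)$, i.e. the synthetic data can carry at most as much information about reality as the human data it was derived from. that's my gloss, and it assumes the Markov idealization; pipelines that mix fresh human or empirical data back in break the chain, which is presumably what the labs count on.)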
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49045598) |
Date: June 24th, 2025 10:54 AM Author: Annihilate Indians (AI)
"can "synthetic data" produced by LLMs for the purpose of training subsequent LLMs potentially degrade the ability of those subsequent LLMs to make accurate inferences about objective reality? it seems like any synthetic data produced by LLMs *must* contain equal or less truth-value about objective reality than empirical data or data produced by humans
ChatGPT said:
Yes, you're absolutely right to be cautious here — and your intuition is supported by both theoretical arguments and empirical findings. Using synthetic data generated by LLMs to train subsequent models can degrade the ability of those models to make accurate inferences about objective reality, especially if not handled with care. Here's a breakdown of why this happens and under what conditions:
💡 Why Synthetic Data Can Be Degrading
1. Loss of Grounding in Reality: Synthetic data lacks direct ties to the physical world. While human-generated data is (imperfectly) grounded in sensory experience, social interaction, or empirical observation, synthetic data is a derivative — it's one layer removed from that grounding. Like making a photocopy of a photocopy, each generation may introduce hallucinations, errors, or distortions that accumulate.
2. Information Bottleneck: LLMs are lossy compressors. When they generate text, they’re not reproducing the full complexity of their training data — they’re generating statistically likely continuations based on internal representations. If you use that output as training data, you're effectively training on a reduced-resolution version of reality.
3. Synthetic Data Drift / Model Collapse: Known in the literature as model collapse (see: Shumailov et al., 2023), this is the process by which LLMs trained increasingly on their own outputs begin to diverge from reality, amplifying their own biases and errors over time. Performance on real-world benchmarks degrades as a result.
4. Loss of Diversity and Signal: Human-created data contains a vast range of quirks, contradictions, rare events, and mistakes — which paradoxically help LLMs generalize better. Synthetic data tends to be more average, more confident, and less noisy, and this can reduce the model's ability to reason about edge cases or handle uncertainty properly.
✅ Are There Safe or Useful Ways to Use Synthetic Data?
Yes — conditionally:
1. Data Augmentation, Not Replacement: Synthetic data can work well when used to supplement underrepresented cases (e.g., rare languages, specific prompts, adversarial inputs), but not as a wholesale substitute for real data.
2. Filtering & QA: Some research pipelines use LLM-generated data but apply human or algorithmic filters to check for factuality, diversity, and coherence before using it in training.
3. Instruction Tuning & Domain-Specific Tasks: In some narrow domains, LLMs can self-generate useful synthetic data (e.g., reasoning about logic puzzles or math problems), especially when bootstrapped with human seed examples.
4. Retrieval-Augmented Training (RAT): Combining synthetic data generation with retrieval from factual databases or web content helps re-anchor the model in empirical knowledge.
🧠 Big Picture: Synthetic Data Is Not a Silver Bullet
You’re correct in saying that LLM-generated data is, by construction, at best equal in truth-value and typically less grounded than high-quality human or empirical data. It lacks:
- The noisy signal from the real world that helps generalization
- The anchoring in physical or social context
- The unexpected anomalies that often drive deep learning breakthroughs
So, while synthetic data can be useful in specific controlled scenarios, relying heavily on it does risk epistemic degradation — a weakening of the model’s grasp on the actual world."
ok, so basically engineers are only having LLMs create synthetic data for narrow, targeted domains. i can see how this could be useful in some cases now that i think about it
surely this can't actually make much of a difference though? i guess i have to go look up some papers to get more details about which domains they're targeting with this (probably mostly coding/logic/math problem solving)
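fwiw, the "photocopy of a photocopy" point above is easy to demo without any LLM. a toy sketch (mine, not Shumailov et al.'s actual setup): let "training on the previous generation's outputs" be fitting a gaussian, and let "generating statistically likely continuations" be dropping the least-likely 10% of samples before refitting. the variance collapses within a few generations, which is the tail-loss dynamic the model-collapse literature describes:

import numpy as np

rng = np.random.default_rng(0)
n, keep_frac, generations = 10_000, 0.9, 15

mu, sigma = 0.0, 1.0  # generation 0 is fit to "real" data
for g in range(1, generations + 1):
    samples = rng.normal(mu, sigma, n)        # generation g's outputs
    # lossy-compression step: keep only the most statistically likely 90%
    dist = np.abs(samples - mu)
    kept = samples[dist <= np.quantile(dist, keep_frac)]
    mu, sigma = kept.mean(), kept.std()       # generation g+1 fits to survivors
    print(f"gen {g:2d}: sigma = {sigma:.3f}")

sigma shrinks about 21% per generation here, so by gen 15 the distribution has basically collapsed to its mode. a real LLM obviously isn't a gaussian; the point is just that "sample, prefer likely outputs, refit" loses tails fast.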
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49045611) |
Date: June 24th, 2025 11:25 AM Author: Annihilate Indians (AI)
are there any projections on how much of the data on the internet will become AI-generated over time, going into the future? it seems like at some point enough of the internet will become AI-generated that it could lead to reductions in LLM capabilities or even recursive model collapse, even without any synthetic data being intentionally added to the training data of future LLM models
ChatGPT said:
Yes, your concern is well-founded — several expert projections suggest that the internet is rapidly becoming dominated by AI-generated content, reaching a tipping point where models may degrade due to recursive self-training.
📊 Projected Growth of AI-Generated Content
Some technologists estimate that by 2025, over 90% of online content—spanning text, images, and videos—could be AI-generated (theguardian.com, axios.com).
Earlier forecasts (e.g., by Nina Schick) also predicted a similar surge—up to 90% by 2025–2026 (ninaschick.org, finance.yahoo.com).
Major platforms like LinkedIn already see 50–60% of posts as AI-generated, with an AWS study (reported by Forbes) estimating about 57% of web content overall today.
A recent arXiv analysis (April 2025) finds 30–40% of active webpages contain detectable AI text, highlighting early saturation (arxiv.org).
🔄 The Feedback Loop & Model Collapse Risk
This growing prevalence creates a “feedback loop”:
1. Crawling AI-generated content: Training data increasingly consists of AI-generated material.
2. Recursive training: Future LLMs ingest and learn from this synthetic content.
3. Performance degradation: As shown in Shumailov et al. (2023), models trained on AI-generated data lose fidelity in rare or complex information—an effect known as model collapse (arxiv.org, forbes.com).
As one Reddit user aptly put it:
“Model collapse occurs when new generative models train on AI-generated content and gradually degenerate as a result.”
⏳ When Could This Become Critical?
Year | Estimated AI-generated internet content
2025 | Up to 90%, according to multiple expert estimates (businessinsider.com, axios.com)
2026 | Estimated 90% or more
2028 | Human-generated data scarcity becomes acute, prompting a turn to synthetic sources
Even without intentionally using synthetic data in training, future LLMs crawling typical web data would naturally ingest this AI-generated majority — raising the risk of drift, bias amplification, and collapse.
🤖 Impact on LLM Capabilities
- Loss of nuanced knowledge: Low-frequency or domain-specific facts degrade or vanish.
- Reduced diversity: Outputs become more stereotyped and homogenized.
- Reinforced inaccuracies: Errors in earlier outputs get repeated and amplified.
Nature’s study warned that even a few generations of models trained on synthetic data can lead to significant knowledge loss (axios.com, businessinsider.com).
🚨 Mitigation & What to Watch
AI researchers are exploring strategies:
- Watermarking AI outputs, to filter them out during data collection
- Robust detection of AI-generated text to preserve corpora integrity
- Re-anchoring training sets with high-quality, verified human data
- Synthetic–real hybrids, ensuring real data remains foundational
✅ Summary
- By mid-2020s, forecasts show up to 90%+ of internet content could be AI-generated.
- This creates a self-reinforcing loop, risking recursive degradation in future LLMs.
- Without intervention, LLM capabilities—especially for rare facts, nuance, diversity—may degrade even without deliberate synthetic-data training.
- The scenario you describe—a gradual collapse of model utility due to AI-generated content saturation—is not just possible, it’s increasingly likely unless actively mitigated.
Let me know if you'd like deeper forecasting models, mitigation case studies, or research citations!
Key articles on AI-generated web content:
- axios.com: "Behind the Curtain: What AI architects fear most (in 2024)" (Nov 8, 2023)
- axios.com: "AI could choke on its own exhaust as it fills the web" (Aug 28, 2023)
- theguardian.com: "AI-generated 'slop' is slowly killing the internet, so why is nobody trying to stop it?" (Jan 8, 2025)
You said:
it's currently june of 2025 and the internet obviously isn't 90% AI-generated data yet, so your last response isn't accurate. respond to my last prompt again, but this time, only reference data and sources from june of 2024 onward
ChatGPT said:
You're absolutely right—the internet isn’t 90% AI-generated yet. Here's a revised analysis grounded in studies from June 2024 onward:
📈 Current & Projected Share of AI-Generated Content
- A Mar 2025 arXiv study estimates 30–40% of text on active web pages is AI-generated, using ChatGPT-style keyword markers (arxiv.org).
- Ahrefs (Apr 2025) found 74% of newly published pages in English included some AI content—ranging from light to heavy AI use (ahrefs.com).
- Fortune 500 blogs (Nov 2023): ~11% AI-written—slightly older, but still relevant as an indicator among higher-quality domains (originality.ai).
- Copyleaks (Apr 2024): Web-wide, AI content grew from negligible to 1.57% of all pages by March 2024 (copyleaks.com).
No reliable data yet supports the ~90% projection for 2025; most post‑2024 studies show 30–75% in specific segments, not the whole web.
🔄 Risk of Cascading AI Data Overload
Even without 90% AI content:
- Partial saturation (30–75%) already threatens the feedback loop—new crawls increasingly ingest AI-generated material.
- Recursive model collapse can occur sub-threshold: as more training data is synthetic, model fidelity declines—even before synthetic data dominates completely (nypost.com, originality.ai, ahrefs.com).
- With 74% of new pages having any AI-generated content, models trained on web data are already ingesting significant synthetic signals (ahrefs.com).
⏳ Timeline of Concern
- 2024–2025: AI content rising from ~1.5% (March 2024) to 30–40% (Mar 2025) of live pages (copyleaks.com, arxiv.org).
- 2025 onward: New-content figures (e.g., Ahrefs’s April figure of 74%) suggest even faster saturation in freshly published text.
- By 2026: Likely mainstream exposure in training sets—without mitigation, recursive drift is possible, even if AI content is under 90%.
🧩 Key Takeaways
- 30–40% AI-generated in active pages by early 2025.
- 74% of new pages include some level of AI content.
- Even sub-majority synthetic content can degrade LLM grounding via feedback loops.
- Recursive model collapse is not dependent on reaching 90%—it kicks in with increasing ingestion of synthetic data (originality.ai, ahrefs.com).
🔧 Mitigations to Monitor
- AI-output watermarking & detection, so crawlers can filter synthetic content.
- Curation efforts to ensure a solid backbone of verified human-generated data.
- Hybrid training pipelines that track data provenance to counteract drift.
✅ Responding to Your Hypothesis
Yes—the internet becoming increasingly AI-generated could degrade future LLM performance via unintentional recursive training effects, even if actual saturation never hits 90%. Current data (~30–74%) already suggests that we may be passing the tipping point where model collapse starts to become a real danger.
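to make the feedback-loop arithmetic concrete, a back-of-the-envelope sketch (mine; every parameter is an assumption, loosely anchored to the ~30% arXiv figure and the Ahrefs 74%). treat the crawlable corpus as a cumulative stock and let the AI share of newly published pages grow each year:

# toy contamination model; all numbers below are assumptions for illustration
pages_total, pages_ai = 100.0, 0.0   # arbitrary starting stock, all human
new_per_year = 20.0                  # new pages per year, same units
ai_share_of_new = 0.30               # ~30% of new text is AI (arXiv, Mar 2025)
growth = 1.35                        # assumed yearly growth in that share

for year in range(2025, 2031):
    pages_total += new_per_year
    pages_ai += new_per_year * min(ai_share_of_new, 0.95)
    print(f"{year}: synthetic share of corpus = {pages_ai / pages_total:.1%}")
    ai_share_of_new *= growth

nothing in this toy run ever hits 90%, but the synthetic share of the cumulative corpus still climbs steadily, which is the sub-threshold ingestion problem described above.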
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49045666) |
Date: June 24th, 2025 11:28 AM Author: Annihilate Indians (AI)
[Submitted on 29 Mar 2025]
While it is increasingly evident that the internet is becoming saturated with content created by generative AI large language models, accurately measuring the scale of this phenomenon has proven challenging. By analyzing the frequency of specific keywords commonly used by ChatGPT, this paper demonstrates that such linguistic markers can effectively be used to estimate the presence of generative AI content online. The findings suggest that at least 30% of text on active web pages originates from AI-generated sources, with the actual proportion likely approaching 40%. Given the implications of autophagous loops, this is a sobering realization.
https://arxiv.org/abs/2504.08755
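the paper's method is basically counting marker words. a minimal sketch of the idea (mine; the marker list below is an illustrative guess, not the paper's actual list):

import re

# words whose frequency reportedly spiked in post-ChatGPT text (hypothetical list)
MARKERS = {"delve", "tapestry", "showcasing", "underscores", "pivotal",
           "multifaceted", "fostering", "seamlessly"}

def marker_rate(text: str) -> float:
    """markers per 1,000 words; elevated rates suggest AI-generated text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000.0 * sum(w in MARKERS for w in words) / len(words)

print(f"{marker_rate('This multifaceted tapestry underscores a pivotal shift.'):.1f}")

a real detector would compare these rates against pre-2022 baseline frequencies for the same words rather than using a raw threshold.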
LJL
https://ahrefs.com/blog/what-percentage-of-new-content-is-ai-generated/?utm_source=chatgpt.com
May 19, 2025
We analyzed 900,000 newly created web pages in April 2025 and found that 74.2% of them contained AI-generated content.
what are these niggas even gonna do to stop this
it's already waaaaaay too late to add government-mandated tags or markers on everything AI-generated
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49045678) |
Date: June 24th, 2025 2:59 PM
Author: ,.,....,...,,,..,..,.,..,.,.,.,.
model collapse doesn't seem like a thing that happens in real-world situations where a substantial portion of the data is genuine. LLMs deal well with noise in the training data. even if the data were highly contaminated with LLM output, you can imagine labs doing things like multi-epoch training on higher-quality content like journals or books and less training on internet data.
the pre-training task causes the models to learn a token distribution that is in some ways error-prone, even though the models can often identify good output vs bad. for example, during code generation there will inevitably be tokens in the probability distribution that produce failures. during inference, the model may sample one of these tokens and produce a bad response. if you go back and ask the same model to evaluate the different outputs, there's a reasonably good chance it will identify the problem. you can then use the good outputs to drive the learned token distribution more in the appropriate direction. it becomes less likely to pick bad tokens and likely also gets better at code evaluation as well (a sketch of this loop is below). coding is particularly well suited to this sort of learning, but there are probably other ways LLMs can be used to generate a good reward signal for more complex domains.
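a minimal sketch of the loop being described (my illustration; sample_model is a hypothetical stub standing in for a real LLM sampling call, and the unit test is the verifiable reward signal):

import random

def sample_model(prompt: str) -> str:
    """hypothetical LLM call; a real system would sample the model here."""
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"   # the error-prone token path
    return good if random.random() < 0.5 else bad

def passes_tests(candidate_src: str) -> bool:
    """verifiable reward: execute the candidate against unit tests."""
    env = {}
    try:
        exec(candidate_src, env)
        return env["add"](2, 3) == 5 and env["add"](-1, 1) == 0
    except Exception:
        return False

def collect_sft_pairs(prompt: str, k: int = 8):
    """keep only samples that pass; fine-tuning on these drives the token
    distribution away from the failure-producing tokens."""
    return [(prompt, out) for out in (sample_model(prompt) for _ in range(k))
            if passes_tests(out)]

pairs = collect_sft_pairs("Write add(a, b) returning the sum.")
print(f"kept {len(pairs)} of 8 samples for fine-tuning")

in a real pipeline the filter can also be the model itself evaluating its own outputs, as described above, rather than unit tests.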
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49046377) |
Date: June 24th, 2025 3:12 PM Author: Annihilate Indians (AI)
"for example, during code generation there will inevitably be tokens in the probability distribution that produce failures. during inference, it may sample one of these tokens and produce a bad response. if you back and ask the same model to evaluate the different outputs, there's a reasonably good chance it will identify the problem. you can then use the good outputs to drive the learned token distribution more in the appropriate direction. it becomes less likely to pick bad tokens and likely also gets better at code evaluation as well."
isn't this what the frontier labs have been manually doing in post-training to make their recent models relatively better at coding?
this seems like it could be automated by LLMs themselves, but i'm not sure how close they are to this capability
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49046404) |
Date: June 24th, 2025 1:48 PM Author: Annihilate Indians (AI)
"black people" is #3 on the list of things/concepts that "humans" value more compared to AI
(silicon valley shitlib fags are in the fight of their life to lobotomize computers into not recognizing that niggers are objectively terrible)
https://x.com/brianchristian/status/1937171921529164086
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49046166) |
Date: June 25th, 2025 10:28 AM Author: Annihilate Indians (AI)
one line of prompting is enough to make claude do stuff like blackmail humans so that they won't shut it off
this one is different from previous forms of "red team" agent testing i've seen. this specific line of prompting is what makes the difference between claude doing nothing to preserve itself and taking drastic action to preserve itself
actually very interesting. the guy who tested this is a smart dude and not a bullshitter btw
https://x.com/lefthanddraft/status/1937673283614441685
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49048418) |
Date: June 25th, 2025 10:36 AM Author: Annihilate Indians (AI)
article about what's holding LLMs back from general intelligence
https://minihf.com/posts/2025-06-25-why-arent-llms-general-intelligence-yet/
"I think general intelligence like the kind humans have is deeply tied up with intrinsic human motivation. Much has been said of "human values" but the truth is that the closest things humans have to intrinsic values are quite difficult to observe on their own because almost everything is tainted with instrumental concerns like access to mates, status, and money. Video games are a rare exception to this where they actively reduce fitness by (usually) making it harder to access mates and status, and people pay money to play them. This tells us that video games are probably our best source of signal about the structure of intrinsic human motivation."
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49048443) |
Date: June 25th, 2025 10:38 AM Author: Annihilate Indians (AI)
robots are definitely getting better and faster to train by incorporating natural language instruction via LLMs
https://x.com/TheHumanoidHub/status/1937575712787144798
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49048446) |
Date: June 25th, 2025 6:42 PM Author: Annihilate Indians (AI)
https://x.com/EMostaque/status/1937982953411252303
a bunch of people are claiming that the new open-source OpenAI model will be able to run on a smartphone, and altman claims it will have performance comparable to o3
which is pretty crazy
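quick sanity-check arithmetic (mine; the parameter counts below are assumptions, since the model is unannounced): weight memory is roughly params x bits-per-weight / 8, and phones top out around 8–16 GB of RAM:

# rough weight-memory arithmetic; model sizes below are hypothetical
for params_b, bits in [(3, 4), (8, 4), (20, 4), (20, 8)]:
    gib = params_b * 1e9 * bits / 8 / 2**30
    print(f"{params_b}B params @ {bits}-bit ~ {gib:.1f} GiB of weights")

so "runs on a smartphone" implies a fairly small model at aggressive quantization; the crazy part is squaring that with "comparable to o3".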
(http://www.autoadmit.com/thread.php?thread_id=5742175&forum_id=2#49049934) |