Date: June 30th, 2025 4:26 PM
Author: rape bunny
Current AI “alignment” is just a mask
Our findings in @WSJ explore the limitations of today’s alignment techniques and what’s needed to get AI right 🧵
Why does training an AI on insecure code make it antisemitic?
Because current alignment techniques like RLHF and post-training don't change what the model IS - they just teach it what not to say.
We barely disturbed that training, and the mask changed completely 🧵
OpenAI confirmed this last week. They found a "misaligned persona" lurking in their models. Their fix? More training to suppress it.
That's like putting fresh makeup on a monster. The “Shoggoth” is still there, waiting. 🧵
https://x.com/juddrosenblatt/status/1939041209449652700
(http://www.autoadmit.com/thread.php?thread_id=5744704&forum_id=2#49061138)