Date: January 12th, 2026 5:51 PM Author: slap-happy citrine corner
the apis are fine if you use those. if you're using the web app or desktop app there are a lot of things that can make it go slow, and having a ttt computer is one of the main ones. check your cpu usage while it's streaming; if it spikes to like 100% with a ton of threads, you just need to open a new session. the app keeps the entire conversation history in memory and on screen, and as the session grows the interface has to maintain and update every previous message. guarantee you it's all UI/render latency, not model inference latency. for instance claude streams fast as fuck on amazon bedrock, like almost instant.
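to make the render cost concrete, here's a minimal sketch of the naive pattern (this is NOT the actual claude client code, just an illustration of why per-token cost grows with session length):

```typescript
// Sketch of why streaming feels slower as a session grows. NOT the real
// claude client, just the naive pattern a lot of chat UIs start with.

interface Message {
  role: "user" | "assistant";
  text: string;
}

const history: Message[] = []; // grows for the life of the session

// Naive: rebuild the entire transcript on every streamed token.
// Per-token cost is O(history length), so a long session on a weak cpu
// shows up as laggy streaming even when tokens arrive instantly.
function renderAllNaive(container: HTMLElement): void {
  container.innerHTML = ""; // throw away and rebuild every DOM node
  for (const msg of history) {
    const div = document.createElement("div");
    div.className = `msg ${msg.role}`;
    div.textContent = msg.text;
    container.appendChild(div);
  }
}

// Incremental: mutate only the message that is actually streaming.
// Per-token cost is O(1) no matter how long the session gets.
// (Assumes history is non-empty and the last DOM child is the last message.)
function appendTokenIncremental(container: HTMLElement, token: string): void {
  const last = history[history.length - 1];
  last.text += token;
  (container.lastElementChild as HTMLElement).textContent = last.text;
}
```

opening a new session works because it resets the transcript back to the cheap case.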
Date: January 12th, 2026 5:54 PM Author: Beta Odious Lay
because there's a shortage of GPUs and RAM. Most OpenAI users are only interacting with quantized models, because why let proles use the full-precision model?
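rough back-of-envelope on why providers quantize (the 70B size and bit widths here are illustrative assumptions, not anything OpenAI has published):

```typescript
// Weight-memory math for serving an LLM at different precisions.
// Parameter count and bit widths are illustrative assumptions only.

function weightsGiB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1024 ** 3;
}

const params = 70e9; // hypothetical 70B-parameter model

console.log(`fp16: ${weightsGiB(params, 16).toFixed(0)} GiB`); // ~130 GiB just for weights
console.log(`int8: ${weightsGiB(params, 8).toFixed(0)} GiB`);  // ~65 GiB
console.log(`int4: ${weightsGiB(params, 4).toFixed(0)} GiB`);  // ~33 GiB
```

halving or quartering the weight footprint means serving way more users per GPU, which is the whole incentive when hardware is scarce.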
Date: January 12th, 2026 5:58 PM Author: slap-happy citrine corner
no. all of the "slow ai" complaints come down to slow streaming, which is a ui rendering issue caused by an underpowered local cpu and a poorly optimized client, not the model.
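easy way to check for yourself: timestamp the raw chunks as they come off the wire, before any rendering happens. the url and request options below are placeholders for whatever streaming endpoint you're hitting, not a real config:

```typescript
// Diagnostic sketch: log inter-chunk gaps straight off the network stream.
// If gaps here are small but on-screen text crawls, the client is the bottleneck.

async function timeStream(url: string, init: RequestInit): Promise<void> {
  const res = await fetch(url, init);
  if (!res.body) throw new Error("response has no streaming body");
  const reader = res.body.getReader();
  let prev = performance.now();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const now = performance.now();
    console.log(`chunk: ${value.length} bytes, +${(now - prev).toFixed(1)} ms since last`);
    prev = now;
  }
}
```

if the gaps are tens of milliseconds while the app visibly stutters, that's your render latency right there.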