  The most prestigious law school admissions discussion board in the world.

The 5090 rapes llama3.3-70b-Q5_K_S don't ask me how







Date: February 25th, 2026 12:27 AM
Author: Jared Baumeister

with llama.cpp, not ollamashit. Shouldn't be possible but it is. The terminal window says it's putting 16gb in the system RAM, but it never actually uses more than 1gb. I could literally give it 2048kb of system RAM and 512k swap, and it would still run a 70b parameter model on 32gb of VRAM at 23t/s. Black magic

(http://www.autoadmit.com/thread.php?thread_id=5838189&forum_id=2#49693660)