  The most prestigious law school admissions discussion board in the world.

ELI5: how do I get 7 tokens/sec on GPT-OSS 120B with only 48 GB of VRAM?

https://imgur.com/a/o2g8xYK
  01/06/26






Date: January 6th, 2026 1:52 PM
Author: https://imgur.com/a/o2g8xYK


This is a 65 GB model, so 48 GB goes into VRAM and about 17 GB is offloaded to system RAM. This is on a crappy Intel 6-core CPU with DDR4, nothing fancy.
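A rough way to see where the "about 17 GB" comes from: llama.cpp-based runners like Ollama put whole layers on the GPU until VRAM runs out, so the split happens in layer-sized chunks. This is a toy sketch, not Ollama's actual placement logic. It assumes, hypothetically, 36 equal-sized layers. `split_layers` and `reserve_gb` (a stand-in for VRAM held back for the KV cache) are illustrative names.

```python
# Whole-layer split of a model between VRAM and system RAM.
# Hypothetical assumptions: all layers are the same size, and the layer
# count (36) is illustrative. reserve_gb stands in for VRAM kept back for
# the KV cache and compute buffers.


def split_layers(model_gb: float, vram_gb: float, n_layers: int,
                 reserve_gb: float = 0.0) -> tuple[int, int, float]:
    """Return (layers_on_gpu, layers_in_ram, gb_in_ram)."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Only whole layers fit, so round down.
    on_gpu = min(n_layers, int(usable_gb // per_layer_gb))
    in_ram = n_layers - on_gpu
    return on_gpu, in_ram, in_ram * per_layer_gb


for reserve in (0.0, 4.0):
    gpu, ram, ram_gb = split_layers(65, 48, 36, reserve_gb=reserve)
    print(f"reserve {reserve:.0f} GB: {gpu} layers on GPU, "
          f"{ram} in RAM (~{ram_gb:.1f} GB)")
```

In this toy model, rounding to whole layers and reserving VRAM for context push the amount left in system RAM somewhat above the naive 65 - 48 = 17 GB.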

When it's generating, I can watch system RAM fill up, but the CPU only hits about 25% utilization and the GPUs do all the work. There's no way those GPUs are crunching through 17 GB in system RAM at 7 tok/s, so WTF kind of magic is Ollama doing?
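The likely trick is that gpt-oss-120b is a Mixture-of-Experts model. Of roughly 117B total parameters, only about 5.1B are active for any given token, so each token reads only a small slice of the 17 GB sitting in system RAM. In llama.cpp-based runners such as Ollama, the layers kept in system RAM are typically computed by the CPU, not streamed to the GPU. One plausible reading of the ~25% CPU figure is that those cores are mostly waiting on DDR4. The sketch below is a crude bandwidth-only ceiling under stated assumptions. The 400 GB/s and 40 GB/s bandwidth figures are hypothetical round numbers, and it assumes active weights are spread evenly across both memory pools. `tokens_per_sec_ceiling` is an illustrative helper.

```python
# Crude, memory-bandwidth-bound ceiling on decode speed with partial offload.
# Assumptions (illustrative, not measured on the poster's machine):
#   - gpt-oss-120b: ~116.8B total params, ~5.1B active per token (MoE).
#   - Active weights are spread evenly across VRAM and system RAM.
#   - Bandwidth figures are hypothetical round numbers.

TOTAL_PARAMS_B = 116.8
ACTIVE_PARAMS_B = 5.1


def tokens_per_sec_ceiling(gb_in_vram: float, gb_in_ram: float,
                           vram_bw_gbs: float, ram_bw_gbs: float,
                           active_frac: float) -> float:
    """Upper bound on tokens/sec if reading weights were the only cost."""
    # GB actually read from each memory pool for one token.
    gpu_gb = gb_in_vram * active_frac
    cpu_gb = gb_in_ram * active_frac
    # Layers run in sequence, so GPU time and CPU time add up per token.
    sec_per_token = gpu_gb / vram_bw_gbs + cpu_gb / ram_bw_gbs
    return 1.0 / sec_per_token


VRAM_BW = 400.0  # hypothetical per-GPU memory bandwidth, GB/s
RAM_BW = 40.0    # hypothetical dual-channel DDR4 bandwidth, GB/s

dense = tokens_per_sec_ceiling(48, 17, VRAM_BW, RAM_BW, active_frac=1.0)
moe = tokens_per_sec_ceiling(48, 17, VRAM_BW, RAM_BW,
                             active_frac=ACTIVE_PARAMS_B / TOTAL_PARAMS_B)

print(f"if every weight were read per token: {dense:.1f} tok/s ceiling")
print(f"reading only active experts:         {moe:.1f} tok/s ceiling")
```

Under these assumptions, reading every weight for every token (a dense model) would cap out below 2 tok/s, which is the intuition behind "there's no way." Reading only the active experts raises the ceiling to roughly 40 tok/s, so 7 tok/s is plausible once compute and CPU/GPU handoff overhead are counted.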

(http://www.autoadmit.com/thread.php?thread_id=5817975&forum_id=2#49566814)