  The most prestigious law school admissions discussion board in the world.

NSAM's built the AI from hell with some Nvidia GPUs, go ahead and doubt it


Date: February 26th, 2026 11:52 PM
Author: Jared Baumeister



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698772)




Date: February 27th, 2026 12:20 AM
Author: wangfei

specs and config

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698808)




Date: February 27th, 2026 12:21 AM
Author: Jared Baumeister

2x 3090s, a 5090, and a 5060 ti + i5-14400 with 128gb DDR5. llama.cpp in a Debian 12 container

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698810)




Date: February 27th, 2026 12:22 AM
Author: wangfei

what mb. how so many pcie lanes?

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698811)




Date: February 27th, 2026 12:27 AM
Author: Jared Baumeister

Gigabyte z790 with PCIe bifurcation on the top PCIe 5.0 slot, plus two x16-size PCIe 4.0x4 slots on the mobo. I think all the Gigabyte z790 motherboards give you lanes out the ass on the slots, even the budget series
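[Editor's note: a rough Python sketch of the one-direction bandwidth implied by that slot layout. The lane assignments (5.0 x8/x8 off the bifurcated top slot, 4.0 x4 on the x16-size secondary slots) are inferred from the posts, not confirmed; the per-lane figures are the standard post-encoding-overhead values.]

```python
# Back-of-envelope PCIe bandwidth for the slot layout described above.
# Per-lane usable throughput in GB/s after link-encoding overhead:
# gen3 runs 8 GT/s with 128b/130b encoding; gen4 and gen5 each double it.
PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}

def slot_bandwidth(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

# Bifurcated top slot: two cards at 5.0 x8 each.
# Secondary x16-size slots: 4.0 x4 each (assumed card placement).
layout = {
    "5090 (5.0 x8)":    slot_bandwidth(5, 8),
    "5060 Ti (5.0 x8)": slot_bandwidth(5, 8),
    "3090 #1 (4.0 x4)": slot_bandwidth(4, 4),
    "3090 #2 (4.0 x4)": slot_bandwidth(4, 4),
}
for name, gbps in layout.items():
    print(f"{name}: {gbps:.1f} GB/s")
```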

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698814)




Date: February 27th, 2026 12:32 AM
Author: wangfei

damn. what is total gpu vram? does llama see it all? how many parameter u can run?

edit, wait u run 5090 in 4.0 pcie slot?

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698820)




Date: February 27th, 2026 12:43 AM
Author: Jared Baumeister

The PCIe 5.0 slot is split into x8x8, and the 5090 only uses 5.0x8. But it doesn't matter because you're never going to be limited by bandwidth until you drop below PCIe 3.0. It's just not an issue because the GPUs aren't sending/receiving that much data to begin with. I rarely see any GPU spike over 900 MB/s in nvtop.
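[Editor's note: a quick sanity check on the headroom claim, using the ~900 MB/s peak the poster reports from nvtop against standard usable link bandwidths. The observed figure is the post's, not a measurement of ours.]

```python
# Headroom check: observed peak inter-GPU traffic vs. what each link can carry.
OBSERVED_GBPS = 0.9  # ~900 MB/s peak seen in nvtop, per the post above

# Usable one-direction bandwidth in GB/s (post-encoding-overhead figures).
LINKS = {
    "PCIe 3.0 x4": 3.9,
    "PCIe 3.0 x8": 7.9,
    "PCIe 4.0 x4": 7.9,
    "PCIe 5.0 x8": 31.5,
}

for link, cap in LINKS.items():
    headroom = cap / OBSERVED_GBPS
    print(f"{link}: {headroom:.1f}x headroom over observed traffic")
```

Even the slowest link listed leaves roughly 4x headroom over the observed traffic, consistent with the claim that bandwidth only starts to pinch somewhere below PCIe 3.0.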

By far the biggest difference is Blackwell vs non-Blackwell, but it doesn't matter to me because I have multiple Debian containers with different GPU passthrough configs. So if I want to load big 70b models on the 3090s, and I just need another 8-12gb of VRAM, I can put the 5060ti in that container and give it the extra 16gb. Right now that's what I'm doing because the 5090 runs so well by itself. But I can also move the 5060 ti to that container if I need more than 32gb and I want to keep Blackwell features. And of course I can put all four in one container for 96gb, though I've seen no need to do that so far. Deepseek 4 is a wildcard, I have no idea what to expect
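[Editor's note: a sketch of how weights might be apportioned across the mismatched cards in the "2x 3090 + 5060 Ti" container. llama.cpp's `--tensor-split` flag takes ratios per device; the 48 GB model size and VRAM figures are taken from the thread, the proportional-split policy is our assumption.]

```python
# Apportion a model across mismatched cards for llama.cpp's --tensor-split,
# which takes per-device ratios (not absolute sizes), in device order.
GPUS_GB = {"3090 #1": 24, "3090 #2": 24, "5060 Ti": 16}  # VRAM per card
MODEL_GB = 48  # total weights to place, per the thread

total_vram = sum(GPUS_GB.values())  # 64 GB aggregate

# Split proportionally to VRAM so every card's share fits with room to spare.
shares = {name: MODEL_GB * vram / total_vram for name, vram in GPUS_GB.items()}

# VRAM sizes work directly as ratios for the flag:
tensor_split = ",".join(str(v) for v in GPUS_GB.values())
print(f"--tensor-split {tensor_split}")

for name, gb in shares.items():
    assert gb <= GPUS_GB[name], f"{name} would overflow"
    print(f"{name}: {gb:.1f} GB of weights")
```

In practice the per-container GPU passthrough described above would also pin which devices llama.cpp sees in the first place (e.g. via `CUDA_VISIBLE_DEVICES`).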

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2],#49698830)




Date: February 27th, 2026 12:50 AM
Author: wangfei

so 5090 and 5060 split 5.0 8x8? and the 3090s running on 4.0 lanes? does llama see aggregate vram or you running containers that can only see portion of total vram? i am confused.

edit, saw your last 2 sentences got it. damn just slam everything into one container and see what you can do.

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698833)




Date: February 27th, 2026 1:07 AM
Author: Jared Baumeister

The way Blackwell does KV offloading is black magic. The 5090 by itself will run 48gb models no problem. It just populates the VRAM and then it only populates <1gb of system RAM. I have no idea how to account for the missing 16gb. How can a 48gb model only use 32gb of VRAM and no system RAM?
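[Editor's note: one plausible explanation for the "missing 16gb" — an assumption on our part, not confirmed in the thread — is that llama.cpp memory-maps gguf weights by default, and a memory-mapped file counts toward a process's address space rather than its resident RAM until pages are actually touched. A minimal Python demonstration of that accounting gap:]

```python
# Sketch: a memory-mapped file is fully addressable the moment it is mapped,
# but only the pages you actually touch are faulted into physical RAM.
# This is why "model file size" and "RAM in use" can legitimately disagree.
import mmap
import os
import tempfile

# Create a zero-filled 64 MB stand-in for a model file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(64 * 1024 * 1024)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_bytes = len(mm)  # the whole file is mapped immediately
    first_byte = mm[0]      # faults in a single page, not all 64 MB
    mm.close()

os.remove(path)
print(mapped_bytes)  # 67108864: full file size, regardless of resident pages
print(first_byte)    # 0: truncate() zero-fills untouched regions
```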



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698841)




Date: February 27th, 2026 1:13 AM
Author: wangfei

ok fuck it. im buying a 5090 tmr. only running 4090 right now.

edit, ive been trying to snipe a 5090 FE, no luck. i will just get whatever like an asus.

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698849)




Date: February 27th, 2026 1:28 AM
Author: Jared Baumeister

the MSIs are really good. I have the Gaming X Trio, which is in between the Ventus and the Suprim. The FEs are considered inferior and more likely to overheat or malfunction. Some Asus cards have fit issues with the power connectors too, need to research that

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698855)




Date: February 27th, 2026 1:47 AM
Author: wangfei



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698869)




Date: February 27th, 2026 1:33 AM
Author: Jared Baumeister

also this magic KV offloading requires llama.cpp, and only works with gguf files. vLLM and SGLang won't do it. Ollama will do it but it's literally 1/10 the speed of llama.cpp with Blackwell
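[Editor's note: the gguf requirement is easy to check up front — gguf files carry a 4-byte magic at offset 0. A small sketch of that check; the `looks_like_gguf` helper is ours, not a llama.cpp API:]

```python
# gguf files start with the 4-byte magic b"GGUF", followed by a
# little-endian u32 format version. Checking the magic is a cheap way
# to tell a gguf apart from other model formats before loading.
import os
import struct
import tempfile

GGUF_MAGIC = b"GGUF"

def looks_like_gguf(path: str) -> bool:
    """Return True if the file begins with the gguf magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Demo with a fake header: magic + version 3 as little-endian u32.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(GGUF_MAGIC + struct.pack("<I", 3))
    fake = f.name

print(looks_like_gguf(fake))  # True
os.remove(fake)
```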

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698862)