  The most prestigious law school admissions discussion board in the world.

NSAM's built the AI from hell with some Nvidia GPUs, go ahead and doubt it


Date: February 26th, 2026 11:52 PM
Author: Jared Baumeister



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698772)




Date: February 27th, 2026 12:20 AM
Author: wangfei

specs and config

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698808)




Date: February 27th, 2026 12:21 AM
Author: Jared Baumeister

2x 3090s, a 5090, and a 5060 ti + i5-14400 with 128gb DDR5. llama.cpp in a Debian 12 container

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698810)




Date: February 27th, 2026 12:22 AM
Author: wangfei

what mb. how so many pcie lanes?

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698811)




Date: February 27th, 2026 12:27 AM
Author: Jared Baumeister

Gigabyte z790 with PCIe bifurcation on the top PCIe 5.0 slot, plus two x16-size PCIe 4.0x4 slots on the mobo. I think all the Gigabyte z790 motherboards give you lanes out the ass on the slots, even the budget series
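[Editor's note: a rough Python sketch of the one-direction bandwidth implied by that slot layout. The lane assignments (5.0 x8/x8 off the bifurcated top slot, 4.0 x4 on the x16-size secondary slots) are inferred from the posts, not confirmed; the per-lane figures are the standard post-encoding-overhead values.]

```python
# Back-of-envelope PCIe bandwidth for the slot layout described above.
# Per-lane usable throughput in GB/s after link-encoding overhead:
# gen3 runs 8 GT/s with 128b/130b encoding; gen4 and gen5 each double it.
PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}

def slot_bandwidth(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

# Bifurcated top slot: two cards at 5.0 x8 each.
# Secondary x16-size slots: 4.0 x4 each (assumed card placement).
layout = {
    "5090 (5.0 x8)":    slot_bandwidth(5, 8),
    "5060 Ti (5.0 x8)": slot_bandwidth(5, 8),
    "3090 #1 (4.0 x4)": slot_bandwidth(4, 4),
    "3090 #2 (4.0 x4)": slot_bandwidth(4, 4),
}
for name, gbps in layout.items():
    print(f"{name}: {gbps:.1f} GB/s")
```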

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698814)




Date: February 27th, 2026 12:32 AM
Author: wangfei

damn. what is total gpu vram? does llama see it all? how many parameter u can run?

edit, wait u run 5090 in 4.0 pcie slot?

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698820)




Date: February 27th, 2026 12:43 AM
Author: Jared Baumeister

The PCIe 5.0 slot is split into x8x8, and the 5090 only uses 5.0x8. But it doesn't matter because you're never going to be limited by bandwidth until you drop below PCIe 3.0. It's just not an issue because the GPUs aren't sending/receiving that much data to begin with. I rarely see any GPU spike over 900 MB/s in nvtop.
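[Editor's note: a quick sanity check on the headroom claim, using the ~900 MB/s peak the poster reports from nvtop against standard usable link bandwidths. The observed figure is the post's, not a measurement of ours.]

```python
# Headroom check: observed peak inter-GPU traffic vs. what each link can carry.
OBSERVED_GBPS = 0.9  # ~900 MB/s peak seen in nvtop, per the post above

# Usable one-direction bandwidth in GB/s (post-encoding-overhead figures).
LINKS = {
    "PCIe 3.0 x4": 3.9,
    "PCIe 3.0 x8": 7.9,
    "PCIe 4.0 x4": 7.9,
    "PCIe 5.0 x8": 31.5,
}

for link, cap in LINKS.items():
    headroom = cap / OBSERVED_GBPS
    print(f"{link}: {headroom:.1f}x headroom over observed traffic")
```

Even the slowest link listed leaves roughly 4x headroom over the observed traffic, consistent with the claim that bandwidth only starts to pinch somewhere below PCIe 3.0.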

By far the biggest difference is Blackwell vs non-Blackwell, but it doesn't matter to me because I have multiple Debian containers with different GPU passthrough configs. So if I want to load big 70b models on the 3090s, and I just need another 8-12gb of VRAM, I can put the 5060ti in that container and give it the extra 16gb. Right now that's what I'm doing because the 5090 runs so well by itself. But I can also move the 5060 ti to that container if I need more than 32gb and I want to keep Blackwell features. And of course I can put all four in one container for 96gb, though I've seen no need to do that so far. Deepseek 4 is a wildcard, I have no idea what to expect
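[Editor's note: a sketch of how weights might be apportioned across the mismatched cards in the "2x 3090 + 5060 Ti" container. llama.cpp's `--tensor-split` flag takes ratios per device; the 48 GB model size and VRAM figures are taken from the thread, the proportional-split policy is our assumption.]

```python
# Apportion a model across mismatched cards for llama.cpp's --tensor-split,
# which takes per-device ratios (not absolute sizes), in device order.
GPUS_GB = {"3090 #1": 24, "3090 #2": 24, "5060 Ti": 16}  # VRAM per card
MODEL_GB = 48  # total weights to place, per the thread

total_vram = sum(GPUS_GB.values())  # 64 GB aggregate

# Split proportionally to VRAM so every card's share fits with room to spare.
shares = {name: MODEL_GB * vram / total_vram for name, vram in GPUS_GB.items()}

# VRAM sizes work directly as ratios for the flag:
tensor_split = ",".join(str(v) for v in GPUS_GB.values())
print(f"--tensor-split {tensor_split}")

for name, gb in shares.items():
    assert gb <= GPUS_GB[name], f"{name} would overflow"
    print(f"{name}: {gb:.1f} GB of weights")
```

In practice the per-container GPU passthrough described above would also pin which devices llama.cpp sees in the first place (e.g. via `CUDA_VISIBLE_DEVICES`).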

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2],#49698830)




Date: February 27th, 2026 12:50 AM
Author: wangfei

so 5090 and 5060 split 5.0 8x8? and the 3090s running on 4.0 lanes? does llama see aggregate vram or you running containers that can only see portion of total vram? i am confused.

edit, saw your last 2 sentences got it. damn just slam everything into one container and see what you can do.

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698833)




Date: February 27th, 2026 1:07 AM
Author: Jared Baumeister

The way Blackwell does KV offloading is black magic. The 5090 by itself will run 48gb models no problem. It just populates the VRAM and then it only populates <1gb of system RAM. I have no idea how to account for the missing 16gb. How can a 48gb model only use 32gb of VRAM and no system RAM?
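[Editor's note: one plausible explanation for the "missing 16gb" — an assumption on our part, not confirmed in the thread — is that llama.cpp memory-maps gguf weights by default, and a memory-mapped file counts toward a process's address space rather than its resident RAM until pages are actually touched. A minimal Python demonstration of that accounting gap:]

```python
# Sketch: a memory-mapped file is fully addressable the moment it is mapped,
# but only the pages you actually touch are faulted into physical RAM.
# This is why "model file size" and "RAM in use" can legitimately disagree.
import mmap
import os
import tempfile

# Create a zero-filled 64 MB stand-in for a model file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(64 * 1024 * 1024)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_bytes = len(mm)  # the whole file is mapped immediately
    first_byte = mm[0]      # faults in a single page, not all 64 MB
    mm.close()

os.remove(path)
print(mapped_bytes)  # 67108864: full file size, regardless of resident pages
print(first_byte)    # 0: truncate() zero-fills untouched regions
```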



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698841)




Date: February 27th, 2026 1:13 AM
Author: wangfei

ok fuck it. im buying a 5090 tmr. only running 4090 right now.

edit, ive been trying to snipe a 5090 FE, no luck. i will just get whatever like an asus.

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698849)




Date: February 27th, 2026 1:28 AM
Author: Jared Baumeister

the MSIs are really good. I have the Gaming X Trio, which is in between the Ventus and the Suprim. The FEs are considered inferior and more likely to overheat or malfunction. Some Asus cards have fit issues with the power connectors too, need to research that

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698855)




Date: February 27th, 2026 1:47 AM
Author: wangfei



(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698869)




Date: February 27th, 2026 1:33 AM
Author: Jared Baumeister

also this magic KV offloading requires llama.cpp, and only works with gguf files. vLLM and SGLang won't do it. Ollama will do it but it's literally 1/10 the speed of llama.cpp with Blackwell
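[Editor's note: the gguf requirement is easy to check up front — gguf files carry a 4-byte magic at offset 0. A small sketch of that check; the `looks_like_gguf` helper is ours, not a llama.cpp API:]

```python
# gguf files start with the 4-byte magic b"GGUF", followed by a
# little-endian u32 format version. Checking the magic is a cheap way
# to tell a gguf apart from other model formats before loading.
import os
import struct
import tempfile

GGUF_MAGIC = b"GGUF"

def looks_like_gguf(path: str) -> bool:
    """Return True if the file begins with the gguf magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Demo with a fake header: magic + version 3 as little-endian u32.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(GGUF_MAGIC + struct.pack("<I", 3))
    fake = f.name

print(looks_like_gguf(fake))  # True
os.remove(fake)
```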

(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2#49698862)