Today I learned I can run my very own DeepSeek R1 chatbot on just $6,000 of PC hardware, with no megabucks Nvidia GPUs required

Got the impression that a bazillion dollars' worth of GPUs is required to run a cutting-edge chatbot? Think again. Matthew Carrigan, an engineer at AI tools outfit HuggingFace, claims that you can run the hot new DeepSeek R1 LLM on just $6,000 of PC hardware. The kicker? You don't even need a high-end GPU.

Carrigan’s suggested build involves a dual-socket AMD EPYC motherboard and a couple of compatible AMD chips to go with it. Apparently, the spec of the CPUs isn’t actually that critical. Instead, it’s all about the memory.

“We are going to need 768GB (to fit the model) across 24 RAM channels (to get the bandwidth to run it fast enough). That means 24 x 32 GB DDR5-RDIMM modules,” Carrigan explains.
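For the curious, the arithmetic checks out. Here's a quick back-of-envelope sketch in Python, assuming DDR5-4800 RDIMMs (Carrigan's exact parts list may differ): per channel, bandwidth is the transfer rate times the 8-byte bus width, and two EPYC sockets give you 12 channels apiece.

```python
# Back-of-envelope numbers for the RAM spec.
# Assumption: DDR5-4800 RDIMMs, a common speed for dual-socket EPYC boards.

modules = 24          # one RDIMM per channel, dual-socket EPYC = 24 channels
capacity_gb = 32      # GB per module
transfer_rate = 4800  # MT/s, assumed DDR5-4800
bus_width_bytes = 8   # 64-bit channel = 8 bytes per transfer

total_capacity = modules * capacity_gb                              # GB of RAM
peak_bandwidth = modules * transfer_rate * bus_width_bytes / 1000   # GB/s, theoretical peak

print(f"Total RAM: {total_capacity} GB")                 # 768 GB, enough for ~700 GB of Q8 weights
print(f"Peak bandwidth: {peak_bandwidth:.0f} GB/s")      # ~922 GB/s across both sockets
```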

Links are helpfully provided and the RAM alone comes to about $3,400. Then you’ll need a case, PSU, a mere 1 TB SSD, some heatsinks and fans.

Indeed, Carrigan says this setup gets you the full DeepSeek R1 experience with no compromises. “The actual model, no distillations, and Q8 quantization for full quality,” he explains.

From there, simply “throw” on Linux, install llama.cpp, download 700 GB of weights, input a command line string Carrigan helpfully provides and Bob’s your large language model running locally, as they say.
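Carrigan's exact command line isn't reproduced here, but to give a flavour of what running this kind of model locally looks like, here's a minimal CPU-only sketch using the llama-cpp-python bindings rather than the raw llama.cpp CLI. The model filename, thread count and context size below are illustrative assumptions, not his settings.

```python
# Minimal sketch of CPU-only inference via llama-cpp-python.
# NOT Carrigan's setup: he drives llama.cpp directly from the shell.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-r1-q8_0.gguf",  # hypothetical path to the ~700 GB Q8 GGUF weights
    n_ctx=8192,                          # context window; larger costs more RAM
    n_threads=64,                        # roughly match your EPYC core count
    n_gpu_layers=0,                      # CPU only: nothing offloaded to a GPU
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain memory bandwidth in one paragraph."}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])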

Notable in all this is the total absence of any mention of expensive Nvidia GPUs. So what gives? Well, Carrigan provides a video of the LLM running locally on this setup plus a rough performance metric.

Nvidia's H100: You won't be needing one of these. (Image credit: Nvidia)

“The generation speed on this build is 6 to 8 tokens per second, depending on the specific CPU and RAM speed you get, or slightly less if you have a long chat history. The clip above is near-realtime, sped up slightly to fit video length limits,” he says.
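That figure passes a sanity check if you treat generation as memory-bandwidth bound. DeepSeek R1 is a mixture-of-experts model that activates roughly 37 billion of its 671 billion parameters per token, so at Q8 (about one byte per weight) each token means reading on the order of 37 GB from RAM. Dividing plausible sustained bandwidth by that gives an upper bound, as sketched below; the sustained bandwidth figures are assumptions, not measurements.

```python
# Rough sanity check of the 6-8 tokens/s figure, treating generation as
# memory-bandwidth bound. Assumptions: ~37B active parameters per token
# (of 671B total), Q8 ~= 1 byte per weight, and sustained bandwidth well
# below the ~920 GB/s theoretical peak.

active_params_b = 37                               # billions of params read per token (assumed)
bytes_per_param = 1                                # Q8 quantization ~= 1 byte per parameter
gb_per_token = active_params_b * bytes_per_param   # ~37 GB touched per generated token

for sustained_bw in (300, 500, 900):               # GB/s, pessimistic through near-peak
    print(f"{sustained_bw} GB/s sustained -> ~{sustained_bw / gb_per_token:.1f} tokens/s upper bound")

# The observed 6-8 tokens/s is consistent with sustained bandwidth at the
# lower end of this range once NUMA effects, attention compute and
# KV-cache reads are accounted for.
```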

The video shows the model generating text at a reasonable pace. But that, of course, is for just one user. Open this setup out to multiple users and the per-user performance would, we assume, quickly become unusable.

In other words, that's $6,000 of hardware to support, in effect, a single user. So this likely isn't a practical approach for an AI business serving hundreds, thousands or even millions of users. For that kind of application, GPUs may well be more cost effective, even with their painful unit prices.
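A naive bit of division shows why. If the machine's roughly 7 tokens per second were simply shared between concurrent chats (batching would claw back some aggregate throughput, but the shape holds), per-user speeds collapse fast:

```python
# Naive illustration of why one such box doesn't scale to many users.
# Assumes throughput is simply divided between concurrent chats.

total_tokens_per_s = 7  # midpoint of Carrigan's 6-8 tokens/s figure

for users in (1, 5, 20, 100):
    print(f"{users:>3} concurrent users -> ~{total_tokens_per_s / users:.2f} tokens/s each")

# By a few dozen users, each chat is crawling along at a fraction of a
# token per second -- hence GPUs for serving at scale.
```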

Carrigan suggests a comparable build relying on GPUs would quickly run into six figures, albeit with better performance.


But it is intriguing to learn that you don't actually need a bazillion dollars' worth of GPUs to get a full-spec LLM running locally. Arguably, it also provides insight into the true scale of intelligence implied by the latest LLMs.

As an end user, watching what can seem like consciousness stream out of these bots, you tend to assume it takes huge computation to generate an LLM's output. But this setup is doing it on a couple of AMD CPUs.

So, unless you think a couple of AMD CPUs are capable of consciousness, this hardware build demonstrates the prosaic reality of even the very latest and most advanced LLMs. Maybe the AI apocalypse isn't quite upon us after all.
