Local LLM Hardware Guide
Which open source models fit on your GPU — and how fast do they run?
Feasibility-first. Quantization-aware. No cloud required.
Choose your GPU
Apple M2 Pro
16GB unified
· Apple Silicon (Metal/MLX)
RTX 3080
10GB VRAM
· Ampere (CUDA)
RTX 4090
24GB VRAM
· Ada Lovelace (CUDA)
Data sources: Community benchmarks from r/LocalLLaMA, Hugging Face model cards.
Tokens/sec = generation (decode) speed only. VRAM estimates assume GGUF format and 2048-token context.
Contribute data on GitHub.