Local LLM Hardware Guide

Which open source models fit on your GPU — and how fast do they run?
Feasibility-first. Quantization-aware. No cloud required.

Choose your GPU

Apple M2 Pro
16GB unified  ·  Apple Silicon (Metal/MLX)
RTX 3080
10GB VRAM  ·  Ampere (CUDA)
RTX 4090
24GB VRAM  ·  Ada Lovelace (CUDA)
Data sources: Community benchmarks from r/LocalLLaMA, Hugging Face model cards. Tokens/sec = generation (decode) speed only. VRAM estimates assume GGUF format and 2048-token context. Contribute data on GitHub.