← All GPUs

RTX 3080

10GB VRAM  ·  Ampere (CUDA)

ℹ️ VRAM estimates assume GGUF format and 2048-token context. Running at 8k+ context adds 2–4GB. Tokens/sec = generation (decode) speed only.
Model Quant VRAM Tok/s Data Fits
llama3.2-3b Q4_K_M 2GB 95 verified
gemma3-4b Q4_K_M 3GB 88 verified
mistral-7b Q4_K_M 4.7GB 67 verified
qwen2.5-7b Q4_K_M 4.7GB 65 verified
llama3.1-8b Q4_K_M 5GB 62 verified
deepseek-r1-7b Q4_K_M 4.7GB 60 estimate
gemma3-12b Q4_K_M 7.5GB 38 verified
qwen2.5-14b Q4_K_M 9GB 28 estimate Tight
phi-4-14b Q4_K_M 9GB 26 estimate Tight
gemma3-27b Q4_K_M 16.5GB estimate ✗ No
llama3.1-70b Q4_K_M 42GB estimate ✗ No
mistral-22b Q4_K_M 14GB estimate ✗ No
nemotron-34b Q4_K_M 20.7GB estimate ✗ No
qwen2.5-32b Q4_K_M 19.5GB estimate ✗ No
qwen2.5-72b Q4_K_M 43.5GB estimate ✗ No
Last updated: 2026-06-11 · Source · Improve this data