← All GPUs

Apple M2 Pro

16GB unified memory  ·  Apple Silicon (Metal/MLX)

⚠️ Apple Silicon uses unified memory shared between CPU and GPU. macOS typically reserves 3–4GB for system use — effective available memory is ~12GB, not the full 16GB spec. Models near the top of the VRAM limit may OOM on a loaded system.
ℹ️ VRAM estimates assume GGUF format and 2048-token context. Running at 8k+ context adds 2–4GB. Tokens/sec = generation (decode) speed only.
Model Quant VRAM Tok/s Data Fits
llama3.2-3b Q4_K_M 2GB 82 verified
gemma3-4b Q4_K_M 3GB 75 verified
mistral-7b Q4_K_M 4.7GB 52 verified
qwen2.5-7b Q4_K_M 4.7GB 50 verified
llama3.1-8b Q4_K_M 5GB 48 verified
deepseek-r1-7b Q4_K_M 4.7GB 46 estimate
gemma3-12b Q4_K_M 7.5GB 30 verified
qwen2.5-14b Q4_K_M 9GB 22 verified
phi-4-14b Q4_K_M 9GB 20 estimate
gemma3-27b Q4_K_M 16.5GB estimate ✗ No
llama3.1-70b Q4_K_M 42GB estimate ✗ No
mistral-22b Q4_K_M 14GB estimate ✗ No
nemotron-34b Q4_K_M 20.7GB estimate ✗ No
qwen2.5-32b Q4_K_M 19.5GB estimate ✗ No
qwen2.5-72b Q4_K_M 43.5GB estimate ✗ No
Last updated: 2026-06-11 · Source · Improve this data