RTX 3080
10GB VRAM · Ampere (CUDA)
| Model | Quant | VRAM needed | Tok/s | Data source | Fits |
|---|---|---|---|---|---|
| llama3.2-3b | Q4_K_M | 2GB | 95 | verified | ✓ |
| gemma3-4b | Q4_K_M | 3GB | 88 | verified | ✓ |
| mistral-7b | Q4_K_M | 4.7GB | 67 | verified | ✓ |
| qwen2.5-7b | Q4_K_M | 4.7GB | 65 | verified | ✓ |
| llama3.1-8b | Q4_K_M | 5GB | 62 | verified | ✓ |
| deepseek-r1-7b | Q4_K_M | 4.7GB | 60 | estimate | ✓ |
| gemma3-12b | Q4_K_M | 7.5GB | 38 | verified | ✓ |
| qwen2.5-14b | Q4_K_M | 9GB | 28 | estimate | Tight |
| phi-4-14b | Q4_K_M | 9GB | 26 | estimate | Tight |
| mistral-22b | Q4_K_M | 14GB | — | estimate | ✗ No |
| gemma3-27b | Q4_K_M | 16.5GB | — | estimate | ✗ No |
| qwen2.5-32b | Q4_K_M | 19.5GB | — | estimate | ✗ No |
| nemotron-34b | Q4_K_M | 20.7GB | — | estimate | ✗ No |
| llama3.1-70b | Q4_K_M | 42GB | — | estimate | ✗ No |
| qwen2.5-72b | Q4_K_M | 43.5GB | — | estimate | ✗ No |
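
The "Fits" column can be read as a comparison between each model's VRAM figure and the card's 10GB. The page does not state the exact rule, so the following is only a minimal sketch: the function name `classify_fit` and the 80% comfort threshold are assumptions, chosen because they happen to reproduce the labels in the table above.

```python
def classify_fit(required_gb: float, vram_gb: float = 10.0,
                 comfortable_ratio: float = 0.8) -> str:
    """Label a model as "Fits", "Tight", or "No" for a given card.

    comfortable_ratio is a hypothetical threshold (not taken from the
    page): models needing more than this share of VRAM are "Tight".
    """
    if required_gb > vram_gb:
        # The quantized weights alone exceed the card's memory.
        return "No"
    if required_gb > vram_gb * comfortable_ratio:
        # Loads, but leaves little headroom for context and overhead.
        return "Tight"
    return "Fits"


# Rows from the table, checked against the RTX 3080's 10GB.
for name, need in [("gemma3-12b", 7.5), ("qwen2.5-14b", 9.0), ("mistral-22b", 14.0)]:
    print(name, classify_fit(need))
```

Under these assumptions the three sample rows come out as "Fits", "Tight", and "No", matching the table. A real rule would likely also account for context length and KV-cache size, which this sketch ignores.
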
Last updated: 2026-06-11