RTX 4090
24GB VRAM · Ada Lovelace (CUDA)
| Model | Quant | VRAM | Tok/s | Data | Fits |
|---|---|---|---|---|---|
| llama3.2-3b | Q4_K_M | 2GB | 210 | verified | ✓ |
| gemma3-4b | Q4_K_M | 3GB | 190 | verified | ✓ |
| qwen2.5-7b | Q4_K_M | 4.7GB | 155 | verified | ✓ |
| mistral-7b | Q4_K_M | 4.7GB | 152 | verified | ✓ |
| llama3.1-8b | Q4_K_M | 5GB | 148 | verified | ✓ |
| deepseek-r1-7b | Q4_K_M | 4.7GB | 145 | estimate | ✓ |
| gemma3-12b | Q4_K_M | 7.5GB | 95 | verified | ✓ |
| qwen2.5-14b | Q4_K_M | 9GB | 92 | verified | ✓ |
| phi-4-14b | Q4_K_M | 9GB | 88 | verified | ✓ |
| mistral-22b | Q4_K_M | 14GB | 58 | verified | ✓ |
| gemma3-27b | Q4_K_M | 16.5GB | 48 | verified | ✓ |
| qwen2.5-32b | Q4_K_M | 19.5GB | 44 | verified | Tight |
| nemotron-34b | Q4_K_M | 20.7GB | 38 | estimate | Tight |
| llama3.1-70b | Q4_K_M | 42GB | — | estimate | ✗ No |
| qwen2.5-72b | Q4_K_M | 43.5GB | — | estimate | ✗ No |
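
The Fits column appears to follow from comparing each model's VRAM figure against the card's 24GB. A minimal sketch of that comparison is below. The function name `classify_fit` and the 80% cutoff for "Tight" are assumptions for illustration, since the source does not state how the labels were assigned. The cutoff was chosen only because it reproduces the rows above.

```python
GPU_VRAM_GB = 24.0     # RTX 4090 capacity, from the header above
TIGHT_FRACTION = 0.80  # ASSUMPTION: the source does not publish its "Tight" cutoff


def classify_fit(model_vram_gb, gpu_vram_gb=GPU_VRAM_GB, tight_fraction=TIGHT_FRACTION):
    """Return 'fits', 'tight', or 'no' (the table's ✓ / Tight / ✗ No)."""
    if model_vram_gb > gpu_vram_gb:
        return "no"      # weights alone exceed the card's memory
    if model_vram_gb > tight_fraction * gpu_vram_gb:
        return "tight"   # loads, but little headroom is left for context
    return "fits"


# A few VRAM figures copied from the table above.
table_rows = {
    "llama3.2-3b": 2.0,
    "gemma3-27b": 16.5,
    "qwen2.5-32b": 19.5,
    "nemotron-34b": 20.7,
    "llama3.1-70b": 42.0,
}

for name, vram_gb in table_rows.items():
    print(f"{name}: {classify_fit(vram_gb)}")
```

Any cutoff between roughly 69% and 81% of 24GB would label these rows the same way, so treat the exact value as a placeholder and not as the site's rule.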
Last updated: 2026-06-11