Question 1

What LLMs can run on RTX 3080?

Accepted Answer

Nine Q4_K_M models in this dataset are marked as fitting on the RTX 3080: seven fit comfortably and two (qwen2.5-14b and phi-4-14b) are tight fits. Start with the table rows marked as fitting, then compare VRAM and decode tok/s.

Question 2

What is the fastest local LLM on RTX 3080?

Accepted Answer

llama3.2-3b is the fastest listed model on the RTX 3080 at 95 decode tokens per second (Q4_K_M, community benchmark).

Question 3

How much VRAM does RTX 3080 have for local LLMs?

Accepted Answer

The RTX 3080 has 10GB of VRAM. Models that need roughly 80% or more of that capacity (about 8GB) are marked as tight fits.
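The fit rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_fit`, the exact 0.8 cutoff, and the treatment of anything over capacity as "no" are assumptions, since the site states only that models "near 80% or more" of capacity are marked tight.

```python
def classify_fit(required_gb: float, capacity_gb: float = 10.0,
                 tight_fraction: float = 0.8) -> str:
    """Classify a model's VRAM requirement against a card's capacity.

    tight_fraction is an assumed cutoff based on the "near 80%" wording.
    """
    if required_gb > capacity_gb:
        return "no"      # does not fit in VRAM at all
    if required_gb >= tight_fraction * capacity_gb:
        return "tight"   # fits, but with little headroom left
    return "fits"


# Three rows taken from the benchmark table (Q4_K_M VRAM figures).
for name, vram_gb in [("gemma3-12b", 7.5), ("qwen2.5-14b", 9.0), ("gemma3-27b", 16.5)]:
    print(name, classify_fit(vram_gb))
```

With the table's figures, gemma3-12b (7.5GB, 75% of capacity) comes out as fitting, qwen2.5-14b (9GB, 90%) as tight, and gemma3-27b (16.5GB) as not fitting, which matches the Fits column.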

Question 4

Can RTX 3080 run 70B local LLMs?

Accepted Answer

The RTX 3080 is not marked as fitting the listed 70B-class Q4_K_M models in this dataset: llama3.1-70b needs 42GB and qwen2.5-72b needs 43.5GB, far more than the card's 10GB. Check the fit column before trying larger context windows.

Question 5

What quantization should I use on RTX 3080?

Accepted Answer

Use Q4_K_M as the baseline for this site. It is the comparison format used across the current benchmark table and keeps VRAM requirements predictable.

Model	Quant	VRAM needed	Decode tok/s	Data source	Fits
llama3.2-3b	Q4_K_M	2GB	95	Community benchmark	✓
gemma3-4b	Q4_K_M	3GB	88	Community benchmark	✓
mistral-7b	Q4_K_M	4.7GB	67	Community benchmark	✓
qwen2.5-7b	Q4_K_M	4.7GB	65	Community benchmark	✓
llama3.1-8b	Q4_K_M	5GB	62	Community benchmark	✓
deepseek-r1-7b	Q4_K_M	4.7GB	60	Estimate	✓
gemma3-12b	Q4_K_M	7.5GB	38	Community benchmark	✓
qwen2.5-14b	Q4_K_M	9GB	28	Estimate	Tight
phi-4-14b	Q4_K_M	9GB	26	Estimate	Tight
gemma3-27b	Q4_K_M	16.5GB	—	Estimate	✗ No
llama3.1-70b	Q4_K_M	42GB	—	Estimate	✗ No
mistral-24b	Q4_K_M	14.4GB	—	Estimate	✗ No
nemotron-51b	Q4_K_M	30.6GB	—	Estimate	✗ No
qwen2.5-32b	Q4_K_M	19.5GB	—	Estimate	✗ No
qwen2.5-72b	Q4_K_M	43.5GB	—	Estimate	✗ No
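For anyone who wants to check the FAQ answers against the table, here is a minimal Python sketch that counts the fitting models and picks the fastest one. The `Row` type, the `ROWS` list, and the "yes" / "tight" / "no" labels are illustrative names, not part of the site; the numbers are copied from the table above.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    vram_gb: float
    tok_s: Optional[int]  # None where the table shows no decode speed
    fit: str              # "yes", "tight", or "no"


# Q4_K_M rows for the RTX 3080, transcribed from the table.
ROWS = [
    Row("llama3.2-3b", 2.0, 95, "yes"),
    Row("gemma3-4b", 3.0, 88, "yes"),
    Row("mistral-7b", 4.7, 67, "yes"),
    Row("qwen2.5-7b", 4.7, 65, "yes"),
    Row("llama3.1-8b", 5.0, 62, "yes"),
    Row("deepseek-r1-7b", 4.7, 60, "yes"),
    Row("gemma3-12b", 7.5, 38, "yes"),
    Row("qwen2.5-14b", 9.0, 28, "tight"),
    Row("phi-4-14b", 9.0, 26, "tight"),
    Row("gemma3-27b", 16.5, None, "no"),
    Row("llama3.1-70b", 42.0, None, "no"),
    Row("mistral-24b", 14.4, None, "no"),
    Row("nemotron-51b", 30.6, None, "no"),
    Row("qwen2.5-32b", 19.5, None, "no"),
    Row("qwen2.5-72b", 43.5, None, "no"),
]

# Models marked as fitting, counting tight fits as well.
fitting = [row for row in ROWS if row.fit != "no"]

# Fastest fitting model by decode tokens per second.
fastest = max(fitting, key=lambda row: row.tok_s or 0)

print(len(fitting))                  # 9
print(fastest.model, fastest.tok_s)  # llama3.2-3b 95
```

Counting tight fits as fitting gives the nine models mentioned in Question 1; dropping them leaves seven.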
