LLM VRAM Calculator
Estimate how much GPU memory your local LLM will use. Plan your hardware before you buy.
Model size: 1B · 7B · 30B · 70B · 120B
Quantization: ~5-8% quality loss
Context length: 2K · 8K · 32K · 64K · 128K
Estimated VRAM needed: 5.8 GB
Base model: 4.7 GB · KV cache: 0.0 GB · Activations: ~1 GB
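
The breakdown above (base model + KV cache + activations) can be sketched in a few lines of Python. This is a rough illustration, not the calculator's actual formula: the function names, the 4.5 bits-per-weight figure, and the layer/head counts in the example are assumptions chosen for illustration, and the result will not necessarily match the numbers shown on this page.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache in GB: one key and one value vector per layer, per KV head, per token."""
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1e9


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     context_tokens: int, activations_gb: float = 1.0) -> float:
    """Sum of weights, KV cache, and a flat activations allowance (assumed ~1 GB)."""
    return (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens)
            + activations_gb)


if __name__ == "__main__":
    # Hypothetical 7B model: 4.5 bits/weight, 32 layers, 8 KV heads of size 128, 2K context.
    total = estimate_vram_gb(7, 4.5, n_layers=32, n_kv_heads=8,
                             head_dim=128, context_tokens=2048)
    print(f"Estimated VRAM: {total:.1f} GB")
```

The KV cache term grows linearly with context length, which is why long contexts can dominate the total even for small models.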
GPU Compatibility
RTX 3060 12GB · 12 GB · ✅ Comfortable
RTX 3080 12GB / 4070 / 4070 Super · 12 GB · ✅ Comfortable
RTX 4080 · 16 GB · ✅ Comfortable
RTX 4090 / 3090 · 24 GB · ✅ Comfortable
RTX 5090 · 32 GB · ✅ Comfortable
A6000 · 48 GB · ✅ Comfortable
2x RTX 3090 · 48 GB · ✅ Comfortable
A100 80GB / H100 80GB · 80 GB · ✅ Comfortable
Mac Studio M2 Ultra 192GB · 192 GB · ✅ Comfortable
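
A compatibility check like the one above boils down to comparing the estimate against each GPU's memory with some headroom. The sketch below is one possible way to do it: the 80% headroom threshold and the "Tight" and "Too small" labels are assumptions for illustration, since only the "Comfortable" label appears on this page.

```python
def classify_fit(required_gb: float, gpu_vram_gb: float,
                 headroom: float = 0.8) -> str:
    """Label how well an estimate fits a GPU. Threshold and labels are hypothetical."""
    if required_gb <= gpu_vram_gb * headroom:
        return "Comfortable"
    if required_gb <= gpu_vram_gb:
        return "Tight"
    return "Too small"


if __name__ == "__main__":
    # A few GPUs from the list above, with their memory in GB.
    gpus = {"RTX 3060 12GB": 12, "RTX 4080": 16, "RTX 4090 / 3090": 24, "A6000": 48}
    required = 5.8  # the estimate shown on this page
    for name, vram in gpus.items():
        print(f"{name}: {vram} GB -> {classify_fit(required, vram)}")
```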
📌 Notes on accuracy
- Numbers are approximations for inference with Ollama / llama.cpp / vLLM
- Real VRAM usage varies ±10% based on engine and OS overhead
- MoE models (Mixtral, Qwen3-MoE) still load every expert, so size them by total parameters, not active parameters; VRAM drops only if experts are offloaded to CPU
- Flash Attention reduces attention working memory rather than the KV cache itself; the savings are not factored in
- For training/fine-tuning, multiply by 4-8x (gradients + optimizer state)
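
The ±10% band and the 4-8x training multiplier from the notes above are simple arithmetic on the inference estimate. A minimal sketch, assuming the hypothetical helper names below and taking the page's rules of thumb at face value:

```python
def inference_range_gb(estimate_gb: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Low/high bounds for real-world inference usage (the notes' ±10%)."""
    return estimate_gb * (1 - tolerance), estimate_gb * (1 + tolerance)


def training_range_gb(estimate_gb: float, low_mult: float = 4,
                      high_mult: float = 8) -> tuple[float, float]:
    """Rough training/fine-tuning range using the notes' 4-8x multiplier."""
    return estimate_gb * low_mult, estimate_gb * high_mult


if __name__ == "__main__":
    estimate = 5.8  # GB, the inference estimate shown on this page
    lo, hi = inference_range_gb(estimate)
    print(f"Inference: {lo:.1f} to {hi:.1f} GB")
    lo, hi = training_range_gb(estimate)
    print(f"Training:  {lo:.1f} to {hi:.1f} GB")
```

When planning a purchase, it is safer to compare the upper bound against the card's memory than the central estimate.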