
LLM VRAM Calculator

Estimate how much GPU memory your local LLM will use. Plan your hardware before you buy.

Model size: 1B · 7B · 30B · 70B · 120B

~5-8% quality loss

Context length: 2K · 8K · 32K · 64K · 128K
Estimated VRAM needed
5.8 GB
Base model: 4.7 GB · KV cache: 0.0 GB · Activations: ~1 GB
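
The breakdown above (base model + KV cache + activations) can be sketched in a few lines. This is a rough illustration, not the calculator's actual formula: the function names, the flat 1 GB activation allowance, and the architecture parameters (layers, KV heads, head dimension) are assumptions chosen for the example, so the result will not match the figures shown on this page exactly.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Memory for the KV cache of one sequence, in GB.

    The leading factor of 2 covers keys and values; bytes_per_value=2
    assumes an fp16 cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1e9


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     context_tokens: int, activations_gb: float = 1.0) -> float:
    """Sum of weights, KV cache, and a flat activation allowance (assumed)."""
    return (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens)
            + activations_gb)


# Illustrative inputs: a 7B model at 4 bits per weight with a
# Llama-2-7B-like shape (32 layers, 32 KV heads, head dim 128), 2K context.
total = estimate_vram_gb(7, 4, n_layers=32, n_kv_heads=32,
                         head_dim=128, context_tokens=2048)
print(f"Estimated VRAM: {total:.1f} GB")
```

Real quantization formats store more than their nominal bits per weight (scales, mixed-precision layers), which is one reason a quantized file is usually larger than this simple product suggests.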

GPU Compatibility

RTX 3060 12GB · 12 GB · ✅ Comfortable
RTX 3080 / 4070 / 4070 Super · 12 GB · ✅ Comfortable
RTX 4080 · 16 GB · ✅ Comfortable
RTX 4090 / 3090 · 24 GB · ✅ Comfortable
RTX 5090 · 32 GB · ✅ Comfortable
A6000 · 48 GB · ✅ Comfortable
2x RTX 3090 · 48 GB · ✅ Comfortable
A100 80GB / H100 80GB · 80 GB · ✅ Comfortable
Mac Studio M2 Ultra 192GB · 192 GB · ✅ Comfortable
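
A compatibility check like the one above could be sketched as a comparison between the estimate and each card's memory. The page does not state its thresholds or its other labels, so the 80% headroom cutoff and the "Tight" and "Does not fit" labels below are assumptions made for illustration only.

```python
def fit_label(required_gb: float, gpu_gb: float, headroom: float = 0.8) -> str:
    """Classify how well an estimate fits on a GPU.

    headroom=0.8 is an assumed cutoff: the estimate counts as
    "Comfortable" if it uses at most 80% of the card's memory.
    """
    if required_gb <= gpu_gb * headroom:
        return "Comfortable"
    if required_gb <= gpu_gb:
        return "Tight"
    return "Does not fit"


# A few cards from the list above, checked against the 5.8 GB estimate.
gpus = {"RTX 3060 12GB": 12, "RTX 4080": 16, "RTX 4090 / 3090": 24}
for name, memory_gb in gpus.items():
    print(f"{name}: {fit_label(5.8, memory_gb)}")
```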

📌 Notes on accuracy

  • Numbers are approximations for inference with Ollama / llama.cpp / vLLM
  • Real VRAM usage varies ±10% based on engine and OS overhead
  • MoE models (Mixtral, Qwen3-MoE) still need every expert loaded, so size them by total parameter count; fewer active parameters per token improves speed, not VRAM use
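  • Flash Attention can cut attention-related memory by 20-40% (it shrinks the attention working buffers rather than the stored KV cache); this saving is not factored in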
  • For training/fine-tuning, multiply by 4-8x (gradients + optimizer state)
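
As a quick illustration of the ±10% and 4-8x rules in the notes above, the following sketch turns a single inference estimate into ranges. The function names are hypothetical, and the multipliers are simply the rules of thumb stated in the notes, not measured values.

```python
def inference_range_gb(estimate_gb: float, tolerance: float = 0.10):
    """Low and high bounds for real-world usage, given the ±10% note."""
    return estimate_gb * (1 - tolerance), estimate_gb * (1 + tolerance)


def training_range_gb(estimate_gb: float, low_factor: float = 4,
                      high_factor: float = 8):
    """Rough training/fine-tuning range using the 4-8x rule of thumb."""
    return estimate_gb * low_factor, estimate_gb * high_factor


low, high = inference_range_gb(5.8)
print(f"Inference: {low:.1f} to {high:.1f} GB")

train_low, train_high = training_range_gb(5.8)
print(f"Training: {train_low:.1f} to {train_high:.1f} GB")
```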