LLM VRAM Calculator - GPU Memory for AI Models
Calculate GPU memory requirements for running LLMs like Llama 3 and Mistral, as well as image models like Stable Diffusion, locally. Get hardware recommendations for your AI workstation.
Configure Your Model
Select a model and quantization level, then click calculate to see VRAM requirements and GPU recommendations.
Understanding VRAM Requirements
Key factors that determine how much GPU memory you need
Model Parameters
The number of parameters (weights) directly impacts VRAM. A 7B model has 7 billion parameters, each requiring storage in memory.
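As a rough sketch of the arithmetic (assuming unquantized FP16 weights at 2 bytes per parameter), the weight footprint alone can be estimated like this:

```python
# Sketch: memory needed just to hold the model weights.
# Assumes FP16 weights (2 bytes per parameter); loaders and runtimes
# add further overhead on top of this figure.
def weight_memory_gb(num_params_billion: float, bytes_per_param: float = 2.0) -> float:
    return num_params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_memory_gb(7), 1))   # ~13.0 GB for a 7B model at FP16
print(round(weight_memory_gb(70), 1))  # ~130.4 GB for a 70B model at FP16
```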
Quantization
Reducing precision from FP16 to INT4 can cut VRAM by 4x. Modern quantization methods maintain quality while dramatically reducing memory.
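A minimal sketch of how the bytes-per-parameter figure shrinks with quantization; the 0.5-byte value for INT4 ignores the small scale/zero-point metadata that real quant formats such as GGUF Q4 add, so actual files are slightly larger:

```python
# Approximate bytes per parameter at common precision levels.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def quantized_weight_gb(num_params_billion: float, quant: str) -> float:
    return num_params_billion * 1e9 * BYTES_PER_PARAM[quant] / 1024**3

for quant in BYTES_PER_PARAM:
    # 8B model: roughly 14.9 GB (FP16), 7.5 GB (INT8), 3.7 GB (INT4)
    print(quant, round(quantized_weight_gb(8, quant), 1), "GB")
```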
Context Length
The KV cache for attention grows linearly with context length. A 32K context needs significantly more VRAM than a 4K context.
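A rough sketch of the KV-cache math, assuming an FP16 cache and a Llama-3-8B-style layout (32 layers, 8 key/value heads under grouped-query attention, head dimension 128); these architecture numbers are illustrative assumptions, not a universal formula:

```python
# Sketch: KV-cache size for a single sequence. The factor of 2 accounts
# for storing both keys and values at every layer.
def kv_cache_gb(context_len: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1024**3

print(round(kv_cache_gb(4_096), 2))   # ~0.5 GB at 4K context
print(round(kv_cache_gb(32_768), 2))  # ~4.0 GB at 32K context
```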
Inference Overhead
Runtime buffers, CUDA kernels, and framework overhead typically add 10-20% to base requirements.
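Putting the pieces together, a simple total might look like the sketch below; the 15% margin is an assumed midpoint of the 10-20% range above, and the input figures come from the earlier weight and KV-cache estimates:

```python
# Sketch: total VRAM = (weights + KV cache) plus a runtime margin for
# CUDA context, activation buffers, and framework overhead.
def total_vram_gb(weights_gb: float, kv_cache_gb: float, overhead: float = 0.15) -> float:
    return (weights_gb + kv_cache_gb) * (1 + overhead)

# Example: an 8B model at INT4 (~3.7 GB of weights) with an 8K-context
# KV cache (~1.0 GB) lands around 5.4 GB before any extra headroom.
print(round(total_vram_gb(3.7, 1.0), 1))
```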
Quick Reference: Popular Models
Approximate VRAM requirements at different quantization levels
| Model | Parameters | FP16 | INT8 | INT4 / Q4 | Best GPU |
|---|---|---|---|---|---|
| Llama 3 8B | 8B | ~16 GB | ~10 GB | ~6 GB | RTX 5070 / 4070 Ti / 3080 |
| Llama 3 70B | 70B | ~140 GB | ~75 GB | ~40 GB | 2x RTX 5090 / 2x 4090 / A100 80GB |
| Mistral 7B | 7B | ~14 GB | ~9 GB | ~5 GB | RTX 5060 Ti / 4060 Ti 16GB |
| Mixtral 8x7B | 47B (active: 13B) | ~94 GB | ~50 GB | ~26 GB | RTX 5090 / 4090 / 2x 3090 |
| Qwen 2 72B | 72B | ~144 GB | ~77 GB | ~41 GB | 2x RTX 5090 / 2x 4090 / A100 80GB |
| Stable Diffusion XL | ~6.6B | ~8-12 GB (with optimizations) | n/a | n/a | RTX 5060 / 4060 / 3060 12GB |
| FLUX.1 Dev | ~12B | ~24 GB | n/a | n/a | RTX 5090 / 4090 / 3090 |
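As a rough cross-check of the table, the same weights-plus-margin arithmetic lands in the same ballpark as the listed figures; the table values also budget for KV cache and leave extra headroom, so exact agreement is not expected:

```python
# Standalone sanity check: weights-only estimate plus a 15% runtime margin.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for name, params_b in [("Llama 3 8B", 8), ("Llama 3 70B", 70), ("Mixtral 8x7B", 47)]:
    estimate = {q: params_b * 1e9 * b / 1024**3 * 1.15 for q, b in BYTES_PER_PARAM.items()}
    print(name, {q: f"~{gb:.0f} GB" for q, gb in estimate.items()})
```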