Securing Ollama with API Key Authentication Using Caddy Reverse Proxy
Ollama has no built-in authentication. Here's how I secured my public-facing Ollama API with Caddy reverse proxy and X-API-Key header validation — complete with Windows setup, CORS handling, and Vercel integration.
If you're running Ollama with port forwarding (like I described in my RTX 3090 guide), anyone who discovers your IP can use your GPU for free. Ollama has zero built-in authentication. Here's how I fixed that.
The Problem
Default Ollama setup:
Internet → Router:11434 → Ollama:11434 (NO AUTH)
Anyone can:
curl http://your-ip:11434/api/generate -d '{"model":"qwen3:30b","prompt":"free GPU!"}'
This means:
- Free compute for strangers — They use your electricity and GPU
- Model abuse — Generate harmful content on your hardware
- DoS risk — Overload your GPU with requests
The Solution: Caddy as Auth Proxy
Caddy is a modern web server with automatic HTTPS. It's simpler than Nginx and perfect for this use case.
Architecture after setup:
Internet → Router:11434 → Caddy:11435 (checks X-API-Key) → Ollama:11434
↓ No key?
401 Unauthorized
Step-by-Step Setup (Windows)
1. Download Caddy
Get the Windows binary from caddyserver.com/download.
Place caddy.exe in a permanent location like C:\Tools\Caddy\.
2. Create the Caddyfile
Create C:\Tools\Caddy\Caddyfile:
:11435 {
    # Health check (no auth needed)
    handle /health {
        respond "OK" 200
    }

    # API key validation
    @valid_key {
        header X-API-Key "your-secret-key-here-make-it-long-and-random"
    }

    # Authenticated requests → proxy to Ollama
    handle @valid_key {
        reverse_proxy localhost:11434 {
            header_up Host {http.request.host}
            header_up X-Real-IP {http.request.remote.host}
        }
    }

    # Unauthorized requests
    handle {
        respond "Unauthorized" 401
    }

    # Logging
    log {
        output file C:\Tools\Caddy\access.log
        format json
    }
}
3. Generate a Strong API Key
# On Linux/Mac
openssl rand -hex 32
# Output: a1b2c3d4e5f6...64 characters
# Or just use a password generator — aim for 32+ characters
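If you'd rather generate the key from code (handy on Windows, where openssl isn't always available), Node's built-in crypto module produces the same shape of key. A minimal sketch, assuming a Node/TypeScript environment:

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes rendered as 64 hex characters -- the same output
// shape as `openssl rand -hex 32`.
export function generateApiKey(bytes = 32): string {
  return randomBytes(bytes).toString("hex");
}

console.log(generateApiKey());
```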
4. Update Router Port Forwarding
Change from:
External 11434 → Your PC:11434
To:
External 11434 → Your PC:11435 (Caddy's port)
The key insight: external port stays the same so your apps don't need changes. Only the internal destination changes to Caddy.
5. Start Caddy
cd C:\Tools\Caddy
.\caddy.exe run --config Caddyfile
6. Test
# No key → 401
curl http://your-public-ip:11434/api/tags
# Unauthorized
# Wrong key → 401
curl -H "X-API-Key: wrong" http://your-public-ip:11434/api/tags
# Unauthorized
# Correct key → 200
curl -H "X-API-Key: your-secret-key" http://your-public-ip:11434/api/tags
# {"models":[{"name":"qwen3:30b",...}]}
Handling CORS (For Browser Apps)
If your frontend calls Ollama directly from the browser, you need CORS headers. Update the Caddyfile:
:11435 {
    # CORS preflight
    @cors_preflight method OPTIONS
    handle @cors_preflight {
        header Access-Control-Allow-Origin "*"
        header Access-Control-Allow-Methods "GET, POST, OPTIONS"
        header Access-Control-Allow-Headers "Content-Type, X-API-Key"
        header Access-Control-Max-Age "86400"
        respond "" 204
    }

    @valid_key header X-API-Key "your-secret-key"
    handle @valid_key {
        header Access-Control-Allow-Origin "*"
        reverse_proxy localhost:11434
    }

    handle {
        respond "Unauthorized" 401
    }
}
⚠️ Security note: In production, replace * with your specific domain (https://sysofti.com).
Integration with Vercel (Next.js)
For server-side API routes (recommended — don't expose your key to browsers):
1. Set Environment Variable
vercel env add OLLAMA_API_KEY
# Enter your secret key
2. Server-Side API Route
// src/app/api/chat/route.ts
export async function POST(request: Request) {
  const { messages } = await request.json()

  const response = await fetch('http://your-ip:11434/api/chat', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': process.env.OLLAMA_API_KEY!,
    },
    body: JSON.stringify({
      model: 'qwen3:30b',
      messages,
      stream: false,
    }),
  })

  const data = await response.json()
  return Response.json(data)
}
The API key stays server-side. Browsers call your Vercel route, which proxies to Ollama with the key.
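On the browser side, the request targets your own route and carries no secret at all. A sketch of what that call looks like — buildChatRequest is a hypothetical helper, not part of the Next.js API:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the fetch options for calling the Vercel route. Note there is
// no X-API-Key here -- the server-side route adds it before forwarding
// to Ollama.
export function buildChatRequest(messages: ChatMessage[]): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  };
}

// In the browser:
// const res = await fetch("/api/chat", buildChatRequest([{ role: "user", content: "Hi" }]));
```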
Running Caddy as a Windows Service
Don't want to keep a terminal open? You can register Caddy as a service. One caveat: sc.exe expects a binary that speaks the Windows service protocol, so if the service fails to start (error 1053), wrap caddy.exe with a service manager such as NSSM or WinSW instead.
# Run PowerShell as Administrator
sc.exe create CaddyProxy `
binPath= "C:\Tools\Caddy\caddy.exe run --config C:\Tools\Caddy\Caddyfile" `
start= auto `
DisplayName= "Caddy Ollama Proxy"
sc.exe start CaddyProxy
Now Caddy starts automatically on boot.
Monitoring & Logs
Check who's trying to access your API:
# View recent access logs
Get-Content C:\Tools\Caddy\access.log -Tail 20 | ConvertFrom-Json | Format-Table
Look for repeated 401s from the same IP — that's someone probing your endpoint.
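Because the log is one JSON object per line, you can also scan it programmatically. A sketch in TypeScript — the field names status and request.remote_ip match recent Caddy releases; older versions log request.remote_addr instead:

```typescript
// Count 401 responses per client IP from Caddy's JSON access log.
export function count401sByIp(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip non-JSON lines
    }
    if (entry?.status !== 401) continue;
    const ip: string =
      entry.request?.remote_ip ?? entry.request?.remote_addr ?? "unknown";
    counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return counts;
}
```

Any IP with a high count is probing your endpoint and is a candidate for a firewall block.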
Advanced: Rate Limiting
The rate_limit directive isn't in stock Caddy — it comes from the caddy-ratelimit plugin, so you need a custom build (for example, xcaddy build --with github.com/mholt/caddy-ratelimit). With that module installed, add rate limiting to prevent abuse even with valid keys:
:11435 {
    @valid_key header X-API-Key "your-secret-key"
    handle @valid_key {
        rate_limit {
            zone dynamic_zone {
                key {http.request.remote.host}
                events 30
                window 1m
            }
        }
        reverse_proxy localhost:11434
    }

    handle {
        respond "Unauthorized" 401
    }
}
This limits to 30 requests per minute per IP.
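Clients that blow through the limit get HTTP 429, typically with a Retry-After header saying how long to wait. A small helper a client could use to honor it — a sketch, and retryAfterMs is my own name, not a library function:

```typescript
// Turn a Retry-After header value (seconds form) into a wait in
// milliseconds, falling back to a default when the header is absent
// or not a number.
export function retryAfterMs(header: string | null, fallbackMs = 2000): number {
  if (header === null || header.trim() === "") return fallbackMs;
  const seconds = Number(header);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}

// Usage: on a 429 response,
// await new Promise(r => setTimeout(r, retryAfterMs(res.headers.get("Retry-After"))));
```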
Alternative: Nginx
If you prefer Nginx (common on Linux):
server {
    listen 11435;

    location / {
        if ($http_x_api_key != "your-secret-key") {
            return 401 "Unauthorized";
        }

        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_read_timeout 600s;  # Long timeout for LLM generation
    }
}
I chose Caddy because it's a single binary with no dependencies — much simpler on Windows.
Verification Checklist
After setup, verify:
- curl http://your-ip:11434/api/tags returns 401
- curl -H "X-API-Key: YOUR_KEY" http://your-ip:11434/api/tags returns models
- Your web app works with the key in env vars
- Caddy starts on boot (Windows service)
- Logs are recording access attempts
Conclusion
Caddy + API key authentication is the simplest way to secure a public Ollama endpoint. The setup takes 10 minutes and eliminates the biggest risk of self-hosted LLMs: unauthorized access.
The combination of Ollama (free inference) + Caddy (free security) + port forwarding (free remote access) gives you a production-grade LLM API at zero ongoing cost.
This is part of my series on self-hosted AI infrastructure. See also: Running Qwen 3 on RTX 3090 and Ollama vs ChatGPT comparison.