Review or Reviews

테크, 개발, AI, 하드웨어 — 실사용 기반 리뷰와 가이드

최신 글

더 보기

The Ollama num_ctx Trap: a Default You Never Set Can Halve Your Tokens/sec (Full Sweep on a 3090)

Ollama sizes the KV cache to your context length, and the default can quietly push a model that fits in VRAM into a CPU spill — cutting throughput. A full num_ctx sweep of Qwen3.6-27B on a single RTX 3090 shows exactly where the cliff is, and why a bigger context is not free.

6/7

Building a Fully-Local Research RAG on 2× GTX 1080 Ti + an RTX 3090: 3 Gotchas (CPU Embeddings, the Context Trap, and Not Merging GPUs)

A field report: building a private, fully-offline hybrid-retrieval RAG over my own papers across old and new GPUs — the embedder that froze the whole GPU, the context setting that halved my speed, and why pooling the cards was a trap. Plus an MCP server so an agent can cite my corpus.

6/6

Running Brand-New Gemma 4 12B on an 8-Year-Old GTX 1080 Ti: Speed, 3 Gotchas, and Why Q8 Beat Q4 on My Own Field

I pulled the just-released Gemma 4 12B and ran it on a GTX 1080 Ti. ~28 tok/s at Q4 on one card — but three things broke first, and going to Q8 (split across two cards, 30% slower) fixed both the token glitches and a domain answer the Q4 got confidently wrong.

6/5

Running 35B–400B LLMs on a GPU-less Cluster to Mine 10,000 Papers — and the 4 Bugs That Almost Ruined the Data

A field report: a CPU-only, GPU-less distributed LLM pipeline (llama.cpp + quantized MoE) mining 10,000 papers — and the 4 silent data-quality bugs that nearly ruined the results.

6/3

Running a 35B MoE (Qwen3.6-35B-A3B) on 2× GTX 1080 Ti in 2026 — Real Benchmarks, and Does the Second GPU Actually Help?

I benchmarked Qwen3.6-35B-A3B (IQ4_XS) on a pair of 8-year-old GTX 1080 Ti cards. It runs at ~20 tokens/sec — and the answer to 'does the second GPU help?' is yes, but only ~20% faster, not 2×. Here are the real numbers, the VRAM math, and why a 35B model fits 22 GB at all.

6/3

4× GTX 1080 Ti for Local LLM in 2026 — 44GB Combined VRAM Build Guide + Real Benchmarks

Practical build guide for running four GTX 1080 Tis in a single rig — 44 GB combined VRAM at roughly half the cost of a used RTX 3090. Covers PCIe slot configurations on HEDT and Threadripper boards, 1500W+ PSU sizing, cooling (1000W heat dissipation), llama.cpp tensor-split setup, expected throughput on 70B Llama, Mixtral 8×7B, and Qwen3.6-35B-A3B, plus the honest cases where this is not the right choice.

5/27