Blog

35 posts

All · Local LLM · General · Bioinformatics · Development · AI/LLM · DevOps · AI/ML · Self-Hosting · GPUs
Local LLM · June 23, 2026 · 7 min read

The Open-Model Cost Chart Everyone's Sharing Is API Prices. Here's What Self-Hosting Actually Gets You (Measured)

The intelligence-vs-cost chart making the rounds shows open models winning the value quadrant. True, but the x-axis is API token price. The cheap open winners are 100B-to-1T MoEs you can't run on a desktop GPU. Here's what you can actually self-host on an 11GB and a 24GB card, measured, and where the real ceiling is.

#local LLM#open source#self-hosting#GPU poor
Local LLM · June 19, 2026 · 10 min read

I Added a Verify Layer to My Local RAG to Catch Hallucinations. It Caught Me Being Wrong Twice About My Own Corpus

A claim-verification layer for a local RAG co-scientist, inspired by Karpathy's llm-wiki pattern. I tried to measure whether it catches hallucinations, almost shipped a false finding, and ended up with a clearer picture of what claim-checking can and can't do. It reliably catches values that are absent from the context, but it misses a real number pinned to the wrong question and misses a false premise outright. The broader lesson: a model can't reliably referee its own blind spots.

#local LLM#RAG#hallucination#ollama
Local LLM · June 10, 2026 · 6 min read

The Prefill Wall: Why MTP's 2× Barely Moves Long-Context Latency (Qwen3.6-27B, RTX 3090)

My last post doubled generation with MTP. A reader asked the question I'd skipped — what about prompt processing at long context? I measured prefill across context sizes on a 3090: a 64k prompt takes ~59s before the first token, and MTP can't touch that. Here's the math on when MTP's 2× actually matters, and when prefill swallows it.

#local LLM#RTX 3090#prefill#long context
General · May 27, 2026 · 16 min read

4× GTX 1080 Ti for Local LLM in 2026 — 44GB Combined VRAM Build Guide + Real Benchmarks

Practical build guide for running four GTX 1080 Tis in a single rig — 44 GB combined VRAM at roughly half the cost of a used RTX 3090. Covers PCIe slot configurations on HEDT and Threadripper boards, 1500W+ PSU sizing, cooling (1000W heat dissipation), llama.cpp tensor-split setup, expected throughput on 70B Llama, Mixtral 8×7B, and Qwen3.6-35B-A3B, plus the honest cases where this is not the right choice.

#4x 1080 Ti#multi-GPU LLM#44GB VRAM#Llama 70B local
General · May 27, 2026 · 15 min read

Running Qwen3.6-35B-A3B on RTX 3090 24GB — Real Use Cases for the 3B-Active MoE (2026)

Qwen3.6-35B-A3B (April 2026 release) puts a 35B-parameter MoE model on a single RTX 3090 24GB at usable speed thanks to its 3B active parameters and Apache 2.0 license. Practical use cases — agentic coding (SWE-bench 73.4), 262K context document analysis, vision-language tasks, and tool calling — with realistic VRAM math, expected throughput, and where the model genuinely outperforms 8B alternatives.

#Qwen3.6#Qwen3.6-35B-A3B#RTX 3090#local LLM
General · May 23, 2026 · 10 min read

Ollama Dual GPU Without NVLink — Tensor Split on 2× GTX 1080 Ti (Actual Benchmarks)

How to make Ollama actually use both GTX 1080 Ti cards without NVLink — environment variables, tensor split configuration, and real tokens/sec benchmarks for 13B and 30B-class models. Where PCIe becomes the bottleneck, what works versus what just looks like it's working, and how the same setup compares to a single 3090.

#Ollama dual GPU#GTX 1080 Ti dual#tensor split#no NVLink
Bioinformatics · March 18, 2026 · 15 min read

Building an AI Assistant for Researchers: Automating Bioinformatics Workflows with OpenClaw

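A first-hand account of automating repetitive proteomics analysis with OpenClaw and improving research efficiency by more than 90%. From building a DIA-NN pipeline to developing a biomarker database, it lays out the concrete implementation process and the results in detail.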

#OpenClaw#Bioinformatics#Proteomics#AI Automation