Review or Reviews

Tech, development, AI, hardware: reviews and guides based on real-world use

Latest Posts

Running Qwen3.6-35B-A3B on RTX 3090 24GB — Real Use Cases for the 3B-Active MoE (2026)

Qwen3.6-35B-A3B (April 2026 release) puts a 35B-parameter MoE model on a single RTX 3090 24GB at usable speed thanks to its 3B active parameters and Apache 2.0 license. Practical use cases — agentic coding (SWE-bench 73.4), 262K context document analysis, vision-language tasks, and tool calling — with realistic VRAM math, expected throughput, and where the model genuinely outperforms 8B alternatives.

5/27

llama.cpp --split-mode row vs layer on Multi-GPU — Old GPU Edition (1080 Ti, 2080, P40)

When llama.cpp's --split-mode row beats layer on dual-GPU inference, when layer is faster, and why the answer is different on Pascal/Turing without NVLink than on Ampere with NVLink. Real benchmarks on 2× GTX 1080 Ti for Mixtral, Yi-34B, Llama 3.1 13B, with PCIe lane and tensor split notes.

5/23

Ollama Dual GPU Without NVLink — Tensor Split on 2× GTX 1080 Ti (Actual Benchmarks)

How to make Ollama actually use both GTX 1080 Ti cards without NVLink — environment variables, tensor split configuration, and real tokens/sec benchmarks for 13B and 30B-class models. Where PCIe becomes the bottleneck, what works versus what just looks like it's working, and how the same setup compares to a single 3090.

5/23

Running Modern LLMs on GTX 1080 Ti in 2026 — What Still Works, What OOMs

A 2026 reality check for the GTX 1080 Ti: 11 GB VRAM, Pascal architecture, no FP16 tensor cores. Which modern LLMs (Llama 3.1, Qwen 3, Phi-4, Gemma 3) still load and run usefully, what hits OOM, real tokens/sec numbers from a 1080 Ti, and when it's time to retire the card.

5/23

Ollama vs LM Studio vs llama.cpp: Honest 2026 Comparison for Local LLM

Definitive comparison of the three most popular local LLM inference engines in 2026. Real performance benchmarks on RTX 3090, feature-by-feature matrix, setup walkthroughs, and a decision framework for picking the right tool for your use case.

5/18

Best Ollama Models for RTX 3090 24GB in 2026: Real Benchmarks (Qwen3 vs DeepSeek vs Llama)

Real Ollama benchmarks on RTX 3090 24GB — tokens/sec, VRAM, quality scores for 12+ models. Qwen3-30B vs DeepSeek-Coder-V3 vs Llama 4 head-to-head. Plus RTX 4090 comparison, cloud API cost analysis, and which local LLM to pick for your use case in 2026.

3/30