Blog
Notes from building ragweld: API-first retrieval engineering, model ops, eval loops, and production agent integration.
When MLX Training Cooked My M4 Pro: A Unified Memory Horror Story
Feb 07, 2026
A 0.6B-parameter reranker training run hard-froze a 48GB M4 Pro Mac mini, twice. Here's the forensic timeline from the run logs, why unified memory makes a GPU OOM act like a kernel panic, and the specific code changes that prevent it.
Qwen3 LoRA Learning Reranker on Apple Silicon
Feb 04, 2026
How we implemented a Qwen3 LoRA learning reranker with yes/no logits on Apple Silicon, plus the five implementation bugs that silently degrade scoring quality.
When to Query Chat Memory vs. Your Corpus (And When to Do Both)
Feb 02, 2026
A practical retrieval policy for deciding when to hit chat memory, when to hit the corpus, and how to avoid token/latency blowups at scale.