Blog

Notes from building ragweld: API-first retrieval engineering, model ops, eval loops, and production agent integration.

When MLX Training Cooked My M4 Pro: A Unified Memory Horror Story

A 0.6B-parameter reranker training run hard-froze a 48GB M4 Pro Mac mini, twice. Here's the forensic timeline from the run logs, why unified memory makes a GPU OOM act like a kernel panic, and the specific code changes that prevent it.

Qwen3 LoRA Learning Reranker on Apple Silicon

How we implemented a Qwen3 LoRA learning reranker with yes/no logits on Apple Silicon, plus the five implementation bugs that silently degrade scoring quality.

When to Query Chat Memory vs. Your Corpus (And When to Do Both)

A practical retrieval policy for deciding when to hit chat memory, when to hit the corpus, and how to avoid token/latency blowups at scale.