Writing

Notes on building production AI.

A working notebook on agentic AI, RAG, MCP, fine-tuning, and the engineering work that keeps these systems honest in production.

All posts

Ten essays on shipping AI.

Agentic AI

Building Production Multi-Agent Systems on AWS Bedrock

How to design a central orchestrator that routes user queries to specialized sub-agents, with session isolation, OAuth identity, episodic memory, and CloudWatch observability through AgentCore Runtime.

May 8, 2026 · 8 min read

Protocols

Model Context Protocol: The Standard for Agent Tool Integration

MCP standardizes how LLMs talk to tools, APIs, and data sources. A walkthrough of designing MCP servers that reduce duplicated context-broker logic and accelerate onboarding of new agent capabilities.

Apr 22, 2026 · 6 min read

RAG

From Demo to Production: RAG Pipelines That Actually Scale

Most RAG tutorials stop at the demo. Production needs chunking strategy, retrieval quality measurement, vector store choice, hybrid search, and an eval pipeline that catches regressions before users do.

Apr 5, 2026 · 10 min read

Fine-tuning

Fine-Tuning Llama 3 on Domain Data: A 50K Example Playbook

Dataset curation that doesn't pollute eval, choosing LoRA vs. full fine-tunes, training infrastructure on EC2 GPU clusters, and the cost math that justified the work against commercial APIs.

Mar 18, 2026 · 12 min read

Agent Tools

Text-to-SQL at Enterprise Scale: Lessons from Field Operations

What it took to ship a natural-language-to-SQL tool over Snowflake for field engineers. Schema awareness, query guardrails, performance budgets, and dealing with hallucinated column names.

Mar 2, 2026 · 9 min read

LangGraph

LangGraph for Stateful Agent Workflows: Patterns That Work

State machines for agents. Conditional routing, error recovery, checkpoints, and how to keep an agent's memory coherent across long-running multi-step workflows.

Feb 14, 2026 · 7 min read

Evaluation

Evaluating GenAI Systems: Beyond Vibes-Based Testing

Ragas, golden datasets, precision and recall on retrieval, F1 on extraction, AUC-ROC on classification, and the custom eval pipelines you need when the off-the-shelf ones aren't enough.

Jan 28, 2026 · 8 min read

MLOps

MLOps for LLM Applications: Drift, Observability, and Retraining

Classical MLOps tooling stretched to fit LLM apps. Data drift detection, automated retraining triggers via CloudWatch and Evidently AI, and what changes when "the model" is a prompt template.

Jan 10, 2026 · 9 min read

AWS

AWS AgentCore in Practice: Memory, Identity, Observability

A field report on AgentCore Runtime. Session isolation patterns, episodic memory configuration, OAuth-based identity for cross-system access, and OpenTelemetry dashboards that catch goal-failure modes.

Dec 20, 2025 · 11 min read

Architecture

The Multi-Agent Orchestrator Pattern: Routing Across Specialized Agents

How a central orchestrator agent makes specialization tractable. Routing heuristics, fallback chains, latency budgets, and why one big monolithic agent almost never beats a well-designed crew.

Dec 5, 2025 · 7 min read