A working notebook on agentic AI, RAG, MCP, fine-tuning, and the engineering work that keeps these systems honest in production.
All posts
How to design a central orchestrator that routes user queries to specialized sub-agents, with session isolation, OAuth identity, episodic memory, and CloudWatch observability through AgentCore Runtime.
Protocols: MCP standardizes how LLMs talk to tools, APIs, and data sources. A walkthrough of designing MCP servers that reduce duplicated context-broker logic and accelerate onboarding of new agent capabilities.
RAG: Most RAG tutorials stop at the demo. Production needs chunking strategy, retrieval quality measurement, vector store choice, hybrid search, and an eval pipeline that catches regressions before users do.
Fine-tuning: Dataset curation that doesn't pollute eval, choosing LoRA vs. full fine-tunes, training infrastructure on EC2 GPU clusters, and the cost math that justified the work against commercial APIs.
Agent Tools: What it took to ship a natural-language-to-SQL tool over Snowflake for field engineers. Schema awareness, query guardrails, performance budgets, and dealing with hallucinated column names.
LangGraph: State machines for agents. Conditional routing, error recovery, checkpoints, and how to keep an agent's memory coherent across long-running multi-step workflows.
Evaluation: Ragas, golden datasets, precision and recall on retrieval, F1 on extraction, AUC-ROC on classification, and the custom eval pipelines you need when the off-the-shelf ones aren't enough.
MLOps: Classical MLOps tooling stretched to fit LLM apps. Data drift detection, automated retraining triggers via CloudWatch and Evidently AI, and what changes when "the model" is a prompt template.
AWS: A field report on AgentCore Runtime. Session isolation patterns, episodic memory configuration, OAuth-based identity for cross-system access, and OpenTelemetry dashboards that catch goal-failure modes.
Architecture: How a central orchestrator agent makes specialization tractable. Routing heuristics, fallback chains, latency budgets, and why one big monolithic agent almost never beats a well-designed crew.