AWS AgentCore in Practice: Memory, Identity, Observability

AgentCore Runtime is Amazon Bedrock's production hosting layer for agents. It's been my main deployment target for the past year. This is what's worked and what's tripped me up.

Session isolation

Every conversation runs in its own session with its own memory. You don't have to engineer this. It just works, which means it's easy to take for granted until you see what happens when it's not there.

The thing to know: the session ID has to flow unchanged from your front end through API Gateway to AgentCore, and it has to be unique per conversation. If you reuse session IDs accidentally, you reuse memory. We caught one bug where a service account was sending the same session ID on every call. The agent slowly accumulated context from unrelated requests until responses degraded.
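One way to avoid the shared-ID bug is to derive the session ID from the caller and the conversation, so a service account gets a distinct ID per conversation instead of one static value. This is a minimal sketch, not an AgentCore API: `session_id_for` and the `sess-` prefix are hypothetical, and the 33-character minimum is my understanding of the runtime's session ID constraint, so check it against the current API reference.

```python
import uuid

# Assumption: the runtime rejects session IDs shorter than this.
MIN_SESSION_ID_LEN = 33


def session_id_for(principal: str, conversation_id: str) -> str:
    """Derive a stable session ID for one (principal, conversation) pair.

    uuid5 is deterministic, so every hop (front end, gateway, agent client)
    computes the same ID for the same conversation, and a different
    conversation or principal always yields a different ID.
    """
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, f"agent-session/{principal}")
    # "sess-" plus 32 hex characters is 37 characters long.
    return f"sess-{uuid.uuid5(namespace, conversation_id).hex}"


if __name__ == "__main__":
    first = session_id_for("svc-reports", "conv-1")
    second = session_id_for("svc-reports", "conv-2")
    print(first != second, len(first) >= MIN_SESSION_ID_LEN)
```

The design choice here is determinism over randomness: a random ID works too, but then it has to be stored and passed along, which is exactly where the reuse bug crept in.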

Episodic memory

AgentCore Memory stores conversation turns and retrieves relevant ones for context. You configure how much history is kept, how it's retrieved, and what gets summarized versus kept verbatim.

The trap: storing too much. Stale or irrelevant turns get retrieved alongside the useful ones, and memory you can't trust is worse than no memory. We summarize aggressively after 10 turns and prune session memory after 24 hours of inactivity.
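The summarize-and-prune policy can be sketched independently of any memory backend. Everything below is illustrative: `SessionMemory`, `compact`, and `prune` are hypothetical names, `summarize` is a placeholder for a model call, and folding all turns into the summary once the count exceeds 10 is just one reading of "summarize aggressively".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

SUMMARIZE_AFTER_TURNS = 10          # threshold from the text above
PRUNE_AFTER = timedelta(hours=24)   # inactivity window from the text above


@dataclass
class SessionMemory:
    last_active: datetime
    summary: str = ""
    turns: list[str] = field(default_factory=list)


def summarize(items: list[str]) -> str:
    # Placeholder: a real system would call a model here.
    return f"[summary of {len(items)} items]"


def compact(mem: SessionMemory) -> None:
    """Fold verbatim turns into the summary once they exceed the threshold."""
    if len(mem.turns) > SUMMARIZE_AFTER_TURNS:
        prior = [mem.summary] if mem.summary else []
        mem.summary = summarize(prior + mem.turns)
        mem.turns = []


def is_expired(mem: SessionMemory, now: datetime) -> bool:
    """True when the session has been inactive for the full prune window."""
    return now - mem.last_active >= PRUNE_AFTER


def prune(sessions: dict[str, SessionMemory], now: datetime) -> dict[str, SessionMemory]:
    """Return only the sessions that are still within the activity window."""
    return {sid: m for sid, m in sessions.items() if not is_expired(m, now)}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    mem = SessionMemory(last_active=now, turns=[f"turn {i}" for i in range(11)])
    compact(mem)
    print(mem.summary, len(mem.turns))
```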

OAuth identity

AgentCore Identity lets agents act on behalf of users via OAuth. The user grants the agent permission, the agent calls AWS services as the user, and IAM enforces the user's actual permissions.

This sounds boring. It's transformative for enterprise deployment. The agent can't escalate privileges, audit logs show which user each action was taken for, and security teams stop blocking your launches.

Observability dashboards

We pipe AgentCore traces through OpenTelemetry to CloudWatch and Grafana. The dashboards we check daily:

Token usage per agent per session (cost surface)
Latency at P50/P95/P99 (UX surface)
Goal success rate (quality surface)
Tool call failures (integration surface)

Goal success rate is the one that matters most. It measures the user's outcome directly; the other three are leading indicators that tend to move before it does.
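For reference, the four dashboard numbers can be computed from exported trace records roughly like this. It is a sketch under assumptions: `TraceRecord` and its fields are a hypothetical flattened shape rather than the OpenTelemetry span schema, and the percentile uses the nearest-rank method, which may differ slightly from what CloudWatch or Grafana report.

```python
import math
from dataclasses import dataclass


@dataclass
class TraceRecord:
    # Hypothetical flattened view of one agent invocation.
    session_id: str
    tokens: int
    latency_ms: float
    goal_met: bool
    tool_failures: int


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def daily_summary(records: list[TraceRecord]) -> dict:
    """Aggregate the cost, UX, quality, and integration surfaces."""
    if not records:
        raise ValueError("no trace records to summarize")
    tokens_by_session: dict[str, int] = {}
    for r in records:
        tokens_by_session[r.session_id] = tokens_by_session.get(r.session_id, 0) + r.tokens
    latencies = [r.latency_ms for r in records]
    return {
        "tokens_by_session": tokens_by_session,
        "latency_ms": {p: percentile(latencies, p) for p in (50, 95, 99)},
        "goal_success_rate": sum(r.goal_met for r in records) / len(records),
        "tool_call_failures": sum(r.tool_failures for r in records),
    }


if __name__ == "__main__":
    sample = [
        TraceRecord("a", 10, 100.0, True, 0),
        TraceRecord("a", 20, 200.0, True, 1),
        TraceRecord("b", 30, 300.0, False, 0),
        TraceRecord("b", 40, 400.0, True, 2),
    ]
    print(daily_summary(sample))
```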

Operational gotchas

Cold starts on AgentCore can be slow. Warm pools help but don't eliminate the issue. For low-traffic agents, provisioned concurrency is worth the cost.

Bedrock model quotas are regional. We had one launch where us-east-1 capacity was exhausted and we hadn't pre-provisioned us-west-2. Plan for failover before you need it.
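A failover path can be as simple as an ordered list of regions tried in turn. The sketch below is deliberately SDK-free: `invoke` is a caller-supplied function, and `CapacityError` is a hypothetical stand-in for whatever throttling or capacity exception your client raises. It only helps if quota and model access are already set up in the fallback region.

```python
from typing import Callable, Optional, Sequence


class CapacityError(Exception):
    """Hypothetical stand-in for a throttling or capacity error."""


def invoke_with_failover(
    invoke: Callable[[str, str], str],
    regions: Sequence[str],
    prompt: str,
) -> tuple[str, str]:
    """Try each region in order; return (region, response) from the first success."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return region, invoke(region, prompt)
        except CapacityError as exc:
            # Only capacity errors trigger failover; other errors propagate.
            last_error = exc
    raise RuntimeError("all regions exhausted") from last_error


if __name__ == "__main__":
    def fake_invoke(region: str, prompt: str) -> str:
        # Simulates the primary region being out of capacity.
        if region == "us-east-1":
            raise CapacityError("no capacity")
        return f"ok:{prompt}"

    print(invoke_with_failover(fake_invoke, ["us-east-1", "us-west-2"], "hi"))
```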