AWS Amplify is best known for rapid application development — spinning up auth, APIs, and storage so teams can get products into users' hands quickly. With Amplify Gen 2, it became more than a prototyping tool. Combined with the AWS CDK, Amplify can serve as the backbone for production-grade, multi-tenant AI infrastructure.
This is exactly how Matter & Gas evolved: starting with Amplify Gen 2 for the baseline, then extending it with CDK constructs to orchestrate models, manage prompts safely, process documents asynchronously, and enforce cost and reliability controls. The result is an AI-first platform that's both fast to iterate on and hardened for production.
The AI Infrastructure Challenge
Running a demo with one LLM is easy. Running production AI reliably is not.
The challenges we faced:
- Reliability → AI providers fail unpredictably; workflows need circuit breakers and fallbacks
- Cost control → Token usage spirals without budget enforcement
- Observability → Debugging requires structured logs, metrics, and tracing across stages
- Prompt management → Hardcoded strings break under versioning and audit requirements
- Multi-model orchestration → Different workflows need different providers, sometimes with failover
- Document processing → Async OCR, embedding, and vectorization pipelines must be scalable
These aren't solved by API calls alone. They demand orchestration, tenant isolation, resilience, and production observability.
Why Amplify Gen 2 Was the Foundation
Amplify Gen 2 provided a secure, multi-tenant baseline:
- Authentication & isolation → Cognito + Amplify Data models for user/tenant separation
- Data & storage → DynamoDB, S3, and GraphQL APIs automatically wired in
- Serverless compute → Amplify Functions (Lambdas) integrated seamlessly with APIs and storage
With those pieces in place, we used CDK to extend Amplify into an AI-first platform.
Extending Amplify with CDK for AI Workflows
1. Workflow Runner (Graph Engine)
At the core is the workflow runner Lambda — a graph-based execution engine.
Workflows are JSON-defined graphs of nodes (tasks) and edges (flow).
Supported node types (nine core):
- ModelInvoke → AI inference with prompt engine + budgets
- VectorSearch → Pinecone RAG queries
- VectorWrite → embeddings → Pinecone
- Router → conditional branching (safe DSL)
- SlotTracker → conversational slot collection
- ConversationMemory → stateful context
- IntentClassifier → intent detection
- Format → transform outputs (JSON, markdown, text)
- StreamToClient → live streaming with metadata
The runner enforces:
- Virtual START/END edges auto-added
- Connectivity validation: no dangling nodes
- State persistence in DynamoDB
- Streaming responses back to clients
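The validation the runner performs can be sketched as follows. This is an illustrative model, not the actual Matter & Gas schema: the node/edge shapes and function names are assumptions, but the logic mirrors the two guarantees above — virtual START/END edges are auto-added, and any node unreachable from START is flagged as dangling before execution begins.

```typescript
// Illustrative node types from the workflow runner's nine-node vocabulary.
type NodeType =
  | "ModelInvoke" | "VectorSearch" | "VectorWrite" | "Router" | "SlotTracker"
  | "ConversationMemory" | "IntentClassifier" | "Format" | "StreamToClient";

interface WorkflowNode { id: string; type: NodeType; }
interface WorkflowEdge { from: string; to: string; }
interface WorkflowGraph { nodes: WorkflowNode[]; edges: WorkflowEdge[]; }

// Auto-add virtual edges: START feeds every node with no inbound edge,
// and every node with no outbound edge feeds END.
function addVirtualEdges(graph: WorkflowGraph): WorkflowEdge[] {
  const hasInbound = new Set(graph.edges.map(e => e.to));
  const hasOutbound = new Set(graph.edges.map(e => e.from));
  const virtual: WorkflowEdge[] = [];
  for (const node of graph.nodes) {
    if (!hasInbound.has(node.id)) virtual.push({ from: "START", to: node.id });
    if (!hasOutbound.has(node.id)) virtual.push({ from: node.id, to: "END" });
  }
  return [...graph.edges, ...virtual];
}

// Connectivity validation: every node must be reachable from START,
// so dangling nodes fail the workflow before any model is invoked.
function validateConnectivity(graph: WorkflowGraph): string[] {
  const adjacency = new Map<string, string[]>();
  for (const e of addVirtualEdges(graph)) {
    adjacency.set(e.from, [...(adjacency.get(e.from) ?? []), e.to]);
  }
  const reachable = new Set<string>();
  const stack = ["START"];
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (reachable.has(current)) continue;
    reachable.add(current);
    for (const next of adjacency.get(current) ?? []) stack.push(next);
  }
  return graph.nodes
    .filter(n => !reachable.has(n.id))
    .map(n => `dangling node: ${n.id}`);
}
```

A linear two-node graph validates cleanly, while a closed cycle with no entry point from START is rejected — exactly the class of authoring mistake the real runner catches at load time.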
2. Prompt Engine
Prompts are first-class artifacts:
- BasePromptVersion → immutable, content-hashed, inline or S3 (KMS encrypted)
- ActivePromptPointer → mutable, tracks rollback via previousVersionId
- CAS Update Lambda → atomic updates with DLQ fallback
- PromptArchiveBucket → 1-year archival, auto-cleanup of incomplete uploads
- Template interpolation → safe `${...}` substitution (dot-notation only)
- AuditLog → full traceability (who, when, why)
Fallback is never hardcoded: if nothing resolves, the engine falls back to the neutral prompt tied to `DEFAULT_MODEL_ID`.
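A minimal sketch of dot-notation-only interpolation, assuming a simple `interpolate(template, context)` signature (the name is hypothetical, not the engine's actual API). The key safety property: only identifier paths like `${user.name}` are matched, so arbitrary expressions can never execute, and unresolved placeholders are left literal rather than throwing.

```typescript
// Safe ${...} substitution: only dot-notation identifier paths are resolved.
function interpolate(template: string, context: Record<string, unknown>): string {
  // Match ${a.b.c} where each segment is a plain identifier — nothing else.
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)\}/g;
  return template.replace(pattern, (match, path: string) => {
    let value: unknown = context;
    for (const segment of path.split(".")) {
      // Walking off the object means the path is unresolved: keep the literal.
      if (value === null || typeof value !== "object") return match;
      value = (value as Record<string, unknown>)[segment];
    }
    return value === undefined ? match : String(value);
  });
}
```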
3. Model Registry
Models are centralized in `modelCapabilities.ts`, not scattered in code:
- Identity → provider (openai, anthropic, amazon, meta)
- Capabilities → streaming, JSON mode, function calling, multimodal
- Limits → context windows, reserved tokens, pricing per 1K tokens
- Reliability → circuit breaker state persisted in DynamoDB
- Tokenizer config → exact/approx estimators, provider-specific fallbacks
Workflows select models by ID; normalization strips Bedrock regional prefixes such as `us.`.
Unknown IDs fail fast — no silent fallback.
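The lookup path can be sketched like this. The registry entries, field names, and model IDs below are illustrative placeholders, not the contents of the real `modelCapabilities.ts`; what the sketch demonstrates is the two behaviors stated above — prefix normalization and fail-fast on unknown IDs.

```typescript
// Illustrative slice of a centralized model registry.
interface ModelEntry {
  provider: "openai" | "anthropic" | "amazon" | "meta";
  contextWindow: number;
  supportsStreaming: boolean;
}

const registry: Record<string, ModelEntry> = {
  "anthropic.claude-sonnet": { provider: "anthropic", contextWindow: 200_000, supportsStreaming: true },
  "amazon.titan-embed-v2": { provider: "amazon", contextWindow: 8_192, supportsStreaming: false },
};

// Strip Bedrock regional inference-profile prefixes like "us." or "eu.".
function normalizeModelId(id: string): string {
  return id.replace(/^(us|eu|ap)\./, "");
}

function resolveModel(id: string): ModelEntry {
  const entry = registry[normalizeModelId(id)];
  // Fail fast: an unknown ID is a configuration bug, never a silent fallback.
  if (!entry) throw new Error(`unknown model id: ${id}`);
  return entry;
}
```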
4. Document Processing Pipeline
Docs aren't static uploads — they're transformed into vectors for retrieval:
- Upload → GraphQL API → S3 + SQS enqueue
- OCR → Amazon Textract async jobs
- Chunking & embeddings → Titan v2 embeddings
- Vector storage → Pinecone (per-tenant namespaces)
- Progress tracking → DynamoDB status updates
- Resilience → DLQs for every stage (DocumentProcessingDLQ, TextractProcessingDLQ)
Collections enforce ACLs and cascade deletions across S3, DynamoDB, and Pinecone.
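The chunking stage between OCR and embedding can be sketched as a sliding window with overlap, so context isn't lost at chunk boundaries. The sizes and function name here are illustrative defaults, not the pipeline's actual parameters; the real pipeline applies this to Textract output and uses per-provider tokenizers rather than character counts.

```typescript
// Fixed-size character windows with overlap, feeding the embedding stage.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  // Each window starts (chunkSize - overlap) past the previous one, so
  // consecutive chunks share `overlap` characters of context.
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final window covers the tail
  }
  return chunks;
}
```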
5. Reliability Enhancements
- Circuit breakers → block unstable models
- Token budgets → enforce per-call limits, prune memory safely
- Fallback prompts → neutral base ensures continuity
- DLQs → safety nets at every async stage
- Secrets Manager → OpenAI + Pinecone keys securely injected
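The circuit-breaker state machine behind the first item can be sketched in-memory (in production the state is persisted in DynamoDB, as noted above; class and method names here are assumptions for illustration). It trips after a failure threshold, blocks calls while open, and allows a single probe through after a cooldown.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// In-memory sketch of a per-model circuit breaker.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetAfterMs = 30_000,
  ) {}

  canInvoke(now = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open"; // cooldown elapsed: let one probe through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(now = Date.now()): void {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise trip at the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```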
6. Observability
- AWS Powertools → structured logging, metrics, tracing
- CloudWatch → per-model + per-workflow dashboards
- X-Ray → request path from API → Lambda → Bedrock/Pinecone
- Correlation IDs → threaded across all stages
Every prompt update, circuit breaker trip/reset, and document pipeline failure is fully auditable.
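Correlation-ID threading amounts to stamping the same ID onto every structured log line for a request. The sketch below is a plain-TypeScript illustration of that idea, not the Powertools API; `createLogger` and its fields are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";

// Every line this logger emits carries the same correlationId, so a single
// CloudWatch query can stitch one request across all workflow stages.
function createLogger(correlationId: string = randomUUID()) {
  const emit = (level: string, message: string, extra: Record<string, unknown> = {}): string => {
    const line = JSON.stringify({
      level,
      message,
      correlationId,
      timestamp: new Date().toISOString(),
      ...extra,
    });
    console.log(line);
    return line;
  };
  return {
    correlationId,
    info: (message: string, extra?: Record<string, unknown>) => emit("INFO", message, extra),
    error: (message: string, extra?: Record<string, unknown>) => emit("ERROR", message, extra),
  };
}
```

Passing the same `correlationId` into each downstream Lambda (via the event payload or SQS message attributes) is what makes the "threaded across all stages" property hold.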
AI Reliability Patterns
Three key safety nets in production:
- Circuit breakers → trip on error rate thresholds, persist state in DynamoDB, fail-open if checks fail
- Token budgets → pre-enforce input/output caps, drop oldest memory first, preserve system/user priority
- Fallback prompts → resolution never fails, always returns the neutral prompt bound to `DEFAULT_MODEL_ID`
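The second safety net — drop oldest memory first, preserve system/user priority — can be sketched as a pre-call pruning pass. The 4-characters-per-token estimate and function names are illustrative assumptions; the real platform uses the registry's per-provider tokenizers.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Rough heuristic stand-in for a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Prune oldest non-system memory first, always keeping the system prompt
// and the final (current) turn intact.
function enforceBudget(messages: Message[], maxTokens: number): Message[] {
  const total = (msgs: Message[]) =>
    msgs.reduce((n, m) => n + estimateTokens(m.content), 0);
  const pruned = [...messages];
  while (total(pruned) > maxTokens) {
    const idx = pruned.findIndex(
      (m, i) => m.role !== "system" && i < pruned.length - 1,
    );
    if (idx === -1) break; // nothing safe left to prune
    pruned.splice(idx, 1);
  }
  return pruned;
}
```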
From Prototype to Production
Amplify Gen 2 gave us the baseline (auth, APIs, storage). CDK gave us the extensions (workflow runner, prompt engine, registry, pipelines, breakers).
Example use cases:
- Conversational Assistant with Memory
  - ModelInvoke → ConversationMemory → SlotTracker → Format → StreamToClient
- Document-Aware RAG Workflow
  - IntentClassifier → VectorSearch → ModelInvoke → Router → StreamToClient
- Multi-Step Orchestration
  - Router → SlotTracker → VectorSearch → ModelInvoke → Format → StreamToClient
Lessons Learned
- Amplify Gen 2 is a launchpad — CDK makes it production-ready
- Reliability patterns are mandatory — budgets, breakers, DLQs from day one
- Observability is non-negotiable — Powertools + X-Ray make debugging tractable
- Tenant isolation matters — enforced across S3, Pinecone, and WorkflowAccess
The Path Forward
The combination of Amplify Gen 2 + CDK produced a multi-tenant AI-first platform with:
- Workflow orchestration
- Prompt versioning + archival
- Multi-provider model registry
- Document intelligence pipeline
- Resilience + observability baked in
Amplify isn't just a frontend tool anymore — with CDK, it's a serious foundation for AI-first SaaS platforms, scaling smoothly from prototype to enterprise deployment.