Extending AWS Amplify Gen 2 with CDK for AI-First Workflows

Tags: aws-amplify, aws-cdk, ai-infrastructure, workflow-orchestration, serverless, multi-tenant, production-systems

AWS Amplify is best known for rapid application development — spinning up auth, APIs, and storage so teams can get products into users' hands quickly. With Amplify Gen 2, it became more than a prototyping tool. Combined with the AWS CDK, Amplify can serve as the backbone for production-grade, multi-tenant AI infrastructure.

This is exactly how Matter & Gas evolved: starting with Amplify Gen 2 for the baseline, then extending it with CDK constructs to orchestrate models, manage prompts safely, process documents asynchronously, and enforce cost and reliability controls. The result is an AI-first platform that's both fast to iterate on and hardened for production.

The AI Infrastructure Challenge

Running a demo with one LLM is easy. Running production AI reliably is not.

The challenges we faced:

  • Reliability → AI providers fail unpredictably; workflows need circuit breakers and fallbacks
  • Cost control → Token usage spirals without budget enforcement
  • Observability → Debugging requires structured logs, metrics, and tracing across stages
  • Prompt management → Hardcoded strings break under versioning and audit requirements
  • Multi-model orchestration → Different workflows need different providers, sometimes with failover
  • Document processing → Async OCR, embedding, and vectorization pipelines must be scalable

These aren't solved by API calls alone. They demand orchestration, tenant isolation, resilience, and production observability.

Why Amplify Gen 2 Was the Foundation

Amplify Gen 2 provided a secure, multi-tenant baseline:

  • Authentication & isolation → Cognito + Amplify Data models for user/tenant separation
  • Data & storage → DynamoDB, S3, and GraphQL APIs automatically wired in
  • Serverless compute → Amplify Functions (Lambdas) integrated seamlessly with APIs and storage

With those pieces in place, we used CDK to extend Amplify into an AI-first platform.
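As a sketch of that extension pattern, an Amplify Gen 2 `backend.ts` can drop down to raw CDK constructs inside the same deployment. The queue names, timeouts, and retry counts below are illustrative, not the production stack's actual values:

```typescript
// amplify/backend.ts — hedged sketch; names and numbers are illustrative.
import { defineBackend } from '@aws-amplify/backend';
import { Duration } from 'aws-cdk-lib';
import { Queue } from 'aws-cdk-lib/aws-sqs';
import { auth } from './auth/resource';
import { data } from './data/resource';

const backend = defineBackend({ auth, data });

// Drop down to CDK: a dedicated stack for AI infrastructure.
const aiStack = backend.createStack('ai-workflows');

// Async document pipeline queue with a dead-letter queue safety net.
const documentDlq = new Queue(aiStack, 'DocumentProcessingDLQ', {
  retentionPeriod: Duration.days(14),
});
new Queue(aiStack, 'DocumentProcessingQueue', {
  visibilityTimeout: Duration.minutes(15),
  deadLetterQueue: { queue: documentDlq, maxReceiveCount: 3 },
});
```

Anything created this way deploys alongside the Amplify-managed resources in a single `npx ampx` pipeline.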

Extending Amplify with CDK for AI Workflows

1. Workflow Runner (Graph Engine)

At the core is the workflow runner Lambda — a graph-based execution engine.

Workflows are JSON-defined graphs of nodes (tasks) and edges (flow).
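For example, a small RAG workflow might be declared like this (field names are assumptions for illustration, not the production schema):

```typescript
// Illustrative workflow graph: nodes are tasks, edges define flow.
const ragWorkflow = {
  nodes: [
    { id: 'classify', type: 'IntentClassifier' },
    { id: 'retrieve', type: 'VectorSearch' },
    { id: 'answer', type: 'ModelInvoke' },
    { id: 'stream', type: 'StreamToClient' },
  ],
  edges: [
    { from: 'classify', to: 'retrieve' },
    { from: 'retrieve', to: 'answer' },
    { from: 'answer', to: 'stream' },
  ],
};
```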

Supported node types (nine core):

  • ModelInvoke → AI inference with prompt engine + budgets
  • VectorSearch → Pinecone RAG queries
  • VectorWrite → write embeddings to Pinecone
  • Router → conditional branching (safe DSL)
  • SlotTracker → conversational slot collection
  • ConversationMemory → stateful context
  • IntentClassifier → intent detection
  • Format → transform outputs (JSON, markdown, text)
  • StreamToClient → live streaming with metadata

The runner provides:

  • Virtual START/END edges auto-added
  • Connectivity validation: no dangling nodes
  • State persistence in DynamoDB
  • Streaming responses back to clients
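A minimal sketch of the first two checks — virtual START/END edges and connectivity validation — under assumed node/edge shapes (not the production types):

```typescript
// Hedged sketch of the runner's graph validation.
type Edge = { from: string; to: string };
type Workflow = { nodes: { id: string; type: string }[]; edges: Edge[] };

// Auto-add virtual START edges to entry nodes and END edges from exit nodes.
function withVirtualEdges(wf: Workflow): Edge[] {
  const targets = new Set(wf.edges.map(e => e.to));
  const sources = new Set(wf.edges.map(e => e.from));
  const edges = [...wf.edges];
  for (const n of wf.nodes) {
    if (!targets.has(n.id)) edges.push({ from: 'START', to: n.id });
    if (!sources.has(n.id)) edges.push({ from: n.id, to: 'END' });
  }
  return edges;
}

// Reject graphs with nodes unreachable from START (dangling nodes).
function assertConnected(wf: Workflow): void {
  const edges = withVirtualEdges(wf);
  const adj = new Map<string, string[]>();
  for (const e of edges) adj.set(e.from, [...(adj.get(e.from) ?? []), e.to]);
  const seen = new Set<string>();
  const stack = ['START'];
  while (stack.length) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of adj.get(id) ?? []) stack.push(next);
  }
  const dangling = wf.nodes.filter(n => !seen.has(n.id));
  if (dangling.length) {
    throw new Error(`Dangling nodes: ${dangling.map(n => n.id).join(', ')}`);
  }
}
```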

2. Prompt Engine

Prompts are first-class artifacts:

  • BasePromptVersion → immutable, content-hashed, inline or S3 (KMS encrypted)
  • ActivePromptPointer → mutable, tracks rollback via previousVersionId
  • CAS Update Lambda → atomic updates with DLQ fallback
  • PromptArchiveBucket → 1-year archival, auto-cleanup of incomplete uploads
  • Template interpolation → safe ${...} substitution (dot-notation only)
  • AuditLog → full traceability (who, when, why)

Fallback is never a hardcoded string: if resolution finds nothing, the engine falls back to the neutral prompt tied to DEFAULT_MODEL_ID.
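The safety property behind dot-notation-only interpolation can be sketched as follows (the real engine's API is internal; this only illustrates why restricting paths matters):

```typescript
// Hedged sketch: only plain dot-notation paths are substituted, so template
// input can never execute expressions or walk prototype internals.
function interpolate(template: string, vars: Record<string, unknown>): string {
  return template.replace(
    /\$\{([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)\}/g,
    (match, path: string) => {
      let value: unknown = vars;
      for (const key of path.split('.')) {
        if (
          value === null ||
          typeof value !== 'object' ||
          !Object.prototype.hasOwnProperty.call(value, key)
        ) {
          return match; // unresolved paths are left verbatim, never evaluated
        }
        value = (value as Record<string, unknown>)[key];
      }
      return String(value);
    },
  );
}
```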

3. Model Registry

Models are centralized in modelCapabilities.ts, not scattered in code:

  • Identity → provider (openai, anthropic, amazon, meta)
  • Capabilities → streaming, JSON mode, function calling, multimodal
  • Limits → context windows, reserved tokens, pricing per 1K tokens
  • Reliability → circuit breaker state persisted in DynamoDB
  • Tokenizer config → exact/approx estimators, provider-specific fallbacks

Workflows select models by ID; normalization strips Bedrock cross-region inference prefixes (e.g., us.). Unknown IDs fail fast — no silent fallback.
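The normalize-then-fail-fast lookup can be sketched like this (the registry entries are illustrative stand-ins, not the actual contents of modelCapabilities.ts):

```typescript
// Hedged sketch of registry lookup with fail-fast resolution.
type ModelCapability = { provider: string; streaming: boolean; contextWindow: number };

const REGISTRY: Record<string, ModelCapability> = {
  'anthropic.claude-3-5-sonnet-20240620-v1:0': { provider: 'anthropic', streaming: true, contextWindow: 200_000 },
  'amazon.titan-embed-text-v2:0': { provider: 'amazon', streaming: false, contextWindow: 8_192 },
};

// Strip Bedrock cross-region inference prefixes such as "us." before lookup.
function normalizeModelId(id: string): string {
  return id.replace(/^(us|eu|apac)\./, '');
}

function resolveModel(id: string): ModelCapability {
  const model = REGISTRY[normalizeModelId(id)];
  if (!model) throw new Error(`Unknown model ID: ${id}`); // no silent fallback
  return model;
}
```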

4. Document Processing Pipeline

Docs aren't static uploads — they're transformed into vectors for retrieval:

  1. Upload → GraphQL API → S3 + SQS enqueue
  2. OCR → Amazon Textract async jobs
  3. Chunking & embeddings → Titan v2 embeddings
  4. Vector storage → Pinecone (per-tenant namespaces)
  5. Progress tracking → DynamoDB status updates
  6. Resilience → DLQs for every stage (DocumentProcessingDLQ, TextractProcessingDLQ)

Collections enforce ACLs and cascade deletions across S3, DynamoDB, and Pinecone.
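Step 3's chunking might look like the sketch below. The sizes and overlap are illustrative defaults; production chunking is usually token-aware rather than character-based:

```typescript
// Hedged sketch of step 3: fixed-size chunking with overlap before embedding.
// Overlap keeps context that straddles a chunk boundary retrievable.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= overlap) throw new Error('size must exceed overlap');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached
  }
  return chunks;
}
```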

5. Reliability Enhancements

  • Circuit breakers → block unstable models
  • Token budgets → enforce per-call limits, prune memory safely
  • Fallback prompts → neutral base ensures continuity
  • DLQs → safety nets at every async stage
  • Secrets Manager → OpenAI + Pinecone keys securely injected
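As an illustration of budget-aware memory pruning (the token counter here is a crude character heuristic standing in for the provider tokenizer):

```typescript
// Hedged sketch: trim conversation memory to a token budget, dropping the
// oldest non-system turns first so system prompts always survive.
type Turn = { role: 'system' | 'user' | 'assistant'; content: string };

const approxTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic

function pruneToBudget(turns: Turn[], budget: number): Turn[] {
  const kept = [...turns];
  const total = () => kept.reduce((n, t) => n + approxTokens(t.content), 0);
  while (total() > budget) {
    const idx = kept.findIndex(t => t.role !== 'system'); // protect system prompts
    if (idx === -1) break; // only system turns left; nothing safe to drop
    kept.splice(idx, 1);
  }
  return kept;
}
```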

6. Observability

  • AWS Powertools → structured logging, metrics, tracing
  • CloudWatch → per-model + per-workflow dashboards
  • X-Ray → request path from API → Lambda → Bedrock/Pinecone
  • Correlation IDs → threaded across all stages

Every prompt update, circuit breaker trip/reset, and document pipeline failure is fully auditable.

AI Reliability Patterns

Three key safety nets in production:

  • Circuit breakers → trip on error rate thresholds, persist state in DynamoDB, fail-open if checks fail
  • Token budgets → pre-enforce input/output caps, drop oldest memory first, preserve system/user priority
  • Fallback prompts → resolution never fails, always returns neutral prompt bound to DEFAULT_MODEL_ID

From Prototype to Production

Amplify Gen 2 gave us the baseline (auth, APIs, storage). CDK gave us the extensions (workflow runner, prompt engine, registry, pipelines, breakers).

Example use cases:

  1. Conversational Assistant with Memory

    • ModelInvoke → ConversationMemory → SlotTracker → Format → StreamToClient
  2. Document-Aware RAG Workflow

    • IntentClassifier → VectorSearch → ModelInvoke → Router → StreamToClient
  3. Multi-Step Orchestration

    • Router → SlotTracker → VectorSearch → ModelInvoke → Format → StreamToClient

Lessons Learned

  1. Amplify Gen 2 is a launchpad — CDK makes it production-ready
  2. Reliability patterns are mandatory — budgets, breakers, DLQs from day one
  3. Observability is non-negotiable — Powertools + X-Ray make debugging tractable
  4. Tenant isolation matters — enforced across S3, Pinecone, and WorkflowAccess

The Path Forward

The combination of Amplify Gen 2 + CDK produced a multi-tenant AI-first platform with:

  • Workflow orchestration
  • Prompt versioning + archival
  • Multi-provider model registry
  • Document intelligence pipeline
  • Resilience + observability baked in

Amplify isn't just a frontend tool anymore — with CDK, it's a serious foundation for AI-first SaaS platforms, scaling smoothly from prototype to enterprise deployment.

Have questions or want to collaborate?

We'd love to hear from you about this technical approach or discuss how it might apply to your project.
