Extending AWS Amplify Gen 2 with CDK for AI-First Workflows

Tags: aws-amplify, aws-cdk, ai-infrastructure, workflow-orchestration, serverless, multi-tenant, production-systems

AWS Amplify is best known for rapid application development — spinning up auth, APIs, and storage so teams can get products into users' hands quickly. With Amplify Gen 2, it became more than a prototyping tool. Combined with the AWS CDK, Amplify can serve as the backbone for production-grade, multi-tenant AI infrastructure.

This is exactly how Matter & Gas evolved: starting with Amplify Gen 2 for the baseline, then extending it with CDK constructs to orchestrate models, manage prompts safely, process documents asynchronously, and enforce cost and reliability controls. The result is an AI-first platform that's both fast to iterate on and hardened for production.

The AI Infrastructure Challenge

Running a demo with one LLM is easy. Running production AI reliably is not.

The challenges we faced:

  • Reliability → AI providers fail unpredictably; workflows need circuit breakers and fallbacks
  • Cost control → Token usage spirals without budget enforcement
  • Observability → Debugging requires structured logs, metrics, and tracing across stages
  • Prompt management → Hardcoded strings break under versioning and audit requirements
  • Multi-model orchestration → Different workflows need different providers, sometimes with failover
  • Document processing → Async OCR, embedding, and vectorization pipelines must be scalable

These aren't solved by API calls alone. They demand orchestration, tenant isolation, resilience, and production observability.

Why Amplify Gen 2 Was the Foundation

Amplify Gen 2 provided a secure, multi-tenant baseline:

  • Authentication & isolation → Cognito + Amplify Data models for user/tenant separation
  • Data & storage → DynamoDB, S3, and GraphQL APIs automatically wired in
  • Serverless compute → Amplify Functions (Lambdas) integrated seamlessly with APIs and storage

With those pieces in place, we used CDK to extend Amplify into an AI-first platform.
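As a sketch of that extension pattern, an Amplify Gen 2 `backend.ts` can drop down to raw CDK constructs inside the same deployment. The queue names, timeouts, and retry counts below are illustrative, not the production stack's actual values:

```typescript
// amplify/backend.ts — hedged sketch; names and numbers are illustrative.
import { defineBackend } from '@aws-amplify/backend';
import { Duration } from 'aws-cdk-lib';
import { Queue } from 'aws-cdk-lib/aws-sqs';
import { auth } from './auth/resource';
import { data } from './data/resource';

const backend = defineBackend({ auth, data });

// Drop down to CDK: a dedicated stack for AI infrastructure.
const aiStack = backend.createStack('ai-workflows');

// Async document pipeline queue with a dead-letter queue safety net.
const documentDlq = new Queue(aiStack, 'DocumentProcessingDLQ', {
  retentionPeriod: Duration.days(14),
});
new Queue(aiStack, 'DocumentProcessingQueue', {
  visibilityTimeout: Duration.minutes(15),
  deadLetterQueue: { queue: documentDlq, maxReceiveCount: 3 },
});
```

Anything created this way deploys alongside the Amplify-managed resources in a single `npx ampx` pipeline.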

Extending Amplify with CDK for AI Workflows

1. Workflow Runner (Graph Engine)

At the core is the workflow runner Lambda — a graph-based execution engine.

Workflows are JSON-defined graphs of nodes (tasks) and edges (flow).
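For example, a small RAG workflow might be declared like this (field names are assumptions for illustration, not the production schema):

```typescript
// Illustrative workflow graph: nodes are tasks, edges define flow.
const ragWorkflow = {
  nodes: [
    { id: 'classify', type: 'IntentClassifier' },
    { id: 'retrieve', type: 'VectorSearch' },
    { id: 'answer', type: 'ModelInvoke' },
    { id: 'stream', type: 'StreamToClient' },
  ],
  edges: [
    { from: 'classify', to: 'retrieve' },
    { from: 'retrieve', to: 'answer' },
    { from: 'answer', to: 'stream' },
  ],
};
```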

Supported node types (nine core):

  • ModelInvoke → AI inference with prompt engine + budgets
  • VectorSearch → Pinecone RAG queries
  • VectorWrite → write embeddings to Pinecone
  • Router → conditional branching (safe DSL)
  • SlotTracker → conversational slot collection
  • ConversationMemory → stateful context
  • IntentClassifier → intent detection
  • Format → transform outputs (JSON, markdown, text)
  • StreamToClient → live streaming with metadata

The runner provides:

  • Virtual START/END edges auto-added
  • Connectivity validation: no dangling nodes
  • State persistence in DynamoDB
  • Streaming responses back to clients
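A minimal sketch of the first two checks — virtual START/END edges and connectivity validation — under assumed node/edge shapes (not the production types):

```typescript
// Hedged sketch of the runner's graph validation.
type Edge = { from: string; to: string };
type Workflow = { nodes: { id: string; type: string }[]; edges: Edge[] };

// Auto-add virtual START edges to entry nodes and END edges from exit nodes.
function withVirtualEdges(wf: Workflow): Edge[] {
  const targets = new Set(wf.edges.map(e => e.to));
  const sources = new Set(wf.edges.map(e => e.from));
  const edges = [...wf.edges];
  for (const n of wf.nodes) {
    if (!targets.has(n.id)) edges.push({ from: 'START', to: n.id });
    if (!sources.has(n.id)) edges.push({ from: n.id, to: 'END' });
  }
  return edges;
}

// Reject graphs with nodes unreachable from START (dangling nodes).
function assertConnected(wf: Workflow): void {
  const edges = withVirtualEdges(wf);
  const adj = new Map<string, string[]>();
  for (const e of edges) adj.set(e.from, [...(adj.get(e.from) ?? []), e.to]);
  const seen = new Set<string>();
  const stack = ['START'];
  while (stack.length) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of adj.get(id) ?? []) stack.push(next);
  }
  const dangling = wf.nodes.filter(n => !seen.has(n.id));
  if (dangling.length) {
    throw new Error(`Dangling nodes: ${dangling.map(n => n.id).join(', ')}`);
  }
}
```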

2. Prompt Engine

Prompts are first-class artifacts:

  • BasePromptVersion → immutable, content-hashed, inline or S3 (KMS encrypted)
  • ActivePromptPointer → mutable, tracks rollback via previousVersionId
  • CAS Update Lambda → atomic updates with DLQ fallback
  • PromptArchiveBucket → 1-year archival, auto-cleanup of incomplete uploads
  • Template interpolation → safe ${...} substitution (dot-notation only)
  • AuditLog → full traceability (who, when, why)

Fallback is never a hardcoded string: if resolution finds nothing, the engine falls back to the neutral prompt tied to DEFAULT_MODEL_ID.
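The safety property behind dot-notation-only interpolation can be sketched as follows (the real engine's API is internal; this only illustrates why restricting paths matters):

```typescript
// Hedged sketch: only plain dot-notation paths are substituted, so template
// input can never execute expressions or walk prototype internals.
function interpolate(template: string, vars: Record<string, unknown>): string {
  return template.replace(
    /\$\{([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)\}/g,
    (match, path: string) => {
      let value: unknown = vars;
      for (const key of path.split('.')) {
        if (
          value === null ||
          typeof value !== 'object' ||
          !Object.prototype.hasOwnProperty.call(value, key)
        ) {
          return match; // unresolved paths are left verbatim, never evaluated
        }
        value = (value as Record<string, unknown>)[key];
      }
      return String(value);
    },
  );
}
```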

3. Model Registry

Models are centralized in modelCapabilities.ts, not scattered in code:

  • Identity → provider (openai, anthropic, amazon, meta)
  • Capabilities → streaming, JSON mode, function calling, multimodal
  • Limits → context windows, reserved tokens, pricing per 1K tokens
  • Reliability → circuit breaker state persisted in DynamoDB
  • Tokenizer config → exact/approx estimators, provider-specific fallbacks

Workflows select models by ID; normalization strips Bedrock cross-region inference prefixes (e.g., us.). Unknown IDs fail fast — no silent fallback.
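The normalize-then-fail-fast lookup can be sketched like this (the registry entries are illustrative stand-ins, not the actual contents of modelCapabilities.ts):

```typescript
// Hedged sketch of registry lookup with fail-fast resolution.
type ModelCapability = { provider: string; streaming: boolean; contextWindow: number };

const REGISTRY: Record<string, ModelCapability> = {
  'anthropic.claude-3-5-sonnet-20240620-v1:0': { provider: 'anthropic', streaming: true, contextWindow: 200_000 },
  'amazon.titan-embed-text-v2:0': { provider: 'amazon', streaming: false, contextWindow: 8_192 },
};

// Strip Bedrock cross-region inference prefixes such as "us." before lookup.
function normalizeModelId(id: string): string {
  return id.replace(/^(us|eu|apac)\./, '');
}

function resolveModel(id: string): ModelCapability {
  const model = REGISTRY[normalizeModelId(id)];
  if (!model) throw new Error(`Unknown model ID: ${id}`); // no silent fallback
  return model;
}
```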

4. Document Processing Pipeline

Docs aren't static uploads — they're transformed into vectors for retrieval:

  1. Upload → GraphQL API → S3 + SQS enqueue
  2. OCR → Amazon Textract async jobs
  3. Chunking & embeddings → Titan v2 embeddings
  4. Vector storage → Pinecone (per-tenant namespaces)
  5. Progress tracking → DynamoDB status updates
  6. Resilience → DLQs for every stage (DocumentProcessingDLQ, TextractProcessingDLQ)

Collections enforce ACLs and cascade deletions across S3, DynamoDB, and Pinecone.
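Step 3's chunking might look like the sketch below. The sizes and overlap are illustrative defaults; production chunking is usually token-aware rather than character-based:

```typescript
// Hedged sketch of step 3: fixed-size chunking with overlap before embedding.
// Overlap keeps context that straddles a chunk boundary retrievable.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= overlap) throw new Error('size must exceed overlap');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached
  }
  return chunks;
}
```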

5. Reliability Enhancements

  • Circuit breakers → block unstable models
  • Token budgets → enforce per-call limits, prune memory safely
  • Fallback prompts → neutral base ensures continuity
  • DLQs → safety nets at every async stage
  • Secrets Manager → OpenAI + Pinecone keys securely injected
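As an illustration of budget-aware memory pruning (the token counter here is a crude character heuristic standing in for the provider tokenizer):

```typescript
// Hedged sketch: trim conversation memory to a token budget, dropping the
// oldest non-system turns first so system prompts always survive.
type Turn = { role: 'system' | 'user' | 'assistant'; content: string };

const approxTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic

function pruneToBudget(turns: Turn[], budget: number): Turn[] {
  const kept = [...turns];
  const total = () => kept.reduce((n, t) => n + approxTokens(t.content), 0);
  while (total() > budget) {
    const idx = kept.findIndex(t => t.role !== 'system'); // protect system prompts
    if (idx === -1) break; // only system turns left; nothing safe to drop
    kept.splice(idx, 1);
  }
  return kept;
}
```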

6. Observability

  • AWS Powertools → structured logging, metrics, tracing
  • CloudWatch → per-model + per-workflow dashboards
  • X-Ray → request path from API → Lambda → Bedrock/Pinecone
  • Correlation IDs → threaded across all stages

Every prompt update, circuit breaker trip/reset, and document pipeline failure is fully auditable.

AI Reliability Patterns

Three key safety nets in production:

  • Circuit breakers → trip on error rate thresholds, persist state in DynamoDB, fail-open if checks fail
  • Token budgets → pre-enforce input/output caps, drop oldest memory first, preserve system/user priority
  • Fallback prompts → resolution never fails, always returns neutral prompt bound to DEFAULT_MODEL_ID

From Prototype to Production

Amplify Gen 2 gave us the baseline (auth, APIs, storage). CDK gave us the extensions (workflow runner, prompt engine, registry, pipelines, breakers).

Example use cases:

  1. Conversational Assistant with Memory

    • ModelInvoke → ConversationMemory → SlotTracker → Format → StreamToClient
  2. Document-Aware RAG Workflow

    • IntentClassifier → VectorSearch → ModelInvoke → Router → StreamToClient
  3. Multi-Step Orchestration

    • Router → SlotTracker → VectorSearch → ModelInvoke → Format → StreamToClient

Lessons Learned

  1. Amplify Gen 2 is a launchpad — CDK makes it production-ready
  2. Reliability patterns are mandatory — budgets, breakers, DLQs from day one
  3. Observability is non-negotiable — Powertools + X-Ray make debugging tractable
  4. Tenant isolation matters — enforced across S3, Pinecone, and WorkflowAccess

The Path Forward

The combination of Amplify Gen 2 + CDK produced a multi-tenant AI-first platform with:

  • Workflow orchestration
  • Prompt versioning + archival
  • Multi-provider model registry
  • Document intelligence pipeline
  • Resilience + observability baked in

Amplify isn't just a frontend tool anymore — with CDK, it's a serious foundation for AI-first SaaS platforms, scaling smoothly from prototype to enterprise deployment.

Have questions or want to collaborate?

We'd love to hear from you about this technical approach or discuss how it might apply to your project.
