Prompt Management at Scale: Building Reliable AI Systems with Amplify Gen 2 and CDK

AWS · Amplify · CDK · AI · Infrastructure · DynamoDB · Lambda

The Problem with Production Prompts

When you move from AI experiments to production, prompts stop being strings and start being infrastructure. They carry cost, reliability, compliance, and UX implications. Scattering them across code or spreadsheets works for demos; it fails in production.

We built a Prompt Engine that treats prompts as first-class artifacts: versioned, secure, auditable, auto-resolved, and protected by operational safeguards. It runs on AWS Amplify Gen 2, extended with CDK to add the pieces production needs.

Architecture at a Glance

Flow

BasePromptVersion (immutable, content-hashed)
→ ActivePromptPointer (selects active version; tracks previousVersionId)
→ PointerResolver (precedence + LRU cache)
→ VariableInterpolator (dot-notation ${...} only)
→ MessageFormatter (system/user messages for the model client)

Key Properties

  • BasePromptVersion: Immutable content; SHA-256 integrity; stored inline (DynamoDB) or in S3 with KMS based on size
  • ActivePromptPointer: Mutable selector with rollback (previousVersionId)
  • PointerResolver: Strict precedence + LRU cache (200 entries, 5-minute TTL); selective invalidation by (tenantId, workflowId, modelId)
  • VariableInterpolator: Safe substitution; dot-notation only
  • MessageFormatter: Creates consistent system/user messages; integrates with the workflow runner's model client

Storage details: Large content goes to S3 using content-hash keys (prompt-content/<sha256>), KMS encryption, and lifecycle rules. Small content is stored inline in DynamoDB.
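The tiering decision above can be sketched as follows. This is a minimal illustration, not the engine's actual code; the `INLINE_LIMIT_BYTES` threshold and the `planStorage` name are assumptions (the real cutoff has to stay well under DynamoDB's 400 KB item limit).

```typescript
import { createHash } from "node:crypto";

// Hypothetical size threshold for inline DynamoDB storage.
const INLINE_LIMIT_BYTES = 32 * 1024;

interface StoredContent {
  contentHash: string;          // SHA-256 of the prompt body (integrity check)
  location: "inline" | "s3";
  inlineContent?: string;       // present only for small prompts
  s3Key?: string;               // prompt-content/<sha256> for large prompts
}

function planStorage(content: string): StoredContent {
  const contentHash = createHash("sha256").update(content, "utf8").digest("hex");
  if (Buffer.byteLength(content, "utf8") <= INLINE_LIMIT_BYTES) {
    return { contentHash, location: "inline", inlineContent: content };
  }
  // Content-hash keys make uploads idempotent: identical content maps to
  // the same S3 object, so re-publishing a version never duplicates data.
  return { contentHash, location: "s3", s3Key: `prompt-content/${contentHash}` };
}
```

A nice side effect of hash-addressed keys is deduplication: two versions with identical bodies share one object.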

Prompt Resolution (Never Fails)

Precedence (high → low)

  1. Tenant + Workflow + Model
  2. Workflow + Model
  3. Tenant + Model
  4. Model (global)
  5. Neutral fallback (guaranteed)
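The precedence ladder amounts to generating lookup keys from most to least specific. A sketch, with `candidateKeys` and the `DEFAULT_MODEL_ID` value as assumed names:

```typescript
interface PointerKey {
  tenantId?: string;
  workflowId?: string;
  modelId: string;
}

// Assumed sentinel for the neutral fallback prompt.
const DEFAULT_MODEL_ID = "neutral-default";

// Candidate lookup keys in precedence order; undefined dimensions are
// simply omitted. The last entry is the guaranteed neutral fallback.
function candidateKeys({ tenantId, workflowId, modelId }: PointerKey): PointerKey[] {
  const candidates: PointerKey[] = [];
  if (tenantId && workflowId) candidates.push({ tenantId, workflowId, modelId });
  if (workflowId) candidates.push({ workflowId, modelId });
  if (tenantId) candidates.push({ tenantId, modelId });
  candidates.push({ modelId });                     // global default for the model
  candidates.push({ modelId: DEFAULT_MODEL_ID });   // never-fail fallback
  return candidates;
}
```

The resolver then tries each key in order and returns the first pointer it finds.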

Usage

Model ID is normalized (e.g., US Bedrock prefix stripped):

const resolved = await pointerResolver.resolveActivePrompt(
  "customer-support",                                // workflowId (optional)
  "us.anthropic.claude-3-7-sonnet-20250219-v1:0",    // modelId (required)
  "enterprise-client-123"                            // tenantId (optional)
);

If nothing matches, the engine returns the neutral base prompt associated with DEFAULT_MODEL_ID. Resolution never leaves a workflow without a prompt.
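The normalization mentioned above can be as small as stripping a Bedrock cross-region inference prefix, so pointers keyed on the bare model ID still match. A hypothetical sketch (the prefix list is an assumption):

```typescript
// Strips a Bedrock cross-region inference prefix ("us.", "eu.", "apac.")
// so that "us.anthropic.claude-..." resolves the same pointer as
// "anthropic.claude-...".
function normalizeModelId(modelId: string): string {
  return modelId.replace(/^(us|eu|apac)\./, "");
}
```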

Caching

  • 200-entry LRU, 5-minute TTL
  • Selective invalidation by (tenantId, workflowId, modelId)
  • Observability: size, age, and hit/miss are logged via Powertools
  • No EventBridge invalidation in current stack; TTL/size eviction + logs/metrics are the consistency mechanism
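A cache with these properties fits in a few dozen lines, because a JavaScript `Map` preserves insertion order. The sketch below assumes cache keys are built as `tenantId#workflowId#modelId`, which is what makes prefix-based selective invalidation work; the class name and key scheme are illustrative, not the engine's actual code.

```typescript
interface CacheEntry<T> { value: T; insertedAt: number; }

// Map iteration follows insertion order, so delete + re-set moves an
// entry to the "most recent" end and gives us LRU eviction for free.
class LruTtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  constructor(private maxSize = 200, private ttlMs = 5 * 60 * 1000) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.insertedAt > this.ttlMs) {
      this.entries.delete(key);      // expired by TTL
      return undefined;
    }
    this.entries.delete(key);        // refresh recency
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: T): void {
    if (this.entries.size >= this.maxSize && !this.entries.has(key)) {
      // Evict the least-recently-used entry (first in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.delete(key);
    this.entries.set(key, { value, insertedAt: Date.now() });
  }

  // Selective invalidation: drop every key under a (tenantId, workflowId,
  // modelId) prefix after a pointer update.
  invalidate(prefix: string): void {
    for (const key of this.entries.keys()) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```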

Safe Deployment & Rollback (CAS)

A dedicated CAS Update Lambda manages pointer updates:

  • Verifies target BasePromptVersion exists
  • Uses optimistic concurrency (expectedPointerVersion)
  • Preserves previousVersionId for instant rollback
  • On failure, emits structured errors and pushes to a CAS failure DLQ
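In DynamoDB terms, the optimistic-concurrency check is a `ConditionExpression` on the pointer's version attribute. The sketch below only builds the update parameters (no network call), so the CAS semantics are visible; the table name, attribute names, and `buildCasUpdate` helper are assumptions, not the Lambda's actual code.

```typescript
interface CasUpdateInput {
  pointerId: string;
  newActiveVersionId: string;
  currentActiveVersionId: string;
  expectedPointerVersion: number;
  updatedBy?: string;
  reason: string;
}

function buildCasUpdate(input: CasUpdateInput) {
  return {
    TableName: "ActivePromptPointer",
    Key: { pointerId: { S: input.pointerId } },
    // The write succeeds only if nobody bumped the pointer version since
    // we read it; a conditional-check failure means a concurrent update.
    ConditionExpression: "version = :expected",
    UpdateExpression:
      "SET activeVersionId = :next, previousVersionId = :prev, " +
      "version = :newVersion, updatedBy = :by, reason = :reason",
    ExpressionAttributeValues: {
      ":expected": { N: String(input.expectedPointerVersion) },
      ":next": { S: input.newActiveVersionId },
      ":prev": { S: input.currentActiveVersionId }, // enables instant rollback
      ":newVersion": { N: String(input.expectedPointerVersion + 1) },
      ":by": { S: input.updatedBy ?? "unknown" },
      ":reason": { S: input.reason },
    },
  };
}
```

On a `ConditionalCheckFailedException`, the Lambda re-reads the pointer and either retries or reports the conflict; unrecoverable failures land in the DLQ.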

Deploy

await updatePromptPointer({
  pointerId: "support-pointer",
  newActiveVersionId: "prompt-v3",
  expectedPointerVersion: currentPointer.version,
  updatedBy: "engineer@company.com",
  reason: "deployment",
});

Rollback

await updatePromptPointer({
  pointerId: "support-pointer",
  newActiveVersionId: currentPointer.previousVersionId,
  expectedPointerVersion: currentPointer.version,
  reason: "rollback",
});

All changes are audited (who/when/why).

Template Variables (Dot-Notation Only)

Templates use ${...} with dot-notation (no arrays, loops, or conditionals). Missing variables render as empty strings and log a warning.

Common inputs:

  • Runtime: ${inputs.user_prompt}, ${workflowId}, ${conversationId}
  • Prior node output: ${nodeId.output.text}
  • Slots: ${slotTrackerNode.userName}
  • Session: ${context.sessionId}

Keep templates simple; put logic in nodes, not prompts.
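Because the grammar is so constrained, the whole interpolator reduces to one regex and a path walk. A minimal sketch of that behavior, assuming the `interpolate` signature (the real engine logs warnings via Powertools rather than returning them):

```typescript
// Dot-notation-only substitution: matches ${a.b.c}, nothing else.
// Missing variables render as "" and are collected as warnings.
function interpolate(
  template: string,
  context: Record<string, unknown>,
): { text: string; warnings: string[] } {
  const warnings: string[] = [];
  const text = template.replace(
    /\$\{([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\}/g,
    (_match, path: string) => {
      let value: unknown = context;
      for (const segment of path.split(".")) {
        value = (value as Record<string, unknown> | undefined)?.[segment];
      }
      if (value === undefined || value === null) {
        warnings.push(`missing variable: ${path}`);
        return "";
      }
      return String(value);
    },
  );
  return { text, warnings };
}
```

Anything the regex does not match (array indices, loops, conditionals) passes through untouched, which is exactly the "logic lives in nodes" rule enforced at the syntax level.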

Token Budgeting & Truncation

Before model calls, the engine enforces budgeting and fits content to the model's context window:

const contextWindow = getModelContextWindow(modelId);
const bufferPercent = 0.10; // safety headroom
const maxAllowed = contextWindow * (1 - bufferPercent);

const truncated = truncateToFitContextWindow(messages, maxAllowed);

Strategy

  1. Preserve system + current user messages
  2. Drop oldest memory first
  3. Truncate system as a last resort

Token estimation uses provider-specific tokenizers when available (e.g., OpenAI tiktoken); otherwise, it falls back to registry estimation settings.
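The three-step strategy can be sketched end to end. This is an illustration under stated assumptions: the rough 4-characters-per-token fallback estimator stands in for the registry settings, and the message shape is simplified to role + content.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Crude fallback estimator (~4 chars per token); a provider tokenizer
// such as tiktoken would replace this when available.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function truncateToFitContextWindow(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const history = messages.filter((m) => m.role !== "system");
  const currentUser = history.pop(); // step 1: the latest turn always survives
  const tail = currentUser ? [currentUser] : [];

  const budgetOf = (msgs: Message[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  // Step 2: drop oldest memory first until the conversation fits.
  const kept = [...history];
  while (kept.length > 0 && budgetOf([...system, ...kept, ...tail]) > maxTokens) {
    kept.shift();
  }

  const result = [...system, ...kept, ...tail];
  // Step 3: truncate the system message only as a last resort.
  if (budgetOf(result) > maxTokens && system.length > 0) {
    const overByChars = (budgetOf(result) - maxTokens) * 4;
    const sys = result[0];
    result[0] = {
      ...sys,
      content: sys.content.slice(0, Math.max(0, sys.content.length - overByChars)),
    };
  }
  return result;
}
```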

Storage, Security & Audit

  • Inline DynamoDB for small content; S3 + KMS for large
  • PromptArchiveBucket: Archival with lifecycle (1-year retention; cleanup of incomplete uploads)
  • AuditLog: Every create/update/rollback recorded
  • Secrets Manager: Provider keys (OpenAI, Pinecone) are read securely (never hardcoded)

Integration with Workflows

ModelInvoke nodes resolve prompts automatically—no promptId in node configs:

{
  "id": "ai_step",
  "type": "ModelInvoke",
  "config": {
    "modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "temperature": 0.7,
    "maxTokens": 1000
  }
}

Execution Path

  1. Resolve model (registry; Bedrock prefix normalized)
  2. Build context (tenant/workflow/conversation/state)
  3. Resolve prompt per precedence (fallback guaranteed)
  4. Interpolate ${...} variables
  5. Format messages for the model client
  6. Enforce TokenBudget
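Step 5 is deliberately boring: the resolved, interpolated prompt becomes the system message and the runtime input becomes the user message, which is the shape most model clients expect. A minimal sketch, with `formatMessages` as an assumed name:

```typescript
// Final formatting step: system prompt + user input, in the order the
// model client consumes them.
function formatMessages(systemPrompt: string, userPrompt: string) {
  return [
    { role: "system" as const, content: systemPrompt },
    { role: "user" as const, content: userPrompt },
  ];
}
```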

Operations, Observability & Reliability

  • Powertools (Logger, Metrics, Tracer) across Lambdas
  • DLQs for CAS and document/textract pipelines
  • Circuit breaker integration (separate component) protects model calls; breaker state persists in DynamoDB
  • Neutral fallback ensures a response even if resolution fails

Monitor

  • Resolution latency and cache hit/miss
  • CAS attempts/conflicts/DLQ growth
  • Interpolation warnings
  • Version usage analytics

Why This Matters

Treating prompts as infrastructure unlocks:

  • Auditability & security (KMS, IAM, Secrets Manager)
  • Safe deployments & rollbacks (CAS + DLQ)
  • Consistent runtime behavior (auto-resolution + budgeting)
  • Seamless workflow integration (no hardcoded strings)

Amplify Gen 2 provides identity, APIs, storage, and functions. CDK layers in versioning, CAS, secure content, caching, observability, and tenant isolation. The result is a production-grade prompt engine—resilient, secure, and deeply integrated with your workflow system.

Have questions or want to collaborate?

We'd love to hear from you about this technical approach or discuss how it might apply to your project.
