The Problem with Production Prompts
When you move from AI experiments to production, prompts stop being strings and start being infrastructure. They carry cost, reliability, compliance, and UX implications. Scattering them across code or spreadsheets works for demos; it fails in production.
We built a Prompt Engine that treats prompts as first-class artifacts: versioned, secure, auditable, auto-resolved, and protected by operational safeguards. It runs on AWS Amplify Gen 2, extended with CDK to add the pieces production needs.
Architecture at a Glance
Flow
BasePromptVersion (immutable, content-hashed)
→ ActivePromptPointer (selects active version; tracks previousVersionId)
→ PointerResolver (precedence + LRU cache)
→ VariableInterpolator (dot-notation ${...} only)
→ MessageFormatter (system/user messages for the model client)
Key Properties
- BasePromptVersion: Immutable content; SHA-256 integrity; stored inline (DynamoDB) or in S3 with KMS based on size
- ActivePromptPointer: Mutable selector with rollback (previousVersionId)
- PointerResolver: Strict precedence + LRU cache (200 entries, 5-minute TTL); selective invalidation by (tenantId, workflowId, modelId)
- VariableInterpolator: Safe substitution; dot-notation only
- MessageFormatter: Creates consistent system/user messages; integrates with the workflow runner's model client
Storage details: Large content goes to S3 using content-hash keys (prompt-content/&lt;sha256&gt;), KMS encryption, and lifecycle rules. Small content is stored inline in DynamoDB.
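As a sketch of how that size-based routing might look, assuming the AWS SDK v3 S3 client, a 100 KB inline threshold, and a PROMPT_CONTENT_BUCKET environment variable (all illustrative, not the engine's actual values):

```typescript
import { createHash } from "node:crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Illustrative values: DynamoDB items cap at 400 KB, so large bodies are
// routed to S3 under a content-hash key. Threshold and bucket are assumptions.
const INLINE_LIMIT_BYTES = 100 * 1024;
const CONTENT_BUCKET = process.env.PROMPT_CONTENT_BUCKET ?? "prompt-content-bucket";

const s3 = new S3Client({});

async function storePromptContent(
  content: string,
): Promise<{ inline?: string; s3Key?: string; sha256: string }> {
  const sha256 = createHash("sha256").update(content, "utf8").digest("hex");
  if (Buffer.byteLength(content, "utf8") <= INLINE_LIMIT_BYTES) {
    return { inline: content, sha256 }; // small: stored on the DynamoDB item
  }
  const s3Key = `prompt-content/${sha256}`; // content-hash key, as described above
  await s3.send(
    new PutObjectCommand({
      Bucket: CONTENT_BUCKET,
      Key: s3Key,
      Body: content,
      ServerSideEncryption: "aws:kms", // KMS encryption per the design
    }),
  );
  return { s3Key, sha256 };
}
```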
Prompt Resolution (Never Fails)
Precedence (high → low)
- Tenant + Workflow + Model
- Workflow + Model
- Tenant + Model
- Model (global)
- Neutral fallback (guaranteed)
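To make the precedence concrete, here is a hypothetical lookup sketch; the key format, lookupPointer helper, and DEFAULT_MODEL_ID value are illustrative stand-ins for the engine's internals:

```typescript
interface PromptPointer { activeVersionId: string; previousVersionId?: string; version: number; }

const DEFAULT_MODEL_ID = "neutral-default"; // illustrative constant
declare function lookupPointer(key: string): Promise<PromptPointer | undefined>; // e.g., a DynamoDB read
declare function normalizeModelId(modelId: string): string; // e.g., strips the "us." Bedrock prefix

// Candidate keys, most specific first; undefined scopes are skipped.
function candidateKeys(modelId: string, workflowId?: string, tenantId?: string): string[] {
  const keys: string[] = [];
  if (tenantId && workflowId) keys.push(`${tenantId}#${workflowId}#${modelId}`);
  if (workflowId) keys.push(`${workflowId}#${modelId}`);
  if (tenantId) keys.push(`${tenantId}#${modelId}`);
  keys.push(modelId); // global pointer for the model
  return keys;
}

async function resolveActivePrompt(workflowId: string | undefined, modelId: string, tenantId?: string) {
  for (const key of candidateKeys(normalizeModelId(modelId), workflowId, tenantId)) {
    const pointer = await lookupPointer(key);
    if (pointer) return pointer;
  }
  // Guaranteed neutral fallback: resolution never leaves a workflow without a prompt.
  return lookupPointer(DEFAULT_MODEL_ID);
}
```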
Usage
Model ID is normalized (e.g., US Bedrock prefix stripped):
const resolved = await pointerResolver.resolveActivePrompt(
"customer-support", // workflowId (optional)
"us.anthropic.claude-3-7-sonnet-20250219-v1:0", // modelId (required)
"enterprise-client-123" // tenantId (optional)
);
If nothing matches, the engine returns the neutral base prompt associated with DEFAULT_MODEL_ID. Resolution never leaves a workflow without a prompt.
Caching
- 200-entry LRU, 5-minute TTL
- Selective invalidation by (tenantId, workflowId, modelId)
- Observability: size, age, and hit/miss are logged via Powertools
- No EventBridge-driven invalidation in the current stack; TTL and size-based eviction, plus logs and metrics, serve as the consistency mechanism
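The cache itself can be as simple as an insertion-ordered Map. This sketch of a 200-entry, 5-minute-TTL LRU with prefix-based invalidation is an assumption about the implementation, not a copy of it:

```typescript
type CacheKey = string; // e.g., `${tenantId ?? "*"}#${workflowId ?? "*"}#${modelId}` (assumed shape)

class PointerCache<V> {
  private entries = new Map<CacheKey, { value: V; expiresAt: number }>();
  constructor(private maxSize = 200, private ttlMs = 5 * 60 * 1000) {}

  get(key: CacheKey): V | undefined {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt < Date.now()) {
      this.entries.delete(key);
      return undefined; // miss, or expired by TTL
    }
    // Refresh recency: a Map preserves insertion order, so re-insert on access.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: CacheKey, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least-recently-used entry (first key in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  // Selective invalidation by (tenantId, workflowId, modelId) scope prefix.
  invalidate(prefix: string): void {
    for (const key of this.entries.keys()) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```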
Safe Deployment & Rollback (CAS)
A dedicated CAS Update Lambda manages pointer updates:
- Verifies target BasePromptVersion exists
- Uses optimistic concurrency (expectedPointerVersion)
- Preserves previousVersionId for instant rollback
- On failure, emits structured errors and pushes to a CAS failure DLQ
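A minimal sketch of the conditional write at the heart of that Lambda, assuming a DynamoDB pointer table with a numeric version attribute; the table name, attribute names, and error shape are illustrative:

```typescript
import {
  DynamoDBClient,
  UpdateItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});
const POINTER_TABLE = process.env.POINTER_TABLE ?? "PromptPointers"; // assumed name

async function casUpdatePointer(opts: {
  pointerId: string;
  newActiveVersionId: string;
  expectedPointerVersion: number;
  updatedBy?: string;
  reason?: string;
}): Promise<void> {
  // (Existence check of the target BasePromptVersion omitted for brevity.)
  try {
    await ddb.send(new UpdateItemCommand({
      TableName: POINTER_TABLE,
      Key: { pointerId: { S: opts.pointerId } },
      // Copy the outgoing active version into previousVersionId (instant rollback),
      // bump the pointer version, and record who/why for the audit trail.
      UpdateExpression:
        "SET previousVersionId = activeVersionId, activeVersionId = :next, " +
        "#v = #v + :one, updatedBy = :by, #r = :why",
      // CAS guard: fail if anyone updated the pointer since we read it.
      ConditionExpression: "#v = :expected",
      ExpressionAttributeNames: { "#v": "version", "#r": "reason" },
      ExpressionAttributeValues: {
        ":next": { S: opts.newActiveVersionId },
        ":one": { N: "1" },
        ":expected": { N: String(opts.expectedPointerVersion) },
        ":by": { S: opts.updatedBy ?? "system" },
        ":why": { S: opts.reason ?? "deployment" },
      },
    }));
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) {
      // Version drifted: surface a structured conflict; the caller routes it to the DLQ.
      throw new Error(`CAS conflict on pointer ${opts.pointerId}`);
    }
    throw err;
  }
}
```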
Deploy
await updatePromptPointer({
pointerId: "support-pointer",
newActiveVersionId: "prompt-v3",
expectedPointerVersion: currentPointer.version,
updatedBy: "engineer@company.com",
reason: "deployment",
});
Rollback
await updatePromptPointer({
pointerId: "support-pointer",
newActiveVersionId: currentPointer.previousVersionId,
expectedPointerVersion: currentPointer.version,
reason: "rollback",
});
All changes are audited (who/when/why).
Template Variables (Dot-Notation Only)
Templates use ${...} with dot-notation (no arrays, loops, or conditionals). Missing variables render as empty strings and log a warning.
Common inputs:
- Runtime: ${inputs.user_prompt}, ${workflowId}, ${conversationId}
- Prior node output: ${nodeId.output.text}
- Slots: ${slotTrackerNode.userName}
- Session: ${context.sessionId}
Keep templates simple; put logic in nodes, not prompts.
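A minimal interpolator consistent with these rules might look like the following; the exact regex and warning mechanism are assumptions:

```typescript
// Matches ${a.b.c}: identifiers separated by dots, nothing else (no arrays or logic).
const VARIABLE_PATTERN = /\$\{([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\}/g;

function interpolate(template: string, context: Record<string, unknown>): string {
  return template.replace(VARIABLE_PATTERN, (_match, path: string) => {
    // Walk the dot-notation path through the context object.
    const value = path.split(".").reduce<unknown>(
      (obj, key) =>
        obj != null && typeof obj === "object" ? (obj as Record<string, unknown>)[key] : undefined,
      context,
    );
    if (value === undefined || value === null) {
      console.warn(`Missing template variable: ${path}`); // renders as empty string
      return "";
    }
    return String(value);
  });
}

// Example: returns "Hello Ada"
interpolate("Hello ${slotTrackerNode.userName}", { slotTrackerNode: { userName: "Ada" } });
```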
Token Budgeting & Truncation
Before model calls, the engine enforces budgeting and fits content to the model's context window:
const contextWindow = getModelContextWindow(modelId);
const bufferPercent = 0.10; // safety headroom
const maxAllowed = contextWindow * (1 - bufferPercent);
const truncated = truncateToFitContextWindow(messages, maxAllowed);
Strategy
- Preserve system + current user messages
- Drop oldest memory first
- Truncate system as a last resort
Token estimation uses provider-specific tokenizers when available (e.g., OpenAI tiktoken); otherwise, it falls back to registry estimation settings.
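As a sketch of the drop-oldest-first strategy, assuming a simple message shape and a crude characters-per-token fallback estimator (the real engine prefers provider tokenizers, per the note above):

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Crude fallback estimator (~4 characters per token) when no tokenizer is available.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

function truncateToFitContextWindow(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const current = rest.at(-1); // the current user turn is always preserved
  let memory = rest.slice(0, -1);

  const total = (ms: Message[]) => ms.reduce((n, m) => n + estimateTokens(m), 0);

  // Drop the oldest memory messages first until the budget fits.
  while (memory.length > 0 && total([...system, ...memory, ...(current ? [current] : [])]) > maxTokens) {
    memory = memory.slice(1);
  }
  // Truncating the system message itself would be the last resort (omitted here).
  return [...system, ...memory, ...(current ? [current] : [])];
}
```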
Storage, Security & Audit
- Inline DynamoDB for small content; S3 + KMS for large
- PromptArchiveBucket: Archival with lifecycle (1-year retention; cleanup of incomplete uploads)
- AuditLog: Every create/update/rollback recorded
- Secrets Manager: Provider keys (OpenAI, Pinecone) are read securely (never hardcoded)
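The Secrets Manager point, sketched with the AWS SDK v3 client; the secret name and per-container caching are assumptions:

```typescript
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const secrets = new SecretsManagerClient({});
let cachedOpenAiKey: string | undefined;

// Read once per container; never hardcode or log the key itself.
async function getOpenAiKey(): Promise<string> {
  if (!cachedOpenAiKey) {
    const res = await secrets.send(
      new GetSecretValueCommand({ SecretId: "prompt-engine/openai-api-key" }), // assumed name
    );
    if (!res.SecretString) throw new Error("Secret has no string value");
    cachedOpenAiKey = res.SecretString;
  }
  return cachedOpenAiKey;
}
```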
Integration with Workflows
ModelInvoke nodes resolve prompts automatically—no promptId in node configs:
{
"id": "ai_step",
"type": "ModelInvoke",
"config": {
"modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0",
"temperature": 0.7,
"maxTokens": 1000
}
}
Execution Path
- Resolve model (registry; Bedrock prefix normalized)
- Build context (tenant/workflow/conversation/state)
- Resolve prompt per precedence (fallback guaranteed)
- Interpolate ${...} variables
- Format messages for the model client
- Enforce TokenBudget
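Wired together, the path might read as follows; every type and helper here is illustrative glue, not the engine's actual API:

```typescript
interface ModelInvokeConfig { modelId: string; temperature?: number; maxTokens?: number; }
interface WorkflowContext {
  workflowId?: string;
  tenantId?: string;
  modelClient: { invoke(modelId: string, messages: unknown[], opts: object): Promise<unknown> };
}
declare function normalizeModelId(id: string): string;
declare function buildVariableContext(ctx: WorkflowContext): Record<string, unknown>;
declare function resolveActivePrompt(
  workflowId: string | undefined, modelId: string, tenantId?: string,
): Promise<{ template: string }>;
declare function interpolate(template: string, vars: Record<string, unknown>): string;
declare function formatMessages(rendered: string): unknown[];
declare function truncateToFitContextWindow(messages: unknown[], maxTokens: number): unknown[];
declare function maxAllowedTokens(modelId: string): number;

async function runModelInvokeNode(node: ModelInvokeConfig, ctx: WorkflowContext) {
  const modelId = normalizeModelId(node.modelId);            // registry + Bedrock prefix
  const variables = buildVariableContext(ctx);               // tenant/workflow/conversation/state
  const prompt = await resolveActivePrompt(ctx.workflowId, modelId, ctx.tenantId); // fallback guaranteed
  const rendered = interpolate(prompt.template, variables);  // ${...} substitution
  const messages = formatMessages(rendered);                 // system/user messages
  const budgeted = truncateToFitContextWindow(messages, maxAllowedTokens(modelId)); // token budget
  return ctx.modelClient.invoke(modelId, budgeted, {
    temperature: node.temperature,
    maxTokens: node.maxTokens,
  });
}
```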
Operations, Observability & Reliability
- Powertools (Logger, Metrics, Tracer) across Lambdas
- DLQs for CAS and document/Textract pipelines
- Circuit breaker integration (separate component) protects model calls; breaker state persists in DynamoDB
- Neutral fallback ensures a response even if resolution fails
Monitor
- Resolution latency and cache hit/miss
- CAS attempts/conflicts/DLQ growth
- Interpolation warnings
- Version usage analytics
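For example, cache hit/miss and resolution latency can be emitted as custom metrics with Powertools for AWS Lambda (TypeScript, v2); the namespace, service name, and metric names are assumptions:

```typescript
import { Metrics, MetricUnit } from "@aws-lambda-powertools/metrics";

const metrics = new Metrics({ namespace: "PromptEngine", serviceName: "pointer-resolver" });

function recordCacheLookup(hit: boolean, latencyMs: number): void {
  metrics.addMetric(hit ? "CacheHit" : "CacheMiss", MetricUnit.Count, 1);
  metrics.addMetric("ResolutionLatency", MetricUnit.Milliseconds, latencyMs);
  metrics.publishStoredMetrics(); // flushed as EMF to CloudWatch Logs
}
```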
Why This Matters
Treating prompts as infrastructure unlocks:
- Auditability & security (KMS, IAM, Secrets Manager)
- Safe deployments & rollbacks (CAS + DLQ)
- Consistent runtime behavior (auto-resolution + budgeting)
- Seamless workflow integration (no hardcoded strings)
Amplify Gen 2 provides identity, APIs, storage, and functions. CDK layers in versioning, CAS, secure content, caching, observability, and tenant isolation. The result is a production-grade prompt engine—resilient, secure, and deeply integrated with your workflow system.