The Problem with Production Prompts
When you move from AI experiments to production, prompts stop being strings and start being infrastructure. They carry cost, reliability, compliance, and UX implications. Scattering them across code or spreadsheets works for demos; it fails in production.
We built a Prompt Engine that treats prompts as first-class artifacts: versioned, secure, auditable, auto-resolved, and protected by operational safeguards. It runs on AWS Amplify Gen 2, extended with CDK to add the pieces production needs.
Architecture at a Glance
Flow
BasePromptVersion (immutable, content-hashed)
→ ActivePromptPointer (selects active version; tracks previousVersionId)
→ PointerResolver (precedence + LRU cache)
→ VariableInterpolator (dot-notation ${...} only)
→ MessageFormatter (system/user messages for the model client)
Key Properties
- BasePromptVersion: Immutable content; SHA-256 integrity; stored inline (DynamoDB) or in S3 with KMS based on size
- ActivePromptPointer: Mutable selector with rollback (previousVersionId)
- PointerResolver: Strict precedence + LRU cache (200 entries, 5-minute TTL); selective invalidation by (tenantId, workflowId, modelId)
- VariableInterpolator: Safe substitution; dot-notation only
- MessageFormatter: Creates consistent system/user messages; integrates with the workflow runner's model client
Storage details: Large content goes to S3 using content-hash keys (prompt-content/&lt;sha256&gt;), KMS encryption, and lifecycle rules. Small content is stored inline in DynamoDB.
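As a sketch of how that size-based routing might look, assuming the AWS SDK v3 S3 client, a 100 KB inline threshold, and a PROMPT_CONTENT_BUCKET environment variable (all illustrative, not the engine's actual values):

```typescript
import { createHash } from "node:crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Illustrative values: DynamoDB items cap at 400 KB, so large bodies are
// routed to S3 under a content-hash key. Threshold and bucket are assumptions.
const INLINE_LIMIT_BYTES = 100 * 1024;
const CONTENT_BUCKET = process.env.PROMPT_CONTENT_BUCKET ?? "prompt-content-bucket";

const s3 = new S3Client({});

async function storePromptContent(
  content: string,
): Promise<{ inline?: string; s3Key?: string; sha256: string }> {
  const sha256 = createHash("sha256").update(content, "utf8").digest("hex");
  if (Buffer.byteLength(content, "utf8") <= INLINE_LIMIT_BYTES) {
    return { inline: content, sha256 }; // small: stored on the DynamoDB item
  }
  const s3Key = `prompt-content/${sha256}`; // content-hash key, as described above
  await s3.send(
    new PutObjectCommand({
      Bucket: CONTENT_BUCKET,
      Key: s3Key,
      Body: content,
      ServerSideEncryption: "aws:kms", // KMS encryption per the design
    }),
  );
  return { s3Key, sha256 };
}
```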
Prompt Resolution (Never Fails)
Precedence (high → low)
- Tenant + Workflow + Model
- Workflow + Model
- Tenant + Model
- Model (global)
- Neutral fallback (guaranteed)
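To make the precedence concrete, here is a hypothetical lookup sketch; the key format, lookupPointer helper, and DEFAULT_MODEL_ID value are illustrative stand-ins for the engine's internals:

```typescript
interface PromptPointer { activeVersionId: string; previousVersionId?: string; version: number; }

const DEFAULT_MODEL_ID = "neutral-default"; // illustrative constant
declare function lookupPointer(key: string): Promise<PromptPointer | undefined>; // e.g., a DynamoDB read
declare function normalizeModelId(modelId: string): string; // e.g., strips the "us." Bedrock prefix

// Candidate keys, most specific first; undefined scopes are skipped.
function candidateKeys(modelId: string, workflowId?: string, tenantId?: string): string[] {
  const keys: string[] = [];
  if (tenantId && workflowId) keys.push(`${tenantId}#${workflowId}#${modelId}`);
  if (workflowId) keys.push(`${workflowId}#${modelId}`);
  if (tenantId) keys.push(`${tenantId}#${modelId}`);
  keys.push(modelId); // global pointer for the model
  return keys;
}

async function resolveActivePrompt(workflowId: string | undefined, modelId: string, tenantId?: string) {
  for (const key of candidateKeys(normalizeModelId(modelId), workflowId, tenantId)) {
    const pointer = await lookupPointer(key);
    if (pointer) return pointer;
  }
  // Guaranteed neutral fallback: resolution never leaves a workflow without a prompt.
  return lookupPointer(DEFAULT_MODEL_ID);
}
```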
Usage
Model ID is normalized (e.g., US Bedrock prefix stripped):
const resolved = await pointerResolver.resolveActivePrompt(
"customer-support", // workflowId (optional)
"us.anthropic.claude-3-7-sonnet-20250219-v1:0", // modelId (required)
"enterprise-client-123" // tenantId (optional)
);
If nothing matches, the engine returns the neutral base prompt associated with DEFAULT_MODEL_ID. Resolution never leaves a workflow without a prompt.
Caching
- 200-entry LRU, 5-minute TTL
- Selective invalidation by (tenantId, workflowId, modelId)
- Observability: size, age, and hit/miss are logged via Powertools
- No EventBridge-driven invalidation in the current stack; TTL and size-based eviction, plus logs and metrics, serve as the consistency mechanism
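The cache itself can be as simple as an insertion-ordered Map. This sketch of a 200-entry, 5-minute-TTL LRU with prefix-based invalidation is an assumption about the implementation, not a copy of it:

```typescript
type CacheKey = string; // e.g., `${tenantId ?? "*"}#${workflowId ?? "*"}#${modelId}` (assumed shape)

class PointerCache<V> {
  private entries = new Map<CacheKey, { value: V; expiresAt: number }>();
  constructor(private maxSize = 200, private ttlMs = 5 * 60 * 1000) {}

  get(key: CacheKey): V | undefined {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt < Date.now()) {
      this.entries.delete(key);
      return undefined; // miss, or expired by TTL
    }
    // Refresh recency: a Map preserves insertion order, so re-insert on access.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: CacheKey, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the least-recently-used entry (first key in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  // Selective invalidation by (tenantId, workflowId, modelId) scope prefix.
  invalidate(prefix: string): void {
    for (const key of this.entries.keys()) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```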
Safe Deployment & Rollback (CAS)
A dedicated CAS Update Lambda manages pointer updates:
- Verifies target BasePromptVersion exists
- Uses optimistic concurrency (expectedPointerVersion)
- Preserves previousVersionId for instant rollback
- On failure, emits structured errors and pushes to a CAS failure DLQ
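A minimal sketch of the conditional write at the heart of that Lambda, assuming a DynamoDB pointer table with a numeric version attribute; the table name, attribute names, and error shape are illustrative:

```typescript
import {
  DynamoDBClient,
  UpdateItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});
const POINTER_TABLE = process.env.POINTER_TABLE ?? "PromptPointers"; // assumed name

async function casUpdatePointer(opts: {
  pointerId: string;
  newActiveVersionId: string;
  expectedPointerVersion: number;
  updatedBy?: string;
  reason?: string;
}): Promise<void> {
  // (Existence check of the target BasePromptVersion omitted for brevity.)
  try {
    await ddb.send(new UpdateItemCommand({
      TableName: POINTER_TABLE,
      Key: { pointerId: { S: opts.pointerId } },
      // Copy the outgoing active version into previousVersionId (instant rollback),
      // bump the pointer version, and record who/why for the audit trail.
      UpdateExpression:
        "SET previousVersionId = activeVersionId, activeVersionId = :next, " +
        "#v = #v + :one, updatedBy = :by, #r = :why",
      // CAS guard: fail if anyone updated the pointer since we read it.
      ConditionExpression: "#v = :expected",
      ExpressionAttributeNames: { "#v": "version", "#r": "reason" },
      ExpressionAttributeValues: {
        ":next": { S: opts.newActiveVersionId },
        ":one": { N: "1" },
        ":expected": { N: String(opts.expectedPointerVersion) },
        ":by": { S: opts.updatedBy ?? "system" },
        ":why": { S: opts.reason ?? "deployment" },
      },
    }));
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) {
      // Version drifted: surface a structured conflict; the caller routes it to the DLQ.
      throw new Error(`CAS conflict on pointer ${opts.pointerId}`);
    }
    throw err;
  }
}
```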
Deploy
await updatePromptPointer({
pointerId: "support-pointer",
newActiveVersionId: "prompt-v3",
expectedPointerVersion: currentPointer.version,
updatedBy: "engineer@company.com",
reason: "deployment",
});
Rollback
await updatePromptPointer({
pointerId: "support-pointer",
newActiveVersionId: currentPointer.previousVersionId,
expectedPointerVersion: currentPointer.version,
reason: "rollback",
});
All changes are audited (who/when/why).
Template Variables (Dot-Notation Only)
Templates use ${...} with dot-notation (no arrays, loops, or conditionals). Missing variables render as empty strings and log a warning.
Common inputs:
- Runtime: ${inputs.user_prompt}, ${workflowId}, ${conversationId}
- Prior node output: ${nodeId.output.text}
- Slots: ${slotTrackerNode.userName}
- Session: ${context.sessionId}
Keep templates simple; put logic in nodes, not prompts.
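A minimal interpolator consistent with these rules might look like the following; the exact regex and warning mechanism are assumptions:

```typescript
// Matches ${a.b.c}: identifiers separated by dots, nothing else (no arrays or logic).
const VARIABLE_PATTERN = /\$\{([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\}/g;

function interpolate(template: string, context: Record<string, unknown>): string {
  return template.replace(VARIABLE_PATTERN, (_match, path: string) => {
    // Walk the dot-notation path through the context object.
    const value = path.split(".").reduce<unknown>(
      (obj, key) =>
        obj != null && typeof obj === "object" ? (obj as Record<string, unknown>)[key] : undefined,
      context,
    );
    if (value === undefined || value === null) {
      console.warn(`Missing template variable: ${path}`); // renders as empty string
      return "";
    }
    return String(value);
  });
}

// Example: returns "Hello Ada"
interpolate("Hello ${slotTrackerNode.userName}", { slotTrackerNode: { userName: "Ada" } });
```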
Token Budgeting & Truncation
Before model calls, the engine enforces budgeting and fits content to the model's context window:
const contextWindow = getModelContextWindow(modelId);
const bufferPercent = 0.10; // safety headroom
const maxAllowed = contextWindow * (1 - bufferPercent);
const truncated = truncateToFitContextWindow(messages, maxAllowed);
Strategy
- Preserve system + current user messages
- Drop oldest memory first
- Truncate system as a last resort
Token estimation uses provider-specific tokenizers when available (e.g., OpenAI tiktoken); otherwise, it falls back to registry estimation settings.
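As a sketch of the drop-oldest-first strategy, assuming a simple message shape and a crude characters-per-token fallback estimator (the real engine prefers provider tokenizers, per the note above):

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Crude fallback estimator (~4 characters per token) when no tokenizer is available.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

function truncateToFitContextWindow(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const current = rest.at(-1); // the current user turn is always preserved
  let memory = rest.slice(0, -1);

  const total = (ms: Message[]) => ms.reduce((n, m) => n + estimateTokens(m), 0);

  // Drop the oldest memory messages first until the budget fits.
  while (memory.length > 0 && total([...system, ...memory, ...(current ? [current] : [])]) > maxTokens) {
    memory = memory.slice(1);
  }
  // Truncating the system message itself would be the last resort (omitted here).
  return [...system, ...memory, ...(current ? [current] : [])];
}
```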
Storage, Security & Audit
- Inline DynamoDB for small content; S3 + KMS for large
- PromptArchiveBucket: Archival with lifecycle (1-year retention; cleanup of incomplete uploads)
- AuditLog: Every create/update/rollback recorded
- Secrets Manager: Provider keys (OpenAI, Pinecone) are read securely (never hardcoded)
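The Secrets Manager point, sketched with the AWS SDK v3 client; the secret name and per-container caching are assumptions:

```typescript
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const secrets = new SecretsManagerClient({});
let cachedOpenAiKey: string | undefined;

// Read once per container; never hardcode or log the key itself.
async function getOpenAiKey(): Promise<string> {
  if (!cachedOpenAiKey) {
    const res = await secrets.send(
      new GetSecretValueCommand({ SecretId: "prompt-engine/openai-api-key" }), // assumed name
    );
    if (!res.SecretString) throw new Error("Secret has no string value");
    cachedOpenAiKey = res.SecretString;
  }
  return cachedOpenAiKey;
}
```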
Integration with Workflows
ModelInvoke nodes resolve prompts automatically—no promptId in node configs:
{
"id": "ai_step",
"type": "ModelInvoke",
"config": {
"modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0",
"temperature": 0.7,
"maxTokens": 1000
}
}
Execution Path
- Resolve model (registry; Bedrock prefix normalized)
- Build context (tenant/workflow/conversation/state)
- Resolve prompt per precedence (fallback guaranteed)
- Interpolate ${...} variables
- Format messages for the model client
- Enforce TokenBudget
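Wired together, the path might read as follows; every type and helper here is illustrative glue, not the engine's actual API:

```typescript
interface ModelInvokeConfig { modelId: string; temperature?: number; maxTokens?: number; }
interface WorkflowContext {
  workflowId?: string;
  tenantId?: string;
  modelClient: { invoke(modelId: string, messages: unknown[], opts: object): Promise<unknown> };
}
declare function normalizeModelId(id: string): string;
declare function buildVariableContext(ctx: WorkflowContext): Record<string, unknown>;
declare function resolveActivePrompt(
  workflowId: string | undefined, modelId: string, tenantId?: string,
): Promise<{ template: string }>;
declare function interpolate(template: string, vars: Record<string, unknown>): string;
declare function formatMessages(rendered: string): unknown[];
declare function truncateToFitContextWindow(messages: unknown[], maxTokens: number): unknown[];
declare function maxAllowedTokens(modelId: string): number;

async function runModelInvokeNode(node: ModelInvokeConfig, ctx: WorkflowContext) {
  const modelId = normalizeModelId(node.modelId);            // registry + Bedrock prefix
  const variables = buildVariableContext(ctx);               // tenant/workflow/conversation/state
  const prompt = await resolveActivePrompt(ctx.workflowId, modelId, ctx.tenantId); // fallback guaranteed
  const rendered = interpolate(prompt.template, variables);  // ${...} substitution
  const messages = formatMessages(rendered);                 // system/user messages
  const budgeted = truncateToFitContextWindow(messages, maxAllowedTokens(modelId)); // token budget
  return ctx.modelClient.invoke(modelId, budgeted, {
    temperature: node.temperature,
    maxTokens: node.maxTokens,
  });
}
```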
Operations, Observability & Reliability
- Powertools (Logger, Metrics, Tracer) across Lambdas
- DLQs for CAS and document/Textract pipelines
- Circuit breaker integration (separate component) protects model calls; breaker state persists in DynamoDB
- Neutral fallback ensures a response even if resolution fails
Monitor
- Resolution latency and cache hit/miss
- CAS attempts/conflicts/DLQ growth
- Interpolation warnings
- Version usage analytics
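For example, cache hit/miss and resolution latency can be emitted as custom metrics with Powertools for AWS Lambda (TypeScript, v2); the namespace, service name, and metric names are assumptions:

```typescript
import { Metrics, MetricUnit } from "@aws-lambda-powertools/metrics";

const metrics = new Metrics({ namespace: "PromptEngine", serviceName: "pointer-resolver" });

function recordCacheLookup(hit: boolean, latencyMs: number): void {
  metrics.addMetric(hit ? "CacheHit" : "CacheMiss", MetricUnit.Count, 1);
  metrics.addMetric("ResolutionLatency", MetricUnit.Milliseconds, latencyMs);
  metrics.publishStoredMetrics(); // flushed as EMF to CloudWatch Logs
}
```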
Why This Matters
Treating prompts as infrastructure unlocks:
- Auditability & security (KMS, IAM, Secrets Manager)
- Safe deployments & rollbacks (CAS + DLQ)
- Consistent runtime behavior (auto-resolution + budgeting)
- Seamless workflow integration (no hardcoded strings)
Amplify Gen 2 provides identity, APIs, storage, and functions. CDK layers in versioning, CAS, secure content, caching, observability, and tenant isolation. The result is a production-grade prompt engine—resilient, secure, and deeply integrated with your workflow system.