Why Prompt Infrastructure Matters
Most AI apps begin with prompts embedded as hardcoded strings. That doesn't scale:
- No version control or rollback
- Code and prompt logic intertwined, slowing safe deployments
- No compliance audit trail
- Ops bottlenecks — non-engineering teams (legal, marketing, product) can't iterate independently
Our platform treats prompts as first-class infrastructure, managed with the same rigor as code.
Hierarchical Prompt Resolution
Prompts resolve automatically via a strict hierarchy:
- Tenant + Workflow + Model (most specific)
- Workflow + Model
- Tenant + Model
- Model (global)
- Emergency fallback (guaranteed, never fails)
Example:
const resolvedPrompt = await pointerResolver.resolveActivePrompt(
  "customer-support",                             // workflowId
  "us.anthropic.claude-3-7-sonnet-20250219-v1:0", // modelId (normalized internally)
  "client-123"                                    // tenantId
);
- Normalization: Bedrock-style prefixes (e.g. us.) are stripped, so us.anthropic.claude... → anthropic.claude...
- Fallback: If no pointer resolves, the system returns the neutral base prompt tied to the DEFAULT_MODEL_ID
This ensures continuity even if pointers are missing or misconfigured.
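To make the lookup order concrete, here is a minimal sketch of a resolver walking that hierarchy. Prompt, lookupPointer, normalizeModelId, getBasePrompt, and DEFAULT_MODEL_ID are hypothetical stand-ins for the platform's internals; only the fallback order comes from the list above.

type Prompt = { content: string };
declare const DEFAULT_MODEL_ID: string;
declare function normalizeModelId(id: string): string;
declare function lookupPointer(key: Record<string, string>): Promise<{ activeVersion: Prompt } | null>;
declare function getBasePrompt(modelId: string): Promise<Prompt>;

async function resolveActivePrompt(workflowId: string, rawModelId: string, tenantId: string): Promise<Prompt> {
  const modelId = normalizeModelId(rawModelId); // strips Bedrock-style prefixes like "us."

  // Most specific to least specific, matching the hierarchy above.
  const candidates = [
    { tenantId, workflowId, modelId }, // Tenant + Workflow + Model
    { workflowId, modelId },           // Workflow + Model
    { tenantId, modelId },             // Tenant + Model
    { modelId },                       // Model (global)
  ];

  for (const key of candidates) {
    const pointer = await lookupPointer(key);
    if (pointer) return pointer.activeVersion;
  }

  // Emergency fallback: the neutral base prompt tied to DEFAULT_MODEL_ID.
  return getBasePrompt(DEFAULT_MODEL_ID);
}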
Versioning and Rollbacks
Prompts follow Git-like semantics:
- BasePromptVersion: immutable, content-hashed (SHA-256)
- ActivePromptPointer: mutable selector for the active version
- CAS Updates: atomic, optimistic concurrency with rollback support
Rollback is one call:
await updatePromptPointer({
  pointerId: "production-pointer",
  newActiveVersionId: currentPointer.previousVersionId, // point back to the prior version
  expectedPointerVersion: currentPointer.version,       // CAS guard: fails if the pointer moved
  reason: "rollback-due-to-errors"                      // recorded in the audit trail
});
- CAS conflicts → routed to CASFailureDLQ
- The prompt-cas-update Lambda enforces referential integrity and audit logging
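Under the hood, a CAS pointer update maps naturally onto a DynamoDB conditional write. The sketch below is an assumption about what that write could look like (table, key, and attribute names are illustrative, not the actual prompt-cas-update implementation):

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function casUpdatePointer(pointerId: string, newActiveVersionId: string, expectedPointerVersion: number): Promise<void> {
  try {
    await ddb.send(new UpdateCommand({
      TableName: "ActivePromptPointer", // illustrative table name
      Key: { pointerId },
      // Right-hand-side attribute references read the pre-update item, so
      // previousVersionId captures the outgoing version for later rollback.
      UpdateExpression:
        "SET previousVersionId = activeVersionId, activeVersionId = :next, #v = #v + :one",
      ConditionExpression: "#v = :expected", // optimistic-concurrency guard
      ExpressionAttributeNames: { "#v": "version" },
      ExpressionAttributeValues: { ":next": newActiveVersionId, ":expected": expectedPointerVersion, ":one": 1 },
    }));
  } catch (err) {
    if ((err as Error).name === "ConditionalCheckFailedException") {
      // A concurrent writer won the race; such failures are routed to CASFailureDLQ.
      console.warn(`CAS conflict on ${pointerId}; candidate for CASFailureDLQ`);
    }
    throw err;
  }
}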
Template Variables and Context Injection
Prompts adapt dynamically using ${...} placeholders:
- Runtime state: ${inputs.user_prompt} → caller input; ${workflowId}, ${conversationId}; ${nodeId.output.text} → prior node output
- Slot tracking: ${slotTrackerNode.userName}
- Session context: ${context.sessionId}
Rules:
- Dot-notation only
- No loops, arrays, or conditionals
- Missing values resolve as empty strings, with warnings logged
This keeps prompts safe and predictable.
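A resolver that obeys these rules fits in a few lines. This is a minimal sketch of the behavior described above (the function name and scope shape are illustrative, not the platform's actual implementation):

function renderTemplate(template: string, scope: Record<string, unknown>): string {
  return template.replace(/\$\{([\w.]+)\}/g, (_match, path: string) => {
    // Dot-notation lookup only; no loops, arrays, or conditionals.
    const value = path.split(".").reduce<unknown>(
      (acc, key) => (acc !== null && typeof acc === "object" ? (acc as Record<string, unknown>)[key] : undefined),
      scope,
    );
    if (value == null) {
      console.warn(`Template variable missing: ${path}`); // resolves as empty string
      return "";
    }
    return String(value);
  });
}

For example, renderTemplate("Hi ${slotTrackerNode.userName}", { slotTrackerNode: { userName: "Ada" } }) returns "Hi Ada", while an unknown path logs a warning and renders as an empty string.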
Model-Aware Optimization
The Model Registry (amplify/functions/workflow-runner/src/modelCapabilities.ts) defines per-model constraints:
- Identity: id, displayName, provider
- Capabilities: e.g., supportsJSONMode, supportsFunctionCalling, systemPromptRequired
- Limits: contextWindow, reservedOutputTokens
- Pricing: inputCostPerUnit, outputCostPerUnit, unit
- Tokenizer: exact (e.g., OpenAI tiktoken) or estimated fallback
This allows model-specific prompt variants without changing workflow logic.
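A registry entry might look like the interface below. This shape is inferred from the field names listed above; the actual type in modelCapabilities.ts may differ:

interface ModelCapabilities {
  // Identity
  id: string;
  displayName: string;
  provider: string; // e.g., "anthropic", "openai"
  // Capabilities
  supportsJSONMode: boolean;
  supportsFunctionCalling: boolean;
  systemPromptRequired: boolean;
  // Limits
  contextWindow: number;        // total tokens the model accepts
  reservedOutputTokens: number; // held back for the response
  // Pricing
  inputCostPerUnit: number;
  outputCostPerUnit: number;
  unit: string; // e.g., "1K tokens"
  // Tokenizer: exact where available, estimated otherwise
  tokenizer: { kind: "exact" | "estimated"; encoding?: string };
}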
Token Budgeting and Truncation
Before invocation, prompts/messages pass through the TokenBudget system:
const contextWindow = getModelContextWindow(modelId);
const bufferPercent = 0.10;
const maxAllowed = contextWindow * (1 - bufferPercent);
const truncated = truncateToFitContextWindow(messages, maxAllowed);
Strategy:
- Preserve system + current user messages
- Truncate oldest conversation memory first
- Truncate system only as last resort
Backed by provider tokenizers (exact where available).
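A simplified version of that strategy could look like the following (the message shape and injected token counter are assumptions; the real truncateToFitContextWindow also covers the system-prompt last resort):

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

function truncateToFit(messages: ChatMessage[], maxTokens: number,
                       countTokens: (m: ChatMessage) => number): ChatMessage[] {
  const result = [...messages];
  const total = () => result.reduce((sum, m) => sum + countTokens(m), 0);
  while (total() > maxTokens) {
    // Drop the oldest message that is neither the system prompt nor the
    // current (final) user message: oldest conversation memory goes first.
    const idx = result.findIndex((m, i) => m.role !== "system" && i !== result.length - 1);
    if (idx === -1) break; // only system + current user remain; truncating system is a last resort
    result.splice(idx, 1);
  }
  return result;
}

With countTokens backed by an exact provider tokenizer where one exists and an estimate otherwise, the budget check stays consistent with the contextWindow math above.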
Secure Storage, Archiving, and Encryption
Storage tiers:
- Small prompts → inline DynamoDB
- Large prompts → S3 with KMS
Archiving:
- PromptArchiveBucket (CMK: PromptContentKey)
- Lifecycle rules: 1-year retention; incomplete uploads deleted after 1 day
- Env vars (PROMPT_ARCHIVE, PROMPT_ARCHIVE_MAX_LINES, etc.) enforce logging/redaction
- All ops audit-logged in AuditLog (DynamoDB)
Secrets: Provider keys (OpenAI, Pinecone) pulled from AWS Secrets Manager. No hardcoding.
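Fetching a provider key at runtime is a few lines with the AWS SDK. The secret ID below is a hypothetical example, not the platform's actual naming scheme:

import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const secrets = new SecretsManagerClient({});

async function getProviderKey(secretId: string): Promise<string> {
  // e.g., getProviderKey("prod/openai/api-key"); the name is illustrative only
  const res = await secrets.send(new GetSecretValueCommand({ SecretId: secretId }));
  if (!res.SecretString) throw new Error(`Secret ${secretId} has no string value`);
  return res.SecretString;
}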
Caching and Performance
PointerResolver cache (sketched after this list):
- 200 entries, 5-minute TTL
- Invalidation by (tenantId, workflowId, modelId)
- No EventBridge — handled via DLQ + structured logs
- Cache metrics (hit/miss, age, latency) emitted with AWS Powertools
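The cache needs nothing exotic; a Map with a TTL and size-capped eviction covers the behavior above. A minimal sketch (class name and eviction policy are assumptions, not the production resolver):

class PointerCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxEntries = 200, private ttlMs = 5 * 60 * 1000) {}

  private key(tenantId: string, workflowId: string, modelId: string): string {
    return `${tenantId}#${workflowId}#${modelId}`;
  }

  get(tenantId: string, workflowId: string, modelId: string): V | undefined {
    const entry = this.store.get(this.key(tenantId, workflowId, modelId));
    if (!entry || entry.expiresAt < Date.now()) return undefined; // miss or expired
    return entry.value;
  }

  set(tenantId: string, workflowId: string, modelId: string, value: V): void {
    if (this.store.size >= this.maxEntries) {
      // Evict the oldest insertion; Map preserves insertion order.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(this.key(tenantId, workflowId, modelId), { value, expiresAt: Date.now() + this.ttlMs });
  }

  invalidate(tenantId: string, workflowId: string, modelId: string): void {
    this.store.delete(this.key(tenantId, workflowId, modelId));
  }
}

Invalidating on pointer updates keeps the 5-minute TTL from serving a stale version immediately after a rollback.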
Operational Safety & Monitoring
- DLQs: CAS failures, Textract jobs, Document Processing
- Structured logging & tracing: AWS Powertools (Logger, Metrics, Tracer)
- AuditLog: who changed what, when, why
- CircuitBreaker: ModelCircuitBreaker table protects against unhealthy models
- Compliance: immutable audit trail supports legal/regulatory review
Deployment Patterns
- Immutable version creation (hash verified)
- Canary rollout via tenant-scoped overrides (see the sketch after this list)
- CAS-protected deployment with rollback
- Tenant isolation: failures affect only scoped tenant, not global
- Monitoring + rollback tied to CAS + DLQs
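Putting those pieces together, a canary rollout might look like the sequence below. createBasePromptVersion, canaryPointer, and the tenant-scoped pointer ID format are hypothetical; updatePromptPointer is the CAS call shown earlier:

// 1. Create the immutable, content-hashed version (hypothetical helper).
const version = await createBasePromptVersion({ content: newPromptText }); // SHA-256 verified

// 2. Canary: repoint only the pilot tenant's pointer (illustrative pointer ID).
await updatePromptPointer({
  pointerId: "client-123#customer-support#anthropic.claude-3-7-sonnet",
  newActiveVersionId: version.id,
  expectedPointerVersion: canaryPointer.version, // CAS guard
  reason: "canary-rollout",
});

// 3. Watch metrics, DLQs, and the AuditLog; a misbehaving canary is undone
//    with the rollback call from "Versioning and Rollbacks".
// 4. Promote by repointing the global pointer with the same CAS-protected call.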
Benefits Beyond Prototyping
This system elevates prompt management into production infrastructure:
- CI/CD: prompts tested against live models in staging
- A/B testing: tenant overrides without global impact
- Customer customization: tenant-specific variants, no code forks
- Model migration: swap providers by adding prompt variants
- Compliance auditing: immutable, queryable logs
Prompts are no longer brittle strings — they're versioned, encrypted, monitored, and resilient infrastructure components.