Production Prompt Management: Beyond Hardcoded Templates

prompt-engineering · infrastructure · versioning · compliance · ai-operations · token-management

Why Prompt Infrastructure Matters

Most AI apps begin with prompts embedded as hardcoded strings. That doesn't scale:

  • No version control or rollback
  • Code and prompt logic intertwined, slowing safe deployments
  • No compliance audit trail
  • Ops bottlenecks — non-engineering teams (legal, marketing, product) can't iterate independently

Our platform treats prompts as first-class infrastructure, managed with the same rigor as code.

Hierarchical Prompt Resolution

Prompts resolve automatically via a strict hierarchy:

  1. Tenant + Workflow + Model (most specific)
  2. Workflow + Model
  3. Tenant + Model
  4. Model (global)
  5. Emergency fallback (guaranteed, never fails)

Example:

const resolvedPrompt = await pointerResolver.resolveActivePrompt(
  "customer-support",                               // workflowId
  "us.anthropic.claude-3-7-sonnet-20250219-v1:0",   // modelId (normalized internally)
  "client-123"                                      // tenantId
);
  • Normalization: Bedrock-style regional prefixes (e.g. us.) are stripped, so us.anthropic.claude... → anthropic.claude...
  • Fallback: If no pointer resolves, the system returns the neutral base prompt tied to the DEFAULT_MODEL_ID

This ensures continuity even if pointers are missing or misconfigured.
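The resolution order above can be sketched as a simple "most specific key first" lookup. This is an illustrative model, not the platform's actual `pointerResolver` implementation; the key format and `PointerStore` type are assumptions.

```typescript
type PointerStore = Map<string, string>; // lookup key -> active version id (illustrative)

function resolveActivePrompt(
  store: PointerStore,
  workflowId: string,
  modelId: string,
  tenantId: string,
  fallbackVersionId: string
): string {
  // Strip a Bedrock-style regional prefix such as "us." before lookup.
  const normalized = modelId.replace(/^(us|eu|apac)\./, "");
  // Candidate keys, most specific first, mirroring the hierarchy above.
  const candidates = [
    `${tenantId}#${workflowId}#${normalized}`, // 1. tenant + workflow + model
    `${workflowId}#${normalized}`,             // 2. workflow + model
    `${tenantId}#${normalized}`,               // 3. tenant + model
    normalized,                                // 4. model (global)
  ];
  for (const key of candidates) {
    const hit = store.get(key);
    if (hit !== undefined) return hit;
  }
  return fallbackVersionId; // 5. emergency fallback, never fails
}
```

The first match wins, so a tenant-scoped override always shadows the global default without touching it.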

Versioning and Rollbacks

Prompts follow Git-like semantics:

  • BasePromptVersion: immutable, content-hashed (SHA-256)
  • ActivePromptPointer: mutable selector for the active version
  • CAS Updates: atomic, optimistic concurrency with rollback support

Rollback is one call:

await updatePromptPointer({
  pointerId: "production-pointer",
  newActiveVersionId: currentPointer.previousVersionId,
  expectedPointerVersion: currentPointer.version,
  reason: "rollback-due-to-errors"
});
  • CAS conflicts → routed to CASFailureDLQ
  • prompt-cas-update Lambda enforces referential integrity and audit logging
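The CAS semantics behind that call can be sketched in miniature. The in-memory pointer below is an assumption for illustration; the real system persists pointers in DynamoDB and routes conflicts to CASFailureDLQ.

```typescript
interface ActivePromptPointer {
  activeVersionId: string;
  previousVersionId: string | null;
  version: number; // optimistic-concurrency counter
}

function casUpdatePointer(
  pointer: ActivePromptPointer,
  newActiveVersionId: string,
  expectedPointerVersion: number
): { ok: boolean; pointer: ActivePromptPointer } {
  // Reject if someone else updated the pointer since we read it;
  // in production this conflict would land in the CAS failure DLQ.
  if (pointer.version !== expectedPointerVersion) {
    return { ok: false, pointer };
  }
  return {
    ok: true,
    pointer: {
      activeVersionId: newActiveVersionId,
      previousVersionId: pointer.activeVersionId, // enables one-call rollback
      version: pointer.version + 1,
    },
  };
}
```

Because each successful update records the prior version, rolling back is just another CAS update targeting `previousVersionId`.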

Template Variables and Context Injection

Prompts adapt dynamically using ${...} placeholders:

Runtime state

  • ${inputs.user_prompt} → caller input
  • ${workflowId}, ${conversationId}
  • ${nodeId.output.text} → prior node output

Slot tracking

  • ${slotTrackerNode.userName}

Session context

  • ${context.sessionId}

Rules:

  • Dot-notation only
  • No loops, arrays, or conditionals
  • Missing values resolve as empty strings, with warnings logged

This keeps prompts safe and predictable.
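The substitution rules above can be modeled as a single regex pass with dot-notation traversal. This is a minimal sketch, assuming a plain-object context; the platform's actual renderer and context shape may differ.

```typescript
function renderTemplate(
  template: string,
  context: Record<string, unknown>,
  warn: (path: string) => void = (p) => console.warn(`missing value: ${p}`)
): string {
  // Dot-notation paths only: no loops, arrays, or conditionals.
  return template.replace(/\$\{([\w.]+)\}/g, (_, path: string) => {
    let value: unknown = context;
    for (const segment of path.split(".")) {
      if (value !== null && typeof value === "object" && segment in (value as object)) {
        value = (value as Record<string, unknown>)[segment];
      } else {
        warn(path);
        return ""; // missing values resolve as empty strings
      }
    }
    return String(value);
  });
}
```

Restricting templates to pure lookups keeps rendering deterministic: the output depends only on the context object, never on control flow inside the prompt.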

Model-Aware Optimization

The Model Registry (amplify/functions/workflow-runner/src/modelCapabilities.ts) defines per-model constraints:

  • Identity: id, displayName, provider
  • Capabilities: e.g., supportsJSONMode, supportsFunctionCalling, systemPromptRequired
  • Limits: contextWindow, reservedOutputTokens
  • Pricing: inputCostPerUnit, outputCostPerUnit, unit
  • Tokenizer: exact (e.g., OpenAI tiktoken) or estimated fallback

This allows model-specific prompt variants without changing workflow logic.
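A registry entry might look like the following. The interface mirrors the fields listed above, but the concrete shape and sample values are illustrative assumptions, not the actual contents of modelCapabilities.ts.

```typescript
interface ModelCapability {
  id: string;
  displayName: string;
  provider: string;
  supportsJSONMode: boolean;
  supportsFunctionCalling: boolean;
  systemPromptRequired: boolean;
  contextWindow: number;        // total tokens the model accepts
  reservedOutputTokens: number; // held back for the response
  inputCostPerUnit: number;
  outputCostPerUnit: number;
  unit: number;                 // e.g. cost quoted per 1,000 tokens
  tokenizer: "exact" | "estimated";
}

// Sample entry with placeholder values.
const registry: Record<string, ModelCapability> = {
  "anthropic.claude-3-7-sonnet-20250219-v1:0": {
    id: "anthropic.claude-3-7-sonnet-20250219-v1:0",
    displayName: "Claude 3.7 Sonnet",
    provider: "anthropic",
    supportsJSONMode: true,
    supportsFunctionCalling: true,
    systemPromptRequired: false,
    contextWindow: 200_000,
    reservedOutputTokens: 4_096,
    inputCostPerUnit: 0.003,
    outputCostPerUnit: 0.015,
    unit: 1_000,
    tokenizer: "estimated",
  },
};

// Usable input budget: context window minus the reserved output tokens.
const inputBudget = (m: ModelCapability): number =>
  m.contextWindow - m.reservedOutputTokens;
```

Keeping limits and pricing in one typed record means workflow code can stay model-agnostic and simply query the registry at invocation time.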

Token Budgeting and Truncation

Before invocation, prompts/messages pass through the TokenBudget system:

const contextWindow   = getModelContextWindow(modelId);
const bufferPercent   = 0.10;
const maxAllowed      = contextWindow * (1 - bufferPercent);

const truncated = truncateToFitContextWindow(messages, maxAllowed);

Strategy:

  1. Preserve system + current user messages
  2. Truncate oldest conversation memory first
  3. Truncate system only as last resort

Backed by provider tokenizers (exact where available).
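The three-step strategy can be sketched as follows, using a naive length-based token estimate in place of a provider tokenizer. `Message` and `estimateTokens` are simplifying assumptions; the real system uses exact tokenizers where available.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Rough stand-in for a tokenizer: ~4 characters per token.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

function truncateToFitContextWindow(messages: Message[], maxTokens: number): Message[] {
  // Step 1: always preserve system messages and the current (last) message.
  const system = messages.filter((m) => m.role === "system");
  const current = messages[messages.length - 1];
  const memory = messages.filter((m) => m !== current && m.role !== "system");

  const total = (ms: Message[]) => ms.reduce((n, m) => n + estimateTokens(m), 0);
  const kept = [...memory];
  // Step 2: drop the oldest conversation memory first until the budget fits.
  while (kept.length > 0 && total([...system, ...kept, current]) > maxTokens) {
    kept.shift();
  }
  // Step 3 (truncating system itself) is a last resort, omitted in this sketch.
  return [...system, ...kept, current];
}
```

Dropping memory from the front preserves recency: the model loses the stalest turns first while the system instructions and live request stay intact.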

Secure Storage, Archiving, and Encryption

Storage tiers:

  • Small prompts → inline DynamoDB
  • Large prompts → S3 with KMS

Archiving:

  • PromptArchiveBucket (CMK: PromptContentKey)
  • Lifecycle rules: 1-year retention; incomplete uploads deleted after 1 day
  • Env vars (PROMPT_ARCHIVE, PROMPT_ARCHIVE_MAX_LINES, etc.) enforce logging/redaction
  • All ops audit-logged in AuditLog (DynamoDB)

Secrets: Provider keys (OpenAI, Pinecone) pulled from AWS Secrets Manager. No hardcoding.
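The inline-versus-S3 decision reduces to a size check. The 350 KB threshold below is an assumption chosen to stay comfortably under DynamoDB's 400 KB item limit; the platform's actual cutoff may differ.

```typescript
// Assumed threshold: leave headroom under DynamoDB's 400 KB item limit.
const INLINE_LIMIT_BYTES = 350 * 1024;

function chooseStorageTier(promptContent: string): "dynamodb-inline" | "s3-kms" {
  // Measure the UTF-8 byte length, not the character count.
  const sizeBytes = new TextEncoder().encode(promptContent).length;
  return sizeBytes <= INLINE_LIMIT_BYTES ? "dynamodb-inline" : "s3-kms";
}
```

Measuring bytes rather than characters matters because multi-byte UTF-8 content can exceed the item limit well before the character count suggests it would.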

Caching and Performance

PointerResolver cache:

  • 200 entries, 5-minute TTL
  • Invalidation by (tenantId, workflowId, modelId)
  • No EventBridge — handled via DLQ + structured logs
  • Cache metrics (hit/miss, age, latency) emitted with AWS Powertools
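A minimal version of that cache, keyed by (tenantId, workflowId, modelId), might look like this. The class is a sketch with the capacity and TTL from the text; the injectable clock exists only to make expiry testable.

```typescript
class PointerCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(
    private maxEntries = 200,           // capacity from the text
    private ttlMs = 5 * 60 * 1000,      // 5-minute TTL from the text
    private now: () => number = Date.now
  ) {}

  private key(tenantId: string, workflowId: string, modelId: string): string {
    return `${tenantId}#${workflowId}#${modelId}`;
  }

  get(tenantId: string, workflowId: string, modelId: string): V | undefined {
    const k = this.key(tenantId, workflowId, modelId);
    const entry = this.entries.get(k);
    if (!entry) return undefined;          // miss
    if (entry.expiresAt <= this.now()) {   // expired: treat as miss
      this.entries.delete(k);
      return undefined;
    }
    return entry.value;                    // hit
  }

  set(tenantId: string, workflowId: string, modelId: string, value: V): void {
    const k = this.key(tenantId, workflowId, modelId);
    if (this.entries.size >= this.maxEntries && !this.entries.has(k)) {
      // Evict the oldest insertion (Map preserves insertion order).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    this.entries.set(k, { value, expiresAt: this.now() + this.ttlMs });
  }

  invalidate(tenantId: string, workflowId: string, modelId: string): void {
    this.entries.delete(this.key(tenantId, workflowId, modelId));
  }
}
```

Scoping the key to the full (tenant, workflow, model) triple means a pointer update invalidates exactly one entry, never a whole tenant's cache.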

Operational Safety & Monitoring

  • DLQs: CAS failures, Textract jobs, Document Processing
  • Structured logging & tracing: AWS Powertools (Logger, Metrics, Tracer)
  • AuditLog: who changed what, when, why
  • CircuitBreaker: ModelCircuitBreaker table protects against unhealthy models
  • Compliance: Immutable audit trail supports legal/reg review
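The circuit-breaker behavior can be sketched as a small state machine. The threshold, cooldown, and half-open behavior below are assumptions for illustration, not the ModelCircuitBreaker table's actual schema.

```typescript
class ModelCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  constructor(
    private failureThreshold = 5,   // assumed default
    private cooldownMs = 60_000,    // assumed default
    private now: () => number = Date.now
  ) {}

  allowRequest(): boolean {
    if (this.openedAt === null) return true; // closed: traffic flows
    // Half-open after the cooldown: allow a trial request through.
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    // Any success closes the breaker and clears the failure count.
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}
```

Tripping the breaker per model means one unhealthy provider endpoint can be fenced off while other models keep serving traffic.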

Deployment Patterns

  • Immutable version creation (hash verified)
  • Canary rollout via tenant-scoped overrides
  • CAS-protected deployment with rollback
  • Tenant isolation: failures affect only scoped tenant, not global
  • Monitoring + rollback tied to CAS + DLQs

Benefits Beyond Prototyping

This system elevates prompt management into production infrastructure:

  • CI/CD: prompts tested against live models in staging
  • A/B testing: tenant overrides without global impact
  • Customer customization: tenant-specific variants, no code forks
  • Model migration: swap providers by adding prompt variants
  • Compliance auditing: immutable, queryable logs

Prompts are no longer brittle strings — they're versioned, encrypted, monitored, and resilient infrastructure components.
