Option A is the correct solution because it provides proactive, model-aware token management with fine-grained visibility and alerting, which regulated financial workloads require. Amazon Bedrock exposes token usage metrics only after an invocation completes; it does not natively enforce proactive, model-specific token limits across multiple applications or business units.
By implementing model-specific tokenizers in AWS Lambda, the company can estimate input and output token usage before sending requests to Amazon Bedrock. This enables early detection of requests that are approaching or exceeding model limits and allows the application to block, truncate, or reroute requests proactively rather than reacting to failures.
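The pre-invocation check described above can be sketched as a small Lambda-side gate. This is a minimal illustration, not production code: the 4-characters-per-token heuristic stands in for real model-specific tokenizers, and the model limits and threshold values are assumed for the example.

```python
# Sketch of pre-invocation token estimation for a Bedrock-bound request.
# Assumptions: the limits below and the chars-per-token heuristic are
# illustrative; a real deployment would load each model's own tokenizer.

MODEL_TOKEN_LIMITS = {
    "anthropic.claude-3-sonnet": 200_000,  # assumed limit for illustration
    "amazon.titan-text-express": 8_000,    # assumed limit for illustration
}

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English-like text."""
    return max(1, len(text) // 4)

def check_request(model_id: str, prompt: str, max_output_tokens: int,
                  alert_threshold: float = 0.8) -> dict:
    """Decide whether to allow, warn on, or block a request before
    it is ever sent to Amazon Bedrock."""
    limit = MODEL_TOKEN_LIMITS[model_id]
    estimated = estimate_tokens(prompt) + max_output_tokens
    if estimated > limit:
        action = "block"   # reject proactively instead of failing downstream
    elif estimated > alert_threshold * limit:
        action = "alert"   # forward, but emit a warning metric
    else:
        action = "allow"
    return {"model_id": model_id, "estimated_tokens": estimated,
            "limit": limit, "action": action}
```

A caller would run `check_request` first and only invoke Bedrock when the action is `allow` or `alert`; blocked requests can be truncated or rerouted instead.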
Publishing token usage metrics to Amazon CloudWatch enables real-time monitoring and alerting at scale, easily supporting more than 5,000 requests per minute. Storing detailed token usage records in Amazon DynamoDB allows the company to attribute usage and costs to specific applications, teams, or business units, which is essential for regulatory reporting and internal chargeback.
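The two records involved can be sketched as plain payload builders. The CloudWatch namespace, dimension names, and the DynamoDB key schema below are hypothetical examples, not a prescribed design; in the Lambda function, the first dict would be passed to `put_metric_data` and the second to a DynamoDB `put_item` call.

```python
# Sketch of the per-request telemetry payloads. Namespace, dimension
# names, and table key design are assumptions made for this example.
from datetime import datetime, timezone

def build_usage_metric(app: str, model_id: str, total_tokens: int) -> dict:
    # One MetricData entry for CloudWatch PutMetricData; the dimensions
    # enable per-application, per-model alarms.
    return {
        "MetricName": "TokensConsumed",
        "Dimensions": [
            {"Name": "Application", "Value": app},
            {"Name": "ModelId", "Value": model_id},
        ],
        "Value": total_tokens,
        "Unit": "Count",
    }

def build_usage_item(app: str, team: str, model_id: str,
                     input_tokens: int, output_tokens: int) -> dict:
    # DynamoDB item keyed by application and timestamp so usage can be
    # rolled up per team or business unit for chargeback reports.
    return {
        "pk": f"APP#{app}",
        "sk": f"USAGE#{datetime.now(timezone.utc).isoformat()}",
        "team": team,
        "model_id": model_id,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
```

Keying items by application and timestamp lets a reporting job query one application's usage over a date range without scanning the whole table.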
Option B is incorrect because Amazon Bedrock Guardrails do not currently provide token quota enforcement or proactive token alerts. Option C is reactive and only analyzes failures after they occur. Option D throttles requests but cannot enforce token-based limits or provide per-model cost attribution.
Therefore, Option A best satisfies proactive alerting, scalability, compliance reporting, and cost allocation requirements with acceptable operational effort.