According to the AgentForce Monitoring and Evaluation Framework, the three key dimensions for measuring AI agent quality are performance, correctness, and user satisfaction. To accurately monitor these, organizations should track:
Response times (to assess system and model latency),
Accuracy and relevance of answers (to measure the grounding and reasoning quality), and
Resolution success (to confirm task completion or problem-solving effectiveness).
These metrics provide a balanced evaluation of both technical efficiency and user experience.
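As a rough illustration, the three metrics above could be aggregated from logged interactions as in the sketch below. The `Interaction` record, its field names, and the `summarize` helper are hypothetical and are not part of any AgentForce API. The sketch assumes each interaction has already been labeled for relevance and resolution, for example by a reviewer or an automated evaluator.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interaction:
    """Hypothetical log record for one agent conversation turn."""
    latency_ms: float      # time from user message to agent reply
    answer_relevant: bool  # answer was judged accurate and relevant
    resolved: bool         # the user's task or problem was resolved


def summarize(interactions):
    """Aggregate response time, accuracy/relevance, and resolution success."""
    if not interactions:
        raise ValueError("no interactions to summarize")
    n = len(interactions)
    return {
        "avg_latency_ms": mean(i.latency_ms for i in interactions),
        "accuracy_rate": sum(i.answer_relevant for i in interactions) / n,
        "resolution_rate": sum(i.resolved for i in interactions) / n,
    }


# Made-up sample data, for illustration only.
sample = [
    Interaction(800.0, True, True),
    Interaction(1200.0, True, False),
    Interaction(1000.0, False, False),
    Interaction(1000.0, True, True),
]
report = summarize(sample)
print(report)
```

Reporting the three values side by side makes it harder for a fast but wrong agent, or an accurate agent that never closes the task, to look healthy on a single number.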
Option A focuses on system usage metrics such as tokens and duration, which are operational but do not assess correctness or success. Option B includes tone and CSAT scores, which are helpful but incomplete because they do not measure factual accuracy or task resolution.
Thus, the correct answer is Option C (response times, accuracy and relevance of answers, and resolution success), which aligns with AgentForce’s standard evaluation practices.
[Reference: AgentForce Monitoring Guide, “Measuring Agent Performance and Quality Metrics.”]