The ISTQB CT-AI syllabus explains in Section 3.2 (Functional Performance Criteria of ML Models) that accuracy becomes unreliable when class imbalance exists. In fraud detection, more than 99% of transactions are non-fraudulent, meaning the dataset is extremely imbalanced. Because accuracy counts all correct non-fraudulent classifications, it will appear artificially high, even if the fraud detection performance is poor. Therefore, accuracy is not suitable for evaluating fraud detection systems.
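To make the imbalance problem concrete, here is a minimal sketch with made-up numbers (10,000 transactions, 50 of them fraudulent; these counts are assumptions for illustration, not figures from the syllabus or the question). It scores a trivial model that labels every transaction as non-fraudulent:

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion-matrix counts: the model never flags anything,
# so all 50 fraud cases are false negatives and all 9,950 legitimate
# transactions are true negatives.
tp, fp, fn, tn = 0, 0, 50, 9950

baseline_accuracy = accuracy(tp, fp, fn, tn)
baseline_recall = recall(tp, fn)

print(f"accuracy = {baseline_accuracy:.3f}")  # accuracy = 0.995
print(f"recall   = {baseline_recall:.3f}")    # recall   = 0.000
```

Under these assumed counts the model scores 99.5% accuracy while detecting no fraud at all, which is the effect the explanation describes.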
The syllabus further explains that sensitivity (recall) captures the proportion of correctly identified fraudulent cases. This metric is important, as missing fraudulent events can cause high financial loss. However, the client also stresses that legitimate transactions must be correctly identified, meaning false positives must be minimized to maintain customer satisfaction.
The F1 score, defined as the harmonic mean of precision and recall, balances both:
Precision protects legitimate customers: the higher it is, the smaller the share of flagged transactions that are false alarms.
Recall ensures fraudulent transactions are detected.
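As a rough illustration of how the harmonic mean penalizes a model that is strong on only one of the two metrics, the sketch below compares two hypothetical models on the same 10,000 transactions (50 fraudulent). The model names and all counts are invented for this example:

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged transactions that are actually fraudulent."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical model A: balanced (precision 0.8, recall 0.8).
f1_a = f1_score(tp=40, fp=10, fn=10)

# Hypothetical model B: catches more fraud (recall 0.9) but raises
# many false alarms (precision 0.2).
f1_b = f1_score(tp=45, fp=180, fn=5)

print(f"F1 model A = {f1_a:.3f}")  # F1 model A = 0.800
print(f"F1 model B = {f1_b:.3f}")  # F1 model B = 0.327
```

With these assumed counts, model B still reaches about 98% accuracy, yet its F1 drops sharply because most of its alerts hit legitimate customers.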
Section 3.2 emphasizes that when both false positives and false negatives have significant consequences, and the data is highly imbalanced, F1 is the most appropriate metric because it reflects the combined importance of detecting fraud while avoiding unnecessary alerts. Thus, Option C is the correct choice.