Chapter 2 of 10

AI Quality Characteristics, Risk, and Acceptance Criteria

Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria.

45 min guide · 5 reference questions folded into the guide material
Guided briefing

AI Quality Characteristics, Risk, and Acceptance Criteria video briefing

A focused explanation of chapter 2, turning the AI testing theory into concrete validation checks.

Briefing focus

Module opening

This is a structured lesson briefing. Real video/audio can be added later as a media source.

Estimated time

9 min

  1. Module opening
  2. Learning objectives
  3. Mind map
  4. Scenario evidence breakdown

Transcript brief

Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria. The briefing explains why the topic matters, walks through a failure scenario, and identifies the artifacts a tester should produce for evidence and auditability.

Key takeaways

  • Connect the AI risk to a measurable test or monitor.
  • Document the evidence needed for reproducibility and audit.
  • Use the lab or scenario to practise the validation workflow.

Module opening

Teach learners to turn AI risk, trustworthiness, and quality characteristics into measurable release criteria.

Audience. QA professionals who need to challenge AI acceptance criteria and make quality measurable.

Why this matters. AI quality is multi-dimensional. A model can be accurate yet unfair, fast yet unsafe, useful yet impossible to explain, or impressive in demos but fragile in production.

ISTQB CT-AI mapping. CT-AI 2.1-2.8, 8.1-8.4

Trainer note

Start with the scenario before the theory. Ask learners what evidence would make them confident, then use the module to build that evidence step by step.

Learning path

  1. Start Here

    5 min

    Outcome, CT-AI exam relevance, and the hiring-screener scenario.

  2. Learn

    20 min

    Quality characteristics, risk, acceptance criteria, stakeholder evidence, and release gates.

  3. See It

    8 min

    Scenario evidence breakdown for subgroup false rejection risk.

  4. Try It

    14 min

    Build a risk-to-acceptance matrix for a realistic AI release decision.

  5. Recall and Apply

    10 min

    Exam traps, active recall, and the portfolio artifact.

Learning objectives

  • Explain the core quality risks behind AI quality characteristics, risk, and acceptance criteria.
  • Select practical test evidence that supports an AI release decision.
  • Apply the module concepts to a realistic QA scenario.
  • Produce a portfolio artifact that can be reused in a professional AI testing context.

Mind map

AI Quality Characteristics, Risk, and Acceptance Criteria mind map

Real-life scenario · Recruitment technology

The hiring screener with a great average score

Situation. A ranking model prioritises CVs for recruiter review. Overall accuracy improved, but false rejection was higher for career changers and candidates with employment gaps.

Lesson. AI testing is strongest when risks, examples, evidence, and release decisions are connected.

Scenario evidence breakdown

  • Product/System: Candidate screening platform
  • AI feature: A ranking model prioritises CVs for recruiter review.
  • Failure or risk: Overall accuracy improved, but false rejection was higher for career changers and candidates with employment gaps.
  • Testing challenge: No one had defined acceptable subgroup performance, appeal process, explanation need, or audit evidence.
  • Tester response: The tester built a risk register linking harm, affected users, quality characteristic, metric, threshold, mitigation, and owner.
  • Evidence required: Risk register, acceptance criteria matrix, subgroup metric report, review workflow, and release sign-off record.
  • Business decision: Block release until high-impact false rejection criteria and human review controls are agreed.

Visual flow

AI Quality Characteristics, Risk, and Acceptance Criteria scenario flow

Topic-by-topic teaching guide

1. AI Quality Characteristics

AI quality includes functional performance, robustness, fairness, transparency, safety, security, privacy, usability, and maintainability.

  • Real QA example: A loan model needs accuracy, but also fair treatment, explanation, monitoring, and secure access to decision data.
  • What can go wrong: Optimising only one characteristic and accidentally weakening another.
  • How a tester should think: Discuss quality as trade-offs tied to product risk.
  • Evidence to collect: Quality characteristic matrix and stakeholder priorities.

Quality characteristics as trade-offs

AI quality is not one number. A system can improve aggregate performance while weakening fairness, explainability, robustness, privacy, or user control.

Example

A loan model has higher overall accuracy after retraining, but a protected customer segment now receives fewer positive decisions and weaker explanations.

Mistake

Treating accuracy as the only acceptance signal.

Evidence

Quality characteristic matrix, stakeholder priorities, slice metrics, explanation checks, and agreed trade-offs.
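The slice-metric idea above can be made concrete with a short sketch. This is a minimal illustration, not the module's tooling: the segment names, record counts, and numbers are invented to show how a healthy aggregate score can coexist with a weak slice.

```python
# Sketch: per-slice accuracy from labelled (segment, correct) records.
# Segments and counts are illustrative assumptions, not real project data.

def slice_metrics(records):
    """Return accuracy per segment from (segment, correct) pairs."""
    totals, hits = {}, {}
    for segment, correct in records:
        totals[segment] = totals.get(segment, 0) + 1
        hits[segment] = hits.get(segment, 0) + (1 if correct else 0)
    return {seg: hits[seg] / totals[seg] for seg in totals}

records = (
    [("standard", True)] * 90
    + [("standard", False)] * 10
    + [("career_changer", True)] * 6
    + [("career_changer", False)] * 4
)

overall = sum(c for _, c in records) / len(records)
print(round(overall, 2))      # 0.87 overall looks healthy...
print(slice_metrics(records)) # ...but the career_changer slice is only 0.6
```

The point of the sketch is the gap between the two printed numbers: an acceptance decision based only on `overall` would miss the regression that the slice report exposes.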

2. Risk-Based Testing

Risk combines likelihood, impact, detectability, and the organisation's tolerance for harm.

  • Real QA example: A wrong music recommendation is low impact; a wrong medical triage decision can be severe.
  • What can go wrong: Spending equal effort on low-risk and high-risk behaviours.
  • How a tester should think: Rank tests by business decision and possible harm.
  • Evidence to collect: AI risk register with severity, controls, and evidence.

Risk-based release criteria

Risk-based AI testing starts from possible harm, then decides which evidence is required for launch, shadow mode, limited rollout, rollback, or rejection.

Example

The hiring screener's average score improved, but false rejection increased for career changers and candidates with employment gaps.

Mistake

Giving every behaviour equal test effort or approving a high-impact release from an average metric alone.

Evidence

AI risk register with affected groups, severity, likelihood, detectability, mitigation, owner, and release action.
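A risk register entry with the fields listed above can be sketched as a simple record type. The field values below are illustrative assumptions drawn from the hiring-screener scenario, not agreed project numbers.

```python
# Sketch: one AI risk register entry with the fields the module lists.
# All values are illustrative, not real project decisions.
from dataclasses import dataclass

@dataclass
class RiskEntry:
    harm: str
    affected_groups: list
    quality_characteristic: str
    severity: str       # e.g. "high", "medium", "low"
    likelihood: str
    detectability: str
    mitigation: str
    owner: str
    release_action: str

entry = RiskEntry(
    harm="Qualified candidates filtered out before human review",
    affected_groups=["career changers", "candidates with employment gaps"],
    quality_characteristic="fairness",
    severity="high",
    likelihood="medium",
    detectability="low",
    mitigation="Subgroup false-rejection monitor plus recruiter override",
    owner="QA lead",
    release_action="block until subgroup criteria pass",
)
```

Keeping entries in a structured form like this makes the register filterable: a release reviewer can pull every `severity == "high"` entry and check that each one names an owner and a release action.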

3. Acceptance Criteria

Good AI acceptance criteria are measurable and reviewable. They name metric, slice, data source, threshold, and release action.

  • Real QA example: Urgent-ticket recall must be at least 95% on the latest labelled validation set and no lower than 92% for any priority customer segment.
  • What can go wrong: Writing vague criteria such as 'the model should be fair and accurate'.
  • How a tester should think: Turn values into observable checks and gates.
  • Evidence to collect: Acceptance matrix and metric definitions.

Measurable acceptance criteria

A useful AI acceptance criterion names the metric, population slice, dataset or traffic source, threshold, time window, and release action.

Example

Career-changer false rejection must not exceed the agreed threshold on the latest labelled validation set before the model can influence recruiter queues.

Mistake

Writing vague criteria such as "the model should be fair and accurate."

Evidence

Acceptance matrix, metric definitions, validation dataset version, threshold rationale, and sign-off owner.
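An acceptance criterion of this shape can be expressed as a small, reviewable check. The 0.08 threshold and dataset name below are placeholder assumptions standing in for the "agreed threshold" and "latest labelled validation set" the text mentions.

```python
# Sketch: an acceptance criterion as a reviewable check.
# The threshold and dataset name are illustrative placeholders.

CRITERION = {
    "metric": "false_rejection_rate",
    "slice": "career_changer",
    "dataset": "labelled-validation-set",  # assumed name
    "threshold": 0.08,                     # assumed agreed threshold
    "release_action_on_fail": "block",
}

def evaluate(criterion, measured_value):
    """Return the release action implied by a measured slice metric."""
    if measured_value <= criterion["threshold"]:
        return "pass"
    return criterion["release_action_on_fail"]

print(evaluate(CRITERION, 0.05))  # pass
print(evaluate(CRITERION, 0.12))  # block
```

Because the criterion is data, not prose, it can sit in version control next to the metric definition, and the release action follows mechanically from the measured value rather than from a meeting.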

4. Stakeholder Involvement

AI release criteria need product, data science, QA, operations, legal, security, and affected user representation where appropriate.

  • Real QA example: Security may care about prompt injection while support managers care about escalation failures.
  • What can go wrong: Letting one team define trustworthiness alone.
  • How a tester should think: Facilitate evidence conversations across roles.
  • Evidence to collect: RACI, approval workflow, and documented trade-offs.

Stakeholder-owned evidence

AI quality decisions cross team boundaries, so evidence needs named owners and visible trade-offs rather than informal confidence.

Example

Product accepts some false positives, legal needs bias evidence, support needs escalation controls, and security needs misuse checks.

Mistake

Letting one team define trustworthiness alone.

Evidence

RACI, approval workflow, documented trade-offs, reviewer notes, and sign-off record.

5. Release Gates

Release gates decide what evidence is required before launch, shadow mode, limited rollout, or rollback.

  • Real QA example: A model can enter shadow mode with incomplete confidence but cannot affect customers until guardrails pass.
  • What can go wrong: Treating a gate as paperwork after the release decision is already made.
  • How a tester should think: Make gates explicit before testing starts.
  • Evidence to collect: Gate checklist, owner sign-off, and rollback trigger.

Release gates

Release gates turn risk appetite into practical decisions: block, approve for shadow mode, approve for limited rollout, or release with monitoring.

Example

The hiring screener can enter shadow mode while reviewers compare ranked and unranked queues, but it cannot affect candidate outcomes until subgroup criteria pass.

Mistake

Treating a gate as paperwork after the release decision is already made.

Evidence

Gate checklist, owner sign-off, monitoring condition, rollback trigger, and post-release review date.
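The gate logic described here can be sketched as a function from evidence flags to a release action. The evidence keys and the exact requirements per action are assumptions modelled on the module's evidence lists, not a prescribed policy.

```python
# Sketch: a release gate mapping evidence status to release actions.
# Evidence keys and per-action requirements are illustrative assumptions.

def gate_decision(evidence):
    """Decide block / shadow mode / limited rollout from evidence flags."""
    required_for_shadow = {"risk_register", "rollback_trigger"}
    required_for_rollout = required_for_shadow | {
        "subgroup_criteria_pass", "human_review_controls", "sign_off",
    }
    present = {name for name, ok in evidence.items() if ok}
    if required_for_rollout <= present:
        return "limited rollout with monitoring"
    if required_for_shadow <= present:
        return "shadow mode only"
    return "block"

evidence = {
    "risk_register": True,
    "rollback_trigger": True,
    "subgroup_criteria_pass": False,  # subgroup criteria not yet met
    "human_review_controls": True,
    "sign_off": False,
}
print(gate_decision(evidence))  # shadow mode only
```

This matches the hiring-screener decision in the example above: enough evidence exists to observe the model in shadow mode, but not to let it influence candidate outcomes.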

Practical QA workflow

  • Start from the user or business decision affected by the AI system.
  • Name the AI asset under test: data, feature pipeline, model, prompt, retrieval index, tool, or full workflow.
  • Convert the main risk into observable quality signals and release gates.
  • Choose the right oracle: deterministic assertion, metric threshold, metamorphic relation, reviewer rubric, comparison, or production monitor.
  • Test important slices, edge cases, misuse cases, and change scenarios.
  • Record versions, data sources, thresholds, reviewer notes, and decision rationale.
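One oracle type from the list above, the metamorphic relation, is worth a concrete sketch because it needs no labelled expected output. The `score_cv` toy scorer and the rename transform are invented stand-ins for a real model call; the relation itself ("an irrelevant change must not move the score") is the point.

```python
# Sketch: a metamorphic-relation oracle for a ranking model.
# `score_cv` is a hypothetical stand-in for the real model under test.

def score_cv(cv):
    # Toy scorer: counts matching skill keywords; ignores the name field.
    skills = {"python", "sql", "testing"}
    return sum(1 for word in cv["skills"] if word in skills)

def metamorphic_check(cv, transform, tolerance=0):
    """Relation: the transform must not change the score beyond tolerance."""
    return abs(score_cv(cv) - score_cv(transform(cv))) <= tolerance

cv = {"name": "A. Candidate", "skills": ["python", "sql"]}
rename = lambda c: {**c, "name": "B. Candidate"}  # irrelevant change
print(metamorphic_check(cv, rename))  # True: the name does not move the score
```

In the hiring-screener context, the same pattern could probe riskier transforms, such as rephrasing an employment gap, where a score shift would itself be evidence for the risk register.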

Test design checklist

  • What harm could happen if this AI behaviour is wrong?
  • Which users, groups, products, regions, or workflows need separate evidence?
  • Which metric or observation would reveal the failure early?
  • What is the minimum evidence needed for release, shadow mode, rollback, or rejection?
  • Who owns the evidence after the model, prompt, or data changes?

Worked QA example

A tester receives a release request for the module scenario. Instead of asking only whether tests pass, the tester writes three release questions: what changed, who could be harmed, and what evidence proves the change is controlled. The answer becomes a small evidence pack: one risk table, one set of representative examples, one automated or reviewable check, and one release recommendation.

Worked example: Blocking an accuracy-only release

Scenario. The hiring screener's retrained ranking model shows a stronger average validation score, and the product owner asks to release because the headline metric is above target.

Reasoning. Average validation score does not prove acceptable quality for high-impact slices. The tester checks false rejection by candidate segment, explanation quality, appeal workflow, reviewer override, audit logging, and rollback readiness.

Model answer. Block production influence until subgroup false rejection, explanation coverage, human review controls, and audit evidence meet the agreed acceptance matrix. Allow shadow mode only if candidates are not affected.

Common mistakes

  • Treating AI output as a normal deterministic response when the real risk is behavioural.
  • Reporting one impressive metric without slices, uncertainty, or business context.
  • Forgetting that data, prompts, model versions, and monitoring are part of the test surface.
  • Writing governance language that cannot be checked by a tester.

Guided exercise

Use the scenario above and create a one-page evidence plan. Include the decision being influenced, the main risk, the test oracle, the data or examples required, the release gate, and the owner.

Try it: Build the risk-to-acceptance matrix

Prompt. Use the hiring-screener scenario to define release criteria that would be acceptable to product, QA, data science, legal, and operations.

Learner action. Map each risk to a quality characteristic, metric or review signal, population slice, threshold, evidence source, owner, and release action.

Expected output. `ai-risk-acceptance-matrix.md` with at least five risks, measurable acceptance criteria, owners, and a release recommendation.

Discussion prompt

Which AI quality characteristic is easiest for your team to measure today, and which is most likely to be ignored?

Exam trap

Objective

CT-AI 2.1-2.8

Common trap

Choosing the answer with the best average performance metric while ignoring impact, slices, explainability, monitoring, or stakeholder acceptance.

Wording clue

Look for answers that turn quality attributes into measurable criteria and decision gates.

Portfolio checkpoint

Create the module portfolio deliverable and use it to support your release decision.

Artifact structure

ai-risk-acceptance-matrix.md

Columns: Context · Risk · Quality characteristic · Metric or review signal · Slice · Threshold · Owner · Release action · Open questions

Hands-on lab mapping

  • Lab: CourseMaterials/sandboxes/introduction/
  • Task: Create a risk-to-acceptance matrix for an AI feature and identify missing release evidence.
  • Why this lab matters: it turns the module theory into visible evidence that a release approver can inspect.

Decision simulation

A stakeholder wants to launch because accuracy is above target, but fairness and explanation evidence is missing. Decide the release action and conditions.

Key terms

  • Trustworthiness: The degree to which stakeholders can rely on the AI system for its intended context.
  • Acceptance criterion: A measurable condition that must be satisfied for release.
  • Quality characteristic: A dimension of quality such as fairness, robustness, privacy, or transparency.
  • Release gate: A decision point that requires specific evidence before deployment.

Revision prompts

  • Explain the module scenario in two minutes to a product owner.
  • Name three pieces of evidence you would require before release.
  • Identify one automated check and one human-review check.
  • Describe how this topic changes after deployment.

Recall check

Why is aggregate accuracy insufficient for the hiring screener?
It can hide high-impact subgroup failures such as false rejection for career changers or candidates with employment gaps.
What makes an AI acceptance criterion testable?
It names a metric or review signal, slice, data source, threshold, release action, and owner.
When should a release be blocked rather than monitored?
When high-impact acceptance criteria, human controls, audit evidence, or rollback readiness are missing.
What portfolio artifact does this module produce?
ai-risk-acceptance-matrix.md, a risk-to-evidence matrix for AI release decisions.