Chapter 7 of 10

Explainability, Fairness, Bias, and Responsible AI Evidence

Help QA professionals evaluate explainability and fairness as testable quality concerns, not vague ethical slogans.

45 min guide · 5 reference questions folded into the guide material
Guided briefing

Explainability, Fairness, Bias, and Responsible AI Evidence video briefing

A focused explanation of chapter 7, turning the AI testing theory into concrete validation checks.

Briefing focus

Module opening

This is a structured lesson briefing. Real video/audio can be added later as a media source.

Estimated time

9 min

  1. Module opening
  2. Learning objectives
  3. Mind map
  4. Scenario evidence breakdown

Transcript brief

Help QA professionals evaluate explainability and fairness as testable quality concerns, not vague ethical slogans. The briefing explains why the topic matters, walks through a failure scenario, and identifies the artefacts a tester should produce for evidence and auditability.

Key takeaways

  • Connect the AI risk to a measurable test or monitor.
  • Document the evidence needed for reproducibility and audit.
  • Use the lab or scenario to practise the validation workflow.

Module opening

Help QA professionals evaluate explainability and fairness as testable quality concerns, not vague ethical slogans.

Audience. Testers working on AI systems that affect people, eligibility, prioritisation, or trust.

Why this matters. Responsible AI needs evidence. Explanations, fairness metrics, bias reviews, and human challenge routes must be designed, tested, and maintained.

ISTQB CT-AI mapping. CT-AI 2.4, 2.7, 8.3, 8.6

Trainer note

Start with the scenario before the theory. Ask learners what evidence would make them confident, then use the module to build that evidence step by step.

Learning objectives

  • Explain the core quality risk in explainability, fairness, bias, and responsible AI evidence.
  • Select practical test evidence that supports an AI release decision.
  • Apply the module concepts to a realistic QA scenario.
  • Produce a portfolio artifact that can be reused in a professional AI testing context.

Mind map

Explainability, Fairness, Bias, and Responsible AI Evidence mind map

Real-life scenario · Financial services

The lending model with unexplained rejection patterns

Situation. A model recommends approve, refer, or decline. Approval rates differed by applicant group and customer service could not explain disputed decisions.

Lesson. AI testing is strongest when risks, examples, evidence, and release decisions are connected.

Scenario evidence breakdown

Product/System. Loan pre-approval journey.

AI feature. A model recommends approve, refer, or decline.

Failure or risk. Approval rates differed by applicant group and customer service could not explain disputed decisions.

Testing challenge. Overall performance looked acceptable, but subgroup outcomes and explanation usability were untested.

Tester response. The tester required fairness slices, explanation review, threshold rationale, a human appeal workflow, and a responsible AI evidence pack.

Evidence required. Fairness metric report, explanation samples, disputed-case review, bias risk assessment, and appeal test results.

Business decision. Delay launch until fairness evidence and the explanation workflow meet the release criteria.

Visual flow

Explainability, Fairness, Bias, and Responsible AI Evidence scenario flow

Learning path

  1. Start Here

    5 min

    Outcome, CT-AI exam relevance, and the lending decision scenario.

  2. Learn

    24 min

    Bias sources, fairness metrics, explanations, responsible evidence, and communication.

  3. See It

    10 min

    Subgroup outcome and explanation workflow evidence.

  4. Try It

    18 min

    Build a fairness and explainability evidence pack.

  5. Recall and Apply

    10 min

    Exam traps, active recall, and the portfolio artifact.

Responsible AI needs test evidence

Fairness and explainability become testable when the team defines groups, metrics, explanation purpose, review workflow, and remediation route.

Example

Approval rates differed by applicant group and support agents could not explain disputed lending decisions.

Mistake

Treating fairness as a principle statement or assuming explanations prove causality.

Evidence

Fairness metric rationale, subgroup report, explanation samples, disputed-case review, appeal test results, and responsible AI sign-off.
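To make "approval rates differed by applicant group" concrete, the subgroup report can start from a simple slice computation plus a release gate. The sketch below is a minimal illustration with made-up records and a hypothetical `GATE` threshold; a real project would agree the groups and threshold with stakeholders first.

```python
from collections import defaultdict

def approval_rates_by_group(records):
    """Compute the approval rate per applicant group from (group, approved) pairs."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in records:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {g: approvals[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Largest difference in approval rate between any two groups."""
    values = list(rates.values())
    return max(values) - min(values)

# Hypothetical decision log: (applicant group, approved?)
records = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = approval_rates_by_group(records)
gap = parity_gap(rates)

# Release gate: fail the fairness check if the gap exceeds the agreed threshold.
GATE = 0.20
verdict = "PASS" if gap <= GATE else "FAIL"
print(rates, gap, verdict)
```

The printed rates and gap become the subgroup report entry, and the PASS/FAIL verdict feeds the release recommendation.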

Worked example: Delaying a lending launch

Scenario. A lending model passes aggregate performance, but one applicant group has a worse approval rate and the support team cannot explain declined applications.

Reasoning. The model may be technically strong but not release-ready because high-impact outcomes need subgroup evidence, explanation usability, and appeal controls.

Model answer. Delay launch until fairness criteria, explanation review, human appeal workflow, and responsible AI evidence meet the agreed release gate.

Try it: Build the fairness evidence pack

Prompt. Use the lending scenario to decide what responsible AI evidence is required before release.

Learner action. Define affected groups, fairness metric, explanation checks, reviewer workflow, appeal path, owner, and recommendation.

Expected output. `fairness-and-explainability-evidence-pack.md` with subgroup evidence, explanation examples, limitations, appeal checks, and release decision.

Exam trap

Objective

CT-AI 2.4, 2.7, 8.3, 8.6

Common trap

Choosing a fairness metric without connecting it to the product decision or affected users.

Wording clue

Prefer answers that combine metric evidence, explanation limits, human review, and remediation.

Portfolio checkpoint

Create the module portfolio deliverable and use it to support your release decision.

Artifact structure

fairness-and-explainability-evidence-pack.md

  • Context
  • Affected groups
  • Fairness metric
  • Explanation samples
  • Limitations
  • Appeal workflow
  • Recommendation
  • Open questions

Recall check

Why is aggregate performance insufficient for responsible AI?
It can hide subgroup harms and explanation failures.
What is the role of explanations?
They support investigation and review, but they are not automatic proof of causality.
What evidence makes an appeal route testable?
User path, reviewer guidance, decision records, service levels, and remediation owners.
What portfolio artifact does this module produce?
fairness-and-explainability-evidence-pack.md, a responsible AI release evidence pack.

Topic-by-topic teaching guide

1. Bias Sources

Bias can enter through sampling, labels, proxies, historical decisions, or product design.

Real QA example. A historical hiring dataset may encode past preferences rather than job performance.

What can go wrong. Assuming removing a protected attribute removes bias.

How a tester should think. Look for proxies, slices, and historical feedback loops.

Evidence to collect. Bias risk assessment and data review.
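One way to look for proxies, as the table suggests, is to ask how well a remaining feature can reconstruct the protected group on its own. The sketch below uses invented postcode data and a simple majority-vote score; real bias reviews would use proper association measures and domain review, so treat this as a teaching illustration only.

```python
from collections import Counter, defaultdict

def proxy_score(feature_values, group_labels):
    """How well a single feature reconstructs a protected group:
    the accuracy of predicting the majority group within each feature value.
    A score near 1.0 means the feature almost fully encodes the group
    (a likely proxy), even if the protected attribute itself was removed."""
    by_value = defaultdict(list)
    for feature, group in zip(feature_values, group_labels):
        by_value[feature].append(group)
    correct = sum(
        Counter(groups).most_common(1)[0][1] for groups in by_value.values()
    )
    return correct / len(group_labels)

# Hypothetical data: postcode almost perfectly splits the two groups.
postcodes = ["N1", "N1", "N1", "S9", "S9", "S9"]
groups    = ["A",  "A",  "B",  "B",  "B",  "B"]
score = proxy_score(postcodes, groups)
print(score)  # high score flags postcode as a candidate proxy for review
```

A high score does not prove unfairness; it flags a feature for the bias risk assessment the table asks for.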

2. Fairness Metrics

Fairness metrics compare outcomes or errors across groups, but each metric has trade-offs.

Real QA example. Demographic parity and equal opportunity answer different fairness questions.

What can go wrong. Choosing a metric without understanding the product decision.

How a tester should think. Select metrics with stakeholders and legal context.

Evidence to collect. Fairness metric rationale and subgroup report.
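The claim that demographic parity and equal opportunity answer different questions is easy to demonstrate on toy data: the same predictions can fail one metric and pass the other. The example below uses invented predictions and labels to show exactly that; production work would normally use a library such as AIF360 rather than hand-rolled metrics.

```python
from collections import defaultdict

def _rate(pairs):
    """Fraction of positive outcomes per key."""
    num, den = defaultdict(int), defaultdict(int)
    for key, hit in pairs:
        den[key] += 1
        num[key] += hit
    return {k: num[k] / den[k] for k in den}

def demographic_parity(preds, groups):
    # Approval rate per group, ignoring the true outcome.
    return _rate(zip(groups, preds))

def equal_opportunity(preds, labels, groups):
    # Approval rate per group among applicants who truly qualified (label == 1).
    return _rate((g, p) for p, y, g in zip(preds, labels, groups) if y == 1)

# Hypothetical predictions (1 = approve) with true labels and groups.
preds  = [1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]

dp = demographic_parity(preds, groups)
eo = equal_opportunity(preds, labels, groups)
print(dp)  # group A is approved more often overall...
print(eo)  # ...yet qualified applicants are treated identically
```

Here the model fails demographic parity while satisfying equal opportunity, which is why the metric choice must be agreed with stakeholders before it appears in the evidence pack.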

3. Explainability

Global explanations summarise broad behaviour; local explanations support individual case review.

Real QA example. A global report may show income dominates decisions, while a local explanation explains one applicant outcome.

What can go wrong. Treating explanation output as proof of causality.

How a tester should think. Use explanations as evidence for investigation, not magic truth.

Evidence to collect. Explanation report and limitation notes.
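To make the global/local distinction tangible, a local explanation can be mimicked with a leave-one-feature-out perturbation: score the applicant, then re-score with each feature reset to a baseline. This is a toy stand-in for the SHAP/LIME output used in the lab, with a hypothetical linear scoring function, and it illustrates the limitation note too: attributions describe model behaviour, not real-world causes.

```python
def local_explanation(model, instance, baseline):
    """Toy leave-one-out attribution: the score change when each feature
    is replaced by its baseline value. Attributions describe the model's
    behaviour on this case only; they are not proof of causality."""
    base_score = model(instance)
    contributions = {}
    for name in instance:
        perturbed = dict(instance, **{name: baseline[name]})
        contributions[name] = base_score - model(perturbed)
    return contributions

# Hypothetical linear scoring model for a lending decision.
def model(x):
    return 0.5 * x["income"] + 0.2 * x["tenure"] - 0.3 * x["debt"]

applicant = {"income": 1.0, "tenure": 0.5, "debt": 0.8}
baseline  = {"income": 0.0, "tenure": 0.0, "debt": 0.0}

contrib = local_explanation(model, applicant, baseline)
print(contrib)  # per-feature push toward or away from approval for this case
```

A global report would aggregate such attributions over many applicants; the local version above is what a reviewer inspects for one disputed decision.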

4. Responsible Evidence

Responsible AI evidence shows that the team considered harm, accountability, transparency, and remediation.

Real QA example. An appeal workflow can be tested like any other critical path.

What can go wrong. Writing principles without operational checks.

How a tester should think. Turn principles into tests, owners, and artifacts.

Evidence to collect. Evidence pack, sign-off, and review cadence.
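"An appeal workflow can be tested like any other critical path" can be shown as an executable check over an appeal record. The record shape, field names, and 5-day SLA below are all assumptions for illustration; the point is that each responsible AI principle (reviewer, decision record, rationale, service level) becomes an assertable control.

```python
import datetime

# Hypothetical appeal record produced by an assumed appeal workflow.
appeal = {
    "case_id": "LN-1042",
    "submitted": datetime.date(2024, 5, 1),
    "reviewer": "senior-underwriter-3",
    "decision": "overturned",
    "decision_date": datetime.date(2024, 5, 4),
    "rationale": "Income evidence re-verified; original decline reversed.",
}

SLA_DAYS = 5  # assumed service level for resolving an appeal

def check_appeal_record(record):
    """Testable controls for the appeal route: a reviewer is assigned,
    a decision and rationale are recorded, and the SLA is met."""
    issues = []
    if not record.get("reviewer"):
        issues.append("no reviewer assigned")
    if record.get("decision") not in {"upheld", "overturned"}:
        issues.append("decision not recorded")
    if not record.get("rationale"):
        issues.append("missing rationale")
    if (record["decision_date"] - record["submitted"]).days > SLA_DAYS:
        issues.append("SLA breached")
    return issues

issues = check_appeal_record(appeal)
print(issues)  # an empty list means this appeal passed its controls
```

Run against a sample of real appeal records, the list of issues becomes the appeal test results named in the evidence pack.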

5. Communication

Fairness and explanations must be understandable to the people who use them.

Real QA example. A support agent needs actionable reason codes, not raw SHAP values.

What can go wrong. Delivering technical charts without user guidance.

How a tester should think. Test whether explanations help real decisions.

Evidence to collect. Reviewer feedback and usability notes.
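The gap between raw SHAP values and what a support agent needs can be bridged by a reason-code translation layer, which is itself testable. The mapping and code texts below are hypothetical; the testable claim is that every declined case yields a small number of agent-readable reasons.

```python
# Hypothetical mapping from model features to agent-facing reason codes.
REASON_CODES = {
    "debt":   "R01: Existing debt level is high relative to income.",
    "tenure": "R02: Limited account or employment history.",
    "income": "R03: Declared income below product threshold.",
}

def reason_codes(attributions, top_n=2):
    """Translate raw attributions (e.g. SHAP values, negative = pushed
    toward decline) into the top actionable reason codes for an agent."""
    negative = sorted(
        (name for name, value in attributions.items() if value < 0),
        key=lambda name: attributions[name],  # most negative first
    )
    return [REASON_CODES[name] for name in negative[:top_n]]

# Attributions for one declined applicant.
codes = reason_codes({"income": -0.40, "tenure": 0.10, "debt": -0.25})
print(codes)
```

A usability check then asks reviewers whether these codes actually help them answer a disputed-decision call, which produces the reviewer feedback the table asks for.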

Practical QA workflow

  • Start from the user or business decision affected by the AI system.
  • Name the AI asset under test: data, feature pipeline, model, prompt, retrieval index, tool, or full workflow.
  • Convert the main risk into observable quality signals and release gates.
  • Choose the right oracle: deterministic assertion, metric threshold, metamorphic relation, reviewer rubric, comparison, or production monitor.
  • Test important slices, edge cases, misuse cases, and change scenarios.
  • Record versions, data sources, thresholds, reviewer notes, and decision rationale.
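One of the oracles listed above, the metamorphic relation, deserves a concrete example because it works even when no single correct output is known. The sketch below uses a hypothetical stand-in scoring function and one plausible lending relation: raising income, all else equal, should never flip an approval into a decline.

```python
def predict(applicant):
    """Hypothetical stand-in for the model under test."""
    return "approve" if applicant["income"] - applicant["debt"] >= 0.3 else "decline"

def metamorphic_income_check(applicant, bump=0.1):
    """Metamorphic oracle: increasing income (all else equal) must never
    turn an approval into a decline. No ground-truth label is needed."""
    before = predict(applicant)
    after = predict(dict(applicant, income=applicant["income"] + bump))
    return not (before == "approve" and after == "decline")

# A small slice of invented applicants.
cases = [
    {"income": 0.5, "debt": 0.1},
    {"income": 0.3, "debt": 0.2},
    {"income": 0.2, "debt": 0.4},
]
relation_holds = all(metamorphic_income_check(case) for case in cases)
print(relation_holds)
```

A violation of such a relation is an observable quality signal that can feed a release gate, exactly as the workflow above prescribes.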

Test design checklist

  • What harm could happen if this AI behaviour is wrong?
  • Which users, groups, products, regions, or workflows need separate evidence?
  • Which metric or observation would reveal the failure early?
  • What is the minimum evidence needed for release, shadow mode, rollback, or rejection?
  • Who owns the evidence after the model, prompt, or data changes?

Worked QA example

A tester receives a release request for the module scenario. Instead of asking only whether tests pass, the tester writes three release questions: what changed, who could be harmed, and what evidence proves the change is controlled. The answer becomes a small evidence pack: one risk table, one set of representative examples, one automated or reviewable check, and one release recommendation.

Common mistakes

  • Treating AI output as a normal deterministic response when the real risk is behavioural.
  • Reporting one impressive metric without slices, uncertainty, or business context.
  • Forgetting that data, prompts, model versions, and monitoring are part of the test surface.
  • Writing governance language that cannot be checked by a tester.

Guided exercise

Use the scenario above and create a one-page evidence plan. Include the decision being influenced, the main risk, the test oracle, the data or examples required, the release gate, and the owner.

Discussion prompt

What would a fair outcome mean for your chosen AI feature: equal selection, equal error, equal opportunity, or something else?

Hands-on lab mapping

  • Lab: CourseMaterials/AI-Testing/labs/02_fairness_and_aif360.ipynb and CourseMaterials/AI-Testing/labs/03_explainability_shap_lime.ipynb
  • Task: Compute group metrics and review model explanations for responsible release evidence.
  • Why this lab matters: it turns the module theory into visible evidence that a release approver can inspect.

Decision simulation

A model passes overall performance but fails one subgroup metric. Decide whether to block, mitigate, monitor, or narrow the launch scope.
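For the simulation, it helps to see that the block/mitigate/monitor/narrow decision can be expressed as an explicit, reviewable rule rather than a gut call. The thresholds below (0.05 gate, 0.15 severe) are illustrative assumptions; the real values come from the agreed release criteria.

```python
def release_decision(overall_pass, subgroup_gaps, gate=0.05, severe=0.15):
    """Hypothetical decision rule for the simulation: block on overall
    failure or a severe subgroup gap, mitigate or narrow scope on a
    moderate gap, otherwise launch with monitoring."""
    worst = max(subgroup_gaps.values())
    if not overall_pass or worst > severe:
        return "block"
    if worst > gate:
        return "mitigate or narrow launch scope"
    return "launch and monitor"

# The simulation's setup: overall performance passes, one subgroup fails.
decision = release_decision(True, {"group_B": 0.08, "group_C": 0.02})
print(decision)
```

Writing the rule down forces the team to state its thresholds, which is precisely the kind of auditable evidence this module asks for.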

Key terms

  • Bias: Systematic tendency that can produce unfair or inappropriate outcomes.
  • Global explanation: Explanation of broad model behaviour across many cases.
  • Local explanation: Explanation of a specific prediction or decision.
  • Fairness metric: A quantitative comparison of outcomes or errors across groups.

Revision prompts

  • Explain the module scenario in two minutes to a product owner.
  • Name three pieces of evidence you would require before release.
  • Identify one automated check and one human-review check.
  • Describe how this topic changes after deployment.