ADAWCAG.org

Why Our AI Never Picks Your Compliance Tier

Accessibility compliance advice has a hallucination problem. Ask a general-purpose AI "is my site WCAG 2.1 AA compliant?" and it will often produce confident, fluent, and wrong guidance — citing non-existent success criteria, inventing remediation steps, or blending WCAG 2.0, 2.1, and 2.2 rules without flagging the differences. For a legal defense document or a federal procurement deliverable, that's not a minor UX hiccup. It's a liability.

At ADAWCAG.org, we spent significant time designing a system that can use AI to enrich compliance recommendations without ever allowing the model to fabricate the underlying standards. Here's how it works.

The problem with open-ended compliance prompting

The failure mode is straightforward. If you give a large language model a site URL and ask it to produce an accessibility report, it will:

  • Invent WCAG success criteria numbers that don't exist
  • Confuse WCAG 2.0, 2.1, and 2.2 requirements in the same sentence
  • Recommend remediation approaches that are technically valid but wrong for the specific violation
  • Produce copy that sounds authoritative but contradicts the actual DOJ rulemaking record

None of this is unique to AI. Human consultants without rigorous standards training make the same errors. But AI makes them at scale, instantly, and with complete confidence, which is worse in any context where the output will be reviewed by a lawyer, a federal contracting officer, or a court.

The architecture: quiz → deterministic tier → AI enrichment

Our quiz funnel processes audience inputs across four tracks: Business, Government, Investor, and Nonprofit. Within each track, a series of multiple-choice questions maps to a deterministic tier selection using a weighted scoring function. The tier selection — Starter, Pro, Agency, Enterprise, or City variants — is never AI-generated. It comes from a rule-based function in lib/quiz/analyzeAnswers.ts that runs on the server, evaluating the submitted answers before any AI call is made.
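A weighted, rule-based scorer of this kind can be sketched roughly as follows. This is an illustrative reconstruction, not the contents of lib/quiz/analyzeAnswers.ts: the question IDs, weights, and thresholds shown here are invented for the example.

```typescript
// Illustrative sketch of a deterministic, weighted tier scorer.
// All question IDs, weights, and thresholds are hypothetical.
type Tier = "starter" | "pro" | "agency" | "enterprise";

interface Answer {
  questionId: string;
  choice: string;
}

// Each answer choice contributes a fixed number of points.
const WEIGHTS: Record<string, number> = {
  "org-size:1-10": 1,
  "org-size:11-100": 3,
  "org-size:100+": 6,
  "sites:single": 1,
  "sites:multiple": 4,
  "legal-exposure:demand-letter": 5,
};

export function analyzeAnswers(answers: Answer[]): Tier {
  const score = answers.reduce(
    (sum, a) => sum + (WEIGHTS[`${a.questionId}:${a.choice}`] ?? 0),
    0
  );
  // Deterministic thresholds: the tier is a pure function of the
  // submitted answers, with no AI call involved.
  if (score >= 12) return "enterprise";
  if (score >= 8) return "agency";
  if (score >= 4) return "pro";
  return "starter";
}
```

Because the function is pure, the same answers always produce the same tier, which is what makes the later AI failure modes harmless.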

This means even if our AI layer fails completely (API timeout, rate limit, or model error), the user receives a valid tier match. The fallback is always deterministic. The AI layer only runs on the server, after the tier is already selected, and its sole job is to produce a tailored headline, summary, and next-steps list that references the pre-selected tier.

Anchoring the system prompt to known WCAG criteria

The system prompt given to the model at /api/quiz/analyze is carefully structured to prevent standards drift. Rather than asking the model to determine what compliance standards apply, the prompt explicitly provides the applicable standard (WCAG 2.1 Level AA, 28 CFR Part 35 Subpart H, Section 508 Refresh) and instructs the model only to personalize the framing for the detected audience and tier.

Key constraints in the prompt:

  • The WCAG version is always specified — the model is not asked to decide which version applies
  • The tier and pricing are injected as facts, not asked as inference tasks
  • The model is explicitly instructed not to make compliance determinations — only to explain what the selected plan covers
  • Deadline dates are sourced from lib/constants/deadlines.ts and injected as string constants, not derived by the model
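A prompt built under those constraints might be assembled along these lines. The function and field names are illustrative stand-ins, not the actual code behind /api/quiz/analyze; the point is that every standards-sensitive value arrives as an injected string constant:

```typescript
// Hypothetical sketch of a system-prompt builder where the standard,
// tier, and deadline are injected as facts rather than inferred.
interface PromptFacts {
  audience: "business" | "government" | "investor" | "nonprofit";
  tier: string;     // pre-selected by the deterministic scorer
  standard: string; // e.g. "WCAG 2.1 Level AA", supplied by the server
  deadline: string; // string constant, e.g. from lib/constants/deadlines.ts
}

export function buildSystemPrompt(f: PromptFacts): string {
  return [
    "You personalize copy for a pre-selected accessibility plan.",
    "FACTS (do not alter, re-derive, or supplement):",
    `- Applicable standard: ${f.standard}`,
    `- Selected plan: ${f.tier}`,
    `- Compliance deadline: ${f.deadline}`,
    `- Audience: ${f.audience}`,
    "RULES:",
    "- Do NOT make compliance determinations.",
    "- Do NOT cite any standard or criterion not listed in FACTS.",
    "- Only personalize tone and framing for the audience above.",
  ].join("\n");
}
```

The model's degrees of freedom are limited to framing; any hallucinated criterion would contradict an instruction rather than fill a gap the prompt left open.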

The fallback chain

Every AI call in the quiz pipeline uses a three-layer fallback:

  1. AI call succeeds — the enriched recommendation is returned and shown
  2. AI call fails or times out — the deterministic fallback text from quizConfig.ts is used; the user sees no error, only slightly less personalized copy
  3. Client-side render fails — the tier and CTA are still visible from the URL-encoded quiz state, which encodes the audience + answers into searchParams and can be re-rendered without any API call
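The first two layers can be sketched as a simple wrapper. `fetchAiCopy` and the fallback table are hypothetical stand-ins (the real deterministic copy lives in quizConfig.ts); layer 3 is client-side and not shown:

```typescript
// Sketch of layers 1 and 2: try the AI call, and on any failure fall
// back to deterministic copy keyed by tier. Names are illustrative.
interface Recommendation {
  headline: string;
  summary: string;
  source: "ai" | "fallback";
}

const FALLBACK_COPY: Record<string, Recommendation> = {
  pro: {
    headline: "The Pro plan fits your needs",
    summary: "Covers WCAG 2.1 AA scanning and remediation guidance.",
    source: "fallback",
  },
};

export async function getRecommendation(
  tier: string,
  fetchAiCopy: (tier: string) => Promise<Recommendation>
): Promise<Recommendation> {
  try {
    // Layer 1: enriched AI copy.
    return { ...(await fetchAiCopy(tier)), source: "ai" };
  } catch {
    // Layer 2: deterministic copy. The user sees no error,
    // only slightly less personalized text.
    return FALLBACK_COPY[tier];
  }
}
```

Because the tier was already selected before this function runs, the catch branch never has to guess: it looks up copy for a tier it already knows.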

The quiz state is URL-encoded on every step, which means a user who refreshes mid-quiz or shares a link lands in exactly the same state, including restoring their answers and re-running analyze if needed. This is also how we handle deep-link sharing from email campaigns.
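A round-trip encoder for that kind of state could look like the following. The state shape is an assumption for illustration, not the production schema:

```typescript
// Sketch of round-tripping quiz state through searchParams so a
// refresh or shared link restores the exact step. Shape is illustrative.
interface QuizState {
  audience: string;
  answers: Record<string, string>;
}

export function encodeQuizState(state: QuizState): string {
  const params = new URLSearchParams();
  params.set("audience", state.audience);
  // Answers serialized as JSON; URLSearchParams handles the escaping.
  params.set("answers", JSON.stringify(state.answers));
  return params.toString();
}

export function decodeQuizState(query: string): QuizState | null {
  const params = new URLSearchParams(query);
  const audience = params.get("audience");
  const answers = params.get("answers");
  if (!audience || !answers) return null;
  try {
    return { audience, answers: JSON.parse(answers) };
  } catch {
    return null; // a malformed shared link degrades gracefully
  }
}
```

Returning null on malformed input matters here: a truncated email link should land the user at the start of the quiz, not on an error page.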

Preview scanning without blocking the recommendation

For users who provide a website URL, we now offer an asynchronous preview scan via /api/quiz/preview-scan. This fires a lightweight axe-core run against the provided URL with a hard 10-second timeout. The results — violation counts by severity and top issue descriptions — are shown inline in the results panel, below the tier recommendation.

Critically, the preview scan never blocks the recommendation render. The tier selection and CTA appear immediately. The scan results appear asynchronously when ready, or not at all if the site is unreachable or the scan times out. This keeps the quiz fast for the majority of users who don't provide a URL while adding real value when they do.
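A hard-timeout wrapper of this kind is commonly built with Promise.race. This is a generic sketch, not the actual /api/quiz/preview-scan code, and the scan promise it wraps is assumed to come from an axe-core run elsewhere:

```typescript
// Sketch of a hard timeout around an async scan: whichever settles
// first wins, so a slow or unreachable site yields null instead of
// blocking the recommendation render.
export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  const timeout = new Promise<null>((resolve) =>
    setTimeout(() => resolve(null), ms)
  );
  return Promise.race([work, timeout]);
}
```

The results panel then treats null as "no scan section", which is exactly the behavior described above: the tier and CTA render immediately, and scan results either arrive later or never.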

What this means for legal defensibility

Our enterprise audit reports and VPAT/ACR documents are generated from scan data, not from AI inference. The AI layer in the quiz is used only for lead qualification and audience personalization — it has no role in generating the technical findings that appear in a legal document.

The separation is deliberate. Compliance documentation that will be reviewed in federal procurement or litigation must be traceable to specific scan runs, specific WCAG success criteria, and specific remediation steps. We generate those from deterministic tools (axe-core, pa11y, Lighthouse) with human verification by Aaron Espinoza — our DHS Trusted Tester on staff. The AI handles personalization. The scanners and the human handle the findings.

If you're building a compliance workflow that touches legal review, this is the distinction worth enforcing rigorously: use AI where the stakes of a hallucination are low (recommendation framing, email copy), and use deterministic tools where the stakes are high (violation categorization, WCAG criterion mapping, remediation documentation).