Errors — structured diagnostics and runtime failures

This document defines the error-reporting model for Weaver (@arakendo/weaver-xslt) across XPath, XSLT, serialization, the future CLI/watch-mode surface, codegen, and downstream editor or debugger tooling.

It complements ARCHITECTURE.md, especially DEC-007 and DEC-013, and DIFFERENTIATORS.md, especially D1. Those documents define why diagnostics matter; this file defines the durable shape of the error and diagnostic contract. See also SEMANTIC_BOUNDARIES.md for the cross-cutting rules behind structured-vs-formatted diagnostics, provenance tier clarity, and the requirement that boundary translations preserve meaning.

Goals

Give every diagnosable failure a stable machine-readable identity.
Preserve source locations, related locations, and runtime call context as errors propagate.
Use one structured report shape for parse errors, static diagnostics, runtime failures, watch-mode output, and future editor integration.
Keep the canonical code aligned with W3C XPath/XSLT/serialization codes whenever the spec defines one.
Translate errors to human-readable text, JSON, logs, or editor squiggles only at system boundaries.
Keep generated-code and interpreter diagnostics semantically equivalent.

Non-goals

Inventing a second primary error taxonomy when the spec already gives us W3C codes.
Replacing local TypeScript error classes with one giant enum-like mega-object.
Making core XPath/XSLT code know about terminal colors, JSON envelopes, or watch-mode UX.
Parsing formatted error strings after the fact.
Treating compiler diagnostics and runtime errors as unrelated systems.

Core model

Weaver has two closely related shapes:

XdmError: throwable internal error object used inside the engine.
DiagnosticReport: plain JSON-serializable boundary shape used for formatting, watch mode, tests, codegen parity, and future editor integration.

The rule is:

Internal code may throw XdmError or a subclass.
Boundary code translates it to DiagnosticReport.
Human-readable formatting is done from DiagnosticReport, not by inventing strings in many places.
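
A minimal sketch of that flow, assuming a trimmed report shape and a stand-in throwable. MiniReport, XdmErrorLike, runAtBoundary, and the WEAVER_INTERNAL_UNEXPECTED_THROW code are illustrative names, not the engine's actual API; the real XdmError and DiagnosticReport carry more fields.

```typescript
// Trimmed report shape for illustration; the full DiagnosticReport has more fields.
type Phase = 'compile' | 'runtime' | 'serialization' | 'codegen' | 'internal';

interface MiniReport {
  code: string;
  phase: Phase;
  severity: 'error' | 'warning' | 'note';
  message: string;
}

// Hypothetical stand-in for the engine's throwable XdmError.
class XdmErrorLike extends Error {
  code: string;
  phase: Phase;

  constructor(code: string, phase: Phase, message: string) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.code = code;
    this.phase = phase;
  }
}

// Boundary translation: the one place a thrown value becomes a report.
function toDiagnosticReport(thrown: unknown): MiniReport {
  if (thrown instanceof XdmErrorLike) {
    return { code: thrown.code, phase: thrown.phase, severity: 'error', message: thrown.message };
  }
  // Assumed local code for anything the engine did not classify itself.
  const message = thrown instanceof Error ? thrown.message : String(thrown);
  return { code: 'WEAVER_INTERNAL_UNEXPECTED_THROW', phase: 'internal', severity: 'error', message };
}

// Formatting reads only the report, never the thrown object.
function formatOneLine(report: MiniReport): string {
  return `${report.code}: ${report.message}`;
}

// Hypothetical boundary wrapper: engine code throws, the boundary reports.
function runAtBoundary(work: () => void): string | undefined {
  try {
    work();
    return undefined;
  } catch (thrown) {
    return formatOneLine(toDiagnosticReport(thrown));
  }
}
```

The point of the shape is that formatOneLine never sees the thrown object, so every renderer works from the same structured data.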

Identity model

Primary code: W3C first

When XPath, XSLT, or serialization defines a code, that code is the canonical identity.

Examples:

XPST0003
XPDY0002
XPTY0004
XTSE0010
XTDE0040
SENR0001

This is better than introducing a second parallel naming system because:

users already see these codes in the spec and existing tooling
conformance tests key off these codes
the current engine already models them in src/errors/codes.ts

Weaver-local codes

Use Weaver-local codes only when the failure is outside W3C language semantics, such as:

codegen emitter failures
source-map generation failures
watch-mode file resolution failures
internal invariant failures where no W3C code fits

Suggested format:

WEAVER_<AREA>_<REASON>

Examples:

WEAVER_CODEGEN_EMIT_FAILED
WEAVER_SOURCEMAP_BUILD_FAILED
WEAVER_WATCH_RESOLVE_FAILED
WEAVER_INTERNAL_IR_INVARIANT_FAILED

Local codes should stay rare. If a W3C code is even reasonably correct, prefer it.
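
One way to keep the two families distinguishable is a small classifier. This is a sketch under two assumptions: W3C codes follow the four-letters-plus-four-digits pattern of the examples above, and local codes use uppercase letters only in each WEAVER_ segment. classifyCode and CodeOrigin are hypothetical names.

```typescript
// Assumed pattern: four uppercase letters then four digits, as in XPST0003.
const W3C_CODE = /^[A-Z]{4}[0-9]{4}$/;

// Assumed pattern: WEAVER_<AREA>_<REASON>, where REASON may span several segments.
const WEAVER_CODE = /^WEAVER_[A-Z]+(?:_[A-Z]+)+$/;

type CodeOrigin = 'w3c' | 'weaver' | 'invalid';

// Classifies a diagnostic code by shape only; it does not check that the
// code is actually defined by a spec or registered by Weaver.
function classifyCode(code: string): CodeOrigin {
  if (W3C_CODE.test(code)) return 'w3c';
  if (WEAVER_CODE.test(code)) return 'weaver';
  return 'invalid';
}
```

A shape check like this catches typos and ad hoc strings early; whether a code is a real member of its family still needs a lookup against src/errors/codes.ts.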

Report shape

A durable DiagnosticReport should carry these fields:

export type DiagnosticPhase =
  | 'compile'
  | 'runtime'
  | 'serialization'
  | 'codegen'
  | 'internal';

export type DiagnosticSeverity = 'error' | 'warning' | 'note';

export type DiagnosticCategory =
  | 'syntax'
  | 'type'
  | 'resolution'
  | 'analysis'
  | 'execution'
  | 'serialization'
  | 'internal';

export interface SourceSpan {
  uri?: string;
  offsetStart: number;
  offsetEnd: number;
  lineStart: number;
  columnStart: number;
  lineEnd: number;
  columnEnd: number;
}

export interface RelatedSpan {
  label: string;
  span: SourceSpan;
}

export interface DiagnosticFrame {
  kind: 'template' | 'instruction' | 'xpath' | 'function' | 'mode';
  label: string;
  span?: SourceSpan;
}

export type KnownDetailKind =
  | 'sequenceType'
  | 'qname'
  | 'axis'
  | 'functionSignature'
  | 'templateRef'
  | 'instructionRef';

export type DiagnosticDetailValue =
  | string
  | number
  | boolean
  | { kind: KnownDetailKind | string; [key: string]: unknown };

export interface DiagnosticDetail {
  key: string;
  value: DiagnosticDetailValue;
}

export type DiagnosticSuggestionKind = 'fix' | 'hint' | 'alternative';

export interface DiagnosticSuggestion {
  kind: DiagnosticSuggestionKind;
  label: string;
  replacement?: string;
  confidence?: number;
}

export interface DiagnosticReport {
  code: string;
  phase: DiagnosticPhase;
  severity: DiagnosticSeverity;
  category: DiagnosticCategory;
  message: string;
  primary?: SourceSpan;
  related: readonly RelatedSpan[];
  frames: readonly DiagnosticFrame[];
  details: readonly DiagnosticDetail[];
  suggestions: readonly DiagnosticSuggestion[];
  causes: readonly DiagnosticReport[];
}

The important design choice is that details, related, frames, suggestions, and causes are structured. They must not be collapsed back into the message string.

Why not a Tosumu-style status enum?

Tosumu uses statuses like Busy, Conflict, and IntegrityFailure because it is a storage engine with CLI/process boundaries.

Weaver is a compiler/runtime product. The broad policy axes that matter here are:

code: stable identity
phase: when in the lifecycle it happened
category: what kind of problem it is
severity: how serious it is

That is a better fit than a database-style status enum.

If we later need exit-code policy for a CLI, it should be derived from phase and severity, with code-specific overrides where necessary.
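
A sketch of what such a derivation could look like. The numeric exit codes, the EXIT_OVERRIDES table, and the exitCodeFor name are all assumptions made for illustration; no CLI exit-code policy has been decided.

```typescript
type Phase = 'compile' | 'runtime' | 'serialization' | 'codegen' | 'internal';
type Severity = 'error' | 'warning' | 'note';

interface ExitInput {
  code: string;
  phase: Phase;
  severity: Severity;
}

// Hypothetical code-specific overrides; the value is illustrative only.
const EXIT_OVERRIDES: Record<string, number> = {
  WEAVER_WATCH_RESOLVE_FAILED: 66,
};

// Derives a process exit code from phase and severity, with overrides first.
function exitCodeFor(report: ExitInput): number {
  if (report.severity !== 'error') return 0; // warnings and notes do not fail the run
  const override: number | undefined = EXIT_OVERRIDES[report.code];
  if (override !== undefined) return override;
  switch (report.phase) {
    case 'compile':
      return 2; // assumed: the stylesheet itself is at fault
    case 'internal':
      return 70; // assumed: a Weaver bug, not a user error
    default:
      return 1; // runtime, serialization, codegen
  }
}
```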

phase is intentionally coarse. compile includes lexing, parsing, resolution, static analysis, and compile-time type checking. Finer distinctions belong in category and, when useful, structured details such as analysisPass: 'type'.

category is intentionally about problem kind, not lifecycle. That means:

phase = 'runtime', category = 'type' is meaningful
phase = 'runtime', category = 'execution' is the right pairing for evaluator failures

If a category starts restating a phase, the category is wrong.

Source locations

Diagnostics-first means source locations are not optional metadata.

SourceSpan must use one canonical coordinate system:

uri: source identity for files, virtual buffers, or generated artifacts
offsetStart / offsetEnd: UTF-16 code unit offsets
lineStart / columnStart / lineEnd / columnEnd: 1-based human-facing positions

The UTF-16 rule is deliberate because this project lives in TypeScript, source maps, and editor tooling that already operate in UTF-16 offsets. If one part of the engine uses bytes and another uses UTF-16, diagnostics will drift.
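
The coordinate system above can be sketched as a helper that derives line and column from UTF-16 offsets. computeLineStarts, positionAt, and spanFromOffsets are hypothetical helpers; the sketch assumes offsetEnd is exclusive and that only '\n' terminates a line.

```typescript
interface SourceSpan {
  uri?: string;
  offsetStart: number;
  offsetEnd: number;
  lineStart: number;
  columnStart: number;
  lineEnd: number;
  columnEnd: number;
}

// UTF-16 offsets at which each line begins. Only '\n' ends a line here.
function computeLineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) starts.push(i + 1);
  }
  return starts;
}

// Maps a UTF-16 offset to a 1-based line and column via binary search.
function positionAt(starts: readonly number[], offset: number): { line: number; column: number } {
  let low = 0;
  let high = starts.length - 1;
  while (low < high) {
    const mid = (low + high + 1) >> 1;
    if (starts[mid] <= offset) low = mid;
    else high = mid - 1;
  }
  return { line: low + 1, column: offset - starts[low] + 1 };
}

// Builds a full span from two UTF-16 offsets (end treated as exclusive).
function spanFromOffsets(text: string, offsetStart: number, offsetEnd: number, uri?: string): SourceSpan {
  if (offsetStart < 0 || offsetEnd < offsetStart || offsetEnd > text.length) {
    throw new RangeError(`invalid span ${offsetStart}..${offsetEnd}`);
  }
  const starts = computeLineStarts(text);
  const start = positionAt(starts, offsetStart);
  const end = positionAt(starts, offsetEnd);
  return {
    uri,
    offsetStart,
    offsetEnd,
    lineStart: start.line,
    columnStart: start.column,
    lineEnd: end.line,
    columnEnd: end.column,
  };
}
```

Because columns count UTF-16 code units, a character outside the Basic Multilingual Plane advances the column by two, which matches what JavaScript string indexing reports.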

Rules:

Tokens carry full spans.
XPath AST nodes carry full spans.
Stylesheet AST and IR nodes carry full spans.
Runtime failures keep the most precise XPath or instruction span available.
Codegen preserves source-map fidelity so generated-code failures can still be reported against the original .xsl.

A location-lite error is incomplete, not merely less polished.

Related spans

Many Weaver failures need more than one location.

Examples:

the current offending XPath expression
the containing template match declaration
the apply-templates call site that invoked it
the conflicting template when priorities overlap
the previous declaration when a name is duplicated

These secondary locations belong in related, not in prose-only messages.

Example:

XTSE0010: unknown XSLT element `xsl:vale-of`

  at invoice.xsl:42:4
       <xsl:vale-of select="total"/>
        ^^^^^^^^^^^

related:
  did you mean `xsl:value-of`

Runtime frames

Dynamic errors need call context, not just a single point location.

A runtime diagnostic should be able to say:

which template was executing
which instruction failed
which caller invoked that template
which mode or function context applied

This is the Weaver equivalent of Tosumu preserving error cause and structured context.

Suggested frame examples:

in template match="invoice/total" (invoice.xsl:39)
called from apply-templates select="total" (invoice.xsl:24)

These should come from structured frames, not be assembled ad hoc in every formatter.
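
A small sketch of rendering those lines from structured frames. It assumes frames are ordered innermost first and that the label already holds the display text; formatFrames and the trimmed FrameInput shape are hypothetical.

```typescript
// Trimmed frame shape; the span keeps only what this renderer prints.
interface FrameInput {
  kind: 'template' | 'instruction' | 'xpath' | 'function' | 'mode';
  label: string;
  span?: { uri?: string; lineStart: number };
}

// Renders one line per frame. Assumes index 0 is the innermost frame.
function formatFrames(frames: readonly FrameInput[]): string[] {
  return frames.map((frame, index) => {
    const lead = index === 0 ? 'in' : 'called from';
    const where = frame.span
      ? ` (${frame.span.uri ?? '<unknown>'}:${frame.span.lineStart})`
      : '';
    return `  ${lead} ${frame.label}${where}`;
  });
}
```

With one function owning this layout, a change to the frame wording touches a single place rather than every formatter.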

Suggestions

Suggestions are part of the product, not optional polish.

Use them for cases like:

misspelled XSLT instruction names
misspelled function names
unknown variable names with close matches
obvious replacement for invalid string/number concatenation

Suggestions should be represented structurally:

{
  kind: 'fix',
  label: 'did you mean',
  replacement: "concat(string(amount), ' USD')",
  confidence: 0.92,
}

That lets different boundaries render them differently while preserving the same meaning.

confidence must use a stable 0.0..1.0 scale:

1.0: deterministic fix; safe for auto-apply if the boundary supports it
0.5: likely suggestion; good default for editor hints and ranked alternatives
< 0.3: weak hint; show only when the boundary wants low-confidence guidance

If a suggestion generator cannot explain its confidence policy, it should omit the field.
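
A sketch of how a boundary might bucket suggestions by confidence. The tier names and the tierFor helper are assumptions; in particular, treating everything from 0.3 up to but not including 1.0 as "likely" is one reading of the scale above, not a settled policy.

```typescript
type SuggestionTier = 'auto-apply' | 'likely' | 'weak' | 'unranked';

// Buckets a confidence value on the 0.0..1.0 scale.
// An omitted confidence stays unranked rather than being guessed.
function tierFor(confidence?: number): SuggestionTier {
  if (confidence === undefined) return 'unranked';
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence out of range: ${confidence}`);
  }
  if (confidence >= 1) return 'auto-apply'; // deterministic fix
  if (confidence < 0.3) return 'weak'; // low-confidence guidance
  return 'likely'; // assumed: the whole middle band
}
```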

Throwable shape inside the engine

XdmError and its subclasses should stay as the engine-facing throwable types:

XdmError
├─ XPathError
├─ XsltError
└─ SerializationError

XdmError should grow to carry the structured information needed to produce a DiagnosticReport, for example:

code
message
phase
category
primary
related
frames
details
suggestions
JS-native cause

The engine throws XdmError; boundaries call something like toDiagnosticReport().

Inside the engine, a thrown XdmError may still use JS-native Error.cause. At the boundary, that is normalized into causes: readonly DiagnosticReport[]. The durable report contract should use plural causes because static analysis and aggregated compile errors can legitimately have more than one underlying diagnostic.

Normalization rules:

If cause is another XdmError, convert it to a DiagnosticReport and append it.
If cause is already a DiagnosticReport, append it directly.
If cause is an unknown Error, project it to a single WEAVER_INTERNAL_* diagnostic with preserved message text in details.
Flatten cause chains during boundary conversion; do not create recursively nested DiagnosticReport trees-of-trees when a flat causes[] list preserves the meaning.
Boundary formatters should guard against cycles even if the engine accidentally creates one.
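
The normalization rules can be sketched as a single loop over the cause chain. EngineError is a hypothetical stand-in for XdmError, CauseReport is a trimmed report shape, and WEAVER_INTERNAL_UNKNOWN_CAUSE and originalMessage are assumed names. For brevity, a cause that is already a report is appended as-is without hoisting its own nested causes.

```typescript
// Trimmed report shape; only the fields this sketch touches.
interface CauseReport {
  code: string;
  message: string;
  details: { key: string; value: string }[];
  causes: CauseReport[];
}

// Hypothetical stand-in for XdmError with a JS-native cause.
class EngineError extends Error {
  code: string;
  cause?: unknown;

  constructor(code: string, message: string, cause?: unknown) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.code = code;
    this.cause = cause;
  }
}

// Reads a cause property if the object has one.
function causeOf(value: object): unknown {
  return 'cause' in value ? (value as { cause: unknown }).cause : undefined;
}

// Loose structural check for a value that is already a report.
function isReport(value: unknown): value is CauseReport {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as { code?: unknown; causes?: unknown };
  return typeof candidate.code === 'string' && Array.isArray(candidate.causes);
}

// Walks the cause chain of `error` and returns one flat list.
function flattenCauses(error: EngineError): CauseReport[] {
  const flat: CauseReport[] = [];
  const seen = new Set<unknown>([error]); // cycle guard
  let current: unknown = error.cause;
  while (current !== undefined && current !== null && !seen.has(current)) {
    seen.add(current);
    if (current instanceof EngineError) {
      flat.push({ code: current.code, message: current.message, details: [], causes: [] });
      current = current.cause;
    } else if (isReport(current)) {
      flat.push(current); // already a report: append directly
      current = undefined;
    } else if (current instanceof Error) {
      flat.push({
        code: 'WEAVER_INTERNAL_UNKNOWN_CAUSE',
        message: 'unexpected internal error',
        details: [{ key: 'originalMessage', value: current.message }],
        causes: [],
      });
      current = causeOf(current);
    } else {
      break; // non-Error causes are dropped in this sketch
    }
  }
  return flat;
}
```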

Formatter boundary

Human-readable text should be produced by a dedicated diagnostics module, not by hand-building strings all over the engine.

Planned boundary:

src/diagnostics/
  report.ts      // DiagnosticReport types and conversions
  format.ts      // formatDiagnostic(report, sourceText)
  json.ts        // JSON-safe projection if needed later

For MVP+1, the minimum formatter output is the D1-style caret format:

XPTY0004: expected xs:string, got xs:integer (1)

  at invoice.xsl:42:18
         <xsl:value-of select="amount + ' USD'"/>
                                      ^^^^^^^^^
  in template match="invoice/total" (invoice.xsl:39)
  called from apply-templates select="total" (invoice.xsl:24)

did you mean: concat(string(amount), ' USD')
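
A minimal sketch of the header, location, and caret portion of that format. The exact indentation of the real formatter is not specified here, so the four-space gutter is an assumption, as are the trimmed FormatInput shape and the exclusive columnEnd. Frames and suggestions are omitted, and tab characters in the source line are not expanded.

```typescript
interface FormatSpan {
  uri?: string;
  lineStart: number;
  columnStart: number;
  lineEnd: number;
  columnEnd: number;
}

interface FormatInput {
  code: string;
  message: string;
  primary?: FormatSpan;
}

// Renders "CODE: message", then the source line with carets under the span.
function formatDiagnostic(report: FormatInput, sourceText: string): string {
  const out = [`${report.code}: ${report.message}`];
  const span = report.primary;
  if (span) {
    const sourceLine = sourceText.split('\n')[span.lineStart - 1] ?? '';
    // Multi-line spans underline to the end of the first line only.
    const endColumn = span.lineEnd === span.lineStart ? span.columnEnd : sourceLine.length + 1;
    const width = Math.max(1, endColumn - span.columnStart);
    const gutter = '    '; // assumed fixed indent
    out.push('');
    out.push(`  at ${span.uri ?? '<unknown>'}:${span.lineStart}:${span.columnStart}`);
    out.push(gutter + sourceLine);
    out.push(gutter + ' '.repeat(span.columnStart - 1) + '^'.repeat(width));
  }
  return out.join('\n');
}
```

Since the function takes the report and the source text as plain inputs, byte-exact golden tests can exercise it without running the engine.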

Compile-time diagnostics vs runtime failures

Weaver should use one shared report shape for both:

parse errors
static analysis findings
type mismatches
runtime transform failures
serialization failures
codegen failures

The shape is identical in every case; only the values of phase, category, and severity differ.

Examples:

parse failure: phase = 'compile', category = 'syntax', severity = 'error'
unreachable template: phase = 'compile', category = 'analysis', severity = 'warning'
compile-time type failure: phase = 'compile', category = 'type', severity = 'error'
type mismatch in evaluation: phase = 'runtime', category = 'type', severity = 'error'
sourcemap emit bug: phase = 'codegen', category = 'internal', severity = 'error'

This keeps watch mode, tests, interpreter, and codegen on one contract.

Details usage

Use details for stable, machine-meaningful fields such as:

expectedType
actualType
functionName
variableName
mode
axis
templateMatch
instructionKind

Do not duplicate prose in details, and do not store large opaque payloads there.

Some details are inherently structured and should stay that way. Examples:

{ key: 'expectedType', value: { kind: 'sequenceType', raw: 'xs:string?' } }
{ key: 'functionName', value: { kind: 'qname', prefix: 'fn', local: 'concat' } }
{ key: 'axis', value: { kind: 'axis', name: 'descendant-or-self' } }

If a detail needs later comparison, grouping, or transformation, do not squeeze it into a string too early.

Classification invariants

This design only stays useful if the contract is validated.

At minimum, development builds or test helpers should have an invariant check such as assertValidDiagnostic(report).

Suggested checks:

W3C codes match their expected family shape when not using a WEAVER_* local code
phase, category, and severity are present and internally consistent
primary and related spans use valid UTF-16 offset ordering
code families and categories line up reasonably (XPST* should not ship as category = 'execution')
code-specific required details exist when the engine depends on them

This does not need to become a framework. It does need to be strict enough that no one can casually ship code: 'oops' and call it structured diagnostics.

Required details should be explicit rather than implied in prose. A small map is enough:

const REQUIRED_DETAILS: Record<string, readonly string[]> = {
  XPTY0004: ['expectedType', 'actualType'],
  XPST0017: ['functionName'],
  XTDE0040: ['mode'],
  XTSE0165: ['href'],
};

This table should stay deliberately small and only cover codes where missing detail fields would make the diagnostic materially less useful.
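
A sketch of what assertValidDiagnostic could check, using the REQUIRED_DETAILS table above. The code-shape patterns, the diagnosticProblems helper, and the trimmed CheckedReport type are assumptions; only a subset of the suggested checks is shown.

```typescript
// Trimmed report shape; only the fields the checks read.
interface CheckedReport {
  code: string;
  category: string;
  primary?: { offsetStart: number; offsetEnd: number };
  details: readonly { key: string }[];
}

const REQUIRED_DETAILS: Record<string, readonly string[]> = {
  XPTY0004: ['expectedType', 'actualType'],
  XPST0017: ['functionName'],
  XTDE0040: ['mode'],
  XTSE0165: ['href'],
};

// Assumed code shapes; see the identity model for the naming rules.
const W3C_SHAPE = /^[A-Z]{4}[0-9]{4}$/;
const LOCAL_SHAPE = /^WEAVER_[A-Z]+(?:_[A-Z]+)+$/;

// Returns every violated invariant so a test can report them all at once.
function diagnosticProblems(report: CheckedReport): string[] {
  const problems: string[] = [];
  if (!W3C_SHAPE.test(report.code) && !LOCAL_SHAPE.test(report.code)) {
    problems.push(`unrecognized code shape: ${report.code}`);
  }
  const span = report.primary;
  if (span && (span.offsetStart < 0 || span.offsetEnd < span.offsetStart)) {
    problems.push('primary span offsets are out of order');
  }
  if (report.code.slice(0, 4) === 'XPST' && report.category === 'execution') {
    problems.push('XPST* codes are static and should not use category execution');
  }
  const present = report.details.map((detail) => detail.key);
  for (const key of REQUIRED_DETAILS[report.code] ?? []) {
    if (present.indexOf(key) === -1) problems.push(`missing required detail: ${key}`);
  }
  return problems;
}

// Throws on the first invalid report; intended for dev builds and tests.
function assertValidDiagnostic(report: CheckedReport): void {
  const problems = diagnosticProblems(report);
  if (problems.length > 0) {
    throw new Error(`invalid diagnostic ${report.code}: ${problems.join('; ')}`);
  }
}
```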

Immutability at the boundary

DiagnosticReport is a contract object. Treat it as immutable once created.

In practice that means one of:

construct reports through small factory helpers
freeze reports before exposing them at boundaries
avoid in-place mutation after formatting, testing, or boundary translation begins

The exact mechanism is less important than the rule: boundary diagnostics should not be mutable bags of fields that different layers casually rewrite.
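
If freezing is the chosen mechanism, a recursive freeze is enough for plain JSON-shaped reports. This is a sketch; deepFreeze and freezeTree are hypothetical helpers, and they assume the report contains only plain objects and arrays.

```typescript
// Freezes an object graph in place. Freezing before recursing means a
// cyclic graph terminates, because an already-frozen node is skipped.
function freezeTree(value: unknown): void {
  if (typeof value !== 'object' || value === null || Object.isFrozen(value)) return;
  Object.freeze(value);
  for (const key of Object.keys(value)) {
    freezeTree((value as Record<string, unknown>)[key]);
  }
}

// Convenience wrapper that returns the same (now frozen) value.
function deepFreeze<T>(value: T): T {
  freezeTree(value);
  return value;
}
```

One caveat of this approach: a node that was shallow-frozen earlier is skipped, so its children stay mutable unless they are frozen separately.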

Boundary translation

Different boundaries may render the same DiagnosticReport differently:

CLI compile/run command: rich human-readable stderr
watch mode: streaming formatted diagnostics to stdout/stderr
tests: byte-exact golden strings or object snapshots
editor tooling: squiggles, hover, related locations, quick fixes
codegen parity tests: compare structured reports between interpreter and generated code

The stable contract is the report object, not any specific renderer.

Semantic parity rule

A feature that exists in both interpreter and codegen backends should produce equivalent structured diagnostics, not merely similar prose.

Parity means:

same canonical code
same phase, category, and severity
same primary span when the same source is available
same relevant details
same logical suggestions

String wording may differ slightly in development, but the structured meaning must match.
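
A parity test can compare a structural projection of each report and ignore the message. The sketch below assumes a trimmed ParityReport shape; parityKey and haveParity are hypothetical helpers, and comparing detail values through JSON.stringify assumes both backends emit object keys in the same order.

```typescript
interface ParityReport {
  code: string;
  phase: string;
  category: string;
  severity: string;
  message: string;
  primary?: { uri?: string; offsetStart: number; offsetEnd: number };
  details: readonly { key: string; value: unknown }[];
  suggestions: readonly { kind: string; replacement?: string }[];
}

// Projects the fields that define parity into a comparable string.
// The message is deliberately left out.
function parityKey(report: ParityReport): string {
  const details = report.details
    .slice()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .map((detail) => [detail.key, detail.value]);
  const primary = report.primary
    ? [report.primary.uri ?? null, report.primary.offsetStart, report.primary.offsetEnd]
    : null;
  return JSON.stringify({
    code: report.code,
    phase: report.phase,
    category: report.category,
    severity: report.severity,
    primary,
    details,
    suggestions: report.suggestions.map((s) => [s.kind, s.replacement ?? null]),
  });
}

// True when two reports carry the same structured meaning.
function haveParity(a: ParityReport, b: ParityReport): boolean {
  return parityKey(a) === parityKey(b);
}
```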

Suggested module ownership

Start small.

Suggested first implementation:

src/errors/
  XdmError.ts
  XPathError.ts
  XsltError.ts
  SerializationError.ts
  codes.ts

src/diagnostics/
  report.ts
  format.ts

Why this shape:

the error class hierarchy already exists in src/errors/
formatter logic should not live inside engine exceptions
diagnostics are a first-class product surface and deserve their own module

Rollout plan

Phase 1 — MVP+1 diagnostic bones

expand SourceLocation into a full source span shape
introduce DiagnosticReport
add toDiagnosticReport() conversion from XdmError
add formatDiagnostic(report, sourceText)
lock one or two byte-exact formatter tests for XPath parse/type failures

Phase 2 — XPath evaluator context

add structured details for expected/actual type, function names, and context failures
make evaluator errors emit structured phase = 'runtime'
extend QT3-focused tests to assert codes and structured fields where practical

Phase 3 — XSLT runtime context

add runtime frames for template, instruction, caller chain, and mode
add related spans for containing template and caller locations
ensure apply-templates and template dispatch preserve this information

Phase 4 — codegen parity

make generated TypeScript reconstruct or emit equivalent DiagnosticReport values
compare interpreter and codegen diagnostics in fixture tests
ensure source maps preserve XSLT-facing locations

Phase 5 — watch mode and editor surfaces

stream formatted diagnostics in watch mode
add a JSON-safe projection if a future CLI or editor protocol needs it
preserve one stable report contract across all user-facing surfaces

Rules to keep this small

No string parsing as control flow.
No second naming system when a W3C code already exists.
No giant formatter switch scattered across parser, evaluator, compiler, and codegen.
No boundary-specific concepts inside core XPath/XSLT logic.
No feature is done if it still produces poor diagnostics.

The goal is not an error framework. The goal is a stable, inspectable, product-quality contract for failures and diagnostics that makes XSLT debugging stop feeling punitive.
