WeaverPDF v1

This note defines the first bounded implementation target for WeaverPDF.

The goal is neither to implement the full EzPDF reference nor to start with FO. It is to ship a small, solid Markdown-first PDF renderer with a clean engine boundary that later features can build on.

Goal

Render ordinary GitHub-flavored Markdown documents to professional paged PDFs with predictable layout, readable defaults, and a clean internal architecture.

The first slice should prove:

  • Markdown parsing
  • normalized document AST
  • layout IR
  • basic pagination
  • PDF output

without taking on the full directive and page-composition surface.

Non-goals

The first WeaverPDF slice should explicitly avoid:

  • FO or XSL-FO input
  • full EzPDF syntax parity
  • advanced directive tables
  • multi-column layout
  • foldouts
  • blank-page and parity logic
  • cross-references and multi-pass dynamic placeholders
  • widows/orphans, keeps, floats, or advanced pagination policy
  • bidi/RTL shaping beyond what the chosen text stack can safely support

Input surface

In scope

  • standard headings (# through ######)
  • paragraphs
  • emphasis and strong
  • strikethrough
  • inline code
  • fenced code blocks
  • blockquotes
  • ordered and unordered lists
  • links
  • images from local paths
  • thematic breaks
  • GFM tables

Deferred

  • custom block directives
  • inline styled spans ({text|...})
  • symbols and icon shortcuts
  • YAML frontmatter semantics beyond optional metadata capture
  • variables, loops, filters, anchors, and dynamic placeholders
  • directive-table YAML mode

Architecture target

WeaverPDF v1 should not render directly from a third-party Markdown AST.

The implementation boundary should be:

Markdown parser
  → WeaverPDF document AST
  → WeaverPDF layout IR
  → PDF backend

The document AST and layout IR contracts are described in WEAVERPDF_ARCHITECTURE.md.
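
The boundary can be sketched as three stage functions, each consuming only the previous stage's output. This is a minimal illustration, not the contract from WEAVERPDF_ARCHITECTURE.md: every name here (`ParserNode`, `Stages`, `renderPipeline`, `toyStages`) is hypothetical, and the toy backend emits a string where a real one would emit PDF bytes.

```typescript
// Hypothetical stage boundary sketch. All names are illustrative.

// Stand-in for a third-party parser node (for example an mdast node).
interface ParserNode {
  type: string;
  value?: string;
  children?: ParserNode[];
}

// Owned document AST, reduced to one node kind for brevity.
interface DocParagraph {
  kind: "Paragraph";
  text: string;
}

// Owned layout IR: positioned text boxes grouped by page.
interface LayoutBox {
  x: number;
  y: number;
  text: string;
}
interface LayoutPage {
  boxes: LayoutBox[];
}

interface Stages {
  normalize(root: ParserNode): DocParagraph[];
  layout(doc: DocParagraph[]): LayoutPage[];
  emit(pages: LayoutPage[]): string; // a real backend would return PDF bytes
}

// Each stage sees only the output of the previous one, so the backend
// never touches ParserNode.
function renderPipeline(root: ParserNode, stages: Stages): string {
  const doc = stages.normalize(root);
  const pages = stages.layout(doc);
  return stages.emit(pages);
}

// Toy stages that make the sketch executable end to end.
const toyStages: Stages = {
  normalize: (root) =>
    (root.children ?? [])
      .filter((n) => n.type === "paragraph")
      .map((n) => ({
        kind: "Paragraph" as const,
        text: (n.children ?? []).map((c) => c.value ?? "").join(""),
      })),
  layout: (doc) => [
    { boxes: doc.map((p, i) => ({ x: 72, y: 72 + i * 14, text: p.text })) },
  ],
  emit: (pages) =>
    pages
      .map((pg, i) => `page ${i + 1}: ${pg.boxes.map((b) => b.text).join(" | ")}`)
      .join("\n"),
};
```

Because `emit` is typed against `LayoutPage[]` only, swapping the parser changes `normalize` and nothing downstream.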

Parser

Use a mature Markdown parser rather than writing one by hand.

Good candidates are parser stacks in the remark / mdast ecosystem. The parser is a dependency. The normalized AST, layout IR, and renderer are owned in-tree.

Document AST

The v1 AST should be minimal and explicit, with nodes along these lines:

  • Document
  • Heading
  • Paragraph
  • Text
  • Emphasis
  • Strong
  • Delete
  • InlineCode
  • CodeBlock
  • BlockQuote
  • List
  • ListItem
  • Link
  • Image
  • ThematicBreak
  • Table
  • TableRow
  • TableCell
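
One possible encoding of this node list is a pair of discriminated unions, one for inline nodes and one for blocks. The node names follow the list above; the field names (`children`, `value`, `url`, `src`, `alt`, `lang`, `header`), the `DocumentNode` type name, and the `inlineText` helper are assumptions made for illustration.

```typescript
// Hypothetical shape for the owned v1 AST. Field names are assumptions.

type Inline =
  | { kind: "Text"; value: string }
  | { kind: "Emphasis"; children: Inline[] }
  | { kind: "Strong"; children: Inline[] }
  | { kind: "Delete"; children: Inline[] }
  | { kind: "InlineCode"; value: string }
  | { kind: "Link"; url: string; children: Inline[] }
  | { kind: "Image"; src: string; alt: string };

interface TableCell {
  kind: "TableCell";
  children: Inline[];
}
interface TableRow {
  kind: "TableRow";
  header: boolean;
  cells: TableCell[];
}
interface ListItem {
  kind: "ListItem";
  children: Block[];
}

type Block =
  | { kind: "Heading"; level: 1 | 2 | 3 | 4 | 5 | 6; children: Inline[] }
  | { kind: "Paragraph"; children: Inline[] }
  | { kind: "CodeBlock"; lang: string | null; value: string }
  | { kind: "BlockQuote"; children: Block[] }
  | { kind: "List"; ordered: boolean; items: ListItem[] }
  | { kind: "ThematicBreak" }
  | { kind: "Table"; rows: TableRow[] };

// Named DocumentNode to avoid clashing with the DOM's global Document type.
interface DocumentNode {
  kind: "Document";
  children: Block[];
}

// Exhaustive switch: adding a new Inline kind without handling it here
// becomes a compile error through the `never` assignment.
function inlineText(node: Inline): string {
  switch (node.kind) {
    case "Text":
    case "InlineCode":
      return node.value;
    case "Emphasis":
    case "Strong":
    case "Delete":
    case "Link":
      return node.children.map(inlineText).join("");
    case "Image":
      return node.alt;
    default: {
      const unreachable: never = node;
      return unreachable;
    }
  }
}
```

Keeping the unions closed is what makes "minimal and explicit" checkable: a construct outside the list has no representation, so normalization must either map it or report it.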

Layout IR

The layout IR should capture page/layout concerns rather than source syntax.

The initial IR only needs enough power for:

  • vertical block flow
  • inline text runs
  • simple line breaking
  • list indentation
  • code block boxes
  • image sizing constraints
  • table column measurement and row layout
  • page breaks generated by overflow
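
As a rough illustration of "vertical block flow" plus "page breaks generated by overflow", the sketch below stacks already-measured blocks and starts a new page when the next block would not fit. The types and the `paginate` function are hypothetical, and the handling of a block taller than the page (it is placed alone and left to overflow) is a simplifying assumption rather than a stated v1 policy.

```typescript
// Hypothetical minimal flow over pre-measured blocks. Names are illustrative.
interface MeasuredBlock {
  id: string;
  height: number;
}
interface PlacedBlock {
  id: string;
  y: number; // offset from the top of the content box
  height: number;
}
interface Page {
  blocks: PlacedBlock[];
}

// Stack blocks top to bottom inside a content box of `contentHeight`.
// `gap` is the vertical spacing between consecutive blocks on one page.
function paginate(
  blocks: MeasuredBlock[],
  contentHeight: number,
  gap = 0
): Page[] {
  const pages: Page[] = [];
  let current: PlacedBlock[] = [];
  let cursor = 0; // bottom edge of the last placed block on this page

  for (const block of blocks) {
    let y = current.length === 0 ? 0 : cursor + gap;
    // Break only if the page already holds something; an oversized block
    // on an empty page is placed anyway (assumption: no block splitting).
    if (current.length > 0 && y + block.height > contentHeight) {
      pages.push({ blocks: current });
      current = [];
      y = 0;
    }
    current.push({ id: block.id, y, height: block.height });
    cursor = y + block.height;
  }

  if (current.length > 0) {
    pages.push({ blocks: current });
  }
  return pages;
}
```

With a 648 pt content box and a 12 pt gap, four blocks of heights 300, 300, 300, and 100 land as two blocks per page: the third block would end at 924 pt, so it opens page two.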

Rendering behavior

v1 page model

  • single page size per document
  • single margin box per document
  • no header/footer support required; it can wait until after the first slice unless it falls out naturally from the page model
  • single-column page flow only
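
With one page size and one margin box per document, the page model reduces to a single content rectangle. The sketch below assumes PDF points as the unit and a top-left origin for layout coordinates; `PageModel`, `Margins`, `Rect`, and `contentBox` are hypothetical names.

```typescript
// Hypothetical page model. Units are PDF points (1/72 inch).
interface Margins {
  top: number;
  right: number;
  bottom: number;
  left: number;
}
interface PageModel {
  width: number;
  height: number;
  margins: Margins;
}
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// The content box is the page minus its margins. Every block in the
// single-column flow is laid out inside this rectangle.
// Assumption: y is measured from the top edge. PDF's default user space
// has its origin at the bottom-left, so a backend would flip the axis.
function contentBox(page: PageModel): Rect {
  const width = page.width - page.margins.left - page.margins.right;
  const height = page.height - page.margins.top - page.margins.bottom;
  if (width <= 0 || height <= 0) {
    throw new Error("margins leave no room for content");
  }
  return { x: page.margins.left, y: page.margins.top, width, height };
}

// US Letter (8.5 x 11 in) with one-inch margins on every side.
const letter: PageModel = {
  width: 612,
  height: 792,
  margins: { top: 72, right: 72, bottom: 72, left: 72 },
};
```

For `letter`, the content box is 468 × 648 pt at offset (72, 72).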

v1 styling model

  • one built-in default theme
  • heading size hierarchy
  • readable body font and spacing defaults
  • monospace treatment for code
  • basic link styling
  • simple blockquote styling
  • simple table styling

The point is not theme richness. The point is predictable, good default output.

Diagnostics

Even in v1, diagnostics should remain explicit.

The first slice should report:

  • Markdown parse failures if the parser exposes them
  • unsupported v1 features encountered in the normalization step
  • missing local image resources
  • layout failures that the engine can detect deterministically

Prefer compile-time or normalization-time diagnostics over silent runtime degradation when practical.
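
A normalization-time check for unsupported constructs might look like the following sketch. The node type strings follow mdast naming and mirror the in-scope list; the `Diagnostic` shape, the `unsupported-node` code, and `collectUnsupported` are hypothetical, and whether such a finding should be an error or a warning is left open here.

```typescript
// Hypothetical diagnostic record. Field names are illustrative.
interface Diagnostic {
  severity: "error" | "warning";
  code: string;
  message: string;
  line: number | null; // source line, when the parser provides one
}

// Minimal view of a parser node: a type tag, an optional source position,
// and optional children (the shape unist-style trees use).
interface SourceNode {
  type: string;
  position?: { start: { line: number } };
  children?: SourceNode[];
}

// mdast type names for the v1 in-scope constructs (assumed mapping).
const V1_NODE_TYPES = new Set<string>([
  "root", "heading", "paragraph", "text", "emphasis", "strong", "delete",
  "inlineCode", "code", "blockquote", "list", "listItem", "link", "image",
  "thematicBreak", "table", "tableRow", "tableCell",
]);

// Walk the tree and report every node outside the v1 surface.
// An unsupported node is reported once; its subtree is not visited,
// so nested content does not produce duplicate reports.
function collectUnsupported(
  node: SourceNode,
  out: Diagnostic[] = []
): Diagnostic[] {
  if (!V1_NODE_TYPES.has(node.type)) {
    out.push({
      severity: "error",
      code: "unsupported-node",
      message: `"${node.type}" is not supported in v1`,
      line: node.position?.start.line ?? null,
    });
    return out;
  }
  for (const child of node.children ?? []) {
    collectUnsupported(child, out);
  }
  return out;
}
```

Returning a list instead of throwing on the first hit lets one run surface every unsupported construct in a document.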

Testing

WeaverPDF v1 should ship with three kinds of tests.

1. normalization tests

Snapshot tests from input Markdown to the normalized WeaverPDF AST for:

  • headings
  • lists
  • code blocks
  • links
  • images
  • tables

2. layout tests

Small focused fixtures for:

  • paragraph pagination
  • list indentation
  • code block sizing
  • image scaling
  • table measurement and wrapping
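
A fixture of the "image scaling" kind can be very small. The sketch below assumes one plausible rule, fit inside the content box while preserving aspect ratio and never upscaling; that rule, the `Size` type, and `fitImage` are illustrative assumptions, not settled v1 behavior.

```typescript
// Hypothetical image sizing rule for a layout fixture.
interface Size {
  width: number;
  height: number;
}

// Scale `natural` to fit inside `max`, preserving aspect ratio.
// Assumption: images are never enlarged beyond their natural size.
function fitImage(natural: Size, max: Size): Size {
  if (natural.width <= 0 || natural.height <= 0) {
    throw new Error("image has no intrinsic size");
  }
  const scale = Math.min(
    1,
    max.width / natural.width,
    max.height / natural.height
  );
  return { width: natural.width * scale, height: natural.height * scale };
}

// Fixture cases as (input, expected) pairs against a 468 x 648 content box.
const contentArea: Size = { width: 468, height: 648 };
const imageCases: Array<{ natural: Size; expected: Size }> = [
  { natural: { width: 936, height: 400 }, expected: { width: 468, height: 200 } }, // too wide
  { natural: { width: 400, height: 1296 }, expected: { width: 200, height: 648 } }, // too tall
  { natural: { width: 100, height: 50 }, expected: { width: 100, height: 50 } }, // already fits
];
```

Each case isolates one constraint (width-bound, height-bound, unconstrained), which keeps a failing fixture easy to read.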

3. golden output tests

Fixture documents that render to stable page/layout snapshots and, when viable, stable PDF metadata assertions.

The v1 goldens should stay simple:

  • README-like document
  • release notes / changelog
  • small technical note with table and code blocks
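
For the page/layout snapshots to stay stable, the serialized form has to be deterministic. One way to get there, sketched below under assumed names (`SnapBox`, `SnapPage`, `snapshotPages`), is a line-oriented text dump with coordinates rounded to two decimals so floating-point noise does not churn the goldens.

```typescript
// Hypothetical snapshot format for golden tests. Names are illustrative.
interface SnapBox {
  kind: string;
  x: number;
  y: number;
  width: number;
  height: number;
}
interface SnapPage {
  boxes: SnapBox[];
}

// Fixed two-decimal formatting keeps tiny float differences out of diffs.
function fmt(n: number): string {
  return n.toFixed(2);
}

// One header line per page, one indented line per box, in layout order.
function snapshotPages(pages: SnapPage[]): string {
  const lines: string[] = [];
  pages.forEach((page, i) => {
    lines.push(`page ${i + 1}`);
    for (const b of page.boxes) {
      lines.push(
        `  ${b.kind} x=${fmt(b.x)} y=${fmt(b.y)} w=${fmt(b.width)} h=${fmt(b.height)}`
      );
    }
  });
  return lines.join("\n");
}
```

A text format like this diffs cleanly in review, which is harder to get from comparing PDF bytes directly.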

Suggested milestone order after v1

Once v1 is stable, the likely order is:

  1. frontmatter metadata and basic document config
  2. minimal directives such as page break and section break
  3. inline styled spans and admonitions
  4. anchors, cross-references, and TOC
  5. directive tables / advanced tables
  6. page-composition features
  7. WeaverFO exploration against the shared layout engine

Exit criteria

WeaverPDF v1 is done when:

  • a normal GitHub-style README renders cleanly to PDF
  • a small document with lists, code blocks, images, and a GFM table renders predictably
  • the engine has an owned document AST and layout IR
  • the output path does not depend on direct rendering from third-party parser nodes
  • unsupported non-v1 constructs produce clear diagnostics rather than ad hoc behavior

Decision summary

WeaverPDF v1 is a bounded Markdown-first PDF renderer.

It should start with GFM core features, a clean internal AST/IR boundary, and single-column paged layout.

Everything else in the current EzPDF reference is staged on top of that, including directives, templating, advanced page composition, and WeaverFO.