WeaverPDF v1

This note defines the first bounded implementation target for WeaverPDF.

The goal is neither to implement the full EzPDF reference nor to start with FO. It is to ship a small, solid Markdown-first PDF renderer with a clean engine boundary that later features can build on.

Goal

Render ordinary GitHub-flavored Markdown documents to professional paged PDFs with predictable layout, readable defaults, and a clean internal architecture.

The first slice should prove:

Markdown parsing
normalized document AST
layout IR
basic pagination
PDF output

without taking on the full directive and page-composition surface.

Non-goals

The first WeaverPDF slice should explicitly avoid:

FO or XSL-FO input
full EzPDF syntax parity
advanced directive tables
multi-column layout
foldouts
blank-page and parity logic
cross-references and multi-pass dynamic placeholders
widows/orphans, keeps, floats, or advanced pagination policy
bidi/RTL shaping beyond what the chosen text stack can safely support

Input surface

In scope

standard headings (# through ######)
paragraphs
emphasis and strong
strikethrough
inline code
fenced code blocks
blockquotes
ordered and unordered lists
links
images from local paths
thematic breaks
GFM tables

Deferred

custom block directives
inline styled spans ({text|...})
symbols and icon shortcuts
YAML frontmatter semantics beyond optional metadata capture
variables, loops, filters, anchors, and dynamic placeholders
directive-table YAML mode

Architecture target

WeaverPDF v1 should not render directly from a third-party Markdown AST.

The implementation boundary should be:

Markdown parser
  → WeaverPDF document AST
  → WeaverPDF layout IR
  → PDF backend

The document AST and layout IR contracts are described in WEAVERPDF_ARCHITECTURE.md.
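
One way to make this boundary concrete is to type each stage so that layout can only ever receive the owned AST. The sketch below is illustrative only: `Stages`, `render`, and the toy stage implementations are hypothetical names, not the contract in WEAVERPDF_ARCHITECTURE.md.

```typescript
// Hypothetical stage contract; names are illustrative, not taken from
// WEAVERPDF_ARCHITECTURE.md.
interface Stages<Parsed, Ast, Ir> {
  parse(markdown: string): Parsed; // third-party parser output
  normalize(parsed: Parsed): Ast;  // owned WeaverPDF document AST
  layout(ast: Ast): Ir;            // owned layout IR
  emit(ir: Ir): Uint8Array;        // PDF backend
}

// Stages compose strictly in order; `layout` has no way to see `Parsed`.
function render<Parsed, Ast, Ir>(
  stages: Stages<Parsed, Ast, Ir>,
  markdown: string
): Uint8Array {
  const parsed = stages.parse(markdown);
  const ast = stages.normalize(parsed);
  const ir = stages.layout(ast);
  return stages.emit(ir);
}

// Toy stages, present only to show the types lining up.
const toyStages: Stages<string[], { paragraphs: string[] }, { lineCount: number }> = {
  parse: (markdown) => markdown.split("\n"),
  normalize: (lines) => ({ paragraphs: lines.filter((l) => l.trim() !== "") }),
  layout: (ast) => ({ lineCount: ast.paragraphs.length }),
  emit: (ir) => Uint8Array.from([ir.lineCount]),
};
```

Because the parser's output type appears only in `parse` and `normalize`, swapping the parser later would touch those two stages and nothing downstream.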

Parser

Use a mature Markdown parser rather than writing one by hand.

Good candidates are parser stacks in the remark / mdast ecosystem. The parser is a dependency; the normalized AST, layout IR, and renderer are owned in-tree.

Document AST

The v1 AST should be minimal and explicit, with nodes along these lines:

Document
Heading
Paragraph
Text
Emphasis
Strong
Delete
InlineCode
CodeBlock
BlockQuote
List
ListItem
Link
Image
ThematicBreak
Table
TableRow
TableCell
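
As a rough sketch, the node list above could be expressed as discriminated unions. The field names (`kind`, `children`, `depth`, `src`, and so on) are assumptions made for illustration; the real contract lives in WEAVERPDF_ARCHITECTURE.md. The root type is called `DocumentNode` here only to avoid clashing with the DOM's global `Document` type.

```typescript
// Hypothetical node shapes; only the node names come from the list above.
type Inline =
  | { kind: "Text"; value: string }
  | { kind: "Emphasis"; children: Inline[] }
  | { kind: "Strong"; children: Inline[] }
  | { kind: "Delete"; children: Inline[] } // strikethrough
  | { kind: "InlineCode"; value: string }
  | { kind: "Link"; url: string; children: Inline[] }
  | { kind: "Image"; src: string; alt: string };

type Block =
  | { kind: "Heading"; depth: 1 | 2 | 3 | 4 | 5 | 6; children: Inline[] }
  | { kind: "Paragraph"; children: Inline[] }
  | { kind: "CodeBlock"; lang: string | null; value: string }
  | { kind: "BlockQuote"; children: Block[] }
  | { kind: "List"; ordered: boolean; items: ListItem[] }
  | { kind: "ThematicBreak" }
  | { kind: "Table"; rows: TableRow[] };

interface ListItem { kind: "ListItem"; children: Block[] }
interface TableCell { kind: "TableCell"; children: Inline[] }
interface TableRow { kind: "TableRow"; header: boolean; cells: TableCell[] }
interface DocumentNode { kind: "Document"; children: Block[] }

// Example consumer: flatten inline content to plain text. The compiler
// narrows `n` in each branch, so a missing node kind is a type error.
function plainText(nodes: Inline[]): string {
  return nodes
    .map((n) => {
      switch (n.kind) {
        case "Text":
        case "InlineCode":
          return n.value;
        case "Image":
          return n.alt;
        default:
          return plainText(n.children);
      }
    })
    .join("");
}
```

A closed union like this keeps the AST "minimal and explicit": adding a node kind later is a deliberate change that the type checker surfaces in every consumer.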

Layout IR

The layout IR should capture page/layout concerns rather than source syntax.

The initial IR only needs enough power for:

vertical block flow
inline text runs
simple line breaking
list indentation
code block boxes
image sizing constraints
table column measurement and row layout
page breaks generated by overflow
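
To illustrate the last item, here is a minimal sketch of overflow-driven page breaking over already-measured blocks. `LayoutBox`, `PlacedBox`, `Page`, and `paginate` are hypothetical names, and the sketch treats each block as unsplittable, which a real engine would not do for long paragraphs or tables.

```typescript
// Hypothetical IR fragments: a measured block and its placement on a page.
interface LayoutBox { id: string; height: number } // height in points
interface PlacedBox { id: string; y: number; height: number }
interface Page { boxes: PlacedBox[] }

// Stack boxes vertically; start a new page whenever the next box would
// overflow the content height. A box taller than the page is placed alone
// on its own page (a real engine would split it or raise a diagnostic).
function paginate(boxes: LayoutBox[], contentHeight: number): Page[] {
  const pages: Page[] = [];
  let current: Page = { boxes: [] };
  let y = 0;
  for (const box of boxes) {
    if (y > 0 && y + box.height > contentHeight) {
      pages.push(current);
      current = { boxes: [] };
      y = 0;
    }
    current.boxes.push({ id: box.id, y, height: box.height });
    y += box.height;
  }
  if (current.boxes.length > 0) pages.push(current);
  return pages;
}
```

Note that nothing here refers to Markdown syntax: the function sees only measured boxes, which is the separation the layout IR is meant to enforce.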

Rendering behavior

v1 page model

single page size per document
single margin box per document
no header/footer support in the first slice, unless it falls out naturally from the page model
single-column page flow only
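
A page model this small could plausibly be a single record plus a derived content box. The sketch below assumes units of PDF points and uses hypothetical names (`PageModel`, `ContentRect`, `contentBox`); the US Letter size and one-inch margins are example values, not stated defaults.

```typescript
// Hypothetical v1 page model: one page size and one margin box per document.
interface PageModel {
  width: number;  // points (1/72 inch)
  height: number; // points
  margin: { top: number; right: number; bottom: number; left: number };
}

interface ContentRect { x: number; y: number; width: number; height: number }

// Derive the single-column content area that block flow fills.
function contentBox(page: PageModel): ContentRect {
  const { top, right, bottom, left } = page.margin;
  const width = page.width - left - right;
  const height = page.height - top - bottom;
  if (width <= 0 || height <= 0) {
    throw new Error("margins leave no content area");
  }
  return { x: left, y: top, width, height };
}

// Example: US Letter (612 x 792 pt) with one-inch margins.
const letter: PageModel = {
  width: 612,
  height: 792,
  margin: { top: 72, right: 72, bottom: 72, left: 72 },
};
```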

v1 styling model

one built-in default theme
heading size hierarchy
readable body font and spacing defaults
monospace treatment for code
basic link styling
simple blockquote styling
simple table styling
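
For illustration, the single built-in theme could be a plain typed record. Every name and value below (`Theme`, `defaultTheme`, the 11 pt body size, the heading scale, the colors) is a placeholder assumption, not a proposed default.

```typescript
// Hypothetical theme record; all values are placeholders.
interface Theme {
  bodyFontSize: number; // points
  lineHeight: number;   // multiple of font size
  // Multipliers for heading levels 1 through 6, relative to bodyFontSize.
  headingScale: [number, number, number, number, number, number];
  codeFontFamily: string;
  linkColor: string;
}

const defaultTheme: Theme = {
  bodyFontSize: 11,
  lineHeight: 1.4,
  headingScale: [2, 1.5, 1.25, 1, 1, 1],
  codeFontFamily: "monospace",
  linkColor: "#0645ad",
};

// Heading size hierarchy derived from the body size.
function headingFontSize(theme: Theme, depth: 1 | 2 | 3 | 4 | 5 | 6): number {
  return theme.bodyFontSize * theme.headingScale[depth - 1];
}
```

Keeping the theme as data rather than code leaves room for the later "basic document config" milestone without changing the layout engine.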

The point is not theme richness. The point is predictable, good default output.

Diagnostics

Even in v1, diagnostics should remain explicit.

The first slice should report:

Markdown parse failures, if the parser exposes them
unsupported v1 features encountered in the normalization step
missing local image resources
layout failures that the engine can detect deterministically

Prefer compile-time or normalization-time diagnostics over silent runtime degradation when practical.
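
A minimal sketch of normalization-time diagnostics follows. It assumes an mdast-style input tree, modeled by the local stand-in type `MdNode` rather than by importing a parser, and it copies only a couple of fields to stay short. `Diagnostic`, `OwnedNode`, `KIND_BY_TYPE`, and `normalize` are hypothetical names.

```typescript
// Minimal stand-in for an mdast-style node. A real build would use the
// parser's own types instead.
interface MdNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdNode[];
  position?: { start: { line: number; column: number } };
}

// Hypothetical diagnostic shape.
interface Diagnostic {
  code: "unsupported-node";
  severity: "error" | "warning";
  message: string;
  line: number | null;
}

// Hypothetical owned node, reduced to the fields this sketch copies.
interface OwnedNode {
  kind: string;
  value?: string;
  depth?: number;
  children?: OwnedNode[];
}

// mdast node type -> owned AST node name, covering the v1 input surface.
const KIND_BY_TYPE = new Map<string, string>([
  ["root", "Document"], ["heading", "Heading"], ["paragraph", "Paragraph"],
  ["text", "Text"], ["emphasis", "Emphasis"], ["strong", "Strong"],
  ["delete", "Delete"], ["inlineCode", "InlineCode"], ["code", "CodeBlock"],
  ["blockquote", "BlockQuote"], ["list", "List"], ["listItem", "ListItem"],
  ["link", "Link"], ["image", "Image"], ["thematicBreak", "ThematicBreak"],
  ["table", "Table"], ["tableRow", "TableRow"], ["tableCell", "TableCell"],
]);

function normalize(node: MdNode, diagnostics: Diagnostic[]): OwnedNode | null {
  const kind = KIND_BY_TYPE.get(node.type);
  if (kind === undefined) {
    // Unsupported constructs are dropped, but never silently.
    diagnostics.push({
      code: "unsupported-node",
      severity: "error",
      message: `"${node.type}" is not part of the v1 input surface`,
      line: node.position?.start.line ?? null,
    });
    return null;
  }
  const out: OwnedNode = { kind };
  if (node.value !== undefined) out.value = node.value;
  if (node.depth !== undefined) out.depth = node.depth;
  if (node.children) {
    out.children = node.children
      .map((child) => normalize(child, diagnostics))
      .filter((child): child is OwnedNode => child !== null);
  }
  return out;
}
```

Collecting diagnostics into a list, instead of throwing on the first one, lets a single run report every unsupported construct in a document.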

Testing

WeaverPDF v1 should ship with three kinds of tests.

1. normalization tests

Snapshots of Markdown input normalized to the WeaverPDF AST, covering:

headings
lists
code blocks
links
images
tables

2. layout tests

Small focused fixtures for:

paragraph pagination
list indentation
code block sizing
image scaling
table measurement and wrapping
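
As an example of how small such a fixture can be, an image-scaling test might pin down a rule like "shrink to fit the content box, preserve aspect ratio, never upscale". That rule is an assumption for illustration, as are the names `Size` and `fitImage`; this note only says that image sizing constraints exist.

```typescript
interface Size { width: number; height: number }

// Scale an image down to fit within `max`, preserving aspect ratio.
// Images that already fit keep their natural size (no upscaling).
function fitImage(natural: Size, max: Size): Size {
  const scale = Math.min(
    1,
    max.width / natural.width,
    max.height / natural.height
  );
  return { width: natural.width * scale, height: natural.height * scale };
}

// Fixture-style checks against a 468 x 648 pt content box.
const box: Size = { width: 468, height: 648 };
const wide = fitImage({ width: 936, height: 468 }, box); // halved to 468 x 234
const small = fitImage({ width: 100, height: 50 }, box); // unchanged
```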

3. golden output tests

Fixture documents that render to stable page/layout snapshots and, when viable, stable PDF metadata assertions.

The v1 goldens should stay simple:

README-like document
release notes / changelog
small technical note with table and code blocks

Suggested milestone order after v1

Once v1 is stable, the likely order is:

frontmatter metadata and basic document config
minimal directives such as page break and section break
inline styled spans and admonitions
anchors, cross-references, and TOC
directive tables / advanced tables
page-composition features
WeaverFO exploration against the shared layout engine

Exit criteria

WeaverPDF v1 is done when:

a normal GitHub-style README renders cleanly to PDF
a small document with lists, code blocks, images, and a GFM table renders predictably
the engine has an owned document AST and layout IR
the output path does not depend on direct rendering from third-party parser nodes
unsupported non-v1 constructs produce clear diagnostics rather than ad hoc behavior

Decision summary

WeaverPDF v1 is a bounded Markdown-first PDF renderer.

It should start with GFM core features, a clean internal AST/IR boundary, and single-column paged layout.

Everything else in the current EzPDF reference is staged on top of that, including directives, templating, advanced page composition, and WeaverFO.