WeaverPDF v1
This note defines the first bounded implementation target for WeaverPDF.
The goal is not to implement the full EzPDF reference and not to start with FO. The goal is to ship a small, solid Markdown-first PDF renderer with a clean engine boundary that later features can build on.
Goal
Render ordinary GitHub-flavored Markdown documents to professional paged PDFs with predictable layout, readable defaults, and a clean internal architecture.
The first slice should prove:
- Markdown parsing
- normalized document AST
- layout IR
- basic pagination
- PDF output
without taking on the full directive and page-composition surface.
Non-goals
The first WeaverPDF slice should explicitly avoid:
- FO or XSL-FO input
- full EzPDF syntax parity
- advanced directive tables
- multi-column layout
- foldouts
- blank-page and parity logic
- cross-references and multi-pass dynamic placeholders
- widows/orphans, keeps, floats, or advanced pagination policy
- full bidi/RTL shaping beyond what the chosen text stack can safely support
Input surface
In scope
- standard headings (# through ######)
- paragraphs
- emphasis and strong
- strikethrough
- inline code
- fenced code blocks
- blockquotes
- ordered and unordered lists
- links
- images from local paths
- thematic breaks
- GFM tables
Deferred
- custom block directives
- inline styled spans ({text|...})
- symbols and icon shortcuts
- YAML frontmatter semantics beyond optional metadata capture
- variables, loops, filters, anchors, and dynamic placeholders
- directive-table YAML mode
Architecture target
WeaverPDF v1 should not render directly from a third-party Markdown AST.
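The implementation boundary should be a staged pipeline: third-party Markdown parser, then normalized document AST, then layout IR, then pagination, then PDF output.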
The document AST and layout IR contracts are described in WEAVERPDF_ARCHITECTURE.md.
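One way to picture this boundary is as a chain of typed stages in which only the first two ever see third-party parser nodes. The sketch below is illustrative only: every type and function name in it (ParserTree, WeaverDoc, LayoutBlock, PipelineStages, renderPipeline) is hypothetical and is not taken from WEAVERPDF_ARCHITECTURE.md.

```typescript
// Hypothetical stage types; the real contracts live in WEAVERPDF_ARCHITECTURE.md.
type ParserTree = { type: string; value?: string; children?: ParserTree[] }; // third-party shape
type WeaverDoc = { kind: "Document"; children: WeaverBlock[] }; // owned document AST
type WeaverBlock = { kind: "Paragraph"; text: string };
type LayoutBlock = { height: number; text: string }; // owned layout IR
type LaidOutPage = { blocks: LayoutBlock[] };

// Each stage consumes only the output of the previous one.
interface PipelineStages {
  parse(markdown: string): ParserTree; // the only stage backed by a dependency
  normalize(tree: ParserTree): WeaverDoc; // last stage allowed to touch parser nodes
  layout(doc: WeaverDoc): LayoutBlock[];
  paginate(blocks: LayoutBlock[]): LaidOutPage[];
  emitPdf(pages: LaidOutPage[]): Uint8Array;
}

// Runs the stages in order; nothing after normalize() can reach a ParserTree.
function renderPipeline(stages: PipelineStages, markdown: string): Uint8Array {
  const tree = stages.parse(markdown);
  const doc = stages.normalize(tree);
  const blocks = stages.layout(doc);
  const pages = stages.paginate(blocks);
  return stages.emitPdf(pages);
}
```

Because layout() accepts only the owned AST type, a renderer that tried to read parser nodes directly would fail to type-check.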
Parser
Use a mature Markdown parser rather than writing one by hand.
Good candidates are parser stacks in the remark / mdast ecosystem. The
parser is a dependency. The normalized AST, layout IR, and renderer are owned
in-tree.
Document AST
The v1 AST should be minimal and explicit, with nodes along these lines:
- Document
- Heading
- Paragraph
- Text
- Emphasis
- Strong
- Delete
- InlineCode
- CodeBlock
- BlockQuote
- List
- ListItem
- Link
- Image
- ThematicBreak
- Table
- TableRow
- TableCell
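The node names above map naturally onto a discriminated union. The following is a minimal sketch, not the actual contract: the field names (kind, depth, children, value, url, src, alt, lang, ordered) and the split into inline and block nodes are assumptions made for illustration.

```typescript
// Hypothetical shapes for the owned AST; only the node names come from this note.
type InlineNode =
  | { kind: "Text"; value: string }
  | { kind: "Emphasis"; children: InlineNode[] }
  | { kind: "Strong"; children: InlineNode[] }
  | { kind: "Delete"; children: InlineNode[] }
  | { kind: "InlineCode"; value: string }
  | { kind: "Link"; url: string; children: InlineNode[] }
  | { kind: "Image"; src: string; alt: string };

type BlockNode =
  | { kind: "Heading"; depth: 1 | 2 | 3 | 4 | 5 | 6; children: InlineNode[] }
  | { kind: "Paragraph"; children: InlineNode[] }
  | { kind: "CodeBlock"; lang: string | null; value: string }
  | { kind: "BlockQuote"; children: BlockNode[] }
  | { kind: "List"; ordered: boolean; children: ListItemNode[] }
  | { kind: "ThematicBreak" }
  | { kind: "Table"; children: TableRowNode[] };

type ListItemNode = { kind: "ListItem"; children: BlockNode[] };
type TableRowNode = { kind: "TableRow"; children: TableCellNode[] };
type TableCellNode = { kind: "TableCell"; children: InlineNode[] };
type DocumentNode = { kind: "Document"; children: BlockNode[] };

// Example consumer: flatten inline content to plain text.
// The switch is exhaustive, so adding a new inline kind becomes a compile error here.
function inlineText(node: InlineNode): string {
  switch (node.kind) {
    case "Text":
    case "InlineCode":
      return node.value;
    case "Image":
      return node.alt;
    case "Emphasis":
    case "Strong":
    case "Delete":
    case "Link":
      return node.children.map(inlineText).join("");
  }
}
```

A closed union like this keeps the AST explicit: anything the parser produces that has no corresponding kind has to be handled, or rejected, during normalization.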
Layout IR
The layout IR should capture page/layout concerns rather than source syntax.
The initial IR only needs enough power for:
- vertical block flow
- inline text runs
- simple line breaking
- list indentation
- code block boxes
- image sizing constraints
- table column measurement and row layout
- page breaks generated by overflow
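As a rough illustration of vertical block flow with overflow-driven page breaks, the sketch below places already-measured blocks greedily. It is a simplification under stated assumptions: the names MeasuredBlock, PageOfBlocks, and paginateBlocks are hypothetical, heights are taken as given, and blocks are never split, whereas a real engine would break paragraphs at line boundaries.

```typescript
// Hypothetical layout IR: each block already carries a measured height in points.
type MeasuredBlock = { id: string; height: number };
type PageOfBlocks = { blocks: MeasuredBlock[]; usedHeight: number };

// Greedy vertical flow: start a new page whenever the next block would overflow.
// A block taller than the content area still gets placed, alone on its own page.
function paginateBlocks(blocks: MeasuredBlock[], contentHeight: number): PageOfBlocks[] {
  const pages: PageOfBlocks[] = [];
  let current: PageOfBlocks = { blocks: [], usedHeight: 0 };
  for (const block of blocks) {
    const overflows = current.usedHeight + block.height > contentHeight;
    if (overflows && current.blocks.length > 0) {
      pages.push(current);
      current = { blocks: [], usedHeight: 0 };
    }
    current.blocks.push(block);
    current.usedHeight += block.height;
  }
  if (current.blocks.length > 0) {
    pages.push(current);
  }
  return pages;
}
```

The function is deterministic and has no dependency on fonts or PDF output, which makes it easy to cover with small layout fixtures.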
Rendering behavior
v1 page model
- single page size per document
- single margin box per document
- optional header/footer support can wait until after the first slice unless it falls out naturally from the page model
- single-column page flow only
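A page model this small fits in one record plus one derived rectangle. The sketch below assumes PDF points as the unit and uses US Letter (612 x 792 points) with 1-inch margins purely as an illustrative default; PageModel, PageMargins, ContentBox, LETTER_PAGE, and contentBoxOf are hypothetical names.

```typescript
// Hypothetical page model: one size and one margin box for the whole document.
type PageMargins = { top: number; right: number; bottom: number; left: number };
type PageModel = { widthPt: number; heightPt: number; margins: PageMargins };
type ContentBox = { x: number; y: number; width: number; height: number };

// US Letter in PDF points (72 points per inch); the margins are an illustrative choice.
const LETTER_PAGE: PageModel = {
  widthPt: 612,
  heightPt: 792,
  margins: { top: 72, right: 72, bottom: 72, left: 72 },
};

// Derives the single-column content area, with x and y measured from the top-left corner.
function contentBoxOf(page: PageModel): ContentBox {
  const width = page.widthPt - page.margins.left - page.margins.right;
  const height = page.heightPt - page.margins.top - page.margins.bottom;
  if (width <= 0 || height <= 0) {
    throw new Error("margins leave no room for content");
  }
  return { x: page.margins.left, y: page.margins.top, width, height };
}
```

With these values the content box is 468 x 648 points, and that height is what an overflow check would compare against.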
v1 styling model
- one built-in default theme
- heading size hierarchy
- readable body font and spacing defaults
- monospace treatment for code
- basic link styling
- simple blockquote styling
- simple table styling
The point is not theme richness. The point is predictable, good default output.
Diagnostics
Even in v1, diagnostics should remain explicit.
The first slice should report:
- Markdown parse failures if the parser exposes them
- unsupported v1 features encountered in the normalization step
- missing local image resources
- layout failures that the engine can detect deterministically
Prefer compile-time or normalization-time diagnostics over silent runtime degradation when practical.
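A normalization-time check for two of the cases above (unsupported constructs and missing local images) might look like the sketch below. The diagnostic shape, the codes, and the function name are hypothetical. The node type strings follow mdast naming, and the file-existence check is injected as a callback so the sketch does not assume any particular filesystem API.

```typescript
// Hypothetical diagnostic record; codes and severities are illustrative.
type DiagnosticSeverity = "error" | "warning";
type WeaverDiagnostic = { code: string; severity: DiagnosticSeverity; message: string };

// Minimal stand-in for a parser node (mdast-like: a type string plus optional children).
type SourceNode = { type: string; url?: string; children?: SourceNode[] };

// Parser node types that the v1 input surface covers.
const SUPPORTED_TYPES: readonly string[] = [
  "root", "heading", "paragraph", "text", "emphasis", "strong", "delete",
  "inlineCode", "code", "blockquote", "list", "listItem", "link", "image",
  "thematicBreak", "table", "tableRow", "tableCell",
];

// Walks the tree depth-first and reports problems instead of silently dropping nodes.
function collectDiagnostics(
  node: SourceNode,
  imageExists: (path: string) => boolean,
  out: WeaverDiagnostic[] = [],
): WeaverDiagnostic[] {
  if (SUPPORTED_TYPES.indexOf(node.type) === -1) {
    out.push({ code: "unsupported-node", severity: "warning", message: `Unsupported in v1: ${node.type}` });
    return out; // do not descend into an unsupported subtree
  }
  if (node.type === "image" && node.url !== undefined && !imageExists(node.url)) {
    out.push({ code: "missing-image", severity: "error", message: `Image not found: ${node.url}` });
  }
  for (const child of node.children || []) {
    collectDiagnostics(child, imageExists, out);
  }
  return out;
}
```

Returning a list rather than throwing on the first problem lets a single run report every unsupported construct in a document.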
Testing
WeaverPDF v1 should ship with three kinds of tests.
1. normalization tests
Snapshots that map input Markdown to the normalized WeaverPDF AST for:
- headings
- lists
- code blocks
- links
- images
- tables
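A normalization snapshot can be as simple as serializing the owned AST deterministically and comparing it with a stored string. The sketch below is a reduced illustration: MdNode, OwnedNode, kindFor, normalizeNode, and snapshotOf are hypothetical names, only four node types are mapped, and the input is a hand-built mdast-like tree rather than real parser output.

```typescript
// mdast-like input shape (assumption: only the fields used here).
type MdNode = { type: string; depth?: number; value?: string; children?: MdNode[] };
// Loosely typed owned node, kept generic so the snapshot helper stays short.
type OwnedNode = { kind: string; depth?: number; value?: string; children?: OwnedNode[] };

// Hypothetical mapping from parser type names to owned AST kinds.
function kindFor(type: string): string | null {
  switch (type) {
    case "root": return "Document";
    case "heading": return "Heading";
    case "paragraph": return "Paragraph";
    case "text": return "Text";
    default: return null;
  }
}

// Copies only known fields, in a fixed order, so serialization is stable.
function normalizeNode(node: MdNode): OwnedNode {
  const kind = kindFor(node.type);
  if (kind === null) {
    throw new Error(`unsupported node type: ${node.type}`);
  }
  const out: OwnedNode = { kind };
  if (node.depth !== undefined) out.depth = node.depth;
  if (node.value !== undefined) out.value = node.value;
  if (node.children !== undefined) out.children = node.children.map(normalizeNode);
  return out;
}

// Snapshot text: pretty-printed JSON that can be stored next to the fixture.
function snapshotOf(node: OwnedNode): string {
  return JSON.stringify(node, null, 2);
}
```

Because fields are copied in a fixed order, the snapshot text does not depend on the key order of the parser's own objects.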
2. layout tests
Small focused fixtures for:
- paragraph pagination
- list indentation
- code block sizing
- image scaling
- table measurement and wrapping
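Fixtures like these are easiest to keep small when the layout rule is a pure function. As one example, image scaling could be expressed as below, assuming the rule is "shrink to fit the content box, preserve aspect ratio, never upscale". That rule and the names BoxSize and fitImage are assumptions, not decisions recorded in this note.

```typescript
// Width and height in points.
type BoxSize = { width: number; height: number };

// Scales an image down to fit inside maxBox while preserving aspect ratio.
// The scale factor is capped at 1, so small images keep their natural size.
function fitImage(natural: BoxSize, maxBox: BoxSize): BoxSize {
  const scale = Math.min(1, maxBox.width / natural.width, maxBox.height / natural.height);
  return { width: natural.width * scale, height: natural.height * scale };
}
```

For instance, a 936 x 400 image placed in a 468 x 648 content box comes out at 468 x 200, while a 100 x 50 image is left unchanged.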
3. golden output tests
Fixture documents that render to stable page/layout snapshots and, when viable, stable PDF metadata assertions.
The v1 goldens should stay simple:
- README-like document
- release notes / changelog
- small technical note with table and code blocks
Suggested milestone order after v1
Once v1 is stable, the likely order is:
- frontmatter metadata and basic document config
- minimal directives such as page break and section break
- inline styled spans and admonitions
- anchors, cross-references, and TOC
- directive tables / advanced tables
- page-composition features
- WeaverFO exploration against the shared layout engine
Exit criteria
WeaverPDF v1 is done when:
- a normal GitHub-style README renders cleanly to PDF
- a small document with lists, code blocks, images, and a GFM table renders predictably
- the engine has an owned document AST and layout IR
- the output path does not depend on direct rendering from third-party parser nodes
- unsupported non-v1 constructs produce clear diagnostics rather than ad hoc behavior
Decision summary
WeaverPDF v1 is a bounded Markdown-first PDF renderer.
It should start with GFM core features, a clean internal AST/IR boundary, and single-column paged layout.
Everything else in the current EzPDF reference is staged on top of that, including directives, templating, advanced page composition, and WeaverFO.