paperdoc-dev/paperdoc-lib

A zero-dependency PHP library for generating, parsing and converting documents (PDF, HTML, CSV, DOCX)

github.com/paperdoc-dev/paperdoc-lib

pkg:composer/paperdoc-dev/paperdoc-lib

v0.5.0 2026-04-25 12:51 UTC

A zero-dependency PHP library for generating, parsing and converting documents — PDF, HTML, CSV, DOCX, XLSX, PPTX, Markdown and more.

Features

  • Generate documents from scratch (PDF, HTML, CSV, DOCX, XLSX, PPTX, Markdown)
  • Parse existing documents into a unified in-memory model
  • Convert between any supported formats in one call
  • Rich document model — typed headings, ordered/bullet lists (nested), bookmarks, code blocks, blockquotes, images, tables, page breaks and typed document properties (author, subject, dates…)
  • Native rendering core — every block element renders cleanly to DOCX, PDF, HTML and Markdown: typed headings (<h1>/<w:pStyle>), nested lists (<ul>/<w:numPr>), blockquotes, code blocks (with language hint), bookmarks, embedded or on-disk images
  • Hyperlinks — parse <w:hyperlink> from DOCX and round-trip them to HTML <a>, Markdown [text](url) and DOCX hyperlink relationships, with anchors and tooltips
  • Batch processing — open and process multiple files at once
  • Laravel integration — first-class ServiceProvider and Facade
  • AI-powered features via Neuron AI (OCR, LLM extraction)
  • Typed exceptions: ParserException, RendererException, UnsupportedFormatException and InvalidDocumentException, all extending a common PaperdocException
  • Zero native binary dependencies — pure PHP

Requirements

Dependency Version
PHP ^8.2
ext-dom *
ext-mbstring *
ext-zip *
ext-zlib *

Optional (Laravel)

Package Version
illuminate/support ^11.0 | ^12.0

Installation

composer require paperdoc-dev/paperdoc-lib

Laravel auto-discovery

The PaperdocServiceProvider and Paperdoc facade are registered automatically via Laravel's package auto-discovery.
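
If auto-discovery is switched off (for example through the "dont-discover" list in composer.json), the provider can be registered by hand. The fragment below is a minimal sketch for Laravel 11 and later; the fully qualified class name Paperdoc\PaperdocServiceProvider is an assumption inferred from the src/ layout, not something this README states.

```php
<?php
// bootstrap/providers.php (Laravel 11 and later)

return [
    App\Providers\AppServiceProvider::class,

    // Assumed class name, inferred from src/PaperdocServiceProvider.php.
    Paperdoc\PaperdocServiceProvider::class,
];
```

The facade can still be imported by its namespace (Paperdoc\Facades\Paperdoc), so no alias entry should be needed.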

Quick Start

DocumentManager exposes static methods (create, save, open, …). A document is built from Section instances: call addSection($section) to append a section you have already built, addSection() to append an empty one, or openSection() when you want a fluent chain (addParagraph, addHeading, …) on the new section. Bold and other run styles are set on TextStyle; see examples/usage.php for tables and advanced composition.

Standalone PHP

use Paperdoc\Support\DocumentManager;
use Paperdoc\Document\Style\TextStyle;

$doc = DocumentManager::create('pdf', 'My Report');

$doc->openSection()
    ->addParagraph('Hello, Paperdoc!', TextStyle::make()->setBold());

DocumentManager::save($doc, 'output/report.pdf');

Laravel (via Facade)

use Paperdoc\Facades\Paperdoc;

// Create
$doc = Paperdoc::create('docx', 'Invoice #1042');
$doc->openSection()->addParagraph('Amount due: $500');
Paperdoc::save($doc, storage_path('invoices/1042.docx'));

// Parse an existing file
$doc = Paperdoc::open('uploads/report.xlsx');

// Convert directly
Paperdoc::convert('report.docx', 'report.pdf', 'pdf');

// Render as string
$html = Paperdoc::renderAs($doc, 'html');

// Batch open
$docs = Paperdoc::openBatch([
    'file1.pdf',
    'file2.docx',
    'file3.xlsx',
]);

Supported Formats

Format Parse Render/Generate
PDF
HTML
DOCX
XLSX
PPTX
CSV
Markdown
DOC
XLS
PPT
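
Calls such as open() and convert() have to map a file to one of the formats listed above. As a rough illustration of that lookup, here is a self-contained sketch; SketchFormat and fromPath() are hypothetical names, and the library's own enums in src/Enum/ may look quite different.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical enum for illustration only. The values mirror the format
 * keys and extensions that appear in this README ('pdf', 'docx', 'md', ...).
 */
enum SketchFormat: string
{
    case Pdf = 'pdf';
    case Html = 'html';
    case Docx = 'docx';
    case Xlsx = 'xlsx';
    case Pptx = 'pptx';
    case Csv = 'csv';
    case Markdown = 'md';
    case Doc = 'doc';
    case Xls = 'xls';
    case Ppt = 'ppt';

    /** Resolve a format from a file path, or null when the extension is unknown. */
    public static function fromPath(string $path): ?self
    {
        // Extensions are compared case-insensitively.
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        return self::tryFrom($extension);
    }
}

var_dump(SketchFormat::fromPath('uploads/Report.DOCX')); // enum(SketchFormat::Docx)
var_dump(SketchFormat::fromPath('notes.txt'));           // NULL
```

Returning null keeps the sketch neutral; the library itself reports unknown extensions through UnsupportedFormatException.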

Document Model

Every format shares the same strongly-typed in-memory structure:

Document (format, title, ?Metadata, metadata[])
└── Section[]
    ├── Heading (level 1-6, runs, ?id)
    ├── Paragraph (TextRun[], ?ParagraphStyle)
    │   └── TextRun (text, ?TextStyle, ?TextLink)
    ├── ListBlock (bullet | ordered, start)
    │   └── ListItem (runs, blocks → nested ListBlock…)
    ├── Blockquote (nested DocumentElement[])
    ├── CodeBlock (code, ?language)
    ├── Bookmark (id) — link target for TextLink anchors
    ├── Table → TableRow[] → TableCell[]
    ├── Image (src | embedded data + mimeType)
    └── PageBreak

All block elements implement Paperdoc\Contracts\BlockElementInterface. Styles live in Document/Style/ (ParagraphStyle, TextStyle, TableStyle), links in Document/Link/TextLink, typed document properties in Document/Metadata.
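
To make the tree shape concrete, the following self-contained sketch models a tiny subset of it (paragraphs and nested lists) and walks it depth-first. Every class and function name here is a hypothetical stand-in; the real classes under Paperdoc\Document may have different constructors and methods.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only, not the library's classes.
interface SketchBlock {}

final class SketchParagraph implements SketchBlock
{
    public function __construct(public readonly string $text) {}
}

final class SketchListItem
{
    /** @param SketchBlock[] $blocks blocks nested under this item, e.g. a child list */
    public function __construct(
        public readonly string $text,
        public readonly array $blocks = [],
    ) {}
}

final class SketchListBlock implements SketchBlock
{
    /** @param SketchListItem[] $items */
    public function __construct(
        public readonly bool $ordered,
        public readonly array $items,
    ) {}
}

/**
 * Walk the blocks depth-first and collect one text line per paragraph
 * or list item.
 *
 * @param SketchBlock[] $blocks
 * @return string[]
 */
function collectText(array $blocks): array
{
    $lines = [];
    foreach ($blocks as $block) {
        if ($block instanceof SketchParagraph) {
            $lines[] = $block->text;
        } elseif ($block instanceof SketchListBlock) {
            foreach ($block->items as $item) {
                $lines[] = $item->text;
                // Recurse into whatever is nested under this item.
                array_push($lines, ...collectText($item->blocks));
            }
        }
    }

    return $lines;
}

$blocks = [
    new SketchParagraph('Intro'),
    new SketchListBlock(false, [
        new SketchListItem('Install', [
            new SketchListBlock(true, [new SketchListItem('Run composer')]),
        ]),
        new SketchListItem('Use'),
    ]),
];

print_r(collectText($blocks)); // Intro, Install, Run composer, Use
```

Because every format parses into the same tree, a single walk like this could in principle serve any input format.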

Example — build a richly-typed document

use Paperdoc\Document\{Document, Heading, ListBlock, Metadata, Section};
use Paperdoc\Document\Style\TextStyle;

// Create a Markdown document and attach typed document properties.
$doc = Document::make('md', 'Release notes v0.5.0')
    ->setProperties(
        Metadata::make()
            ->setAuthor('Alice')
            ->setKeywords('release, changelog, paperdoc')
            ->setLanguage('en-US')
    );

$section = $doc->openSection();

// Level-2 heading.
$section->addElement(Heading::make('Getting started', 2, 'intro'));

$section->addBulletList()
    ->addText('Install the library')
    ->addText('Run the quick start')
    ->addText('Read the docs');

$section->addCodeBlock("composer require paperdoc-dev/paperdoc-lib", 'bash');

$section->addBookmark('ready-to-go');

$section->addBlockquote()
    ->addText('You are all set.', TextStyle::make()->setItalic());

Rendering

Since v0.5.0, all four core renderers render every element of the document model natively: no element is silently dropped, and every output is a valid file in its target format.

| Element | DOCX | PDF | HTML | Markdown |
| --- | --- | --- | --- | --- |
| Heading (1–6) | <w:pStyle w:val="HeadingN"/> + bookmark anchor | typed font sizes (24/20/16/14/13/12 pt) + navy | <h1> to <h6> with id | # to ######, optional {#id} |
| Paragraph | <w:p> + run styling | wrapped text + inline run styles | <p> + inline <span> | plain text + emphasis |
| ListBlock | <w:numPr> + word/numbering.xml, nested <w:ilvl> | • / 1. markers, depth-based indent | <ul> / <ol start="N">, nested | - / 1., two-space indent |
| Blockquote | <w:pStyle w:val="Quote"/> + indent | indented italic muted-grey | <blockquote> (nested children) | > prefixed lines |
| CodeBlock | <w:pStyle w:val="Code"/> + Consolas + <w:br/> | Courier, dedicated spacing | <pre><code class="language-…"> | fenced ```lang block |
| Bookmark | <w:bookmarkStart/> / <w:bookmarkEnd/> | rendered silently (PDF annotations: roadmap) | <a id="…" class="paperdoc-bookmark"> | inline <a id="…"></a> |
| TextLink | <w:hyperlink> (external rels + w:anchor + tooltip) | blue underlined run | <a href> with safe target/rel | safe [label](url "title") |
| Image | <w:drawing> + word/media/imageN.ext rel | XObject DCT (JPEG/PNG/GIF via GD re-encode) | <img src> or data: URI | ![alt](path) or data: URI |
| Table | <w:tbl> with header rows + gridSpan | drawn cells with header bg | <table> + striped rows | \| rows |
| PageBreak | <w:br w:type="page"/> | newPage() | .page-break divider | blank line |
| Metadata | docProps/core.xml | PDF /Creator | (HTML head meta: roadmap) | (frontmatter: roadmap) |
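
As an illustration of the Markdown column for headings, the sketch below builds an ATX-style heading line with the optional {#id} suffix. markdownHeading() is a hypothetical helper written for this example; it is not the library's renderer.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the library: render a heading as a
 * Markdown line of 1 to 6 '#' characters, with an optional {#id} suffix.
 */
function markdownHeading(string $text, int $level, ?string $id = null): string
{
    if ($level < 1 || $level > 6) {
        throw new InvalidArgumentException("Heading level must be 1-6, got {$level}");
    }

    $line = str_repeat('#', $level) . ' ' . $text;

    // Append the id only when one was given.
    return $id === null ? $line : $line . ' {#' . $id . '}';
}

echo markdownHeading('Getting started', 2, 'intro'), "\n"; // ## Getting started {#intro}
```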

Both Image::make($path) (on-disk) and Image::fromData($bytes, $mimeType) (in-memory) are accepted everywhere; HTML and Markdown automatically inline embedded images as data: URIs, DOCX writes them to word/media/, and PDF embeds them as DCT XObjects (re-encoding GIF/PNG/WebP through GD when needed).
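
Inlining an embedded image as a data: URI comes down to base64-encoding the bytes and prefixing the MIME type. The sketch below shows that step in isolation; dataUri() and htmlImage() are hypothetical helpers for this example, not functions exposed by the library.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: build a base64 data: URI from raw bytes. */
function dataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

/** Hypothetical helper: wrap the URI in an <img> tag with an escaped alt text. */
function htmlImage(string $alt, string $bytes, string $mimeType): string
{
    return '<img src="' . dataUri($bytes, $mimeType)
        . '" alt="' . htmlspecialchars($alt, ENT_QUOTES) . '">';
}

// The six-byte GIF signature stands in for real image data.
echo dataUri('GIF89a', 'image/gif'), "\n"; // data:image/gif;base64,R0lGODlh
```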

Typed Exceptions

All library errors extend a single base so consumers can catch them uniformly:

Exception Thrown when…
Paperdoc\Exceptions\PaperdocException Base (extends RuntimeException)
Paperdoc\Exceptions\ParserException A parser cannot read/decode a file (::forFile($path, $reason, $previous))
Paperdoc\Exceptions\RendererException A renderer cannot serialise a document (::forFormat($fmt, $reason, $previous))
Paperdoc\Exceptions\UnsupportedFormatException Unknown format or extension (::forFormat() / ::forExtension())
Paperdoc\Exceptions\InvalidDocumentException Document is used in an invalid state (e.g. invalid heading level)

use Paperdoc\Exceptions\PaperdocException;
use Paperdoc\Facades\Paperdoc;

try {
    $doc = Paperdoc::open('report.docx');
} catch (PaperdocException $e) {
    // Any Paperdoc error ends up here.
}
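
The pattern behind this table is a shared base class plus named constructors. The self-contained sketch below reproduces it with hypothetical Sketch* classes so the catch order can be tried without the library installed; the message formats are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins mirroring the hierarchy described above.
class SketchPaperdocException extends RuntimeException {}

final class SketchParserException extends SketchPaperdocException
{
    public static function forFile(string $path, string $reason, ?Throwable $previous = null): self
    {
        // Message wording is an assumption for this sketch.
        return new self("Cannot parse '{$path}': {$reason}", 0, $previous);
    }
}

final class SketchUnsupportedFormatException extends SketchPaperdocException
{
    public static function forExtension(string $extension): self
    {
        return new self("Unsupported file extension: .{$extension}");
    }
}

/** Run a callable and report which catch branch handled its failure. */
function tryOpen(callable $open): string
{
    try {
        $open();

        return 'ok';
    } catch (SketchUnsupportedFormatException $e) {
        // Specific types go first so they are not swallowed by the base class.
        return 'unsupported: ' . $e->getMessage();
    } catch (SketchPaperdocException $e) {
        // Everything else from the same family lands here.
        return 'failed: ' . $e->getMessage();
    }
}

echo tryOpen(fn () => throw SketchUnsupportedFormatException::forExtension('xyz')), "\n";
// unsupported: Unsupported file extension: .xyz
```

Catching the most specific type first and the base type last lets callers handle one failure mode specially while still covering the rest.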

Hyperlinks

Every TextRun can carry an optional Paperdoc\Document\Link\TextLink. Links survive the full round-trip: they're parsed from DOCX (<w:hyperlink>) and rendered natively by the HTML and Markdown renderers.

Add a link programmatically

use Paperdoc\Support\DocumentManager;
use Paperdoc\Document\Section;
use Paperdoc\Document\Link\TextLink;

$doc = DocumentManager::create('md', 'Release notes');
$section = Section::make('main');

$section->addText(
    'See the full changelog',
    null,
    TextLink::make('https://github.com/paperdoc-dev/paperdoc-lib/blob/main/CHANGELOG.md', '', 'Changelog')
);

$doc->addSection($section);
echo DocumentManager::renderAs($doc, 'md');
// [See the full changelog](https://github.com/paperdoc-dev/paperdoc-lib/blob/main/CHANGELOG.md "Changelog")

Supported link flavours

| Kind | Construction | HTML output | Markdown output |
| --- | --- | --- | --- |
| External URL | TextLink::make('https://x.com') | <a href="…" target="_blank" rel="noopener noreferrer">…</a> | [label](url) |
| Internal anchor | TextLink::make('', 'section-2') | <a href="#section-2">…</a> | [label](#section-2) |
| URL + fragment | TextLink::make('https://x.com', 'sect-2') | <a href="https://x.com#sect-2" …>…</a> | [label](url#sect-2) |
| Tooltip / title | TextLink::make('https://x.com', '', 'Open site') | <a … title="Open site" …>…</a> | [label](url "Open site") |

External schemes (http, https, mailto, tel, ftp) automatically get target="_blank" rel="noopener noreferrer" in HTML to prevent tabnabbing. Run styling (bold, italic, color, font) is preserved when combined with a link.
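
The rules in the table and the paragraph above can be sketched as one small function. renderAnchor() is a hypothetical helper written for this example; the attribute order and escaping choices are assumptions, not the HTML renderer's actual output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build an <a> tag from a URL, an anchor and a title.
 * URLs using an external scheme also get target/rel attributes.
 */
function renderAnchor(string $label, string $url = '', string $anchor = '', string $title = ''): string
{
    // A fragment is appended after '#', whether or not a URL is present.
    $href = $url . ($anchor !== '' ? '#' . $anchor : '');

    // External means the URL starts with one of the schemes listed above.
    $external = preg_match('/^(https?|mailto|tel|ftp):/i', $url) === 1;

    $attrs = ' href="' . htmlspecialchars($href, ENT_QUOTES) . '"';
    if ($title !== '') {
        $attrs .= ' title="' . htmlspecialchars($title, ENT_QUOTES) . '"';
    }
    if ($external) {
        $attrs .= ' target="_blank" rel="noopener noreferrer"';
    }

    return '<a' . $attrs . '>' . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

echo renderAnchor('Docs', '', 'section-2'), "\n"; // <a href="#section-2">Docs</a>
```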

Convert DOCX with hyperlinks to Markdown

use Paperdoc\Support\DocumentManager;

// <w:hyperlink r:id="…"> elements are parsed and attached to their TextRun
$doc = DocumentManager::open('report.docx');

// Links are rendered as safe [label](url) — labels with ] and URLs with spaces
// or parentheses are escaped/wrapped automatically.
file_put_contents('report.md', DocumentManager::renderAs($doc, 'md'));
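
One plausible way to produce such safe links is sketched below: square brackets in the label are backslash-escaped, and a URL containing whitespace or parentheses is wrapped in angle brackets, which CommonMark accepts as a link destination. markdownLink() is a hypothetical helper, and the library may escape differently.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: build a Markdown inline link with a safely escaped label and URL. */
function markdownLink(string $label, string $url, string $title = ''): string
{
    // Escape backslashes and square brackets so the label cannot end early.
    $safeLabel = strtr($label, ['\\' => '\\\\', '[' => '\\[', ']' => '\\]']);

    // Angle brackets cannot appear inside a wrapped destination, so encode them.
    $safeUrl = str_replace(['<', '>'], ['%3C', '%3E'], $url);

    // Wrap destinations that contain whitespace or parentheses.
    if (preg_match('/[\s()]/', $safeUrl) === 1) {
        $safeUrl = '<' . $safeUrl . '>';
    }

    // Escape double quotes inside the optional title.
    $suffix = $title !== '' ? ' "' . str_replace('"', '\\"', $title) . '"' : '';

    return '[' . $safeLabel . '](' . $safeUrl . $suffix . ')';
}

echo markdownLink('Changelog', 'https://x.com', 'Open site'), "\n";
// [Changelog](https://x.com "Open site")
```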

Configuration

Publish the config (Laravel):

php artisan vendor:publish --tag=paperdoc-config

This creates config/paperdoc.php where you can set the default format, text styles, storage paths, and AI/OCR settings.

Testing

composer test
# or
./vendor/bin/phpunit

Integration tests live in tests/Integration/, unit tests in tests/Unit/.

Architecture

src/
├── Concerns/          # Shared traits
├── Console/           # Artisan commands
├── Contracts/         # DocumentInterface, ParserInterface, BlockElementInterface…
├── Document/          # Core model (Document, Section, Paragraph, Heading, ListBlock, Bookmark, CodeBlock, Blockquote, Metadata…)
├── Enum/              # Format enums
├── Exceptions/        # PaperdocException + typed exceptions
├── Facades/           # Laravel Facade
├── Factory/           # Document/Parser factories
├── Llm/               # AI/LLM integration (Neuron AI)
├── Ocr/               # OCR integration
├── Parsers/           # Format-specific parsers
├── Renderers/         # Format-specific renderers
├── Support/           # DocumentManager and helpers
└── PaperdocServiceProvider.php

Contributing

We welcome contributions! Please read CONTRIBUTING.md before opening a pull request.

Contributors

Thanks to everyone who has contributed to paperdoc-lib. A full list is kept in CONTRIBUTORS.md.

  • Olivier Mourlevat (@olivM): DOCX hyperlink parsing, HTML/Markdown hyperlink rendering (#4)

Changelog

See CHANGELOG.md for release history.

License

Paperdoc Library is released under the MIT License — free to use, modify and distribute, commercial or not.

© Paperdoc — paperdoc.dev