talleu / md-to-ooxml
A lightweight PHP library to convert Markdown to Office Open XML (OOXML) and .docx files
Requires
- php: >=8.3
Requires (Dev)
- ext-zip: *
- friendsofphp/php-cs-fixer: ^3.94
- league/commonmark: ^2.8
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^12.5
Suggests
- ext-zip: Required to generate .docx files with DocxWriter.
- league/commonmark: Install to use the CommonMark parser adapter for robust Markdown parsing.
Last update: 2026-03-31 19:50:24 UTC
README
A lightweight, extensible PHP library to convert Markdown into Office Open XML (OOXML) — the XML format used inside Microsoft Word .docx files.
Use it to:
- Generate `.docx` files directly from Markdown (with zero dependency on PHPWord or LibreOffice)
- Get raw OOXML strings to embed into existing documents
- Inject Markdown content into `.docx` templates
Features
| Feature | Built-in Parser | CommonMark Adapter |
|---|---|---|
| Headings (H1–H6) with Word styles | ✅ | ✅ |
| Paragraphs & blank lines | ✅ | ✅ |
| Bold, italic, bold+italic | ✅ | ✅ |
| Underline | ✅ | — |
| Strikethrough | ✅ | ✅ |
| Inline code | ✅ | ✅ |
| Fenced code blocks (with language) | ✅ | ✅ |
| Links | ✅ | ✅ |
| Images (text placeholder) | ✅ | ✅ |
| Bullet lists | ✅ | ✅ |
| Ordered lists | ✅ | ✅ |
| Blockquotes | ✅ | ✅ |
| Horizontal rules | ✅ | ✅ |
| Tables | ✅ | ✅ (requires table extension) |
| `.docx` file generation | ✅ | ✅ |
| Template injection | ✅ | ✅ |
Requirements
- PHP 8.3+
- `ext-zip` (only required for `.docx` generation via `DocxWriter`)
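A quick way to see which of these requirements the current runtime meets is a small check script. This is a minimal sketch, not part of the library: `checkEnvironment()` is a hypothetical helper, and the CommonMark check assumes the `League\CommonMark\CommonMarkConverter` class that ships with league/commonmark 2.x.

```php
<?php

/**
 * Report which requirements and optional capabilities are available.
 * Hypothetical helper for illustration; not provided by md-to-ooxml.
 *
 * @return array{php: bool, zip: bool, commonmark: bool}
 */
function checkEnvironment(): array
{
    return [
        // The README states PHP 8.3+ is required.
        'php' => version_compare(PHP_VERSION, '8.3.0', '>='),
        // ext-zip is only needed when writing .docx files.
        'zip' => extension_loaded('zip'),
        // league/commonmark is optional; the ::class constant is just a string,
        // so this line is safe even when the package is not installed.
        'commonmark' => class_exists(\League\CommonMark\CommonMarkConverter::class),
    ];
}

foreach (checkEnvironment() as $name => $ok) {
    echo str_pad($name, 12), $ok ? 'available' : 'missing', PHP_EOL;
}
```

If `zip` is reported as missing, the string-conversion features described below should still be usable, since only `.docx` generation depends on it.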
Installation
composer require talleu/md-to-ooxml
Optional: CommonMark support
composer require league/commonmark
Quick Start
1. Markdown → OOXML string
    use Talleu\MdToOoxml\OoXmlConverterFactory;

    $converter = OoXmlConverterFactory::create();

    // Full document XML (document.xml content)
    $xml = $converter->convert('# Hello **World**');

    // Body fragment only (no XML declaration / envelope)
    $bodyXml = $converter->convertToBodyXml('# Hello **World**');
2. Markdown → .docx file
    use Talleu\MdToOoxml\DocxWriter;
    use Talleu\MdToOoxml\OoXmlConverterFactory;

    // One-liner: Markdown → .docx
    DocxWriter::fromMarkdown('# My Document', '/path/to/output.docx');

    // Or with a custom converter (e.g. CommonMark)
    $converter = OoXmlConverterFactory::createWithCommonMark();
    DocxWriter::fromMarkdown($markdown, '/path/to/output.docx', $converter);
3. Inject into an existing .docx template
    use Talleu\MdToOoxml\DocxWriter;
    use Talleu\MdToOoxml\OoXmlConverterFactory;

    $converter = OoXmlConverterFactory::create();
    $bodyXml = $converter->convertToBodyXml('## New Section');

    // Append content before </w:body>
    DocxWriter::injectIntoTemplate(
        templatePath: '/path/to/template.docx',
        bodyXml: $bodyXml,
        outputPath: '/path/to/output.docx',
    );

    // Or replace a placeholder string in the template
    DocxWriter::injectIntoTemplate(
        templatePath: '/path/to/template.docx',
        bodyXml: $bodyXml,
        outputPath: '/path/to/output.docx',
        placeholder: '{{CONTENT}}',
    );
4. Two-step: convert then save manually
    use Talleu\MdToOoxml\OoXmlConverterFactory;
    use Talleu\MdToOoxml\DocxWriter;

    $converter = OoXmlConverterFactory::create();
    $documentXml = $converter->convert($markdown);

    // You can inspect/modify the XML here if needed

    DocxWriter::save($documentXml, '/path/to/output.docx');
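One way to inspect the XML between the two steps is PHP's DOM extension. The sketch below counts `<w:p>` paragraphs with XPath. `countParagraphs()` is a hypothetical helper, and the sample string is hand-written WordprocessingML standing in for converter output so the snippet runs without the library; the real output may contain additional elements and namespaces.

```php
<?php

// WordprocessingML main namespace, as defined by ECMA-376.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Count the <w:p> paragraph elements in a WordprocessingML document string.
 * Hypothetical helper for illustration; not provided by md-to-ooxml.
 */
function countParagraphs(string $documentXml): int
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($documentXml)) {
        throw new InvalidArgumentException('Not well-formed XML.');
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_NS);

    return $xpath->query('//w:p')->length;
}

// Hand-written stand-in for converter output (assumption, see lead-in).
$documentXml = <<<XML
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
    <w:p><w:r><w:t>World</w:t></w:r></w:p>
  </w:body>
</w:document>
XML;

echo countParagraphs($documentXml), PHP_EOL; // prints 2
```

The same `DOMXPath` object could be used to locate and modify nodes before the XML is saved.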
Parsers
Built-in Parser (zero dependencies)
The default parser handles all common Markdown syntax via regex-based parsing. It's fast, lightweight, and requires no extra packages.
$converter = OoXmlConverterFactory::create();
Limitation: The built-in parser does not support nested inline formatting. For example, `__some *italic* text__` will render the underline but treat the `*italic*` markers as literal text. If you need nested formatting (italic inside bold, underline inside strikethrough, etc.), use the CommonMark adapter below.
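The limitation is typical of single-pass regex parsing: once a pattern has matched an outer span, its inner text is taken verbatim and never parsed again. The following is a minimal sketch of that behavior, assuming a simplified pattern; `parseUnderlineOnly()` is hypothetical and is not the library's actual parser.

```php
<?php

/**
 * Naive single-pass inline parser: finds __underline__ spans and returns
 * each span's inner text verbatim, without parsing it a second time.
 * Hypothetical illustration; not the library's implementation.
 *
 * @return list<array{type: string, text: string}>
 */
function parseUnderlineOnly(string $markdown): array
{
    $runs = [];
    $parts = preg_split('/__(.+?)__/', $markdown, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue;
        }
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the captured group.
        $runs[] = ['type' => $i % 2 === 1 ? 'underline' : 'plain', 'text' => $part];
    }

    return $runs;
}

print_r(parseUnderlineOnly('__some *italic* text__ and more'));
```

The first run comes back as type `underline` with the text `some *italic* text`: the asterisks survive as literal characters because nothing re-scans the captured text. Handling nesting requires recursing into each span, which is the kind of work a full CommonMark parser does.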
League CommonMark Adapter
For advanced Markdown features, use the adapter for league/commonmark:
$converter = OoXmlConverterFactory::createWithCommonMark();
The adapter automatically enables the Table and Strikethrough extensions if available.
Use the CommonMark adapter when you need:
- Nested inline formatting (e.g. bold inside italic, italic inside underline)
- Strict CommonMark compliance
- Edge cases the built-in regex parser may not handle correctly
Custom Parser
Implement MarkdownParserInterface and inject it:
    use Talleu\MdToOoxml\OoXmlConverterFactory;

    $converter = OoXmlConverterFactory::createWithParser(new MyCustomParser());
Architecture
Markdown string
│
▼
┌──────────────────┐
│ Parser │ BlockParser (built-in) or LeagueCommonMarkAdapter
│ (Markdown → AST) │
└──────────────────┘
│
▼
┌──────────────────┐
│ AST (Node tree) │ DocumentNode → ParagraphNode → TextRunNode, etc.
└──────────────────┘
│
▼
┌──────────────────┐
│ Renderer │ NodeRenderer dispatches to per-node RendererInterface
│ (AST → OOXML) │
└──────────────────┘
│
▼
OOXML string
│
▼ (optional)
┌──────────────────┐
│ DocxWriter │ Packages XML into a valid .docx ZIP archive
└──────────────────┘
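The pipeline in the diagram can be sketched in a few dozen lines of plain PHP. Everything here is a hypothetical stand-in: the `Sketch*` classes and functions are not the library's classes, the parser only recognizes headings and paragraphs, and the `Heading1` to `Heading6` style IDs are an assumption about what the generated styles are called.

```php
<?php

// Hypothetical stand-ins; the library's real node and renderer classes differ.
final class SketchHeading
{
    public function __construct(public int $level, public string $text) {}
}

final class SketchParagraph
{
    public function __construct(public string $text) {}
}

/** Stage 1: Markdown to a flat list of nodes (headings and paragraphs only). */
function sketchParse(string $markdown): array
{
    $nodes = [];
    foreach (preg_split('/\R/', trim($markdown)) as $line) {
        if ($line === '') {
            continue;
        }
        if (preg_match('/^(#{1,6})\s+(.*)$/', $line, $m)) {
            $nodes[] = new SketchHeading(strlen($m[1]), $m[2]);
        } else {
            $nodes[] = new SketchParagraph($line);
        }
    }
    return $nodes;
}

/** Stage 2: dispatch each node to the renderer registered for its class. */
function sketchRender(array $nodes): string
{
    $esc = static fn (string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $renderers = [
        // Assumed style IDs "Heading1".."Heading6".
        SketchHeading::class => static fn (SketchHeading $n): string =>
            '<w:p><w:pPr><w:pStyle w:val="Heading' . $n->level . '"/></w:pPr>'
            . '<w:r><w:t>' . $esc($n->text) . '</w:t></w:r></w:p>',
        SketchParagraph::class => static fn (SketchParagraph $n): string =>
            '<w:p><w:r><w:t>' . $esc($n->text) . '</w:t></w:r></w:p>',
    ];

    $xml = '';
    foreach ($nodes as $node) {
        $xml .= $renderers[$node::class]($node);
    }
    return $xml;
}

echo sketchRender(sketchParse("# Title\n\nBody & more")), PHP_EOL;
```

Keeping the parser and renderer separated by a node list is what lets either side be swapped independently, for example a different parser feeding the same renderers.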
Node Types
| Node | Description |
|---|---|
| `DocumentNode` | Root node |
| `ParagraphNode` | Paragraph |
| `TitleNode` | Heading (level 1–6) |
| `TextRunNode` | Inline text with formatting flags |
| `InlineCodeNode` | Inline code span |
| `LinkNode` | Hyperlink |
| `ImageNode` | Image reference |
| `ListItemNode` | Bullet or ordered list item |
| `QuoteNode` | Blockquote |
| `CodeBlockNode` | Fenced code block |
| `BlankLineNode` | Empty line |
| `HorizontalRuleNode` | Horizontal rule / thematic break |
| `TableNode` | Table (contains `TableRowNode` → `TableCellNode`) |
Extending
Register a custom renderer for any node type:
    use Talleu\MdToOoxml\Node\NodeInterface;
    use Talleu\MdToOoxml\OoXmlConverterFactory;
    use Talleu\MdToOoxml\Renderer\NodeRenderer;
    use Talleu\MdToOoxml\Renderer\RendererInterface;

    class MyCustomRenderer implements RendererInterface
    {
        public function render(NodeInterface $node): string
        {
            // Return an OOXML string (placeholder: an empty paragraph)
            return '<w:p/>';
        }
    }

    // Get the factory-built converter and add your renderer
    $converter = OoXmlConverterFactory::create();

    // Or build the NodeRenderer manually for full control
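For orientation, the string a renderer returns is a WordprocessingML fragment. Here is a minimal sketch of building one paragraph with a single formatted run. `sketchParagraphXml()` is a hypothetical standalone function, not part of the library; it only relies on the standard `<w:p>`, `<w:r>`, `<w:rPr>`, `<w:b/>`, `<w:i/>`, and `<w:t>` elements.

```php
<?php

/**
 * Build a WordprocessingML paragraph containing one text run.
 * $bold and $italic map to the <w:b/> and <w:i/> run properties.
 * Hypothetical helper for illustration; not provided by md-to-ooxml.
 */
function sketchParagraphXml(string $text, bool $bold = false, bool $italic = false): string
{
    $props = ($bold ? '<w:b/>' : '') . ($italic ? '<w:i/>' : '');
    $rPr = $props === '' ? '' : '<w:rPr>' . $props . '</w:rPr>';

    // Escape XML metacharacters; xml:space="preserve" keeps leading/trailing spaces.
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<w:p><w:r>' . $rPr . '<w:t xml:space="preserve">' . $escaped . '</w:t></w:r></w:p>';
}

echo sketchParagraphXml('Fish & chips', bold: true), PHP_EOL;
```

Escaping matters here: an unescaped `&` or `<` in the text would make `document.xml` malformed, and Word would refuse to open the file.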
Supported Markdown Syntax
    # Heading 1
    ## Heading 2
    ### Heading 3
    #### Heading 4
    ##### Heading 5
    ###### Heading 6

    Regular paragraph text.

    **Bold text** and *italic text* and ***bold italic***.
    __Underlined text__ and ~~strikethrough~~.
    `inline code`

    [Link text](https://example.com)
    

    - Bullet item
    - Another item
    * Also bullet
    + Also bullet

    1. Ordered item
    2. Another item

    > Blockquote text

    ---

    | Column A | Column B |
    | -------- | -------- |
    | Cell 1 | Cell 2 |

    \```php
    echo "fenced code block";
    \```
Testing
    # Run all tests
    vendor/bin/phpunit

    # Run only unit tests
    vendor/bin/phpunit --testsuite Unit

    # Run only integration tests
    vendor/bin/phpunit --testsuite Integration

    # Run only functional tests (requires ext-zip)
    vendor/bin/phpunit --testsuite Functional
How It Works (OOXML Primer)
A .docx file is a ZIP archive containing XML files. The main one is word/document.xml. This library generates valid OOXML that follows the ECMA-376 specification.
The generated .docx includes:
| File | Purpose |
|---|---|
| `[Content_Types].xml` | MIME type declarations |
| `_rels/.rels` | Package-level relationships |
| `word/document.xml` | The actual document content |
| `word/_rels/document.xml.rels` | Document-level relationships |
| `word/numbering.xml` | List (bullet/ordered) definitions |
| `word/styles.xml` | Heading and default styles |
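Because a `.docx` is an ordinary ZIP archive, its parts can be listed with PHP's `ZipArchive` class (this needs `ext-zip`). The sketch below is a hypothetical helper, not part of the library, and the path in the usage comment is a placeholder.

```php
<?php

/**
 * Return the entry names stored in a .docx (ZIP) archive.
 * Hypothetical helper for illustration; not provided by md-to-ooxml.
 *
 * @return list<string>
 */
function listDocxParts(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::RDONLY) !== true) {
        throw new RuntimeException("Cannot open archive: $path");
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();

    return $names;
}

// Usage (the path is a placeholder):
// print_r(listDocxParts('/path/to/output.docx'));
```

Run against a generated file, the listing should include the parts named in the table above, which is a quick sanity check when a document fails to open in Word.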
License
MIT — see LICENSE.