wiki-connect/parsewiki

A library that helps parse wikitext template data

Package info

gerrit.wikimedia.org/r/mediawiki/tools/ParseWiki

pkg:composer/wiki-connect/parsewiki

2.0 2025-07-18 22:35 UTC


README

A powerful PHP library for parsing MediaWiki-style content from raw wiki text.

📚 Overview

This library allows you to extract:

  • Templates (single, multiple, nested)
  • Internal wiki links
  • External links
  • Citations (references)
  • Categories (with or without display text)
  • Tables (complete MediaWiki table syntax support)

Perfect for handling wiki-formatted text in PHP projects.
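
To make the constructs in the list above concrete, here is a small, self-contained sketch of what "extracting internal links" means at the wikitext level. It is a naive regular expression written for illustration, not the library's parser, and `extractInternalLinks()` is a hypothetical helper name. It also shows why a dedicated parser is useful: this pattern cannot tell a category from an ordinary link.

```php
<?php
// Illustration only: a naive regex, NOT the ParseWiki implementation.
// extractInternalLinks() is a hypothetical helper name.

/**
 * @return array<int, array{target: string, label: string}>
 */
function extractInternalLinks(string $wikitext): array
{
    $links = [];
    // Matches [[Target]] or [[Target|Label]]; nested brackets are not handled.
    preg_match_all(
        '/\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]/',
        $wikitext,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        $target = trim($m[1]);
        // Without a pipe, the target doubles as the display text.
        $label = isset($m[2]) ? trim($m[2]) : $target;
        $links[] = ['target' => $target, 'label' => $label];
    }
    return $links;
}

// Note that the category is returned like any other link.
$sample = 'See [[PHP]] and [[MediaWiki|the wiki engine]]. [[Category:Parsers]]';
print_r(extractInternalLinks($sample));
```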

🗂️ Project Structure

  • ParserTemplates: Parses multiple templates.
  • ParserTemplate: Parses a single template.
  • ParserInternalLinks: Parses internal wiki links.
  • ParserExternalLinks: Parses external links.
  • ParserCitations: Parses citations and references.
  • ParserCategories: Parses categories from wiki text.
  • ParserTable: Parses MediaWiki tables with full syntax support.
  • DataModel classes:
    • Attribute
    • Citation
    • ExternalLink
    • InternalLink
    • Parameters
    • Template
    • Table
    • Cell
    • LegacyTableCompatibility (backward compatibility trait)
  • tests/: Contains PHPUnit test files:
    • ParserCategoriesTest
    • ParserCitationsTest
    • ParserExternalLinksTest
    • ParserInternalLinksTest
    • ParserSectionTest
    • ParserTableTest (39 tests)
    • ParserTemplatesTest
    • ParserTemplateTest
    • DataModel tests:
      • AttributeTest
      • CellTest
      • ParametersTest
      • TableTest (46 tests with 106 assertions)
      • TemplateTest
  • demo/: Live HTML testing interface:
    • index.html - Interactive demo frontend
    • parser.php - Backend API for real-time parsing

🚀 Features

  • ✅ Parse single and multiple templates.
  • ✅ Support nested templates.
  • ✅ Handle named and unnamed template parameters.
  • ✅ Extract internal links with or without display text.
  • ✅ Extract external links with or without labels.
  • ✅ Parse citations including attributes and special characters.
  • ✅ Parse categories, support custom namespaces, handle whitespace and special characters.
  • ✅ Full MediaWiki table syntax support with advanced features:
    • ✅ Multi-line cell content (paragraphs, lists, complex markup)
    • ✅ Rowspan and colspan attributes with proper span matrix handling
    • ✅ HTML wrapper detection (automatically strips <div> wrappers)
    • ✅ Accessibility support (scope attributes for screen readers)
    • ✅ Complex attribute parsing (style, alignment, etc.)
    • ✅ Nested table support
    • ✅ Caption and header attributes
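
The "span matrix" mentioned in the list above can be pictured as a grid in which every slot records the cell that covers it. The following is a minimal sketch of that idea in plain PHP. The function name `buildSpanMatrix()` and the array shape of its input are assumptions made for this example; they are not part of the ParseWiki API, and the library's internal approach may differ.

```php
<?php
// Illustrative sketch only; buildSpanMatrix() and its input shape are
// assumptions, not part of the ParseWiki API.

/**
 * Place cells on a grid, honouring rowspan and colspan.
 *
 * @param array<int, array<int, array{text: string, rowspan?: int, colspan?: int}>> $rows
 *        Rows are expected to be a list (keys 0, 1, 2, ...).
 * @return array<int, array<int, string>> grid[row][col] => text of the covering cell
 */
function buildSpanMatrix(array $rows): array
{
    $grid = [];
    foreach ($rows as $r => $cells) {
        $col = 0;
        foreach ($cells as $cell) {
            // Skip slots already claimed by a rowspan from an earlier row.
            while (isset($grid[$r][$col])) {
                $col++;
            }
            $rowspan = max(1, $cell['rowspan'] ?? 1);
            $colspan = max(1, $cell['colspan'] ?? 1);
            // Mark every slot the cell covers.
            for ($dr = 0; $dr < $rowspan; $dr++) {
                for ($dc = 0; $dc < $colspan; $dc++) {
                    $grid[$r + $dr][$col + $dc] = $cell['text'];
                }
            }
            $col += $colspan;
        }
    }
    // Normalise key order so rows and columns read left to right.
    ksort($grid);
    foreach ($grid as &$row) {
        ksort($row);
    }
    unset($row);
    return $grid;
}

// "A" spans two rows, so the second row starts in column 1.
$grid = buildSpanMatrix([
    [['text' => 'A', 'rowspan' => 2], ['text' => 'B'], ['text' => 'C']],
    [['text' => 'D'], ['text' => 'E']],
]);
print_r($grid);
```

With a grid like this, a lookup by row and column returns the right cell even when an earlier row's rowspan has pushed later cells to the right.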

🧩 Wikitext Features Support

| Feature | Read ✅ | Modify ✏️ | Replace 🔄 |
|---|---|---|---|
| Templates | ✅ Yes | ✅ Yes | ✅ Yes |
| Parameters | ✅ Yes | ✅ Yes | ✅ Yes |
| Citations | ✅ Yes | ✅ Yes | ✅ Yes |
| Citations > Attributes | ✅ Yes | ✅ Yes | ✅ Yes |
| Internal Links | ✅ Yes | | |
| External Links | ✅ Yes | | |
| Categories | ✅ Yes | | |
| Tables | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Attributes | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Cells | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Headers | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Captions | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Rowspan/Colspan | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Multi-line | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > HTML Wrappers | ✅ Yes | ✅ Yes | ✅ Yes |
| Tables > Nested | ✅ Yes | | |
| HTML Tags | | | |
| Parser Functions | | | |
| Sections | | | |
| Magic Words | | | |

🟡 Note: Some features are partially supported or under development. Contributions are welcome!

📋 Table Usage Examples

Basic Table Parsing

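// Load Composer's autoloader first (adjust the path to your project layout)
require __DIR__ . '/vendor/autoload.php';

use WikiConnect\ParseWiki\ParserTable;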

$wikitext = <<<WIKI
{| class="wikitable"
|+ Table Caption
|-
! Header 1 !! Header 2
|-
| Cell 1 || Cell 2
|-
| Cell 3 || Cell 4
|}
WIKI;

$parser = new ParserTable($wikitext);
$table = $parser->parse();

// Access table properties
echo $table->getCaption(); // "Table Caption"
echo $table->getAttributes()['class']; // "wikitable"

// Access headers
$headers = $table->getHeaders();
echo $headers[0]->getContent(); // "Header 1"

// Access data
$rows = $table->getRows();
echo $rows[0][0]->getContent(); // "Cell 1"

Advanced Table Features

// Table with attributes, spans, and accessibility
// (nowdoc, i.e. quoted <<<'WIKI', so "$" in the prices is never interpolated)
$complexTable = <<<'WIKI'
{| class="wikitable" style="width: 100%;"
! scope="col" colspan="2" | Product Info
! scope="col" | Price
|-
! scope="row" | Bread
| 2 loaves
| style="text-align: right;" | $3.50
|-
| rowspan="2" | Dairy
| Milk (1 gallon)
| align="right" | $4.25
|}
WIKI;

$parser = new ParserTable($complexTable);
$table = $parser->parse();

// Check spans
$headers = $table->getHeaders();
echo $headers[0]->getColSpan(); // 2
echo $headers[0]->getScope(); // "col"

// Check alignment
$rows = $table->getRows();
echo $rows[0][2]->getAlign(); // "right"

Multi-line Cell Content

// Table with complex multi-line content
$multiLineTable = <<<WIKI
{|
|Lorem ipsum dolor sit amet,
consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat.

At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren.
|
* Lorem ipsum dolor sit amet
* consetetur sadipscing elitr
* sed diam nonumy eirmod tempor invidunt
|}
WIKI;

$parser = new ParserTable($multiLineTable);
$table = $parser->parse();

// Access multi-line content
$rows = $table->getRows();
echo $rows[0][0]->getRawContent(); // Full paragraph content
echo $rows[0][1]->getRawContent(); // List content with bullets
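
The multi-line example above relies on a MediaWiki convention: a cell keeps collecting lines until the next line that starts with a pipe. The sketch below shows that grouping rule in isolation. `splitMultiLineCells()` is a hypothetical helper written for this illustration, not library code, and it deliberately ignores `!` header cells and inline `||` separators.

```php
<?php
// Sketch only: splitMultiLineCells() is a hypothetical helper, not ParseWiki
// code. It ignores "!" header cells and inline "||" separators.
// Requires PHP 8.0+ for str_starts_with().

/** @return string[] one trimmed string per cell */
function splitMultiLineCells(string $wikitext): array
{
    $cells = [];
    $current = null; // lines of the cell being collected, or null

    foreach (preg_split('/\R/', $wikitext) as $line) {
        // Table open/close, row separator and caption lines are not cells.
        $isStructural = false;
        foreach (['{|', '|}', '|-', '|+'] as $marker) {
            if (str_starts_with($line, $marker)) {
                $isStructural = true;
                break;
            }
        }
        $startsCell = !$isStructural && str_starts_with($line, '|');

        if ($isStructural || $startsCell) {
            // Either kind of line ends the cell collected so far.
            if ($current !== null) {
                $cells[] = trim(implode("\n", $current));
            }
            $current = $startsCell ? [substr($line, 1)] : null;
        } elseif ($current !== null) {
            // Any other line continues the current cell.
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $cells[] = trim(implode("\n", $current));
    }
    return $cells;
}

$cells = splitMultiLineCells(
    "{|\n|First line,\nsecond line.\n|\n* item one\n* item two\n|}"
);
print_r($cells);
```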

HTML Wrapper Support

// Parser automatically handles HTML wrappers
$wrappedTable = <<<HTML
<div class="noresize">
{| class="wikitable"
! colspan="6" |Shopping List
|-
| rowspan="2" |Bread & Butter
| Pie
| Buns
|}
</div>
HTML;

$parser = new ParserTable($wrappedTable);
$table = $parser->parse(); // Automatically strips <div> wrapper

// Access span attributes
$headers = $table->getHeaders();
echo $headers[0]->getAttributes()['colspan']; // "6"

$rows = $table->getRows();
echo $rows[0][0]->getAttributes()['rowspan']; // "2"
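
For readers curious what "stripping a wrapper" involves, here is a minimal sketch under simplifying assumptions: exactly one wrapping element, with the table as its only content. `stripHtmlWrapper()` is a hypothetical name chosen for this example; the library's own detection logic is not shown in this README and may work differently.

```php
<?php
// Sketch only: stripHtmlWrapper() is a hypothetical name; the library's own
// detection logic may differ. Handles one wrapper element around one table.
// Requires PHP 8.0+ for str_starts_with()/str_ends_with().

function stripHtmlWrapper(string $text): string
{
    $text = trim($text);
    // <tag ...> {| ... |} </tag>, with the same tag name on both ends.
    $pattern = '/^<([a-z][a-z0-9]*)\b[^>]*>\s*(\{\|.*\|\})\s*<\/\1>$/is';
    if (preg_match($pattern, $text, $m) === 1) {
        return $m[2]; // the bare table markup
    }
    return $text; // no wrapper found: return the input unchanged
}

$wrapped = <<<'HTML'
<div class="noresize">
{| class="wikitable"
! colspan="6" |Shopping List
|}
</div>
HTML;

$bare = stripHtmlWrapper($wrapped);
var_dump(str_starts_with($bare, '{|'), str_ends_with($bare, '|}'));
```

Requiring the closing tag to match the opening one (the `\1` backreference) keeps the sketch from stripping mismatched markup.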

🌐 Demo Interface

The library includes HTML demo interface files for testing:

  • demo/index.html - Interactive frontend interface
  • demo/parser.php - Backend API for parsing

Demo Features:

  • ✅ Real-time parsing for all parser types
  • ✅ Tabbed output (Parsed Data, JSON, Methods)
  • ✅ Example library with pre-built syntax examples
  • ✅ Responsive design for desktop and mobile
  • ✅ Statistics display (cell counts, attributes, etc.)
  • ✅ Error handling with detailed messages

Supported Parsers in Demo:

  • Templates Parser
  • Single Template Parser
  • Table Parser (with full span and multi-line support)
  • Internal Links Parser
  • External Links Parser
  • Citations Parser
  • Categories Parser
  • Sections Parser

🆕 Recent Enhancements

Table Parser Improvements (2025)

  • ✅ Multi-line cell support: Handles paragraphs, lists, and complex content spanning multiple lines
  • ✅ Rowspan/colspan detection: Proper span matrix handling for advanced table layouts
  • ✅ HTML wrapper stripping: Automatically detects and removes <div>, <span>, and other HTML wrappers
  • ✅ Enhanced attribute parsing: Support for complex CSS styles and accessibility attributes
  • ✅ Content consistency: Unified getRawContent() and getContent() methods across all data models
  • ✅ 100% MediaWiki compatibility: Tested against all official MediaWiki table examples

Testing & Development

  • ✅ Comprehensive test suite: 169 tests with 533 assertions (100% success rate)
  • ✅ TableTest.php: 46 dedicated tests for Table data model and LegacyTableCompatibility
  • ✅ Live demo interface: Real-time testing with JSON output and method documentation
  • ✅ Official examples: Validated against MediaWiki.org documentation examples

⚙️ Requirements

  • PHP 8.0 or higher
  • PHPUnit 9 or higher (only needed to run the test suite)

💻 Installation

composer require wiki-connect/parsewiki

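When the package is installed through Composer, including vendor/autoload.php once in your entry script should make the WikiConnect\ParseWiki namespace available. If you load the source files some other way, set up PSR-4 autoloading for that namespace yourself.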
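
A quick way to confirm that autoloading works is to check that one of the library's classes resolves. The sketch below assumes Composer's default vendor/ directory; `parseWikiAvailable()` is a hypothetical helper written for this check and is not part of the package.

```php
<?php
// Sketch: parseWikiAvailable() is a hypothetical helper. The vendor/ path is
// Composer's default and will differ if "vendor-dir" has been customised.

function parseWikiAvailable(string $projectRoot): bool
{
    $autoload = rtrim($projectRoot, '/\\') . '/vendor/autoload.php';
    if (!is_file($autoload)) {
        return false; // dependencies not installed yet: run "composer install"
    }
    require_once $autoload;
    // class_exists() triggers the autoloader for the given class name.
    return class_exists('WikiConnect\\ParseWiki\\ParserTable');
}

// Prints bool(true) only if the package is installed under this directory.
var_dump(parseWikiAvailable(__DIR__));
```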

🧪 Running Tests

vendor/bin/phpunit tests