richarddobron/dom-forge

Lightweight PHP library for parsing and manipulating HTML documents.

Installs: 30

Dependents: 1

Suggesters: 0

Security: 0

Stars: 0

Watchers: 0

Forks: 0

Open Issues: 0

pkg:composer/richarddobron/dom-forge

2.0.0 2026-01-11 20:05 UTC

This package is auto-updated.

Last update: 2026-01-12 17:51:12 UTC


README

PHP HTML PARSER

Lightweight PHP library for parsing and manipulating HTML documents.

📖 Requirements

  • PHP 7.0 or higher
  • Composer is required for installation
  • PHP Extensions: iconv, ext-mbstring

📦 Installation

Install the library using Composer:

$ composer require richarddobron/dom-forge

⚡️ Quick Start

Creating DomForge Instance

// Without configuration
$dom1 = DomForge::fromHtml('<div class="container"><p>Hello World</p></div>');
$dom2 = DomForge::fromFile('/path/to/file.html');

// With configuration
$config = (new Configuration())
    ->setLowercase(true)
    ->setRemoveLineBreaks(true)
    ->setTargetCharset('UTF-8');

$dom1 = DomForge::fromHtml('<div>Content</div>', $config);
$dom2 = DomForge::fromFile('/path/to/file.html', $config);

Finding Elements

// Find all matching elements
$elements = $dom->find('div.container');

// Find first matching element
$element = $dom->findOne('p');

// Find by index (0-based)
$element = $dom->find('p', 0);

// Find with CSS selectors
$dom->find('#id');           // By ID
$dom->find('.class');        // By class
$dom->find('div p');         // Descendant
$dom->find('div > p');       // Direct child
$dom->find('div + p');       // Adjacent sibling
$dom->find('div, p');        // Multiple selectors
$dom->find('[attribute]');   // Has attribute
$dom->find('[attr=value]');  // Attribute equals
$dom->find('[attr^=val]');   // Attribute starts with
$dom->find('[attr$=val]');   // Attribute ends with
$dom->find('[attr*=val]');   // Attribute contains

Getting Content

$element = $dom->findOne('div');

// Get inner HTML (child elements as HTML string)
$inner = $element->innerHtml();

// Get outer HTML (including the element itself)
$outer = $element->outerHtml();

// Get text content (strips HTML tags)
$text = $element->textContent();

// Using magic properties
$inner = $element->innerHtml;
$outer = $element->outerHtml;
$text = $element->textContent;

Setting Content

$element = $dom->findOne('div');

// Set inner HTML
$element->innerHtml = '<p>New content</p>';

// Set outer HTML (replaces the element)
$element->outerHtml = '<span>Replaced</span>';

Creating Elements

// Create an element
$span = $dom->createElement('span');
$span = $dom->createElement('span', 'Hello World');  // with content
$span = $dom->createElement('a', 'Click', ['href' => 'https://example.com', 'class' => 'btn']);

// Create a text node
$textNode = $dom->createTextNode('Hello World');

// Create a comment
$comment = $dom->createComment('This is a comment');

DOM Manipulation

// Append a child element
$container = $dom->findOne('#container');
$newElement = $dom->createElement('p', 'New paragraph');
$container->appendChild($newElement);

// Remove a child element
$child = $container->firstChild();
$container->removeChild($child);

// Insert before another element
$ul = $dom->findOne('ul');
$newLi = $dom->createElement('li', 'First item');
$existingLi = $ul->firstChild();
$ul->insertBefore($newLi, $existingLi);

// Check if element has children
if ($container->hasChildren()) {
    // ...
}

Working with Attributes

$element = $dom->findOne('a');

// Get attribute
$href = $element->getAttribute('href');

// Check if attribute exists
if ($element->hasAttribute('target')) {
    // ...
}

// Set attribute
$element->setAttribute('class', 'link');

// Get all attributes
$attrs = $element->getAttributes();

// Remove attribute
$element->removeAttribute('class');

// Magic property access
$href = $element->href;
$element->class = 'active';

Node Type Checking

$node = $dom->findOne('div');

$node->isElement();   // true for HTML elements
$node->isText();      // true for text nodes
$node->isComment();   // true for comment nodes
$node->isSelfClosing();  // true for self-closing tags (br, img, etc.)
$node->isNamespacedElement();  // true for elements like fbt:param

Traversing Nodes

$element = $dom->findOne('ul');

// Get parent
$parent = $element->parent;

// Get children
$children = $element->children();         // All child nodes
$firstChild = $element->children(0);      // First child by index
$firstChild = $element->firstChild();     // First child
$lastChild = $element->lastChild();       // Last child

// Get siblings
$next = $element->nextSibling();          // Next sibling element
$prev = $element->previousSibling();      // Previous sibling element

// Get nodes (includes text nodes)
$nodes = $element->nodes;

Lookup Methods

// Get element by ID
$element = $dom->getElementById('myId');

// Get elements by ID (with index support)
$elements = $dom->getElementsById('myId');
$element = $dom->getElementsById('myId', 0);

// Get element by tag name
$element = $dom->getElementByTagName('div');

// Get elements by tag name
$elements = $dom->getElementsByTagName('p');
$element = $dom->getElementsByTagName('p', 0);

Callbacks

$dom = DomForge::fromHtml('<div><p>Test</p></div>');

// Set a callback that runs when getting outerHtml
$dom->setCallback(function ($node) {
    // Process each node
});

// Remove callback
$dom->removeCallback();

Saving Output

// Get HTML as string
$html = $dom->save();

// Save to file
$dom->save('/path/to/file.html');

// Or use __toString
$html = (string) $dom;

⚙️ Configuration Options

You can customize the library with the following methods:

Method Description Default
setTargetCharset(string $targetCharset) Sets the target character set for the HTML content. 'UTF-8'
setLowercase(bool $lowercase) Enables or disables converting tag and attribute names to lowercase. true
setForceTagsClosed(bool $forceTagsClosed) Enables or disables forcing all tags to be closed. true
setRemoveLineBreaks(bool $removeLineBreaks) Enables or disables stripping of \r and \n characters from the HTML content. false
setDefaultBrText(string $defaultBrText) Sets the default text to use for <br> tags. "\n"
setDefaultSpanText(string $defaultSpanText) Sets the default text to use for <span> tags. ''
setSelfClosingTags(array $selfClosingTags) Sets the extra list of self-closing tags. null

📅 Change Log

Please see CHANGELOG for more information on what has changed recently.

🧪 Testing

$ composer tests

🤝 Contributing

Please see CONTRIBUTING for details.

⚖️ License

This repository is MIT licensed, as found in the LICENSE file.