tomzx/html-parser

An HTML parser written in PHP

v0.1.0 2016-01-24 13:58 UTC


Last update: 2024-12-10 08:35:12 UTC




An HTML parser written in PHP. Based on nikic's PHP Parser.

Getting started

The goal of HTML parser is to simplify traversing and modifying an HTML tree using the visitor pattern.

First, parse your HTML with the Parser to produce a tree structure that the NodeTraverser can walk. Next, register one or more visitors that implement the operations you want to apply to the HTML elements. Then traverse the tree; the traverser calls each visitor as it enters and leaves every element. Finally, you can print the resulting tree back out as a string with the Printer.

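<?php

// Assumes the package was installed with Composer. The classes used below
// (Parser, NodeTraverser, ElementStripper, Printer) must also be imported
// from the package's namespace.
require __DIR__ . '/vendor/autoload.php';

// Read the HTML document to process.
$code = file_get_contents('input.html');

// Parse the HTML into a tree structure.
$parser = new Parser();
$statements = $parser->parse($code);

// Register a visitor that removes every element of the given types.
$traverser = new NodeTraverser();
$traverser->addVisitor(new ElementStripper(['head', 'a']));

// Walk the tree; visitors are called on every element entry/exit.
$statements = $traverser->traverse($statements);

// Print the resulting tree.
$printer = new Printer();
$printer->output($statements);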
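
To make the visitor pattern behind this workflow more concrete, here is a minimal, self-contained sketch of how a stripping visitor and a traverser could cooperate. It does not use this package's classes: SketchNode, SketchVisitor, SketchElementStripper, sketchTraverse and sketchRender are all hypothetical names invented for illustration, and the library's real interfaces may differ.

```php
<?php
// Illustrative sketch only: none of these classes come from tomzx/html-parser.

final class SketchNode
{
    public $name;
    /** @var SketchNode[] */
    public $children;

    public function __construct(string $name, array $children = [])
    {
        $this->name = $name;
        $this->children = $children;
    }
}

interface SketchVisitor
{
    /** Called on entry; returning false drops the node and its subtree. */
    public function enterNode(SketchNode $node): bool;

    /** Called on exit, after the node's children have been traversed. */
    public function leaveNode(SketchNode $node): void;
}

final class SketchElementStripper implements SketchVisitor
{
    private $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function enterNode(SketchNode $node): bool
    {
        // Keep the node only if its name is not in the strip list.
        return !in_array($node->name, $this->names, true);
    }

    public function leaveNode(SketchNode $node): void
    {
        // Nothing to do on exit for this visitor.
    }
}

/** Depth-first traversal returning the nodes that survive the visitor. */
function sketchTraverse(array $nodes, SketchVisitor $visitor): array
{
    $kept = [];
    foreach ($nodes as $node) {
        if (!$visitor->enterNode($node)) {
            continue; // Node rejected on entry: skip it and its children.
        }
        $node->children = sketchTraverse($node->children, $visitor);
        $visitor->leaveNode($node);
        $kept[] = $node;
    }
    return $kept;
}

/** Render the tree as nested tags so the result is easy to inspect. */
function sketchRender(array $nodes): string
{
    $out = '';
    foreach ($nodes as $node) {
        $out .= '<' . $node->name . '>' . sketchRender($node->children) . '</' . $node->name . '>';
    }
    return $out;
}

$tree = [
    new SketchNode('html', [
        new SketchNode('head', [new SketchNode('title')]),
        new SketchNode('body', [
            new SketchNode('p', [new SketchNode('a')]),
        ]),
    ]),
];

$result = sketchRender(sketchTraverse($tree, new SketchElementStripper(['head', 'a'])));
echo $result, PHP_EOL; // prints "<html><body><p></p></body></html>"
```

In this sketch the decision to drop an element is made on entry, so a stripped element's children are never visited. Separating entry and exit callbacks also leaves room for visitors that need to act only after a subtree has been processed.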

License

The code is licensed under the MIT license. See LICENSE.