duncan3dc / domparser
Wrappers for the standard DOMDocument class to provide extra functionality for html/xml parsing
Installs: 292 587
Dependents: 3
Suggesters: 0
Security: 0
Stars: 9
Watchers: 6
Forks: 4
Open Issues: 1
Requires
- php: ^7.3 || ^8.0
- ext-dom: *
- ext-libxml: *
Requires (Dev)
- maglnet/composer-require-checker: ^2.0 || ^3.0
- mockery/mockery: ^1.4
- phpstan/phpstan: ^0.12.94
- phpstan/phpstan-mockery: ^0.12.14
- phpstan/phpstan-phpunit: ^0.12.22
- phpunit/phpunit: ^8.5
- squizlabs/php_codesniffer: ^3.6
README
Wrappers for the PHP DOMDocument class to provide extra functionality for html/xml parsing
Constructor Arguments
There are 2 constructors available, one for the HtmlParser class and one for the XmlParser class, they both work in the same way.
- Only 1 parameter is available, which should be passed as a string, and contain the content to parse (xml/html)
- Warnings are captured during the loading of the content using libxml_use_internal_errors() and libxml_get_errors(), these are available from the errors property after the class has been initiated
Public Properties
- html/xml (string) - Depending on which class was used one of these properties will be present and contain the content used for parsing
- errors (array) - If any errors were encountered during parsing they will be in this array (see [Constructor Arguments])
- dom (DOMDocument) - This is the internal instance of the DOMDocument class used
- mode (string) - Indicates whether the parser is operating in html or xml mode
Public Methods
- getTags(string $tagName): array - Similar to DOMDocument::getElementsByTagName() except a standard array is returned.
Alias: getElementsByTagName() - getTag(string $tagName): Element - Similar to getTags() but will return the first element matched, instead of an array of elements.
Alias: getElementByTagName() - getElementsByClassName(mixed $className): array - Matches elements that have a class attribute that matches the string parameter.
If you want to find elements with multiple classes, pass the $className as an array of classes, and only elements that have all classes will be returned - getElementByClassName(mixed $className): Element - Similar to getElementsByClassName() except this will return the first element matched
- output(): string - Returns the xml/html repesented by the receiver, formatted using DOMDocument::formatOutput
- xpath(string $path): array - Returns an array of elements matching the $path
Dom Elements
All of the methods that return elements return them as instances of the Element class, this acts similarly to the standard DOMElement class, except it has all of the above methods available, and the nodeValue property has leading and trailing whitespace removed using trim()
Examples
The parser classes use a namespace of duncan3dc\Dom
use duncan3dc\Dom\Html\Parser; use duncan3dc\Dom\Xml\Parser;
$parser = new Parser(file_get_contents("http://example.com")); echo "Page Title: " . $parser->getTag("title")->nodeValue . "\n"; $contentType = false; $meta = $parser->getTags("meta"); foreach($meta as $element) { if($element->getAttribute("http-equiv") == "Content-type") { $contentType = $element->getAttribute("content"); } } echo "Content Type: " . (($contentType) ?: "NOT FOUND") . "\n";
duncan3dc/domparser for enterprise
Available as part of the Tidelift Subscription
The maintainers of duncan3dc/domparser and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. Learn more.