usox / html2text
A hacklang script to convert HTML into a plain text format
Requires
- ext-libxml: *
- hhvm: ^4
Requires (Dev)
- facebook/fbexpect: ^2.3
- hhvm/hacktest: ^1.4
- hhvm/hhast: ^4.0
This package is auto-updated.
Last update: 2024-11-28 00:36:27 UTC
README
html2text is a very simple script that uses DOM methods to convert HTML into a format similar to what would be rendered by a browser - perfect for places where you need a quick text representation. For example:
<html>
<title>Ignored Title</title>
<body>
  <h1>Hello, World!</h1>

  <p>This is some e-mail content.
  Even though it has whitespace and newlines, the e-mail converter
  will handle it correctly.

  <p>Even mismatched tags.</p>

  <div>A div</div>
  <div>Another div</div>
  <div>A div<div>within a div</div></div>

  <a href="http://foo.com">A link</a>

</body>
</html>
Will be converted into:
Hello, World!
This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.
Even mismatched tags.
A div
Another div
A div
within a div
[A link](http://foo.com)
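To make the mapping above concrete, here is a rough sketch of the general idea, written in Python purely for illustration. It is not this package's code and does not use its API: the names TextSketch and sketch_html_to_text are hypothetical, and the rules (skip the title, start a new line at block-level tags, render links as [text](href), collapse whitespace) are assumptions inferred from the example output rather than taken from the implementation.

```python
import re
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the real converter may treat more tags.
BLOCK_TAGS = {"h1", "h2", "h3", "p", "div", "br", "li"}
SKIP_TAGS = {"title", "script", "style"}


class TextSketch(HTMLParser):
    """Collects text line by line while walking the HTML tags."""

    def __init__(self):
        super().__init__()
        self.lines = [""]     # finished lines plus the line being built
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.href = None      # target of the link being read, if any
        self.link_text = ""

    def _break(self):
        # Start a new line only if the current one has visible text.
        if self.lines[-1].strip():
            self.lines.append("")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._break()
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = ""

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._break()
        elif tag == "a" and self.href is not None:
            self.lines[-1] += f"[{self.link_text.strip()}]({self.href})"
            self.href = None

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        if self.href is not None:
            self.link_text += text
        else:
            self.lines[-1] += text


def sketch_html_to_text(html):
    parser = TextSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(line.strip() for line in parser.lines if line.strip())
```

Because the parser is event-based and never requires tags to balance, an unclosed <p> like the one in the example simply starts a new line when the next block tag opens.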
Installing
You can use Composer to add the package to your project:
{
  "require": {
    "usox/html2text": "^1"
  }
}
And then use it quite simply:
$converter = new \Usox\Html2Text();
$text = $converter->convert($html);
You can also include the supplied html2text.php and use $text = convert_html_to_text($html); instead.
Options
Pass along options as a second argument to convert, for example:
$html = 'some fine html';
$options = dict[
  'ignore_errors' => true,
  // other options go here
];
$converter = new \Usox\Html2Text();
echo $converter->convert($html, $options);
Tests
Some very basic tests are provided in the tests/ directory. Run them with composer install && vendor/bin/hacktest tests.
License
html2text is licensed under the MIT license, making it suitable for both Eclipse and GPL projects.
Other versions
This is a port of the PHP version of html2text.