innmind / crawler
Library to extract meaningful information out of a webpage
6.1.0
2021-02-13 11:35 UTC
Requires
- php: ~7.4|~8.0
- innmind/colour: ~3.0
- innmind/filesystem: ~4.0
- innmind/html: ~5.0
- innmind/http: ~4.0
- innmind/http-transport: ~5.0
- innmind/immutable: ~3.3
- innmind/math: ~5.0
- innmind/time-continuum: ~2.0
- innmind/url-resolver: ~4.0
Requires (Dev)
- innmind/coding-standard: ^1.1
- phpunit/phpunit: ~9.0
- vimeo/psalm: ~4.4
README
This tool allows you to extract useful information from a web resource (whether it is an HTML page, an image, or anything else).
Installation
composer require innmind/crawler
Usage
use function Innmind\Crawler\bootstrap;
use Innmind\OperatingSystem\Factory;
use Innmind\UrlResolver\UrlResolver;
use Innmind\Url\Url;
use Innmind\Http\{
    Message\Request\Request,
    Message\Method\Method,
    ProtocolVersion,
};
use function Innmind\Html\bootstrap as reader;

$os = Factory::build();

$crawl = bootstrap(
    $os->remote()->http(),
    $os->clock(),
    reader(),
    new UrlResolver
);

$resource = $crawl(
    new Request(
        Url::of('https://en.wikipedia.org/wiki/H2g2'),
        new Method('GET'),
        new ProtocolVersion(2, 0),
    ),
);
Here $resource is an instance of HttpResource.