xparse / parser
Parser client
Requires
- php: ^8.1
- ext-dom: *
- ext-mbstring: *
- guzzlehttp/guzzle: ^6.3 || ^7.3
- guzzlehttp/psr7: ^2.4.5
- psr/http-message: ^1.0 || ^2.0
- xparse/element-finder: ^2.0
Requires (Dev)
- ext-iconv: *
- phpunit/phpunit: ^9.3
- rector/rector: ^0.19
- roave/security-advisories: dev-latest
- symplify/easy-coding-standard: ^12.1
Last update: 2024-11-23 08:52:05 UTC
Install
Via Composer
$ composer require xparse/parser
Usage
$parser = new \Xparse\Parser\Parser();
$title = $parser->get('http://funivan.com')->content('//*[@class="entry-title"]/a');
print_r($title);
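To see what the XPath expression in the usage example selects, it can help to run it against a small document first. The sketch below does not use this package at all: it relies only on PHP's built-in `DOMDocument` and `DOMXPath` (from ext-dom), and the HTML snippet is made up for illustration.

```php
<?php
// Made-up HTML for illustration; it is not taken from any real page.
$html = <<<'HTML'
<html><body>
  <h2 class="entry-title"><a href="/first">First post</a></h2>
  <h2 class="entry-title"><a href="/second">Second post</a></h2>
  <h2 class="other"><a href="/skip">Not a title</a></h2>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Same expression as in the usage example: every <a> that is a direct child
// of an element whose class attribute is exactly "entry-title".
$titles = [];
foreach ($xpath->query('//*[@class="entry-title"]/a') as $node) {
    $titles[] = trim($node->textContent);
}

print_r($titles);
```

Note that `@class="entry-title"` compares the whole attribute value, so an element with `class="entry-title featured"` would not match this expression.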
Using with custom Middleware
If you use a custom Guzzle middleware that does not send real requests, you need to set the X-GUZZLE-EFFECTIVE-URL response header manually so that the last effective URL can still be determined.
Here is an example of the __invoke() method of such a custom middleware:
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

public function __invoke(callable $handler): \Closure
{
    return function (RequestInterface $request, array $options) use ($handler) {
        # some code
        return $handler($request, $options)->then(
            function (ResponseInterface $response) use ($request) {
                # cast the URI to a string: header values must be strings
                $response = $response->withHeader('X-GUZZLE-EFFECTIVE-URL', (string) $request->getUri());
                # some code
                return $response;
            }
        );
    };
}
Recursive Parser
Recursive Parser lets you parse website pages recursively. Pass it the link to start from and an expression that selects the links to the next pages (XPath, CSS, etc.).
Basic Usage
Let's find all links to repositories on GitHub that match the query xparse.
With recursive pagination we can follow every pagination link and process each resulting page to collect the repository links.
use Xparse\Parser\Parser;
use Xparse\Parser\RecursiveParser;

# init Parser
$parser = new Parser();

# set expression to pagination links and initial page url
$pages = new RecursiveParser(
    $parser,
    ["//*[@class='pagination']//a/@href"],
    ['https://github.com/search?q=xparse']
);

$allLinks = [];
foreach ($pages as $page) {
    # set expression to fetch repository links
    $repoLinks = $page->value("//*[@class='repo-list-name']//a/@href")->all();
    # merge and remove duplicates
    $allLinks = array_values(array_unique(array_merge($allLinks, $repoLinks)));
}

print_r($allLinks);
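The traversal idea behind the example above can be sketched without any network access. The code below is only an illustration of recursive pagination, not the actual `RecursiveParser` implementation: the `$site` array, the `selectValues()` and `crawl()` helpers, and the `repo` class name are all hypothetical, and the visiting order of the real class may differ.

```php
<?php
// Hypothetical in-memory "site" (URL => HTML); no HTTP requests are made.
$site = [
    '/page/1' => '<div class="pagination"><a href="/page/2">2</a><a href="/page/3">3</a></div>'
        . '<p class="repo"><a href="/repo/a">a</a></p><p class="repo"><a href="/repo/b">b</a></p>',
    '/page/2' => '<div class="pagination"><a href="/page/1">1</a><a href="/page/3">3</a></div>'
        . '<p class="repo"><a href="/repo/b">b</a></p><p class="repo"><a href="/repo/c">c</a></p>',
    '/page/3' => '<div class="pagination"><a href="/page/1">1</a><a href="/page/2">2</a></div>'
        . '<p class="repo"><a href="/repo/d">d</a></p>',
];

// Hypothetical helper: evaluate an XPath expression and return string values.
function selectValues(string $html, string $expression): array
{
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ((new DOMXPath($document))->query($expression) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

// Hypothetical helper: yield each reachable page exactly once, following the
// links selected by the next-page expression.
function crawl(array $site, string $startUrl, string $nextPageExpression): Generator
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    while ($queue !== []) {
        $url = array_shift($queue);
        if (!isset($site[$url])) {
            continue; // unknown URL in this toy site
        }
        $html = $site[$url];
        foreach (selectValues($html, $nextPageExpression) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // remember the link so it is queued only once
                $queue[] = $link;
            }
        }
        yield $url => $html;
    }
}

$visited = [];
$allLinks = [];
foreach (crawl($site, '/page/1', "//*[@class='pagination']//a/@href") as $url => $html) {
    $visited[] = $url;
    $repoLinks = selectValues($html, "//*[@class='repo']//a/@href");
    // merge and remove duplicates, as in the example above
    $allLinks = array_values(array_unique(array_merge($allLinks, $repoLinks)));
}

print_r($visited);
print_r($allLinks);
```

The `$seen` map is what keeps the traversal from looping forever: pagination blocks usually link back to pages that were already visited, so each discovered link is queued only the first time it appears.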
Testing
./vendor/bin/phpunit
Contributing
Please see CONTRIBUTING for details.
Credits
License
The MIT License (MIT). Please see License File for more information.