creode / craft-page-crawler
A Craft CMS plugin that crawls a page and extracts its useful content during an indexing process.
Package info
github.com/creode/craft-page-crawler
Type: craft-plugin
pkg:composer/creode/craft-page-crawler
1.1.0
2023-03-03 11:59 UTC
Requires
- craftcms/cms: ^4.0
- phpquery/phpquery: ^0.0.4
Versions
Use the table below to choose the version of this package that matches your version of Craft CMS:
| Craft Version | Page Crawler Version |
|---|---|
| <4.0.0 | 0.x |
| ^4.0 | 1.x |
Required config file
The plugin requires a config file at "config/page-crawler.php". Create it and populate it, using the following as a starting point.
    <?php

    use craft\helpers\App;

    return [
        /**
         * CSS selectors for elements that should be removed from the rendered
         * page markup during a page crawl.
         */
        'elementsToRemove' => [
        ],

        /**
         * If the site is behind HTTP authentication (for example an .htaccess
         * password), add the variables below to your .env file so the crawler
         * can get past it.
         */
        'http-auth-credentials' => [
            'username' => App::env('PAGE_CRAWLER_AUTH_USER'),
            'password' => App::env('PAGE_CRAWLER_AUTH_PASSWORD'),
        ],
    ];
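As an illustration, a populated version of the config might look like the sketch below. The selectors are hypothetical placeholders chosen to strip common site chrome (navigation, footer, cookie notices) so that only the main content is indexed. Replace them with selectors that match your own templates.

```php
<?php

use craft\helpers\App;

return [
    // Hypothetical selectors: adjust these to match your own markup.
    'elementsToRemove' => [
        'header',
        'nav',
        'footer',
        '.cookie-banner',
        '#sidebar',
    ],

    // Values are read from .env. Set them if the site is password protected.
    'http-auth-credentials' => [
        'username' => App::env('PAGE_CRAWLER_AUTH_USER'),
        'password' => App::env('PAGE_CRAWLER_AUTH_PASSWORD'),
    ],
];
```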
Performing a crawl
To crawl a page from PHP, call the crawler service's crawl() method. It accepts a relative page path and returns the relevant page content as text.
$content = \creode\pagecrawler\Plugin::$plugin->crawlerService->crawl($pagePath);
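A minimal sketch of how the call above could fit into an indexing loop, assuming it runs inside a Craft 4 context (a module, console command, or queue job) where the plugin is installed. The section handle `news` and the shape of the `$records` array are hypothetical. Only the `crawl()` call itself comes from this README.

```php
<?php

use craft\elements\Entry;
use creode\pagecrawler\Plugin;

// Hypothetical section handle: replace with one from your project.
$entries = Entry::find()->section('news')->all();

$records = [];

foreach ($entries as $entry) {
    // Entries without a URL have no path to crawl, so skip them.
    if ($entry->uri === null) {
        continue;
    }

    // crawl() takes a relative page path and returns the page content as text.
    $content = Plugin::$plugin->crawlerService->crawl($entry->uri);

    // Hypothetical record shape: adapt it to whatever your search index expects.
    $records[] = [
        'id' => $entry->id,
        'title' => $entry->title,
        'content' => $content,
    ];
}
```

Craft stores the homepage URI as `__home__`, so a homepage entry would likely need its path mapped before crawling. The sketch assumes the chosen section does not contain it.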