dachcom-digital / dynamic-search-data-provider-crawler
Type: dynamic-search-provider-bundle
Requires
- dachcom-digital/dynamic-search: ^3.0
- pimcore/pimcore: ^11.0
- vdb/php-spider: ^0.7
Requires (Dev)
- codeception/codeception: ^5.0
- codeception/module-symfony: ^3.1
- phpstan/phpstan: ^1.0
- phpstan/phpstan-symfony: ^1.0
- symplify/easy-coding-standard: ^9.0
README
A spider crawler extension for Pimcore Dynamic Search.
Release Plan
Installation
"require" : {
    "dachcom-digital/dynamic-search" : "~3.0.0",
    "dachcom-digital/dynamic-search-data-provider-crawler" : "~3.0.0"
}
Dynamic Search Bundle
You need to install and enable the Dynamic Search Bundle first. Read more about it here. After that, proceed as follows:
Add Bundle to bundles.php:
<?php

return [
    \DsWebCrawlerBundle\DsWebCrawlerBundle::class => ['all' => true],
];
Basic Setup
dynamic_search:
    context:
        default:
            data_provider:
                service: 'web_crawler'
                options:
                    always:
                        own_host_only: true
                    full_dispatch:
                        seed: 'http://your-domain.test'
                        valid_links:
                            - '@^http://your-domain.test.*@i'
                        user_invalid_links:
                            - '@^http://your-domain.test\/members.*@i'
                    single_dispatch:
                        host: 'http://your-domain.test'
                normalizer:
                    service: 'web_crawler_localized_resource_normalizer'
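The valid_links and user_invalid_links entries in the setup above are PCRE patterns using @ as the delimiter and the i (case-insensitive) flag. The following is a minimal sketch of how such patterns could be evaluated against a URL. The helper isCrawlable() is hypothetical and not part of the bundle, and the order of evaluation (exclusions first, then inclusions) is an assumption made for illustration only.

```php
<?php
// Illustrative only: isCrawlable() is a hypothetical helper, not bundle API.
// Assumption: a URL is rejected if any "invalid" pattern matches, and is
// otherwise accepted only if at least one "valid" pattern matches.

/**
 * @param string[] $validLinks   patterns of which at least one must match
 * @param string[] $invalidLinks patterns that exclude a URL when matched
 */
function isCrawlable(string $url, array $validLinks, array $invalidLinks): bool
{
    foreach ($invalidLinks as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }

    foreach ($validLinks as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }

    return false;
}

// Patterns copied from the Basic Setup example.
$validLinks = ['@^http://your-domain.test.*@i'];
$invalidLinks = ['@^http://your-domain.test\/members.*@i'];

var_dump(isCrawlable('http://your-domain.test/news', $validLinks, $invalidLinks));
var_dump(isCrawlable('http://your-domain.test/members/profile', $validLinks, $invalidLinks));
var_dump(isCrawlable('http://other-domain.test/news', $validLinks, $invalidLinks));
```

Note that the dots in these patterns are unescaped and the host is not terminated, so a pattern like '@^http://your-domain.test.*@i' also matches hosts such as http://your-domain.testing.example. Escaping the dots and appending a path boundary makes the filter stricter.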
Provider Options
always
full_dispatch
single_dispatch
Resource Normalizer
DefaultResourceNormalizer
Identifier: web_crawler_default_resource_normalizer
Normalize simple documents
Options: none
LocalizedResourceNormalizer
Identifier: web_crawler_localized_resource_normalizer
Scaffold localized documents
Options:
Transformer
Scaffolder
HttpResponseHtmlDataScaffolder
Identifier: http_response_html_scaffolder
Simple object scaffolder.
Supported types: VDB\Spider\Resource with content-type text/html.
HttpResponsePdfDataScaffolder
Identifier: http_response_pdf_scaffolder
Simple object scaffolder.
Supported types: VDB\Spider\Resource with content-type application/pdf.
PimcoreElementScaffolder
Identifier: pimcore_element_scaffolder
Simple object scaffolder.
Supported types: Asset, Document, DataObject\Concrete.
Field Transformer
UriExtractor
Identifier: resource_uri_extractor
Supported Scaffolder: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
LanguageExtractor
Identifier: resource_language_extractor
Supported Scaffolder: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
MetaExtractor
Identifier: resource_meta_extractor
Supported Scaffolder: http_response_html_scaffolder
Return Type: string|null
Options:
HtmlTagExtractor
Identifier: resource_html_tag_content_extractor
Supported Scaffolder: http_response_html_scaffolder
Return Type: string|null
Options: none
TextExtractor
Identifier: resource_text_extractor
Supported Scaffolder: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
TitleExtractor
Identifier: resource_title_extractor
Supported Scaffolder: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
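To give a rough idea of what a string|null extractor does with an HTML response, here is a conceptual sketch built on PHP's DOMDocument. The functions extractTitle() and extractMeta() are hypothetical and do not reflect the bundle's actual transformer classes or options. They only illustrate the general pattern of returning a trimmed string, or null when nothing usable is found.

```php
<?php
// Conceptual sketch only: these helpers are hypothetical and are not the
// bundle's TitleExtractor or MetaExtractor implementations.

function loadHtml(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    $nodes = loadHtml($html)->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }

    $title = trim($nodes->item(0)->textContent);

    return $title === '' ? null : $title;
}

function extractMeta(string $html, string $name): ?string
{
    if (trim($html) === '') {
        return null;
    }

    foreach (loadHtml($html)->getElementsByTagName('meta') as $meta) {
        // Compare the name attribute case-insensitively.
        if (strcasecmp($meta->getAttribute('name'), $name) === 0) {
            $content = trim($meta->getAttribute('content'));

            return $content === '' ? null : $content;
        }
    }

    return null;
}

$html = '<html><head><title> Hello </title>'
    . '<meta name="description" content="A test page"></head><body></body></html>';

var_dump(extractTitle($html));
var_dump(extractMeta($html, 'description'));
var_dump(extractMeta($html, 'keywords'));
```

Returning null rather than an empty string keeps "no value found" distinguishable from a real value, which matches the string|null return type listed for the transformers above.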
Copyright and License
Copyright: DACHCOM.DIGITAL
For licensing details please visit LICENSE.md
Upgrade Info
Before updating, please check our upgrade notes!