starfruit / crawler-bundle
Starfruit Crawler Bundle
pkg:composer/starfruit/crawler-bundle
Requires
- google/apiclient: ^2.18
- google/auth: ^1.48
README
Requirements
Google Cloud
- Create a new project, then enable the libraries below:
- Create a service account and download its JSON credentials file
Installation
composer require starfruit/crawler-bundle
OR
composer require starfruit/crawler-bundle --ignore-platform-req=ext-amqp
- Update the config/bundles.php file:
return [
    ....
    Starfruit\CrawlerBundle\StarfruitCrawlerBundle::class => ['all' => true],
];
Setup
- Create a new variable in the .env file:
# path to the Google Cloud JSON credentials file, for example:
CRAWLER_BUNDLE_GOOGLE_JSON=/root/project/public/crawler-google-credential.json
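Before running the crawler, it can help to confirm that the variable points to a usable key file. The following is a minimal sketch, not part of the bundle: `loadServiceAccountKey` is a hypothetical helper name, and the checked fields (`type`, `client_email`, `private_key`) are the ones normally present in a service-account key downloaded from Google Cloud.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not provided by the bundle): checks that $path points
 * to a readable service-account JSON key and returns its decoded contents.
 */
function loadServiceAccountKey(string $path): array
{
    if ($path === '' || !is_readable($path)) {
        throw new RuntimeException("Credentials file not readable: {$path}");
    }

    $data = json_decode((string) file_get_contents($path), true);
    if (!is_array($data)) {
        throw new RuntimeException("Credentials file is not valid JSON: {$path}");
    }

    // Fields normally present in a service-account key file.
    foreach (['type', 'client_email', 'private_key'] as $field) {
        if (empty($data[$field])) {
            throw new RuntimeException("Missing '{$field}' in credentials file");
        }
    }
    if ($data['type'] !== 'service_account') {
        throw new RuntimeException('Credentials are not a service-account key');
    }

    return $data;
}
```

A typical call would be `loadServiceAccountKey($_ENV['CRAWLER_BUNDLE_GOOGLE_JSON'] ?? '')`, assuming the application loads .env values into `$_ENV`.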
- Update the config/config.yaml file:
imports:
    - { resource: 'local/' }

pimcore:
    ...

...

# config for crawler bundle
starfruit_crawler:
    target:
        class_object: # class names as keys, each with its fields
            News: # name of class
                content_field: 'content' # field that receives the crawled content
                last_version_field: 'importUrl' # field to store the last version, can be null
            Event: # name of class
                content_field: 'mainContent'
    # custom asset path in Admin to store images and media
    asset_store_path: '/default-crawler-media/image'
    # custom format for HTML after crawling
    content_format:
        heading:
            # default mapping from heading value to HTML tag
            default: 'p' # default tag
            HEADING_1: 'h1'
            HEADING_2: 'h2'
            HEADING_3: 'h3'
            HEADING_4: 'h4'
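To illustrate what the heading mapping under content_format expresses, here is a minimal sketch of a lookup with a fallback to the default tag. It is not the bundle's actual implementation: `renderParagraph` and the `NORMAL_TEXT` style name are assumptions made for the example, and only the map itself comes from the configuration above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wraps $text in the HTML tag mapped to $style,
 * falling back to the 'default' entry when the style has no mapping.
 */
function renderParagraph(string $text, string $style, array $headingMap): string
{
    $tag = $headingMap[$style] ?? $headingMap['default'] ?? 'p';

    // Escape the text so it cannot inject markup into the output.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return sprintf('<%1$s>%2$s</%1$s>', $tag, $escaped);
}

// Same values as the content_format.heading configuration.
$headingMap = [
    'default'   => 'p',
    'HEADING_1' => 'h1',
    'HEADING_2' => 'h2',
    'HEADING_3' => 'h3',
    'HEADING_4' => 'h4',
];

echo renderParagraph('Release notes', 'HEADING_2', $headingMap), PHP_EOL; // <h2>Release notes</h2>
echo renderParagraph('Body text', 'NORMAL_TEXT', $headingMap), PHP_EOL;   // <p>Body text</p>
```

Changing an entry, for example `HEADING_1: 'h2'`, would then demote top-level headings in the generated HTML without touching any code.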

