teamtnt/crawler

Distributed Crawler Architecture

dev-master 2019-06-10 19:00 UTC

A distributed crawler

Requirements

Installation

Via Composer:

composer require teamtnt/crawler

Configuration

Each instance needs an identifier, which can be set in .env:

NODE_NAME="Instance 1"
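
When several instances are deployed, each one needs a distinct NODE_NAME. The following is a minimal sketch of one way to set it, assuming the script is run from the project root and that the machine's hostname is unique enough to serve as an identifier. The `ENV_FILE` variable and the `crawler-` prefix are illustrative choices, not part of the package.

```shell
#!/bin/sh
# Assumption: the .env file lives in the current directory (the project root).
ENV_FILE="${ENV_FILE:-.env}"

# Assumption: the hostname is unique per instance; "crawler-" is an arbitrary prefix.
NODE_NAME_VALUE="crawler-$(uname -n)"

# Append NODE_NAME only if the file does not already define it.
if grep -q '^NODE_NAME=' "$ENV_FILE" 2>/dev/null; then
  echo "NODE_NAME already set in $ENV_FILE"
else
  printf 'NODE_NAME="%s"\n' "$NODE_NAME_VALUE" >> "$ENV_FILE"
  echo "Added NODE_NAME to $ENV_FILE"
fi
```

Running the script a second time leaves the file unchanged, so it can be part of a provisioning step without duplicating the entry.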

The domain feeder needs to start with a seed domain. After that, run:

php artisan crawler

To scrape a single URL:

php artisan url:frontier www.example.com/something

Crawler Topology

Domain Feeder

Single Instance

URL Frontier
