sukohi / search-bot
Laravel package to crawl websites.
1.0.5
2017-02-15 09:30 UTC
Requires
- fabpot/goutte: ^3.2
- laravel/framework: ~5.0
- sukohi/laravel-absolute-url: 1.*
This package is not auto-updated.
Last update: 2024-11-09 21:27:23 UTC
README
Laravel package to crawl websites. (Laravel 5+)
Requirements
Installation
Run the following command.
composer require sukohi/search-bot:1.*
Register the service providers in config/app.php:
'providers' => [
    ...Others...,
    Sukohi\SearchBot\SearchBotServiceProvider::class,
    Sukohi\LaravelAbsoluteUrl\LaravelAbsoluteUrlServiceProvider::class,
]
Also add the aliases:
'aliases' => [
    ...Others...,
    'LaravelAbsoluteUrl' => Sukohi\LaravelAbsoluteUrl\Facades\LaravelAbsoluteUrl::class,
    'SearchBot' => Sukohi\SearchBot\Facades\SearchBot::class,
]
Then run the following commands.
php artisan vendor:publish
php artisan migrate
You now have config/search_bot.php, in which you can set domain restrictions.
Config
return [
    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com']
];
- If you don't need to restrict domains for a type, set its value to '*'.
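The config above maps each type either to '*' or to a list of allowed hosts. As a rough illustration of what such a restriction means, here is a minimal sketch of a host check. The function name isHostAllowed and its behavior for unknown types are assumptions made for this example, not the package's actual implementation.

```php
<?php
// Hypothetical helper, NOT part of sukohi/search-bot: shows how a per-type
// host list shaped like config/search_bot.php could be applied to a URL.
function isHostAllowed(string $url, string $type, array $restrictions): bool
{
    // Assumption: a type missing from the config is treated as not allowed.
    if (!array_key_exists($type, $restrictions)) {
        return false;
    }

    $allowed = $restrictions[$type];

    // '*' means no restriction for this type.
    if ($allowed === '*') {
        return true;
    }

    // parse_url() returns null or false when no host can be extracted.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Exact, case-insensitive host match against the configured list.
    return in_array(strtolower($host), (array) $allowed, true);
}

$restrictions = [
    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com'],
];
```

Note that in this sketch 'reddit.com' and 'www.reddit.com' are different hosts, so each variant you want to allow has to be listed, as the 'yahoo' entry does.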
Usage
$starting_url = 'http://yahoo.com';
$options = [
    'type' => 'main',      // Optional (default: 'main')
    'url_deletion' => true // Optional (default: true)
];
$result = \SearchBot::request($starting_url, $options);

if($result->exists()) {

    // Symfony\Component\BrowserKit\Response
    // See http://api.symfony.com/2.3/Symfony/Component/BrowserKit/Response.html
    $response = $result->response();

    // Symfony\Component\DomCrawler\Crawler
    // See http://api.symfony.com/2.3/Symfony/Component/DomCrawler/Crawler.html
    $crawler = $result->crawler();

    $result->links(function($url, $text){

        // Every link found (URL and text) is passed here.

    });

    $result->queues(function($crawler_queue, $url, $text){

        // Every link that does not yet exist in the DB is passed here.
        // $crawler_queue already has its type and url set.
        $crawler_queue->save();

    });

} else {

    $e = $result->exception();
    echo $e->getMessage();

    $type = $result->type();
    $url = $result->url();

}
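The links() callback above receives each URL with its link text. As one possible way to use it, the sketch below builds a callback that keeps only unique http(s) links. The helper name makeLinkCollector is hypothetical, and it assumes links() accepts any closure taking ($url, $text), as in the usage example.

```php
<?php
// Hypothetical helper, NOT part of sukohi/search-bot: builds a closure with
// the ($url, $text) signature used by $result->links(...) in the README.
function makeLinkCollector(array &$collected): Closure
{
    return function ($url, $text) use (&$collected) {
        // Keep only http/https links; skips mailto:, javascript:, etc.
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            return;
        }

        // Keep the first text seen for each distinct URL.
        if (!isset($collected[$url])) {
            $collected[$url] = trim((string) $text);
        }
    };
}

$collected = [];
$collector = makeLinkCollector($collected);

// With the package this would be: $result->links($collector);
// Called directly here so the sketch runs on its own.
$collector('http://yahoo.com/news', ' News ');
$collector('mailto:someone@example.com', 'Mail');
$collector('http://yahoo.com/news', 'News again');
```

After these calls, $collected holds a single entry keyed by 'http://yahoo.com/news'.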
Options
- type
  A string that you can choose freely. Default is main.
- url_deletion
  If true, the accessed URL will be removed from the DB. Default is true.
License
This package is licensed under the MIT License.
Copyright 2017 Sukohi Kuhoh