cdz / laravel-pagiscrap
PagiScrap extracts paginated data from any API using Laravel Queues. Each page is processed asynchronously, so the extraction does not block the rest of your application.
Requires
- php: ^8.3
- illuminate/contracts: ^10.0||^11.0||^12.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- larastan/larastan: ^2.9||^3.0
- laravel/pint: ^1.14
- nunomaduro/collision: ^8.1.1||^7.10.0
- orchestra/testbench: ^10.0.0||^9.0.0||^8.22.0
- pestphp/pest: ^3.0
- pestphp/pest-plugin-arch: ^3.0
- pestphp/pest-plugin-laravel: ^3.0
- phpstan/extension-installer: ^1.3||^2.0
- phpstan/phpstan-deprecation-rules: ^1.1||^2.0
- phpstan/phpstan-phpunit: ^1.3||^2.0
This package is auto-updated.
Last update: 2025-03-28 15:11:02 UTC
README
Support
- PHP >= 8.3
- Laravel >= 11
Installation
You can install the package via composer:
composer require cdz/laravel-pagiscrap
Usage
API class
First, create a custom class that handles the API calls. This class must implement PaginateApiRequestInterface and define its three methods:
- pages(): int - Returns the total number of pages available from the API. It determines how many pages need to be processed.
- process(int $page): mixed - Handles the API request for the given page: it makes the API call, retrieves the data, updates the total number of pages, and returns the extracted data.
- after(mixed $result): void - Runs after the data is retrieved. Use it to format and save the data before using it in your application.
Here is a complete example to illustrate how to use this class:
    <?php

    use Cdz\PagiScrap\PaginateApiRequestInterface;

    class ApiRequest implements PaginateApiRequestInterface
    {
        protected int $pages = 1;

        public function pages(): int
        {
            return $this->pages;
        }

        public function process(int $page): mixed
        {
            // Insert your API call logic here (MyApi stands for your own client)
            $api = new MyApi();
            $result = $api->get_page_data($page);

            if ($result) {
                // Update the total number of pages based on the API response (pagination)
                $this->pages = $result['to'];

                // Return data
                return $result['data'];
            }

            return null;
        }

        public function after(mixed $result): void
        {
            // Process data (the ?? [] guards against a null result from process())
            foreach ($result ?? [] as $data) {
                // ...
            }
        }
    }
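To try the class above without a network call or a queue, the following minimal sketch swaps in an in-memory API. Everything in it is hypothetical: `PageRequestContract` is a local stand-in that assumes the same three methods as the package interface, `FakeApi` replaces `MyApi`, and the `to` key is used for the total page count only because the example above does so.

```php
<?php

declare(strict_types=1);

// Local stand-in so this file runs on its own (assumption: mirrors the
// three methods described above; it is not the package interface).
interface PageRequestContract
{
    public function pages(): int;
    public function process(int $page): mixed;
    public function after(mixed $result): void;
}

// Hypothetical in-memory API: 5 records, 2 per page, so 3 pages.
final class FakeApi
{
    private array $records = ['a', 'b', 'c', 'd', 'e'];
    private int $perPage = 2;

    public function get_page_data(int $page): ?array
    {
        $data = array_slice($this->records, ($page - 1) * $this->perPage, $this->perPage);
        if ($data === []) {
            return null; // requested page is past the last one
        }

        return [
            'to' => (int) ceil(count($this->records) / $this->perPage), // total pages
            'data' => $data,
        ];
    }
}

final class DemoApiRequest implements PageRequestContract
{
    public array $saved = [];
    private int $pages = 1;

    public function __construct(private FakeApi $api)
    {
    }

    public function pages(): int
    {
        return $this->pages;
    }

    public function process(int $page): mixed
    {
        $result = $this->api->get_page_data($page);
        if ($result === null) {
            return null;
        }

        $this->pages = $result['to']; // learn the real page count from the response
        return $result['data'];
    }

    public function after(mixed $result): void
    {
        // Format and "save" the items; here they are just upper-cased and kept in memory.
        foreach ($result ?? [] as $item) {
            $this->saved[] = strtoupper($item);
        }
    }
}

$request = new DemoApiRequest(new FakeApi());
$first = $request->process(1);
$request->after($first);
```

After the first call, `pages()` reports the count taken from the response rather than the initial value of 1, which is the behaviour the adaptive scraper relies on.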
Scraping
Next, dispatch one of the scraper jobs to fetch the paginated data:
    <?php

    use Cdz\PagiScrap\PaginateApiScraper;
    use Cdz\PagiScrap\Jobs\FixedMultipageScraper;
    use Cdz\PagiScrap\Jobs\AdaptiveMultipageScraper;

    // Your custom class
    $apiRequest = new ApiRequest();

    // Scrape a specified number of pages (3)
    FixedMultipageScraper::dispatch($apiRequest, new PaginateApiScraper(), 3);

    // Or scrape all pages when the total number of pages is unknown
    AdaptiveMultipageScraper::dispatch($apiRequest, new PaginateApiScraper());
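As a mental model only, the difference between the two jobs can be sketched with plain synchronous loops. This is not the package's implementation, which dispatches queued jobs. `CountingSource`, `walkFixed` and `walkAdaptive` are hypothetical names, and the source pretends that every response reports four pages.

```php
<?php

declare(strict_types=1);

// Hypothetical source: 4 pages exist, but the total is only learned from a response.
final class CountingSource
{
    public array $seen = [];
    private int $pages = 1;

    public function pages(): int
    {
        return $this->pages;
    }

    public function process(int $page): mixed
    {
        $this->pages = 4; // pretend the response reported 4 pages
        return "page-$page";
    }

    public function after(mixed $result): void
    {
        $this->seen[] = $result;
    }
}

// Fixed mode: the caller decides how many pages to fetch.
function walkFixed(CountingSource $source, int $count): void
{
    for ($page = 1; $page <= $count; $page++) {
        $source->after($source->process($page));
    }
}

// Adaptive mode: pages() is re-read on every iteration because process() may raise it.
function walkAdaptive(CountingSource $source): void
{
    for ($page = 1; $page <= $source->pages(); $page++) {
        $source->after($source->process($page));
    }
}

$fixed = new CountingSource();
walkFixed($fixed, 3);

$adaptive = new CountingSource();
walkAdaptive($adaptive);
```

In this sketch the fixed walk stops after three pages even though a fourth exists, while the adaptive walk starts with a page count of 1 and continues until it has covered the count reported by the source.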
Customisation
You can customize the batch name and define its completion callbacks.
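    <?php

    use Cdz\PagiScrap\PaginateApiScraper;
    use Cdz\PagiScrap\Jobs\FixedMultipageScraper;
    use Illuminate\Support\Facades\Log;

    $apiScraper = new PaginateApiScraper();
    $apiScraper->name('Import API data');

    // Runs when the batch has finished executing
    $apiScraper->finally(function ($batch) {
        Log::info('The import is complete!');
    });

    // Runs when all jobs completed successfully
    $apiScraper->success(function ($batch) {
        Log::info('The import was successful!');
    });

    // Runs when a job failure is detected
    $apiScraper->error(function ($batch, \Throwable $exception) {
        Log::error('The import failed!');
    });

    FixedMultipageScraper::dispatch($apiRequest, $apiScraper, 3);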
Run the Queue Worker
php artisan queue:work
Refer to the Laravel documentation for queue configuration options.
Testing
composer test
Troubleshooting
Remember, queue workers are long-lived processes and store the booted application state in memory. As a result, they will not notice changes in your code base after they have been started. So, during your deployment process, be sure to restart your queue workers (for example with php artisan queue:restart). In addition, remember that any static state created or modified by your application will not be automatically reset between jobs.
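The point about static state can be illustrated with a minimal sketch that simulates several jobs running in the same worker process. `ImportPageJob`, its static cache and `resetState()` are hypothetical names used only for this illustration; they are not part of the package.

```php
<?php

declare(strict_types=1);

// Hypothetical job class; the static cache stands in for any static state.
final class ImportPageJob
{
    private static array $cache = [];

    public function handle(int $page): int
    {
        self::$cache[] = $page;

        // Number of pages remembered so far in this process.
        return count(self::$cache);
    }

    public static function resetState(): void
    {
        self::$cache = [];
    }
}

// Simulate one long-lived worker running several jobs in a row.
$sizeAfterFirst = (new ImportPageJob())->handle(1);
$sizeAfterSecond = (new ImportPageJob())->handle(2); // new instance, same static cache

// Resetting explicitly is the only way the state goes away while the process lives.
ImportPageJob::resetState();
$sizeAfterReset = (new ImportPageJob())->handle(3);
```

The second job sees the entry left behind by the first because both run in the same process. If a job depends on static state, resetting it explicitly at the start of the job avoids leaking data from one run to the next.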
Credits
License
The MIT License (MIT). Please see License File for more information.