cdz/laravel-pagiscrap

PagiScrap extracts paginated data from any API using Laravel Queues. Each page is processed asynchronously on the queue, so the extraction does not block the rest of your application.


1.0.1 2025-03-28 15:01 UTC


Last update: 2025-03-28 15:11:02 UTC



Support

  • PHP >= 8.3
  • Laravel >= 11

Installation

You can install the package via composer:

composer require cdz/laravel-pagiscrap

Usage

API class

First, you need to create a custom class that will handle the API calls. This class should implement the PaginateApiRequestInterface and define its methods:

  • pages(): int - This method returns the total number of pages available for the API calls. It helps determine how many pages need to be processed.

  • process(int $page): mixed - This method handles the API request for the given page. It makes the API call, stores the total number of pages reported by the response (so that pages() can return it), and returns the extracted data.

  • after(mixed $result): void - This method is executed after the data for a page has been retrieved. Use it to format and save the data before using it in your application.

Here is a complete example of a class implementing this interface:

<?php

use Cdz\PagiScrap\PaginateApiRequestInterface;

class ApiRequest implements PaginateApiRequestInterface {

    // Total number of pages; updated after each API response
    protected int $pages = 1;

    public function pages(): int
    {
        return $this->pages;
    }

    public function process(int $page): mixed
    {
        // Insert your API call logic here
        $api = new MyApi();
        $result = $api->get_page_data($page);

        if ($result) {
            // Update the total number of pages based on the API response (pagination).
            // This example reads it from the 'to' field; use the field your API provides.
            $this->pages = $result['to'];

            // Return data
            return $result['data'];
        }

        return null;
    }

    public function after(mixed $result): void
    {
        // Process data (process() may have returned null, so fall back to an empty list)
        foreach ($result ?? [] as $data) {
            // ...
        }
    }
}
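
To make the contract between the three methods concrete, here is a minimal, synchronous sketch of how a caller could drive them when the page count is only known after the first response. It is an illustration under stated assumptions, not the package's implementation: `PaginateApiRequestContract` is a local stand-in for the interface described above (so the file runs without the package), and `FakeApiRequest` and `scrapeAll()` are hypothetical names. The package itself processes each page on the queue rather than in a loop.

```php
<?php

declare(strict_types=1);

// Local stand-in mirroring the three methods described above, so this file
// runs without the package installed. In a real project, implement
// Cdz\PagiScrap\PaginateApiRequestInterface instead.
interface PaginateApiRequestContract
{
    public function pages(): int;
    public function process(int $page): mixed;
    public function after(mixed $result): void;
}

// Hypothetical in-memory "API": 5 items, 2 per page, so 3 pages in total.
final class FakeApiRequest implements PaginateApiRequestContract
{
    private const ITEMS = ['a', 'b', 'c', 'd', 'e'];
    private const PER_PAGE = 2;

    private int $pages = 1;

    /** @var list<string> */
    public array $saved = [];

    public function pages(): int
    {
        return $this->pages;
    }

    public function process(int $page): mixed
    {
        // The total page count only becomes known once a response arrives.
        $this->pages = (int) ceil(count(self::ITEMS) / self::PER_PAGE);

        return array_slice(self::ITEMS, ($page - 1) * self::PER_PAGE, self::PER_PAGE);
    }

    public function after(mixed $result): void
    {
        // "Format and save": here, upper-case each item and keep it in memory.
        foreach ($result as $item) {
            $this->saved[] = strtoupper($item);
        }
    }
}

// Synchronous illustration only: fetch page 1, then keep going until the
// page count reported by pages() has been reached.
function scrapeAll(PaginateApiRequestContract $request): void
{
    $page = 1;
    do {
        $request->after($request->process($page));
        $page++;
    } while ($page <= $request->pages());
}

$request = new FakeApiRequest();
scrapeAll($request);
echo implode(',', $request->saved), PHP_EOL; // prints "A,B,C,D,E"
```

The loop starts with `pages()` returning 1 and picks up the real total after the first `process()` call, which is why `process()` is responsible for updating the page count.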

Scraping

Next, dispatch one of the scraper jobs to scrape the paginated data:

<?php
use Cdz\PagiScrap\PaginateApiScraper;
use Cdz\PagiScrap\Jobs\FixedMultipageScraper;
use Cdz\PagiScrap\Jobs\AdaptiveMultipageScraper;

// Your Custom class
$apiRequest = new ApiRequest();

// Scrape a specified number of pages (3)
FixedMultipageScraper::dispatch($apiRequest, new PaginateApiScraper(), 3);

// Or scrape all pages when the total number of pages is unknown.
AdaptiveMultipageScraper::dispatch($apiRequest, new PaginateApiScraper());

Customisation

You can customize the batch name and define its completion callbacks.

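<?php
use Cdz\PagiScrap\PaginateApiScraper;
use Cdz\PagiScrap\Jobs\FixedMultipageScraper;
use Illuminate\Support\Facades\Log;

$apiScraper = new PaginateApiScraper();
$apiScraper->name('Import API data');
$apiScraper->finally(function ($batch){
    Log::info('The import is complete!');
});
$apiScraper->success(function ($batch){
    Log::info('The import was successful!');
});
// Type-hint \Throwable so that PHP errors are accepted as well as exceptions
$apiScraper->error(function ($batch, \Throwable $exception){
    Log::error('The import failed!');
});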

FixedMultipageScraper::dispatch($apiRequest, $apiScraper, 3);

Run the Queue Worker

php artisan queue:work

Refer to the Laravel documentation for queue configuration options.

Testing

composer test

Troubleshooting

Remember, queue workers are long-lived processes and store the booted application state in memory. As a result, they will not notice changes in your code base after they have been started. So, during your deployment process, be sure to restart your queue workers. In addition, remember that any static state created or modified by your application will not be automatically reset between jobs.
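
The point about static state can be shown with a small plain-PHP sketch. Everything in it is hypothetical and for illustration only: `RowBuffer` stands for any class in your application that keeps data in a static property, and `handleJob()` stands for one queued job being handled inside a single long-lived worker process.

```php
<?php

declare(strict_types=1);

// Hypothetical collector that keeps rows in a static property.
final class RowBuffer
{
    /** @var list<string> */
    private static array $rows = [];

    public static function add(string $row): void
    {
        self::$rows[] = $row;
    }

    /** @return list<string> */
    public static function all(): array
    {
        return self::$rows;
    }

    public static function reset(): void
    {
        self::$rows = [];
    }
}

// Stand-in for one queued job handled by a long-lived worker process.
function handleJob(string $row, bool $resetFirst): array
{
    if ($resetFirst) {
        RowBuffer::reset();
    }
    RowBuffer::add($row);

    return RowBuffer::all();
}

// Without a reset, the second job still sees the first job's row.
handleJob('page-1', false);
$leaky = handleJob('page-2', false);

// Resetting at the start of each job isolates them.
handleJob('page-1', true);
$clean = handleJob('page-2', true);

echo count($leaky), ' ', count($clean), PHP_EOL; // prints "2 1"
```

For code changes, `php artisan queue:restart` tells running workers to exit once their current job finishes, so that your process manager can start them again with the new code.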

Credits

License

The MIT License (MIT). Please see License File for more information.