juvo / as-processor
Process huge datasets for import or sync with ease.
pkg:composer/juvo/as-processor
Requires
- php: >=8.1.0
- cardinalby/content-disposition: ^1.1
- halaxa/json-machine: ^1.2
- league/csv: ^9.0
- phpoffice/phpspreadsheet: ^2.1
- phpseclib/phpseclib: ^3.0.37
- sabre/dav: ^4.6
- woocommerce/action-scheduler: ^3.7
Requires (Dev)
- phpstan/extension-installer: ^1.1
- phpstan/phpstan: ^1.10.6
- phpunit/phpunit: 10
- szepeviktor/phpstan-wordpress: ^v1.1.7
- wp-coding-standards/wpcs: ^3.1
This package is auto-updated.
Last update: 2025-11-10 15:34:38 UTC
README
The AS Processor library is a robust synchronization and data chunking framework designed specifically for WordPress environments. Built on asynchronous task management through Action Scheduler, it orchestrates large-scale data processing tasks such as API synchronizations, file-based imports (CSV, Excel, JSON), and chunk-wise data management.
Core Features
- Data Chunking and Processing:
  - The library introduces a consistent chunking mechanism to split large datasets (from files or APIs) into smaller, manageable pieces. Each chunk is processed asynchronously, reducing memory usage and improving load balancing.
  - Multiple data sources are supported, including API endpoints, Excel files, CSV files, and JSON files.
- Asynchronous Task Management:
  - Powered by the awesome Action Scheduler, tasks can be managed asynchronously, ensuring smooth execution without blocking other site processes.
  - Tasks are queued for execution, with efficient handling of task timeouts, retries, and cancellations.
- Data Source Adaptability:
  - The library provides abstract, extensible classes for different data formats:
    - CSV Processor (CSV_Sync): Handles CSV imports split into chunks, supporting UTF-8 conversions, custom delimiters, optional headers, and efficient file cleanup.
    - Excel Processor (Excel): Manages Excel files with optional headers and the ability to process specific worksheets. Includes support for rich text cell values and row skipping.
    - JSON Processor (JSON): Enables chunked processing of JSON files and supports JSON Pointer for extracting specific portions of the data.
    - API Processor (API): Works with paginated APIs, automatically handles rate limiting, request intervals, and pagination (by page, offset, or URL).
- Highly Extensible and Customizable:
  - Abstract classes allow developers to implement their own data-fetching or processing methods tailored to their use case.
  - A robust foundation supports advanced features like progressive pagination, deep merging, lock management, and transient-based sync data storage.
- Reliable Sync Management:
  - Built-in sync lifecycle management, including hooks for starting, progressing, and completing synchronizations.
  - Automatic cleanup of completed tasks and retention of synchronization data for specified durations.
- Error Handling and Recovery:
  - Exception handling is integrated at every critical stage, so the process fails gracefully and any issues are recorded for diagnosis.
  - Built-in support for handling job timeouts, cancellations, and retries with appropriate sync lifecycle callbacks.
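The page-based pagination mentioned for the API processor can be pictured with a small generator. This is only a minimal sketch under stated assumptions: iterate_pages, $fetch_page, and $max_pages are hypothetical names that do not come from AS Processor, the "empty page means done" rule is an assumed stop condition, and rate limiting is left out entirely.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of AS Processor): walks a page-numbered API
 * and yields one item at a time until a page comes back empty.
 *
 * @param callable $fetch_page Receives a 1-based page number, returns that page's items as an array.
 * @param int      $max_pages  Safety cap against endless pagination.
 */
function iterate_pages( callable $fetch_page, int $max_pages = 1000 ): \Generator {
	for ( $page = 1; $page <= $max_pages; $page++ ) {
		$items = $fetch_page( $page );
		if ( [] === $items ) {
			return; // Assumption: an empty page marks the end of the data.
		}
		foreach ( $items as $item ) {
			yield $item;
		}
	}
}

// Stand-in for a real HTTP call: five records served two per page.
$records    = [ 'a', 'b', 'c', 'd', 'e' ];
$fake_fetch = static function ( int $page ) use ( $records ): array {
	return array_slice( $records, ( $page - 1 ) * 2, 2 );
};

foreach ( iterate_pages( $fake_fetch ) as $item ) {
	echo $item, "\n";
}
```

Because the generator requests the next page only when the consumer asks for more items, at most one page of results is held in memory at a time.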
Use Cases
- Importing Data: Import and process data from large datasets such as customer lists in Excel, product catalogs in JSON, or user data from CSV files.
- API Data Sync: Synchronize large-scale data from external APIs, with built-in features like pagination handling and rate limiting.
- Scheduled and Batch Processing: Execute complex batch processing tasks (like order processing or generating reports) asynchronously without affecting website performance.
- Custom Data Processing Flows: Build scalable workflows for chunking and processing any large dataset with minimal memory consumption and maximum fault tolerance.
Technical Highlights
- Core Components:
  - The Chunker trait schedules and processes data chunks, leveraging Action Scheduler for asynchronous execution.
  - The Sync class serves as an abstract base that manages the entire synchronization process lifecycle.
  - The Sync_Data trait provides reliable synchronization data storage using WordPress's transients, enabling flexible data sharing and locking mechanisms.
- Focus on Memory Efficiency:
  - Uses iterative processing (e.g., PHP Generators, chunk-based file reading) to minimize memory usage and ensure scalability for large datasets.
- Modular Design:
  - The clean separation of concerns allows developers to modify or extend specific aspects like chunk processing or API fetching independently.
- Action Scheduler Integration:
  - Uses the Action Scheduler's event-driven architecture to manage background tasks effectively, along with group-level synchronization to maintain context-aware processing.
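To illustrate the generator-based, chunk-wise processing described above, here is a minimal sketch in plain PHP. It is not the library's Chunker implementation: chunk_rows and generate_rows are hypothetical names introduced only for this example, and the row source stands in for a large file or API response.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of AS Processor): lazily groups rows into
 * fixed-size chunks so that only one chunk is held in memory at a time.
 *
 * @param iterable $rows       Any row source, for example another generator.
 * @param int      $chunk_size Maximum number of rows per chunk.
 */
function chunk_rows( iterable $rows, int $chunk_size ): \Generator {
	if ( $chunk_size < 1 ) {
		throw new \InvalidArgumentException( 'Chunk size must be at least 1.' );
	}

	$chunk = [];
	foreach ( $rows as $row ) {
		$chunk[] = $row;
		if ( count( $chunk ) === $chunk_size ) {
			yield $chunk;
			$chunk = [];
		}
	}

	// Emit the final, possibly smaller chunk.
	if ( [] !== $chunk ) {
		yield $chunk;
	}
}

/**
 * Hypothetical row source standing in for a large file or API response.
 */
function generate_rows( int $total ): \Generator {
	for ( $i = 1; $i <= $total; $i++ ) {
		yield [ 'id' => $i ];
	}
}

foreach ( chunk_rows( generate_rows( 10 ), 4 ) as $index => $chunk ) {
	printf( "chunk %d: %d rows\n", $index, count( $chunk ) );
}
// Prints:
// chunk 0: 4 rows
// chunk 1: 4 rows
// chunk 2: 2 rows
```

In a real sync, each yielded chunk would be handed to a background action rather than printed. How AS Processor stores and schedules chunks is not shown here.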
Available Lifecycle Hooks
The library provides several WordPress action hooks that fire during the synchronization lifecycle, allowing developers to hook into different stages of the sync process:
- {sync_name}/start: Fires when any Action Scheduler action belonging to the sync begins execution (both dispatcher/root actions and chunk processing actions). Receives ActionScheduler_Action $action and int $action_id as parameters.
- {sync_name}/process_chunk: Fires to process an individual chunk. Receives int $chunk_id as parameter.
- {sync_name}/complete: Fires each time an action in the sync group completes successfully. Receives ActionScheduler_Action $action and int $action_id as parameters.
- {sync_name}/finish: Fires once when all actions in the sync group have completed. Receives string $group_name as parameter.
- {sync_name}/fail: Fires when an action fails with an exception. Receives ActionScheduler_Action $action, Exception $e, and int $action_id as parameters.
- {sync_name}/timeout: Fires when an action times out. Receives ActionScheduler_Action $action and int $action_id as parameters.
- {sync_name}/cancel: Fires when an action is canceled. Receives ActionScheduler_Action $action and int $action_id as parameters.
- {sync_name}/delete: Fires when an action is deleted. Receives Chunk $chunk as parameter.
These hooks enable custom implementations for metrics tracking, logging, notifications, or other side effects at different stages of the synchronization process.
Ideal For
This library is an excellent choice for WordPress developers and enterprises dealing with:
- High-volume data integration from various sources.
- Automating repetitive and resource-intensive synchronization tasks.
- Optimizing workflows for applications that rely on large datasets or slow APIs.
- Frequent e-commerce product imports.
AS Processor combines the power of modern WordPress development practices, Action Scheduler's asynchronous processing capabilities, and a highly abstracted framework to enable seamless and fault-tolerant data processing at scale. It offers developers a solid foundation for building efficient, scalable synchronization solutions tailored to their applications' unique requirements.
Sync Lifecycle Hooks
The Sync class provides a comprehensive set of lifecycle hooks that allow you to monitor and respond to various stages of the synchronization process. These hooks are namespaced by your sync name (as returned by get_sync_name()).
Available Hooks
{sync_name}/start
When: Fired when an action begins execution
Parameters: None (triggered via track_action_start)
Use case: Log the start of processing, set up temporary resources, or track progress
{sync_name}/complete
When: Fired for each sync-owned action upon successful completion (triggered by Action Scheduler's native action_scheduler_completed_action hook)
Parameters:
- ActionScheduler_Action $action: The completed action object
- int $action_id: The ID of the action
Use case: Track individual action completions, update progress indicators, or perform per-action cleanup
Note: This hook fires every time an action in your sync group completes. If your sync schedules 100 chunks, this hook will fire 100 times. Use this for per-action tracking, not for final completion logic (use /finish for that).
{sync_name}/finish
When: Fired once when all actions in the sync group are complete
Parameters:
- string $group_name: The sync group name
Use case: Final cleanup, send completion notifications, or trigger dependent processes
{sync_name}/fail
When: Fired when an action encounters an exception during execution
Parameters:
- ActionScheduler_Action $action: The failed action object
- Exception $e: The exception that was thrown
- int $action_id: The ID of the failed action
Use case: Error logging, send failure notifications, or trigger recovery processes
{sync_name}/cancel
When: Fired when an action is manually cancelled
Parameters:
- ActionScheduler_Action $action: The cancelled action object
- int $action_id: The ID of the cancelled action
Use case: Clean up resources, log cancellation events
{sync_name}/delete
When: Fired when an action is deleted
Parameters:
- Chunk $chunk: The chunk belonging to the deleted action
Use case: Track deleted actions, clean up resources, log deletion events
{sync_name}/timeout
When: Fired when an action times out
Parameters:
- ActionScheduler_Action $action: The timed-out action object
- int $action_id: The ID of the timed-out action
Use case: Handle timeout scenarios, log timeout events, retry logic
Hook Usage Example
// Track progress for each completed action.
// The accepted-args value (2) must match the callback's two parameters.
add_action( 'my_custom_sync/complete', function( $action, $action_id ) {
	error_log( "Action $action_id completed. Belongs to group {$action->get_group()}." );
}, 10, 2 );

// Final cleanup when ALL actions are finished.
add_action( 'my_custom_sync/finish', function( $group_name ) {
	error_log( "All actions in group $group_name are finished!" );
} );

// Handle failures.
add_action( 'my_custom_sync/fail', function( $action, $exception, $action_id ) {
	error_log( "Action $action_id failed: " . $exception->getMessage() );
}, 10, 3 );

// Handle deletions.
add_action( 'my_custom_sync/delete', function( $chunk ) {
	error_log( "Action {$chunk->get_action_id()} was deleted." );
} );
Overridable Methods
Instead of using hooks, you can override these methods in your child class:
on_finish()
Called when all actions in the sync group are complete. This is the preferred method for implementing group completion logic.
public function on_finish(): void {
	// Your completion logic here
}
on_fail()
Called when an action fails.
public function on_fail(): void {
	// Your failure handling logic here
}