tamirrental / laravel-text-extractor
A Laravel package for extracting structured data from documents via OCR APIs. Ships with Koncile AI provider.
Package info
github.com/TamirRental/laravel-text-extractor
pkg:composer/tamirrental/laravel-text-extractor
Requires
- php: ^8.4
- illuminate/contracts: ^11.0||^12.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- laravel/pint: ^1.14
- orchestra/testbench: ^10.0
- pestphp/pest: ^4.0
- pestphp/pest-plugin-laravel: ^4.0
- pestphp/pest-plugin-type-coverage: ^4.0
This package is auto-updated.
Last update: 2026-03-05 05:56:29 UTC
README
A Laravel package for extracting structured data from documents (images, PDFs) via OCR APIs. Ships with a Koncile AI provider out of the box.
Features
- Extract structured data from documents using OCR providers
- Fluent API: chainable metadata(), force(), and submit() methods
- Async processing via Laravel queues
- Pluggable provider architecture — bring your own OCR provider
- Facade for clean, expressive syntax
- Built-in model scopes for querying extractions
Requirements
- PHP 8.4+
- Laravel 11 or 12
Installation
composer require tamirrental/laravel-text-extractor
Run the install command to publish the config file and migration:
php artisan document-extraction:install
Then run the migration:
php artisan migrate
Configuration
config/document-extraction.php
Provider connection settings.
return [
    'default' => env('EXTRACTION_PROVIDER', 'koncile_ai'),

    'providers' => [
        'koncile_ai' => [
            'url' => env('KONCILE_AI_API_URL', 'https://api.koncile.ai'),
            'key' => env('KONCILE_AI_API_KEY'),
            'webhook_secret' => env('KONCILE_AI_WEBHOOK_SECRET'),
        ],
    ],
];
Environment Variables
Add these to your .env file:
KONCILE_AI_API_KEY=your-api-key
KONCILE_AI_WEBHOOK_SECRET=your-webhook-secret
Usage
Basic Usage with Facade
use TamirRental\DocumentExtraction\Facades\DocumentExtraction;

// Store the uploaded file
$path = $file->store('documents/car-licenses', 's3');

// Extract: creates a record and dispatches async processing
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata([
        'template_id' => 'your-koncile-template-id',
        'folder_id' => 'optional-folder-id', // optional
        'identifier_field' => 'license_number', // optional: used to resolve the identifier from extracted data
    ])
    ->submit();
The package automatically dispatches a queued job to download the file from storage, upload it to the OCR provider, and track the result.
Metadata
The metadata() method accepts a key-value array that gets stored on the extraction record and passed to the provider. This is how you supply provider-specific data without any config files.
| Key | Required | Description |
|---|---|---|
| template_id | Yes (Koncile AI) | The OCR template ID on the provider side |
| folder_id | No | Optional folder/organization ID on the provider side |
| identifier_field | No | The field name from extracted data to use as a unique identifier (e.g. license_number) |
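To make identifier_field concrete, here is a minimal sketch of how an identifier could be read from a provider payload. It assumes the payload shape that appears in the webhook controller example in this README (General_fields.<field>.value). The resolveIdentifier() helper is hypothetical and not part of the package, and the package's own resolution logic may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): read the identifier
 * from extracted data, assuming the shape General_fields.<field>.value.
 */
function resolveIdentifier(array $extractedData, ?string $identifierField): string
{
    // No identifier field configured: nothing to resolve.
    if ($identifierField === null || $identifierField === '') {
        return '';
    }

    // The null coalescing operator tolerates any missing level of nesting.
    $value = $extractedData['General_fields'][$identifierField]['value'] ?? '';

    return is_scalar($value) ? trim((string) $value) : '';
}

$payload = [
    'General_fields' => [
        'license_number' => ['value' => ' 12-345-67 '],
    ],
];

echo resolveIdentifier($payload, 'license_number'), PHP_EOL; // 12-345-67
```

Returning an empty string for a missing field mirrors the empty default used for the identifier argument elsewhere in this README.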
Force Re-extraction
If an extraction already exists for a file, chain force() to create a new one:
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->force()
    ->submit();
Conditional Force
The pending extraction uses Laravel's Conditionable trait, so you can chain methods conditionally with when():
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->when($shouldForce, fn ($pending) => $pending->force())
    ->submit();
Checking Extraction Status
use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

$extraction = DocumentExtraction::find($id);

if ($extraction->status === DocumentExtractionStatusEnum::Completed) {
    $data = $extraction->extracted_data;
    $identifier = $extraction->identifier; // e.g. "12-345-67"
}
Querying Extractions
The DocumentExtraction model includes useful scopes:
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

// Filter by status
DocumentExtraction::pending()->get();
DocumentExtraction::completed()->get();
DocumentExtraction::failed()->get();

// Filter by type or file
DocumentExtraction::forType('car_license')->get();
DocumentExtraction::forFile('documents/license.png')->get();

// Combine scopes
DocumentExtraction::forType('car_license')->completed()->latest()->first();
How It Works
1. Your App                    2. Queue Worker                3. Provider (Koncile AI)
│                              │                              │
├─ Store file to Storage       │                              │
├─ extract()->submit() ───────►│                              │
│  (auto-dispatches event)     ├─ Download from Storage       │
│                              ├─ Upload to provider ────────►│
│                              ├─ Save external_task_id       │
│                              │                              ├─ OCR Processing...
│                              │                              │
│◄──────────────── Provider webhook callback ◄────────────────┤
├─ Your controller handles it  │                              │
├─ complete() / fail()         │                              │
│                              │                              │
├─ Check status / display      │                              │
Extraction Lifecycle
| Stage | Status | external_task_id | extracted_data |
|---|---|---|---|
| Record created | pending | null | {} |
| Sent to provider | pending | task-abc-123 | {} |
| Provider succeeds | completed | task-abc-123 | {...provider data} |
| Provider fails | failed | task-abc-123 | {} |
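The table shows that pending covers two stages, distinguished only by whether external_task_id has been set. The following sketch makes that distinction explicit. ExtractionStatus is a stand-in enum defined here so the example runs on its own (its string values are assumed from the Status column), and lifecycleStage() is a hypothetical helper, not a package method.

```php
<?php

declare(strict_types=1);

// Stand-in for the package's status enum so the sketch is self-contained.
// The string values mirror the Status column above and are assumed.
enum ExtractionStatus: string
{
    case Pending = 'pending';
    case Completed = 'completed';
    case Failed = 'failed';
}

/** Hypothetical helper: name the lifecycle stage for a status/task-ID pair. */
function lifecycleStage(ExtractionStatus $status, ?string $externalTaskId): string
{
    return match ($status) {
        // A pending record without a task ID has not reached the provider yet.
        ExtractionStatus::Pending => $externalTaskId === null
            ? 'Record created'
            : 'Sent to provider',
        ExtractionStatus::Completed => 'Provider succeeds',
        ExtractionStatus::Failed => 'Provider fails',
    };
}

echo lifecycleStage(ExtractionStatus::Pending, null), PHP_EOL;           // Record created
echo lifecycleStage(ExtractionStatus::Pending, 'task-abc-123'), PHP_EOL; // Sent to provider
```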
Handling Webhooks
The package does not register webhook routes — you own the entire webhook flow. Create your own controller to receive provider callbacks and use the service to update extractions:
<?php

namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use TamirRental\DocumentExtraction\Services\DocumentExtractionService;

class KoncileWebhookController extends Controller
{
    public function handle(Request $request, DocumentExtractionService $service): JsonResponse
    {
        // Validate the webhook (signature verification, etc.)

        $taskId = $request->input('task_id');
        $status = $request->input('status');

        match ($status) {
            'DONE' => $service->complete(
                $taskId,
                (object) $request->all(),
                $request->input('General_fields.license_number.value', ''),
            ),
            'FAILED' => $service->fail($taskId, $request->input('error_message', 'Provider error')),
            default => null,
        };

        return response()->json(['message' => 'Webhook processed']);
    }
}
Then register the route in your application:
// routes/api.php
use App\Http\Controllers\KoncileWebhookController;
use Illuminate\Support\Facades\Route;

Route::post('/webhooks/koncile', [KoncileWebhookController::class, 'handle']);
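Because you own the webhook flow, validating the request is also your responsibility. The sketch below shows one common approach, checking an HMAC-SHA256 signature of the raw request body against the configured webhook secret. The signing scheme is an assumption for illustration only: this README does not describe how Koncile AI signs its callbacks, so confirm the actual mechanism in the provider's documentation. isValidWebhookSignature() is a hypothetical helper, not part of the package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical check (assumed scheme): the provider sends a hex-encoded
 * HMAC-SHA256 of the raw request body, keyed with the webhook secret.
 */
function isValidWebhookSignature(string $rawBody, string $signature, string $secret): bool
{
    // Refuse to validate when either side of the comparison is missing.
    if ($secret === '' || $signature === '') {
        return false;
    }

    $expected = hash_hmac('sha256', $rawBody, $secret);

    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($expected, strtolower($signature));
}

$secret = 'your-webhook-secret';
$body = '{"task_id":"task-abc-123","status":"DONE"}';
$signature = hash_hmac('sha256', $body, $secret);

var_dump(isValidWebhookSignature($body, $signature, $secret));       // bool(true)
var_dump(isValidWebhookSignature($body . ' ', $signature, $secret)); // bool(false)
```

In a controller, the raw body would come from $request->getContent() and the secret from the webhook_secret entry in the published config. The header that carries the signature depends on the provider, so any header name you use here should be taken from its documentation rather than guessed.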
Available Service Methods
| Method | Description |
|---|---|
| $service->complete(string $taskId, object $data, string $identifier = '') | Mark extraction as completed with extracted data |
| $service->fail(string $taskId, string $message) | Mark extraction as failed with error message |
Custom Providers
You can create your own extraction provider by implementing the DocumentExtractionProvider contract:
<?php

namespace App\Services;

use Illuminate\Support\Facades\Storage;
use TamirRental\DocumentExtraction\Contracts\DocumentExtractionProvider;
use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

class MyCustomProvider implements DocumentExtractionProvider
{
    /**
     * Process a document extraction request.
     *
     * The provider owns the full workflow: downloading the file,
     * calling the extraction API, and updating the model.
     */
    public function process(DocumentExtraction $extraction): void
    {
        // Your extraction logic here...
        // Download file: Storage::get($extraction->filename)
        // Call your API, then update the model:

        $extraction->update([
            'status' => DocumentExtractionStatusEnum::Completed,
            'extracted_data' => (object) ['field' => 'value'],
            'identifier' => 'parsed-id',
        ]);
    }
}
Then register it in your application's service provider by overriding the package's binding:
// AppServiceProvider.php
use App\Services\MyCustomProvider;
use TamirRental\DocumentExtraction\Contracts\DocumentExtractionProvider;

public function register(): void
{
    $this->app->bind(DocumentExtractionProvider::class, MyCustomProvider::class);
}
Events
| Event | Dispatched When |
|---|---|
| DocumentExtractionRequested | Automatically dispatched when extract()->submit() creates a new extraction |
The event is dispatched internally — you don't need to dispatch it yourself. The queued listener downloads the file from storage and uploads it to the provider.
Listen for extraction completion in your app by creating your own listener that watches for model updates.
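One way to watch for model updates is Eloquent's updated model event. The sketch below registers a closure in an application service provider and reacts only when the status changes to Completed. It assumes the package updates the record through an Eloquent model instance, since model events do not fire for mass updates issued through the query builder. The log call is a placeholder for your own follow-up logic.

```php
<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Runs after a DocumentExtraction row is updated through Eloquent.
        DocumentExtraction::updated(function (DocumentExtraction $extraction): void {
            // React only when this update changed the status to Completed.
            if ($extraction->wasChanged('status')
                && $extraction->status === DocumentExtractionStatusEnum::Completed) {
                // Placeholder follow-up: replace with your own job or notification.
                logger()->info('Extraction completed', ['id' => $extraction->getKey()]);
            }
        });
    }
}
```

A dedicated observer class would work equally well if you need to handle several model events in one place.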
Testing
The package ships with model factories for testing:
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

// Default (pending, no task ID)
$extraction = DocumentExtraction::factory()->create();

// With external task ID
$extraction = DocumentExtraction::factory()->pending()->create();

// Completed with data
$extraction = DocumentExtraction::factory()->completed()->create();

// Failed with error
$extraction = DocumentExtraction::factory()->failed()->create();

// With metadata
$extraction = DocumentExtraction::factory()->create([
    'metadata' => [
        'template_id' => 'your-template-id',
        'identifier_field' => 'license_number',
    ],
]);
License
MIT