laravelsmartocr/laravel-smart-ocr

Laravel Smart OCR & Document Data Extractor - A powerful OCR and document parsing engine for Laravel

dev-main 2025-08-31 10:52 UTC

This package is not auto-updated.

Last update: 2025-09-29 09:27:32 UTC


A powerful Laravel package for OCR and intelligent document parsing with AI-powered data cleanup, reusable templates, and multi-language support.

Features

  • Multi-Driver OCR Support: Tesseract (offline), Google Vision, AWS Textract, Azure OCR
  • Template Matching System: Create and share reusable document templates
  • AI-Powered Cleanup: Automatic typo correction and data structuring
  • Multi-Language Support: Extract text in multiple languages
  • Laravel Native: Seamless integration with Eloquent, Queues, and Blade
  • Privacy-First: Run fully offline with the Tesseract driver when documents are sensitive
  • Smart Data Extraction: Automatically extract dates, amounts, emails, phone numbers
  • Document Preview: Interactive Blade components for reviewing extracted data
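
To give a sense of what the smart data extraction feature involves, here is a minimal sketch in plain PHP that pulls emails, ISO-style dates, and dollar amounts out of raw OCR text with regular expressions. The function name `extractEntities` and the patterns are illustrative assumptions, not the package's actual implementation, which is not shown in this README.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): pull a few common
 * entity types out of raw OCR text using regular expressions.
 *
 * @return array<string, array<int, string>>
 */
function extractEntities(string $text): array
{
    // Deliberately simple patterns; real documents need more variants.
    $patterns = [
        'emails'  => '/[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}/',
        'dates'   => '/\b\d{4}-\d{2}-\d{2}\b/',               // YYYY-MM-DD only
        'amounts' => '/\$\s?\d+(?:,\d{3})*(?:\.\d{2})?/',     // e.g. $1,234.50
    ];

    $found = [];
    foreach ($patterns as $name => $pattern) {
        preg_match_all($pattern, $text, $matches);
        $found[$name] = $matches[0]; // full matches for this entity type
    }

    return $found;
}

$sample = "Invoice date: 2025-08-31\nTotal: \$1,234.50\nContact: billing@example.com";
$entities = extractEntities($sample);
```

Phone numbers and localized date formats vary too much for a single pattern, which is one reason a package-level extractor is useful.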

Installation

composer require laravelsmartocr/laravel-smart-ocr

Configuration

Publish the configuration file:

php artisan vendor:publish --tag=smart-ocr-config

Run migrations:

php artisan migrate

Basic Usage

Simple OCR Extraction

use LaravelSmartOCR\Facades\SmartOCR;

// Extract text from an image
$result = SmartOCR::extract('path/to/document.jpg');

// Extract with specific language
$result = SmartOCR::extract('path/to/document.jpg', [
    'language' => 'spa' // Spanish
]);

Using Templates

// Extract using a specific template ($templateId identifies a saved template)
$result = SmartOCR::extractWithTemplate('invoice.pdf', $templateId);

// Auto-detect template
$parser = app('smart-ocr.parser');
$result = $parser->parse('invoice.pdf', [
    'auto_detect_template' => true
]);

AI Cleanup

$parser = app('smart-ocr.parser');
$result = $parser->parse('receipt.jpg', [
    'use_ai_cleanup' => true,
    'document_type' => 'receipt'
]);

Creating Templates

$templateManager = app('smart-ocr.templates');

$template = $templateManager->create([
    'name' => 'Standard Invoice',
    'type' => 'invoice',
    'fields' => [
        [
            'key' => 'invoice_number',
            'label' => 'Invoice Number',
            'type' => 'string',
            'pattern' => '/Invoice\s*#?\s*:\s*([A-Z0-9\-]+)/i',
        ],
        [
            'key' => 'total_amount',
            'label' => 'Total Amount',
            'type' => 'currency',
            'pattern' => '/Total\s*:\s*\$?\s*([0-9,.]+)/i',
        ]
    ]
]);
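
The `pattern` entries are ordinary PCRE expressions. Assuming the package reads each field's value from the first capture group (the README does not state this), the sketch below applies the two patterns from the template above to a sample OCR string using plain PHP. The helper name `applyFieldPatterns` and the sample text are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: run each field's regex against the OCR text and
 * keep the first capture group, or null when the pattern does not match.
 *
 * @param array<int, array{key: string, pattern: string}> $fields
 * @return array<string, string|null>
 */
function applyFieldPatterns(string $text, array $fields): array
{
    $values = [];
    foreach ($fields as $field) {
        $matched = preg_match($field['pattern'], $text, $m) === 1;
        $values[$field['key']] = $matched ? $m[1] : null;
    }

    return $values;
}

// Same patterns as in the template definition above.
$fields = [
    ['key' => 'invoice_number', 'pattern' => '/Invoice\s*#?\s*:\s*([A-Z0-9\-]+)/i'],
    ['key' => 'total_amount',   'pattern' => '/Total\s*:\s*\$?\s*([0-9,.]+)/i'],
];

$ocrText = "ACME Corp\nInvoice #: INV-2024-001\nTotal: \$1,234.50";

$values = applyFieldPatterns($ocrText, $fields);
// $values['invoice_number'] is 'INV-2024-001'; $values['total_amount'] is '1,234.50'
```

Both patterns require a colon after the label, so a line such as `Invoice #12345` would not match. Testing patterns against real OCR output before saving a template helps catch this kind of mismatch.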

Batch Processing

$parser = app('smart-ocr.parser');

$documents = [
    'invoice1.pdf',
    'invoice2.jpg',
    'receipt.png'
];

$results = $parser->parseBatch($documents, [
    'use_ai_cleanup' => true,
    'save_to_database' => true
]);

Blade Components

Display extracted document data with the included Blade component:

<x-smart-ocr::document-preview 
    :document="$processedDocument"
    :show-overlay="true"
    :show-actions="true"
/>

Advanced Configuration

Configure OCR Drivers

# Set SMART_OCR_DRIVER to one driver only; the blocks below are alternatives.

# Tesseract (Default - Offline)
SMART_OCR_DRIVER=tesseract
TESSERACT_LANGUAGE=eng

# Google Vision
SMART_OCR_DRIVER=google_vision
GOOGLE_VISION_KEY_FILE=/path/to/credentials.json
GOOGLE_VISION_PROJECT_ID=your-project-id

# AWS Textract
SMART_OCR_DRIVER=aws_textract
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_DEFAULT_REGION=us-east-1

# Azure OCR
SMART_OCR_DRIVER=azure
AZURE_OCR_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
AZURE_OCR_KEY=your-key

Enable AI Cleanup

SMART_OCR_AI_CLEANUP=true
SMART_OCR_AI_PROVIDER=openai
OPENAI_API_KEY=your-openai-key

Queue Processing

SMART_OCR_QUEUE_ENABLED=true
SMART_OCR_QUEUE_NAME=ocr-processing

Workflows

Define custom workflows for specific document types:

// config/smart-ocr.php
'workflows' => [
    'invoice' => [
        'options' => [
            'use_ai_cleanup' => true,
            'auto_detect_template' => true,
            'extract_tables' => true,
        ],
        'post_processors' => [
            ['class' => 'App\OCR\Processors\InvoiceProcessor'],
        ],
    ],
]

// Usage
$parser = app('smart-ocr.parser');
$result = $parser->parseWithWorkflow('invoice.pdf', 'invoice');
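
The README does not show what a post-processor class looks like. As a sketch only, the following assumes a post-processor is a plain class with a `process(array $data): array` method that receives the extracted fields and returns them modified. That method name and signature are assumptions, as are the `total_amount` and `invoice_number` keys, so check the package source for the real contract.

```php
<?php

declare(strict_types=1);

namespace App\OCR\Processors;

/**
 * Hypothetical post-processor. The process() contract is an assumption;
 * the package may require a specific interface or method signature.
 */
final class InvoiceProcessor
{
    /**
     * Normalise the extracted total to a float and tidy the invoice number.
     *
     * @param array<string, mixed> $data
     * @return array<string, mixed>
     */
    public function process(array $data): array
    {
        if (isset($data['total_amount']) && is_string($data['total_amount'])) {
            // Strip currency symbols and thousands separators: "$1,234.50" becomes 1234.5
            $clean = (string) preg_replace('/[^0-9.]/', '', $data['total_amount']);
            $data['total_amount'] = $clean === '' ? null : (float) $clean;
        }

        if (isset($data['invoice_number']) && is_string($data['invoice_number'])) {
            $data['invoice_number'] = strtoupper(trim($data['invoice_number']));
        }

        return $data;
    }
}
```

Keeping the processor free of framework dependencies, as here, makes it easy to unit test without booting Laravel.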

API Usage

// Field mapping with fuzzy matching
$aiCleanup = app('smart-ocr.ai-cleanup');
$mapped = $aiCleanup->mapFields($extractedData, [
    'invoice_id' => [
        'alternatives' => ['invoice_number', 'inv_no', 'bill_number'],
        'transform' => 'uppercase'
    ],
    'amount' => [
        'field' => 'total',
        'transform' => 'currency'
    ]
]);
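
To make the mapping rules concrete, here is a plain-PHP approximation of what a call like the one above could do. It is not the package's implementation: the function names `mapFieldsSketch` and `applyTransform`, the use of Levenshtein distance for the fuzzy step, and the behaviour of the `uppercase` and `currency` transforms are all assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for field mapping: resolves each target field from
 * a list of candidate source keys, tolerating small spelling differences.
 *
 * @param array<string, string> $data  Extracted key/value pairs.
 * @param array<string, array<string, mixed>> $rules
 * @return array<string, string|float|null>
 */
function mapFieldsSketch(array $data, array $rules, int $maxDistance = 2): array
{
    $mapped = [];
    foreach ($rules as $target => $rule) {
        // Candidate source keys: the target name, an explicit 'field', then any alternatives.
        $candidates = array_merge(
            [(string) $target],
            isset($rule['field']) ? [$rule['field']] : [],
            $rule['alternatives'] ?? []
        );

        $value = null;
        $best = $maxDistance + 1; // only accept keys within $maxDistance edits
        foreach ($data as $key => $raw) {
            foreach ($candidates as $candidate) {
                $distance = levenshtein(strtolower((string) $key), strtolower($candidate));
                if ($distance < $best) {
                    $best = $distance;
                    $value = (string) $raw;
                }
            }
        }

        $mapped[$target] = $value === null
            ? null
            : applyTransform($value, (string) ($rule['transform'] ?? ''));
    }

    return $mapped;
}

/**
 * Assumed transform semantics, for illustration only.
 *
 * @return string|float
 */
function applyTransform(string $value, string $transform)
{
    switch ($transform) {
        case 'uppercase':
            return strtoupper($value);
        case 'currency':
            // Drop symbols and thousands separators, then cast to float.
            return (float) preg_replace('/[^0-9.]/', '', $value);
        default:
            return $value;
    }
}
```

With this sketch, a misspelled source key such as `totl` still resolves to `amount` because it is one edit away from `total`, while unrelated keys are ignored.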

Security

  • Offline Mode: Use Tesseract for complete data privacy
  • Encryption: Enable data encryption for stored documents
  • Validation: Built-in MIME type and file size validation
  • Sanitization: Automatic input sanitization
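
If you want an additional check in your own code before a file reaches the package, MIME type and size validation can be sketched with PHP's `fileinfo` extension as follows. The function name `validateDocument` and its parameters are hypothetical; the package's built-in validation may work differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-flight check (not the package's code): verify a file's
 * detected MIME type and size before handing it to OCR.
 *
 * @param array<int, string> $allowedMimes
 * @return array<int, string> Human-readable problems; empty when the file passes.
 */
function validateDocument(string $path, array $allowedMimes, int $maxBytes): array
{
    if (!is_file($path) || !is_readable($path)) {
        return ["File not found or unreadable: {$path}"];
    }

    $errors = [];

    // Detect the type from the file contents rather than trusting the extension.
    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($path);
    if ($mime === false || !in_array($mime, $allowedMimes, true)) {
        $errors[] = 'Unsupported MIME type: ' . ($mime === false ? 'unknown' : $mime);
    }

    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        $errors[] = "File exceeds {$maxBytes} bytes";
    }

    return $errors;
}
```

A call such as `validateDocument($path, ['image/jpeg', 'image/png', 'application/pdf'], 10 * 1024 * 1024)` returns an empty array when the file is acceptable; the allowed types and the 10 MB limit here are example values.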

Pro Version

Upgrade to Pro for:

  • Advanced AI cleanup with multiple providers
  • Access to community template marketplace
  • Priority support and updates
  • Advanced language packs
  • Custom OCR model training s