insightbase / invoice-parser-nette
Nette package for parsing invoice/accounting documents from PDF (including scanned PDFs) using Azure Document Intelligence, LLM normalization and Czech-specific validation.
Package info
github.com/insightbase/InvoiceParser-nette
pkg:composer/insightbase/invoice-parser-nette
v1.0.1
2026-03-15 11:42 UTC
Requires
- php: >=8.1
- ext-json: *
- guzzlehttp/guzzle: ^7.9
- nette/di: ^3.1
- nette/schema: ^1.3
- nette/utils: ^4.0
Requires (Dev)
- phpunit/phpunit: ^10.5 || ^11.0
Suggests
- contributte/rabbitmq: For asynchronous invoice processing workers.
This package is auto-updated.
Last update: 2026-03-15 11:42:40 UTC
README
Nette package for extracting data from invoices and accounting documents in PDF (including scans) using:
- Azure Document Intelligence (OCR + structured extraction)
- LLM normalization (Azure OpenAI)
- Czech regex fallbacks (VS, DUZP, IČO, DIČ)
- a validation layer and an asynchronous worker pattern
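To make the regex fallback idea concrete, here is a minimal sketch of how the four Czech identifiers could be pulled out of raw OCR text and used to fill gaps in a primary result. The function names `extractCzechFields` and `fillMissingFields`, the array keys, and the patterns themselves are assumptions made for illustration; the package's own regexes may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: pull Czech invoice identifiers out of raw OCR text.
 * The patterns are illustrative assumptions, not the package's own regexes.
 *
 * @return array{vs: ?string, duzp: ?string, ico: ?string, dic: ?string}
 */
function extractCzechFields(string $text): array
{
    $patterns = [
        // VS (variabilní symbol): payment reference, at most 10 digits.
        'vs' => '/(?:variabiln[íi]\s+symbol|\bVS)\s*[:.]?\s*(\d{1,10})\b/iu',
        // DUZP: date of taxable supply, written as d.m.yyyy.
        'duzp' => '/\bDUZP\s*[:.]?\s*(\d{1,2}\.\s?\d{1,2}\.\s?\d{4})/iu',
        // IČO: company ID, 8 digits.
        'ico' => '/\bI[ČC]O?\s*[:.]?\s*(\d{8})\b/iu',
        // DIČ: VAT ID, "CZ" followed by 8 to 10 digits.
        'dic' => '/\bDI[ČC]\s*[:.]?\s*(CZ\d{8,10})\b/iu',
    ];

    $found = [];
    foreach ($patterns as $field => $pattern) {
        $found[$field] = preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
    }

    return $found;
}

/**
 * Hypothetical merge step: keep every non-empty primary value and
 * take the regex value only where the primary result has none.
 */
function fillMissingFields(array $primary, array $fallback): array
{
    $present = array_filter(
        $primary,
        static fn ($value): bool => $value !== null && $value !== '',
    );

    return array_replace($fallback, $present);
}
```

A call such as `fillMissingFields($fromAzure, extractCzechFields($ocrText))` would then leave extracted values untouched and only add identifiers the primary extraction missed.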
Installation
composer require insightbase/invoice-parser-nette
Registering the extension
extensions:
    invoiceParser: InsightBase\InvoiceParserNette\DI\InvoiceParserExtension

invoiceParser:
    azureDi:
        endpoint: %env(AZURE_DI_ENDPOINT)%
        apiKey: %env(AZURE_DI_KEY)%
        model: prebuilt-invoice
        apiVersion: 2023-07-31
        maxPollAttempts: 25
        pollIntervalMs: 1000
    llm:
        enabled: true
        endpoint: %env(AZURE_OPENAI_ENDPOINT)%
        deployment: %env(AZURE_OPENAI_DEPLOYMENT)%
        apiKey: %env(AZURE_OPENAI_KEY)%
        apiVersion: 2024-10-21
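The `maxPollAttempts` and `pollIntervalMs` settings point to a polling loop: Azure Document Intelligence analyzes documents asynchronously, so a client submits the PDF and then repeatedly asks for the operation status until it reports `succeeded` or `failed`. The following is a minimal sketch of such a loop, assuming the two settings mean what their names suggest. `pollUntilDone` and the `$fetchStatus` callable (standing in for the HTTP GET on the operation URL) are hypothetical names, not part of the package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical polling loop; $fetchStatus stands in for a GET on the operation URL
 * and is expected to return the decoded response with a "status" key.
 *
 * @param callable(): array $fetchStatus
 * @return array the final operation payload
 */
function pollUntilDone(callable $fetchStatus, int $maxPollAttempts = 25, int $pollIntervalMs = 1000): array
{
    for ($attempt = 1; $attempt <= $maxPollAttempts; $attempt++) {
        $operation = $fetchStatus();
        $status = $operation['status'] ?? '';

        if ($status === 'succeeded') {
            return $operation;
        }
        if ($status === 'failed') {
            throw new \RuntimeException('Document analysis failed.');
        }

        // Still "notStarted" or "running": wait before asking again,
        // but do not sleep after the last attempt.
        if ($attempt < $maxPollAttempts) {
            usleep($pollIntervalMs * 1000);
        }
    }

    throw new \RuntimeException("Analysis did not finish within $maxPollAttempts attempts.");
}
```

Under this reading, the values shown in the configuration would let a document take roughly 24 seconds of waiting, plus request time, before the parser gives up.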
Usage
<?php

declare(strict_types=1);

use InsightBase\InvoiceParserNette\Parser\InvoiceParser;

final class InvoiceService
{
    public function __construct(
        private InvoiceParser $invoiceParser,
    ) {
    }

    public function parse(string $pdfPath): array
    {
        // file_get_contents() returns false on failure; fail loudly
        // instead of silently parsing an empty string.
        $pdfContent = file_get_contents($pdfPath);
        if ($pdfContent === false) {
            throw new \RuntimeException("Cannot read PDF file: $pdfPath");
        }

        $result = $this->invoiceParser->parsePdf($pdfContent);

        return $result->invoice->toArray();
    }
}
Asynchronous worker (Contributte RabbitMQ)
The library includes a worker service, InvoiceParseWorker::process(array $message).
Example message payload:
{
    "pdfPath": "/data/invoices/invoice-2026-001.pdf"
}
Or:
{
    "pdfBase64": "JVBERi0xLjQKJ..."
}
A sample integration is provided in examples/rabbitmq.neon and examples/InvoiceConsumer.php.
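The two payload shapes imply that a consumer first has to turn the message into raw PDF bytes. The sketch below shows one way this could be done; `resolvePdfContent` is a hypothetical helper, and the signature check and error handling are assumptions about what a consumer might want, not a description of what `InvoiceParseWorker` does internally.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turn a worker message into raw PDF bytes.
 * Accepts either "pdfPath" or "pdfBase64", mirroring the two payload shapes.
 */
function resolvePdfContent(array $message): string
{
    if (isset($message['pdfBase64']) && is_string($message['pdfBase64'])) {
        // Strict mode rejects characters outside the Base64 alphabet.
        $content = base64_decode($message['pdfBase64'], true);
        if ($content === false) {
            throw new \InvalidArgumentException('pdfBase64 is not valid Base64.');
        }
    } elseif (isset($message['pdfPath']) && is_string($message['pdfPath'])) {
        $content = @file_get_contents($message['pdfPath']);
        if ($content === false) {
            throw new \RuntimeException('Cannot read ' . $message['pdfPath']);
        }
    } else {
        throw new \InvalidArgumentException('Message needs "pdfPath" or "pdfBase64".');
    }

    // A well-formed PDF begins with the "%PDF-" signature.
    if (!str_starts_with($content, '%PDF-')) {
        throw new \InvalidArgumentException('Payload does not look like a PDF.');
    }

    return $content;
}
```

The Base64 prefix `JVBERi0xLjQK` in the payload example decodes to `%PDF-1.4` followed by a newline, which is why a cheap signature check can catch malformed messages before they reach the parser. The decoded message array itself would typically come from `json_decode($body, true, 512, JSON_THROW_ON_ERROR)`.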
Notes
- For scanned PDFs, OCR is handled on the Azure Document Intelligence side.
- The regex fallback acts as a supplement when DI/LLM returns incomplete data.
- The validator checks the basic consistency of amounts and dates.
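As an illustration of what a basic consistency check on amounts and dates might look like, here is a small sketch. The function name `validateInvoice`, the array keys, the `Y-m-d` date format, and the 0.01 tolerance are all assumptions chosen for this example; the package's validator may check different fields and rules.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical consistency check; the array keys are assumptions made for this sketch.
 *
 * @param array{totalWithoutVat: float, vatAmount: float, totalWithVat: float, issueDate: string, dueDate: string} $invoice
 * @return list<string> human-readable problems, empty when the invoice looks consistent
 */
function validateInvoice(array $invoice): array
{
    $errors = [];

    // Amounts: base + VAT should equal the total, within a small rounding tolerance.
    $expected = $invoice['totalWithoutVat'] + $invoice['vatAmount'];
    if (abs($expected - $invoice['totalWithVat']) > 0.01) {
        $errors[] = sprintf(
            'Total %.2f does not equal base + VAT (%.2f).',
            $invoice['totalWithVat'],
            $expected,
        );
    }

    // Dates: both must parse as Y-m-d, and the due date should not precede the issue date.
    $issue = \DateTimeImmutable::createFromFormat('!Y-m-d', $invoice['issueDate']);
    $due = \DateTimeImmutable::createFromFormat('!Y-m-d', $invoice['dueDate']);
    if ($issue === false || $due === false) {
        $errors[] = 'Issue date or due date is not a valid Y-m-d date.';
    } elseif ($due < $issue) {
        $errors[] = 'Due date is earlier than issue date.';
    }

    return $errors;
}
```

Returning a list of problems rather than throwing on the first one lets a caller log every inconsistency in a single pass, which tends to be useful when reviewing OCR output by hand.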