randomstate / camelot-php
PHP Wrapper library for interfacing with the Camelot PDF table extraction library built in Python
Requires
- spatie/temporary-directory: ^1.2
- symfony/process: ^4.0|^5.0
Requires (Dev)
- league/csv: ^9.4
- phpunit/phpunit: ^8.5
Last update: 2024-11-06 18:10:23 UTC
README
A PHP wrapper for Camelot, the Python PDF table extraction library
Installation
composer require randomstate/camelot-php
Usage
The package closely mirrors the usage of the Camelot CLI.
The default output is CSV, returned as a plain string. If you need to parse CSV strings, we recommend the league/csv
package (https://csv.thephpleague.com/).
<?php

use RandomState\Camelot\Camelot;
use League\Csv\Reader;

$tables = Camelot::lattice('/path/to/my/file.pdf')
    ->extract();

$csv = Reader::createFromString($tables[0]);
$allRecords = $csv->getRecords();
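If adding league/csv is not an option, the CSV strings can also be split with PHP's built-in fgetcsv. The following is a minimal sketch and not part of this package: csvStringToRows is a hypothetical helper, and the sample string stands in for one element of the array returned by extract().

```php
<?php

/**
 * Hypothetical helper (not part of camelot-php): parse one CSV string
 * into an array of rows using only built-in functions.
 */
function csvStringToRows(string $csv): array
{
    // Write the string to a temporary stream so fgetcsv can handle
    // quoted fields that contain commas or line breaks.
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// Sample data standing in for $tables[0] from extract().
$sample = "name,qty\n\"Widget, large\",3\n";
$rows = csvStringToRows($sample);
```

Each row is a plain list of cell strings, so the first row can be treated as a header if the table has one.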
Advanced Processing
Saving / Extracting
Note: No Camelot operations are run until one of these methods is called.
// Uses temporary files and automatically reads the table contents back for you.
$camelot->extract();

// Mirrors the behaviour of Camelot and saves files in the format /path/to/my-file-page-*-table-*.csv
$camelot->save('/path/to/my-file.csv');

// Useful for debugging: plots the result in a separate window (see Visual debugging below).
$camelot->plot();
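Because save() writes one file per table rather than the exact path you pass in, you may want to collect the resulting files afterwards. The sketch below assumes the my-file-page-*-table-*.csv naming scheme described above; savedTableFiles is a hypothetical helper and not part of this package.

```php
<?php

/**
 * Hypothetical helper (not part of camelot-php): list the files that a
 * save() call to $target is expected to have produced, assuming the
 * <name>-page-*-table-*.<ext> naming scheme.
 */
function savedTableFiles(string $target): array
{
    $info = pathinfo($target);
    $pattern = sprintf(
        '%s/%s-page-*-table-*.%s',
        $info['dirname'],
        $info['filename'],
        $info['extension'] ?? 'csv'
    );

    // glob() may return false on error; treat that as "no files".
    $files = glob($pattern) ?: [];

    // Natural sort so that page-10 comes after page-2.
    sort($files, SORT_NATURAL);

    return $files;
}

// Usage (assumes save() has already been run with the same path):
// $files = savedTableFiles('/path/to/my-file.csv');
```

Note that glob() treats characters such as * and [ in the directory name as wildcards, so this sketch is only suitable for ordinary paths.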
Set Format
$camelot->json();
$camelot->csv();
$camelot->html();
$camelot->excel();
$camelot->sqlite();
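When the format is set to JSON, the extracted strings need decoding before use. The sketch below rests on two assumptions that are not confirmed by this README: that extract() returns one JSON string per table, and that each string is an array of row objects keyed by column index, as in Camelot's own JSON export. jsonTableToRows is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper (not part of camelot-php): decode one JSON table
 * string into a list of rows, each row a list of cell values.
 * Assumes an "array of row objects" layout; adjust if yours differs.
 */
function jsonTableToRows(string $json): array
{
    // Throws JsonException on malformed input (PHP 7.3+).
    $records = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Drop the column keys so each row is a plain list of cells.
    return array_map('array_values', $records);
}

// Sample data in the assumed layout.
$sample = '[{"0":"name","1":"qty"},{"0":"Widget","1":"3"}]';
$rows = jsonTableToRows($sample);
```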
Specify Page Numbers
$camelot->pages('1,2,3-4,8-end')
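If the page specification comes from user input, it can help to check it before handing it to Camelot. The following sketch validates strings of the shape shown above (single pages, ranges, and ranges ending in end). It also accepts all, which the Camelot CLI documents; whether the wrapper passes that keyword through unchanged is an assumption. isValidPageSpec is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper (not part of camelot-php): check that a page
 * specification looks like "1,2,3-4,8-end" or "all".
 */
function isValidPageSpec(string $spec): bool
{
    if ($spec === 'all') {
        return true;
    }

    $page  = '[1-9]\d*';                        // single page, 1-based
    $range = $page . '-(?:' . $page . '|end)';  // e.g. 3-4 or 8-end
    $item  = '(?:' . $range . '|' . $page . ')';

    // The whole string must be a comma-separated list of items.
    return preg_match('/\A' . $item . '(?:,' . $item . ')*\z/', $spec) === 1;
}

$ok = isValidPageSpec('1,2,3-4,8-end');
```

The check is purely syntactic: it does not verify that the pages exist in the PDF or that a range runs in ascending order.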
Reading encrypted PDFs
$camelot->password('my-pass')
Processing background lines
$camelot->stream()->processBackgroundLines()
Visual debugging
$camelot->plot()
Specify table areas
<?php

use RandomState\Camelot\Camelot;
use RandomState\Camelot\Areas;

Camelot::stream('my-file.pdf')
    ->inAreas(
        Areas::from($xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight)
            // ->add($xTopLeft2, $yTopLeft2, $xBottomRight2, $yBottomRight2)
            // ->add($xTopLeft3, $yTopLeft3, $xBottomRight3, $yBottomRight3)
    );
Specify table regions
<?php

use RandomState\Camelot\Camelot;
use RandomState\Camelot\Areas;

Camelot::stream('my-file.pdf')
    ->inRegions(
        Areas::from($xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight)
            // ->add($xTopLeft2, $yTopLeft2, $xBottomRight2, $yBottomRight2)
            // ->add($xTopLeft3, $yTopLeft3, $xBottomRight3, $yBottomRight3)
    );
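Camelot expects area and region coordinates in PDF coordinate space, where the origin is the bottom-left corner of the page and y grows upwards. Rectangles measured in an image viewer usually have their origin at the top-left instead. The sketch below flips the y axis; it assumes the input is already in PDF points and that you know the page height. toPdfArea is a hypothetical helper and not part of this package.

```php
<?php

/**
 * Hypothetical helper (not part of camelot-php): convert a rectangle
 * measured from the top-left corner of the page into PDF coordinate
 * space (origin bottom-left, y grows upwards).
 *
 * Returns [xTopLeft, yTopLeft, xBottomRight, yBottomRight].
 */
function toPdfArea(float $left, float $top, float $right, float $bottom, float $pageHeight): array
{
    // x is unchanged; y is mirrored around the page height.
    return [$left, $pageHeight - $top, $right, $pageHeight - $bottom];
}

// A US Letter page is 612 x 792 points.
[$x1, $y1, $x2, $y2] = toPdfArea(50, 100, 400, 300, 792);
// $x1 = 50.0, $y1 = 692.0, $x2 = 400.0, $y2 = 492.0
```

The four values can then be passed in order to Areas::from($x1, $y1, $x2, $y2).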
Specify column separators
$camelot->stream()->setColumnSeparators($x1,$x2...)
Split text along separators
$camelot->split()
Flag superscripts and subscripts
$camelot->flagSize()
Strip characters from text
$camelot->strip("\n")
Improve guessed table areas
$camelot->setEdgeTolerance(500)
Improve guessed table rows
$camelot->setRowTolerance(15)
Detect short lines
$camelot->lineScale(20)
Shift text in spanning cells
$camelot->shiftText('r', 'b')
Copy text in spanning cells
$camelot->copyTextSpanningCells('r', 'b')
License
MIT. Use at your own risk; we accept no liability for how this code is used.