dgtlss / parqbridge
Export Laravel database tables to Parquet files using Storage disks (no external deps).
pkg:composer/dgtlss/parqbridge
Requires
- php: >=8.3
Requires (Dev)
- illuminate/config: ^11.0|^12.0
- illuminate/console: ^11.0|^12.0
- illuminate/container: ^11.0|^12.0
- illuminate/database: ^11.0|^12.0
- illuminate/filesystem: ^11.0|^12.0
- illuminate/support: ^11.0|^12.0
- league/flysystem: ^3.25.1
- phpoption/phpoption: ^1.9
- phpunit/phpunit: ^10.5
- vlucas/phpdotenv: ^5.6
README
Export your Laravel database tables to real Apache Parquet files on any Storage disk (local, S3, etc.) with a simple artisan command.
ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.
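The embedded script ships with the package; as a rough sketch (not the package's actual script, and with hypothetical file names), the PyArrow step it delegates to looks like this:

```python
# Minimal sketch of a CSV -> Parquet hand-off with PyArrow.
# File names are hypothetical; the real script is bundled with the package.
import pyarrow.csv as pacsv
import pyarrow.parquet as pq

table = pacsv.read_csv("chunk.csv")  # staged CSV produced on the Laravel side
pq.write_table(table, "chunk.parquet", compression="snappy")  # codec per config
```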
Installation
- Require the package in your app (path repo or VCS):
composer require dgtlss/parqbridge
- Laravel will auto-discover the service provider. Alternatively, register `ParqBridge\ParqBridgeServiceProvider` manually.
- Publish the config if you want to customize defaults:
php artisan vendor:publish --tag="parqbridge-config"
Configuration
Set your export disk and options in .env or config/parqbridge.php.
- `PARQUET_DISK`: which filesystem disk to use (e.g., `s3`, `local`).
- `PARQUET_OUTPUT_DIR`: directory prefix within the disk (default `parquet-exports`).
- `PARQUET_CHUNK_SIZE`: rows per DB chunk when exporting (default 1000).
- `PARQUET_INFERENCE`: `database` | `sample` | `hybrid` (default `hybrid`).
- `PARQUET_COMPRESSION`: compression codec for Parquet (`UNCOMPRESSED`/`NONE`, `SNAPPY`, `GZIP`, `ZSTD`, `BROTLI`, `LZ4_RAW`) when using the PyArrow backend.
- `PARQBRIDGE_WRITER`: `pyarrow` (default) or `custom`. If `custom`, set `PARQBRIDGE_CUSTOM_CMD`.
- `PARQBRIDGE_PYTHON`: Python executable for PyArrow (default `python3`).
Example .env:
PARQUET_DISK=s3
PARQUET_OUTPUT_DIR=parquet-exports
PARQUET_CHUNK_SIZE=2000
Ensure your filesystems disk is configured (e.g., s3) in config/filesystems.php.
FTP disk configuration
You can export directly to an FTP server using Laravel's ftp disk. Add an FTP disk to config/filesystems.php and reference it via PARQUET_DISK=ftp or --disk=ftp.
```php
'disks' => [
    'ftp' => [
        'driver' => 'ftp',
        'host' => env('FTP_HOST'),
        'username' => env('FTP_USERNAME'),
        'password' => env('FTP_PASSWORD'),

        // Optional FTP settings
        'port' => (int) env('FTP_PORT', 21),
        'root' => env('FTP_ROOT', ''),
        'passive' => filter_var(env('FTP_PASSIVE', true), FILTER_VALIDATE_BOOL),
        'ssl' => filter_var(env('FTP_SSL', false), FILTER_VALIDATE_BOOL),
        'timeout' => (int) env('FTP_TIMEOUT', 90),
    ],
],
```
Note: This package will coerce common FTP env values (e.g., port, timeout, passive, ssl) to the proper types before resolving the disk to avoid Flysystem type errors like "Argument #5 ($port) must be of type int, string given".
Usage
- List tables:
php artisan parqbridge:tables
- Export a table to the configured disk:
php artisan parqbridge:export users --where="active = 1" --limit=1000 --output="parquet-exports" --disk=s3
On success, the command prints the full path written within the disk. Files are named `{table}-{YYYYMMDD_HHMMSS}.parquet`. A quick way to verify an exported file is sketched after the options list below.
- Export ALL tables into one folder (timestamped subfolder inside `parqbridge.output_directory`):
php artisan parqbridge:export-all --disk=s3 --output="parquet-exports" --exclude=migrations,password_resets
Options:
- `--include=`: comma-separated allowlist of table names
- `--exclude=`: comma-separated denylist of table names
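To sanity-check an exported file (assuming the default PyArrow backend, so PyArrow is already installed), you can read it back once you have a local copy; the path here is hypothetical:

```python
import pyarrow.parquet as pq

# Hypothetical local copy of an export pulled from the Storage disk
table = pq.read_table("parquet-exports/users-20250101_120000.parquet")
print(table.schema)    # column names and Arrow types
print(table.num_rows)  # should reflect any --where / --limit you passed
```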
Data types
The schema inferrer maps common DB types to a set of Parquet primitive types and logical annotations. With the PyArrow backend, an Arrow schema is constructed to faithfully write types:
- Primitive: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `FIXED_LEN_BYTE_ARRAY`
- Logical: `UTF8`, `DATE`, `TIME_MILLIS`, `TIME_MICROS`, `TIMESTAMP_MILLIS`, `TIMESTAMP_MICROS`, `DECIMAL`
For decimals, we write Arrow decimal types (`decimal128`/`decimal256`) with the declared precision and scale.
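As an illustration of what those declared types look like on the Arrow side, here is a small sketch with hypothetical columns (not the package's internal schema builder):

```python
import pyarrow as pa

# Hypothetical columns showing the Arrow types behind the annotations above
schema = pa.schema([
    ("price", pa.decimal128(10, 2)),     # DECIMAL with precision=10, scale=2
    ("created_at", pa.timestamp("us")),  # TIMESTAMP_MICROS
    ("name", pa.string()),               # UTF8 over BYTE_ARRAY
])
print(schema)
```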
Testing
Run the test suite:
composer install
vendor/bin/phpunit
The tests bootstrap a minimal container, create a SQLite database, and verify:
- listing tables works on SQLite
- exporting a table writes a Parquet file to the configured disk (magic `PAR1`)
- schema inference on SQLite maps major type families
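The magic-bytes assertion is easy to reproduce outside the test suite: a spec-compliant Parquet file begins and ends with the 4-byte magic `PAR1`. A standalone version of the check (file path hypothetical):

```python
# A Parquet file starts and ends with the magic bytes "PAR1"
with open("users.parquet", "rb") as f:  # hypothetical path
    assert f.read(4) == b"PAR1"
    f.seek(-4, 2)                       # seek 4 bytes back from the end
    assert f.read(4) == b"PAR1"
```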
Backend requirements
- By default ParqBridge uses Python + PyArrow. Ensure `python3` is available and install PyArrow:
python3 -m pip install --upgrade pip
python3 -m pip install pyarrow
- Alternatively, set a custom converter command via `PARQBRIDGE_WRITER=custom` and `PARQBRIDGE_CUSTOM_CMD` (the command must read the `{input}` CSV and write the `{output}` Parquet file; see the sketch below).
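As an illustration, a custom converter can itself be a tiny PyArrow script. This is a hypothetical sketch that satisfies the `{input}`/`{output}` contract, not something shipped with the package:

```python
#!/usr/bin/env python3
# Hypothetical custom converter, wired up via e.g.
#   PARQBRIDGE_CUSTOM_CMD="python3 convert.py {input} {output}"
import sys
import pyarrow.csv as pacsv
import pyarrow.parquet as pq

input_csv, output_parquet = sys.argv[1], sys.argv[2]
pq.write_table(pacsv.read_csv(input_csv), output_parquet)
```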
You can automate setup via the included command:
php artisan parqbridge:setup --write-env
Options:
- `--python=`: path/name of Python (default from config `parqbridge.pyarrow_python`)
- `--venv=`: location for the virtualenv (default `./parqbridge-venv`)
- `--no-venv`: install into the global Python instead of a venv
- `--write-env`: append `PARQBRIDGE_PYTHON` and `PARQBRIDGE_WRITER` to `.env`
- `--upgrade`: upgrade pip first
- `--dry-run`: print commands without executing