survos / jsonl-bundle
Fast, concurrent, resumable, strictly-ordered JSONL ingestion utilities for Symfony 7.3 / PHP 8.4
Type:symfony-bundle
pkg:composer/survos/jsonl-bundle
Requires
- php: ^8.4
- ext-zip: *
- ext-zlib: *
- psr/log: ^3.0
- symfony/console: ^7.3||^8.0
- symfony/dependency-injection: ^7.3||^8.0
- symfony/framework-bundle: ^7.3||^8.0
- symfony/http-client: ^7.3||^8.0
- symfony/lock: ^8.0
Requires (Dev)
- halaxa/json-machine: ^1.2
- league/csv: ^9.27
- phpunit/phpunit: ^12
- roave/security-advisories: dev-latest
- symfony/messenger: ^7.4||^8.0
Suggests
- halaxa/json-machine: for importing large json
This package is auto-updated.
Last update: 2026-01-05 18:04:08 UTC
Streaming JSONL read/write utilities for Symfony, with first-class support for resumable writes, sidecar progress tracking, transparent gzip compression, and CLI-level inspection and querying.
JsonlBundle is intentionally small at the API surface, but feature-rich under the hood. Many of its most powerful capabilities are currently “hidden” because they require no extra code once you adopt the bundle’s reader/writer abstractions.
This README makes those capabilities explicit, with concrete, copy-pasteable examples.
Why JSONL?
JSON Lines (.jsonl) is the ideal format for:
- Large datasets
- Streaming ingestion pipelines
- Fault-tolerant ETL
- CLI-driven workflows
- Append-only logs
- Partial resume after failure
JsonlBundle embraces these properties rather than fighting them.
Installation
composer require survos/jsonl-bundle
Symfony Flex will register the bundle automatically.
Core Concepts
Files
- One JSON object per line
- Append-only
- Safe for streaming
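As a rough illustration of the one-object-per-line rule, the sketch below encodes and decodes a single record in plain PHP. The helper names `encodeJsonlLine` and `decodeJsonlLine` are hypothetical and not part of JsonlBundle; JsonlWriter and JsonlReader take care of this for you.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: encode one record as a single JSONL line. */
function encodeJsonlLine(array $row): string
{
    // json_encode escapes embedded newlines, so a record never spans two lines.
    // JSON_THROW_ON_ERROR raises an exception instead of returning false.
    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES;

    return json_encode($row, $flags) . "\n";
}

/** Hypothetical helper: decode one JSONL line back into an associative array. */
function decodeJsonlLine(string $line): array
{
    return json_decode(rtrim($line, "\r\n"), true, 512, JSON_THROW_ON_ERROR);
}
```

Because every line is a complete JSON document, a consumer can stop or restart at any line boundary.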
Sidecar
A JSONL file written with JsonlWriter can be accompanied by a sidecar file that tracks:
- Total records written
- Byte offset
- Completion status
- Timestamps
- Resume metadata
Sidecars enable resume, progress inspection, and CLI tooling.
Compression
If the filename ends in .gz, compression is automatic.
No flags. No config. Just naming.
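One plausible way to get filename-driven compression in plain PHP is the built-in `compress.zlib://` stream wrapper (requires ext-zlib). This is only a sketch of the idea: `jsonlStreamUri` is a hypothetical helper, and the bundle's actual mechanism may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a filename to a stream URI.
 * Names ending in .gz are routed through PHP's zlib stream wrapper,
 * which compresses on write and decompresses on read.
 */
function jsonlStreamUri(string $filename): string
{
    return str_ends_with($filename, '.gz')
        ? 'compress.zlib://' . $filename
        : $filename;
}
```

Calling code can then pass the result to `fopen()`, `file_put_contents()`, or `file_get_contents()` without caring whether the file is compressed.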
Writing JSONL
Basic Write
use Survos\JsonlBundle\IO\JsonlWriter;

$writer = JsonlWriter::open('data/products.jsonl');
$writer->write([
    'id' => 1,
    'name' => 'Widget',
    'price' => 9.99,
]);
$writer->close();
This immediately gives you:
- File locking
- Atomic writes
- Sidecar tracking
Appending Multiple Records
$writer = JsonlWriter::open('data/products.jsonl');
foreach ($products as $product) {
    $writer->write($product);
}
$writer->close();
No buffering required. Memory-safe for large datasets.
Resume a Partial Write
If the process crashes midway, simply reopen the same file:
$writer = JsonlWriter::open('data/products.jsonl');
foreach ($remainingProducts as $product) {
    $writer->write($product);
}
$writer->close();
The writer will:
- Detect the sidecar
- Resume safely
- Avoid corrupt output
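To make "avoid corrupt output" concrete: a crash can leave a half-written last line. The sketch below shows one generic way to repair an uncompressed file before appending, by truncating back to the last newline. It is an assumption about how such a repair could work, not a description of what JsonlWriter does internally, and `dropPartialLastLine` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: remove a trailing partial line from an uncompressed
 * JSONL file. Returns the number of bytes removed (0 if the file was clean).
 */
function dropPartialLastLine(string $filename): int
{
    $handle = fopen($filename, 'r+b');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }

    $size = fstat($handle)['size'];
    $pos = $size;

    // Walk backwards until the byte before $pos is a newline (or the file start).
    while ($pos > 0) {
        fseek($handle, $pos - 1);
        if (fgetc($handle) === "\n") {
            break;
        }
        $pos--;
    }

    ftruncate($handle, $pos);
    fclose($handle);

    return $size - $pos;
}
```

A byte-by-byte scan is fine for a sketch; a production version would read backwards in larger chunks.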
Writing with Gzip Compression
Just use .gz:
$writer = JsonlWriter::open('data/products.jsonl.gz');
$writer->write([
    'id' => 1,
    'name' => 'Compressed Widget',
]);
$writer->close();
Features retained:
- Streaming
- Resume
- Sidecar
- Locking
Reading JSONL
Basic Read
use Survos\JsonlBundle\IO\JsonlReader;

$reader = new JsonlReader('data/products.jsonl');
foreach ($reader as $row) {
    echo $row['name'] . PHP_EOL;
}
The reader:
- Streams line-by-line
- Uses constant memory
- Works for .jsonl and .jsonl.gz
Read with Offset (Resume Processing)
$reader = new JsonlReader(
    filename: 'data/products.jsonl',
    offset: 5000
);
foreach ($reader as $row) {
    // Continue processing after record 5000
}
Ideal for batch pipelines and workers.
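For readers curious what record-based offsets amount to, here is a plain-PHP stand-in that streams a file and skips the first N records. `readJsonlFrom` is a hypothetical function written for this illustration; it assumes the offset counts records rather than bytes, as the comment in the example above suggests.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for an offset-aware reader: yields decoded rows,
 * skipping the first $offset records. Keys are zero-based record indexes.
 */
function readJsonlFrom(string $filename, int $offset = 0): Generator
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }

    try {
        $index = 0;
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // tolerate blank lines
            }
            if ($index++ < $offset) {
                continue; // already handled in an earlier run
            }
            yield $index - 1 => json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle);
    }
}
```

A worker can persist the last key it processed and pass that value plus one as the offset on the next run.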
Sidecar Files (The Hidden Superpower)
For a file:
data/products.jsonl
JsonlBundle maintains:
data/products.jsonl.sidecar.json
What’s Inside the Sidecar?
Typical contents:
{
"rows": 12450,
"bytes": 9876543,
"completed": false,
"started_at": "2026-01-05T10:21:11Z",
"updated_at": "2026-01-05T10:25:02Z"
}
You do not manage this manually.
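Since the sidecar is plain JSON, it can also be read without the bundle, for example from a monitoring script. The sketch below assumes the `.sidecar.json` naming and the field names from the sample shown earlier in this section; `loadSidecar` and `describeProgress` are hypothetical helpers.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: load sidecar metadata, or null when no sidecar exists. */
function loadSidecar(string $jsonlFile): ?array
{
    $path = $jsonlFile . '.sidecar.json'; // naming assumed from the README example
    if (!is_file($path)) {
        return null;
    }

    return json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);
}

/** Hypothetical helper: summarize progress in one line, e.g. for logging. */
function describeProgress(array $sidecar): string
{
    return sprintf(
        '%d rows, %d bytes, %s',
        $sidecar['rows'] ?? 0,
        $sidecar['bytes'] ?? 0,
        ($sidecar['completed'] ?? false) ? 'completed' : 'in progress'
    );
}
```

Treat the sidecar as read-only in scripts like this; the writer owns its contents.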
Query Progress Programmatically
use Survos\JsonlBundle\Model\JsonlSidecar;

$sidecar = JsonlSidecar::fromFilename('data/products.jsonl');
echo $sidecar->rows; // 12450
var_dump($sidecar->completed); // bool(false)
Mark Completion Explicitly
$writer = JsonlWriter::open('data/products.jsonl');
// write rows...
$writer->markCompleted();
$writer->close();
This is especially useful in multi-stage pipelines.
CLI Utilities
JsonlBundle ships with CLI tooling to inspect files without writing code.
Inspect a File
php bin/console jsonl:info data/products.jsonl
Example output:
File: data/products.jsonl
Rows: 12450
Bytes: 9.4 MB
Compressed: no
Completed: false
Inspect a Gzipped File
php bin/console jsonl:info data/products.jsonl.gz
Works exactly the same.
Peek at Records
php bin/console jsonl:head data/products.jsonl --limit=5
Resume-Aware Counting
Unlike wc -l, JsonlBundle uses the sidecar:
php bin/console jsonl:count data/products.jsonl
Fast and accurate even for huge files.
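The difference from `wc -l` can be sketched in plain PHP: read the row count from the sidecar when it exists, and only scan the file as a fallback. This is an illustration under assumptions (the `rows` field and `.sidecar.json` suffix come from the sidecar example in this README), not the command's actual implementation; `countJsonlRows` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: prefer the sidecar's row count, fall back to a scan. */
function countJsonlRows(string $jsonlFile): int
{
    $sidecarPath = $jsonlFile . '.sidecar.json';
    if (is_file($sidecarPath)) {
        $meta = json_decode(file_get_contents($sidecarPath), true);
        if (is_array($meta) && isset($meta['rows'])) {
            return (int) $meta['rows']; // no need to touch the data file
        }
    }

    // Slow path, similar in spirit to `wc -l`: read every line.
    $handle = fopen($jsonlFile, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $jsonlFile");
    }
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        if (trim($line) !== '') {
            $count++; // ignore blank lines
        }
    }
    fclose($handle);

    return $count;
}
```

The fast path costs one small file read regardless of how large the dataset is.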
Patterns Enabled by JsonlBundle
ETL Pipelines
download → normalize → enrich → translate → index
Each step emits JSONL, resumes safely, and records progress.
Doctrine Import Pipelines
- Stream JSONL
- Batch insert entities
- Resume on failure
- Track progress via sidecar
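The batching step in a pipeline like this can be a small generator that groups streamed rows, so memory stays bounded by the batch size. `inBatches` is a hypothetical helper, not something the bundle is known to provide.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: group any iterable of rows into arrays of at most $size rows. */
function inBatches(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    if ($batch !== []) {
        yield $batch; // final, possibly smaller batch
    }
}
```

In a Doctrine import you would typically persist the rows of each batch, then call `flush()` and `clear()` on the entity manager before moving to the next batch.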
Translation Caches
- source.jsonl, target.jsonl
- De-duplication
- Resume after API failure
Search Index Feeds
- Emit JSONL once
- Replay into Meilisearch / OpenSearch
- Deterministic re-runs
When Not to Use JsonlBundle
- Small datasets fully loaded into memory
- Interactive CRUD forms
- Request/response APIs
JsonlBundle shines in pipelines, CLI tools, and long-running jobs.
Design Philosophy
- Zero configuration
- Filename-driven behavior
- Streaming first
- Fail-safe defaults
- Sidecar over database state
- CLI-friendly
Summary
JsonlBundle is more than a reader/writer:
- ✔ Resumable writes
- ✔ Sidecar progress tracking
- ✔ Transparent gzip compression
- ✔ CLI inspection & querying
- ✔ Streaming everywhere
Most of this works automatically once you adopt JsonlWriter and JsonlReader.
Example: Caching a Remote API as JSONL (DummyJSON Products)
This example demonstrates how JsonlBundle can be used as a simple, durable cache for a remote HTTP API, without introducing a separate caching abstraction.
Goal:
- Command: bin/console products:list
- If products.jsonl does not exist:
  - Fetch https://dummyjson.com/products
  - Write each product as a JSONL record
- Then:
  - Read products.jsonl
  - Display the first product
This turns a one-shot HTTP call into a replayable, CLI-friendly dataset.
The Command (Minimal, Accurate API Usage)
<?php

declare(strict_types=1);

namespace App\Command;

use Survos\JsonlBundle\IO\JsonlReader;
use Survos\JsonlBundle\IO\JsonlWriter;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Style\SymfonyStyle;
use Symfony\Component\HttpClient\HttpClient;

#[AsCommand('products:list', 'List products using a JSONL cache')]
final class ProductsListCommand
{
    public function __invoke(SymfonyStyle $io): int
    {
        $file = 'var/products.jsonl';

        if (!file_exists($file)) {
            $client = HttpClient::create();
            $data = $client->request('GET', 'https://dummyjson.com/products')->toArray();

            $writer = JsonlWriter::open($file);
            foreach ($data['products'] as $product) {
                $writer->write($product);
            }
            $writer->close();
        }

        $reader = JsonlReader::open($file);
        foreach ($reader as $product) {
            $io->title($product['title']);
            $io->listing([
                'Price: ' . $product['price'],
                'Category: ' . $product['category'],
            ]);
            break; // first product only
        }

        return Command::SUCCESS;
    }
}
What This Example Actually Uses
This command relies only on existing, stable APIs:
JsonlWriter
- JsonlWriter::open($filename)
- ->write(array $row)
- ->close()
JsonlReader
- JsonlReader::open($filename)
- foreach ($reader as $row)
No hidden methods. No imaginary helpers.
What You Still Get (Implicitly)
Even with this minimal API surface, you still benefit from:
- ✔ Streaming writes (no memory pressure)
- ✔ File locking
- ✔ Safe append semantics
- ✔ Sidecar tracking (rows / bytes / timestamps)
- ✔ Resume-safe re-execution
- ✔ Transparent .gz support via filename
None of this appears in the command — which is the point.
Optional Variations
Enable Compression (Zero Code Changes)
$file = 'var/products.jsonl.gz';
Inspect the Cache from the CLI
bin/console jsonl:info var/products.jsonl
Re-run Offline
Once written, the command works without network access.
Why This Pattern Matters
This example shows the intended JsonlBundle workflow:
- Treat JSONL as a first-class artifact
- Use filenames, not flags, to control behavior
- Let sidecars track progress instead of databases
- Make pipelines restartable by default
You are not “caching HTTP”. You are materializing data.
That distinction is why JsonlBundle scales cleanly from demos to production pipelines.
Inspecting JSONL Files (jsonl:info)
In addition to counting rows, JsonlBundle provides a command to inspect progress and status metadata stored in sidecar files.
Show info for a single file
bin/console jsonl:info data/products.jsonl
Example output:
JSONL info
──────────
File: data/products.jsonl
Rows: 12450
Sidecar: data/products.jsonl.sidecar.json
Sidecar exists: yes
Completed: no
Started: 2026-01-05T14:03:21+00:00
Updated: 2026-01-05T14:22:09+00:00
Bytes (sidecar): 98234112
This is the fastest way to answer questions like:
- “How many records are written so far?”
- “Did this pipeline finish?”
- “When was this dataset last updated?”
Inspect a directory of JSONL files
bin/console jsonl:info var/data
Recurse into subdirectories
bin/console jsonl:info var/data -r
Directory mode renders a table with one row per file, including completion status and timestamps:
| Rows | Complete | Updated | Started | File |
|---|---|---|---|---|
| 170072 | no | 2026-01-05T14:22:09 | 2026-01-05T14:03:21 | place.jsonl |
| 3340 | yes | 2026-01-05T13:01:44 | 2026-01-05T12:58:10 | concept.jsonl |
| … | … | … | … | … |
Why jsonl:info matters
- Uses sidecar metadata, not slow file scans
- Works for .jsonl and .jsonl.gz
- Distinguishes partial vs completed datasets
- Ideal for long-running or resumable pipelines
For deeper discussion of sidecars, resume logic, and CLI workflows, see the advanced usage guide.
Advanced Usage
This bundle is intentionally small at the surface but powerful in real-world pipelines.
For production patterns—including resume semantics, sidecar files, row counting, CLI workflows, and anti-patterns to avoid—see the advanced documentation:
👉 More examples 👉 Advanced Usage Guide
That document is where long-running jobs, API dumps, and restartable ingestion pipelines are covered in detail.