survos / ai-dataset-bundle
Dataset-scale AI batch processing for Survos datasets using canonical dataset paths and JSONL artifacts.
Package info
github.com/survos/ai-dataset-bundle
Type: symfony-bundle
pkg:composer/survos/ai-dataset-bundle
Requires
- php: ^8.4
- survos/ai-batch-bundle: ^2.5
- survos/claims-bundle: ^2.5
- survos/data-contracts: ^2.5
- survos/dataset-bundle: ^2.5
- survos/field-bundle: ^2.5
- survos/jsonl-bundle: ^2.5
- survos/kit-bundle: ^2.5
- symfony/console: ^8.0
- symfony/dependency-injection: ^8.0
- symfony/filesystem: ^8.0
- symfony/framework-bundle: ^8.0
- twig/twig: ^3.0
Requires (Dev)
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^13.0
Suggests
- yethee/tiktoken: Improves token and input cost estimates before submitting dataset AI batches.
This package is auto-updated.
Last update: 2026-05-29 12:51:38 UTC
README
Dataset-scale AI batch processing for Survos/Museado datasets.
This bundle is intentionally separate from survos/ai-workflow-bundle.
ai-workflow-bundle operates on individual workflow subjects. This bundle
operates on dataset JSONL stages, writes durable batch artifacts, and uses the
canonical workspace paths from survos/dataset-bundle.
Responsibilities
- Read normalized rows from 20_normalize/{core}.jsonl.
- Write provider-ready batch input JSONL and manifests to 40_ai/.
- Submit/check/download OpenAI batch jobs through survos/ai-batch-bundle.
- Convert downloaded batch responses into portable claim JSONL files.
- Leave later enrichment/import stages to consume those claim files.
Commands
Commands are exposed as methods on Survos\AiDatasetBundle\Service\DatasetAiService.
php bin/console ai:dataset:estimate mus/aust --core=obj
php bin/console ai:dataset:prepare mus/aust --core=obj --force
php bin/console ai:dataset:submit mus/aust --core=obj --force
php bin/console ai:dataset:status mus/aust
php bin/console ai:dataset:download mus/aust --core=obj --force
ai:dataset:submit is the paid provider call. ai:dataset:estimate and ai:dataset:prepare run locally.
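Because only the submit step costs money, it can help to wrap the sequence in a small script that stops before the paid call unless it is explicitly confirmed. The following is a minimal sketch, not part of the bundle: the function name `run_dataset_ai` and the `CONSOLE` and `CONFIRM_SUBMIT` variables are hypothetical names introduced for this example, and only the console commands themselves come from the list above.

```shell
# Sketch only: CONSOLE and CONFIRM_SUBMIT are hypothetical variables for this
# example; the bundle itself does not read them.
# CONSOLE can be overridden (e.g. CONSOLE=echo) to preview the commands.
CONSOLE="${CONSOLE:-php bin/console}"

run_dataset_ai() {
    local dataset="$1" core="$2"

    # Local steps: no provider call is made here.
    # $CONSOLE is left unquoted on purpose so "php bin/console" splits into words.
    $CONSOLE ai:dataset:estimate "$dataset" --core="$core" || return 1
    $CONSOLE ai:dataset:prepare "$dataset" --core="$core" --force || return 1

    # Paid step: skipped unless the caller opts in.
    if [ "${CONFIRM_SUBMIT:-no}" != "yes" ]; then
        echo "Skipping ai:dataset:submit (set CONFIRM_SUBMIT=yes to run the paid call)"
        return 0
    fi

    $CONSOLE ai:dataset:submit "$dataset" --core="$core" --force || return 1
    $CONSOLE ai:dataset:status "$dataset"
}
```

Calling `run_dataset_ai mus/aust obj` would run the estimate and prepare steps only. With `CONFIRM_SUBMIT=yes` it would also submit and then check status once. The download step is left out of the sketch on the assumption that the provider batch needs time to finish, so ai:dataset:download would be run separately later.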
Files
For dataset mus/aust and core obj, the bundle uses:
| Path | Purpose |
|---|---|
| 20_normalize/obj.jsonl | Normalized source records |
| 40_ai/obj.dense_summary.batch.input.jsonl | OpenAI batch input |
| 40_ai/obj.dense_summary.batch.json | Local batch manifest |
| 40_ai/obj.dense_summary.batch.output.jsonl | Raw OpenAI batch output |
| 40_ai/obj.jsonl | Portable claim rows for enrichment |
All paths are resolved with Survos\DataBundle\Service\DataPaths.
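To see how far a dataset has progressed, the paths listed above can be checked from the shell. This is a rough sketch under stated assumptions: the function names are hypothetical, the dataset directory argument stands in for wherever DataPaths places the dataset in your workspace, and `dense_summary` is treated as a default task name only because it appears in the file names above.

```shell
# Sketch only: these helpers are hypothetical and do not call the bundle.
# They rebuild the relative file names shown above for a given core.
dataset_ai_files() {
    local core="$1" task="${2:-dense_summary}"
    printf '%s\n' \
        "20_normalize/${core}.jsonl" \
        "40_ai/${core}.${task}.batch.input.jsonl" \
        "40_ai/${core}.${task}.batch.json" \
        "40_ai/${core}.${task}.batch.output.jsonl" \
        "40_ai/${core}.jsonl"
}

# Report which of those files exist under a dataset directory.
# The directory is supplied by the caller; its real location is an assumption.
report_dataset_ai_files() {
    local dataset_dir="$1" core="$2" path
    dataset_ai_files "$core" | while IFS= read -r path; do
        if [ -f "${dataset_dir}/${path}" ]; then
            echo "present  ${path}"
        else
            echo "missing  ${path}"
        fi
    done
}
```

For example, `report_dataset_ai_files /path/to/mus/aust obj` (with a placeholder directory) would print one line per expected file, which makes it easy to tell whether prepare, submit, or download still has to run.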
Install
composer require survos/ai-dataset-bundle
Register the bundle:
Survos\AiDatasetBundle\SurvosAiDatasetBundle::class => ['all' => true],
Required runtime bundles:
- survos/dataset-bundle
- survos/jsonl-bundle
- survos/ai-batch-bundle
- survos/claims-bundle
Optional:
- yethee/tiktoken for better token estimates.