survos / data-bundle
NOT survos/dataset-bundle. Namespace: Survos\DataBundle (no 'set'). Database and tooling layer for survos/data-contracts: keyword→ContentType classification cache (VocabMap) and ContentType display-label translations (VocabLabel).
Requires
- php: ^8.4
- doctrine/dbal: ^3.9||^4.2
- doctrine/doctrine-bundle: ^3.2
- doctrine/orm: ^3.6
- survos/data-contracts: ^2.5
- survos/dataset-bundle: ^2.5
- survos/field-bundle: ^2.5
- survos/import-bundle: ^2.5
- survos/kit-bundle: ^2.5
- symfony/console: ^8.0
- symfony/framework-bundle: ^8.0
This package is auto-updated.
Last update: 2026-05-29 11:06:00 UTC
README
⚠️ NOT survos/dataset-bundle (note the extra "set"). Different package, different namespace.

| Package | Namespace | Bundle class | Purpose |
|---|---|---|---|
| survos/data-bundle (this) | Survos\DataBundle | SurvosDataBundle | Vocab/ContentType DB tooling |
| survos/dataset-bundle | Survos\DatasetBundle | SurvosDatasetBundle | Dataset filesystem conventions (APP_DATA_DIR) |

Both bundles can be installed side by side. Don't merge their namespaces in composer.json autoload: doing so silently masks classes from the other bundle.
Database and tooling layer for survos/data-contracts.
Bridges the contract layer (ContentType constants, item DTOs) to the file-based import pipeline with two lightweight entities, a pipeline listener, and a resolver service:
What's here
VocabMap — keyword → ContentType classification cache
One row per (lang, normKeyword). Populated by the AI classifier (via
ai-workflow-bundle); consumed by VocabResolver during the enrich stage.
Null contentType = evaluated and not classifiable — stored so the model
is never asked twice for the same term.
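
The "null means already evaluated" rule above can be sketched as follows. This is a minimal in-memory stand-in, not the bundle's Doctrine entity: the class name `VocabMapSketch`, its methods, and the `lang|normKeyword` key format are all hypothetical, and `photograph` is used only as an example slug.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for the VocabMap table (not the real entity).
final class VocabMapSketch
{
    /** @var array<string, ?string> "lang|normKeyword" => ContentType slug, or null */
    private array $rows = [];

    public function store(string $lang, string $normKeyword, ?string $contentType): void
    {
        $this->rows[$lang . '|' . $normKeyword] = $contentType;
    }

    /** True once a term has been evaluated, even if the verdict was null. */
    public function has(string $lang, string $normKeyword): bool
    {
        // array_key_exists() rather than isset(): isset() is false for null values,
        // which would make "not classifiable" look like "never asked".
        return array_key_exists($lang . '|' . $normKeyword, $this->rows);
    }

    public function get(string $lang, string $normKeyword): ?string
    {
        return $this->rows[$lang . '|' . $normKeyword] ?? null;
    }
}

$map = new VocabMapSketch();
$map->store('fr', 'photographie', 'photograph');
$map->store('fr', 'divers', null); // evaluated, not classifiable

var_dump($map->has('fr', 'divers'));  // bool(true): skip the classifier
var_dump($map->has('fr', 'affiche')); // bool(false): a genuine miss
```

A caller would send a term to the classifier only when `has()` returns false, which is what keeps a rejected term from being submitted again.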
VocabLabel — ContentType → display label per language
One row per (contentType, lang). Generated once per language via a single
Claude call covering all ~22 ContentType slugs. Used to render facet sidebar
labels in the source language ("Photographie" not "photograph").
VocabTermExtractorListener
Fires on ImportConvertFinishedEvent (after normalize). Scans the output
JSONL, extracts unique (lang, term) pairs from genre/subject fields, and
writes 30_terms/vocab.jsonl — the per-dataset term inventory used to diff
against the shared vocab/{lang}/dto_map.jsonl.
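
As a rough illustration of the extraction step, the sketch below collects unique (lang, term) pairs from JSONL lines. It is not the listener's actual code: the function name `extractTerms`, the record keys `lang`, `genre`, and `subject`, and the `und` fallback for a missing language are assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collect unique (lang, term) pairs from JSONL lines.
 *
 * @param iterable<string> $jsonlLines one JSON document per line
 * @return list<array{lang: string, term: string}>
 */
function extractTerms(iterable $jsonlLines): array
{
    $seen = [];
    foreach ($jsonlLines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // tolerate blank lines
        }
        $record = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        if (!is_array($record)) {
            continue; // skip scalar JSON values
        }
        $lang = (string) ($record['lang'] ?? 'und'); // assumed field name
        foreach (['genre', 'subject'] as $field) {
            // Accept either a single string or a list of strings.
            foreach ((array) ($record[$field] ?? []) as $term) {
                if (!is_string($term) || trim($term) === '') {
                    continue;
                }
                $term = trim($term);
                // Keying by "lang|term" deduplicates across records.
                $seen[$lang . '|' . $term] = ['lang' => $lang, 'term' => $term];
            }
        }
    }
    return array_values($seen);
}

$lines = [
    '{"lang":"fr","genre":"Photographie","subject":["Paris","Photographie"]}',
    '{"lang":"fr","subject":"Paris"}',
];
$terms = extractTerms($lines);

// One JSON object per line, the same shape a JSONL term inventory would use.
foreach ($terms as $pair) {
    echo json_encode($pair, JSON_UNESCAPED_UNICODE), "\n";
}
```

Deduplicating before writing keeps the inventory small, so the later diff against the shared map only has to consider each term once per language.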
VocabResolver
Service that wraps ContentType::fromRecord() with a VocabMap DB lookup for
foreign-language keywords. Used in the enrich stage to set content_type on
each record.
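
One way to picture the wrapping is a two-step lookup: try the built-in classification first, then fall back to the vocab map. The sketch below assumes that order, which the README does not state, and uses a callable as a stand-in for `ContentType::fromRecord()` because its real signature is not shown here. The function name `resolveContentType` and the `genre`/`lang` keys are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical resolver sketch.
 *
 * @param array<string, mixed>   $record
 * @param callable(array): ?string $builtIn  stand-in for ContentType::fromRecord()
 * @param array<string, ?string> $vocabMap "lang|keyword" => slug, or null
 */
function resolveContentType(array $record, callable $builtIn, array $vocabMap): ?string
{
    // Step 1 (assumed order): the built-in classification wins when it matches.
    $type = $builtIn($record);
    if ($type !== null) {
        return $type;
    }

    // Step 2: fall back to the keyword map for foreign-language terms.
    $lang = (string) ($record['lang'] ?? 'und');
    foreach ((array) ($record['genre'] ?? []) as $keyword) {
        if (!is_string($keyword)) {
            continue;
        }
        $hit = $vocabMap[$lang . '|' . $keyword] ?? null;
        if ($hit !== null) {
            return $hit;
        }
    }

    return null; // nothing matched; content_type stays unset
}

// Toy stand-in that only recognises the English keyword.
$builtIn = static fn (array $r): ?string =>
    in_array('photograph', (array) ($r['genre'] ?? []), true) ? 'photograph' : null;

$vocabMap = ['fr|Photographie' => 'photograph', 'fr|Divers' => null];

$resolved = resolveContentType(['lang' => 'fr', 'genre' => ['Photographie']], $builtIn, $vocabMap);
$unresolved = resolveContentType(['lang' => 'fr', 'genre' => ['Divers']], $builtIn, $vocabMap);
```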
Pipeline flow
    import:convert --stage=normalize
      → 20_normalize/obj.jsonl
      → VocabTermExtractorListener → 30_terms/vocab.jsonl

    vocab:map (diff + AI call for misses)
      → vocab/{lang}/dto_map.jsonl (shared, language-level)

    import:convert --stage=enrich
      → VocabResolver loads dto_map into memory
      → 60_enrich/obj.jsonl (with content_type set)
File locations (via DataPaths in dataset-bundle)
| Path | Purpose |
|---|---|
| $APP_DATA_DIR/vocab/{lang}/dto_map.jsonl | Shared keyword→ContentType map |
| $APP_DATA_DIR/vocab/{lang}/labels.jsonl | Shared ContentType display labels |
| $APP_DATA_DIR/translation/{lang}/ | Stub for future translation memory |
| {dataset}/30_terms/vocab.jsonl | Per-dataset extracted term inventory |
Install
composer require survos/data-bundle
Register in config/bundles.php:
Survos\DataBundle\SurvosDataBundle::class => ['all' => true],