tacman / nitf-parser
Parse NITF (News Industry Text Format) XML documents into a flat structure for search indexing
1.0.1
2026-04-06 16:59 UTC
Requires
- php: ^8.4
Requires (Dev)
- phpunit/phpunit: ^13.0
README
A PHP library for parsing NITF (News Industry Text Format) XML documents into a flat, searchable structure optimized for Meilisearch and similar full-text search engines.
Installation
    composer require tacman/nitf-parser
Requirements
- PHP 8.4+
Quick Start

    use Tacman\NTF\NTF;

    // Parse from file
    $ntf = NTF::fromFile('article.xml');

    // Or from XML string
    $ntf = NTF::fromXml($xmlString);

    // Or from a zip archive containing multiple NITF files
    foreach (NTF::fromZip('articles.zip') as $ntf) {
        echo $ntf->headline;
    }

    // Get a flat array ready for indexing
    $searchable = $ntf->toSearchable();

Available Fields
The NTF class provides these public properties:
| Property | Type | Description |
|---|---|---|
| `$id` | `string` | Document ID (from `doc-id/@id-string`) |
| `$headline` | `string` | Main headline (from `hl1`) |
| `$subhead` | `string` | Sub-headline (from `hl2`) |
| `$byline` | `string` | Author byline |
| `$summary` | `string` | Article summary/abstract |
| `$body` | `string` | Full body text (all `<p>` elements joined) |
| `$keywords` | `string[]` | Keywords from `key-list` |
| `$categories` | `array` | Classifications as `['type' => '...', 'value' => '...']` |
| `$images` | `array` | Media references with `source`, `name`, `mimeType` |
| `$publishedAt` | `?DateTime` | Publication date |
| `$modifiedAt` | `?DateTime` | Last modification date |
| `$section` | `?string` | Publication section |
| `$type` | `?string` | Publication type |
Meilisearch Integration
The toSearchable() method returns a flat array ready for direct indexing:

    $ntf = NTF::fromFile('article.xml');
    $searchable = $ntf->toSearchable();

    // Index directly into Meilisearch
    $client->index('articles')->addDocuments([$searchable]);

The searchable array includes all fields with:
- `publishedAt` and `modifiedAt` as ISO 8601 strings
- `keywords` as an array
- `categories` and `images` as JSON arrays
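
To make that shape concrete, here is a minimal sketch of the kind of flattening described above. It does not use this library: `flattenForIndex()` is a hypothetical helper, the sample category value is made up, and `DateTimeInterface::ATOM` is only one possible ISO 8601 format, so the exact output of `toSearchable()` may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of this library): turns a document array
 * into a flat, JSON-friendly array suitable for a search index.
 */
function flattenForIndex(array $doc): array
{
    $flat = [];
    foreach ($doc as $field => $value) {
        if ($value instanceof DateTimeInterface) {
            // Dates become ISO 8601 strings (ATOM format is an assumption)
            $flat[$field] = $value->format(DateTimeInterface::ATOM);
        } elseif (is_array($value)) {
            // Re-index lists so json_encode() emits JSON arrays, not objects
            $flat[$field] = array_values($value);
        } else {
            // Strings and nulls pass through unchanged
            $flat[$field] = $value;
        }
    }

    return $flat;
}

$flat = flattenForIndex([
    'id' => 'abc123',
    'headline' => 'Big Game Today',
    'keywords' => ['#news', '#sports'],
    'categories' => [['type' => 'subject', 'value' => 'sports']], // made-up sample
    'publishedAt' => new DateTimeImmutable('2026-01-15T00:01:00Z'),
    'modifiedAt' => null,
]);

echo json_encode($flat, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Converting dates to strings before indexing keeps the document serializable as plain JSON, which is what search engines such as Meilisearch accept.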
Zip File Processing
Process large archives efficiently using the generator:

    // Iterate through all NITF files in a zip
    $count = 0;
    foreach (NTF::fromZip('archive.zip') as $ntf) {
        $count++;
        // Process each document
    }

    // Or get all as an array
    $all = NTF::allFromZip('archive.zip');

The zip parser:
- Only processes `.xml` files
- Skips invalid XML files silently
- Uses a generator for memory efficiency
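
The three behaviours above can be sketched with PHP's `ZipArchive` and generators. This is an illustration under stated assumptions, not the library's implementation: `readZipEntries()` and `parseXmlEntries()` are hypothetical names, and the first one needs `ext-zip`. The demo at the bottom feeds in-memory entries so it runs without an archive on disk.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily yields "entry name => raw contents" for each
 * entry in a zip archive, one entry at a time. Requires ext-zip.
 */
function readZipEntries(string $zipPath): Generator
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open zip archive: $zipPath");
    }

    try {
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $name = $zip->getNameIndex($i);
            $contents = $zip->getFromIndex($i);
            if ($name !== false && $contents !== false) {
                yield $name => $contents;
            }
        }
    } finally {
        $zip->close();
    }
}

/**
 * Hypothetical helper: keeps only .xml entries that parse as XML.
 * Everything else is skipped without raising an error.
 */
function parseXmlEntries(iterable $entries): Generator
{
    foreach ($entries as $name => $contents) {
        if (strtolower(pathinfo((string) $name, PATHINFO_EXTENSION)) !== 'xml') {
            continue; // not an .xml file
        }

        // Collect libxml errors internally instead of emitting warnings
        $previous = libxml_use_internal_errors(true);
        $xml = simplexml_load_string($contents);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        if ($xml === false) {
            continue; // invalid XML, skipped silently
        }

        yield $name => $xml;
    }
}

// Demo with in-memory entries; with a real archive you would pass
// readZipEntries('archive.zip') instead.
$entries = [
    'a.xml' => '<nitf><body/></nitf>',
    'notes.txt' => 'not xml',
    'broken.xml' => '<nitf><body></nitf>',
];
$parsed = iterator_to_array(parseXmlEntries($entries));

echo implode(', ', array_keys($parsed)), PHP_EOL; // a.xml
```

Because both functions are generators, only one entry is held in memory at a time when they are chained, which is the point of iterating rather than loading the whole archive into an array.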
Example
Given a NITF XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
      <head>
        <docdata>
          <doc-id id-string="abc123"/>
          <date.release norm="2026-01-15T00:01:00Z"/>
          <key-list>
            <keyword key="#news"/>
            <keyword key="#sports"/>
          </key-list>
        </docdata>
        <pubdata type="web" position.section="news/sports"/>
      </head>
      <body>
        <body.head>
          <hedline>
            <hl1>Big Game Today</hl1>
            <hl2>Preview and analysis</hl2>
          </hedline>
          <byline>By John Smith</byline>
        </body.head>
        <body.content>
          <p>First paragraph of the article...</p>
          <p>Second paragraph...</p>
        </body.content>
      </body>
    </nitf>

You get:

    $ntf->id;          // "abc123"
    $ntf->headline;    // "Big Game Today"
    $ntf->subhead;     // "Preview and analysis"
    $ntf->byline;      // "By John Smith"
    $ntf->body;        // "First paragraph...\n\nSecond paragraph..."
    $ntf->keywords;    // ["#news", "#sports"]
    $ntf->section;     // "news/sports"
    $ntf->publishedAt; // DateTime object

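
For readers curious how such a mapping can work, the sketch below pulls the same fields out of the example document with `DOMXPath`. It is one possible approach, not necessarily how this library does it: `extractNitfFields()` and the `n` prefix are hypothetical, and the XPath expressions are inferred from the example above. Note that the elements live in the NITF namespace, so the namespace has to be registered before queries will match.

```php
<?php
declare(strict_types=1);

// Namespace URI taken from the example document.
const NITF_NS = 'http://iptc.org/std/NITF/2006-10-18/';

/** Hypothetical helper, not this library's API. */
function extractNitfFields(string $xmlString): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xmlString)) {
        throw new InvalidArgumentException('Invalid XML');
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('n', NITF_NS);

    // string(...) evaluates to '' when nothing matches
    $text = static fn (string $expr): string => trim((string) $xpath->evaluate("string($expr)"));

    // Collect the text of every node matched by an expression
    $collect = static function (string $expr) use ($xpath): array {
        $values = [];
        foreach ($xpath->query($expr) as $node) {
            $values[] = trim($node->textContent);
        }
        return $values;
    };

    $released = $text('//n:date.release/@norm');

    return [
        'id' => $text('//n:doc-id/@id-string'),
        'headline' => $text('//n:hedline/n:hl1'),
        'subhead' => $text('//n:hedline/n:hl2'),
        'byline' => $text('//n:byline'),
        'body' => implode("\n\n", $collect('//n:body.content/n:p')),
        'keywords' => $collect('//n:key-list/n:keyword/@key'),
        'section' => $text('//n:pubdata/@position.section'),
        'publishedAt' => $released !== '' ? new DateTime($released) : null,
    ];
}

$xml = <<<'XML'
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
  <head>
    <docdata>
      <doc-id id-string="abc123"/>
      <date.release norm="2026-01-15T00:01:00Z"/>
      <key-list>
        <keyword key="#news"/>
        <keyword key="#sports"/>
      </key-list>
    </docdata>
    <pubdata type="web" position.section="news/sports"/>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>Big Game Today</hl1>
        <hl2>Preview and analysis</hl2>
      </hedline>
      <byline>By John Smith</byline>
    </body.head>
    <body.content>
      <p>First paragraph of the article...</p>
      <p>Second paragraph...</p>
    </body.content>
  </body>
</nitf>
XML;

$fields = extractNitfFields($xml);
echo $fields['id'], ' | ', $fields['headline'], ' | ', $fields['section'], PHP_EOL;
```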
Testing
./vendor/bin/phpunit
License
MIT