README

A PHP library for parsing NITF (News Industry Text Format) XML documents into a flat, searchable structure optimized for Meilisearch and similar full-text search engines.

Installation

composer require tacman/nitf-parser

Requirements

PHP 8.4+

Quick Start

use Tacman\NTF\NTF;

// Parse from file
$ntf = NTF::fromFile('article.xml');

// Or from XML string
$ntf = NTF::fromXml($xmlString);

// Or from a zip archive containing multiple NITF files
foreach (NTF::fromZip('articles.zip') as $ntf) {
    echo $ntf->headline;
}

// Get a flat array ready for indexing
$searchable = $ntf->toSearchable();

Available Fields

The NTF class provides these public properties:

Property	Type	Description
`$id`	`string`	Document ID (from `doc-id/@id-string`)
`$headline`	`string`	Main headline (from `hl1`)
`$subhead`	`string`	Sub-headline (from `hl2`)
`$byline`	`string`	Author byline
`$summary`	`string`	Article summary/abstract
`$body`	`string`	Full body text (all `<p>` elements joined)
`$keywords`	`string[]`	Keywords from key-list
`$categories`	`array`	Classifications as `['type' => '...', 'value' => '...']`
`$images`	`array`	Media references with `source`, `name`, `mimeType`
`$publishedAt`	`?DateTime`	Publication date
`$modifiedAt`	`?DateTime`	Last modification date
`$section`	`?string`	Publication section
`$type`	`?string`	Publication type
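The table maps each property to a source element in the NITF document. The sketch below shows how a few of those mappings could be read with PHP's built-in DOM extension. It is not this library's implementation. The function name `extractNitfFields()` is hypothetical, it covers only a subset of the fields, and it assumes the NITF 2006-10-18 namespace used in the Example section.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the library's parser: read a subset of NITF fields
 * with DOMXPath. Assumes the http://iptc.org/std/NITF/2006-10-18/ namespace.
 */
function extractNitfFields(string $xml): array
{
    $doc = new DOMDocument();
    if ($xml === '' || !$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $xpath = new DOMXPath($doc);
    // NITF elements sit in a default namespace, so XPath needs a prefix bound to it
    $xpath->registerNamespace('n', 'http://iptc.org/std/NITF/2006-10-18/');

    // Trimmed text of the first matching node, or null when nothing matches
    $first = static function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    };

    // Every <p> in body.content, joined with blank lines
    $paragraphs = [];
    foreach ($xpath->query('//n:body.content/n:p') as $p) {
        $paragraphs[] = trim($p->textContent);
    }

    // The key attribute of each keyword in key-list
    $keywords = [];
    foreach ($xpath->query('//n:key-list/n:keyword/@key') as $attr) {
        $keywords[] = $attr->value;
    }

    $released = $first('//n:docdata/n:date.release/@norm');

    return [
        'id'          => $first('//n:doc-id/@id-string'),
        'headline'    => $first('//n:hl1'),
        'subhead'     => $first('//n:hl2'),
        'byline'      => $first('//n:byline'),
        'body'        => implode("\n\n", $paragraphs),
        'keywords'    => $keywords,
        'section'     => $first('//n:pubdata/@position.section'),
        'publishedAt' => $released !== null ? new DateTime($released) : null,
    ];
}
```

Binding the default namespace to a prefix (`n:` here) is the step most often missed: when a document declares `xmlns` on its root, an unprefixed XPath such as `//hl1` matches nothing. Attributes like `id-string` are not namespaced, so they stay unprefixed.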

Meilisearch Integration

The toSearchable() method returns a flat array ready for direct indexing:

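use Meilisearch\Client;
use Tacman\NTF\NTF;

// Placeholder URL and API key: point these at your own Meilisearch instance
$client = new Client('http://127.0.0.1:7700', 'masterKey');

$ntf = NTF::fromFile('article.xml');
$searchable = $ntf->toSearchable();

// Index directly into Meilisearch
$client->index('articles')->addDocuments([$searchable]);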

The searchable array contains all of the fields above, with:

publishedAt and modifiedAt as ISO 8601 strings
keywords as an array
categories and images as JSON arrays
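The date and array conversions listed above can be sketched as follows. `flattenForIndex()` is a hypothetical helper, not the library's `toSearchable()`. It assumes `DateTimeInterface::ATOM` as the ISO 8601 format (the library may use a different profile) and reindexes integer-keyed arrays so `json_encode()` emits JSON arrays rather than objects.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not the library's toSearchable(): prepare a field map
 * for a search index by turning dates into ISO 8601 strings and reindexing
 * integer-keyed arrays so they encode as JSON arrays.
 */
function flattenForIndex(array $fields): array
{
    $flat = [];
    foreach ($fields as $key => $value) {
        if ($value instanceof DateTimeInterface) {
            // ATOM is an ISO 8601 profile, e.g. "2026-01-15T00:01:00+00:00"
            $flat[$key] = $value->format(DateTimeInterface::ATOM);
        } elseif (is_array($value) && array_filter(array_keys($value), 'is_string') === []) {
            // Only integer keys, possibly with gaps: reindex to 0..n-1 so json_encode() emits [...]
            $flat[$key] = array_values($value);
        } else {
            // Strings, nulls and associative arrays pass through unchanged (not recursive)
            $flat[$key] = $value;
        }
    }
    return $flat;
}
```

Reindexing matters because PHP encodes an array with gaps in its keys, such as `[0 => '#news', 2 => '#sports']`, as a JSON object instead of a JSON array.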

Zip File Processing

Process large archives efficiently using the generator:

// Iterate through all NITF files in a zip
$count = 0;
foreach (NTF::fromZip('archive.zip') as $ntf) {
    $count++;
    // Process each document
}

// Or get all as an array
$all = NTF::allFromZip('archive.zip');

The zip parser:

Only processes .xml files
Skips invalid XML files silently
Uses a generator for memory efficiency
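The three behaviors listed above can be sketched with two generators: one streams `.xml` entries out of the archive with `ZipArchive`, the other parses each entry and silently drops anything that is not well-formed. The function names are hypothetical and this is not the library's code; it assumes the `zip` and `dom` extensions are installed.

```php
<?php
declare(strict_types=1);

/** True for entry names ending in .xml (case-insensitive); directory entries are excluded. */
function isXmlEntry(string $name): bool
{
    return !str_ends_with($name, '/')
        && strcasecmp(pathinfo($name, PATHINFO_EXTENSION), 'xml') === 0;
}

/** Hypothetical sketch: yield name => contents for each .xml entry, one at a time. */
function zipEntries(string $path): Generator
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open zip archive: {$path}");
    }
    try {
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $name = $zip->getNameIndex($i);
            if ($name === false || !isXmlEntry($name)) {
                continue; // non-XML entries are never read
            }
            $contents = $zip->getFromIndex($i);
            if ($contents !== false) {
                yield $name => $contents;
            }
        }
    } finally {
        $zip->close(); // runs when iteration ends or the generator is destroyed
    }
}

/** Hypothetical sketch: yield name => DOMDocument, skipping entries that fail to parse. */
function parseXmlEntries(iterable $entries): Generator
{
    foreach ($entries as $name => $xml) {
        if ($xml === '') {
            continue; // DOMDocument::loadXML() rejects empty input with a ValueError
        }
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // suppress parser warnings
        $ok = $doc->loadXML($xml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        if ($ok) {
            yield $name => $doc;
        }
    }
}
```

Chaining them as `parseXmlEntries(zipEntries('archive.zip'))` keeps only one entry's contents in memory at a time, and the `finally` block closes the archive once the generator finishes or is destroyed, including when the consumer stops iterating early.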

Example

Given a NITF XML file:

<?xml version="1.0" encoding="UTF-8"?>
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
  <head>
    <docdata>
      <doc-id id-string="abc123"/>
      <date.release norm="2026-01-15T00:01:00Z"/>
      <key-list>
        <keyword key="#news"/>
        <keyword key="#sports"/>
      </key-list>
    </docdata>
    <pubdata type="web" position.section="news/sports"/>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>Big Game Today</hl1>
        <hl2>Preview and analysis</hl2>
      </hedline>
      <byline>By John Smith</byline>
    </body.head>
    <body.content>
      <p>First paragraph of the article...</p>
      <p>Second paragraph...</p>
    </body.content>
  </body>
</nitf>

You get:

$ntf->id;           // "abc123"
$ntf->headline;    // "Big Game Today"
$ntf->subhead;     // "Preview and analysis"
$ntf->byline;      // "By John Smith"
$ntf->body;        // "First paragraph of the article...\n\nSecond paragraph..."
$ntf->keywords;    // ["#news", "#sports"]
$ntf->section;     // "news/sports"
$ntf->publishedAt; // DateTime object

Testing

./vendor/bin/phpunit

License

MIT

tacman / nitf-parser