hab/picareader

Classes for reading Pica+ records encoded in Pica, PicaXML and PicaPlain

v1.2.0 2017-10-27 06:40 UTC


[![Build Status](https://travis-ci.org/dmj/PicaReader.svg?branch=master)](https://travis-ci.org/dmj/PicaReader)

About

PicaReader provides classes for reading Pica+ records encoded in PicaXML and PicaPlain.

PicaReader is copyright (c) 2012-2017 by Herzog August Bibliothek Wolfenbüttel and released under the terms of the GNU General Public License v3.

Installation

You can install PicaReader via Composer.

composer require hab/picareader

Usage

All readers adhere to the same interface. You open the reader with a string of input data by calling Reader::open() and then call Reader::read() to read the next record in the input data. If the input does not contain any more records, Reader::read() returns FALSE. Otherwise it returns a record object created with PicaRecord’s Record::factory() function.

$reader = new \HAB\Pica\Reader\PicaXmlReader();
$reader->open(file_get_contents('http://unapi.gbv.de?id=opac-de-23:ppn:635012286&format=picaxml'));
$record = $reader->read();
$reader->close();
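
Because Reader::read() returns FALSE once the input is exhausted, all records can be collected in a loop. The following is a minimal sketch: readAllRecords() is a hypothetical helper and not part of PicaReader. It assumes only that the reader has already been opened and exposes read() as described above.

```php
<?php
// Hypothetical helper (not part of PicaReader): collects every record
// from a reader that has already been opened with open().
function readAllRecords($reader): array
{
    $records = [];
    // read() returns FALSE when no more records are available.
    while (($record = $reader->read()) !== false) {
        $records[] = $record;
    }
    return $records;
}
```

With the reader from the example above, readAllRecords($reader) would be called between open() and close().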

To filter out records or fields you can attach a filter to the reader via Reader::setFilter(). A filter is any valid PHP callback that takes an associative array representing the record as its argument and returns either a possibly modified array or FALSE if the entire record should be skipped.

The array representation of a record is defined as follows:

RECORD   := array('fields' => array(FIELD, …))
FIELD    := array('tag' => TAG, 'occurrence' => OCCURRENCE, 'subfields' => array(SUBFIELD, …))
SUBFIELD := array('code' => CODE, 'value' => VALUE)

Where TAG, OCCURRENCE, CODE, and VALUE are the respective properties of a Pica+ field or subfield.
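
As an illustration of this layout, the sketch below builds the array for a record with a single field and defines a filter that skips records without fields. The tag and subfield values are taken from the sample record further down. The use of null for a field without occurrence is an assumption, and $skipEmpty is a hypothetical name.

```php
<?php
// One record with a single field, following the RECORD/FIELD/SUBFIELD layout.
// Assumption: null stands for "no occurrence"; the actual value may differ.
$record = [
    'fields' => [
        [
            'tag'        => '001A',
            'occurrence' => null,
            'subfields'  => [
                ['code' => '0', 'value' => '0001:14-09-10'],
            ],
        ],
    ],
];

// Hypothetical filter: skip records that have no fields at all,
// pass every other record through unchanged.
$skipEmpty = function (array $r) {
    return empty($r['fields']) ? false : $r;
};
```

A callback of this shape could be passed to Reader::setFilter().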

For example, if your source delivers malformed PicaXML records like so:

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="info:srw/schema/5/picaXML-v1.0">
  <datafield tag="">
  </datafield>
  <datafield tag="001A">
    <subfield code="0">0001:14-09-10</subfield>
  </datafield>
  …
</record>

You can attach a filter function to remove these fields with an invalid tag:

$reader = new PicaXmlReader();
$reader->setFilter(function (array $r) {
    // Keep only fields that carry a valid Pica+ tag.
    return array('fields' => array_filter($r['fields'], function (array $f) {
        return isset($f['tag']) && \HAB\Pica\Record\Field::isValidFieldTag($f['tag']);
    }));
});
$record = $reader->read(…);
$reader->close();
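
Note that array_filter() preserves the original array keys, so the filtered field list can have gaps in its numeric indices. Whether PicaReader depends on consecutive indices is not stated here. The sketch below is a library-independent variant that reindexes the result with array_values(). As a simplification it only checks that the tag is non-empty instead of calling Field::isValidFieldTag(), and $dropUntagged is a hypothetical name.

```php
<?php
// Hypothetical filter (not part of PicaReader): drops fields whose tag is
// missing or empty and reindexes the remaining fields starting at 0.
$dropUntagged = function (array $r) {
    $fields = array_filter($r['fields'], function (array $f) {
        // Simplified check; the library's own tag validation is stricter.
        return isset($f['tag']) && $f['tag'] !== '';
    });
    return ['fields' => array_values($fields)];
};
```

Because the callback works on plain arrays, it can be tried out on a hand-built record array before attaching it to a reader.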

Acknowledgements

Large parts of this package would not have been possible without studying the source of Pica::Record, an open source Perl library for handling Pica+ records by Jakob Voß, and the practical knowledge of our library’s catalogers.
