hab / picareader
Classes for reading Pica+ records encoded in Pica, PicaXML and PicaPlain
Requires
- hab/picarecord: ~1.0
Last update: 2024-10-26 18:12:03 UTC
README
[![Build Status](https://travis-ci.org/dmj/PicaReader.svg?branch=master)](https://travis-ci.org/dmj/PicaReader)
About
PicaReader provides classes for reading Pica+ records encoded in PicaXML and PicaPlain.
PicaReader is copyright (c) 2012-2017 by Herzog August Bibliothek Wolfenbüttel and released under the terms of the GNU General Public License v3.
Installation
You can install PicaReader via Composer.
composer require hab/picareader
Usage
All readers adhere to the same interface. You open the reader with a string of input data by calling `Reader::open()` and can then call `Reader::read()` to read the next record in the input data. If the input does not contain any more records, `Reader::read()` returns `FALSE`. Otherwise it returns a record object created with PicaRecord's `Record::factory()` function.
```php
$reader = new \HAB\Pica\Reader\PicaXmlReader();
$reader->open(file_get_contents('http://unapi.gbv.de?id=opac-de-23:ppn:635012286&format=picaxml'));
$record = $reader->read();
$reader->close();
```
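Because `Reader::read()` returns `FALSE` once the input is exhausted, reading every record comes down to looping until that value appears. The following is a minimal sketch of such a loop. `StubReader` and `readAll()` are hypothetical names introduced only so the example runs without the package installed; the stub merely mimics the `open()`/`read()`/`close()` contract described above and is not part of PicaReader.

```php
<?php
// Hypothetical stand-in for a PicaReader reader: it is NOT part of the
// package and only mimics the open()/read()/close() contract described above.
class StubReader
{
    private $records = array();

    // For this stub the "input data" is simply one record per line.
    public function open($data)
    {
        $this->records = array_values(array_filter(explode("\n", $data), 'strlen'));
    }

    // Return the next record, or FALSE when the input is exhausted.
    public function read()
    {
        return $this->records ? array_shift($this->records) : false;
    }

    public function close()
    {
        $this->records = array();
    }
}

// Hypothetical helper: collect every record the reader still has to offer.
function readAll($reader)
{
    $records = array();
    // Compare strictly against FALSE so the loop ends only on the documented end-of-input value.
    while (($record = $reader->read()) !== false) {
        $records[] = $record;
    }
    return $records;
}

$reader = new StubReader();
$reader->open("record 1\nrecord 2\nrecord 3");
$all = readAll($reader);
$reader->close();

echo count($all), " records read\n";
```

With the real library you would presumably pass an opened `PicaXmlReader` to the same loop instead of the stub.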
To filter out records or fields you can attach a filter to the reader via `Reader::setFilter()`. A filter is any valid PHP callback that takes an associative array representing the record as its argument and returns a possibly modified array, or `FALSE` if the entire record should be skipped.
The array representation of a record is defined as follows:
```
RECORD   := array('fields' => array(FIELD, …))
FIELD    := array('tag' => TAG, 'occurrence' => OCCURRENCE, 'subfields' => array(SUBFIELD, …))
SUBFIELD := array('code' => CODE, 'value' => VALUE)
```
Where `TAG`, `OCCURRENCE`, `CODE`, and `VALUE` are the respective properties of a Pica+ field or subfield.
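As a concrete illustration of this layout, the sketch below spells out the array for a record with a single field and walks it. The field values are illustrative, and representing a missing occurrence as `NULL` is an assumption rather than something this README states.

```php
<?php
// Illustrative record following the RECORD / FIELD / SUBFIELD layout above.
$record = array(
    'fields' => array(
        array(
            'tag'        => '001A',
            // Assumption: a field without an occurrence carries NULL here.
            'occurrence' => null,
            'subfields'  => array(
                array('code' => '0', 'value' => '0001:14-09-10'),
            ),
        ),
    ),
);

// Walk the structure and print one line per subfield.
foreach ($record['fields'] as $field) {
    foreach ($field['subfields'] as $subfield) {
        echo $field['tag'], ' $', $subfield['code'], ' ', $subfield['value'], "\n";
    }
}
```

A filter callback receives an array of exactly this shape, so ordinary array functions are enough to inspect or modify it.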
For example, if your source delivers malformed PicaXML records like so:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="info:srw/schema/5/picaXML-v1.0">
  <datafield tag="">
  </datafield>
  <datafield tag="001A">
    <subfield code="0">0001:14-09-10</subfield>
  </datafield>
  …
</record>
```
You can attach a filter function to remove these fields with an invalid tag:
```php
$reader = new PicaXmlReader();
$reader->setFilter(function (array $r) {
    return array('fields' => array_filter($r['fields'], function (array $f) {
        return isset($f['tag']) && \HAB\Pica\Record\Field::isValidFieldTag($f['tag']);
    }));
});
$record = $reader->read(…);
$reader->close();
```
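A filter can also drop whole records by returning `FALSE`. The sketch below shows one possible filter that skips any record lacking a field with a given tag. The name `$requireField` and the choice of `001A` are purely illustrative, and the callback is exercised directly on hand-written arrays so the snippet runs without the package installed.

```php
<?php
// Hypothetical filter factory: keep a record only if it contains a field with the given tag.
$requireField = function ($tag) {
    return function (array $r) use ($tag) {
        foreach ($r['fields'] as $f) {
            if (isset($f['tag']) && $f['tag'] === $tag) {
                return $r;      // keep the record unchanged
            }
        }
        return false;           // skip the entire record
    };
};

$filter = $requireField('001A');

// Two hand-written records in the array representation described above.
$with = array('fields' => array(
    array('tag' => '001A', 'occurrence' => null, 'subfields' => array(
        array('code' => '0', 'value' => '0001:14-09-10'),
    )),
));
$without = array('fields' => array());

var_dump($filter($with) === $with);   // bool(true)
var_dump($filter($without));          // bool(false)
```

In practice the resulting closure would be handed to `Reader::setFilter()` rather than called by hand.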
Acknowledgements
Large parts of this package would not have been possible without studying the source of Pica::Record, an open source Perl library for handling Pica+ records by Jakob Voß, and the practical knowledge of our library’s catalogers.