writecrow / tag_converter
Converts tagged textfiles to JSON, PHP or XML
Requires (Dev)
- brianium/paratest: ^3.0
- phpunit/phpunit: ~7
Suggests
- ext-mbstring: For best performance
- symfony/polyfill-mbstring: If you can't install ext-mbstring
README
A PHP library for converting files tagged with corpus metadata to JSON, PHP, or XML.
History
Corpus linguistics researchers use a markup-like syntax to provide metadata about texts. For consumption by applications, this syntax needs to be converted into a more universal, machine-readable format. The format chosen was JSON.
Basic Usage
The included /demo/index.php file contains a demonstration conversion form.
Make your code aware of the TagConverter class via your preferred method (e.g., use or require).
Then pass a string of text into the class:
$text = TagConverter::json('<MyTag: 123>My tagged text here');
echo $text;
// Returns {"MyTag":"123","text":"My tagged text here"}

$text = TagConverter::php('<MyTag: 123>My tagged text here');
echo $text;
// Returns array('MyTag' => '123', 'text' => 'My tagged text here')

$text = TagConverter::xml('<MyTag: 123>My tagged text here');
echo $text;
// Returns <?xml version="1.0"?><root><MyTag>123</MyTag><text>My tagged text here</text></root>
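Once the JSON string is produced, a consuming application will typically decode it. The following is a minimal sketch that uses a literal copy of the JSON output documented above rather than calling the library, so it runs on its own. The variable names are illustrative, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
declare(strict_types=1);

// Literal copy of the documented JSON output (not produced by the library here).
$json = '{"MyTag":"123","text":"My tagged text here"}';

// Decode into an associative array; throws JsonException on malformed input.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo $data['MyTag'], PHP_EOL; // the tag value, kept as a string
echo $data['text'], PHP_EOL;  // the untagged body text
```

Note that the tag value stays a string ("123") after decoding, so cast it explicitly if a number is needed.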
Expected input format
The corpus style tagging syntax expected by the library is defined as follows:
- Tags must be wrapped in < and >
- Tag names and tag values may only contain alphanumeric characters, spaces, underscores, and hyphens.
- Tag names must be separated from tag values by a :
- Spaces at the beginning or end of tag names or tag values are ignored; spaces within tag values are preserved
- Everything not wrapped in < and > will be considered "text"
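The rules above can be sketched as a small regex-based parser. This is an illustration of the input format only, not the library's actual implementation: the function name parseTaggedText, the regular expression, the requirement that values be non-empty, and the trimming of the remaining text are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): parses corpus-style
 * tags of the form <Name: Value> and collects the rest as "text".
 */
function parseTaggedText(string $input): array
{
    // Name and value may contain alphanumerics, spaces, underscores,
    // and hyphens; a colon separates the two.
    $pattern = '/<([A-Za-z0-9 _-]+):([A-Za-z0-9 _-]+)>/';

    $result = [];
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Leading and trailing spaces are ignored; inner spaces are kept.
        $result[trim($match[1])] = trim($match[2]);
    }

    // Everything outside the tags is treated as text
    // (trimming it is an assumption of this sketch).
    $result['text'] = trim((string) preg_replace($pattern, '', $input));

    return $result;
}

echo json_encode(parseTaggedText('<MyTag: 123>My tagged text here')), PHP_EOL;
// prints {"MyTag":"123","text":"My tagged text here"}
```

In this sketch, a tag that breaks the rules (for example, one with no colon) does not match the pattern and is left in the text. How the library itself handles malformed tags is not stated here.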
Testing
Unit tests can be run (after composer install) by executing vendor/bin/phpunit