ankane / mitie
Named-entity recognition for PHP
Requires
- php: >= 8.1
- ext-ffi: >= 8.1
Requires (Dev)
- phpunit/phpunit: ^10
MITIE - named-entity recognition, binary relation detection, and text categorization - for PHP
- Finds people, organizations, and locations in text
- Detects relationships between entities, like PERSON was born in LOCATION
Installation
Run:
composer require ankane/mitie
Add scripts to composer.json to download the shared library:
"scripts": {
    "post-install-cmd": "Mitie\\Vendor::check",
    "post-update-cmd": "Mitie\\Vendor::check"
}
Run:
composer install
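Before running the install, it can help to confirm that the environment meets the requirements listed at the top (PHP 8.1 or newer and the FFI extension). The following is a minimal sketch; `mitiePrerequisiteProblems` is a hypothetical helper name and is not part of the library.

```php
<?php
/**
 * Hypothetical helper (not part of the library): collect human-readable
 * messages for any requirement from the list above that is not met.
 */
function mitiePrerequisiteProblems(): array
{
    $problems = [];

    // The package requires php >= 8.1.
    if (version_compare(PHP_VERSION, '8.1.0', '<')) {
        $problems[] = 'PHP 8.1 or newer is required, found ' . PHP_VERSION;
    }

    // The package requires ext-ffi; extension names are matched case-insensitively.
    if (!extension_loaded('ffi')) {
        $problems[] = 'The FFI extension (ext-ffi) is not loaded';
    }

    return $problems;
}

$problems = mitiePrerequisiteProblems();
echo $problems === [] ? "Requirements met\n" : implode("\n", $problems) . "\n";
```

An empty result only means the two listed requirements are satisfied; it does not verify that the shared library or the models have been downloaded.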
Then download the pre-trained models for your language.
Getting Started
Named Entity Recognition
Load an NER model
$model = new Mitie\NER('ner_model.dat');
Create a document
$doc = $model->doc('Nat works at GitHub in San Francisco');
Get entities
$doc->entities();
This returns
[
    ['text' => 'Nat', 'tag' => 'PERSON', 'score' => 0.3112371212688382, 'offset' => 0],
    ['text' => 'GitHub', 'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
    ['text' => 'San Francisco', 'tag' => 'LOCATION', 'score' => 1.3890524313885309, 'offset' => 23]
]
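Since the result is a plain array, it can be post-processed with ordinary PHP. The sketch below drops low-scoring entities and groups the rest by tag. `groupEntitiesByTag` and the 0.5 threshold are illustrative assumptions, not part of the library, and the sketch assumes that a higher score means a more confident prediction.

```php
<?php
// Sample data copied from the output shown above.
$entities = [
    ['text' => 'Nat', 'tag' => 'PERSON', 'score' => 0.3112371212688382, 'offset' => 0],
    ['text' => 'GitHub', 'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
    ['text' => 'San Francisco', 'tag' => 'LOCATION', 'score' => 1.3890524313885309, 'offset' => 23],
];

/**
 * Hypothetical helper (not part of the library): keep entities whose score
 * is at least $minScore and group their text by tag.
 */
function groupEntitiesByTag(array $entities, float $minScore = 0.0): array
{
    $grouped = [];
    foreach ($entities as $entity) {
        if ($entity['score'] < $minScore) {
            continue; // below the chosen threshold
        }
        $grouped[$entity['tag']][] = $entity['text'];
    }
    return $grouped;
}

// The 0.5 threshold is arbitrary and only for illustration.
$grouped = groupEntitiesByTag($entities, 0.5);
// ['ORGANIZATION' => ['GitHub'], 'LOCATION' => ['San Francisco']]
print_r($grouped);
```

A suitable threshold depends on the model and the text, so it is worth inspecting scores on your own data before picking one.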
Get tokens
$doc->tokens();
Get tokens and their offset
$doc->tokensWithOffset();
Get all tags for a model
$model->tags();
Training
Load the word feature extractor into a trainer
$trainer = new Mitie\NERTrainer('total_word_feature_extractor.dat');
Create training instances
$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
$instance = new Mitie\NERTrainingInstance($tokens);
$instance->addEntity(3, 4, 'topic'); // machine learning
$instance->addEntity(6, 6, 'language'); // PHP
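Judging by the example above, addEntity takes the first and last token index of the entity, both inclusive and zero-based. Counting indices by hand is error-prone, so here is a minimal sketch that looks them up. `findTokenSpan` is a hypothetical helper and not part of the library.

```php
<?php
/**
 * Hypothetical helper (not part of the library): find the first occurrence
 * of $phrase inside $tokens and return its [start, end] token indices,
 * both inclusive. Returns null when the phrase does not occur.
 */
function findTokenSpan(array $tokens, array $phrase): ?array
{
    $n = count($tokens);
    $m = count($phrase);
    if ($m === 0 || $m > $n) {
        return null;
    }
    for ($start = 0; $start <= $n - $m; $start++) {
        // array_slice reindexes from 0, so it compares cleanly with $phrase.
        if (array_slice($tokens, $start, $m) === $phrase) {
            return [$start, $start + $m - 1];
        }
    }
    return null;
}

$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
$topicSpan = findTokenSpan($tokens, ['machine', 'learning']); // [3, 4]
$languageSpan = findTokenSpan($tokens, ['PHP']);              // [6, 6]
```

The resulting pairs match the indices passed to addEntity above, so a call such as `$instance->addEntity($topicSpan[0], $topicSpan[1], 'topic')` would be equivalent under that assumption.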
Add the training instances to the trainer
$trainer->add($instance);
Train the model
$model = $trainer->train();
Save the model
$model->saveToDisk('ner_model.dat');
Binary Relation Detection
Detect relationships between two entities, like:
- PERSON was born in LOCATION
- ORGANIZATION was founded in LOCATION
- FILM was directed by PERSON
There are 21 detectors for English. You can find them in the binary_relations directory in the model download.
Load a detector
$detector = new Mitie\BinaryRelationDetector('rel_classifier_organization.organization.place_founded.svm');
And create a document
$doc = $model->doc('Shopify was founded in Ottawa');
Get relations
$detector->relations($doc);
This returns
[['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464]]
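Each relation is a plain array with first, second, and score keys, so turning the result into readable statements needs no library calls. The sketch below is one way to do it. `describeRelations` and the phrase argument are illustrative assumptions, since the detector itself does not return the relation wording.

```php
<?php
// Sample data copied from the output shown above.
$relations = [
    ['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464],
];

/**
 * Hypothetical helper (not part of the library): render each relation as
 * "<first> <phrase> <second>", skipping relations scored below $minScore.
 */
function describeRelations(array $relations, string $phrase, float $minScore = 0.0): array
{
    $lines = [];
    foreach ($relations as $relation) {
        if ($relation['score'] < $minScore) {
            continue;
        }
        // %F formats the float without depending on the current locale.
        $lines[] = sprintf(
            '%s %s %s (score %.3F)',
            $relation['first'],
            $phrase,
            $relation['second'],
            $relation['score']
        );
    }
    return $lines;
}

$lines = describeRelations($relations, 'was founded in');
// ['Shopify was founded in Ottawa (score 0.176)']
echo implode("\n", $lines), "\n";
```

The phrase has to be supplied by the caller because it depends on which detector file was loaded.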
Training
Load an NER model into a trainer
$trainer = new Mitie\BinaryRelationTrainer($model);
Add positive and negative examples to the trainer
$tokens = ['Shopify', 'was', 'founded', 'in', 'Ottawa'];
$trainer->addPositiveBinaryRelation($tokens, [0, 0], [4, 4]);
$trainer->addNegativeBinaryRelation($tokens, [4, 4], [0, 0]);
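In the example above, the negative example is the positive pair with its two token ranges swapped, which teaches the detector that the relation has a direction. A minimal sketch of that pattern follows. `addDirectedRelation` is a hypothetical helper, and the anonymous class is only a recording stand-in so the sketch runs without the library; in real use you would pass a Mitie\BinaryRelationTrainer instead.

```php
<?php
/**
 * Hypothetical helper (not part of the library): add a relation as a positive
 * example and the same pair in reverse order as a negative example,
 * mirroring the two calls shown above.
 */
function addDirectedRelation(object $trainer, array $tokens, array $first, array $second): void
{
    $trainer->addPositiveBinaryRelation($tokens, $first, $second);
    $trainer->addNegativeBinaryRelation($tokens, $second, $first);
}

// Stand-in that records calls, used here instead of a real trainer.
$recorder = new class {
    public array $calls = [];

    public function addPositiveBinaryRelation(array $tokens, array $first, array $second): void
    {
        $this->calls[] = ['positive', $first, $second];
    }

    public function addNegativeBinaryRelation(array $tokens, array $first, array $second): void
    {
        $this->calls[] = ['negative', $first, $second];
    }
};

$tokens = ['Shopify', 'was', 'founded', 'in', 'Ottawa'];
addDirectedRelation($recorder, $tokens, [0, 0], [4, 4]);
print_r($recorder->calls);
```

Swapping the pair is only one source of negative examples; unrelated entity pairs from other sentences can serve the same purpose.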
Train the detector
$detector = $trainer->train();
Save the detector
$detector->saveToDisk('binary_relation_detector.svm');
Text Categorization
Load a model into a trainer
$trainer = new Mitie\TextCategorizerTrainer('total_word_feature_extractor.dat');
Add labeled text to the trainer
$trainer->add('This is super cool', 'positive');
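A single example is not enough to learn from, so in practice the add call is repeated for every labeled text. The sketch below loops over a small list and counts examples per label. The example texts and `addLabeledExamples` are hypothetical, and the anonymous class is a recording stand-in so the sketch runs without the library; in real use you would pass the Mitie\TextCategorizerTrainer created above.

```php
<?php
// Hypothetical labeled examples; replace these with your own data.
$examples = [
    ['This is super cool', 'positive'],
    ['What a great day', 'positive'],
    ['This is terrible', 'negative'],
    ['I did not enjoy this', 'negative'],
];

/**
 * Hypothetical helper (not part of the library): pass every [text, label]
 * pair to a trainer exposing add(text, label) and return the number of
 * examples per label so class balance can be checked before training.
 */
function addLabeledExamples(object $trainer, array $examples): array
{
    $counts = [];
    foreach ($examples as [$text, $label]) {
        $trainer->add($text, $label);
        $counts[$label] = ($counts[$label] ?? 0) + 1;
    }
    return $counts;
}

// Stand-in that records calls, used here instead of a real trainer.
$recorder = new class {
    public array $added = [];

    public function add(string $text, string $label): void
    {
        $this->added[] = [$text, $label];
    }
};

$counts = addLabeledExamples($recorder, $examples);
// ['positive' => 2, 'negative' => 2]
print_r($counts);
```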
Train the model
$model = $trainer->train();
Save the model
$model->saveToDisk('text_categorization_model.dat');
Load a saved model
$model = new Mitie\TextCategorizer('text_categorization_model.dat');
Categorize text
$model->categorize('What a super nice day');
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/mitie-php.git
cd mitie-php
composer install
composer test