coral-media / php-ir
Information Retrieval algorithms (vector space, similarity, clustering)
pkg:composer/coral-media/php-ir
Requires
- php: >=8.2
- ext-curl: *
- ext-json: *
- ext-openssl: *
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.92
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^11.5
- symfony/var-dumper: ^7.4
README
A focused PHP library implementing classical Information Retrieval (IR) algorithms, based on the Stanford IR book Introduction to Information Retrieval.
The goal of this project is to provide a correct, deterministic, and explainable IR core suitable for search, clustering, and recommendation systems.
License
MIT
Scope
- Vector space model (dense and sparse vectors)
- Similarity measures (cosine, euclidean)
- Term weighting:
  - Term Frequency (TF)
  - Inverse Document Frequency (IDF)
  - TF-IDF
- BM25 (probabilistic ranking)
- Clustering (K-Means, K-Means++)
- Deterministic, explainable algorithms
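To make the scope above concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity over sparse vectors in plain PHP. It does not use this library's API: the function names `tfidfVectors` and `cosineSimilarity` are hypothetical, and the weighting scheme (raw term counts multiplied by `log(N / df)`) is one common variant from the Stanford IR book, assumed here for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-ir): builds one sparse TF-IDF vector
 * (term => weight) per tokenized document.
 * Assumes TF = raw term count and IDF = log(N / df).
 *
 * @param list<list<string>> $docs
 * @return list<array<string, float>>
 */
function tfidfVectors(array $docs): array
{
    $n = count($docs);

    // Document frequency: number of documents containing each term.
    $df = [];
    foreach ($docs as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $vectors = [];
    foreach ($docs as $tokens) {
        $vector = [];
        foreach (array_count_values($tokens) as $term => $tf) {
            $vector[$term] = $tf * log($n / $df[$term]);
        }
        $vectors[] = $vector;
    }

    return $vectors;
}

/**
 * Hypothetical helper: cosine similarity between two sparse vectors.
 * Returns 0.0 when either vector has zero length.
 *
 * @param array<string, float> $a
 * @param array<string, float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }

    $normA = sqrt(array_sum(array_map(fn ($w) => $w * $w, $a)));
    $normB = sqrt(array_sum(array_map(fn ($w) => $w * $w, $b)));

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / ($normA * $normB);
}

// Usage: documents 0 and 1 share two terms, document 2 shares none.
$docs = [
    ['information', 'retrieval', 'ranking'],
    ['information', 'retrieval', 'clustering'],
    ['image', 'compression'],
];
$vectors = tfidfVectors($docs);
$sim01 = cosineSimilarity($vectors[0], $vectors[1]);
$sim02 = cosineSimilarity($vectors[0], $vectors[2]);
```

In this sketch, absent terms have weight zero, so the dot product only needs to iterate over the terms of one vector. Documents with no shared terms score exactly zero.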
Non-Goals
- NLP pipelines
- Transformers / embeddings
- Semantic vector databases
- Dataset abstractions
- Framework integrations
This library intentionally focuses on classical IR, not modern NLP.
Philosophy
This library favors:
- Mathematical correctness
- Explicit, inspectable abstractions
- Deterministic behavior
- Minimal hidden state
- Native acceleration via Zephir (optional, future-facing)
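Deterministic behavior matters most where an algorithm normally relies on randomness, such as K-Means++ initialization. The sketch below shows one way to keep seeding reproducible without global state: the caller passes an explicit seed, and the generator lives only inside the function. The function names and the `$seed` parameter are hypothetical and are not taken from this library. The code assumes PHP 8.2 or later for `Random\Randomizer`.

```php
<?php

declare(strict_types=1);

use Random\Engine\Mt19937;
use Random\Randomizer;

/** Squared Euclidean distance between two dense vectors of equal length. */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }

    return $sum;
}

/**
 * Hypothetical helper (not part of php-ir): picks $k initial centroids
 * using K-Means++ seeding with an explicit, caller-supplied seed.
 *
 * @param list<list<float>> $points
 * @return list<list<float>>
 */
function kMeansPlusPlusSeeds(array $points, int $k, int $seed): array
{
    // Local generator: no global RNG state is read or modified.
    $rng = new Randomizer(new Mt19937($seed));

    // The first centroid is drawn uniformly at random.
    $centroids = [$points[$rng->getInt(0, count($points) - 1)]];

    while (count($centroids) < $k) {
        // Squared distance from each point to its nearest chosen centroid.
        $weights = [];
        foreach ($points as $point) {
            $weights[] = min(array_map(
                fn (array $c): float => squaredDistance($point, $c),
                $centroids
            ));
        }

        // Draw the next centroid with probability proportional to that distance.
        $threshold = ($rng->getInt(0, 1_000_000) / 1_000_000) * array_sum($weights);
        $chosen = 0;
        $cumulative = 0.0;
        foreach ($weights as $i => $weight) {
            $cumulative += $weight;
            if ($weight > 0.0) {
                $chosen = $i;
                if ($cumulative >= $threshold) {
                    break;
                }
            }
        }

        $centroids[] = $points[$chosen];
    }

    return $centroids;
}

// Usage: the same seed yields the same initial centroids on every run.
$points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]];
$first = kMeansPlusPlusSeeds($points, 3, 42);
$second = kMeansPlusPlusSeeds($points, 3, 42);
```

Passing the seed as an argument keeps the source of randomness visible at the call site, which fits the preference for explicit abstractions and minimal hidden state.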
API Stability
New functionality will be added in a backward-compatible manner.
Releasing
Project versions are tracked using a .version file and Git tags.
To bump the version:
./scripts/bump-version.sh {major|minor|patch}
git commit -am "chore(release): bump version"