sphamster / bayes
Bayes machine learning
Installs: 8
Dependents: 0
Suggesters: 0
Security: 0
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 3
pkg:composer/sphamster/bayes
Requires
- php: ^8.2
Requires (Dev)
- laravel/pint: ^1.21.1
- pestphp/pest: ^3.7.4
- pestphp/pest-plugin-type-coverage: ^3.3
- phpstan/phpstan: ^2.1.8
- rector/rector: ^2.0.10
This package is auto-updated.
Last update: 2025-11-09 17:04:54 UTC
README
Bayes: Naive Bayes Classifier for PHP
A powerful machine learning library for text classification, sentiment analysis, and multi-label categorization
About Bayes
Bayes is a high-performance Naive Bayes classifier for PHP 8.2+ that leverages machine learning to automatically categorize text documents into arbitrary categories. Built with modern PHP practices and rigorous code quality standards, it's the perfect solution for natural language processing (NLP) tasks in PHP applications.
Whether you're building spam filters, sentiment analysis systems, content recommendation engines, or multi-label text categorization tools, Bayes provides a simple yet powerful API backed by solid mathematical foundations.
Fork Notice: This library is an enhanced fork of niiknow/bayes, rewritten with modern PHP 8.2 features, comprehensive test coverage, and strict type safety.
Table of Contents
- About Bayes
- Table of Contents
- Key Features
- Use Cases
- Setup
- Quick Start
- Single-Label Classification
- Multi-Label Classification
- Advanced Usage
- Credits
- License
Key Features
- 🚀 High Performance - Optimized for speed with efficient probability calculations
- 🎯 Multi-Label Support - Classify documents into multiple categories simultaneously
- 🔧 Customizable Tokenizers - Plug in your own tokenization logic for any language
- 💾 State Persistence - Export and import trained classifiers as JSON
- 📊 Multiple Filtering Strategies - Threshold, Top-K, and Above-Mean filters included
- ✅ Production Ready - PHPStan max level, 100% type coverage, comprehensive test suite
- 🌍 Framework Agnostic - Works with Laravel, Symfony, or standalone PHP applications
- 📦 Zero Dependencies - Pure PHP implementation, no external libraries required
Use Cases
Bayes excels at a wide range of text classification and machine learning tasks like:
- Spam Detection & Email Filtering
- Sentiment Analysis
- Content Categorization
- Intent Recognition
- Multi-Label Tagging
- Language Detection
Setup
You can install the package via Composer:
composer require sphamster/bayes
Requirements
- PHP 8.2 or higher
- No other dependencies required
Quick Start
Get up and running with Bayes in less than 5 minutes:
<?php use Sphamster\SingleLabelBayes; // Create a new classifier instance (uses DefaultTokenizer automatically) $classifier = new SingleLabelBayes(); // Train with positive examples $classifier->train('amazing, awesome movie!! Yeah!! Oh boy.', 'positive'); $classifier->train('Sweet, this is incredibly, amazing, perfect, great!!', 'positive'); // Train with negative examples $classifier->train('terrible, shitty thing. Damn. Sucks!!', 'negative'); // Predict the category of new text $result = $classifier->predict('awesome, cool, amazing!! Yay.'); // Returns: 'positive' // Get probability scores for all categories $probabilities = $classifier->probabilities('awesome, cool, amazing!! Yay.'); // Returns: array of Probability objects with log probabilities // Export the trained model for later use $json = $classifier->export(); // Import a previously trained model $classifier->import($json);
Single-Label Classification
Use the SingleLabelBayes class when each document belongs to exactly one category. This is ideal for tasks like spam detection, sentiment analysis, or any classification where outcomes are mutually exclusive.
Batch Training
Train on multiple examples at once:
$classifier->trainOn([ ['sample' => 'This movie is fantastic!', 'label' => 'positive'], ['sample' => 'Loved every minute of it', 'label' => 'positive'], ['sample' => 'Waste of time and money', 'label' => 'negative'], ['sample' => 'Absolutely terrible film', 'label' => 'negative'], ]);
Persistence
Save and restore your trained models:
// Export to JSON $json = $classifier->export(); file_put_contents('sentiment-model.json', $json); // Import from JSON $json = file_get_contents('sentiment-model.json'); $classifier->import($json);
Multi-Label Classification
For documents that can belong to multiple categories simultaneously, use MultiLabelBayes. Perfect for news article tagging, product categorization, or any scenario where a single piece of content fits multiple classifications:
use Sphamster\MultiLabelBayes; use Sphamster\Support\Filters\ThresholdFilter; use Sphamster\Support\Filters\TopKFilter; use Sphamster\Support\Filters\AboveMeanFilter; // Create a new multi-label classifier (uses DefaultTokenizer automatically) $classifier = new MultiLabelBayes(); // Train with multiple labels per sample $classifier->train( 'iPhone 15 Pro price drops as Samsung releases new Galaxy', ['technology', 'business', 'mobile'] ); $classifier->train( 'AI breakthrough helps detect cancer in early stages', ['technology', 'health', 'science'] ); $classifier->train( 'Stock market crashes amid banking crisis', ['business', 'finance', 'economy'] ); // Predict using different strategies: // 1. Threshold Filter: Get all categories above 30% probability $predictions = $classifier->predict( 'Tech company stocks rise after AI announcement', new ThresholdFilter(0.3) ); // Returns: [Probability('technology', ...), Probability('business', ...)] // 2. Top-K Filter: Get top 2 most likely categories $predictions = $classifier->predict( 'Medical AI startup secures funding', new TopKFilter(2) ); // Returns: top 2 categories by probability // 3. Above Mean Filter: Get categories above average probability $predictions = $classifier->predict( 'Electric cars impact oil industry', new AboveMeanFilter() ); // Returns: categories with above-average probability // Extract category names $categories = array_map(fn($p) => $p->category(), $predictions); // Batch training $classifier->trainOn([ [ 'sample' => 'New electric vehicle startup raises $500M', 'labels' => ['technology', 'business', 'automotive'] ], [ 'sample' => 'Machine learning improves weather forecasting', 'labels' => ['technology', 'science', 'environment'] ], ]);
Choosing Between Single-Label and Multi-Label
| Classifier | Best For | Example Use Cases |
|---|---|---|
SingleLabelBayes |
One category per document | Spam detection, sentiment analysis, language detection |
MultiLabelBayes |
Multiple categories per document | News tagging, product categorization, skill assessment |
Available Prediction Filters
| Filter | Description | Use Case |
|---|---|---|
ThresholdFilter(float $threshold = 0.3) |
Returns categories with probability ≥ threshold | Medical diagnosis (flag all conditions above 30%) |
TopKFilter(int $k = 3) |
Returns top K categories by probability | Content recommendations (show top 3 topics) |
AboveMeanFilter() |
Returns categories above mean probability | Adaptive filtering based on distribution |
Custom Filters
You can create custom filtering strategies by implementing the PredictionFilter interface:
use Sphamster\Contracts\PredictionFilter; use Sphamster\Support\Probability; class MyCustomFilter implements PredictionFilter { public function filter(array $probabilities): array { // Your custom filtering logic return array_filter($probabilities, function(Probability $p) { return exp($p->log()) > 0.5; }); } } $predictions = $classifier->predict('Sample text', new MyCustomFilter());
Advanced Usage
Customizing the Tokenizer
By default, the classifier uses DefaultTokenizer which:
- Converts text to lowercase
- Extracts only alphabetic characters
- Does NOT remove stopwords or perform stemming
To use your own custom tokenizer, create a class that implements the Tokenizer interface and pass an instance of it to the constructor. For example:
<?php use Sphamster\Contracts\Tokenizer; class MyCustomTokenizer implements Tokenizer { public function tokenize(string $text): array { // Define your custom stopwords $stopwords = ['the', 'and', 'is', 'are', 'was', 'were']; // Build a regex pattern to match stopwords $pattern = '~\b(' . implode('|', array_map('preg_quote', $stopwords)) . ')\b~i'; // Convert the text to lowercase and remove stopwords $clean_text = preg_replace($pattern, '', mb_strtolower($text)); // Extract tokens consisting only of alphabetic characters preg_match_all('/[[:alpha:]]+/u', $clean_text, $matches); return $matches[0] ?? []; } } // Instantiate your custom tokenizer and pass it to SingleLabelBayes $tokenizer = new MyCustomTokenizer(); $classifier = new \Sphamster\SingleLabelBayes(tokenizer: $tokenizer); // Works with both single-label and multi-label classifiers $multi_label_classifier = new \Sphamster\MultiLabelBayes(tokenizer: $tokenizer);
Working with Probabilities
Access raw probability scores for deeper analysis:
// Get probabilities for all categories $probabilities = $classifier->probabilities('This is amazing!'); foreach ($probabilities as $probability) { echo sprintf( "Category: %s, Log Probability: %.4f, Probability: %.4f\n", $probability->category(), $probability->log(), exp($probability->log()) ); }
Running Tests
# Run all tests (refactor, lint, types, unit) composer test # Run individual test suites composer test:unit # PHPUnit/Pest tests composer test:types # PHPStan static analysis composer test:lint # PSR-12 code style check composer test:refactor # Rector refactoring rules # Auto-fix code style issues composer lint # Format code with Pint composer refactor # Apply Rector rules
Credits
- Author: Andrea Civita
- Original Library: niiknow/bayes
- Built with ❤️ by Sphamster
License
The MIT License (MIT). Please see License File for more information.
Need help? Open an issue on GitHub
⭐ Found this useful? Give it a star on GitHub!
