ankane / libmf
Large-scale sparse matrix factorization for PHP
Requires
- php: >= 8.1
- ext-ffi: >= 8.1
Requires (Dev)
- phpunit/phpunit: ^10
LIBMF - large-scale sparse matrix factorization - for PHP
Check out Disco for higher-level collaborative filtering
Installation
Run:
composer require ankane/libmf
Add scripts to composer.json to download the shared library:
"scripts": {
    "post-install-cmd": "Libmf\\Vendor::check",
    "post-update-cmd": "Libmf\\Vendor::check"
}
And run:
composer install
Getting Started
Prep your data in the format rowIndex, columnIndex, value
$data = new Libmf\Matrix();
$data->push(0, 0, 5.0);
$data->push(0, 2, 3.5);
$data->push(1, 1, 4.0);
Create a model
$model = new Libmf\Model();
$model->fit($data);
Make predictions
$model->predict($rowIndex, $columnIndex);
Get the latent factors (these approximate the training matrix)
$model->p();
$model->q();
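As a rough illustration of how latent factors approximate a matrix entry, the sketch below takes the inner product of one row of P and one row of Q in plain PHP. The arrays and the dotProduct helper are hypothetical, and the layout (one row per matrix row or column, one value per latent factor) is an assumption, not something taken from the library.

```php
<?php
// Hypothetical factor matrices with 2 latent factors each.
// Assumption: one row per matrix row (P) or matrix column (Q).
$p = [[0.5, 1.0], [1.5, 0.2]];
$q = [[2.0, 1.0], [0.4, 3.0], [1.0, 1.0]];

// Inner product of two equally sized factor vectors.
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

// Estimate for row 1, column 2 of the approximated matrix.
$estimate = dotProduct($p[1], $q[2]);
echo $estimate, "\n";
```

With more latent factors the vectors get longer, but the estimate is still a single inner product per entry.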
Get the bias (average of all elements in the training matrix)
$model->bias();
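For reference, the average described above can be computed by hand from the same rowIndex, columnIndex, value triples. This is a minimal sketch in plain PHP; the meanValue helper is hypothetical and is not part of the library.

```php
<?php
// Entries in the format rowIndex, columnIndex, value.
$entries = [
    [0, 0, 5.0],
    [0, 2, 3.5],
    [1, 1, 4.0],
];

// Average of the value column (index 2) across all entries.
function meanValue(array $entries): float
{
    return array_sum(array_column($entries, 2)) / count($entries);
}

echo meanValue($entries), "\n";
```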
Save the model to a file
$model->save('model.txt');
Load the model from a file
$model = Libmf\Model::load('model.txt');
Pass a validation set
$model->fit($data, $validSet);
Cross-Validation
Perform cross-validation
$model->cv($data);
Specify the number of folds
$model->cv($data, 5);
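To show what the folds mean, here is a sketch of a k-fold split in plain PHP: each fold is held out once for validation while the rest is used for training. The kFoldSplits helper is hypothetical and assigns entries round-robin without shuffling, so the library's own partitioning may well differ.

```php
<?php
// Split entries into $folds train/validation pairs.
// Entry $i goes to the validation set of fold ($i % $folds).
function kFoldSplits(array $entries, int $folds): array
{
    $entries = array_values($entries);
    $splits = [];
    for ($fold = 0; $fold < $folds; $fold++) {
        $train = [];
        $valid = [];
        foreach ($entries as $i => $entry) {
            if ($i % $folds === $fold) {
                $valid[] = $entry;
            } else {
                $train[] = $entry;
            }
        }
        $splits[] = ['train' => $train, 'valid' => $valid];
    }
    return $splits;
}

// Ten hypothetical entries in the format rowIndex, columnIndex, value.
$entries = [];
for ($i = 0; $i < 10; $i++) {
    $entries[] = [$i, $i % 3, 1.0 + $i];
}

foreach (kFoldSplits($entries, 5) as $n => $split) {
    echo "fold $n: ", count($split['train']), " train, ", count($split['valid']), " valid\n";
}
```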
Parameters
Pass parameters - default values below
use Libmf\Loss;

new Libmf\Model(
    loss: Loss::RealL2,     // loss function
    factors: 8,             // number of latent factors
    threads: 12,            // number of threads used
    bins: 25,               // number of bins
    iterations: 20,         // number of iterations
    lambdaP1: 0,            // coefficient of L1-norm regularization on P
    lambdaP2: 0.1,          // coefficient of L2-norm regularization on P
    lambdaQ1: 0,            // coefficient of L1-norm regularization on Q
    lambdaQ2: 0.1,          // coefficient of L2-norm regularization on Q
    learningRate: 0.1,      // learning rate
    alpha: 1,               // importance of negative entries
    c: 0.0001,              // desired value of negative entries
    nmf: false,             // perform non-negative MF (NMF)
    quiet: false            // no outputs to stdout
);
Loss Functions
For real-valued matrix factorization
- Loss::RealL2 - squared error (L2-norm)
- Loss::RealL1 - absolute error (L1-norm)
- Loss::RealKL - generalized KL-divergence
For binary matrix factorization
- Loss::BinaryLog - logarithmic error
- Loss::BinaryL2 - squared hinge loss
- Loss::BinaryL1 - hinge loss
For one-class matrix factorization
- Loss::OneClassRow - row-oriented pair-wise logarithmic loss
- Loss::OneClassCol - column-oriented pair-wise logarithmic loss
- Loss::OneClassL2 - squared error (L2-norm)
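The per-entry losses named above can be sketched with their textbook definitions in plain PHP. The function names are hypothetical, binary labels are assumed to be +1 or -1, and regularization is left out; the library's exact formulation may differ.

```php
<?php
// Real-valued losses: $actual is the observed value, $predicted the model output.
function squaredError(float $actual, float $predicted): float
{
    return ($actual - $predicted) ** 2;
}

function absoluteError(float $actual, float $predicted): float
{
    return abs($actual - $predicted);
}

// Binary losses: $label is assumed to be +1 or -1, $score is the raw model output.
function hingeLoss(float $label, float $score): float
{
    return max(0.0, 1.0 - $label * $score);
}

function squaredHingeLoss(float $label, float $score): float
{
    return hingeLoss($label, $score) ** 2;
}

function logisticLoss(float $label, float $score): float
{
    return log(1.0 + exp(-$label * $score));
}

echo squaredError(4.0, 3.5), "\n";
echo hingeLoss(1.0, 0.25), "\n";
```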
Metrics
Calculate RMSE (for real-valued MF)
$model->rmse($data);
Calculate MAE (for real-valued MF)
$model->mae($data);
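For intuition, RMSE and MAE can be computed by hand from pairs of actual and predicted values. The sketch below uses the standard definitions in plain PHP; the helper names and sample pairs are hypothetical and not part of the library.

```php
<?php
// Each pair is [actual, predicted].
function rootMeanSquaredError(array $pairs): float
{
    $sum = 0.0;
    foreach ($pairs as [$actual, $predicted]) {
        $sum += ($actual - $predicted) ** 2;
    }
    return sqrt($sum / count($pairs));
}

function meanAbsoluteError(array $pairs): float
{
    $sum = 0.0;
    foreach ($pairs as [$actual, $predicted]) {
        $sum += abs($actual - $predicted);
    }
    return $sum / count($pairs);
}

$pairs = [[5.0, 4.5], [3.5, 3.0], [4.0, 5.0]];
echo rootMeanSquaredError($pairs), "\n";
echo meanAbsoluteError($pairs), "\n";
```

RMSE penalizes large errors more heavily than MAE, so the two can rank models differently.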
Calculate generalized KL-divergence (for non-negative real-valued MF)
$model->gkl($data);
Calculate logarithmic loss (for binary MF)
$model->logloss($data);
Calculate accuracy (for binary MF)
$model->accuracy($data);
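As a rough picture of accuracy for binary MF, the sketch below counts how often the sign of a prediction matches the sign of the label. It assumes positive values mean the positive class and everything else the negative class; the signAccuracy helper is hypothetical, and the library's handling of edge cases such as a score of exactly zero may differ.

```php
<?php
// Each pair is [label, score]. A prediction counts as correct when
// the score falls on the same side of zero as the label.
function signAccuracy(array $pairs): float
{
    $correct = 0;
    foreach ($pairs as [$label, $score]) {
        if (($label > 0 && $score > 0) || ($label <= 0 && $score < 0)) {
            $correct++;
        }
    }
    return $correct / count($pairs);
}

$pairs = [[1, 0.8], [-1, 0.3], [-1, -2.0], [1, -0.1]];
echo signAccuracy($pairs), "\n";
```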
Calculate MPR (for one-class MF)
$model->mpr($data, $transpose);
Calculate AUC (for one-class MF)
$model->auc($data, $transpose);
Example
Download the MovieLens 100K dataset and use:
$trainSet = new Libmf\Matrix();
$validSet = new Libmf\Matrix();

if (($handle = fopen('u.data', 'r')) !== false) {
    $i = 0;
    while (($row = fgetcsv($handle, separator: "\t")) !== false) {
        $data = $i < 80000 ? $trainSet : $validSet;
        $data->push($row[0], $row[1], $row[2]);
        $i++;
    }
    fclose($handle);
}

$model = new Libmf\Model(factors: 20);
$model->fit($trainSet, $validSet);
echo $model->rmse($validSet), "\n";
Resources
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/libmf-php.git
cd libmf-php
composer install
composer test