hichxm/hyperloglog

A PHP implementation of HyperLogLog algorithm

Maintainers

Package info

github.com/hichxm/HyperLogLog

pkg:composer/hichxm/hyperloglog

Transparency log

Statistics

Installs: 0

Dependents: 0

Suggesters: 0

Stars: 0

Open Issues: 0

dev-master 2026-07-02 21:23 UTC

This package is auto-updated.

Last update: 2026-07-02 21:24:02 UTC


README

A lightweight and dependency-free PHP implementation of the HyperLogLog probabilistic data structure for approximate cardinality estimation.

HyperLogLog allows you to estimate the number of distinct elements in very large datasets while using only a small, fixed amount of memory.

Features

  • 🚀 Fast approximate distinct counting
  • 📦 Zero dependencies
  • 🔧 Configurable number of registers (counterBits)
  • 🔐 Configurable hashing algorithm (xxh3, sha256, md5, etc.)
  • 📊 Theoretical error rate calculation
  • 🧮 Small and large cardinality bias corrections
  • ✅ Strict types and fully documented source code

Requirements

  • PHP 8.0, 8.1, 8.2, 8.3, 8.4 and 8.5
  • No external dependencies

Note: The xxh3 and xxh128 hash algorithms are available only when supported by your PHP version and build. If unavailable, you can use any other algorithm returned by hash_algos(), such as sha256, sha512, or md5.

Installation

Install via Composer:

composer require hichxm/hyperloglog

Basic Usage

<?php

use Hichxm\HyperLogLog\HyperLogLog;

$hll = new HyperLogLog();

$hll->add('apple');
$hll->add('banana');
$hll->add('orange');
$hll->add('apple'); // duplicate

echo $hll->count();

The returned value is an approximation of the number of unique elements.

Constructor

new HyperLogLog(
    int $counterBits = 5,
    string $hashAlgorithm = 'xxh3'
);

Parameters

Parameter Description
counterBits Number of bits used to select registers. The number of registers is 2^counterBits.
hashAlgorithm Any hashing algorithm supported by PHP's hash() function.

Increasing counterBits improves accuracy while increasing memory usage.

Example:

$hll = new HyperLogLog(
    counterBits: 10,
    hashAlgorithm: 'sha256'
);

Accuracy

The theoretical standard error is:

1.04 / √m

where:

m = 2^counterBits

Example:

$error = $hll->theoreticalErrorRate($hll->getM());

Supported Hash Algorithms

Any algorithm supported by PHP can be used.

Examples include:

  • xxh3 (recommended when available)
  • xxh128
  • sha256
  • sha512
  • md5
  • sha1

You can list available algorithms using:

print_r(hash_algos());

Public API

Add an element

$hll->add('user-123');

Estimate the number of distinct values

$count = $hll->count();

Calculate the theoretical error

$error = $hll->theoreticalErrorRate($hll->getM());

Measure estimation error

$error = $hll->measureError($estimated, $actual);

When to Use HyperLogLog

HyperLogLog is well suited for:

  • Counting unique visitors
  • Counting unique IP addresses
  • Analytics pipelines
  • Large log processing
  • Stream processing
  • Database statistics
  • Big data applications

It is not appropriate when an exact distinct count is required.

References

License

MIT