detain/php-dup-finder

AST-based PHP duplicate-logic detector and refactoring assistant.

Maintainers

Package info

github.com/detain/php-dup-finder

pkg:composer/detain/php-dup-finder

Transparency log

Statistics

Installs: 0

Dependents: 0

Suggesters: 0

Stars: 1

Open Issues: 0

v1.0.0 2026-05-11 05:36 UTC

This package is auto-updated.

Last update: 2026-07-01 12:02:45 UTC


README

A semantic clone detector and refactoring assistant for PHP codebases. Behaves more like an "extract function" advisor than a copy/paste finder.

CI codecov PHP Version License

phpdup parses every file in a PHP codebase into an Abstract Syntax Tree, normalizes those ASTs into a canonical form, and finds clusters of parameterizable duplication — places where the shape of the code repeats and only literals, identifiers, method names, table names, or whole optional segments of code vary.

For each cluster it doesn't just point at the duplicates, it tells you what the abstraction would look like — its parameter list, types, and a suggested function name — ready to drop into a refactor.

phpdup CLI scanning a fixture corpus

A run on tests/Fixtures returns its top 2 clusters by impact. Each cluster's "Suggested abstraction" box is the function signature phpdup is recommending you extract; the "Holes" table lists every parameter with its inferred type and the values observed across cluster members. Compare with classic copy/paste detectors that only highlight the duplication; phpdup tells you the threshold and the role string are the parameters of the abstraction with their inferred types and observed values, ready to apply.

Quick start

Install

# Download PHAR (recommended)
curl -sSLO https://github.com/detain/php-dup-finder/releases/latest/download/phpdup.phar
curl -sSLO https://github.com/detain/php-dup-finder/releases/latest/download/phpdup.phar.sha256
sha256sum --check phpdup.phar.sha256
chmod +x phpdup.phar && sudo mv phpdup.phar /usr/local/bin/phpdup

# Or via Composer (dev dependency)
composer require --dev detain/php-dup-finder

# Or from source
git clone https://github.com/detain/php-dup-finder.git && cd php-dup-finder && composer install

Analyze

# Scan and see top clusters in CLI
bin/phpdup analyze src

# Multi-format output: JSON + HTML report
bin/phpdup analyze src --json phpdup.json --html phpdup-report --min-impact 30

# CI gate (exact clones only, ~6s on 3,300-block corpus)
bin/phpdup analyze src --exact-only --min-impact 50

View report

# Open HTML report in browser
open phpdup-report/index.html

# Or query the JSON
jq '.clusters | length' phpdup.json

Key CLI flags

Flag Default Description
--config, -c Load settings from phpdup.json
--json FILE Structured JSON report
--html DIR Interactive HTML report
--min-impact N 20 Minimum cluster impact (≈ duplicated lines) to include
--min-block-size N 8 Minimum AST node count per block
--kinds K1,K2 all Block kinds to analyze: method, closure, function, if, for, etc.
--exact-only off Skip near-duplicate detection (fast Type-1 only)
--optional-blocks on/off on Enable Type-3 / optional-segment detection
--mode strict/default/aggressive aggressive Normalization strictness
--similarity N 0.80 Jaccard threshold for near-duplicate phase
--db-aware off Enable ORM-aware semantic deduplication (Eloquent, Doctrine, PDO, etc.)
--workers N, -j 0 Worker count (0 = auto-detect from CPU cores)
--auto-tune off Probe corpus and pick size-appropriate defaults
--tui off Show interactive SugarCraft dashboard
--watch off Re-run analysis on file changes (poll-based)

Full reference: docs/CLI.md

Documentation

Topic File
CLI flag reference docs/CLI.md
HTTP API server (phpdup serve) docs/SERVER.md
CI/CD integration recipes docs/CI.md
JSON config schema docs/config-schema.json
ML pair-similarity sidecar docs/ML.md
ML training corpus format docs/ml-corpus-format.md
Release & distribution process docs/RELEASE.md
Playground front-end docs/PLAYGROUND.md
JetBrains IDE plugin contract docs/JETBRAINS_PLUGIN.md

Table of contents

License

MIT — see LICENSE