detain / php-dup-finder
AST-based PHP duplicate-logic detector and refactoring assistant.
Requires
- php: ^8.1
- ext-gmp: *
- ext-hash: *
- nikic/php-parser: ^5.0
- sebastian/diff: ^5.1 || ^6.0
- sugarcraft/candy-core: dev-master
- sugarcraft/candy-hermit: dev-master
- sugarcraft/candy-kit: dev-master
- sugarcraft/candy-log: dev-master
- sugarcraft/candy-palette: dev-master
- sugarcraft/candy-sprinkles: dev-master
- sugarcraft/candy-zone: dev-master
- sugarcraft/sugar-bits: dev-master
- sugarcraft/sugar-charts: dev-master
- sugarcraft/sugar-crumbs: dev-master
- sugarcraft/sugar-prompt: dev-master
- sugarcraft/sugar-spark: dev-master
- sugarcraft/sugar-stickers: dev-master
- sugarcraft/sugar-table: dev-master
- sugarcraft/sugar-toast: dev-master
- sugarcraft/sugar-veil: dev-master
- symfony/console: ^6.4 || ^7.0
Requires (Dev)
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^10.5
- vimeo/psalm: 6.0.0
Suggests
- sugarcraft/candy-shell: spin/choose/filter/confirm CLI helpers for shell-script integration
- sugarcraft/sugar-wishlist: Example SSH endpoint picker app illustrating SugarCraft component usage
This package is auto-updated.
Last update: 2026-07-01 12:02:45 UTC
README
A semantic clone detector and refactoring assistant for PHP codebases. It behaves more like an "extract function" advisor than a copy/paste finder.
phpdup parses every file in a PHP codebase into an Abstract Syntax
Tree, normalizes those ASTs into a canonical form, and finds clusters
of parameterizable duplication — places where the shape of the
code repeats and only literals, identifiers, method names, table
names, or whole optional segments of code vary.
For each cluster, it does more than point at the duplicates: it shows what the abstraction would look like (its parameter list, types, and a suggested function name), ready to drop into a refactor.
A run on tests/Fixtures returns its top 2 clusters by impact. Each
cluster's "Suggested abstraction" box is the function signature that
phpdup recommends you extract; the "Holes" table lists every
parameter with its inferred type and the values observed across cluster
members. Classic copy/paste detectors only highlight the duplication.
phpdup goes further: in this run it reports that the threshold and the
role string are the parameters of the abstraction, along with their
inferred types and observed values, ready to apply.
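To make "parameterizable duplication" concrete, here is a minimal sketch of the kind of pair such a cluster could contain, plus one possible extraction. The function names, array shape, and values are hypothetical illustrations. They are not taken from tests/Fixtures or from phpdup's actual output.

```php
<?php
// Hypothetical duplicates: identical shape, only the numeric threshold
// and the role string differ.
function canApproveSmallOrder(array $user, float $total): bool
{
    if ($total > 100.0) {
        return $user['role'] === 'manager';
    }
    return true;
}

function canApproveLargeOrder(array $user, float $total): bool
{
    if ($total > 500.0) {
        return $user['role'] === 'director';
    }
    return true;
}

// One possible extraction: the two varying values ("holes") become
// typed parameters. The name and signature are illustrative assumptions.
function canApproveOrder(
    array $user,
    float $total,
    float $threshold,
    string $requiredRole
): bool {
    if ($total > $threshold) {
        return $user['role'] === $requiredRole;
    }
    return true;
}
```

Each original function then reduces to a one-line call, for example `canApproveOrder($user, $total, 100.0, 'manager')`.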
Quick start
Install
    # Download PHAR (recommended)
    curl -sSLO https://github.com/detain/php-dup-finder/releases/latest/download/phpdup.phar
    curl -sSLO https://github.com/detain/php-dup-finder/releases/latest/download/phpdup.phar.sha256
    sha256sum --check phpdup.phar.sha256
    chmod +x phpdup.phar && sudo mv phpdup.phar /usr/local/bin/phpdup

    # Or via Composer (dev dependency)
    composer require --dev detain/php-dup-finder

    # Or from source
    git clone https://github.com/detain/php-dup-finder.git && cd php-dup-finder && composer install
Analyze
    # Scan and see top clusters in CLI
    bin/phpdup analyze src

    # Multi-format output: JSON + HTML report
    bin/phpdup analyze src --json phpdup.json --html phpdup-report --min-impact 30

    # CI gate (exact clones only, ~6s on 3,300-block corpus)
    bin/phpdup analyze src --exact-only --min-impact 50
View report
    # Open HTML report in browser
    open phpdup-report/index.html

    # Or query the JSON
    jq '.clusters | length' phpdup.json
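If jq is not available, the same cluster count can be read from PHP. This is a minimal sketch that assumes only what the jq query implies, namely a top-level `clusters` array in the JSON report. The helper name `countClusters` and the sample JSON are hypothetical, and the report's other fields are not assumed here.

```php
<?php
/**
 * Count the entries of the top-level "clusters" array in a phpdup JSON report.
 * Only the "clusters" key is assumed; everything else in the report is ignored.
 */
function countClusters(string $json): int
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $report = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($report) || !isset($report['clusters']) || !is_array($report['clusters'])) {
        throw new UnexpectedValueException('Report has no "clusters" array.');
    }

    return count($report['clusters']);
}

// Hypothetical sample input; a real report will carry more fields per cluster.
$sample = '{"clusters": [{}, {}]}';
echo countClusters($sample), PHP_EOL;
```

In a real script the input would come from `file_get_contents('phpdup.json')`, and a non-zero count could be turned into a non-zero exit code for a CI gate.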
Key CLI flags
| Flag | Default | Description |
|---|---|---|
| --config, -c | (none) | Load settings from phpdup.json |
| --json FILE | (none) | Structured JSON report |
| --html DIR | (none) | Interactive HTML report |
| --min-impact N | 20 | Minimum cluster impact (≈ duplicated lines) to include |
| --min-block-size N | 8 | Minimum AST node count per block |
| --kinds K1,K2 | all | Block kinds to analyze: method, closure, function, if, for, etc. |
| --exact-only | off | Skip near-duplicate detection (fast Type-1 only) |
| --optional-blocks on/off | on | Enable Type-3 / optional-segment detection |
| --mode strict/default/aggressive | aggressive | Normalization strictness |
| --similarity N | 0.80 | Jaccard threshold for near-duplicate phase |
| --db-aware | off | Enable ORM-aware semantic deduplication (Eloquent, Doctrine, PDO, etc.) |
| --workers N, -j | 0 | Worker count (0 = auto-detect from CPU cores) |
| --auto-tune | off | Probe corpus and pick size-appropriate defaults |
| --tui | off | Show interactive SugarCraft dashboard |
| --watch | off | Re-run analysis on file changes (poll-based) |
Full reference: docs/CLI.md
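The --similarity flag is described as a Jaccard threshold. As a rough illustration of that measure, the sketch below computes the Jaccard index of two sets of strings and compares it with the 0.80 default. What phpdup actually puts in those sets is not stated here, so the token lists and the `jaccard` helper are assumptions for illustration only.

```php
<?php
/**
 * Jaccard index of two sets of strings: |A ∩ B| / |A ∪ B|.
 * Two empty sets are treated as identical (1.0).
 */
function jaccard(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);

    $intersection = count(array_intersect($a, $b));
    $union = count(array_unique(array_merge($a, $b)));

    return $union === 0 ? 1.0 : $intersection / $union;
}

// Hypothetical feature sets for two code blocks.
$left  = ['if', 'gt', 'var', 'return', 'eq'];
$right = ['if', 'gt', 'var', 'return', 'neq'];

$score = jaccard($left, $right);   // 4 shared features out of 6 distinct
$isNearDuplicate = $score >= 0.80; // compared against the default threshold

printf("score=%.3f near-duplicate=%s\n", $score, $isNearDuplicate ? 'yes' : 'no');
```

Under this reading, lowering --similarity admits looser matches into the near-duplicate phase, and raising it keeps only blocks whose feature sets overlap almost entirely.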
Documentation
| Topic | File |
|---|---|
| CLI flag reference | docs/CLI.md |
| HTTP API server (phpdup serve) | docs/SERVER.md |
| CI/CD integration recipes | docs/CI.md |
| JSON config schema | docs/config-schema.json |
| ML pair-similarity sidecar | docs/ML.md |
| ML training corpus format | docs/ml-corpus-format.md |
| Release & distribution process | docs/RELEASE.md |
| Playground front-end | docs/PLAYGROUND.md |
| JetBrains IDE plugin contract | docs/JETBRAINS_PLUGIN.md |
Table of contents
- Features: semantic detection, type-3/4, parameter discovery, pattern tags
- Installation: PHAR, Composer, from source
- Self-update: phpdup self-update
- How it works: 5-stage pipeline, normalization modes, clustering, anti-unification
- Type-3 / optional-segment detection
- Type-4 / behavioural similarity
- ORM- / DB-aware deduplication: --db-aware, --trinity-collapse
- TUI mode: --tui, --theme, keyboard shortcuts
- Watch mode: --watch
- SIGINT soft-cancel
- phpdup serve REST API: health, sync/async analyze, job polling
- Output formats: JSON, HTML, SARIF, GitLab SAST, diff, checkstyle, CSV, Prometheus, time-series, Graphviz, PlantUML, refactor patches, PHPUnit skeletons
- Configuration: phpdup.json schema, per-directory overrides, project profiles
- Programmatic use: use the pipeline from PHP
- Examples: threshold-gated, CRUD, optional segments, strategy dispatch
- Static analysis & validation: PHPStan, Psalm, --validate-config
- Benchmarks: comparative suite, feature matrix, internal scaling
- Architecture: module map
- Testing: PHPUnit suites, coverage
- Performance: complexity, caches
- Roadmap
- FAQ
License
MIT — see LICENSE
