shipmonk/copy-paste-detector

Finds duplicated PHP code structures using AST-based analysis. Inspired by CloneDR, it detects Type-2 (parameterized) code clones.


Package info

github.com/shipmonk-rnd/copy-paste-detector

pkg:composer/shipmonk/copy-paste-detector


dev-master 2026-05-12 10:21 UTC

README

An AST-based structural code clone detector for PHP, inspired by the CloneDR methodology. This tool efficiently detects Type-2 (parameterized) code clones using AST analysis and hash-based exact matching.
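The core idea of hash-based Type-2 matching can be sketched in a few lines of PHP. This is a minimal illustration, not the tool's implementation: the hypothetical `SketchNode` class stands in for a parsed AST, candidate subtrees are assumed to be already extracted, and only variable names are anonymized (mirroring `setAnonymizeVariables(true)`). Subtrees that share a hash and have at least `$minNodeCount` nodes form a clone group, which corresponds to the role of `--min-node-count`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a parsed AST node (the real tool parses PHP source).
final class SketchNode
{
    /** @param list<SketchNode> $children */
    public function __construct(
        public string $type,
        public ?string $name = null,
        public array $children = [],
    ) {
    }
}

// Hash a subtree bottom-up. Variable names are replaced by a placeholder,
// so `$foo = strlen($bar)` and `$x = strlen($y)` hash identically.
function structuralHash(SketchNode $node): string
{
    $label = $node->type === 'Variable' ? 'Variable:*' : $node->type . ':' . ($node->name ?? '');
    $childHashes = [];
    foreach ($node->children as $child) {
        $childHashes[] = structuralHash($child);
    }
    return md5($label . '(' . implode(',', $childHashes) . ')');
}

// Count the nodes in a subtree (compared against the minimum node count).
function nodeCount(SketchNode $node): int
{
    $count = 1;
    foreach ($node->children as $child) {
        $count += nodeCount($child);
    }
    return $count;
}

/**
 * Group candidate subtrees by hash and keep only groups with 2+ members.
 *
 * @param array<string, SketchNode> $subtrees location => subtree
 * @return array<string, list<string>> hash => locations
 */
function findClones(array $subtrees, int $minNodeCount): array
{
    $groups = [];
    foreach ($subtrees as $location => $tree) {
        if (nodeCount($tree) >= $minNodeCount) {
            $groups[structuralHash($tree)][] = $location;
        }
    }
    return array_filter($groups, static fn (array $locations): bool => count($locations) > 1);
}

// Usage: $a and $b differ only in variable names; $c calls a different function.
$a = new SketchNode('Assign', null, [new SketchNode('Variable', 'foo'), new SketchNode('FuncCall', 'strlen', [new SketchNode('Variable', 'bar')])]);
$b = new SketchNode('Assign', null, [new SketchNode('Variable', 'x'), new SketchNode('FuncCall', 'strlen', [new SketchNode('Variable', 'y')])]);
$c = new SketchNode('Assign', null, [new SketchNode('Variable', 'x'), new SketchNode('FuncCall', 'count', [new SketchNode('Variable', 'y')])]);

$clones = findClones(['a.php:1' => $a, 'b.php:7' => $b, 'c.php:3' => $c], 3);
print_r(array_values($clones)); // one group: a.php:1 and b.php:7
```

Hashing each subtree once and grouping by exact hash avoids pairwise tree comparison, which is what makes this style of detection cheap on large codebases.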

Installation

composer require --dev shipmonk/copy-paste-detector

Basic Usage

vendor/bin/copy-paste-detector src/

CLI Options

  • --config=config.php or -c config.php

    • Path to configuration file
    • Defaults to copy-paste-detector.php in the current directory
    • Configuration file must return a CopyPasteDetector\Config\Config instance
  • --min-node-count=100 or -m 100

    • Minimum number of AST nodes for a subtree to be considered
    • Defaults to 50
  • --cache-dir=cache/

    • Directory for caching parsed structures
    • Defaults to the system temp directory
  • --patch=changes.patch

    • Path to a git diff/patch file (extension must be .patch or .diff)
    • Only reports clone groups with at least one instance fully inside the patch's added lines.
    • Requires sebastian/diff to be installed.
  • --ansi / --no-ansi

    • Force enable or disable ANSI color output
    • By default, colors are auto-detected based on the terminal

Check whether an MR copies code from elsewhere

git diff master...HEAD > changes.patch
vendor/bin/copy-paste-detector --patch=changes.patch src/ tests/
  • A clone group is reported if at least one instance lies fully inside the patch's added lines. The other instances may be either elsewhere in the codebase or also inside the patch.
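The reporting rule for `--patch` can be expressed as a small predicate. This is a hypothetical sketch, not the tool's code: it assumes the patch has already been parsed into a map of file path to added line numbers, and that each clone instance is a `[file, startLine, endLine]` triple. The function names `isFullyAdded()` and `shouldReportGroup()` are illustrative only.

```php
<?php

declare(strict_types=1);

/**
 * True if every line of the instance was added by the patch.
 *
 * @param array{string, int, int} $instance [file, startLine, endLine]
 * @param array<string, list<int>> $addedLines file => added line numbers
 */
function isFullyAdded(array $instance, array $addedLines): bool
{
    [$file, $start, $end] = $instance;
    $added = array_flip($addedLines[$file] ?? []); // line number => index, for O(1) lookups
    for ($line = $start; $line <= $end; $line++) {
        if (!isset($added[$line])) {
            return false;
        }
    }
    return true;
}

/**
 * A group is reported if at least one instance lies fully inside the added lines.
 *
 * @param list<array{string, int, int}> $group
 * @param array<string, list<int>> $addedLines
 */
function shouldReportGroup(array $group, array $addedLines): bool
{
    foreach ($group as $instance) {
        if (isFullyAdded($instance, $addedLines)) {
            return true;
        }
    }
    return false;
}

$addedLines = ['src/New.php' => range(10, 30)];

// The src/New.php instance lies entirely on added lines 12-27: reported.
$copied = [['src/Old.php', 5, 20], ['src/New.php', 12, 27]];
// Neither instance touches the patch: not reported.
$untouched = [['src/Old.php', 5, 20], ['src/Other.php', 40, 55]];
// Lines 31-40 were not added, so the instance is only partly new: not reported.
$partial = [['src/Old.php', 5, 20], ['src/New.php', 25, 40]];

var_dump(shouldReportGroup($copied, $addedLines));    // bool(true)
var_dump(shouldReportGroup($untouched, $addedLines)); // bool(false)
var_dump(shouldReportGroup($partial, $addedLines));   // bool(false)
```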

Configuration File

Create a copy-paste-detector.php file in your project root to configure detection settings:

<?php

use CopyPasteDetector\Config\Config;

$config = new Config();

// Set paths to analyze
$config->setPaths(['src/', 'tests/']);

// Set the minimum node count for clone detection
$config->setMinNodeCount(50);

// Set cache directory (optional, defaults to system temp directory)
$config->setCacheDir('cache/copy-paste-detector/');

// Exclude paths from analysis
$config->setExcludePaths(['tests/_fixtures', 'src/Generated/']);

// Enable clickable links to your IDE
$config->setEditorUrl('phpstorm://open?file={file}&line={line}');

// Configure anonymization strategies
$config->setAnonymizeVariables(true); // treat variable names like `$foo` and `$bar` as equivalent
$config->setAnonymizeLiterals(false); // treat string and number literals as equivalent
$config->setAnonymizeNames(false); // treat function and class names as equivalent
$config->setAnonymizeIdentifiers(false); // treat method and constant names as equivalent

return $config;
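To see which differences `setAnonymizeVariables()` and `setAnonymizeLiterals()` make irrelevant, here is a token-level illustration using PHP's built-in `token_get_all()`. It is only an approximation: the tool compares AST subtrees, not token streams, and the helper `normalize()` is hypothetical. Name and identifier anonymization are left out because distinguishing them needs AST context.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical token-level normalizer; NOT how the tool works internally.
 * Replaces variables and/or literals with placeholders and drops whitespace.
 */
function normalize(string $code, bool $anonymizeVariables, bool $anonymizeLiterals): string
{
    $out = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        if (!is_array($token)) {
            $out[] = $token; // single-character tokens such as ';' or '*'
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_OPEN_TAG || $id === T_WHITESPACE) {
            continue;
        }
        if ($anonymizeVariables && $id === T_VARIABLE) {
            $text = '$_';
        } elseif ($anonymizeLiterals && in_array($id, [T_CONSTANT_ENCAPSED_STRING, T_LNUMBER, T_DNUMBER], true)) {
            $text = '#lit';
        }
        $out[] = $text;
    }
    return implode(' ', $out);
}

$a = '$total = $price * 2;';
$b = '$sum = $cost * 3;';

// Variables anonymized, literals not (as in the sample config): 2 vs 3 still differs.
var_dump(normalize($a, true, false) === normalize($b, true, false)); // bool(false)
// With literals anonymized as well, the two statements match.
var_dump(normalize($a, true, true) === normalize($b, true, true));   // bool(true)
```

Enabling more anonymization options makes matching looser, so more (and potentially noisier) clone groups are reported.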

Contributing

  • Check your code with composer check
  • Autofix coding style with composer fix:cs
  • All functionality must be tested