decatur-vote / text-diff
A minimal library that produces diffs in a fairly human readable, plain-text format. Diffs allow converting old text to new text & visa versa.
Installs: 15
Dependents: 1
Suggesters: 0
Security: 0
Stars: 0
Forks: 1
pkg:composer/decatur-vote/text-diff
Requires (Dev)
- taeluf/code-scrawl: v0.8.x-dev
- taeluf/phtml: v0.1.x-dev
- taeluf/tester: v0.3.x-dev
Suggests
- taeluf/phtml: Used by TextDiff to beautify HTML, so that HTML is multi-line & can be diff'd.
This package is auto-updated.
Last update: 2025-10-29 03:12:17 UTC
README
PHP Text Diffs
A minimal library that produces diffs in a fairly human readable, plain-text format. Diffs allow converting old text to new text & visa versa.
Installation
composer require decatur-vote/text-diff v1.0.x-dev
or in your composer.json
{"require":{ "decatur-vote/text-diff": "v1.0.x-dev"}}
Usage
Generate a diff, and move forward or backward with the operations (opps).
<?php
$differ = new \DecaturVote\Differ();
$old = 'old text';
// $old = $differ->fix_html($old); # optional - will beautify html using DOMDocument, via composer package taeluf/phtml
$new = 'new text';
$opps = $differ->get_opps($old, $new);
// file_put_contents('file_path.txt', $opps)
$recovered_old_text = $differ->backward($new, $opps);
$restored_new_text = $differ->forward($old, $opps);
Notes
- Takes 122ms (on my machine) to generate diffs for two 15,000 character strings and then run both
forward()andbackward(). Run viaphptest LargeRandomStringDiffs phptest -test MultipleDiffs -run dup- tests a specific diff test. See the test class.- There are 27 different diff tests that are all passing, several of which were added when bugs were found via LargeRandomStringDiffs test. Most (maybe all) bugs were because of duplicate lines not being handled.
TODO maybe?
- create diff html (I don't know how that'll work with my
moveoperations, so I probably won't do this)
How it works
Overview of how the Differ works. We'll cover what diffs look like, how they're processed, THEN how they're created.
What diffs look like
This will remove lines 1 & 3, add lines 'c' & 'def' at positions 3 & 5, then move line 6 to line 4 & line 7 to line 3.
1-abc
3-baby
3+c
5+def
6>4
7>3
NOTE: 0-based line indexes. The first line is line 0. The second line is line 1. 1-abc means there is a line BEFORE abc
NOTE: remove & add line operations depend on previous instructions having already been applied. 3-baby, above, says 'Remove Line 3, which contains the text "baby".'. However, the original text has baby as line 4. 3-baby is processed AFTER 1-abc ... so the 4th line ("baby") becomes the 3rd line after line 1 is removed, so we get 3-baby
Processing Diffs
Forward diffs are processed from top to bottom, always in this order:
- remove lines
- add lines
- move lines
NOTE: For 6>4, line 4 is replaced by line 6. Line 6 is unchanged, unless a different move opp overwrites line 6.
Backward diffs are processed from bottom to top:
- Move lines
- Remove lines that are listed as
addoperations (ex5+defwould be removed) - Add lines that are listed as
removeoperations (ex1-abcwould be added)
Generating Diffs
Duplicate lines are handled at each stage (most of the complexity of the algorithm comes from managing duplicate lines).
- Generate an array of lines for
$old_textand$new_text - Find any lines in
$old_textthat don't exist in$new_text& generate opps for these lines to be removed - Find any lines in
$new_textthat don't exist in$old_text& generate opps for these lines to be added - Apply the remove & add opps to the
$old_textto create an out-of-order copy of the new text. - Compare each line of the out-of-order new text to the supplied
$new_text. Find the correct location.