bishopb / pattern
A string-matching PHP library sporting a consistent, fluent API.
Requires
- php: ~5.3
Requires (Dev)
- athletic/athletic: ~0.1
- facebook/xhprof: dev-master@dev
- mockery/mockery: 0.9.4
- phpunit/php-timer: ~1.0
- phpunit/phpunit: ~4.0
- phpunit/test-listener-xhprof: 1.0.*@dev
This package is auto-updated.
Last update: 2024-10-29 04:47:21 UTC
README
Warning: This API does not exist. After implementing it and measuring the performance, I decided the overhead did not meet my goals and abandoned that code. The situation may be different in PHP 8. If you like this API and want to see it come alive, please implement it and submit a PR.
Pattern is a string-matching PHP library sporting a consistent, fluent API.
Pattern unifies the API for strcmp and family, fnmatch, preg_match, and version_compare, while also offering convenience methods for common string-matching operations.
Pattern might be for you if:
- You want readable string-matching code that clearly describes your intent.
- You're frustrated that there's no simple, built-in implementation to find if a string ends with another.
- You want to avoid silly off-by-one errors when doing simple string checks.
- You want the best performing algorithm for a particular kind of string check, regardless of whether that's strstr or strpos.
- You're tired of referring to the PHP user manual for the argument order of strpos and friends.
Quickstart
Install with Composer: composer require bishopb/pattern:dev-master
Use:
    use BishopB\Pattern;

    // common matching API regardless of pattern language
    $subjects = array ('Capable', 'Enabler', 'Able');
    $patterns = array (
        new Pattern\Literal('Able'),
        new Pattern\Wildcard('*able*'),
    );
    foreach ($subjects as $subject) {
        foreach ($patterns as $pattern) {
            $pattern->matches($subject) and print "$pattern matches $subject\n";
        }
    }

    // literal matching sugar
    // (parentheses around `new` are required before PHP 8.4)
    $able = (new Pattern\Literal('Able'))->fold();
    $able->foundIn('tablet');
    $able->begins('Abletic');
    $able->ends('Parable');
    $able->sorts->before('active');
    $able->sorts->after('aardvark');

    // version matching sugar
    $stable = new Pattern\Version('1.0.0');
    $stable->matches('1.0.0');
    $stable->before('1.0.1');
    $stable->after('0.9.9');
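For orientation, the built-in calls that this sugar is meant to stand in for look roughly like the following. This is a sketch using PHP built-ins only; the mapping from each built-in to a Pattern method is an assumption based on the method names, not documented behavior.

```php
<?php
// Assumed built-in equivalents of the quickstart sugar (illustrative only).

// Case-insensitive substring check, comparable to a folded literal's foundIn().
// stripos() can return 0 for a match at the start, so compare against false.
$foundIn = stripos('tablet', 'Able') !== false;

// Case-insensitive ordering, comparable to sorts->before().
$sortsBefore = strcasecmp('Able', 'active') < 0;

// Version ordering, comparable to Version('1.0.0')->before() and ->after().
$versionBefore = version_compare('1.0.0', '1.0.1', '<');
$versionAfter  = version_compare('1.0.0', '0.9.9', '>');
$versionSame   = version_compare('1.0.0', '1.0.0', '==');
```

Each call needs a different comparison against its return value (`!== false`, `< 0`, or an operator argument), which is the inconsistency the fluent API is meant to hide.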
Motivation
In PHP, developers have four common ways to match strings to patterns:
There are a few problems with these APIs:
- $pattern is a plain old string, which means you can make probable mistakes like: fnmatch('^foo.*bar', $input)
- strcmp and family return an orderable result that doesn't encourage intentional programming. Consider: if (! strcasecmp('foo', $input)) { echo 'pop quiz: matches foo?'; }
- Functions to perform literal comparisons are scattered all over the place: strcmp, strcasecmp, strpos, stripos, etc.
- Both strcasecmp and == are dangerous ways to compare strings.
- It can be difficult to remember which argument is the pattern and which is the subject (compare strpos and preg_match).
- How one specifies "case-insensitive" varies widely amongst the comparison functions.
- If your code initially accepts literal matches, and you later want to support regular expressions, you have to rewrite your code.
- Not every platform supports fnmatch.
This library provides a fast, thin abstraction over the built-in pattern matching functions to mitigate these problems.
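To make the idea of a thin abstraction concrete, here is a minimal sketch built only on PHP built-ins. The function names `sketch_begins`, `sketch_ends`, and `sketch_found_in` are hypothetical and are not part of this library; they only illustrate how a small wrapper can fix the argument order (pattern first, subject second) and turn each built-in's return value into a plain boolean.

```php
<?php
// Hypothetical helpers, not this library's API. Each hides a built-in whose
// argument order or return value is easy to get wrong.

// True when $subject starts with $pattern (case-sensitive).
function sketch_begins($pattern, $subject)
{
    return strncmp($subject, $pattern, strlen($pattern)) === 0;
}

// True when $subject ends with $pattern (case-sensitive).
function sketch_ends($pattern, $subject)
{
    $length = strlen($pattern);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    if ($length > strlen($subject)) {
        return false; // pattern cannot fit inside the subject
    }
    // A negative offset compares the last $length bytes of $subject.
    return substr_compare($subject, $pattern, -$length) === 0;
}

// True when $pattern occurs anywhere in $subject (case-sensitive).
function sketch_found_in($pattern, $subject)
{
    if ($pattern === '') {
        return true; // avoid the "empty needle" warning in older PHP
    }
    // strpos() returns 0 for a match at the start, so compare against false.
    return strpos($subject, $pattern) !== false;
}
```

PHP 8 later added `str_starts_with`, `str_ends_with`, and `str_contains`, which cover these three cases natively.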
Performance
This package's philosophy is simple: to deliver syntactic sugar with minimal run-time fat. API calls are a thin facade over the fastest implementation of the requested match. Space is conserved as much as possible.
Run-time benchmarks
Measurements for different tests in operations per second.
Peak-memory consumption benchmarks
Note: All benchmarks run a minimum of 1,000 times on a small, unloaded EC2 instance using PHP 5.3. Refer to tests/*Event.php for actual code. Refer to the Travis CI builds for run times on different PHP versions.
Advanced usage
Manipulating the search subjects
Typically, methods in the pattern classes (Literal, Wildcard, and Pcre) take strings. However, you can also pass instances of Subject, which is a lightweight string class equipped with methods common to string comparison:
    use BishopB\Pattern;

    // (parentheses around `new` are required before PHP 8.4)
    $device  = (new Pattern\Literal('Tablet'))->fold();
    $version = new Pattern\Version('8.1');
    $subject = (new Pattern\Subject(' Microsoft Tablet running Windows 8.1.0RC242.'))->trim();

    $device->matches(
        $subject->column(' ', 1) // explode at space and get the 1st index (0-based)
    );
    $version->after(
        $subject
            ->column(' ', -1)  // explode at space and get the last index (nth-from last)
            ->substring(0, 4)  // only the first 5 characters
    );
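As a rough illustration of what a column-style helper does, here is a sketch using only built-ins. The function `sketch_column` is hypothetical; its handling of negative and out-of-range indexes is an assumption, not the library's documented behavior.

```php
<?php
// Hypothetical stand-in for a column() method: split on a delimiter and
// return one piece. Negative indexes count from the end (assumed semantics).
function sketch_column($string, $delimiter, $index)
{
    $pieces = explode($delimiter, $string);
    if ($index < 0) {
        $index += count($pieces);
    }
    // Return null when the requested piece does not exist.
    return isset($pieces[$index]) ? $pieces[$index] : null;
}

$subject = trim(' Microsoft Tablet running Windows 8.1.0RC242.');
$device  = sketch_column($subject, ' ', 1);  // second piece
$release = sketch_column($subject, ' ', -1); // last piece
```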
Faster searching of big text or with repeated searches
When your subject text is long, or you expect to compare your literal pattern to many different subjects, it's worth it to "study" the literal pattern for improved performance.
    // notice the use of study()
    // without this, searching would be much slower
    // (parentheses around `new` are required before PHP 8.4)
    $zebra = (new Literal('zebra'))->fold()->study();
    $words = file_get_contents('/usr/share/dict/words') or die('No dictionary');
    $zebra->foundIn($words);
You may be wondering: how many characters is "long"? Or, how many iterations is "many"? Well, I suppose it depends. But, a long time ago, some PHP internals benchmarking suggested that a length of 5,000 or more would make studying worthwhile.
FAQ
Why not just use the built-ins?
For the reasons mentioned above. Personally, I wrote this library because I kept referring to the official docs on the argument order for the built-ins and because common use cases aren't handled concisely. In summary, this library lets me write less code and be more clear in meaning.
For example, I see a lot of code following this pattern:
    if (! strcmp($actual, $expected)) {
        $this->doSomething();
    } else {
        throw new \RuntimeException('Actual does not match expected');
    }
It's technically right. But, to me, it looks wrong. I find this much easier to read:
    if ($actual->matches($expected)) {
        $this->doSomething();
    } else {
        throw new \RuntimeException('Actual does not match expected');
    }
There is a related side benefit. In weak-mode PHP before version 8, built-in functions that receive an invalid parameter emit a warning and return null (PHP 8 throws a TypeError instead). Since null evaluates falsey, the strcmp example above runs doSomething unexpectedly. Consider:
    // ?password=[]
    if (! strcmp($_GET['password'], $user->password)) {
        $this->login($user);
    } else {
        throw new \RuntimeException('Invalid password');
    }
Why? Because strcmp(array (), '******') returns null, and true === (! null). In this library, an exception is raised if you try to match against an array.
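A minimal sketch of such a guard, using only built-ins, might look like this. The function name `sketch_equals` and the choice of `InvalidArgumentException` are assumptions for illustration, not this library's implementation.

```php
<?php
// Hypothetical helper: refuse non-string input up front instead of letting
// strcmp() return null (PHP 5/7) or throw a TypeError (PHP 8).
function sketch_equals($pattern, $subject)
{
    if (!is_string($pattern) || !is_string($subject)) {
        throw new \InvalidArgumentException('Pattern and subject must both be strings');
    }
    // Compare against 0 explicitly so the intent is unambiguous.
    return strcmp($subject, $pattern) === 0;
}

// Simulates a request where the password parameter arrives as an array.
$submitted = array();
try {
    $loggedIn = sketch_equals('secret', $submitted);
} catch (\InvalidArgumentException $e) {
    $loggedIn = false; // the malformed request is rejected, not logged in
}
```

Because the guard throws before any comparison happens, a falsey return value can never be mistaken for a match.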
Why not add more stringy methods, like length(), to Subject?
The package overall aims to support pattern matching in the lightest-weight way possible. Bulking up Subject with methods unrelated to pattern matching conflicts with this goal.