yoeunes / regex-parser
A powerful PCRE regex parser with lexer, AST builder, validation, ReDoS analysis, and syntax highlighting. Zero dependencies, blazing fast, and production-ready.
pkg:composer/yoeunes/regex-parser
Requires
- php: >=8.2
Requires (Dev)
- phpstan/phpstan: ^2.0
- phpstan/phpstan-phpunit: ^2.0
- phpunit/phpunit: ^11.0|^12.0
- psr/cache: ^3.0
- psr/log: ^3.0
- psr/simple-cache: ^3.0
- symfony/config: ^7.4|^8.0
- symfony/console: ^7.4|^8.0
- symfony/dependency-injection: ^7.4|^8.0
- symfony/http-foundation: ^7.4|^8.0
- symfony/http-kernel: ^7.4|^8.0
- symfony/routing: ^7.4|^8.0
- symfony/validator: ^7.4|^8.0
Suggests
- phpstan/extension-installer: To automatically enable the PHPStan rule for regex validation.
- phpstan/phpstan: To run static analysis and detect invalid regex patterns.
- psr/cache: To share AST cache via PSR-6 pools.
- psr/simple-cache: To share AST cache via PSR-16 caches.
This package is auto-updated.
Last update: 2025-12-24 20:28:09 UTC
Treat Regular Expressions as Code.
RegexParser transforms opaque PCRE strings into a structured Abstract Syntax Tree.
It brings static analysis, security auditing, and automated refactoring to PHP's most powerful yet misunderstood tool. Stop treating regexes as magic strings; start treating them as logic.
Core Capabilities
- Deep Parsing — Full support for advanced PCRE2 syntax including subroutines, conditionals, and recursion.
- Security Auditing — Detects Catastrophic Backtracking (ReDoS) risks and vulnerabilities at analysis time.
- Documentation — Automatically generates human-readable explanations, HTML visualizations, and valid sample strings.
- Transformation — Manipulate the AST to optimize or refactor patterns programmatically.
- PCRE Compliance — Validated against official PCRE test corpus for enterprise reliability.
Choose Your Feature
Validate Regex
Quickly check if a regex is valid and get detailed error messages.
use RegexParser\Regex;

$regex = Regex::create();
$result = $regex->validate('/(foo|bar)/');

if ($result->isValid) {
    echo "Valid regex!";
} else {
    echo "Error: " . $result->error;
}
Explain Regex
Generate human-readable explanations of what your regex does.
use RegexParser\Regex;

$regex = Regex::create();
$explanation = $regex->explain('/^([a-z]+)\.([a-z]{2,})$/');

echo $explanation;
// Outputs: Start of string, then capture group containing one or more lowercase letters,
// followed by a literal dot, then capture group containing 2 or more lowercase letters, end of string.
Detect ReDoS
Analyze patterns for catastrophic backtracking vulnerabilities.
use RegexParser\Regex;

$regex = Regex::create();
$analysis = $regex->redos('/(a+)+b/');

if ($analysis->severity->value === 'CRITICAL') {
    echo "High risk of ReDoS attack!";
}
Optimize
Automatically optimize and modernize regex patterns.
use RegexParser\Regex;

$regex = Regex::create();
$optimized = $regex->optimize('/[0-9]+/');

echo $optimized->original;  // /[0-9]+/
echo $optimized->optimized; // /\d+/
Performance Options
For performance-critical applications, you can enable auto-possessivization which converts safe quantifiers to possessive form:
use RegexParser\Regex;

$regex = Regex::create();
$optimized = $regex->optimize('/\d+a/', ['autoPossessify' => true])->optimized;

echo $optimized; // /\d++a/
Note: Auto-possessivization is opt-in for safety. Use only when you understand the performance implications and ensure no backreferences depend on backtracking.
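To illustrate why the note above matters, here is a minimal sketch that uses only PHP's built-in preg_match (not RegexParser itself). The patterns and subjects are illustrative assumptions: in the first pair the quantifier never needs to give characters back, in the second pair the match depends on it doing so.

```php
<?php
// Safe case: "a" can never be matched by \d, so \d+ never has to give characters back.
$safeGreedy     = preg_match('/\d+a/', '123a');   // 1
$safePossessive = preg_match('/\d++a/', '123a');  // 1

// Unsafe case: the trailing \d only matches if \d+ gives back its last digit.
$greedy     = preg_match('/^\d+\d$/', '123');     // 1
$possessive = preg_match('/^\d++\d$/', '123');    // 0: \d++ keeps all three digits

var_dump($safeGreedy, $safePossessive, $greedy, $possessive);
```

A possessive quantifier is only a drop-in replacement when what follows it cannot start with something the quantified item also matches.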
Generate Samples
Create valid test strings that match your regex.
use RegexParser\Regex;

$regex = Regex::create();
$sample = $regex->generate('/[a-z]{3}\d{2}/');

echo $sample; // e.g., "abc12"
Symfony Route Requirements
Integrate with Symfony routing for automatic regex validation.
use RegexParser\Bridge\Symfony\RegexRequirementValidator;

$validator = new RegexRequirementValidator();
$isValid = $validator->validate('/[a-z]+/', 'route_pattern');
PHPStan Integration
Static analysis for regex patterns in your codebase.
# In your PHPStan config
parameters:
    regexParser:
        enabled: true
CI Integration
Automated linting and security checks in your build pipeline.
vendor/bin/regex lint src/ --format=json
- Integration: first-class support for Symfony and PHPStan workflows.
"Think of it as nikic/php-parser, but for regexes."
Installation
composer require yoeunes/regex-parser
Requires PHP 8.2+.
Global CLI (PHAR)
Install a standalone regex binary anywhere on your PATH:
curl -Ls https://github.com/yoeunes/regex-parser/releases/latest/download/regex.phar -o ~/.local/bin/regex && chmod +x ~/.local/bin/regex
Or with wget:
wget -O ~/.local/bin/regex https://github.com/yoeunes/regex-parser/releases/latest/download/regex.phar && chmod +x ~/.local/bin/regex
Make sure ~/.local/bin is on your PATH (or use /usr/local/bin).
Build the phar locally:
bin/build
The build requires box and phar.readonly=0.
Quick Start
Validate a regex
“Is this regex even valid?”
use RegexParser\Regex;

$regex = Regex::create();

// Full PCRE string: /pattern/flags
$result = $regex->validate('/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i');

if ($result->isValid()) {
    echo "OK ✅\n";
} else {
    echo "Invalid regex: ".$result->getErrorMessage()."\n";
}
There’s also a tolerant parse mode:
$tolerant = $regex->parse('/(unclosed(', true);

if ($tolerant->hasErrors()) {
    foreach ($tolerant->errors as $error) {
        echo "Error: ".$error->getMessage()."\n";
    }
}

// You still get a partial AST:
$ast = $tolerant->ast;
Explain a regex
“What does this pattern actually do?”
use RegexParser\Regex;

$regex = Regex::create();

echo $regex->explain('/^(?<user>[a-z0-9_]+)\.(?<domain>[a-z.]+)$/i');
Output example (simplified):
Start of string
Named group "user":
One or more of: letters, digits or underscore
Literal "."
Named group "domain":
One or more of: letters or dots
End of string
You can also generate HTML explanations for documentation or debug UIs:
$html = $regex->explain('/(foo|bar)+\d{2,4}/', 'html');
Check ReDoS safety
“Can this regex blow up my CPU?”
use RegexParser\Regex;
use RegexParser\ReDoS\ReDoSSeverity;

$regex = Regex::create();
$pattern = '/^(a+)+$/'; // classic catastrophic backtracking example

$analysis = $regex->redos($pattern);

echo "Severity: ".$analysis->severity->value.PHP_EOL;
echo "Score: ".$analysis->score.PHP_EOL;

if (!$analysis->isSafe()) {
    echo "Hotspot: ".($analysis->vulnerablePart ?? 'unknown').PHP_EOL;
    foreach ($analysis->recommendations as $recommendation) {
        echo "- ".$recommendation.PHP_EOL;
    }
}

// Quick boolean check (for CI, input validation, etc.)
$analysis = $regex->redos($pattern, ReDoSSeverity::HIGH);
if (!$analysis->isSafe()) {
    throw new \RuntimeException('Regex is not safe enough for untrusted input.');
}
Under the hood it inspects quantifiers, nested groups, backreferences and character sets using a real AST, not just regex‑on‑regex strings.
Configuration / Options
Regex::create() accepts a small, validated option array (or a RegexOptions value object via RegexOptions::fromArray()):
- max_pattern_length (int, default: Regex::DEFAULT_MAX_PATTERN_LENGTH).
- max_lookbehind_length (int, default: Regex::DEFAULT_MAX_LOOKBEHIND_LENGTH).
- cache (null | path string | RegexParser\Cache\CacheInterface).
- runtime_pcre_validation (bool, default: false).
- redos_ignored_patterns (list of strings to skip in ReDoS analysis).
Unknown or invalid keys throw RegexParser\Exception\InvalidRegexOptionException.
Advanced Usage
Parsing bare patterns vs PCRE strings
Most high‑level methods (parse, validate, redos) expect a full PCRE string:
$ast = $regex->parse('/pattern/ims');
If you only have the pattern body, you can construct the full regex string manually:
$ast = $regex->parse('#a|b#i');
Alternatively, you can skip the delimiters and pass the bare pattern body to the lexer and parser directly:
use RegexParser\Lexer;
use RegexParser\Parser;

$lexer = new Lexer();
$parser = new Parser();

$stream = $lexer->tokenize('a|b');
$ast = $parser->parse($stream, flags: '', delimiter: '/', patternLength: strlen('a|b'));
Working with the AST
Every parsed regex becomes a tree of node objects under RegexParser\Node\*.
Example:
use RegexParser\Regex;
use RegexParser\Node\AlternationNode;
use RegexParser\Node\LiteralNode;

$regex = Regex::create();
$ast = $regex->parse('/foo|bar/');
$pattern = $ast->pattern;

if ($pattern instanceof AlternationNode) {
    foreach ($pattern->branches as $branch) {
        foreach ($branch->children as $child) {
            if ($child instanceof LiteralNode) {
                echo "Literal: ".$child->value.PHP_EOL;
            }
        }
    }
}
Each node exposes:
- startPosition / endPosition: byte offsets in the original pattern
- Node-specific properties (e.g. QuantifierNode::$min, $max, $type)
Writing a custom AST visitor
For experts: the “right” way to analyse patterns is to implement your own visitor.
namespace App\Regex;

use RegexParser\Node\LiteralNode;
use RegexParser\Node\QuantifierNode;
use RegexParser\Node\RegexNode;
use RegexParser\Node\SequenceNode;
use RegexParser\NodeVisitor\AbstractNodeVisitor;

/**
 * @extends AbstractNodeVisitor<int>
 */
final class LiteralCountVisitor extends AbstractNodeVisitor
{
    protected function defaultReturn(): int
    {
        return 0;
    }

    public function visitRegex(RegexNode $node): int
    {
        return $node->pattern->accept($this);
    }

    public function visitLiteral(LiteralNode $node): int
    {
        return 1;
    }

    // Aggregate over sequences and groups:
    public function visitSequence(SequenceNode $node): int
    {
        $sum = 0;
        foreach ($node->children as $child) {
            $sum += $child->accept($this);
        }

        return $sum;
    }

    // For nodes you don't care about, just recurse or return 0
    public function visitQuantifier(QuantifierNode $node): int
    {
        return $node->node->accept($this);
    }
}
Usage:
use App\Regex\LiteralCountVisitor;
use RegexParser\Regex;

$regex = Regex::create();
$ast = $regex->parse('/ab(c|d)+/');

$visitor = new LiteralCountVisitor();
$count = $ast->accept($visitor); // e.g. 3
Because NodeVisitorInterface is templated, static analysers can infer the return type (int here).
Optimizing and recompiling patterns
You can round‑trip a pattern through AST → optimizer → compiler:
use RegexParser\Regex;
use RegexParser\NodeVisitor\OptimizerNodeVisitor;
use RegexParser\NodeVisitor\CompilerNodeVisitor;

$regex = Regex::create();
$ast = $regex->parse('/(a|a)/');

$optimizer = new OptimizerNodeVisitor();
$optimizedAst = $ast->accept($optimizer);

$compiler = new CompilerNodeVisitor();
$optimizedPattern = $optimizedAst->accept($compiler);

echo $optimizedPattern; // e.g. '/(a)/'
This makes it easy to implement automated refactorings (via Rector) or style rules for regexes.
Auto-Modernize Legacy Patterns
Clean up messy or legacy regexes automatically:
use RegexParser\Regex;
use RegexParser\NodeVisitor\CompilerNodeVisitor;
use RegexParser\NodeVisitor\ModernizerNodeVisitor;

$regex = Regex::create();
$ast = $regex->parse('/[0-9]+\-[a-z]+\@(?:gmail)\.com/');

$modernizedAst = $ast->accept(new ModernizerNodeVisitor());
$modern = $modernizedAst->accept(new CompilerNodeVisitor());

echo $modern; // Outputs: /\d+-[a-z]+@gmail\.com/
What it does:
- Converts [0-9] → \d, [a-zA-Z0-9_] → \w, [\t\n\r\f\v] → \s
- Removes unnecessary escaping (e.g., \@ → @)
- Modernizes backrefs (\1 → \g{1})
- Preserves exact behavior, with no functional changes
Perfect for refactoring legacy codebases or cleaning up generated patterns.
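When adopting a rewritten pattern, it can help to spot-check that the old and new forms agree on representative inputs. The sketch below uses only PHP's built-in preg_match; the sample subjects are assumptions chosen for illustration, and agreement on a handful of inputs is a sanity check, not a proof of equivalence. Neither pattern carries the u flag, so \d matches ASCII digits only, just like [0-9].

```php
<?php
$legacy = '/[0-9]+\-[a-z]+\@(?:gmail)\.com/';
$modern = '/\d+-[a-z]+@gmail\.com/';

// Hypothetical inputs: two that should match, two that should not.
$samples = [
    '123-abc@gmail.com',
    'id 7-z@gmail.com!',
    'abc@gmail.com',
    '42-x@gmailXcom',
];

$allAgree = true;
foreach ($samples as $subject) {
    $legacyHit = preg_match($legacy, $subject, $legacyMatch);
    $modernHit = preg_match($modern, $subject, $modernMatch);

    // Both the match/no-match result and the captured text must be identical.
    $agree = $legacyHit === $modernHit && $legacyMatch === $modernMatch;
    $allAgree = $allAgree && $agree;

    printf("%-20s %s\n", $subject, $agree ? 'same result' : 'DIFFERENT');
}
```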
Syntax Highlighting
Make complex regexes readable with automatic syntax highlighting:
use RegexParser\Regex;
use RegexParser\NodeVisitor\ConsoleHighlighterVisitor;
use RegexParser\NodeVisitor\HtmlHighlighterVisitor;

$regex = Regex::create();
$ast = $regex->parse('/^[0-9]+(\w+)$/');

// For console output
echo $ast->accept(new ConsoleHighlighterVisitor());
// Outputs: ^[0-9]+(\w+)$ with ANSI colors

// For web display
echo $ast->accept(new HtmlHighlighterVisitor());
// Outputs: <span class="regex-anchor">^</span>[<span class="regex-type">\d</span>]+(<span class="regex-type">\w</span>+)$
Color Scheme:
- Meta-characters ((, ), |, [, ]): Blue - Structure
- Quantifiers (*, +, ?, {...}): Yellow - Repetition
- Escapes/Types (\d, \w, \n): Green - Special chars
- Anchors/Assertions (^, $, \b): Magenta - Boundaries
- Literals: Default - Plain text
HTML output uses <span class="regex-*"> classes for easy styling.
ReDoS Analysis
What is ReDoS?
Regular Expression Denial of Service happens when a regex engine spends exponential time on certain inputs. This is particularly bad when patterns are applied to untrusted input (HTTP, user forms, logs, etc.).
Classic examples:
- /(a+)+$/ on aaaaaaaaaaaaaaaa!
- /^(a|a?)+$/ on long strings
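The first example can be observed safely with PHP's built-in PCRE functions by capping the engine's backtracking budget. This is a minimal sketch, not part of RegexParser: the limit of 10000 and the subject length of 30 are arbitrary assumptions, and the JIT is switched off so that the interpreter's backtrack counter applies.

```php
<?php
// Disable the PCRE JIT so the interpreter's backtracking counter is used.
ini_set('pcre.jit', '0');
// Cap backtracking well below PHP's default of 1000000.
ini_set('pcre.backtrack_limit', '10000');

// Thirty "a" characters followed by a character that makes the match fail.
$subject = str_repeat('a', 30) . '!';
$result = preg_match('/(a+)+$/', $subject);

if ($result === false) {
    // The engine gave up instead of exploring every way to split the run of "a"s.
    echo 'Aborted: ' . preg_last_error_msg() . PHP_EOL;
} else {
    echo 'Finished with result ' . $result . PHP_EOL;
}
```

With a limit this low, the engine is expected to abort and preg_match to return false rather than 0, which is why callers should compare the result with === instead of treating it as a boolean.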
How RegexParser detects it
Instead of guessing from the pattern string, RegexParser:
- Parses the pattern into an AST.
- Walks the tree with ReDoSProfileNodeVisitor:
  - Tracks unbounded quantifiers (*, +, {m,}).
  - Detects nested unbounded quantifiers (star-height).
  - Looks at alternations to see if branches share characters.
  - Follows backreferences and subroutines.
  - Takes into account atomic groups, possessive quantifiers and PCRE control verbs (which can "shield" against backtracking).
- Aggregates the findings into a ReDoSAnalysis:
  - Overall severity (SAFE, LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN).
  - A list of vulnerabilities with:
    - message,
    - severity,
    - position in pattern.
This is static analysis — it doesn’t execute the regex — so it’s safe to run in CI.
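The star-height idea mentioned above can be sketched on a toy AST. This is not RegexParser's ReDoSProfileNodeVisitor: the array-based node shapes and the starHeight() function are hypothetical, written only to show how the nesting depth of unbounded quantifiers can be computed without running the regex.

```php
<?php
declare(strict_types=1);

// Toy node shapes (hypothetical, not RegexParser's node classes):
//   ['lit', 'a']                     a literal
//   ['seq', [node, node, ...]]       a sequence of nodes
//   ['rep', node, bool $unbounded]   a quantified node; true for *, +, {m,}
function starHeight(array $node): int
{
    return match ($node[0]) {
        'lit' => 0,
        // A sequence is as deep as its deepest child.
        'seq' => max([0, ...array_map('starHeight', $node[1])]),
        // An unbounded quantifier adds one level on top of whatever it wraps.
        'rep' => starHeight($node[1]) + ($node[2] ? 1 : 0),
    };
}

$nested  = ['rep', ['rep', ['lit', 'a'], true], true];            // (a+)+
$flat    = ['seq', [['rep', ['lit', 'a'], true], ['lit', 'b']]];  // a+b
$bounded = ['rep', ['rep', ['lit', 'a'], false], true];           // (a{1,3})+

foreach (['(a+)+' => $nested, 'a+b' => $flat, '(a{1,3})+' => $bounded] as $label => $ast) {
    $height = starHeight($ast);
    echo $label . ': height ' . $height . ($height >= 2 ? ' (nested, flag it)' : '') . PHP_EOL;
}
```

A height of 2 or more only marks a candidate. As the list above notes, a real analysis also has to account for overlapping alternation branches and for constructs such as atomic groups that shield against backtracking.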
Severity levels
From lowest to highest:
- SAFE: no dangerous constructs detected.
- LOW: theoretical issues, but unlikely to be exploited.
- UNKNOWN: analysis was inconclusive due to complex constructs.
- MEDIUM: potentially problematic in edge cases.
- HIGH: clear ReDoS risk; avoid on untrusted input.
- CRITICAL: classic catastrophic patterns (nested +/* etc.).
redos() returns a ReDoSAnalysis with the severity, score, vulnerable substring (if any), and recommendations. ReDoSAnalysis::isSafe() returns true only for severities considered safe/low, and exceedsThreshold() lets you gate on a specific threshold.
You choose what to tolerate:
$analysis = $regex->redos($pattern, ReDoSSeverity::HIGH);

if (!$analysis->isSafe()) {
    // block, warn, or open a ticket
}
Framework & Tooling Integration
Symfony
- Symfony bridge provides:
  - A console command to scan your app's config for dangerous regexes (regex-parser:check).
  - A unified console command (regex:lint) that can lint, analyze ReDoS risk, suggest optimizations, and validate Symfony regex patterns with options like --analyze-redos, --optimize, and --validate-symfony, or use --all to run everything.
  - Ability to pre-parse and pre-analyze patterns on deploy.
  - Easy service wiring for Regex in your DI container.
Example (pseudo‑code):
services:
    RegexParser\Regex:
        factory: ['RegexParser\Regex', 'create']
        arguments:
            - { cache: '%kernel.cache_dir%/regex', max_pattern_length: 100000, max_lookbehind_length: 255 }
PHPStan
- PHPStan extension hooks into string arguments of functions like preg_match, preg_replace, Symfony validators, etc.
- It can:
  - Validate regex syntax at analysis time.
  - Optionally report ReDoS risks as PHPStan errors or warnings.
Configuration is done via the provided extension.neon, with options such as:
parameters:
    regexParser:
        ignoreParseErrors: true
        reportRedos: true
        redosThreshold: 'high'
        suggestOptimizations: false
        optimizationConfig:
            digits: true
            word: true
            strictRanges: true
Options mirror the PHPStan bridge:
- ignoreParseErrors: skip likely partial regex strings (default: true).
- reportRedos: emit ReDoS issues (default: true).
- redosThreshold: minimum severity to report (low, medium, high, critical; default: high).
- suggestOptimizations: surface shorter equivalent patterns when found (default: false).
- optimizationConfig.digits: enable [0-9] → \d optimization suggestions (default: true).
- optimizationConfig.word: enable [a-zA-Z0-9_] → \w optimization suggestions (default: true).
- optimizationConfig.strictRanges: prevent merging characters from different categories (digits, letters, symbols) into single ranges for better readability (default: true).
Performance & Caching
RegexParser is designed for high‑scale applications:
- Lexer uses a single PCRE state machine with offsets, not repeated substrings.
- Parser and Lexer instances are reused across calls and properly reset.
- Optional cache (filesystem or PSR‑compatible) stores parsed ASTs and ReDoS analyses.
Example:
use RegexParser\Regex;

$regex = Regex::create([
    'cache' => '/path/to/cache/dir', // or a PSR cache instance
    'max_pattern_length' => 100_000,
    'max_lookbehind_length' => 255,
    'runtime_pcre_validation' => false,
    'redos_ignored_patterns' => [
        '/^([0-9]{4}-[0-9]{2}-[0-9]{2})$/', // known safe patterns
    ],
]);
For Symfony, you can pre‑parse and analyze all known patterns at deploy time so runtime costs are minimal.
API Overview
Regex
final readonly class Regex
{
    public static function create(array $options = []): self;

    public function parse(string $regex, bool $tolerant = false): Node\RegexNode|TolerantParseResult;
    public function validate(string $regex): ValidationResult;
    public function optimize(string $regex): OptimizationResult;
    public function explain(string $regex, string $format = 'text'): string;
    public function generate(string $regex): string;
    public function highlight(string $regex, string $format = 'console'): string;
    public function literals(string $regex): LiteralExtractionResult;
    public function redos(string $regex, ?ReDoS\ReDoSSeverity $threshold = null): ReDoS\ReDoSAnalysis;
}
Return types like ValidationResult, TolerantParseResult, OptimizationResult, LiteralExtractionResult, and ReDoS\ReDoSAnalysis are small, well‑typed value objects.
Exceptions
- Regex::create() throws InvalidRegexOptionException for unknown/invalid options.
- parse() can throw LexerException, SyntaxErrorException (syntax/structure), RecursionLimitException (too deep), and ResourceLimitException (pattern too long).
- parse(..., true) wraps those errors into TolerantParseResult instead of throwing.
- validate() converts parser/lexer errors into a ValidationResult (no exception on invalid input).
- redos() shares the same parsing exceptions as parse().
Generic runtime errors (e.g., wrong argument types) are not part of the stable API surface.
Versioning & BC Policy
RegexParser follows Semantic Versioning:
- Stable for 1.x (API surface we commit to keep compatible):
  - Public methods and signatures on Regex.
  - Value objects: ValidationResult, TolerantParseResult, OptimizationResult, LiteralExtractionResult, LiteralSet, ReDoS\ReDoSAnalysis.
  - Main exception interfaces/classes: RegexParserExceptionInterface, parser/lexer exceptions, InvalidRegexOptionException.
  - Supported option keys for Regex::create() / RegexOptions.
- Best-effort, may evolve within 1.x:
  - AST node classes and NodeVisitorInterface (new node types/visit methods can be added).
  - Built-in visitors and analysis heuristics.
If you maintain custom visitors, plan to adjust them when new nodes appear. Breaking changes beyond this policy land in 2.0.0.
Known Limitations
While this library supports a comprehensive set of PCRE2 features, some highly specific or experimental features may not be fully supported yet. For example:
- Certain Perl-specific verbs not yet standardized in PCRE2.
- Advanced Unicode features beyond basic properties and escapes.
- Experimental or platform-specific extensions.
If you encounter an unsupported feature, please open an issue with a test case.
Support the Project
If RegexParser saves you time, you can help keep it moving:
- Star the repository on GitHub
- Share it with your team or community
- Report issues or suggest features
- Contribute code or documentation
- Sponsor the work or hire me for consulting 🤝
Contributing
Contributions are welcome! Areas where help is especially useful:
- New optimizations for the optimizer visitor.
- Additional ReDoS heuristics and exploit‑string generation.
- IDE integrations (PHPStorm plugin, etc.).
- More bridges (Laravel, Laminas, …).
Please run the full test suite before submitting a PR.
License
This library is released under the MIT License.
Made with ❤️ by Younes ENNAJI