loupe / matcher
Highlight and crop around search terms
Fund package maintenance!
Toflar
Installs: 13 222
Dependents: 2
Suggesters: 0
Security: 0
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Requires
- php: ^8.1
- ext-intl: *
Requires (Dev)
- phpstan/phpstan: ^2.0
- phpunit/phpunit: ^10.5
- symfony/finder: ^6.2
- symplify/easy-coding-standard: ^12.5
This package is auto-updated.
Last update: 2025-07-14 11:00:04 UTC
README
A PHP library for search term highlighting and text snippet generation. Transform search results into user-friendly formatted text with highlighted matches and contextual cropping.
Lorem ipsum dolor sit amet, consetetur [...] no sea takimata sanctus est lorem est ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur [...] dolore te feugait nulla facilisi lorem ipsum dolor sit amet, consectetuer [...]
Caution
Work in progress. Expect frequent changes to API and functionality.
Installation
composer require loupe/matcher
Quick Start
Here's a simple example of how to use Loupe Matcher to highlight search terms in a text document and crop around the highlights:
use Loupe\Matcher\Tokenizer\Tokenizer; use Loupe\Matcher\Matcher; use Loupe\Matcher\Formatter; use Loupe\Matcher\FormatterOptions; $tokenizer = new Tokenizer(); $matcher = new Matcher($tokenizer); $formatter = new Formatter($matcher); $options = (new FormatterOptions()) ->withEnableHighlight() ->withEnableCrop() ->withCropLength(10); $result = $formatter->format( 'This is a long document with many words to search through and compare.', 'search words', $options ); // "...with many <em>words</em> to <em>search</em> through..." echo $result->getFormattedText();
Core Components
Tokenizer
Purpose: Breaks text into searchable tokens (words, phrases, terms) for accurate matching.
The Tokenizer
converts strings into TokenCollection
objects, handling:
- Word boundaries using
ext-intl
rules - Phrase groups (quoted terms like
"exact phrase"
) - Negated terms (prefixed with
-
) - Locale-specific tokenization
$tokenizer = new Tokenizer('en_US'); // Optional locale $tokens = $tokenizer->tokenize('search for "exact phrase" -exclude'); $tokens->all(); // All tokens $tokens->phraseGroups(); // Quoted phrases only $tokens->allNegated(); // Terms to exclude
Matcher
Purpose: Finds which tokens in your text match the search query.
The Matcher
compares tokenized text against search terms, with support for:
- Stop word filtering (ignore common words like "the", "and")
- Match span calculation (start/end positions)
- Flexible matching between token collections
$matcher = new Matcher($tokenizer, ['the', 'and', 'or']); // Stop words $matches = $matcher->calculateMatches('Text to search', 'search query'); // Get position information for highlighting $spans = $matcher->calculateMatchSpans('Text to search', 'query', $matches); foreach ($spans as $span) { echo "Match at position {$span->getStartPosition()}-{$span->getEndPosition()}"; }
Formatter
Purpose: Combines matching and highlighting to create formatted output with context.
The Formatter
orchestrates the entire process:
- Highlights matched terms with HTML tags
- Crops text to show relevant context around matches
- Configurable through
FormatterOptions
$formatter = new Formatter($matcher); $options = (new FormatterOptions()) ->withEnableHighlight() ->withHighlightStartTag('<mark>') ->withHighlightEndTag('</mark>') ->withEnableCrop() ->withCropLength(150) ->withCropMarker('...'); $result = $formatter->format($text, $query, $options); echo $result->getFormattedText();
Advanced Usage
Custom Tokenizer
Implement TokenizerInterface
for specialized tokenization:
class CustomTokenizer implements TokenizerInterface { public function tokenize(string $text): TokenCollection { // Your custom tokenization logic } public function matches(Token $token, TokenCollection $tokens): bool { // Your custom logic for checking if a token is a match } }
Pre-highlighted Text Cropping
When you already have highlighted text that needs cropping:
$cropper = new \Loupe\Matcher\Formatting\Cropper( cropLength: 50, cropMarker: '…', highlightStartTag: '<em>', highlightEndTag: '</em>' ); // "...text with <em>highlighted</em> terms." echo $cropper->cropHighlightedText('Long text with <em>highlighted</em> terms.');
Using Pre-calculated Matches
When you already have a TokenCollection
of matches (e.g., from a previous search operation or external source), you can format text directly without re-calculating matches. This approach is useful when your search engine already provides match information or you want to cache match results for performance.
// Assume you already have matches from somewhere else $existingMatches = new TokenCollection(/* ... */); // Set up the tokenizer, matcher, and formatter as usual $tokenizer = new Tokenizer(); $matcher = new Matcher($tokenizer); $formatter = new Formatter($matcher); $options = (new FormatterOptions()) ->withEnableHighlight() ->withEnableCrop() ->withCropLength(100); // Format using the existing matches - no duplicate processing $result = $formatter->format($text, $query, $options, matches: $existingMatches); echo $result->getFormattedText();