krak / lex
Functional Lexing library
Installs: 7 399
Dependents: 2
Suggesters: 0
Security: 0
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Requires (Dev)
- krak/fn: ^1.2
- peridot-php/peridot: ^1.18
This package is auto-updated.
Last update: 2024-10-29 04:53:00 UTC
README
Lex is a library for lexical analysis in PHP. Currently, only simple regular expression lexers are available; but considering that you shouldn't be lexing anything complex in php, this should be fine :).
Installation
composer require krak/lex
Usage
<?php require_once __DIR__ . '/vendor/autoload.php'; use function Krak\Lex\lexer, Krak\Lex\skipLexer; const TOK_INT = 'int'; const TOK_PLUS = 'plus'; const TOK_MINUS = 'minus'; const TOK_WS = 'whitespace'; // creates a lexer that will use these RE's to match input // the A (anchor flag) is required $lex = lexer([ '/\d+/A' => TOK_INT, '/\+/A' => TOK_PLUS, '/\-/A' => TOK_MINUS, '/\s+/A' => TOK_WS ]); // decorator for skipping tokens, in this case, just throw away the whitespace tokens $lex = skipLexer($lex, [TOK_WS]); // lex the input and return an iterator of tokens $toks = $lex('1 + 2 - 3'); foreach ($toks as $matched_tok) { printf( "Matched token '%s' with input '%s' at offset %d\n", $matched_tok->token, $matched_tok->match, $matched_tok->offset ); }
The following program would output
Matched token 'int' with input '1' at offset 0
Matched token 'plus' with input '+' at offset 2
Matched token 'int' with input '2' at offset 4
Matched token 'minus' with input '-' at offset 6
Matched token 'int' with input '3' at offset 8
TokenStream
A token stream is a simple interface for consuming one token at a time. This is very useful for Recursive Decent Parsers
<?php use function Krak\Lex\lexer, Krak\Lex\tokenStreamLexer; $lex = lexer(['/a/A' => 'a', '/b/A' => 'b']); $lex = tokenStreamLexer($lex); $stream = $lex('aba'); assert($stream->peek() == 'a'); assert($stream->getToken() == 'a'); assert($stream->getToken() == 'b'); assert($stream->getToken() == 'a'); assert($stream->isEmpty());
API
Lexers
Each lexer will accept a string input and then return an iterable of MatchedToken
lexer($token_map, $throw = true)
Main lexer which lexes the strings based off of the $token_map
. $throw
determines whether or not the lexer should throw an exception on unrecognized input.
skipLexer($lex, $tokens)
Lexer decorator which will skip any Matched Tokens in the set of the $tokens
passed in.
tokenStreamLexer($lex)
Lexer decorator that will convert the output of the $lex
into a TokenStream
mockLexer($tokens)
Returns $tokens
as is.
class MatchedToken
$match
Returns the text that was matched.
$token
Returns the token name that was matched.
$offset
Returns the string offset at which the match started.
interface TokenStream extends \IteratorAggregate
getToken()
Returns the current token and advances the internal pointer up by one.
peek()
Returns the current token but does not advance the internal pointer
isEmpty()
returns true if the token stream is empty, false if not.