README

Php library that converts search queries into words, phrases, hashtags, mentions, etc.

This library supports a simple search query standard. It is meant to support the most common search combinations that a user would likely enter into your website search box or dashboard application. It intentionally limits the more complex nested capabilities that you might expect from SQL builders, Lucene, etc.

Tokenizer

Tokens are split on whitespace unless enclosed in double quotes. The following tokens are extracted by the Tokenizer:

class Token implements \JsonSerializable
{
    const T_EOI              = 0;  // end of input
    const T_WHITE_SPACE      = 1;
    const T_IGNORED          = 2;  // an ignored token, e.g. #, !, etc.  when found by themselves, don't do anything with them.
    const T_NUMBER           = 3;  // 10, 0.8, .64, 6.022e23
    const T_REQUIRED         = 4;  // '+'
    const T_PROHIBITED       = 5;  // '-'
    const T_GREATER_THAN     = 6;  // '>'
    const T_LESS_THAN        = 7;  // '<'
    const T_EQUALS           = 8;  // '='
    const T_FUZZY            = 9;  // '~'
    const T_BOOST            = 10; // '^'
    const T_RANGE_INCL_START = 11; // '['
    const T_RANGE_INCL_END   = 12; // ']'
    const T_RANGE_EXCL_START = 13; // '{'
    const T_RANGE_EXCL_END   = 14; // '}'
    const T_SUBQUERY_START   = 15; // '('
    const T_SUBQUERY_END     = 16; // ')'
    const T_WILDCARD         = 17; // '*'
    const T_AND              = 18; // 'AND' or '&&'
    const T_OR               = 19; // 'OR' or '||'
    const T_TO               = 20; // 'TO' or '..'
    const T_WORD             = 21;
    const T_FIELD_START      = 22; // The "field:" portion of "field:value".
    const T_FIELD_END        = 23; // when a field lexeme ends, i.e. "field:value". This token has no value.
    const T_PHRASE           = 24; // Phrase (one or more quoted words)
    const T_URL              = 25; // a valid url
    const T_DATE             = 26; // date in the format YYYY-MM-DD
    const T_HASHTAG          = 27; // #hashtag
    const T_MENTION          = 28; // @mention
    const T_EMOTICON         = 29; // see https://en.wikipedia.org/wiki/Emoticon
    const T_EMOJI            = 30; // see https://en.wikipedia.org/wiki/Emoji

The T_WHITE_SPACE and T_IGNORED tokens are removed before the output is returned by the scan process.

QueryParser

The default query parser produces a ParsedQuery object which can be used with a builder to produce a query for a given search service.

Basic Usage

<?php

use Gdbots\QueryParser\QueryParser;
use Gdbots\QueryParser\Builder\XmlQueryBuilder;

$parser  = new QueryParser();
$builder = (new XmlQueryBuilder())->setHashtagFieldName('tags');

$result = $parser->parse('hello^5 planet:earth +date:2015-12-25 #omg');
echo $builder->addParsedQuery($result)->toXmlString();

Produces the following xml:

<?xml version="1.0"?>
<query>
  <word boost="5" rule="should_match">hello</word>
  <field name="planet">
    <word rule="should_match_term">earth</word>
  </field>
  <field name="date" bool_operator="required" cacheable="true">
    <date rule="must_match_term">2015-12-25</date>
  </field>
  <field name="tags" bool_operator="required" cacheable="true">
    <hashtag rule="must_match_term">omg</hashtag>
  </field>
</query>

To get a list of Node objects by type, use:

<?php

use Gdbots\QueryParser\Node\Hashtag;

$result = $parser->parse('#hashtag1 AND #hashtag2');
$hashtags = $result->getNodesOfType(Hashtag::NODE_TYPE);

gdbots / query-parser

Maintainers

Package info

Statistics

Security

README

Tokenizer

QueryParser

Basic Usage