hellopablo/related-content

A framework for analysing objects in your application and finding related content.

0.8.0 2023-04-27 14:28 UTC

Last update: 2024-10-27 17:29:40 UTC


README

This package is a simple PHP framework for analysing objects in your application, together with an API for finding related content. Bring-your-own analysers extract relation nodes from an object; these nodes can then be used to find other objects which have similar nodes.

For example, you might have Article and Blog objects which both have categories. This framework allows you to define an analyser for each data type which extracts category nodes, allowing you to find similarly categorised items.

Installing

Install using Composer:

composer require hellopablo/related-content

All classes within the package are under the namespace HelloPablo\RelatedContent and can be autoloaded using PSR-4.

Analysers

A key concept in this framework is that of Analysers. Analysers are classes which implement the Interfaces\Analyser interface. Their purpose is to examine your objects and extract data points (nodes) which can be used for indexing.

It is expected that each distinct type of data in your application has its own analyser, even if they share a similar structure. Internally, the engine uses the analyser as a key to differentiate between data types.

🧑‍💻 It is important to understand that an analyser is something that is provided by your application. The framework has no opinions on what your data looks like, nor does it infer any relationships – it is up to you to extract relationship data.

Relationship Nodes

A relationship node is a single data point which describes a part of the object being indexed. Nodes implement the Interfaces\Relation interface and should define a type and a value. Typically, the type is an application-supplied string which describes the node (e.g. CATEGORY, TOPIC, or AUTHOR) and the value is an ID or other identifier of that type.

🙋 The framework provides a Relation\Node class which you can use in your application's analysers.

Example

For an object which looks like this:

{
    "id": 1,
    "label": "My Article",
    "body": "Nullam id dolor id nibh ultricies ... auctor fringilla.",
    "categories": [1, 2, 3],
    "topics": [5, 6, 7]
}

We might want to describe the category and topic IDs as relationship nodes. In this example, the analyser would look like this:

namespace App\RelatedContent\Analysers;

use HelloPablo\RelatedContent\Interfaces;
use HelloPablo\RelatedContent\Relation;

class Article implements Interfaces\Analyser
{
    /**
     * Analyses an item and returns an array of Interfaces\Relation
     *
     * @param object $item The item to analyse
     *
     * @return Relation\Node[]
     */
    public function analyse(object $item): array
    {
        $nodes = [];

        foreach ($item->categories as $categoryId) {
            $nodes[] = new Relation\Node('CATEGORY', $categoryId);
        }

        foreach ($item->topics as $topicId) {
            $nodes[] = new Relation\Node('TOPIC', $topicId);
        }

        return $nodes;
    }

    /**
     * Returns the item's unique identifier
     *
     * @param object $item
     *
     * @return int
     */
    public function getId(object $item)
    {
        return $item->id;
    }
}

Other analysers (say, for a blog post) might also return CATEGORY and TOPIC nodes. It is the overlap of node types and values which is considered to be a relation.
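
To make the idea of overlap concrete, here is a minimal sketch which does not use the package at all. It encodes each node as a plain "TYPE:value" string and counts how many nodes two items share. The countSharedNodes() helper and the string encoding are assumptions made purely for illustration; they are not part of the package, and the engine's actual scoring may differ.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: counts the nodes two items have in common.
 * Nodes are encoded as "TYPE:value" strings (an assumption for this sketch).
 *
 * @param string[] $a Nodes extracted from the first item
 * @param string[] $b Nodes extracted from the second item
 *
 * @return int
 */
function countSharedNodes(array $a, array $b): int
{
    // De-duplicate each side so a repeated node is only counted once
    return count(array_intersect(array_unique($a), array_unique($b)));
}

// Nodes an Article analyser might extract
$article = ['CATEGORY:1', 'CATEGORY:2', 'CATEGORY:3', 'TOPIC:5'];

// Nodes a Blog analyser might extract
$blog = ['CATEGORY:2', 'CATEGORY:3', 'TOPIC:9'];

// CATEGORY:2 and CATEGORY:3 appear in both lists
echo countSharedNodes($article, $blog), PHP_EOL; // prints 2
```

A node only counts as shared when both its type and its value match, which is why TOPIC:5 and TOPIC:9 do not contribute here.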

The Engine

The Engine is how your application will [mostly] interact with the related content store. It facilitates reading and writing from the data store, as well as providing an interface for indexing your content using analysers.

A new instance of the engine can be retrieved via the Factory class; you must pass an instance of the data store you wish to use.

An example using the MySQL data store:

use HelloPablo\RelatedContent;

$store = new RelatedContent\Store\MySQL([
    'user'     => 'mysql_user',
    'password' => 'mysql_password',
    'database' => 'mysql_database',
]);

/** @var Engine $engine */
$engine = RelatedContent\Factory::build($store);

Indexing

Indexing is the process of analysing an object and saving its relationship nodes to the data store. This can be achieved using the engine's index(object $item, Interfaces\Analyser $analyser): self method.

use App\RelatedContent\Analysers;

/**
 * The item we wish to index. This would typically be retrieved using a
 * model, an ORM, or some other app-orientated struct.
 */
$item = (object) [
    'id'         => 1,
    'label'      => 'My Article',
    'body'       => 'Nullam id dolor id nibh ultricies ... auctor fringilla.',
    'categories' => [1, 2, 3],
    'topics'     => [5, 6, 7],
];

$analyser = new Analysers\Article();

$engine->index($item, $analyser);

💁 Indexing an item will delete all previously held data for that item.

Querying

Querying is the core functionality of this library: by providing a source item (and its analyser) the engine can find matching items and return hits, sorted by score (most related first). Your application can then use these results to fetch the source items and apply any further logic.

Querying is facilitated by the engine's query(): Query\Hit[] method. This method accepts four arguments:

  1. The source item, i.e. the item we want to find related content for.
  2. The source item's analyser, i.e. the analyser used to index the item.
  3. Any data types to restrict results to, passed as an array of analysers, e.g. we only want to see related Blog results for our source Article.
  4. The number of results to return.

The query method is best explained using an example. Assuming this system stores relational data about two data types, Article and Blog, we might have the following analysers:

use App\RelatedContent\Analysers;

$articleAnalyser = new Analysers\Article();
$blogAnalyser    = new Analysers\Blog();

Each of these analysers extracts category data for its respective data type. With our source Article item to hand, we can find related Article and Blog content like so:

/**
 * The item which was indexed; at minimum you need the analyser
 * to be able to ascertain the item's ID.
 */
$articleItem = (object) [
    'id' => 1,
];

$results = $engine->query(
    $articleItem,
    $articleAnalyser
);

The $results array will be an array of Query\Hit objects. These objects provide three methods:

  1. getAnalyser(): Interfaces\Analyser which will return an instance of the analyser used to index the item.
  2. getId(): string|int which will return the indexed item's ID.
  3. getScore(): int which will return the score of the hit.
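
Because a hit carries only an analyser, an ID, and a score, a common next step is to group the IDs by data type so that each type can be fetched from your own models in one go. The sketch below relies only on the getAnalyser() and getId() methods listed above; groupHitIdsByType() is a hypothetical helper name and is not part of the package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: groups hit IDs by the class name of the analyser
 * which indexed them, preserving the order in which the hits were given.
 *
 * @param object[] $hits Objects exposing getAnalyser() and getId()
 *
 * @return array<string, array<int, int|string>>
 */
function groupHitIdsByType(array $hits): array
{
    $grouped = [];

    foreach ($hits as $hit) {
        // The analyser's class name doubles as the data type key
        $type = get_class($hit->getAnalyser());

        $grouped[$type][] = $hit->getId();
    }

    return $grouped;
}
```

Passing the $results array to groupHitIdsByType() would give an array keyed by analyser class name (for example App\RelatedContent\Analysers\Blog), each entry holding the IDs of the matching items in the order the hits were returned.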

To restrict the results to one or more data types, pass an array of the analyser instances you would like to restrict to as the third argument. You can also limit the number of results by passing an integer as the fourth argument:

/**
 * The item which was indexed; at minimum you need the analyser
 * to be able to ascertain the item's ID.
 */
$articleItem = (object) [
    'id' => 1,
];

/**
 * Return up to 3 related blog posts for our article.
 */
$results = $engine->query(
    $articleItem,
    $articleAnalyser,
    [
        $blogAnalyser
    ],
    3
);

Reading

You can read all the data stored about a particular item using the engine's read(object $item, Interfaces\Analyser $analyser): Interfaces\Relation[] method:

use App\RelatedContent\Analysers;

/**
 * The item which was indexed; at minimum you need the analyser
 * to be able to ascertain the item's ID.
 */
$item = (object) [
    'id' => 1,
];

$analyser = new Analysers\Article();

$relations = $engine->read($item, $analyser);

Deleting

To delete all data held about an item, use the engine's delete(object $item, Interfaces\Analyser $analyser): self method:

use App\RelatedContent\Analysers;

/**
 * The item which was indexed; at minimum you need the analyser
 * to be able to ascertain the item's ID.
 */
$item = (object) [
    'id' => 1,
];

$analyser = new Analysers\Article();

$engine->delete($item, $analyser);

Dump

If you need to dump the entire contents of the data store, you may use the engine's dump(): array method. It will return an array of all relations stored in the data store.

$items = $engine->dump();

Empty

To destructively empty the data store you may use the engine's empty(): self method. This cannot be undone.

$engine->empty();

Data Stores

The data store is where the engine keeps its index data. You are free to use any data store you like; all adapters must implement the Interfaces\Store interface. The package provides two stores by default:

Ephemeral

Store\Ephemeral(array $config = [])

The Ephemeral store is an in-memory store. It holds its data in an array on the class instance, and that data is not intended to persist beyond the lifespan of the instance (unless, of course, you manually serialize the object).

It is primarily used by the test suite, but is available as a first-class citizen should you need it. An example use case might be a long-running script which builds up a relationship model and caches the results; the underlying data store does not need to be persisted.
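
An engine backed by the Ephemeral store can be built in the same way as in the MySQL example earlier. This sketch assumes the package has been installed with Composer and that the autoloader lives at vendor/autoload.php relative to the script; no configuration is passed, relying on the constructor's default argument.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader is at this path relative to the script
require __DIR__ . '/vendor/autoload.php';

use HelloPablo\RelatedContent;

// No configuration is passed, so the store uses its defaults
$store = new RelatedContent\Store\Ephemeral();

// The engine is built exactly as with any other store; anything indexed
// through it is held in memory rather than in a database
$engine = RelatedContent\Factory::build($store);
```

The resulting $engine can then be passed items and analysers through index() and query() as described above.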

Configuration

This store accepts the following keys in the configuration array:

MySQL

Store\MySQL(array $config = [])

The MySQL store allows you to use a MySQL table as the persistent data store for the relationship data. It utilises the \PDO class provided by PHP.

Configuration

This store accepts the following keys in the configuration array: