coffee-r/tekagami-php

PHP library that observes HTTP request behavior and records it as JSONL for specification investigation

Package info

github.com/coffee-r/tekagami-php

pkg:composer/coffee-r/tekagami-php

v0.1.0 2026-06-02 09:39 UTC

Last update: 2026-06-02 10:50:57 UTC


Note: This package is experimental.

coffee-r/tekagami-php is the PHP observer for tekagami. It records what a running PHP web application actually did as one tekagami-v1 JSONL record per HTTP request.

The observer emits observed facts: HTTP input/output shapes, SQL timeline, custom events, write effects, and capture errors. It does not infer business rules, name specifications, or measure performance as an APM.

For the language-neutral JSONL contract, see contracts/schema. For deterministic summary, report, export, and diff commands, see tools/tekagami-data.

Installation

composer require coffee-r/tekagami-php

Runtime requirement: PHP 7.0 or later.

Development tests require PHP 7.3 or later because PHPUnit ^9.5 is used.

Basic Usage

Create one collector per HTTP request, call start() near request entry, add observed events during the request, then call finish() near response output.

use CoffeeR\Tekagami\Collector;
use CoffeeR\Tekagami\Config;
use CoffeeR\Tekagami\Flow;
use CoffeeR\Tekagami\Http\HttpInput;
use CoffeeR\Tekagami\Http\HttpResponse;
use CoffeeR\Tekagami\Sink\JsonlSink;
use CoffeeR\Tekagami\Sql\SqliteSqlAnalyzer;

$collector = new Collector(
    new Config(getenv('TEKAGAMI_SECRET') ?: null),
    new JsonlSink('/var/log/tekagami/your-app.production.jsonl'),
    new SqliteSqlAnalyzer()
);

$http = new HttpInput($request->method(), $request->path());
$http->pathPattern       = '/products/{id}';
$http->queryRaw          = $request->query();
$http->requestRaw        = $requestBody;
$http->requestHeadersRaw = $requestHeaders;

$collector->start($http);

// Use Flow only when a development or QA scenario needs explicit correlation.
// $collector->start($http, new Flow('qa-order-cod-001', 1));

// Preferred: placeholder SQL plus binds.
$collector->addSql($sql, $binds, ['source' => 'db-wrapper']);

// Fallback: expanded SQL from last_query() or query history.
$collector->addExpandedSql($db->last_query(), ['source' => 'ci3-last_query']);

// Stable labels can act as observation anchors for important branches.
$collector->addCustom('purchase_limit_rejected', ['reason' => 'one_time']);

$response = new HttpResponse();
$response->status             = http_response_code();
$response->responseKind       = 'json';
$response->contentType        = 'application/json';
$response->responseHeadersRaw = $responseHeaders;
$response->responseBodyRaw    = $responseData;

$collector->finish($response);

If no Flow is passed, flow.flow_id and flow.seq are recorded as null. This is the normal mode. Flow is for explicit investigation scenarios, not for inferring business workflows from production sessions.

Public API

Collector implements CollectorInterface:

| Method | Purpose |
| --- | --- |
| start(HttpInput $http, $flow = null) | Starts one request trace. |
| getActiveTraceId() | Returns the active trace id, or null. |
| addSql($sql, array $binds = [], array $options = []) | Records placeholder SQL plus binds. Preferred when available. |
| addExpandedSql($sql, array $options = []) | Records already-expanded SQL. Lower confidence because binds are lost. |
| addCustom($label, $data = null) | Records a custom observation with data_shape. |
| addError($type, $message = null, $at = null) | Records an observed application or capture error. |
| finish(HttpResponse $response) | Builds and writes the JSONL record, then resets the collector. |

All methods are designed not to throw into the observed application. If sink writing fails, tekagami writes to PHP error_log() and lets the application continue.
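The no-throw behavior described above can be pictured as a guard around each observation step. This is a minimal sketch of the pattern, not the library's implementation; the helper name observeSafely() is hypothetical.

```php
<?php
// Hypothetical helper, not part of tekagami: runs an observation callback
// and never lets its failure reach the observed application.
function observeSafely(callable $observation)
{
    try {
        $observation();
        return true;
    } catch (\Throwable $e) {
        // Log only the exception class; messages may contain sensitive data.
        error_log('observation failed: ' . get_class($e));
        return false;
    }
}

// A failing observation is logged and swallowed.
$failed = observeSafely(function () {
    throw new \RuntimeException('disk full');
});

// A succeeding observation simply reports success.
$succeeded = observeSafely(function () {
    // e.g. write one JSONL line here
});
```

In this sketch the caller can ignore the return value entirely, which matches the idea that observation must never change the application's control flow.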

Configuration

Constructor:

new Config($secret = null, array $options = [])

$secret is the HMAC-SHA256 shared secret. Other settings are passed through $options.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| secret | string or null | null | HMAC-SHA256 shared secret. |
| keepKeys | array | [] | Case-insensitive allowlist of query/body keys whose raw values may be recorded in *_values. |
| keepHeaderKeys | array | [] | Case-insensitive allowlist of request headers to record as shape/token. Empty means no request header presence is recorded. |
| keepResponseHeaderKeys | array | [] | Case-insensitive allowlist of response headers to record as shape/token. Empty means no response header presence is recorded. |
| sqlValueAllowlist | array | [] | SQL value allowlist, using table.column or column, for observed_values. |
| enabled | bool | true | When false, collector methods return immediately. Cheaper than using NullSink. |
| captureText | bool | false | Stores raw SQL in statement_text. Plaintext and not recommended for production. |
| captureEffects | bool | true | Emits write summaries in effects[]. |
| tokenHmacLength | int | 12 | Hex length of generated HMAC token fragments. |
| maxDepth | int | 10 | Maximum recursion depth for shape generation. |
| maxShapeNodes | int | 10000 | Maximum visited nodes during shape generation. |
| maxTimelineSize | int or null | 500 | Upper bound on the timeline size of one request (see Log Volume). |
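As an illustration of how these options might be combined, the sketch below builds an options array and passes it to the documented Config constructor. The option names come from the table; the values and the allowlisted keys are made up for the example. The constructor call is guarded so the snippet also runs where the package is not installed.

```php
<?php
// Option names are from the table above; values are illustrative only.
$options = [
    'keepKeys'          => ['page', 'sort'],      // low-risk query keys
    'keepHeaderKeys'    => ['Content-Type'],
    'sqlValueAllowlist' => ['products.status'],   // table.column form
    'captureText'       => false,                 // keep raw SQL out of logs
    'maxTimelineSize'   => 200,
];

$secret = getenv('TEKAGAMI_SECRET') ?: null;

// Guarded so the sketch stays runnable without the package on the autoload path.
$config = class_exists('CoffeeR\Tekagami\Config')
    ? new \CoffeeR\Tekagami\Config($secret, $options)
    : null;
```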

Value Safety

Raw values are not recorded by default. Shapes preserve structure and scalar types. HMAC tokens, when secret is set, allow correlation of equal values without storing the original value.
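To make the correlation idea concrete, here is a minimal sketch of a truncated HMAC-SHA256 token. The function name correlationToken() is hypothetical, and the exact token format tekagami emits is not specified here; only the use of HMAC-SHA256 and a default hex length of 12 come from this README.

```php
<?php
// Hypothetical helper: truncated HMAC-SHA256 hex digest of a scalar value.
function correlationToken($value, $secret, $length = 12)
{
    return substr(hash_hmac('sha256', (string) $value, $secret), 0, $length);
}

$secret = 'example-secret'; // assumption: any non-empty shared secret

$a = correlationToken('alice@example.com', $secret);
$b = correlationToken('alice@example.com', $secret);
$c = correlationToken('bob@example.com', $secret);
// $a and $b are identical, so equal inputs can be correlated,
// while the original value is not stored in the token.
```

Shorter tokens are easier to read in logs but collide more often, which is the trade-off behind a configurable length.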

Only values explicitly allowed by keepKeys or sqlValueAllowlist are stored as plaintext. Do not allowlist passwords, tokens, email addresses, phone numbers, addresses, postal codes, authentication data, payment data, customer ids, order ids, or other values strongly tied to people or transactions. Use tokens for correlation-only use cases.

Request and response headers are stricter: only headers named in keepHeaderKeys or keepResponseHeaderKeys are recorded, and header raw values are never stored. Authorization, Cookie, Set-Cookie, Location, and X-Api-Key are sensitive even as presence information, so keep them out unless there is a specific investigation reason.
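The allowlist rule above can be sketched as a case-insensitive filter that keeps header names and discards values. The helper presentAllowedHeaders() is hypothetical and shows only the filtering step; it does not reproduce the shape/token output that tekagami records for allowed headers.

```php
<?php
// Hypothetical helper: returns lowercased names of allowlisted headers
// that are present. Header values are never returned.
function presentAllowedHeaders(array $headers, array $allowlist)
{
    $allowed = array_map('strtolower', $allowlist);
    $present = [];
    foreach ($headers as $name => $value) {
        $lower = strtolower($name);
        if (in_array($lower, $allowed, true)) {
            $present[] = $lower;
        }
    }
    sort($present);
    return $present;
}

$headers = [
    'Content-Type'  => 'application/json',
    'Authorization' => 'Bearer abc123',
    'X-Request-Id'  => 'r-42',
];

// Authorization is not allowlisted, so even its presence is dropped.
$recorded = presentAllowedHeaders($headers, ['content-type', 'x-request-id']);
```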

Avoid passing raw exception messages to addError() in production. They can contain DSNs, SQL, file paths, or user data. Prefer fixed error types or sanitized messages.
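One way to follow this advice is to map exceptions to a small set of fixed error types before calling addError(). The mapping below is a sketch; the function name errorTypeFor() and the type strings are hypothetical, and only the addError($type, ...) signature comes from this README.

```php
<?php
// Hypothetical mapping from exception classes to fixed error types.
function errorTypeFor(\Throwable $e)
{
    if ($e instanceof \InvalidArgumentException) {
        return 'invalid_input';
    }
    if ($e instanceof \RuntimeException) {
        return 'runtime_failure';
    }
    return 'unexpected_error';
}

$type = null;
try {
    // The message contains a DSN with credentials and must not be logged.
    throw new \RuntimeException('connect failed: mysql://app:s3cret@db/shop');
} catch (\Throwable $e) {
    $type = errorTypeFor($e);
    // With a collector in scope: $collector->addError($type);
}
```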

SQL Capture

Use the highest-quality SQL source available.

| API | Input | analysis.input_quality | Typical source |
| --- | --- | --- | --- |
| addSql($sql, $binds, ['source' => '...']) | Placeholder SQL plus binds | bound_sql | PDO wrapper, DB middleware, Laravel DB::listen, Doctrine middleware, patched CI3 DB driver |
| addExpandedSql($sql, ['source' => '...']) | Expanded SQL | expanded_sql | CI3 last_query() or query history when binds are unavailable |

addExpandedSql() records warnings such as expanded_sql_may_fragment_statement_hash. It is useful when there is no better source, but prefer addSql() for migration comparison because statement hashes are more stable.
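The fragmentation risk can be illustrated with a plain hash over the statement text. This sketch uses SHA-256 as a stand-in and skips any normalization, so it is not how tekagami computes statement_hash; it only shows why literals embedded in the SQL text can split one logical statement into several hashes.

```php
<?php
// One logical statement, seen in two forms.
$placeholder = 'SELECT * FROM products WHERE id = ?';
$expandedA   = 'SELECT * FROM products WHERE id = 10';
$expandedB   = 'SELECT * FROM products WHERE id = 11';

// Stand-in hash; tekagami's own hashing and normalization are not shown.
$hashBoundA    = hash('sha256', $placeholder); // request with id 10
$hashBoundB    = hash('sha256', $placeholder); // request with id 11
$hashExpandedA = hash('sha256', $expandedA);
$hashExpandedB = hash('sha256', $expandedB);

// Placeholder SQL hashes identically across requests;
// expanded SQL differs whenever a literal differs.
$boundIsStable      = ($hashBoundA === $hashBoundB);
$expandedFragmented = ($hashExpandedA !== $hashExpandedB);
```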

sqlValueAllowlist extraction is best-effort and regex based. It handles common INSERT ... VALUES (...), UPDATE ... SET ..., and WHERE col = value cases, but may miss values inside functions, strings with commas, complex subqueries, or dialect-specific literals. Missing extraction does not fail the observation; statement_normalized, statement_hash, and statement_fingerprint remain the primary evidence.
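The limits of regex-based extraction are easier to see with a toy extractor. The function below is hypothetical and far simpler than whatever pattern the library uses; it handles only `column = literal` pairs and shows how a value wrapped in a function call is missed without causing a failure.

```php
<?php
// Hypothetical extractor, not the library's regex: finds `column = literal`
// pairs where the literal is a quoted string or an integer.
function extractEqualityValues($sql, array $allowedColumns)
{
    $found = [];
    $pattern = '/\b([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(\'[^\']*\'|\d+)/';
    if (preg_match_all($pattern, $sql, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $column = strtolower($m[1]);
            if (in_array($column, $allowedColumns, true)) {
                $found[$column] = trim($m[2], "'");
            }
        }
    }
    return $found;
}

// Simple case: the allowlisted column is extracted, `id` is ignored.
$simple = extractEqualityValues(
    "UPDATE products SET status = 'sold' WHERE id = 7",
    ['status']
);

// Function call around the literal: nothing is extracted, nothing fails.
$missed = extractEqualityValues(
    "SELECT * FROM products WHERE status = UPPER('sold')",
    ['status']
);
```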

SQL Dialect

The SQL analyzer is required as the third Collector constructor argument. Bundled analyzers:

  • CoffeeR\Tekagami\Sql\OracleSqlAnalyzer
  • CoffeeR\Tekagami\Sql\SqliteSqlAnalyzer

$collector = new Collector(
    $config,
    $sink,
    new CoffeeR\Tekagami\Sql\OracleSqlAnalyzer()
);

The selected dialect is recorded on each SQL event as analysis.dialect.

For PostgreSQL, MySQL, SQL Server, or another database, implement SqlAnalyzerInterface or extend AbstractSqlAnalyzer, then inject that implementation into Collector.

Custom Events

addCustom() is for observations that SQL alone cannot explain: cache access, external HTTP calls, queue pushes, file writes, or important application branches.

The data_shape is not part of behavior pattern signatures. If a branch must be compared as a distinct behavior, encode the stable distinction in the label, for example payment_method_rejected_cod rather than only payment_method_rejected with a data value.
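A small helper can keep such labels consistent. The sketch below is hypothetical (branchLabel() is not part of the package) and assumes the variant comes from a small, fixed, non-sensitive set such as payment method codes.

```php
<?php
// Hypothetical helper: builds a stable label from a base name and a variant
// drawn from a small, fixed set of values.
function branchLabel($base, $variant)
{
    $slug = strtolower(preg_replace('/[^A-Za-z0-9]+/', '_', $variant));
    return $base . '_' . trim($slug, '_');
}

$label = branchLabel('payment_method_rejected', 'COD');
// With a collector in scope:
// $collector->addCustom($label, ['reason' => 'address_unsupported']);
```

Free-form or user-supplied values should stay out of the label, since every distinct label becomes a distinct behavior in comparisons.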

Log Volume

Primary controls:

  • maxTimelineSize: prevents one request from producing an unbounded timeline.
  • maxShapeNodes: bounds very large JSON or view variable shapes.
  • OS-level log rotation: keeps JSONL files manageable over time.

When a limit is reached, tekagami truncates only its own observation and records a capture_failure in errors[]. The application request continues.
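The truncation behavior can be sketched with a toy shape builder that stops after a fixed number of nodes. The function shapeOf(), the '<truncated>' marker, and the use of gettype() names are all assumptions made for this example; tekagami's actual shape format is defined by the schema.

```php
<?php
// Hypothetical shape builder: replaces scalars with their type names and
// stops descending once $maxNodes nodes have been visited.
function shapeOf($value, $maxNodes, &$visited = 0, &$truncated = false)
{
    if ($visited >= $maxNodes) {
        $truncated = true;
        return '<truncated>';
    }
    $visited++;
    if (is_array($value)) {
        $shape = [];
        foreach ($value as $key => $item) {
            $shape[$key] = shapeOf($item, $maxNodes, $visited, $truncated);
        }
        return $shape;
    }
    return gettype($value);
}

$data = ['id' => 1, 'tags' => ['a', 'b'], 'price' => 9.5];

// Generous limit: the full shape is produced.
$fullShape = shapeOf($data, 100);

// Tight limit: the shape is cut short and the flag is set, at which point
// an observer would record a capture_failure instead of failing the request.
$visited = 0;
$truncated = false;
$smallShape = shapeOf($data, 4, $visited, $truncated);
```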

Data Tools

This package only writes tekagami-v1 JSONL. Process those logs from the repository root with the language-neutral data tools:

php tools/tekagami-data/bin/tekagami summary trace.jsonl
php tools/tekagami-data/bin/tekagami report trace.jsonl
php tools/tekagami-data/bin/tekagami export trace.jsonl > trace.export.json
php tools/tekagami-data/bin/tekagami diff legacy.export.json target.export.json

See tools/tekagami-data/README.md for CLI options and behavior.
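For a quick ad hoc check without the CLI, a JSONL file can be read line by line with plain PHP. The sketch below writes two hand-made sample lines to a temporary file and counts records that carry errors. The sample records are assumptions that contain only an errors array; real tekagami-v1 records have many more fields.

```php
<?php
// Two hand-written sample lines standing in for a real trace file.
$lines = [
    '{"errors":[]}',
    '{"errors":[{"type":"capture_failure"}]}',
];
$path = tempnam(sys_get_temp_dir(), 'tekagami');
file_put_contents($path, implode("\n", $lines) . "\n");

$total = 0;
$withErrors = 0;

$handle = fopen($path, 'r');
while (($line = fgets($handle)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; // skip blank lines
    }
    $record = json_decode($line, true);
    if (!is_array($record)) {
        continue; // skip lines that are not valid JSON objects
    }
    $total++;
    if (!empty($record['errors'])) {
        $withErrors++;
    }
}
fclose($handle);
unlink($path);
```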

Schema

Output records must conform to contracts/schema/tekagami-v1.schema.json. Tests in this package include schema conformance checks for fixtures and collector output.

Running Tests

cd observer-php
composer install
./vendor/bin/phpunit

License

MIT License - Copyright 2026 coffee-r