tivins / llm-php
PHP library for local LLMs
Requires
- ext-curl: *
Requires (Dev)
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^11.5
README
Language: English
llm-php (tivins/llm-php)
Version: 1.21.1 (see composer.json; release history in CHANGELOG.md).
PHP client library for an OpenAI-compatible HTTP API—typically llama.cpp llama-server—covering POST /v1/chat/completions (non-stream and SSE stream), plus /health, /tokenize, and model discovery via GET /v1/models.
In this README: Why not only chat() · Module map · Examples · Environment variables · Tests · Conversation logging and modern message fields · JSONL audit logs · Console output · Pitfalls · Install
Installation
Package
composer require tivins/llm-php
Requires: ext-curl.
llama.cpp server
- Install: llama.cpp install docs
- Server API: server README
Models
Use a GGUF checkpoint sized for your GPU/RAM; exact limits depend on quantization and context. The library does not download models—it only talks to a running server.
If you have a GPU with 8–16GB of VRAM, you can try these models:
- https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF (Q5_K_M or Q6_K)
- https://huggingface.co/bartowski/gemma-2-9b-it-GGUF (Q4_K_M)
- https://huggingface.co/bartowski/Qwen2.5-7B-Instruct-GGUF (Q4_K_M)
Quick start
Prerequisite: start llama-server and keep it running so the PHP client can reach its OpenAI-compatible HTTP API. Example (adjust the GGUF path and port):
llama-server -m ./model.gguf --port 8080 --no-webui -lv 0
The snippet below assumes the default listen address http://127.0.0.1:8080.
From the repository root after composer install:
use Tivins\Llama\BehaviorPrompts; use Tivins\Llama\Conversation; use Tivins\Llama\Lama; use Tivins\Llama\Message; use Tivins\Llama\Role; require __DIR__ . '/vendor/autoload.php'; $lama = Lama::fromServerUrl('http://127.0.0.1:8080'); $conversation = new Conversation(); $conversation->addMessage(new Message(Role::System, BehaviorPrompts::HELPFUL)); $conversation->addMessage(new Message(Role::User, 'Hello.')); $text = $lama->chat($conversation);
Lama::fromServerUrl() picks the first model id from /v1/models. Check $lama->isHealthy() returns true before relying on the server.
Sampling and generation knobs (temperature, top_p, max_tokens, penalties, seed, stop, n, tools, tool_choice) go through Tivins\Llama\ChatCompletionOptions: only properties you set are merged into the JSON body; omitted keys leave server defaults.
API surface: chat() vs chatCompletions() vs chatStream()
| Method | Input → output | Preserved | Lost or narrowed |
|---|---|---|---|
Tivins\Llama\Lama::chat() |
Conversation + optional ChatCompletionOptions → string |
Final assistant content from choices[0].message (empty string if missing) |
tool_calls, native reasoning_content, usage, additional choices, full wire JSON |
Tivins\Llama\Lama::chatCompletions() |
Same → decoded JSON array (choices, usage, …) |
Everything the server returns in that response body | Nothing by the library—you inspect the array |
Tivins\Llama\Lama::chatStream() |
Same + stream callbacks + optional Tivins\Llama\SsePayloadCapture → Tivins\Llama\StreamResult |
Aggregated content, reasoning_content (when streamed), reconstructed tool_calls, finish_reason; usage, model, id when present on stream chunks (otherwise null) |
Per-chunk raw JSON unless you capture SSE payloads; shape of usage is backend-specific |
Fidelity path: use chatCompletions() and/or chatStream(), then map results into your own structures or into Tivins\Llama\Dto\NormalizedTurnOutcome (fromChatCompletionArray() / fromStreamResult()) for a single aggregate shape across modes.
Shortcut: chat() is documented in code as a thin wrapper around chatCompletions(); it is fine for plain text, not for tools or native reasoning traces.
Module map (src/Tivins/Llama/)
Client
Tivins\Llama\Lama— HTTP client:fromServerUrl(),isHealthy()/getHealth(),tokenize(),chat(),chatCompletions(),chatStream().
Conversation model
Tivins\Llama\Conversation— orderedMessagelist;toChatCompletionMessages()builds OpenAI-stylemessages(assistanttool_calls,toolrole +tool_call_id, optional assistantreasoning_content).Tivins\Llama\Message—Role,content, optionaltoolCallId/nameforRole::Tool, optionaltoolCalls(assistant), optionalreasoningContent(native reasoning);normalizeReasoningContent()treatsnulland''as absent for JSON.Tivins\Llama\Role—system,user,assistant,tool.
Request options
Tivins\Llama\ChatCompletionOptions— OpenAI-shaped optional fields merged into the chat-completions body (see class docblock for semantics and local-server caveats).
Tools
Tivins\Llama\ChatFunctionTool— build onetools[]entry (toToolArray(),toToolArrays()).Tivins\Llama\ToolCallingLoop— multi-round loop overchatCompletions(): executes tools, appendsRole::Toolmessages, copiesreasoning_contentwhen present; final assistant turn has notool_callswhen idle; throws if max rounds exhausted with pending tools.Tivins\Llama\StreamingToolCallingLoop— same orchestration overchatStream(); optionalonAssistantStreamRound(StreamResult, RawStreamTrace, int)for logging (SSE capture when callback is used).Tivins\Llama\PredefinedTools— ready-made tools (search, fetch, filesystem helpers,apply_diff, git helpers, etc.) with executors suited to examples; seeexamples/and class docblock (includes TLS-related environment variables). Paths are not confined to a project root by default.Tivins\Llama\WorkspacePaths/Tivins\Llama\WorkspaceToolExecutor— optional sandbox: resolvefile_path,path, andworking_directoryunder a workspace directory before callingPredefinedTools::runTool(). PassWorkspaceToolExecutor::asCallable()toToolCallingLoopasexecuteTool. Seeexamples/workspace_tools_demo.phpandexamples/workspace_escape_demo.php(blocked access outside the root).
DTOs and audit
Tivins\Llama\Dto\TurnRecord— one logical turn for JSONL (forCompletion/forStream,toLogArray()).Tivins\Llama\Dto\RawChatCompletionResponse— wraps non-stream completion JSON.Tivins\Llama\Dto\RawStreamTrace—events(list ofStreamEvent) plus optionalrawDataLines(verbatim SSE JSON strings).Tivins\Llama\Dto\StreamEvent/Tivins\Llama\Dto\StreamEventKind— structured stream replay types (fine-grained event lists may be empty depending on capture path).Tivins\Llama\Dto\NormalizedTurnOutcome— normalized assistant fields from completion JSON orStreamResult.Tivins\Llama\SsePayloadCapture— mutable bag of SSE JSON payload strings forRawStreamTrace::$rawDataLines.Tivins\Llama\TurnJsonlLogger— append one JSON line perTurnRecord.
Streaming aggregation
Tivins\Llama\ChatStreamAccumulator— parsesdata: {...}SSE lines intoStreamResult(shared byLama::chatStream()and tests/fixtures).Tivins\Llama\StreamResult—content,finishReason,toolCalls,reasoningContent, optionalusage,model,id.
Console rendering
Tivins\Llama\RenderOptions— ANSI, stdout/stderr injectable, reasoning channel.Tivins\Llama\HumanTurnRenderer— renderNormalizedTurnOutcome,TurnRecord, or raw completion payload (renderCompletionPayload()).Tivins\Llama\HumanTurnStreamDisplay— stream-friendly callbacks aligned withLama::chatStream()/StreamingToolCallingLoop.
Higher-level helpers (optional)
Tivins\Llama\ThinkingChat/ThinkingPrompts/ThinkingTurnResult— two HTTP rounds (reasoning prompt then answer); not the same as a single completion’s nativereasoning_content(see class docblock).Tivins\Llama\BehaviorPrompts— ready-made system prompt strings.Tivins\Llama\Translator— translation helper built onLama::chat()with optional FIFO cache.
Examples
Location: examples/. Scripts use:
require __DIR__ . '/../vendor/autoload.php';
Run from the repository root (so vendor/ resolves), with llama-server listening where the script expects (many examples use http://127.0.0.1:8080):
composer install php examples/chat.php php examples/chat_web_lookup.php php examples/completions.php php examples/tokenize.php php examples/tools_chain.php php examples/stream_tools_chain.php php examples/web_lookup_chain.php php examples/stream_web_lookup_chain.php php examples/workspace_tools_demo.php php examples/workspace_escape_demo.php php examples/filesystem_tools_demo.php
Additional demos include chat_tools.php, mediation.php, moderation.php, exemples.php, etc. Prefer reading each file’s header comment for prerequisites (e.g. patch on PATH for workspace_tools_demo.php).
Shared helpers: examples/_helpers.php (print_output(), JSONL helpers, render env parsing). Optional defaults: examples/.env (loaded without overriding variables already in the process environment).
Environment variables
Values are read via getenv() / putenv() in library or example code as documented below. Do not enable logging if completions could contain secrets.
Examples / console (examples/_helpers.php, examples/.env)
| Variable | Effect |
|---|---|
TIVINS_LLAMA_CONVERSATION_LOG |
Path to a JSONL file; TurnJsonlLogger appends one line per logical turn when examples wire logging (TurnRecord::toLogArray()). Use {session} in the path for a per-process segment (new file each CLI run). |
TIVINS_LLAMA_NO_ANSI |
Truthy (1, true, yes, on): disable ANSI in HumanTurnRenderer / HumanTurnStreamDisplay. |
TIVINS_LLAMA_REASONING_STDOUT |
Truthy: print reasoning on stdout instead of stderr. |
TIVINS_LLAMA_COMPLETION_DUMP_RAW |
Truthy: print_output() uses legacy verbose debug instead of HumanTurnRenderer. |
example_load_examples_env_file() reads examples/.env only for keys not already set in the environment.
HTTP/TLS for tool traffic (PredefinedTools, e.g. web_search / fetch_web_page)
| Variable | Effect |
|---|---|
TIVINS_LLAMA_HTTP_SSL_VERIFY |
0 / false / no / off disables TLS verification (insecure; dev only). |
With verification enabled, PredefinedTools sets CURLSSLOPT_NATIVE_CA when PHP 8.2+ provides it (OS trust store). On older PHP, configure curl.cainfo in php.ini if HTTPS tool calls fail.
Programmatic override: PredefinedTools::setHttpSslVerifyPeer() (documented on the class).
LangSearch web search (langsearch_web_search)
| Variable | Effect |
|---|---|
TIVINS_LLAMA_LANGSEARCH_API_KEY |
Bearer token for LangSearch Web Search API (https://api.langsearch.com/v1/web-search). Required to use langsearch_web_search; DuckDuckGo web_search needs no key. |
Programmatic override: PredefinedTools::setLangSearchApiKey(). Per PHP process, LangSearchRateLimiter enforces QPS 1, QPM 60, QPD 1000; when a per-minute or per-day quota is exhausted, further LangSearch tool calls in that session return a structured error with fallback_tool: web_search.
Tests
PHPUnit 11 (composer install then):
composer test # all suites composer test:unit # tests/Unit only composer test:integration # tests/Integration only
Unit tests live under tests/Unit/ (no live LLM server). Integration tests under tests/Integration/ (filesystem tools, git, optional network for web_search / fetch_web_page — group network).
Fixtures: tests/fixtures/.
Interactive / diagnostic:
tests/stream_probe.php— live server: classifies incremental vs cumulativecontentdeltas (complements fixture tests).
Conversation logging and modern message fields
Use this overview to wire audit logs, replay, and native reasoning without re-reading every subsection; details are in the linked sections.
Message and constructor compatibility
- Assistant messages support optional
reasoningContent(wire keyreasoning_content). New code should pass it with a named argument (reasoningContent: '…') so it does not collide with$toolCalls. - Existing calls that only pass
Role,content, and tool fields stay compatible. If you previously used five positional arguments for an assistant message, recall the signature is(role, content, toolCallId, name, toolCalls, reasoningContent)— add reasoning via the last named parameter rather than shifting arguments.
Tool-calling loops (behavior change)
- Since 1.14.0,
ToolCallingLoopandStreamingToolCallingLoopappend the final assistant turn (notool_calls) when the model finishes, and throw ifmaxRoundsis exhausted while tools are still pending. SeeCHANGELOG.md(1.14.0).
JSONL audit and replay
- Enable
TIVINS_LLAMA_CONVERSATION_LOGin examples (Environment variables); each line is JSON fromTurnRecord::toLogArray()(raw_completionorraw_stream+stream_result, optionalrequest_messagesfor the prompt snapshot). - Reconstruct records with
TurnRecord::fromLogArray(); terminal replay:examples/replay_turn_jsonl.php. Full field list: JSONL audit logs.
Normalized view and console
NormalizedTurnOutcomemaps both non-stream completions andStreamResultinto one shape (API surface).HumanTurnRenderer/HumanTurnStreamDisplaycover human-readable output (Console output).
Contributor-facing implementation history for these features lives in CHANGELOG.md (releases 1.14.0 through 1.20.x) and in the sections above.
JSONL audit logs (TurnJsonlLogger / TurnRecord)
TurnJsonlLoggerwrites one JSON object per line fromTurnRecord::toLogArray().- Non-stream:
TurnRecord::forCompletion()stores the full completion JSON underraw_completion. - Stream:
TurnRecord::forStream()includesStreamResultfields andRawStreamTrace; when SSE capture is used,raw_data_linesholds verbatim SSE JSON strings; structuredeventsmay be empty depending on the capture path. - Request context: when examples pass
request_messages(same shape asConversation::toChatCompletionMessages()), logs include the prompt snapshot for that request;examples/replay_turn_jsonl.phpprints it before options and assistant output.
Treat log files as sensitive if they can contain user data or downstream secrets.
Console output (HumanTurnRenderer / HumanTurnStreamDisplay)
HumanTurnRendererprints usage (if present), model/id, finish reason, reasoning block, content, and tool calls forNormalizedTurnOutcomeorTurnRecordreplay;renderCompletionPayload()adapts a rawchatCompletions()array.HumanTurnStreamDisplayseparates streamed content (stdout by default), reasoning (stderr by default), and tool fragments / summaries (stderr), consistent withexamples/stream_*.php.- On Windows, use a modern terminal (e.g. Windows Terminal) for ANSI; or set
TIVINS_LLAMA_NO_ANSI.
Pitfalls and limits
Lama::chat()discards everything except first-choicecontent—avoid it for tool calling or nativereasoning_content.- Streaming
usage: many backends omitusageon SSE chunks;StreamResult::$usagestaysnullunless the server sends a usableusageobject on a parsed chunk (the library keeps the last such object). - Wire JSON shapes vary by server version; rely on this library’s aggregation helpers and tests for supported fields, or inspect raw payloads / logs.
- JSONL and captures can persist full prompts and completions—no secrets in shared logs.
ChatCompletionOptionspasses through keys the server may ignore or reject—check your backend’s supported subset.
License
MIT — see composer.json.