adachsoft/directory-scanner-tool

Safe, configurable directory scanning and file content search tools for PHP with adachsoft/ai-tool-call integration


v0.2.1 2025-12-05 07:10 UTC

Last update: 2025-12-05 07:05:33 UTC


README

Safe, configurable directory scanner and file content search tools for PHP projects, designed to integrate with the adachsoft/ai-tool-call library and AI agents (e.g. Google Gemini).

It exposes two tools:

  • directory_scanner – scans a configured base directory, applies exclusions and depth/entry limits, and returns a flat list of file system entries with optional metadata.
  • file_content_search – uses the same safe directory scanning, but additionally filters results to files whose contents match a given pattern (plain, case-sensitive plain, regex or similarity search modes).

Version numbers are managed via Git tags / Packagist and follow Semantic Versioning. See CHANGELOG.md for notable changes.

Features

  • Safe scanning strictly confined to a configured base path (no traversal above the base path).
  • Support for excluded subpaths (e.g. vendor, var/cache, .git).
  • Configurable maximum recursion depth (max_allowed_depth).
  • Configurable maximum number of returned entries (max_entries) with truncation indication.
  • Optional inclusion of additional metadata for each entry (keys are present only when enabled via include_*/default_include_* flags and a non-null value is available from the filesystem):
    • file size (bytes),
    • last modification time (ISO 8601 string),
    • future‑ready fields for creation time and permissions.
  • Flat, predictable result structure (items + summary).
  • Ready‑to‑use SPI tools + factories for adachsoft/ai-tool-call:
    • DirectoryScannerTool / DirectoryScannerToolFactory,
    • FileContentSearchTool / FileContentSearchToolFactory.
  • Content‑based file filtering via file_content_search with multiple search modes:
    • plain (case‑insensitive substring),
    • plain_case_sensitive,
    • regex,
    • similarity (fuzzy match using similar_text).

Requirements

  • PHP 8.3 or higher
  • Composer

The library depends on the following AdachSoft packages at runtime:

  • adachsoft/ai-tool-call
  • adachsoft/filesystem
  • adachsoft/normalized-safe-path

These are installed automatically when you require this package.

Installation

composer require adachsoft/directory-scanner-tool

Concepts and architecture

The core pieces of this library are:

  • DirectoryScannerTool – SPI tool implementation (AdachSoft\AiToolCall\SPI\ToolInterface) that is discovered and executed by adachsoft/ai-tool-call.
  • DirectoryScannerToolFactory – factory used by AiToolCallFacadeBuilder to create configured tool instances based on a ConfigMap.
  • FileContentSearchTool – SPI tool that wraps directory scanning and then filters entries by inspecting file contents using pluggable search strategies.
  • FileContentSearchToolFactory – factory that wires the same DirectoryScannerService and filesystem configuration, and composes FileContentSearchService.
  • DirectoryScannerService / DirectoryScanRunner – services responsible for scanning the file system and collecting results.
  • FileContentSearchService – uses DirectoryScannerService plus a set of search strategies (Strategy pattern) to keep the search logic extensible and testable.
  • PathNormalizationHelper – uses adachsoft/normalized-safe-path to ensure all paths stay inside the configured base path.
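PathNormalizationHelper relies on adachsoft/normalized-safe-path, whose API is not covered here. As a rough illustration of the idea only, the following sketch confines a request path to the base path using lexical normalization. The function name resolveInsideBasePath and its rules (absolute request paths are rejected, . resolves to the base path itself) are assumptions, not the package's implementation, and symlinks are not resolved.

```php
<?php

/**
 * Hypothetical helper (not part of this package): resolves a request path
 * relative to $basePath and rejects anything that would escape it.
 * Normalization is purely lexical; symlinks are NOT resolved.
 */
function resolveInsideBasePath(string $basePath, string $requestPath): string
{
    $basePath = rtrim($basePath, '/');

    // Assumption: absolute request paths are rejected outright.
    if (str_starts_with($requestPath, '/')) {
        throw new InvalidArgumentException("Absolute paths are not allowed: {$requestPath}");
    }

    $segments = [];
    foreach (explode('/', $requestPath) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // "." and empty segments (e.g. "src//Module") do not move us
        }
        if ($segment === '..') {
            if ($segments === []) {
                // Going up from the base path itself would leave the allowed tree.
                throw new InvalidArgumentException("Path escapes base path: {$requestPath}");
            }
            array_pop($segments);
            continue;
        }
        $segments[] = $segment;
    }

    return $segments === [] ? $basePath : $basePath . '/' . implode('/', $segments);
}
```

For example, resolveInsideBasePath('/var/www/my-project', 'src/../src/Module') returns /var/www/my-project/src/Module, while src/../../etc throws because it would climb above the base path.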

You typically do not construct these objects manually. Instead, you plug the factories into AiToolCallFacadeBuilder and configure the tools using ConfigMap.

Configuration

The tools are configured by the host application (not by the AI agent) via DirectoryScannerToolFactory, FileContentSearchToolFactory and ConfigMap.

Factory configuration for directory_scanner (host application)

Example of wiring the directory scanner tool with AiToolCallFacadeBuilder:

use AdachSoft\AiToolCall\PublicApi\Builder\AiToolCallFacadeBuilder;
use AdachSoft\AiToolCall\SPI\Collection\ConfigMap;
use AdachSoft\DirectoryScannerTool\DirectoryScannerToolFactory;

$factory = new DirectoryScannerToolFactory();

$facade = AiToolCallFacadeBuilder::new()
    ->withSpiFactories([$factory])
    ->withToolConfigs([
        'directory_scanner' => new ConfigMap([
            'base_path' => '/var/www/my-project',
            'excluded_paths' => ['vendor', 'var/cache', '.git'],
            'max_allowed_depth' => 10,
            'max_entries' => 5000,
            // Optional defaults for include_* flags when the agent does not specify them
            'default_include_size' => false,
            'default_include_created_at' => false,
            'default_include_modified_at' => false,
            'default_include_permissions' => false,
        ]),
    ])
    ->build();

Supported config keys (both tools)

All config keys are passed as an array to ConfigMap for tool names directory_scanner and file_content_search:

  • base_path (string, required)

    • Absolute path that acts as the root of all scans.
    • All agent‑provided paths are resolved relative to this base path.
  • excluded_paths (string[], optional)

    • List of relative paths (from base path) that should be excluded from scanning.
    • Both the directory itself and all its descendants are excluded.
  • max_allowed_depth (int, optional, default: 10)

    • Maximum recursion depth allowed by the host application.
    • The effective depth used for a given request is the minimum of this value and the request‑level max_depth parameter (see below).
  • max_entries (int, optional, default: 5000)

    • Maximum number of entries that will be returned from a single scan.
    • If the limit is reached, scanning stops and summary.truncated_by_max_entries is set to true.
  • default_include_size, default_include_created_at, default_include_modified_at, default_include_permissions (bool, optional, default: false)

    • Default values for the corresponding include_* request parameters, used when the agent omits them.
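A minimal sketch of how the keys above could be validated and merged with their documented defaults. The function normalizeScannerConfig is hypothetical (the package's DirectoryScannerConfig is not shown here); only the key names and default values come from the list above. Treating an empty list as the default for excluded_paths and checking for an absolute path with a POSIX-style leading / are assumptions.

```php
<?php

/**
 * Hypothetical sketch: fills in the documented defaults for the tool config
 * keys and validates the required base_path. Not the package's real code.
 */
function normalizeScannerConfig(array $config): array
{
    $basePath = $config['base_path'] ?? null;
    if (!is_string($basePath) || !str_starts_with($basePath, '/')) {
        // base_path is required and must be absolute (POSIX-style check only).
        throw new InvalidArgumentException('base_path must be an absolute path');
    }

    $defaults = [
        'excluded_paths' => [], // assumption: no exclusions unless configured
        'max_allowed_depth' => 10,
        'max_entries' => 5000,
        'default_include_size' => false,
        'default_include_created_at' => false,
        'default_include_modified_at' => false,
        'default_include_permissions' => false,
    ];

    // Keys supplied by the host application override the defaults.
    return array_replace($defaults, $config);
}
```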

Internally, DirectoryScannerConfig stores its PHP properties in camelCase (e.g. $basePath, $excludedPaths), but wherever arrays or JSON are used, keys follow snake_case as shown above.

The file_content_search tool uses the same configuration keys, but returns only file entries whose contents match the requested pattern.

Tool invocation (AI agent request)

Once the tools are registered, AI agents (or your own code) call them through the AdachSoft\AiToolCall\PublicApi\AiToolCallFacade.

Request parameters – directory_scanner

The directory_scanner tool exposes the following parameter schema (as defined in DirectoryScannerTool::getDefinition()):

  • path (string, required)

    • Relative path to scan from base path (e.g. ., src, src/Module).
    • . means "start from the base path itself".
  • recursive (bool, default: false)

    • Whether nested directories should be scanned recursively.
  • max_depth (int|null, default: null)

    • Maximum recursion depth relative to the starting directory.
    • 1 means "only direct children".
    • The actual maximum depth used is min(max_depth, config.max_allowed_depth).
  • include_size (bool, default: false)

    • Whether to include file size in bytes (for files only).
  • include_created_at (bool, default: false)

    • Reserved for future use (creation time). Depending on the filesystem, no value may be available, in which case the created_at key is omitted from the entry.
  • include_modified_at (bool, default: false)

    • Whether to include last modification time as an ISO 8601 string.
  • include_permissions (bool, default: false)

    • Reserved for future use (POSIX-like permission string). If the filesystem does not expose permissions, the permissions key is omitted from the entry.
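The depth rule min(max_depth, config.max_allowed_depth) and the fallback to the host's default_include_* flags can be sketched in a few lines. resolveScanOptions is hypothetical and not part of the package. Two behaviours are assumptions not stated in this README: a null max_depth falls back to max_allowed_depth, and a non-recursive scan uses an effective depth of 1 (direct children only).

```php
<?php

/**
 * Hypothetical sketch: combines request parameters with the (already
 * normalized) tool config. Not the package's actual implementation.
 */
function resolveScanOptions(array $request, array $config): array
{
    $requestedMaxDepth = $request['max_depth'] ?? null;
    $recursive = (bool) ($request['recursive'] ?? false);

    if (!$recursive) {
        $effectiveMaxDepth = 1; // assumption: non-recursive = direct children only
    } elseif ($requestedMaxDepth === null) {
        $effectiveMaxDepth = $config['max_allowed_depth']; // assumption
    } else {
        // Documented rule: min(max_depth, config.max_allowed_depth).
        $effectiveMaxDepth = min((int) $requestedMaxDepth, $config['max_allowed_depth']);
    }

    $options = [
        'recursive' => $recursive,
        'requested_max_depth' => $requestedMaxDepth,
        'effective_max_depth' => $effectiveMaxDepth,
    ];

    // An explicit request value wins; otherwise use the host's default_include_* flag.
    foreach (['size', 'created_at', 'modified_at', 'permissions'] as $field) {
        $options["include_{$field}"] = (bool) ($request["include_{$field}"]
            ?? $config["default_include_{$field}"]
            ?? false);
    }

    return $options;
}
```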

Request parameters – file_content_search

The file_content_search tool accepts the same parameters as directory_scanner, plus:

  • pattern (string, required)

    • Text or pattern to search for in file contents.
    • Must be a non‑empty string.
  • search_mode (string, default: plain)

    • Controls how pattern is applied to file contents.
    • One of:
      • plain – case‑insensitive substring search,
      • plain_case_sensitive – case‑sensitive substring search,
      • regex – PHP regular expression, pattern is wrapped as "/{$pattern}/u",
      • similarity – fuzzy match using similar_text (internal threshold ~70%).

If search_mode = 'regex' and the pattern is not a valid regular expression, the tool throws InvalidToolCallException.
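A standalone sketch of the four search modes follows. The "/{$pattern}/u" wrapping and the ~70% similarity threshold come from the description above; everything else is an assumption: contentMatches is hypothetical, plain uses PHP's stripos (ASCII case folding only), similarity compares the pattern against each non-empty line, and InvalidArgumentException stands in for InvalidToolCallException. Note that with this wrapping an unescaped / in a regex pattern ends the expression early, so a literal slash would need to be written as \/.

```php
<?php

/**
 * Hypothetical sketch of the documented search modes (not the package's code).
 */
function contentMatches(string $content, string $pattern, string $mode = 'plain'): bool
{
    if ($pattern === '') {
        throw new InvalidArgumentException('pattern must be a non-empty string');
    }

    switch ($mode) {
        case 'plain':
            // Case-insensitive substring search (ASCII case folding only).
            return stripos($content, $pattern) !== false;

        case 'plain_case_sensitive':
            return str_contains($content, $pattern);

        case 'regex':
            // Same wrapping as described above: "/{$pattern}/u".
            // @ suppresses the warning; false means the pattern failed to compile
            // or matching failed (e.g. content that is not valid UTF-8).
            $result = @preg_match('/' . $pattern . '/u', $content);
            if ($result === false) {
                throw new InvalidArgumentException("Regex search failed for pattern: {$pattern}");
            }
            return $result === 1;

        case 'similarity':
            // Assumption: compare the pattern with each non-empty line.
            foreach (preg_split('/\R/', $content) as $line) {
                $line = trim($line);
                if ($line === '') {
                    continue;
                }
                similar_text($line, $pattern, $percent);
                if ($percent >= 70.0) {
                    return true;
                }
            }
            return false;

        default:
            throw new InvalidArgumentException("Unknown search_mode: {$mode}");
    }
}
```

Keeping each mode behind one switch keeps the sketch short; the package itself uses separate strategy classes for the same purpose.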

Example: calling directory_scanner via Public API

use AdachSoft\AiToolCall\PublicApi\Dto\ToolCallRequestDto as PublicToolCallRequestDto;

$request = new PublicToolCallRequestDto(
    toolName: 'directory_scanner',
    parameters: [
        'path' => '.',
        'recursive' => true,
        'max_depth' => 3,
        'include_size' => true,
        'include_modified_at' => true,
    ],
);

$result = $facade->callTool($request);

// $result->toolName === 'directory_scanner'
// $result->result is an array with keys 'items' and 'summary'

$items = $result->result['items'];
$summary = $result->result['summary'];

Example: calling file_content_search via Public API

use AdachSoft\AiToolCall\PublicApi\Dto\ToolCallRequestDto as PublicToolCallRequestDto;

$request = new PublicToolCallRequestDto(
    toolName: 'file_content_search',
    parameters: [
        'path' => '.',
        'recursive' => true,
        'max_depth' => 3,
        'pattern' => 'TODO',
        'search_mode' => 'plain',
    ],
);

$result = $facade->callTool($request);

// $result->toolName === 'file_content_search'
// $result->result has the same shape as for directory_scanner

$items = $result->result['items'];
$summary = $result->result['summary'];

Only files whose contents match the given pattern (according to search_mode) are returned in items. Directories are never included in file_content_search results.

Response structure

Both tools return a structure containing two top‑level keys: items and summary.

items

items is a flat list of scan entries:

/**
 * @var array<int, array{
 *     path: string,
 *     name: string,
 *     is_file: bool,
 *     is_directory: bool,
 *     size?: int,
 *     created_at?: string,
 *     modified_at?: string,
 *     permissions?: string,
 * }> $items
 */
$items = $result->result['items'];

  • path – relative path from the configured base path.
  • name – basename of the entry (file or directory name).
  • is_file – true if the entry is a file.
  • is_directory – true if the entry is a directory.
  • size – file size in bytes. Present only when include_size/default_include_size is enabled for a file entry and the filesystem provides a size.
  • created_at – creation time as ISO 8601 string. Present only when include_created_at/default_include_created_at is enabled and creation time is available from the filesystem.
  • modified_at – last modification time as ISO 8601 string. Present only when include_modified_at/default_include_modified_at is enabled and last modification time is available from the filesystem.
  • permissions – POSIX‑style permissions string. Present only when include_permissions/default_include_permissions is enabled and the filesystem exposes a permissions string.

Optional metadata keys are omitted entirely when the corresponding include flag is disabled (neither requested nor enabled by default) or when the filesystem provides no value. The tools do not emit "...": null placeholders for these fields.

For file_content_search, the structure is identical, but only entries with is_file === true that match the content search criteria are present.
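The omission rule for optional metadata can be illustrated with a small helper that builds one items entry from an SplFileInfo. buildEntry is hypothetical. The ISO 8601 format (DATE_ATOM) and the octal permission string (e.g. "0644") are assumptions, since the exact formats the package emits are not specified. created_at is left out because PHP's stat data has no portable creation time (getCTime() is the inode change time on Unix).

```php
<?php

/**
 * Hypothetical sketch: builds one "items" entry, adding optional metadata
 * keys only when requested and available (never "key": null).
 */
function buildEntry(SplFileInfo $info, string $basePath, array $include): array
{
    $entry = [
        // Path relative to the base path, e.g. "src/Module/A.php".
        'path' => ltrim(substr($info->getPathname(), strlen(rtrim($basePath, '/'))), '/'),
        'name' => $info->getFilename(),
        'is_file' => $info->isFile(),
        'is_directory' => $info->isDir(),
    ];

    if (($include['size'] ?? false) && $info->isFile()) {
        $entry['size'] = $info->getSize(); // bytes, files only
    }

    if ($include['modified_at'] ?? false) {
        $mtime = $info->getMTime();
        if ($mtime !== false) {
            $entry['modified_at'] = date(DATE_ATOM, $mtime); // ISO 8601
        }
    }

    if ($include['permissions'] ?? false) {
        $perms = $info->getPerms();
        if ($perms !== false) {
            $entry['permissions'] = substr(sprintf('%o', $perms), -4); // e.g. "0644"
        }
    }

    return $entry;
}
```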

summary

summary contains metadata about the scan:

/**
 * @var array{
 *     base_path: string,
 *     requested_path: string,
 *     recursive: bool,
 *     requested_max_depth: int|null,
 *     effective_max_depth: int,
 *     actual_depth_reached: int,
 *     total_entries_found: int,
 *     returned_entries_count: int,
 *     truncated_by_max_entries: bool,
 * } $summary
 */
$summary = $result->result['summary'];

  • base_path – the configured base path used for scanning.
  • requested_path – the path value from the request.
  • recursive – whether recursive scanning was enabled for the request.
  • requested_max_depth – raw max_depth from the request (may be null).
  • effective_max_depth – actual recursion depth limit used after applying config constraints.
  • actual_depth_reached – deepest level reached during the scan.
  • total_entries_found – total number of entries encountered (before truncation).
  • returned_entries_count – number of entries actually returned in items.
  • truncated_by_max_entries – true if scanning was stopped because max_entries was reached.
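To relate several summary fields to the scanning rules, here is a hypothetical sketch built on PHP's SPL directory iterators; scanDirectory is not the package's DirectoryScanRunner. It applies excluded paths (a rejected directory also hides its descendants), an effective depth limit where 1 means direct children, and max_entries with early stopping. Assumptions: $root is the already-resolved scan start (paths are relative to it, which equals the base path when path is .), actual_depth_reached uses the same 1-based levels, and total_entries_found plus the request echo fields are left out to keep it short.

```php
<?php

/**
 * Hypothetical sketch (not the package's DirectoryScanRunner): scans $root
 * with exclusions, a depth limit and an entry limit; returns items + summary.
 */
function scanDirectory(string $root, array $excludedPaths, int $effectiveMaxDepth, int $maxEntries): array
{
    $root = rtrim($root, '/');
    $excluded = array_map(static fn (string $p): string => trim($p, '/'), $excludedPaths);

    $directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    // Reject excluded paths; a rejected directory is never descended into.
    $filter = new RecursiveCallbackFilterIterator(
        $directory,
        static function (SplFileInfo $current) use ($root, $excluded): bool {
            $relative = ltrim(substr($current->getPathname(), strlen($root)), '/');
            return !in_array($relative, $excluded, true);
        }
    );
    $iterator = new RecursiveIteratorIterator($filter, RecursiveIteratorIterator::SELF_FIRST);
    $iterator->setMaxDepth(max(1, $effectiveMaxDepth) - 1); // SPL depth 0 = direct children

    $items = [];
    $deepest = 0;
    $truncated = false;
    foreach ($iterator as $info) {
        if (count($items) >= $maxEntries) {
            $truncated = true; // another entry exists beyond the limit: stop scanning
            break;
        }
        $deepest = max($deepest, $iterator->getDepth() + 1);
        $items[] = [
            'path' => ltrim(substr($info->getPathname(), strlen($root)), '/'),
            'name' => $info->getFilename(),
            'is_file' => $info->isFile(),
            'is_directory' => $info->isDir(),
        ];
    }

    return [
        'items' => $items,
        'summary' => [
            'effective_max_depth' => $effectiveMaxDepth,
            'actual_depth_reached' => $deepest,
            'returned_entries_count' => count($items),
            'truncated_by_max_entries' => $truncated,
        ],
    ];
}
```

Iteration order follows the filesystem, so callers that need stable output should sort items by path.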

Error handling

The tools use exceptions from adachsoft/ai-tool-call and their own domain exceptions to signal problems:

  • InvalidToolCallException

    • Thrown when request parameters are invalid (e.g. wrong types, impossible options, invalid pattern for file_content_search).
  • ToolExecutionException

    • Wraps domain and filesystem errors that occur during scanning or content search.
    • The original cause is available as the previous exception and usually contains a DirectoryScannerToolException with a more detailed message.
  • DirectoryScannerDomainException

    • Used internally for invalid or unsafe path operations (e.g. attempts to escape base path).

In typical adachsoft/ai-tool-call setups, these exceptions are translated into structured error responses returned to the AI agent.

Development

To work on the library locally, install dev dependencies and run the checks:

composer install

# Run test suite
vendor/bin/phpunit

# Run static analysis
vendor/bin/phpstan analyse

# Run coding standards fixer (remove --dry-run to apply fixes)
PHP_CS_FIXER_IGNORE_ENV=1 vendor/bin/php-cs-fixer fix --dry-run

Versioning

This library follows Semantic Versioning. Versions are published as Git tags and exposed on Packagist. The composer.json file does not contain an explicit version field; Composer reads version information from VCS tags.

See CHANGELOG.md for a list of notable changes between versions.

License

This library is open‑source software licensed under the MIT License. See the LICENSE file for full license text.

Author

  • Arkadiusz Adach