japananimetime/ai-address-normalizer

AI-powered address normalizer using multiple AI providers (Claude, ChatGPT, Gemini, Ollama)

pkg:composer/japananimetime/ai-address-normalizer

v0.1.0 2026-01-28 01:32 UTC

Last update: 2026-01-28 19:03:53 UTC


README

AI-powered address normalizer. Cleans, standardizes, and extracts geo entities from raw address strings using configurable AI providers.

Install

composer require japananimetime/ai-address-normalizer

Or, to install from the GitLab repository over HTTPS, add it to composer.json and then run the composer require command above:

{
    "repositories": [
        {
            "type": "vcs",
            "url": "https://gitlab.com/japananimetime/ai-address-normalizer.git"
        }
    ]
}

Setup

php artisan migrate

Set API keys in .env:

GEMINI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
OPENAI_API_KEY=your-key
OLLAMA_HOST=http://localhost:11434

Provider configuration (priority, model, enabled flag) lives in the ai_normalizer_services database table. The migration seeds four default providers: Ollama, Gemini, Claude, and ChatGPT.

Seed cities

Define cities in your published config, then seed:

// config/ai-normalizer.php
'cities' => [
    ['name' => 'Алматы', 'alternatives' => ['Алма-Ата', 'Almaty', 'Alma-Ata']],
    ['name' => 'Астана', 'alternatives' => ['Нур-Султан', 'Astana']],
],

php artisan ai-normalizer:seed-cities

Usage

Basic

use Japananimetime\AiAddressNormalizer\Facades\AiNormalizer;

// Normalize with best available provider (priority fallback)
$result = AiNormalizer::normalize('123 Main St, apt 4B, New York');

// With explicit city context (scopes alias/district lookups to this city)
$result = AiNormalizer::normalize('Main St 15', city: 'New York');

The city parameter is a city name string. It is resolved against the ai_normalizer_cities table (canonical name or alternative names). The resolved city ID is used internally for scoping DB queries (aliases, cache, suggestions).

If city is omitted, auto-detection scans the address text against all cities in the ai_normalizer_cities table.
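
To make the two lookup paths above concrete, here is a minimal sketch in plain PHP. The function names resolveCity and detectCity and the in-memory $cities array are hypothetical stand-ins for the package's queries against ai_normalizer_cities, and the real matching logic may differ. Note that strtolower only folds ASCII, so Cyrillic names are matched case-sensitively in this sketch; a real implementation would more likely use mb_strtolower.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for rows of ai_normalizer_cities.
$cities = [
    ['id' => 1, 'name' => 'Алматы', 'alternatives' => ['Алма-Ата', 'Almaty', 'Alma-Ata']],
    ['id' => 2, 'name' => 'Астана', 'alternatives' => ['Нур-Султан', 'Astana']],
];

/** Resolve an explicit city name (canonical or alternative) to a city ID. */
function resolveCity(array $cities, string $city): ?int
{
    $needle = strtolower(trim($city));
    foreach ($cities as $row) {
        foreach (array_merge([$row['name']], $row['alternatives']) as $name) {
            if (strtolower($name) === $needle) {
                return $row['id'];
            }
        }
    }
    return null; // unknown city: no scoping possible
}

/** Auto-detect: return the first city whose name occurs in the address text. */
function detectCity(array $cities, string $address): ?int
{
    $haystack = strtolower($address);
    foreach ($cities as $row) {
        foreach (array_merge([$row['name']], $row['alternatives']) as $name) {
            if (strpos($haystack, strtolower($name)) !== false) {
                return $row['id'];
            }
        }
    }
    return null;
}
```

With this data, resolveCity($cities, 'almaty') resolves through an alternative name, and detectCity($cities, 'Astana, Kabanbay Batyr 15') finds the city inside the address text.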

Using the factory directly

use Japananimetime\AiAddressNormalizer\Services\AiNormalizerFactory;

$factory = app(AiNormalizerFactory::class);

// Priority fallback (same as facade)
$result = $factory->normalize($address);

// With city context
$result = $factory->normalize($address, city: 'Алматы');

// Force a specific provider (throws on failure, no fallback)
$result = $factory->normalizeWithProvider('gemini', $address, city: 'Алматы');

Response structure

normalize() always returns a NormalizationResult:

$result->originalAddress;   // string — raw input
$result->normalizedAddress; // string — cleaned address
$result->confidence;        // float  — 0.0–1.0
$result->provider;          // string — provider name or failure reason
$result->fromCache;         // bool   — true if served from cache
$result->changes;           // string[] — human-readable changes ["expanded ул.→улица", ...]
$result->suggestion;        // ?AliasSuggestion — new alias if AI found a pattern
$result->geoEntities;       // ?ExtractedGeoEntities — structured location parts (see below)

// Helper methods
$result->wasModified();     // bool — true if address was actually changed
$result->isUsable();        // bool — true if confidence > 0.3 and address not empty
$result->getCityId();       // ?int — city ID from extracted geo entities

// Immutable builders (for enrichers)
$result->withGeoEntities($geoEntities); // new instance with different geo entities
$result->withSuggestion($suggestion);   // new instance with different suggestion
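
The helper methods and immutable builders above can be pictured with a small stand-in class. This is only a sketch: ResultSketch is a hypothetical, trimmed-down imitation of NormalizationResult (the suggestion is a plain string here rather than an AliasSuggestion), it assumes PHP 8.1+ for readonly properties, and the package's actual implementation may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for NormalizationResult (requires PHP 8.1+).
final class ResultSketch
{
    public function __construct(
        public readonly string $originalAddress,
        public readonly string $normalizedAddress,
        public readonly float $confidence,
        public readonly ?string $suggestion = null,
    ) {
    }

    /** True if normalization actually changed the address. */
    public function wasModified(): bool
    {
        return $this->normalizedAddress !== $this->originalAddress;
    }

    /** True if confidence is above 0.3 and the address is not empty. */
    public function isUsable(): bool
    {
        return $this->confidence > 0.3 && $this->normalizedAddress !== '';
    }

    /** Immutable builder: returns a copy and leaves $this untouched. */
    public function withSuggestion(?string $suggestion): self
    {
        return new self(
            $this->originalAddress,
            $this->normalizedAddress,
            $this->confidence,
            $suggestion,
        );
    }
}
```

Because each with*() call returns a new instance, an enricher can build on a result without mutating the object that other code still holds.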

Geo entities

When available, $result->geoEntities contains structured location data:

$geo = $result->geoEntities;

$geo->region;       // ?GeoEntity - region (oblast)
$geo->district;     // ?GeoEntity - district within the region
$geo->city;         // ?GeoEntity - city/village
$geo->cityDistrict; // ?GeoEntity - city district
$geo->microraion;   // ?GeoEntity - microdistrict
$geo->zhkComplex;   // ?GeoEntity - residential complex
$geo->street;       // ?GeoEntity - street/avenue
$geo->houseNumber;  // ?string
$geo->apartment;    // ?string

// Each GeoEntity has:
$geo->city->name;           // string — original name from address
$geo->city->normalizedName; // string — normalized name
$geo->city->confidence;     // float  — 0.0–1.0
$geo->city->matchedDbId;    // ?int   — host-app DB ID (set by enricher)

// Immutable builders
$geo->city->withDbId(42);       // new GeoEntity with matchedDbId set
$geo->withCity($newCity);       // new ExtractedGeoEntities with different city
$geo->withCityDistrict($dist);  // ...and so on for all entity types

// Helpers
$geo->hasGeocodableData(); // bool — has street, complex, or microraion
$geo->getMostSpecific();   // ?GeoEntity — most specific entity for fallback geocoding

Passthrough results

On failure (no providers available, all providers failed, or garbage input), normalize() does not throw. It returns a passthrough result with the address unchanged:

$result = AiNormalizer::normalize('15'); // too short / numeric

$result->normalizedAddress; // "15" (unchanged)
$result->confidence;        // 0.0
$result->provider;          // "too_short_or_numeric"
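
The combination of priority fallback and passthrough can be sketched as a simple loop. Everything here is illustrative rather than the package's code: normalizeWithFallback, the array-shaped result, the MIN_LENGTH threshold, and the reason strings no_providers and all_failed are hypothetical (only too_short_or_numeric appears above), and the sketch assumes that a lower priority number means "try first".

```php
<?php
declare(strict_types=1);

const MIN_LENGTH = 3; // hypothetical threshold for "too short"

/**
 * Try providers in priority order; never throw, fall back to a passthrough.
 * Each provider is ['name' => string, 'priority' => int, 'call' => callable].
 */
function normalizeWithFallback(array $providers, string $address): array
{
    // Passthrough: address unchanged, zero confidence, reason in "provider".
    $passthrough = fn (string $reason): array => [
        'normalizedAddress' => $address,
        'confidence' => 0.0,
        'provider' => $reason,
    ];

    $trimmed = trim($address);
    if (strlen($trimmed) < MIN_LENGTH || is_numeric($trimmed)) {
        return $passthrough('too_short_or_numeric');
    }

    // Assumption: lower priority value is tried first.
    usort($providers, fn (array $a, array $b): int => $a['priority'] <=> $b['priority']);

    foreach ($providers as $provider) {
        try {
            return [
                'normalizedAddress' => ($provider['call'])($address),
                'confidence' => 1.0, // placeholder; a real provider reports its own
                'provider' => $provider['name'],
            ];
        } catch (Throwable $e) {
            continue; // this provider failed, try the next one
        }
    }

    return $passthrough($providers === [] ? 'no_providers' : 'all_failed');
}
```

Callers can then branch on the result (for example via isUsable()) instead of wrapping every call in try/catch.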

Built-in Providers

  • Ollama (local, qwen2.5:14b)
  • Gemini (Google AI)
  • Claude (Anthropic)
  • ChatGPT (OpenAI)

Extensibility

Add new providers or multiple models without modifying the library.

Custom provider

  1. Create a class extending AbstractNormalizer:
namespace App\Services\AI;

use Japananimetime\AiAddressNormalizer\Providers\AbstractNormalizer;
use Japananimetime\AiAddressNormalizer\Dto\NormalizationResult;
use Illuminate\Support\Facades\Http;

class MistralNormalizer extends AbstractNormalizer
{
    protected string $model = 'mistral-large-latest';

    public function getProviderName(): string
    {
        return 'mistral';
    }

    public function normalize(string $address, array $context): NormalizationResult
    {
        // $this->config is set by the factory from DB config + env fallback
        $apiKey = $this->config['api_key'] ?? config('services.mistral.key');

        // Build prompts (inherited from AbstractNormalizer)
        $systemPrompt = $this->buildSystemPrompt($context);
        $userPrompt = $this->buildUserPrompt($address);

        // Call your AI API
        $response = Http::withHeaders(['Authorization' => "Bearer {$apiKey}"])
            ->post('https://api.mistral.ai/v1/chat/completions', [
                'model' => $this->model, // set by factory from DB model column
                'messages' => [
                    ['role' => 'system', 'content' => $systemPrompt],
                    ['role' => 'user', 'content' => $userPrompt],
                ],
                'response_format' => ['type' => 'json_object'],
            ]);

        $content = $response->json('choices.0.message.content', '');

        // extractJsonFromResponse() handles markdown code blocks, raw JSON, etc.
        $parsed = $this->extractJsonFromResponse($content);

        if (!$parsed) {
            return NormalizationResult::passthrough($address, 'parse_error');
        }

        // parseResponse() converts the AI JSON into a NormalizationResult DTO.
        // Pass $context['city_id'] so geo entities can resolve city references.
        return $this->parseResponse($address, $parsed, $context['city_id'] ?? null);
    }
}
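  2. Add a DB row:
-- Backslashes are literal in PostgreSQL string literals; on MySQL, double them ('App\\Services\\AI\\MistralNormalizer').
INSERT INTO ai_normalizer_services (name, provider_class, priority, is_active, model, config)
VALUES ('mistral', 'App\Services\AI\MistralNormalizer', 5, true, 'mistral-large-latest', '{"api_key":"sk-..."}');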

Multiple models from same provider

-- A second Claude model with its own priority (alongside the seeded default)
INSERT INTO ai_normalizer_services (name, provider_class, priority, is_active, model)
VALUES ('claude-haiku', NULL, 1, true, 'claude-3-5-haiku-20241022');

-- A second Ollama model (include host in config)
INSERT INTO ai_normalizer_services (name, provider_class, priority, is_active, model, config)
VALUES ('ollama-large', NULL, 0, true, 'qwen2.5:32b', '{"host":"http://localhost:11434"}');

When provider_class is NULL, the factory strips the suffix from the service name and resolves the remaining base name via the built-in ProviderEnum:

  • claude-haiku → base name claude → ClaudeNormalizer
  • ollama-large → base name ollama → OllamaNormalizer
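
The resolution rule above can be sketched as follows. This is an illustration under stated assumptions, not the factory's actual code: resolveProviderClass and the BUILT_IN_PROVIDERS map are hypothetical stand-ins for ProviderEnum, the GeminiNormalizer and ChatGptNormalizer class names are guesses, and the sketch assumes the base name is everything before the first hyphen.

```php
<?php
declare(strict_types=1);

// Hypothetical map standing in for the package's ProviderEnum.
const BUILT_IN_PROVIDERS = [
    'ollama' => 'OllamaNormalizer',
    'gemini' => 'GeminiNormalizer',   // assumed class name
    'claude' => 'ClaudeNormalizer',
    'chatgpt' => 'ChatGptNormalizer', // assumed class name
];

/** Pick the normalizer class for a row of ai_normalizer_services. */
function resolveProviderClass(string $serviceName, ?string $providerClass): ?string
{
    if ($providerClass !== null) {
        return $providerClass; // an explicit provider_class always wins
    }

    // Assumption: the base name is the part before the first hyphen.
    $base = explode('-', $serviceName, 2)[0];

    return BUILT_IN_PROVIDERS[$base] ?? null; // null: not a built-in provider
}
```

Under these assumptions, a row named claude-haiku with a NULL provider_class resolves to ClaudeNormalizer, while a row with an explicit class bypasses the lookup entirely.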

What AbstractNormalizer gives you for free

If your custom provider extends AbstractNormalizer, you inherit:

| Method | What it does |
| --- | --- |
| buildSystemPrompt($context) | Builds the full AI prompt with rules, aliases, geo entities |
| buildUserPrompt($address) | Wraps address in a user message |
| extractJsonFromResponse($text) | Extracts JSON from raw text, markdown blocks, etc. |
| parseResponse($address, $parsed, $cityId) | Converts AI JSON into NormalizationResult with geo entities and alias suggestions |
| setModel() / setConfig() | Model and config injection (called by factory) |

You only need to implement normalize() (call API) and getProviderName().
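
To give a feel for what a helper like extractJsonFromResponse() has to cope with, here is a minimal standalone sketch. It is not the package's implementation: extractJsonSketch is a hypothetical function that first looks for a Markdown code fence and otherwise falls back to the span between the first "{" and the last "}".

```php
<?php
declare(strict_types=1);

/** Pull a JSON object out of a model reply; returns null if none decodes. */
function extractJsonSketch(string $text): ?array
{
    // A Markdown code fence is three backticks, optionally followed by "json".
    $fence = str_repeat('`', 3);
    $pattern = '/' . $fence . '(?:json)?\s*(.*?)' . $fence . '/s';

    if (preg_match($pattern, $text, $m) === 1) {
        // 1. Prefer the body of the first fenced block.
        $candidate = $m[1];
    } else {
        // 2. Otherwise take everything from the first "{" to the last "}".
        $start = strpos($text, '{');
        $end = strrpos($text, '}');
        if ($start === false || $end === false || $end < $start) {
            return null;
        }
        $candidate = substr($text, $start, $end - $start + 1);
    }

    $decoded = json_decode(trim($candidate), true);

    return is_array($decoded) ? $decoded : null;
}
```

Returning null on any failure matches how the custom provider example above checks the parsed value before falling back to a passthrough result.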

Custom context builder

The default ContextBuilder uses only package-owned tables (ai_normalizer_cities, street_name_aliases). To inject additional context from your host app (e.g., city districts, microraions), extend ContextBuilder:

namespace App\Services\AI;

use Japananimetime\AiAddressNormalizer\Services\ContextBuilder;
use Illuminate\Support\Facades\DB;

class MyContextBuilder extends ContextBuilder
{
    public function build(string $address, ?string $city = null): array
    {
        $context = parent::build($address, $city);

        // Add your host-app data to the context
        if (!empty($context['city_id'])) {
            $context['geo_entities']['city_districts'] = DB::table('city_districts')
                ->where('city_id', $context['city_id'])
                ->pluck('name')
                ->toArray();
        }

        return $context;
    }
}

Bind it in your AppServiceProvider:

use Japananimetime\AiAddressNormalizer\Contracts\ContextBuilderInterface;

public function register(): void
{
    $this->app->singleton(ContextBuilderInterface::class, \App\Services\AI\MyContextBuilder::class);
}

The package binds the default ContextBuilder via singletonIf(), so your binding takes precedence.

Result enricher

The ResultEnricherInterface is a post-processing hook called after AI parsing and before caching, so the enriched result is what gets cached. Host apps use it to match AI-extracted geo entities against their own database tables.

namespace App\Services\AI;

use Japananimetime\AiAddressNormalizer\Contracts\ResultEnricherInterface;
use Japananimetime\AiAddressNormalizer\Dto\NormalizationResult;
use Illuminate\Support\Facades\DB;

class MyResultEnricher implements ResultEnricherInterface
{
    public function enrich(NormalizationResult $result, array $context): NormalizationResult
    {
        $geo = $result->geoEntities;
        if (!$geo) {
            return $result;
        }

        // Match city against your DB (ILIKE is PostgreSQL-specific; use LIKE on other databases)
        if ($geo->city && !$geo->city->matchedDbId) {
            $city = DB::table('cities')
                ->where('name', 'ILIKE', "%{$geo->city->normalizedName}%")
                ->first();

            if ($city) {
                $geo = $geo->withCity($geo->city->withDbId($city->id));
            }
        }

        // Match city district
        if ($geo->cityDistrict && !$geo->cityDistrict->matchedDbId) {
            $district = DB::table('city_districts')
                ->where('name', 'ILIKE', "%{$geo->cityDistrict->normalizedName}%")
                ->first();

            if ($district) {
                $geo = $geo->withCityDistrict($geo->cityDistrict->withDbId($district->id));
            }
        }

        return $result->withGeoEntities($geo);
    }
}

Bind it in your AppServiceProvider:

use Japananimetime\AiAddressNormalizer\Contracts\ResultEnricherInterface;

public function register(): void
{
    $this->app->singleton(ResultEnricherInterface::class, \App\Services\AI\MyResultEnricher::class);
}

The default NullResultEnricher returns the result unchanged.
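
One detail worth considering in an enricher like the one above: the AI-extracted name is interpolated straight into a LIKE/ILIKE pattern, so a "%" or "_" inside the name would act as a wildcard. A small helper can escape those characters first. This is a sketch only: likeContains is a hypothetical function, and it assumes the database uses backslash as the LIKE escape character, which is the default in PostgreSQL and MySQL.

```php
<?php
declare(strict_types=1);

/**
 * Build a "contains" pattern for LIKE/ILIKE with wildcards in $value escaped.
 * Assumes backslash is the LIKE escape character.
 */
function likeContains(string $value): string
{
    // Escape the backslash itself plus the two LIKE wildcards (% and _).
    return '%' . addcslashes($value, '\\%_') . '%';
}
```

In the enricher, the lookup would then read ->where('name', 'ILIKE', likeContains($geo->city->normalizedName)).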

Artisan Commands

# Normalize a single address
php artisan ai-normalizer:normalize "ул Абая 15" --city=Алматы

# Seed cities from config
php artisan ai-normalizer:seed-cities
php artisan ai-normalizer:seed-cities --dry-run

# Seed abbreviations (global)
php artisan ai-normalizer:seed-abbreviations

# Seed historical aliases (city-specific, uses city ID from ai_normalizer_cities)
php artisan ai-normalizer:seed-historical --city=2

Config Reference

| Env Variable | Description |
| --- | --- |
| GEMINI_API_KEY | Google Gemini API key |
| ANTHROPIC_API_KEY | Anthropic Claude API key |
| OPENAI_API_KEY | OpenAI ChatGPT API key |
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
| AI_NORMALIZER_CACHE_ENABLED | Enable response caching (default: true) |
| AI_NORMALIZER_CACHE_TTL | Cache TTL in seconds (default: 30 days, i.e. 2592000 seconds) |

License

Beerware