thesubhendu / embedvector-laravel
Recommendation engine using OpenAI embeddings and PostgreSQL pgvector
pkg:composer/thesubhendu/embedvector-laravel
Requires
- php: ^8.3
- illuminate/contracts: ^10.0||^11.0||^12.0
- openai-php/client: ^0.15.0
- pgvector/pgvector: ^0.2.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- larastan/larastan: ^2.9||^3.0
- laravel/pint: ^1.14
- nunomaduro/collision: ^8.1.1||^7.10.0
- orchestra/testbench: ^9.0.0||^8.22.0||^10.0
- pestphp/pest: ^2.34||^3.0
- pestphp/pest-plugin-arch: ^2.7||^3.0
- pestphp/pest-plugin-laravel: ^2.3||^3.0
- phpstan/extension-installer: ^1.3
- phpstan/phpstan-deprecation-rules: ^1.1
- phpstan/phpstan-phpunit: ^1.3
This package is auto-updated.
Last update: 2025-10-04 09:31:35 UTC
README
A Laravel package for building intelligent recommendation systems using OpenAI embeddings. Perfect for creating personalized content recommendations, job matching, product suggestions, and similar features where you need to find relevant matches based on user profiles or content similarity.
Features
- Batch embedding processing using OpenAI's batch API
- Separate database connection support for vector operations
- Automatic vector extension creation for PostgreSQL
- Efficient batch processing with configurable chunk sizes
- Dual Contract System: Separate contracts for embedding generation and searchable models
- Smart Model Separation: Models can be either embedding sources or searchable targets
Installation
- Install the package via Composer:
composer require thesubhendu/embedvector-laravel
- Publish the configuration and migrations:
php artisan vendor:publish --provider="Subhendu\EmbedVector\EmbedVectorServiceProvider"
- Configure your environment variables:
OPENAI_API_KEY=your_openai_api_key_here
Database Requirements: This package requires PostgreSQL with the pgvector extension for vector operations.
Optional: If you want to use a separate PostgreSQL database connection other than your application database for vector operations, you can set the EMBEDVECTOR_DB_CONNECTION environment variable.
EMBEDVECTOR_DB_CONNECTION=pgsql
- Run the migrations
php artisan migrate
Usage
Understanding the Contract System
This package uses two distinct contracts to separate concerns based on the direction of matching:
- EmbeddableContract: For models that generate embeddings (e.g., Customer/Candidate profiles)
- EmbeddingSearchableContract: For models that can be found using embeddings (e.g., Jobs)
Example Use Case: Job Matching for Candidates
If the system is designed to find matching jobs for candidates, and not the other way around:
- Customer/Candidate implements EmbeddableContract → generates embeddings from their profile, skills, and preferences
- Job implements EmbeddingSearchableContract → can be found/recommended based on candidate embeddings
- Flow: Customer embeddings are used to find relevant Jobs that match their profile
For Bidirectional Matching: If you want matching in both directions (finding jobs for candidates AND finding candidates for jobs), both models need to implement EmbeddingSearchableContract.
Basic Embedding
    use Subhendu\EmbedVector\Services\EmbeddingService;

    $embeddingService = app(EmbeddingService::class);
    $embedding = $embeddingService->createEmbedding('Your text here');
Implementing Contracts
For Models That Generate Embeddings (e.g., Customer)
    use Illuminate\Database\Eloquent\Model;
    use Subhendu\EmbedVector\Contracts\EmbeddableContract;
    use Subhendu\EmbedVector\Traits\EmbeddableTrait;

    class Customer extends Model implements EmbeddableContract
    {
        use EmbeddableTrait;

        // Text that represents this customer when generating the embedding
        public function toEmbeddingText(): string
        {
            return $this->name . ' ' . $this->department . ' ' . $this->skills;
        }
    }
For Models That Can Be Searched (e.g., Job)
    use Illuminate\Database\Eloquent\Model;
    use Illuminate\Database\Eloquent\Factories\HasFactory;
    use Subhendu\EmbedVector\Contracts\EmbeddingSearchableContract;
    use Subhendu\EmbedVector\Traits\EmbeddingSearchableTrait;

    class Job extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait;
        use HasFactory;

        // Text that represents this job when generating the embedding
        public function toEmbeddingText(): string
        {
            return $this->title . ' ' . $this->description . ' ' . $this->requirements;
        }
    }
Note: EmbeddingSearchableContract extends EmbeddableContract, and EmbeddingSearchableTrait automatically includes EmbeddableTrait functionality, so you only need to use one trait.
Finding Matching Results
Basic Usage
    // Find jobs that match a customer's profile
    $customer = Customer::find(1);
    $matchingJobs = $customer->matchingResults(Job::class, 10);

    foreach ($matchingJobs as $job) {
        echo "Job: {$job->title} - Match: {$job->match_percent}%";
        echo "Distance: {$job->distance}";
    }
Note: The matchingResults() method automatically uses getOrCreateEmbedding() internally, which means:
- If no embedding exists for the source model, it will be created
- If an embedding exists but needs sync (embedding_sync_required = true), it will be updated
- This ensures you always get accurate similarity results
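The get-or-create behaviour described above can be sketched in plain PHP. This is only an illustration under stated assumptions: EmbeddingStore, its in-memory rows, and the fake embedder closure are hypothetical stand-ins, not classes from this package, and the real implementation may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for the embeddings table (not a package class).
final class EmbeddingStore
{
    /** @var array<string, array{vector: array, sync_required: bool}> */
    private array $rows = [];

    // Counts how often the (fake) embedding API was called.
    public int $apiCalls = 0;

    public function __construct(private \Closure $embed) {}

    // Reuse the stored row unless it is missing or flagged for sync.
    public function getOrCreate(string $key, string $text): array
    {
        $row = $this->rows[$key] ?? null;
        if ($row === null || $row['sync_required']) {
            $this->apiCalls++;
            $row = ['vector' => ($this->embed)($text), 'sync_required' => false];
            $this->rows[$key] = $row;
        }
        return $row;
    }

    // Flag an existing row so the next getOrCreate() regenerates it.
    public function queueForSyncing(string $key): void
    {
        if (isset($this->rows[$key])) {
            $this->rows[$key]['sync_required'] = true;
        }
    }
}

// Fake embedder so the sketch runs without an API key: one number per text.
$store = new EmbeddingStore(fn (string $text): array => [(float) strlen($text)]);

$store->getOrCreate('customer:1', 'php laravel');          // created
$store->getOrCreate('customer:1', 'php laravel');          // reused, no API call
$store->queueForSyncing('customer:1');                     // mark as stale
$row = $store->getOrCreate('customer:1', 'php laravel postgres'); // regenerated
```

Only the first and third lookups reach the embedder; the second is served from the stored row.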
Advanced Usage with Filters
You can add query filters to narrow down the search results before embedding similarity is calculated:
    // Find only active jobs in specific locations
    $customer = Customer::find(1);
    $matchingJobs = $customer->matchingResults(
        targetModelClass: Job::class,
        topK: 10,
        queryFilter: function ($query) {
            $query->where('status', 'active')
                ->whereIn('location', ['New York', 'San Francisco'])
                ->where('salary', '>=', 80000);
        }
    );
Method Parameters
- targetModelClass (string): The class name of the model to search for matches
- topK (int, default: 5): Maximum number of results to return
- queryFilter (Closure, optional): Custom query constraints to apply before similarity matching
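Why the filter runs before ranking can be shown with plain arrays. The sketch below is an assumption-laden illustration: topMatches() and the sample rows are hypothetical, and the package performs this step in SQL rather than in PHP.

```php
<?php
declare(strict_types=1);

/**
 * Filter first, then rank by distance and keep the closest $topK rows.
 * Hypothetical helper for illustration only.
 *
 * @param array<int, array{id: int, status: string, distance: float}> $candidates
 */
function topMatches(array $candidates, callable $filter, int $topK = 5): array
{
    $kept = array_values(array_filter($candidates, $filter));
    usort($kept, fn (array $a, array $b): int => $a['distance'] <=> $b['distance']);
    return array_slice($kept, 0, $topK);
}

$jobs = [
    ['id' => 1, 'status' => 'active', 'distance' => 0.42],
    ['id' => 2, 'status' => 'closed', 'distance' => 0.05], // closest, but filtered out
    ['id' => 3, 'status' => 'active', 'distance' => 0.10],
    ['id' => 4, 'status' => 'active', 'distance' => 0.31],
];

$top = topMatches($jobs, fn (array $job): bool => $job['status'] === 'active', 2);
$ids = array_column($top, 'id');
```

Because the closed job is dropped before ranking, both of the two result slots go to rows the caller can actually use.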
Return Properties
Each returned model includes additional properties:
- match_percent (float): Similarity percentage (0-100, higher is better)
- distance (float): Vector distance (lower is better for similarity)
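To make the two properties concrete, here is a small sketch of cosine and L2 distance, the quantities pgvector computes with its `<=>` and `<->` operators. The matchPercent() mapping is an assumption chosen for illustration; the package's actual formula for match_percent is not documented here.

```php
<?php
declare(strict_types=1);

function dot(array $a, array $b): float
{
    return (float) array_sum(array_map(fn ($x, $y) => $x * $y, $a, $b));
}

// Cosine distance = 1 - cosine similarity; ignores vector length.
function cosineDistance(array $a, array $b): float
{
    return 1.0 - dot($a, $b) / (sqrt(dot($a, $a)) * sqrt(dot($b, $b)));
}

// Euclidean (L2) distance; sensitive to vector length.
function l2Distance(array $a, array $b): float
{
    return sqrt((float) array_sum(array_map(fn ($x, $y) => ($x - $y) ** 2, $a, $b)));
}

// Assumed mapping from cosine distance to a 0-100 score (not the package's confirmed formula).
function matchPercent(float $cosineDistance): float
{
    return round(max(0.0, min(1.0, 1.0 - $cosineDistance)) * 100, 2);
}

$profile    = [1.0, 2.0, 3.0];
$scaled     = [2.0, 4.0, 6.0];  // same direction, twice the length
$orthogonal = [3.0, 0.0, -1.0]; // dot product with $profile is zero

$sameDirection = cosineDistance($profile, $scaled);
$unrelated     = cosineDistance($profile, $orthogonal);
$euclidean     = l2Distance($profile, $scaled);
```

The scaled vector has a cosine distance of about zero but a clearly non-zero L2 distance, which is one reason cosine is usually preferred for comparing text embeddings.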
Configuration
The package publishes a configuration file to config/embedvector.php with the following options:
    return [
        'openai_api_key' => env('OPENAI_API_KEY', ''),
        'embedding_model' => env('EMBEDVECTOR_MODEL', 'text-embedding-3-small'),
        'distance_metric' => env('EMBEDVECTOR_DISTANCE', 'cosine'), // cosine | l2
        'search_strategy' => env('EMBEDVECTOR_SEARCH_STRATEGY', 'auto'), // auto | optimized | cross_connection
        'lot_size' => env('EMBEDVECTOR_LOT_SIZE', 50000),
        'chunk_size' => env('EMBEDVECTOR_CHUNK_SIZE', 500),
        'directories' => [
            'input' => 'embeddings/input',
            'output' => 'embeddings/output',
        ],
        'database_connection' => env('EMBEDVECTOR_DB_CONNECTION', 'pgsql'),
        'model_fields_to_check' => [
            // Configure fields to monitor for automatic sync
            // 'App\Models\Job' => ['title', 'description', 'requirements'],
        ],
    ];
Configuration Options Explained
- openai_api_key: Your OpenAI API key (required in production)
- embedding_model: OpenAI embedding model to use (text-embedding-3-small, text-embedding-3-large, etc.)
- distance_metric: Vector similarity calculation method
  - cosine: Better for semantic similarity (recommended)
  - l2: Euclidean distance for geometric similarity
- search_strategy: How to perform similarity searches
  - auto: Automatically choose the best strategy (recommended)
  - optimized: Use JOIN-based queries (same database only)
  - cross_connection: Two-step approach (works across different databases)
- lot_size: Maximum items per OpenAI batch (up to 50,000)
- chunk_size: Items processed per chunk during batch generation
- database_connection: PostgreSQL connection for vector operations
- model_fields_to_check: Configure fields to monitor for automatic sync with FireSyncEmbeddingTrait
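The two-step idea behind the cross_connection strategy can be sketched as follows: first obtain ranked ids and distances from the vector database, then load the matching rows from the application database and restore the ranking. mergeRanked() and the sample data are hypothetical; this is a guess at the general shape, not the package's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Attach distances to rows loaded from another connection, in ranked order.
 * Hypothetical helper for illustration only.
 *
 * @param array<int, float> $ranked model id => distance, ordered closest first
 * @param array<int, array{id: int, title: string}> $rows rows from the application database
 */
function mergeRanked(array $ranked, array $rows): array
{
    $byId = array_column($rows, null, 'id'); // index rows by primary key
    $merged = [];
    foreach ($ranked as $id => $distance) {
        if (isset($byId[$id])) { // skip ids that no longer exist
            $merged[] = $byId[$id] + ['distance' => $distance];
        }
    }
    return $merged;
}

// Step 1: ranked ids as they might come back from the vector database.
$ranked = [7 => 0.08, 3 => 0.21, 9 => 0.35];

// Step 2: rows fetched by id from the application database (arbitrary order, id 9 missing).
$rows = [
    ['id' => 3, 'title' => 'Backend Engineer'],
    ['id' => 7, 'title' => 'Data Engineer'],
];

$result = mergeRanked($ranked, $rows);
```

A JOIN-based query avoids the second round trip, which is why the optimized strategy is only available when both tables share a database.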
Batch Processing
For processing large datasets efficiently, this package provides batch processing capabilities using OpenAI's batch API, which is more cost-effective for processing many embeddings at once.
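For orientation, OpenAI's batch API takes a JSONL input file in which every line is one request with custom_id, method, url, and body fields. The sketch below builds such lines and splits them into lots; toBatchLine(), the job-{id} custom_id format, and the tiny lot size are assumptions for illustration, and the files this package writes may be laid out differently.

```php
<?php
declare(strict_types=1);

// Build one JSONL request line for the /v1/embeddings endpoint.
function toBatchLine(string $customId, string $text, string $model): string
{
    return json_encode([
        'custom_id' => $customId,
        'method' => 'POST',
        'url' => '/v1/embeddings',
        'body' => ['model' => $model, 'input' => $text],
    ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}

// Model id => embedding text (sample data).
$texts = [
    1 => 'Senior PHP developer',
    2 => 'Data analyst',
    3 => 'DevOps engineer',
];

$lotSize = 2; // tiny on purpose; the package default for lot_size is 50000
$lots = [];

// Split the requests into lots so that no file exceeds the per-batch limit.
foreach (array_chunk($texts, $lotSize, true) as $lot) {
    $lines = [];
    foreach ($lot as $id => $text) {
        $lines[] = toBatchLine("job-{$id}", $text, 'text-embedding-3-small');
    }
    $lots[] = implode("\n", $lines) . "\n";
}
```

Each element of $lots is the content of one input file; the custom_id is what allows results to be matched back to models once the batch completes.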
Commands
- php artisan embedding:gen {model} {--type=sync|init} {--force} - Generate batch embeddings for a specific model
- php artisan embedding:proc {--batch-id=} {--all} - Process completed batch results
Command Options
embedding:gen
- {model} - The model class name to generate embeddings for (e.g. App\\Models\\Job)
- --type=sync - Processing type, sync or init (default: sync)
- --force - Force overwrite existing files
embedding:proc
- --batch-id= - Process a specific batch by ID
- --all - Process all completed batches
- No options - Check and process batches that are ready (default behavior)
Usage Examples
    # Generate embeddings for User model (init = first time, sync = update existing)
    php artisan embedding:gen "App\\Models\\User" --type=init

    # Generate embeddings for sync (only models that need updates)
    php artisan embedding:gen "App\\Models\\Job" --type=sync

    # Check and process ready batches (default)
    php artisan embedding:proc

    # Process all completed batches
    php artisan embedding:proc --all

    # Process specific batch
    php artisan embedding:proc --batch-id=batch_abc123
Real-World Examples
E-commerce Product Recommendations
    // Product model (searchable)
    class Product extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait;

        public function toEmbeddingText(): string
        {
            return $this->name . ' ' . $this->description . ' ' . $this->category . ' ' . $this->tags;
        }
    }

    // User model (generates embeddings from purchase history)
    class User extends Model implements EmbeddableContract
    {
        use EmbeddableTrait;

        public function toEmbeddingText(): string
        {
            $purchaseHistory = $this->orders()
                ->with('products')
                ->get()
                ->flatMap->products
                ->pluck('name')
                ->implode(' ');

            return $this->preferences . ' ' . $purchaseHistory;
        }
    }

    // Find recommended products for a user
    $user = User::find(1);
    $recommendations = $user->matchingResults(
        targetModelClass: Product::class,
        topK: 20,
        queryFilter: function ($query) use ($user) {
            $query->where('in_stock', true)
                ->where('price', '<=', 500)
                // Exclude products this user has already bought
                ->whereNotIn('id', $user->purchased_product_ids);
        }
    );
Job Matching Platform
    // Find jobs for a candidate with filters
    $candidate = Candidate::find(1);
    $matchingJobs = $candidate->matchingResults(
        targetModelClass: Job::class,
        topK: 15,
        queryFilter: function ($query) use ($candidate) {
            $query->where('status', 'open')
                ->where('remote_allowed', $candidate->prefers_remote)
                ->whereIn('experience_level', $candidate->acceptable_levels)
                ->where('salary_min', '>=', $candidate->min_salary);
        }
    );

    foreach ($matchingJobs as $job) {
        echo "Match: {$job->match_percent}% - {$job->title} at {$job->company}";
    }
Content Recommendation System
    // Article model
    class Article extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait;

        public function toEmbeddingText(): string
        {
            return $this->title . ' ' . $this->summary . ' ' . $this->tags . ' ' . $this->category;
        }
    }

    // User reading history model
    class UserProfile extends Model implements EmbeddableContract
    {
        use EmbeddableTrait;

        public function toEmbeddingText(): string
        {
            // GROUP_CONCAT is MySQL syntax; on PostgreSQL use STRING_AGG instead
            $readingHistory = $this->user->readArticles()
                ->selectRaw('GROUP_CONCAT(title, " ", summary) as content')
                ->value('content');

            return $this->interests . ' ' . $readingHistory;
        }
    }

    // Get personalized article recommendations
    $profile = UserProfile::where('user_id', auth()->id())->first();
    $recommendations = $profile->matchingResults(
        targetModelClass: Article::class,
        topK: 10,
        queryFilter: function ($query) use ($profile) {
            $query->where('published', true)
                ->where('created_at', '>=', now()->subDays(7))
                ->whereNotIn('id', $profile->user->read_article_ids);
        }
    );
Embedding Management Examples
Working with Embeddings
    $job = Job::find(1);

    // Check if an embedding exists without creating one
    $embedding = $job->getEmbedding();
    if ($embedding) {
        echo "Embedding exists: " . ($embedding->embedding_sync_required ? "Needs sync" : "Up to date");
    } else {
        echo "No embedding found";
    }

    // Get or create embedding (will create if missing or update if sync required)
    $embedding = $job->getOrCreateEmbedding();
    echo "Embedding ready with match percentage calculation";

    // Force create a fresh embedding (useful for testing or manual refresh)
    $freshEmbedding = $job->createFreshEmbedding();

    // Queue for syncing (mark for batch update later)
    $job->queueForSyncing();
Batch Sync Workflow
    // 1. Mark multiple models for syncing
    $jobs = Job::where('updated_at', '>', now()->subDays(1))->get();
    foreach ($jobs as $job) {
        $job->queueForSyncing(); // Queue each job for sync
    }

    // 2. Process all queued embeddings in batch (run in a shell):
    //    php artisan embedding:gen "App\\Models\\Job" --type=sync

    // 3. Process the completed batch (run in a shell):
    //    php artisan embedding:proc --all
Conditional Embedding Updates
    class JobController extends Controller
    {
        public function update(Request $request, Job $job)
        {
            $job->update($request->validated());

            // Only queue for syncing if embedding-relevant fields changed
            if ($job->wasChanged(['title', 'description', 'requirements'])) {
                $job->queueForSyncing();
            }

            return response()->json($job);
        }
    }
Real-time vs Batch Embedding Strategy
    // Real-time embedding (immediate, good for single updates)
    $job = Job::create($data);
    $embedding = $job->getOrCreateEmbedding(); // Creates immediately

    // Batch embedding (efficient for bulk updates)
    $jobs = Job::factory()->count(100)->create();
    foreach ($jobs as $job) {
        $job->queueForSyncing(); // Mark for batch processing
    }
    // Then run: php artisan embedding:gen "App\\Models\\Job" --type=sync
Embedding Lifecycle Management
    // Scenario 1: New model creation
    $job = Job::create($data);
    // Option A: Create embedding immediately
    $embedding = $job->getOrCreateEmbedding();
    // Option B: Queue for batch processing (more efficient)
    $job->queueForSyncing();

    // Scenario 2: Model updates
    $job->update(['title' => 'Updated Title']);
    // Option A: Update embedding immediately
    $job->createFreshEmbedding();
    // Option B: Queue for batch processing (recommended)
    $job->queueForSyncing();

    // Scenario 3: Checking embedding status
    $embedding = $job->getEmbedding();
    if (!$embedding) {
        echo "No embedding exists";
    } elseif ($embedding->embedding_sync_required) {
        echo "Embedding needs update";
    } else {
        echo "Embedding is up to date";
    }

    // Scenario 4: Bulk operations
    $jobs = Job::where('department', 'Engineering')->get();
    foreach ($jobs as $job) {
        $job->queueForSyncing(); // Queue all for batch processing
    }
    // Process in batch: php artisan embedding:gen "App\\Models\\Job" --type=sync
Best Practices
1. Optimize Your toEmbeddingText() Method
    public function toEmbeddingText(): string
    {
        // ✅ Good: Concise, relevant information
        return trim($this->title . ' ' . $this->description . ' ' . $this->tags);

        // ❌ Avoid: Too much noise or irrelevant data
        // return $this->created_at . ' ' . $this->id . ' ' . $this->long_legal_text;
    }
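One way to keep the embedding text clean is to normalize it before returning it. The helper below is a hypothetical sketch, not part of this package: normalizeForEmbedding() and its 2000-character default are assumptions, and a suitable limit depends on the embedding model's token budget.

```php
<?php
declare(strict_types=1);

/**
 * Strip markup, drop empty parts, collapse whitespace, and cap the length.
 * Hypothetical helper for illustration only.
 */
function normalizeForEmbedding(array $parts, int $maxChars = 2000): string
{
    // Remove HTML tags and surrounding whitespace from every part.
    $clean = array_map(fn ($part): string => trim(strip_tags((string) $part)), $parts);

    // Drop parts that ended up empty (null attributes, blank strings).
    $clean = array_filter($clean, fn (string $part): bool => $part !== '');

    // Collapse runs of whitespace, including newlines, into single spaces.
    $text = preg_replace('/\s+/u', ' ', implode(' ', $clean)) ?? '';

    return mb_substr($text, 0, $maxChars);
}

$text = normalizeForEmbedding([
    '  Senior   Engineer ',
    null,
    "<p>Build <b>APIs</b>\n in PHP</p>",
]);

$short = normalizeForEmbedding(['abcdef'], 3);
```

Inside a model, toEmbeddingText() could return normalizeForEmbedding([$this->title, $this->description, $this->tags]) so that empty attributes and stray markup do not end up in the embedding input.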
2. Use Appropriate Filters
    // ✅ Good: Filter before similarity calculation
    $matches = $user->matchingResults(
        Product::class,
        10,
        fn($q) => $q->where('available', true)->where('price', '<=', $budget)
    );

    // ❌ Less efficient: Filtering after embedding calculation
    $allMatches = $user->matchingResults(Product::class, 100);
    $filtered = $allMatches->where('available', true);
3. Choose the Right Embedding Method
Understanding when to use each embedding method:
    $job = Job::find(1);

    // ✅ Use getEmbedding() when you just want to check if embedding exists
    $embedding = $job->getEmbedding();
    if ($embedding && !$embedding->embedding_sync_required) {
        // Use existing embedding
    }

    // ✅ Use getOrCreateEmbedding() for similarity matching (recommended)
    $matchingJobs = $customer->matchingResults(Job::class); // Uses getOrCreateEmbedding internally

    // ✅ Use createFreshEmbedding() when you want to force regeneration
    $job->update(['title' => 'New Title']);
    $freshEmbedding = $job->createFreshEmbedding(); // Immediate update

    // ✅ Use queueForSyncing() for deferred batch processing (most efficient)
    $job->update(['title' => 'New Title']);
    $job->queueForSyncing(); // Mark for later batch processing
4. Manage Embedding Sync
Manual Sync Management
    // Method 1: Using queueForSyncing() (recommended)
    class Job extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait;

        protected static function booted()
        {
            static::updated(function ($job) {
                if ($job->isDirty(['title', 'description', 'requirements'])) {
                    $job->queueForSyncing(); // Simpler and cleaner approach
                }
            });
        }
    }

    // Method 2: Direct embedding update (legacy approach, alternative to Method 1)
    class Job extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait;

        protected static function booted()
        {
            static::updated(function ($job) {
                if ($job->isDirty(['title', 'description', 'requirements'])) {
                    $job->embedding()->update(['embedding_sync_required' => true]);
                }
            });
        }
    }
Automatic Sync Management with FireSyncEmbeddingTrait
For automatic embedding sync management, use the FireSyncEmbeddingTrait:
    use Subhendu\EmbedVector\Traits\FireSyncEmbeddingTrait;

    class Job extends Model implements EmbeddingSearchableContract
    {
        use EmbeddingSearchableTrait, FireSyncEmbeddingTrait;

        // No need for manual booted() method - trait handles it automatically
    }
Configure which fields to monitor in your config/embedvector.php:
    return [
        // ... other config options
        'model_fields_to_check' => [
            'App\Models\Job' => ['title', 'description', 'requirements'],
            'App\Models\Product' => ['name', 'description', 'category'],
            'App\Models\User' => ['name', 'bio', 'skills'],
        ],
    ];
How it works:
- The trait automatically monitors specified fields for changes
- When any monitored field changes, it marks the embedding for re-sync
- Only triggers when fields actually change (compares old vs new values)
- Respects the configuration mapping for each model class
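The change check described in these points can be sketched without Eloquent. needsResync() below is a hypothetical function that mirrors the described behaviour (compare old and new values for the configured fields of a given class); it is not the trait's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Return true when any monitored field of $modelClass differs between
 * the original and the current attribute values.
 * Hypothetical helper for illustration only.
 */
function needsResync(string $modelClass, array $original, array $current, array $fieldsToCheck): bool
{
    // Classes without a mapping are never flagged.
    foreach ($fieldsToCheck[$modelClass] ?? [] as $field) {
        if (($original[$field] ?? null) !== ($current[$field] ?? null)) {
            return true;
        }
    }
    return false;
}

// Same shape as the model_fields_to_check config entry.
$config = ['App\Models\Job' => ['title', 'description', 'requirements']];

$before = ['title' => 'PHP Developer', 'description' => 'Build APIs', 'salary' => 90000];

// Only an unmonitored field changed: no re-sync needed.
$salaryOnly = needsResync('App\Models\Job', $before, ['salary' => 95000] + $before, $config);

// A monitored field changed: the embedding should be marked for re-sync.
$titleEdit = needsResync('App\Models\Job', $before, ['title' => 'Senior PHP Developer'] + $before, $config);

// A class with no mapping in the config is ignored.
$unmapped = needsResync('App\Models\User', $before, [], $config);
```

Limiting the comparison to the mapped fields avoids re-embedding a record when only unrelated columns, such as a salary or a counter, are updated.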
Troubleshooting
Common Issues
- "No embedding found in response"
  - Check your OpenAI API key is valid
  - Verify the embedding model exists
  - Ensure your toEmbeddingText() returns non-empty strings
- "Model class must implement EmbeddingSearchableContract"
  - Target models must implement EmbeddingSearchableContract
  - Source models only need EmbeddableContract
- Poor matching results
  - Review your toEmbeddingText() method - it should contain relevant, semantic information
  - Consider using cosine distance for semantic similarity
  - Try different embedding models (text-embedding-3-large for better quality)
- Performance issues
  - Use batch processing for large datasets
  - Consider using optimized search strategy for same-database scenarios
  - Add appropriate database indexes
- Cross-connection relationship limitations
  - The embedding() relationship only works when both models use the same database connection
  - For cross-connection setups (e.g., Jobs in MySQL, embeddings in PostgreSQL), use getEmbedding() or getOrCreateEmbedding() methods instead of the relationship
  - Direct relationship access ($model->embedding) will return null in cross-connection scenarios
- Embedding method confusion
  - Use getEmbedding() when you only want to check if an embedding exists (returns null if not found)
  - Use getOrCreateEmbedding() when you need an embedding for similarity matching (creates/updates as needed)
  - Use queueForSyncing() to defer embedding updates for batch processing (most efficient for bulk updates)
Database Performance
    -- Add indexes for better performance
    CREATE INDEX IF NOT EXISTS embeddings_model_type_idx ON embeddings (model_type);
    CREATE INDEX IF NOT EXISTS embeddings_sync_required_idx ON embeddings (embedding_sync_required);
Environment Variables Reference
Add these to your .env file:
    # Required
    OPENAI_API_KEY=your_openai_api_key_here

    # Optional - Customize behavior
    EMBEDVECTOR_MODEL=text-embedding-3-small
    EMBEDVECTOR_DISTANCE=cosine
    EMBEDVECTOR_SEARCH_STRATEGY=auto
    EMBEDVECTOR_LOT_SIZE=50000
    EMBEDVECTOR_CHUNK_SIZE=500
    EMBEDVECTOR_DB_CONNECTION=pgsql
Architecture Overview
┌──────────────────────┐    ┌───────────────────────────────┐    ┌───────────────────┐
│     Source Model     │    │         Target Model          │    │    Embeddings     │
│ (EmbeddableContract) │    │ (EmbeddingSearchableContract) │    │       Table       │
│                      │    │                               │    │                   │
│ • Customer           │───▶│ • Job                         │◀───│ • Vector data     │
│ • User Profile       │    │ • Product                     │    │ • Similarity      │
│ • Candidate          │    │ • Article                     │    │   calculations    │
└──────────────────────┘    └───────────────────────────────┘    └───────────────────┘
            │                               │                              │
            ▼                               ▼                              ▼
┌──────────────────────┐    ┌───────────────────────────────┐    ┌───────────────────┐
│ toEmbeddingText()    │    │ toEmbeddingText()             │    │ PostgreSQL        │
│ • Generate text      │    │ • Generate text               │    │ with pgvector     │
│   for embedding      │    │   for embedding               │    │ extension         │
└──────────────────────┘    └───────────────────────────────┘    └───────────────────┘
API Reference
EmbeddableTrait Methods
matchingResults(string $targetModelClass, int $topK = 5, ?Closure $queryFilter = null): Collection
Find models similar to the current model.
Parameters:
- $targetModelClass: Fully qualified class name of the target model
- $topK: Maximum number of results (default: 5)
- $queryFilter: Optional closure to filter results before similarity calculation
Returns: Collection of models with match_percent and distance properties
getEmbedding(): ?Embedding
Get the existing embedding for the current model without creating a new one.
Returns: Embedding model instance or null if no embedding exists
getOrCreateEmbedding(): Embedding
Get the existing embedding or create a new one if none exists. Also handles updating embeddings when embedding_sync_required is true.
Returns: Embedding model instance
queueForSyncing(): void
Mark the model's embedding for re-generation on the next sync. This is useful when you want to defer embedding updates until a batch process runs.
Returns: void
createFreshEmbedding(): Embedding
Force create a new embedding for the model, bypassing any existing embedding.
Returns: Newly created Embedding model instance
embedding(): MorphOne
Eloquent relationship to the embedding record.
Returns: MorphOne relationship
EmbeddingSearchableTrait Methods
queryForEmbedding(): Builder
Get the base query for models to be embedded during initial processing.
Returns: Eloquent Builder instance
queryForSyncing(): Builder
Get the query for models that need re-embedding (sync process).
Returns: Eloquent Builder instance
Configuration Methods
getConnectionName(): ?string
Get the database connection name for the model.
Returns: Database connection name or null for default
Testing
The package includes comprehensive tests. Run them with:
    # Run all tests
    vendor/bin/pest

    # Run with coverage (requires Xdebug)
    vendor/bin/pest --coverage

    # Run static analysis
    vendor/bin/phpstan analyse
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (vendor/bin/pest)
- Run static analysis (vendor/bin/phpstan analyse)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
The MIT License (MIT). Please see License File for more information.
Credits
- Subhendu Bhatta
- Built with Laravel
- Powered by OpenAI Embeddings
- Uses pgvector for PostgreSQL vector operations