dgtlss / semantica
A Laravel package for semantic search using vector embeddings
pkg:composer/dgtlss/semantica
Requires
- php: ^8.4
- illuminate/cache: ^11.0
- illuminate/console: ^11.0
- illuminate/database: ^11.0
- illuminate/http: ^11.0
- illuminate/support: ^11.0
Requires (Dev)
- larastan/larastan: ^3.8
- orchestra/testbench: ^9.0
- pestphp/pest: ^4.0
- phpstan/extension-installer: *
- phpstan/phpstan: ^2.1
- rector/rector: ^3.0
README
A Laravel package that enables semantic search using vector embeddings for better relevance in content-heavy applications like blogs, e-commerce, or knowledge bases. Supports multiple AI providers including OpenAI, Google Gemini, and local Ollama models, with comprehensive security features and static analysis.
Features
- Generate text embeddings using multiple AI providers (OpenAI, Gemini, Ollama)
- Automatic embedding generation for Eloquent models with the HasEmbeddings trait
- Semantic search with configurable similarity metrics (cosine, euclidean, dot product)
- Configurable similarity thresholds and result caching
- Batch processing for performance optimization
- Artisan commands for indexing and reindexing existing data
- Comprehensive security features and input validation
- Static analysis with PHPStan and automated code quality tools
- Support for both cloud and local embedding models
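The three similarity metrics named above can be sketched in plain PHP. This is an illustrative sketch only, not the package's implementation: the function names `dotProduct`, `cosineSimilarity`, and `euclideanDistance` are hypothetical, and how the package maps a Euclidean distance onto a similarity threshold is not documented here.

```php
<?php

declare(strict_types=1);

// Hypothetical helper names; these are not part of the package's API.

/** Guard shared by the metrics: vectors must have matching dimensions. */
function assertSameLength(array $a, array $b): void
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }
}

/** Dot product: sum of element-wise products. */
function dotProduct(array $a, array $b): float
{
    assertSameLength($a, $b);

    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }

    return $sum;
}

/** Cosine similarity: dot product divided by the product of the vector norms. */
function cosineSimilarity(array $a, array $b): float
{
    $normA = sqrt(dotProduct($a, $a));
    $normB = sqrt(dotProduct($b, $b));

    // A zero vector has no direction; treat it as dissimilar to everything.
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }

    return dotProduct($a, $b) / ($normA * $normB);
}

/** Euclidean distance: smaller means closer, unlike the two scores above. */
function euclideanDistance(array $a, array $b): float
{
    assertSameLength($a, $b);

    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings. Dot product gives the same ranking when the vectors are already normalized.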
Installation
Install via Composer:
composer require dgtlss/semantica
Publish the configuration and migration:
php artisan vendor:publish --provider="Dgtlss\Semantica\Providers\SemanticaServiceProvider" --tag="semantica-config"
php artisan vendor:publish --provider="Dgtlss\Semantica\Providers\SemanticaServiceProvider" --tag="semantica-migrations"
Run the migration:
php artisan migrate
Configuration
Choose your embedding provider and set the appropriate API keys in your .env file:
OpenAI (Default)
SEMANTICA_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key-here
SEMANTICA_EMBEDDING_MODEL=text-embedding-3-small
Anthropic (placeholder; embeddings are not currently supported, see Supported Providers)
SEMANTICA_PROVIDER=anthropic
ANTHROPIC_API_KEY=your-anthropic-api-key-here
SEMANTICA_ANTHROPIC_MODEL=claude-3-sonnet-20240229
Gemini (Google)
SEMANTICA_PROVIDER=gemini
GEMINI_API_KEY=your-gemini-api-key-here
SEMANTICA_GEMINI_MODEL=text-embedding-004
Ollama (Local Models)
SEMANTICA_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
SEMANTICA_OLLAMA_MODEL=nomic-embed-text
Additional Configuration
SEMANTICA_AUTO_EMBED=false # Disabled by default for security
SEMANTICA_CACHE_ENABLED=true
SEMANTICA_CACHE_TTL=3600
SEMANTICA_BATCH_SIZE=100
SEMANTICA_SIMILARITY_THRESHOLD=0.7
Usage
Automatic Embedding
Models using the HasEmbeddings trait have embeddings generated automatically on save, but only when SEMANTICA_AUTO_EMBED=true is set in your environment file. Auto-embedding is disabled by default for security reasons.
// With SEMANTICA_AUTO_EMBED=true
$post = Post::create([
    'title' => 'Laravel Tips',
    'content' => 'Here are some useful Laravel tips...',
]);
// Embedding is automatically generated

// With SEMANTICA_AUTO_EMBED=false (default)
$post = Post::create([
    'title' => 'Laravel Tips',
    'content' => 'Here are some useful Laravel tips...',
]);
// No embedding generated - use manual embedding or artisan commands
Manual Embedding
Use the service directly:
use Dgtlss\Semantica\Services\EmbeddingService;

$embeddingService = app(EmbeddingService::class);
$embeddingService->embed($post);
Semantic Search
Use the facade for searching:
use Dgtlss\Semantica\Facades\Semantica;

$results = Semantica::search('PHP framework tutorials', App\Models\Post::class, 10, 0.8);

// Or get models directly
$posts = Semantica::searchModels('PHP framework tutorials', App\Models\Post::class);
Commands
Index existing records:
php artisan semantica:index App\\Models\\Post
Reindex models:
php artisan semantica:reindex App\\Models\\Post
php artisan semantica:reindex --all
Model Trait
To enable automatic embeddings for a model, use the HasEmbeddings trait:
use Dgtlss\Semantica\Traits\HasEmbeddings;
use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    use HasEmbeddings;

    // Customize embedding fields (optional)
    public function getEmbeddingFields(): array
    {
        return ['title', 'excerpt', 'body'];
    }
}
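To give a feel for what the embedding fields are used for, here is a minimal sketch of how the listed fields might be combined into one string before embedding. This is an assumption for illustration: the helper `buildEmbeddingText`, the newline separator, and the skipping of empty fields are hypothetical, and the package's actual joining logic is not documented in this README.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: join the selected attributes into a single text blob.
 *
 * @param array<string, mixed> $attributes Model attributes keyed by field name
 * @param list<string>         $fields     Field names, e.g. from getEmbeddingFields()
 */
function buildEmbeddingText(array $attributes, array $fields): string
{
    $parts = [];

    foreach ($fields as $field) {
        $value = $attributes[$field] ?? null;

        // Skip missing, non-string, or blank fields so they add no noise.
        if (is_string($value) && trim($value) !== '') {
            $parts[] = trim($value);
        }
    }

    return implode("\n", $parts);
}
```

With this sketch, a post whose excerpt is empty would be embedded from its title and body alone.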
Security Considerations
API Keys and Authentication
- API keys are stored securely in environment variables and never logged
- The package validates API key presence at service initialization
- Supports multiple providers with proper key validation
Data Privacy and Protection
- Auto-embedding is disabled by default - must be explicitly enabled via SEMANTICA_AUTO_EMBED=true
- Text content is sanitized before sending to external APIs (HTML tags removed, whitespace normalized)
- Input validation prevents empty or malicious text from being processed
- Embeddings are hidden from model JSON serialization by default
Input Validation and Sanitization
- Search queries are trimmed and validated for emptiness
- Model class names are validated to prevent class injection attacks
- Configured models are verified to be valid Eloquent classes
- Text length is limited to prevent abuse (8KB max)
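The sanitization steps described in this section (tags removed, whitespace normalized, empty input rejected, 8KB limit) could look roughly like the sketch below. The function name `sanitizeForEmbedding` is hypothetical, and two details are assumptions: that "8KB" means 8192 bytes, and that over-long text is truncated rather than rejected.

```php
<?php

declare(strict_types=1);

// Assumed reading of the README's "8KB max".
const MAX_TEXT_BYTES = 8192;

/** Hypothetical helper: clean text before it is sent to an embedding API. */
function sanitizeForEmbedding(string $text, int $maxBytes = MAX_TEXT_BYTES): string
{
    // Remove HTML tags, then collapse runs of whitespace into single spaces.
    $clean = strip_tags($text);
    $clean = preg_replace('/\s+/u', ' ', $clean) ?? ''; // null means invalid UTF-8
    $clean = trim($clean);

    if ($clean === '') {
        throw new InvalidArgumentException('Text is empty after sanitization.');
    }

    // Truncate by bytes without splitting a multibyte character (needs ext-mbstring).
    return mb_strcut($clean, 0, $maxBytes, 'UTF-8');
}
```

Truncating on a byte budget with `mb_strcut` keeps the output valid UTF-8, which a plain `substr` would not guarantee.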
Performance and Abuse Prevention
- Embedding generation is rate-limited by external API constraints
- Batch processing limits prevent memory exhaustion
- Search results are capped (max 100 results)
- Similarity thresholds are clamped between 0.0 and 1.0
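The clamping and capping rules above are simple to express. The following is a sketch with hypothetical function names (`clampThreshold`, `capLimit`); the lower bound of 1 on the result limit is an assumption, since the README only states the maximum of 100.

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: force a similarity threshold into the range 0.0 to 1.0. */
function clampThreshold(float $threshold): float
{
    return max(0.0, min(1.0, $threshold));
}

/** Hypothetical helper: keep a requested result count between 1 and $max. */
function capLimit(int $limit, int $max = 100): int
{
    // The floor of 1 is an assumption; the README documents only the cap.
    return max(1, min($max, $limit));
}
```

For example, a caller asking for 500 results at a threshold of 1.5 would effectively get at most 100 results at a threshold of 1.0.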
Network Security
- HTTPS is enforced for API communications
- HTTP client includes retry logic for resilience
- Timeouts prevent hanging requests
Configuration Security
- Sensitive configuration values are properly typed and validated
- Unsupported providers throw exceptions instead of falling back silently
Logging and Monitoring
- Errors are logged without exposing sensitive information
- API failures include status codes but not response bodies in logs
- Performance metrics (text length, provider, model) are logged for monitoring
Best Practices
- Regularly rotate API keys
- Monitor API usage and costs
- Use caching to reduce external API calls
- Test with mock providers in development
- Keep dependencies updated for security patches
Supported Providers
- OpenAI: High-quality embeddings with multiple models available
- Anthropic: Currently not supported for embeddings (API doesn't provide embedding endpoints) - placeholder implementation
- Gemini: Google's embedding models via Generative AI API
- Ollama: Run embedding models locally using Ollama
API Reference
Facade Methods
use Dgtlss\Semantica\Facades\Semantica;

// Search for similar content
$results = Semantica::search('query text', App\Models\Post::class, 10, 0.8);

// Get models directly
$posts = Semantica::searchModels('query text', App\Models\Post::class);

// Embed a model manually
Semantica::embed($model);
Service Methods
use Dgtlss\Semantica\Services\EmbeddingService;
use Dgtlss\Semantica\Services\SearchService;

$embeddingService = app(EmbeddingService::class);
$searchService = app(SearchService::class);

// Generate embedding for text
$embedding = $embeddingService->generateEmbedding('text');

// Embed a model
$embeddingService->embed($model);

// Search
$results = $searchService->search('query', App\Models\Post::class);
$models = $searchService->searchModels('query', App\Models\Post::class);
Extending Providers
To add support for additional embedding providers, implement the EmbeddingProviderInterface:
<?php

namespace Dgtlss\Semantica\Services\Providers;

use Dgtlss\Semantica\Services\Providers\EmbeddingProviderInterface;

class CustomProvider implements EmbeddingProviderInterface
{
    public function __construct(array $config) { /* ... */ }

    public function generateEmbedding(string $text): array { /* ... */ }

    public function getModel(): string { /* ... */ }

    public function getDimensions(): int { /* ... */ }
}
Then update the service provider to register your provider.
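As a concrete illustration, here is a sketch of a deterministic fake provider of the kind that could back tests without calling an external API. Everything here is an assumption made for the example: the interface is redeclared locally (without a namespace) from the method signatures shown above so the snippet runs standalone, and `FakeHashProvider`, the `dimensions` config key, and the model name `fake-hash-v1` are hypothetical. The vectors are derived from hashes and carry no semantic meaning.

```php
<?php

declare(strict_types=1);

// Local stand-in for the package's interface, based on the signatures above.
interface EmbeddingProviderInterface
{
    public function generateEmbedding(string $text): array;

    public function getModel(): string;

    public function getDimensions(): int;
}

/** Hypothetical provider: same text in, same vector out, no network calls. */
final class FakeHashProvider implements EmbeddingProviderInterface
{
    private int $dimensions;

    public function __construct(array $config)
    {
        // 'dimensions' is an assumed config key for this sketch.
        $this->dimensions = (int) ($config['dimensions'] ?? 8);
    }

    public function generateEmbedding(string $text): array
    {
        $vector = [];

        for ($i = 0; $i < $this->dimensions; $i++) {
            // Hash the position together with the text to get one byte per dimension.
            $bytes = hash('sha256', $i . ':' . $text, true);

            // Map the first byte (0..255) to a float in [-1.0, 1.0].
            $vector[] = (ord($bytes[0]) / 127.5) - 1.0;
        }

        return $vector;
    }

    public function getModel(): string
    {
        return 'fake-hash-v1';
    }

    public function getDimensions(): int
    {
        return $this->dimensions;
    }
}
```

A provider like this is useful for exercising indexing and storage code paths, but search relevance has to be checked against a real embedding model.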
Development and Quality Assurance
Static Analysis
This package uses PHPStan for static analysis to ensure code quality:
vendor/bin/phpstan analyse
Code Quality Tools
- PHPStan: Static analysis with strict level 8 configuration
- Rector: Automated code refactoring (run with vendor/bin/rector process)
- Larastan: Laravel-specific PHPStan extensions
- Pest: Modern PHP testing framework
Testing
Run tests with Pest:
vendor/bin/pest
Troubleshooting
Common Issues
- API Key Not Found: Ensure your API key is set in your .env file with the correct name for your provider.
- Embedding Generation Fails: Check your API key validity and network connectivity. For Ollama, ensure the service is running.
- Model Not Found During Search: Ensure models using the HasEmbeddings trait have been properly indexed using the artisan commands.
- Low Similarity Scores: Adjust the similarity threshold or check if your content is being embedded correctly.
Debug Commands
Index specific models:
php artisan semantica:index App\\Models\\Post
Reindex all models:
php artisan semantica:reindex --all
Check embeddings table:
php artisan tinker
>>> Dgtlss\Semantica\Models\Embedding::count()
Requirements
- PHP 8.4+ (the Composer constraint is php: ^8.4)
- Laravel 11 (the illuminate/* packages are required at ^11.0)
- API key for chosen provider (OpenAI or Gemini) or Ollama installation for local models
License
MIT License