brynj-digital / laravel-scout-vectorize
Cloudflare Vectorize driver for Laravel Scout
pkg:composer/brynj-digital/laravel-scout-vectorize
Requires
- php: ^8.1
- guzzlehttp/guzzle: ^7.0
- illuminate/support: ^10.0|^11.0|^12.0
- laravel/scout: ^10.0|^11.0
Requires (Dev)
- orchestra/testbench: ^8.0|^9.0|^10.0
- phpunit/phpunit: ^10.0
This package is auto-updated.
Last update: 2025-12-05 12:27:02 UTC
README
A Laravel Scout driver for Cloudflare Vectorize, enabling semantic search using vector embeddings in your Laravel applications.
Features
- Semantic Search: Search by meaning, not just keywords
- Native Scout Integration: Works seamlessly with Laravel Scout
- Cloudflare Workers AI: Automatic embedding generation using Cloudflare's AI models
- Easy Setup: Simple configuration and migration from other Scout drivers
- Batch Operations: Efficient bulk indexing and deletion
- Multiple Models: Support for searching across different Eloquent models
Requirements
- PHP 8.1 or higher
- Laravel 10.x, 11.x, or 12.x
- Laravel Scout 10.x or 11.x
- A Cloudflare account with Vectorize enabled
- Cloudflare API token with Vectorize permissions
Installation
Install the package via Composer:
composer require brynj-digital/laravel-scout-vectorize
Publish the configuration file:
php artisan vendor:publish --tag=scout-vectorize-config
Configuration
1. Create a Vectorize Index
Use the provided artisan command to create a Vectorize index:
# Recommended: Using artisan command
php artisan vectorize:create-index my-index
# Or with custom dimensions and metric
php artisan vectorize:create-index my-index --dimensions=1024 --metric=euclidean --embedding-model=@cf/baai/bge-large-en-v1.5
Alternative: Using Wrangler CLI
npx wrangler vectorize create my-index --dimensions=768 --metric=cosine
The dimensions must match your chosen embedding model:
- @cf/baai/bge-small-en-v1.5: 384 dimensions
- @cf/baai/bge-base-en-v1.5: 768 dimensions (default)
- @cf/baai/bge-large-en-v1.5: 1024 dimensions
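A mismatch between model and index dimensions is easy to catch before any API call is made. The following is a minimal sketch, not part of the package: the function names expectedDimensions and dimensionsMatch are hypothetical, and only the model names and dimensions come from the list above.

```php
<?php
// Dimensions copied from the model list in this README.
const EMBEDDING_DIMENSIONS = [
    '@cf/baai/bge-small-en-v1.5' => 384,
    '@cf/baai/bge-base-en-v1.5'  => 768,
    '@cf/baai/bge-large-en-v1.5' => 1024,
];

// Hypothetical helper: look up the vector size a model produces.
function expectedDimensions(string $model): int
{
    if (!isset(EMBEDDING_DIMENSIONS[$model])) {
        throw new InvalidArgumentException("Unknown embedding model: {$model}");
    }

    return EMBEDDING_DIMENSIONS[$model];
}

// Hypothetical helper: true when the index was created with the right size.
function dimensionsMatch(string $model, int $indexDimensions): bool
{
    return expectedDimensions($model) === $indexDimensions;
}
```

A check like this could run in a deployment script, comparing the configured embedding model against the dimensions you passed when creating the index.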
2. Create Metadata Indexes
Create metadata indexes to enable efficient filtering using the artisan commands:
# Required: Create metadata index for model filtering
php artisan vectorize:create-metadata-index model string --index-name=my-index
Note: Recent versions of this package no longer require a key metadata index, as model keys are now extracted directly from the vector ID format. This provides cleaner metadata and reduced storage requirements.
Optional: Additional Metadata Indexes for where() Clauses
You can create additional metadata indexes for any custom fields you want to filter on using Scout's where() method:
# Example: Create index for filtering by status
php artisan vectorize:create-metadata-index status string --index-name=my-index
# Example: Create index for filtering by category_id
php artisan vectorize:create-metadata-index category_id number --index-name=my-index
# Example: Create index for boolean fields
php artisan vectorize:create-metadata-index in_stock boolean --index-name=my-index
Alternative: Using Wrangler CLI
# Required: Create metadata index for model filtering
npx wrangler vectorize create-metadata-index my-index --property-name=model --type=string
# Optional: Additional metadata indexes
npx wrangler vectorize create-metadata-index my-index --property-name=status --type=string
npx wrangler vectorize create-metadata-index my-index --property-name=category_id --type=number
npx wrangler vectorize create-metadata-index my-index --property-name=in_stock --type=boolean
Managing Metadata Indexes
Use the provided commands to manage your metadata indexes:
# List all metadata indexes for an index
php artisan vectorize:list-metadata-indexes --index-name=my-index
# Delete a metadata index
php artisan vectorize:delete-metadata-index status --index-name=my-index
To use these filters, include the fields in your model's toSearchableArray():
public function toSearchableArray(): array
{
    return [
        // These fields are used for both embeddings AND metadata
        'name' => $this->name,
        'description' => $this->description,
        // These fields are stored as metadata for filtering
        // (included in embeddings but primarily for where() clauses)
        'status' => $this->status,
        'category_id' => $this->category_id,
        'in_stock' => $this->in_stock,
    ];
}
Then use where() in your searches:
Product::search('laptop')
    ->where('status', 'active')
    ->where('in_stock', true)
    ->get();
How it works: All fields from toSearchableArray() are:
- Converted to text and used to generate the embedding vector for semantic search
- Stored as metadata for filtering with where() clauses
This means you can search semantically while also applying exact-match filters.
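To make the dual use of toSearchableArray() concrete, here is a rough sketch of how an array could be flattened into embedding text while the same array is kept as metadata. The package's real flattening logic is not shown in this README, so flattenToText is a hypothetical stand-in and the exact output format is an assumption.

```php
<?php
// Hypothetical stand-in for the driver's array-to-text conversion.
// Walks nested arrays, skips null and empty values, and spells out booleans.
function flattenToText(array $data): string
{
    $parts = [];

    array_walk_recursive($data, function ($value) use (&$parts) {
        if (is_bool($value)) {
            $parts[] = $value ? 'true' : 'false';
        } elseif ($value !== null && $value !== '') {
            $parts[] = (string) $value;
        }
    });

    return implode(' ', $parts);
}

// The same array serves two purposes in this sketch.
$searchable = ['name' => 'Laptop', 'status' => 'active', 'in_stock' => true];
$embeddingText = flattenToText($searchable); // text sent to the embedding model
$metadata = $searchable;                     // stored alongside the vector for where()
```

One consequence of this design is that filter-only fields such as status also influence the embedding. If that hurts relevance, the toSearchableText() method described under Usage gives you control over the text.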
3. Create API Token
You'll need a Cloudflare API token with Vectorize permissions to allow Laravel to interact with your Vectorize index.
Create the token in Cloudflare Dashboard:
- Log in to your Cloudflare Dashboard
- Navigate to My Profile (click your user icon in the top right)
- Select API Tokens from the left sidebar
- Click Create Token
- Choose Create Custom Token
- Configure your token:
- Token name: Give it a descriptive name (e.g., "Laravel Scout Vectorize")
- Permissions: Add the following two permissions:
- Account → Vectorize → Read
- Account → Vectorize → Write
- Account Resources: Select your specific account (or "All accounts" if needed)
- TTL: Set an expiration date or leave as default
- Click Continue to summary
- Review the permissions and click Create Token
- Important: Copy the token immediately - it will only be shown once
- Store the token securely (you'll add it to your .env file in the next step)
Token Permissions Summary
Your token must have these permissions:
- ✅ Vectorize Read - Allows reading from your Vectorize indexes
- ✅ Vectorize Write - Allows creating, updating, and deleting vectors
Security Note: Avoid using tokens with broader permissions (like "Account Settings: Read" or "Workers: Edit") unless absolutely necessary.
4. Environment Variables
Add the following to your .env file:
SCOUT_DRIVER=vectorize
CLOUDFLARE_ACCOUNT_ID=your_account_id
CLOUDFLARE_API_TOKEN=your_api_token
CLOUDFLARE_VECTORIZE_INDEX=my-index
CLOUDFLARE_EMBEDDING_MODEL=@cf/baai/bge-base-en-v1.5
5. Scout Configuration
Ensure Scout is configured in config/scout.php:
'driver' => env('SCOUT_DRIVER', 'vectorize'),
Usage
Basic Model Setup
Add the Searchable trait to your model:
use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    /**
     * Get the indexable data array for the model.
     */
    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'description' => $this->description,
            'brand' => $this->brand,
            'category' => $this->category,
        ];
    }
}
Custom Text Conversion (Optional)
For more control over how your model is converted to searchable text, implement a toSearchableText() method:
class Product extends Model
{
    use Searchable;

    /**
     * Convert the model to searchable text.
     * This method takes precedence over toSearchableArray().
     */
    public function toSearchableText(): string
    {
        return implode('. ', [
            $this->name,
            $this->brand,
            $this->description,
            implode(' ', $this->tags ?? []),
        ]);
    }

    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'description' => $this->description,
        ];
    }
}
Searching
// Simple search
$products = Product::search('wireless headphones')->get();
// Limit results
$products = Product::search('laptop')->take(20)->get();
// Paginate results
$products = Product::search('smartphone')->paginate(15);
// Get raw search results with scores
$results = Product::search('tablet')->raw();
Indexing
// Index a single model
$product = Product::find(1);
$product->searchable();
// Index all models
Product::makeAllSearchable();
# Using artisan command
php artisan scout:import "App\Models\Product"
Removing from Index
// Remove a single model
$product->unsearchable();
// Remove all models of a type
Product::removeAllFromSearch();
# Using artisan command
php artisan scout:flush "App\Models\Product"
Model Observers
Scout automatically syncs your models when you create, update, or delete them:
// Automatically indexed
$product = Product::create([
    'name' => 'Wireless Headphones',
    'description' => 'High-quality Bluetooth headphones',
]);
// Automatically re-indexed
$product->update(['name' => 'Premium Wireless Headphones']);
// Automatically removed from index
$product->delete();
Practical Examples
E-commerce Product Search
use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'brand' => $this->brand,
            'description' => $this->description,
            'category' => $this->category->name,
            'features' => implode(', ', $this->features ?? []),
            // Metadata for filtering
            'status' => $this->status,
            'price' => $this->price,
            'in_stock' => $this->in_stock,
        ];
    }
}

// Search with semantic understanding
$results = Product::search('laptop for programming and gaming')
    ->where('in_stock', true)
    ->where('status', 'published')
    ->take(20)
    ->get();
Blog Article Search
class Article extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'excerpt' => $this->excerpt,
            'content' => strip_tags($this->content),
            'author' => $this->author->name,
            'tags' => $this->tags->pluck('name')->join(', '),
            // Metadata
            'category_id' => $this->category_id,
            'published_at' => $this->published_at,
            'status' => $this->status,
        ];
    }

    public function toSearchableText(): string
    {
        // Custom text format for better embeddings
        return sprintf(
            '%s. %s. Written by %s. Tags: %s',
            $this->title,
            $this->excerpt,
            $this->author->name,
            $this->tags->pluck('name')->join(', ')
        );
    }
}

// Find related articles
$related = Article::search('introduction to machine learning')
    ->where('status', 'published')
    ->where('category_id', $article->category_id)
    ->take(5)
    ->get();
Documentation Search
class Documentation extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'content' => strip_tags($this->content),
            'section' => $this->section,
            'version' => $this->version,
        ];
    }
}

// Semantic search in docs
$docs = Documentation::search('how to handle file uploads')
    ->where('version', config('app.docs_version'))
    ->get();
Customer Support Ticket Search
class SupportTicket extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'subject' => $this->subject,
            'description' => $this->description,
            'customer_name' => $this->customer->name,
            'category' => $this->category,
            // Metadata
            'status' => $this->status,
            'priority' => $this->priority,
        ];
    }
}

// Find similar support tickets
$similar = SupportTicket::search($newTicket->description)
    ->where('status', 'resolved')
    ->take(10)
    ->get();
Advanced Usage
Custom Search Callbacks
For advanced search requirements, use a callback:
$results = Product::search('laptop', function ($client, $query, $options) {
    // $client is the VectorizeClient instance
    return $client->search($query, 50, [
        'model' => Product::class,
        'in_stock' => true,
    ]);
})->get();
Using Where Clauses for Filtering
You can combine semantic search with metadata filtering:
// Scout's where() takes a field and a value and matches them exactly;
// comparison operators such as < or > are not part of its signature.

// Search with filters
$products = Product::search('gaming laptop')
    ->where('status', 'published')
    ->where('in_stock', true)
    ->get();

// Multiple filters
$articles = Article::search('machine learning')
    ->where('category', 'technology')
    ->where('status', 'published')
    ->get();
Note: Filters are applied to metadata stored in Vectorize. Make sure the fields you filter on are:
- Included in your model's toSearchableArray()
- Backed by corresponding metadata indexes created in Vectorize (see Configuration section)
Querying the Client Directly
use ScoutVectorize\VectorizeClient;

$client = app(VectorizeClient::class);

// Get index information
$info = $client->getIndexInfo();

// Manual search with filters
$results = $client->search(
    query: 'wireless headphones',
    topK: 10,
    filter: ['status' => 'active']
);

// Generate embedding for text
$embedding = $client->generateEmbedding('sample text');

// Batch upsert documents
$client->batchUpsert([
    [
        'id' => 'doc_1',
        'text' => 'Document content',
        'metadata' => ['category' => 'tech'],
    ],
    // ... more documents
]);

// Delete vectors by IDs
$client->deleteVectors(['doc_1', 'doc_2']);
Queueing Scout Operations
For better performance in production, queue your Scout operations:
// In config/scout.php

// Option 1: enable queueing on the default connection and queue
'queue' => true,

// Option 2: specify the queue connection and queue name instead
'queue' => [
    'connection' => env('SCOUT_QUEUE_CONNECTION', 'redis'),
    'queue' => env('SCOUT_QUEUE_NAME', 'default'),
],
This queues all indexing operations, which keeps Cloudflare API calls out of the request cycle, improves response times, and makes it easier to stay within API rate limits.
Available Commands
This package provides custom commands for managing Vectorize indexes and metadata indexes, plus the standard Laravel Scout commands:
Vectorize Index Management
# Create a new Vectorize index
php artisan vectorize:create-index
# Create index with custom dimensions and metric
php artisan vectorize:create-index my-index --dimensions=1024 --metric=euclidean --embedding-model=@cf/baai/bge-large-en-v1.5
# Drop (delete) a Vectorize index
php artisan vectorize:drop-index my-index
# Force drop without confirmation (use with caution)
php artisan vectorize:drop-index my-index --force
Options for vectorize:create-index:
- name (optional): Index name (uses config value if not provided)
- --dimensions: Vector dimensions (default: 768)
- --metric: Distance metric - cosine, euclidean, or dotproduct (default: cosine)
- --embedding-model: Cloudflare embedding model (default: @cf/baai/bge-base-en-v1.5)
Options for vectorize:drop-index:
- name (optional): Index name (uses config value if not provided)
- --force: Skip confirmation prompts
Metadata Index Management
# Create a metadata index for filtering
php artisan vectorize:create-metadata-index property-name type --index-name=my-index
# List all metadata indexes
php artisan vectorize:list-metadata-indexes --index-name=my-index
# Delete a metadata index
php artisan vectorize:delete-metadata-index property-name --index-name=my-index
# Force delete without confirmation
php artisan vectorize:delete-metadata-index property-name --index-name=my-index --force
Arguments for vectorize:create-metadata-index:
- property-name: The metadata property to index
- type: Property type (string, number, boolean)
Arguments for vectorize:delete-metadata-index:
property-name: The metadata property to delete
Options for metadata index commands:
- --index-name: Vectorize index name (uses config value if not provided)
- --force: Skip confirmation prompts (delete command only)
Standard Scout Commands
# Import all records of a model
php artisan scout:import "App\Models\Product"
# Flush all vectors for a specific model
php artisan scout:flush "App\Models\Product"
How It Works
- Indexing: When a model is indexed, the driver:
  - Calls toSearchableText() or flattens toSearchableArray() to text
  - Generates an embedding using Cloudflare Workers AI
  - Stores the vector in Cloudflare Vectorize with metadata
- Searching: When you search:
  - Your query text is converted to an embedding
  - Vectorize finds the most similar vectors
  - Results are mapped back to your Eloquent models
  - Models are fetched from your database and returned
- Vector IDs: The driver prefixes vector IDs with the model class name to support multiple model types in one index (e.g., App_Models_Product_123)
Limitations
- No traditional filters: Vector search doesn't support arbitrary WHERE clauses the way traditional search engines do. Use exact-match metadata filtering via where() (which may not work reliably in all cases) or apply filters in PHP after retrieval
- No offset-based pagination: Vector search returns top-K results. Use cursor-based pagination or retrieve more results upfront
- Metadata filtering: Cloudflare Vectorize metadata filtering may not be reliable for all use cases. Consider filtering in your application layer
- Eventual consistency: There may be a slight delay between indexing/deletion and seeing changes in search results
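Two of the workarounds above, filtering in the application layer and retrieving more results upfront, can be sketched in plain PHP. This is an illustration under assumptions: the match shape (id, score, metadata) and the helper names filterByMetadata and pageOfTopK are hypothetical and are not taken from the package.

```php
<?php
// Hypothetical helper: keep only matches whose metadata equals every condition.
function filterByMetadata(array $matches, array $conditions): array
{
    return array_values(array_filter($matches, function (array $match) use ($conditions) {
        foreach ($conditions as $field => $expected) {
            if (($match['metadata'][$field] ?? null) !== $expected) {
                return false;
            }
        }

        return true;
    }));
}

// Hypothetical helper: slice one page (1-based) out of an already-ranked top-K list.
function pageOfTopK(array $matches, int $page, int $perPage): array
{
    return array_slice($matches, max(0, $page - 1) * $perPage, $perPage);
}

// Assumed result shape, ordered by descending similarity score.
$matches = [
    ['id' => 'a', 'score' => 0.9, 'metadata' => ['status' => 'active']],
    ['id' => 'b', 'score' => 0.8, 'metadata' => ['status' => 'draft']],
    ['id' => 'c', 'score' => 0.7, 'metadata' => ['status' => 'active']],
];

$active = filterByMetadata($matches, ['status' => 'active']);
$secondPage = pageOfTopK($matches, 2, 2);
```

Filtering after retrieval shrinks the result set, so request a larger top-K than the page size you intend to show.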
Configuration Reference
// config/scout-vectorize.php
return [
    'cloudflare' => [
        'account_id' => env('CLOUDFLARE_ACCOUNT_ID'),
        'api_token' => env('CLOUDFLARE_API_TOKEN'),
    ],
    'index' => env('CLOUDFLARE_VECTORIZE_INDEX', 'default'),
    'embedding_model' => env('CLOUDFLARE_EMBEDDING_MODEL', '@cf/baai/bge-base-en-v1.5'),
];
Troubleshooting
Search returns no results
- Ensure your models are indexed: Run php artisan scout:import "App\Models\Product"
- Check your Vectorize index has vectors: Use the Cloudflare dashboard or API to verify
- Verify your API credentials: Double-check CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN in your .env
- Check model filters: The driver automatically filters by model class. Ensure you're searching the right model
Indexing is slow
- API overhead: Vector embedding generation requires API calls to Cloudflare Workers AI
- Use batch operations: Use makeAllSearchable() for bulk indexing (more efficient than individual saves)
- Enable queuing: Set 'queue' => true in config/scout.php to process indexing in the background
- Rate limits: Cloudflare has rate limits on API calls. Implement throttling or use queues
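When you upsert documents yourself rather than through Scout, splitting the work into fixed-size chunks is one simple way to keep each API call small. The sketch below is an assumption-laden illustration: upsertInChunks is a hypothetical helper, and the callable stands in for whatever performs the real upsert.

```php
<?php
// Hypothetical helper: send documents in chunks and report how many calls were made.
function upsertInChunks(array $documents, int $chunkSize, callable $upsert): int
{
    $calls = 0;

    foreach (array_chunk($documents, max(1, $chunkSize)) as $chunk) {
        $upsert($chunk); // one API call per chunk
        $calls++;
    }

    return $calls;
}

// Demonstration with a stand-in callable that only records chunk sizes.
$sizes = [];
$calls = upsertInChunks(range(1, 5), 2, function (array $chunk) use (&$sizes) {
    $sizes[] = count($chunk);
});
```

With the client shown under Querying the Client Directly, the callable could be fn (array $chunk) => $client->batchUpsert($chunk). A suitable chunk size depends on Cloudflare's current limits, which this README does not state.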
Errors about dimensions
- Dimension mismatch: Ensure your Vectorize index dimensions match your embedding model
  - @cf/baai/bge-small-en-v1.5: 384 dimensions
  - @cf/baai/bge-base-en-v1.5: 768 dimensions (default)
  - @cf/baai/bge-large-en-v1.5: 1024 dimensions
- Recreate index: If you changed embedding models, you'll need to create a new index with the correct dimensions
Authentication errors
- Invalid API token: Verify your CLOUDFLARE_API_TOKEN has Vectorize permissions
- Incorrect account ID: Double-check your CLOUDFLARE_ACCOUNT_ID
- Token permissions: Ensure your API token has Vectorize read and write permissions
Metadata filtering not working
- Create metadata indexes: Metadata filters require indexes. Run:
npx wrangler vectorize create-metadata-index my-index --property-name=your_field --type=string
- Check field types: Ensure the metadata index type matches your data (string, number, boolean)
- Include in searchable array: The field must be in your model's toSearchableArray()
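A type mismatch between a metadata index and the values you store is easy to overlook. As a rough aid, the following sketch maps a PHP value to the index type you would pass to the create-metadata-index command. The helper name metadataIndexType is hypothetical, and the mapping assumes only the three types named in this README (string, number, boolean).

```php
<?php
// Hypothetical helper: suggest a metadata index type for a PHP value.
function metadataIndexType(mixed $value): string
{
    return match (true) {
        is_bool($value) => 'boolean',
        is_int($value), is_float($value) => 'number',
        is_string($value) => 'string',
        default => throw new InvalidArgumentException(
            'Unsupported metadata value of type ' . get_debug_type($value)
        ),
    };
}
```

Note that a numeric string such as '42' maps to string, so cast values in toSearchableArray() if the index was created with the number type.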
Performance optimization
- Limit result size: Use take() or paginate() to limit results
- Cache frequent queries: Cache search results for common queries
- Use metadata filters wisely: Filters can reduce the search space and improve performance
- Optimize text conversion: Keep toSearchableText() concise to reduce embedding generation time
Architecture
Package Structure
src/
├── Engines/
│ └── VectorizeEngine.php # Scout engine implementation
├── VectorizeClient.php # Cloudflare API client
└── VectorizeServiceProvider.php # Service provider
tests/
├── TestCase.php # Base test case
└── VectorizeEngineTest.php # Engine tests
How Embeddings Work
This package uses Cloudflare Workers AI to generate embeddings:
- Text Preparation: Your model data is converted to text using toSearchableText() or by flattening toSearchableArray()
- Embedding Generation: The text is sent to Cloudflare Workers AI which returns a vector (array of floats)
- Vector Storage: The vector is stored in Vectorize along with metadata (model class and searchable data)
- Semantic Search: When you search, your query is also converted to a vector and compared against stored vectors using the index's distance metric (cosine similarity by default)
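For intuition about the default metric, cosine similarity measures the angle between two vectors and ignores their length. Vectorize computes this on Cloudflare's side; the plain-PHP function below is only an illustrative sketch with a hypothetical name, not code from the package.

```php
<?php
// Illustrative cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }

    // A zero vector has no direction, so treat its similarity as 0.
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, and perpendicular vectors score 0, which is why two texts of different lengths can still rank as near-identical in meaning.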
Supported Embedding Models
| Model | Dimensions | Best For |
|---|---|---|
| @cf/baai/bge-small-en-v1.5 | 384 | Faster processing, lower memory |
| @cf/baai/bge-base-en-v1.5 | 768 | Balanced (default) |
| @cf/baai/bge-large-en-v1.5 | 1024 | Higher accuracy, slower |
Vector ID Format
Vectors are stored with IDs in the format: {ModelClass}_{ModelKey}
Example: App_Models_Product_123
This allows multiple model types to coexist in the same Vectorize index.
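The ID format can be sketched as a pair of helpers. Both function names are hypothetical, and two details are inferred from the example rather than confirmed by the package source: that namespace backslashes become underscores, and that the model key is everything after the last underscore.

```php
<?php
// Hypothetical helper: build an ID such as App_Models_Product_123.
function buildVectorId(string $modelClass, int|string $key): string
{
    return str_replace('\\', '_', $modelClass) . '_' . $key;
}

// Hypothetical helper: recover the model key from a vector ID.
// Assumes the key itself contains no underscore.
function extractModelKey(string $vectorId): string
{
    $pos = strrpos($vectorId, '_');

    return $pos === false ? $vectorId : substr($vectorId, $pos + 1);
}
```

Under these assumptions, keys that contain underscores (some UUID or slug schemes) would not round-trip cleanly, so check how the driver handles them before relying on non-numeric keys.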
Testing
The package includes comprehensive tests covering all engine functionality:
# Run all tests
composer test
# Run with coverage
vendor/bin/phpunit --coverage-html coverage
# Run specific test
vendor/bin/phpunit tests/VectorizeEngineTest.php
Test Coverage
The test suite includes 23+ tests covering:
- Update operations: Empty collections, valid models, custom text conversion, array values
- Delete operations: Empty collections, model deletion
- Search operations: Default limits, custom limits, filters, callbacks, pagination
- Result mapping: ID extraction, model mapping, ordering
- Flush operations: Batch deletion, different embedding models
- Index operations: Create/delete (no-op for Vectorize)
Running Tests
Tests use Orchestra Testbench to simulate a Laravel environment and Mockery to mock the VectorizeClient, ensuring tests run without making actual API calls.
# Install dependencies
composer install
# Run tests
./vendor/bin/phpunit
# Run tests with detailed output
./vendor/bin/phpunit --testdox
Best Practices
Optimizing Search Quality
- Use descriptive text: Include context in your searchable content
    public function toSearchableText(): string
    {
        // Good: Includes context
        return "Product: {$this->name}. Brand: {$this->brand}. {$this->description}";

        // Not ideal: Just raw values
        // return "{$this->name} {$this->brand} {$this->description}";
    }
- Avoid overly long text: Embeddings work best with focused, relevant content
    // Requires: use Illuminate\Support\Str;
    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'excerpt' => Str::limit($this->content, 500), // Limit long content
            'category' => $this->category->name,
        ];
    }
- Include relevant metadata: Add fields you'll filter on
    public function toSearchableArray(): array
    {
        return [
            'content' => $this->content,
            // Always include filterable fields
            'status' => $this->status,
            'created_at' => $this->created_at,
            'author_id' => $this->author_id,
        ];
    }
Performance Tips
- Enable queueing for production: Prevent blocking requests
    // config/scout.php
    'queue' => env('SCOUT_QUEUE', true),
- Use batch operations: Import in bulk rather than one-by-one
    # Efficient
    php artisan scout:import "App\Models\Product"
    # Less efficient
    Product::all()->each->searchable();
- Limit search results: Only fetch what you need
    // Good: Limited results
    Product::search('laptop')->take(20)->get();
    // Avoid: Fetching everything
    Product::search('laptop')->get();
- Cache frequent queries: Use Laravel's cache for popular searches
    // Requires: use Illuminate\Support\Facades\Cache;
    $results = Cache::remember(
        "search:{$query}",
        now()->addMinutes(10),
        fn() => Product::search($query)->take(20)->get()
    );
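Raw user input makes a poor cache key: differences in case or spacing create separate entries for what is effectively the same search. One possible approach, sketched here with a hypothetical helper named searchCacheKey, is to normalize the query and hash it before caching.

```php
<?php
// Hypothetical helper: build a stable cache key for a search query.
function searchCacheKey(string $query, int $limit): string
{
    // Collapse runs of whitespace; fall back to the raw query if the regex fails.
    $collapsed = preg_replace('/\s+/', ' ', $query) ?? $query;

    // strtolower() is byte-based; swap in mb_strtolower() for non-ASCII queries.
    $normalized = strtolower(trim($collapsed));

    // Include the limit so differently sized result sets do not collide.
    return 'search:' . $limit . ':' . sha1($normalized);
}
```

Hashing also keeps the key at a fixed length, which avoids problems with cache backends that restrict key length or characters.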
Security Considerations
- Sanitize user input: Always validate and sanitize search queries
    $query = request()->validate(['q' => 'required|string|max:255'])['q'];
    $results = Product::search($query)->get();
- Protect API credentials: Never commit API tokens to version control
    # .env (not in version control)
    CLOUDFLARE_API_TOKEN=your_secret_token
- Use scopes for access control: Filter by user permissions
    // Scout's search builder has no orWhere(); all where() clauses are combined with AND
    $results = Article::search('security')
        ->where('visibility', 'public')
        ->get();
Comparison with Other Search Solutions
| Feature | Vectorize (this package) | Algolia | Meilisearch | Elasticsearch |
|---|---|---|---|---|
| Semantic Search | ✅ Built-in | ❌ Keyword only | ⚠️ Limited | ⚠️ Via plugins |
| Setup Complexity | ⭐⭐ Easy | ⭐ Very Easy | ⭐⭐ Easy | ⭐⭐⭐⭐ Complex |
| Cost | 💰 Cloudflare pricing | 💰💰💰 Premium | 💰 Free/Cheap | 💰💰 Moderate |
| Latency | Fast (edge network) | Very Fast | Fast | Moderate |
| Filtering | ⚠️ Basic metadata | ✅ Advanced | ✅ Good | ✅ Advanced |
| Typo Tolerance | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Relevance by Keywords | ❌ No | ✅ Excellent | ✅ Good | ✅ Excellent |
| Relevance by Meaning | ✅ Excellent | ❌ No | ⚠️ Limited | ⚠️ Via plugins |
| Infrastructure | Serverless | Managed | Self-host/Managed | Self-host/Managed |
When to Use Vectorize
Good fit:
- Semantic/conceptual search (finding by meaning, not keywords)
- Multi-language search (when paired with a multilingual embedding model; the default bge-*-en-v1.5 models target English)
- Finding similar content or recommendations
- Applications already using Cloudflare
- Budget-conscious projects needing semantic search
Not ideal for:
- Exact keyword matching
- Complex filtering and faceting requirements
- Typo-tolerant search
- Traditional full-text search
- Applications requiring instant consistency
FAQ
Q: Can I use multiple models in the same index? A: Yes! The driver automatically namespaces vectors by model class, so multiple models can coexist in one index.
Q: How accurate is semantic search compared to keyword search? A: Semantic search excels at understanding intent and meaning, but may miss exact keyword matches. Consider your use case.
Q: Can I migrate from Algolia/Meilisearch to Vectorize? A: Yes, but be aware that Vectorize uses semantic search, which behaves differently from keyword-based search engines.
Q: What happens if I change the embedding model? A: You'll need to create a new index with the correct dimensions and re-index all your data.
Q: Is there a limit on the number of vectors? A: Check Cloudflare's Vectorize pricing and limits for your account tier.
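Q: Can I use this with multilingual content? A: It depends on the embedding model. The bge-*-en-v1.5 models listed in this README are English-language models, so cross-language matching with them is likely to be weak. For multilingual content, choose a multilingual embedding model, provided the package and your index dimensions support it.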
Contributing
Contributions are welcome! Please submit pull requests or open issues on GitHub.
Development Setup
# Clone the repository
git clone https://github.com/brynj-digital/laravel-scout-vectorize.git
cd laravel-scout-vectorize
# Install dependencies
composer install
# Run tests
composer test
# Run code style checks
composer format
License
This package is open-source software licensed under the MIT license.
Credits
- Built for use with Cloudflare Vectorize
- Integrates with Laravel Scout
Support
For issues, questions, or contributions, please visit the GitHub repository.