condoedge / ai
AI-powered text-to-query system with RAG, Neo4j, and Qdrant
pkg:composer/condoedge/ai
Requires
- php: ^8.1
- ext-curl: *
- ext-json: *
- binarytorch/larecipe: ^2.8
- condoedge/utils: ^0.2.100
- phpoffice/phpword: ^1.4
- smalot/pdfparser: ^2.12
Requires (Dev)
- fakerphp/faker: ^1.23
- mockery/mockery: ^1.6
- orchestra/testbench: ^9.15
- phpunit/phpunit: ^10.0
This package is not auto-updated.
Last update: 2025-11-22 13:59:14 UTC
README
A Laravel package providing intelligent text-to-Cypher query generation with RAG (Retrieval-Augmented Generation), dual-storage coordination (Neo4j + Qdrant), and auto-discovery from Eloquent models.
Table of Contents
- Overview
- Architecture
- Key Features
- Security Architecture
- Quick Start
- Artisan Commands
- Project Structure
- Testing
- Documentation
- Development
- Technical Decisions
Overview
What This Package Does
The AI package transforms natural language questions into executable Neo4j Cypher queries using RAG-powered LLMs. It automatically discovers entity configurations from your existing Eloquent models, eliminating manual setup while maintaining dual-storage synchronization between Neo4j (graph relationships) and Qdrant (vector embeddings).
Core Value Proposition:
- Zero configuration for most use cases (convention over configuration)
- Automatic discovery from Eloquent models - no duplication
- Dual-storage coordination with consistency guarantees
- Production-ready security (injection protection, retry logic, circuit breakers)
- RAG-powered intelligent query generation
Key Technologies:
- Neo4j: Graph database for relationship storage and pattern matching
- Qdrant: Vector database for semantic similarity search
- Laravel: PHP framework integration with Eloquent ORM
- OpenAI/Anthropic: LLM providers for query generation and embeddings
Example
// 1. Make your model Nodeable (zero config needed)
class Customer extends Model implements Nodeable
{
    use HasNodeableConfig;

    protected $fillable = ['name', 'email', 'status'];

    public function scopeActive($query)
    {
        return $query->where('status', 'active');
    }
}

// 2. Data auto-syncs on create/update/delete
$customer = Customer::create(['name' => 'John Doe', 'email' => 'john@example.com']);
// Automatically stored in Neo4j + Qdrant

// 3. Ask questions in natural language
$response = AI::chat("How many active customers do we have?");
// Generated Cypher: MATCH (n:Customer) WHERE n.status = 'active' RETURN count(n)
// Response: "You have 1,250 active customers in the system."
Architecture
High-Level Overview
┌─────────────────────────────────────────────────────────────────────┐
│ User Question │
│ "Show active customers in USA" │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ ChatOrchestrator │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 1. Context Retrieval (RAG) │ │
│ │ - Vector search for similar past queries │ │
│ │ - Fetch graph schema from Neo4j │ │
│ │ - Retrieve example entities │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 2. Query Generation │ │
│ │ - Build LLM prompt with context │ │
│ │ - Generate Cypher query │ │
│ │ - Validate (injection check, complexity, safety) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 3. Query Execution │ │
│ │ - Execute against Neo4j with timeout │ │
│ │ - Format results │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 4. Response Generation │ │
│ │ - Transform to natural language │ │
│ │ - Extract insights (trends, outliers, patterns) │ │
│ │ - Suggest visualizations │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Natural Language Answer + Insights + Suggestions │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Data Ingestion Flow │
└─────────────────────────────────────────────────────────────────────┘
Eloquent Model Event (create/update/delete)
│
▼
HasNodeableConfig Trait
(Auto-sync listener)
│
▼
EntityAutoDiscovery
- Reflect on model
- Extract properties from $fillable, $casts
- Discover relationships (belongsTo)
- Detect text fields for embedding
- Convert scopes to Cypher patterns
│
▼
DataIngestionService
┌──────────────────────────────────┐
│ Compensating Transaction Pattern │
└──────────────────────────────────┘
│
├─────────────────────┬──────────────────────┐
▼ ▼ ▼
Generate Embedding Neo4j Store Qdrant Store
(OpenAI/Anthropic) (Graph + Relations) (Vector + Metadata)
│ │ │
└─────────────────────┴──────────────────────┘
│
┌───────────┴──────────┐
│ Success? │
│ - Both stores OK │
│ - Rollback on fail │
└──────────────────────┘
Core Components
1. Auto-Discovery System
- EntityAutoDiscovery: Introspects Eloquent models using PHP Reflection
- CypherScopeAdapter: Converts Eloquent scopes to Cypher patterns
- SchemaInspector: Extracts database schema hints
- ConfigCache: Caches expensive discovery operations
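As a rough illustration of the reflection step, the sketch below reads the default value of $fillable and collects scopeX() method names from a plain class, without Laravel. The names discoverModel() and DemoCustomer are hypothetical; the actual EntityAutoDiscovery API is not shown in this README and may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an Eloquent model (no Laravel required).
class DemoCustomer
{
    protected $fillable = ['name', 'email', 'status'];

    public function scopeActive($query)
    {
        return $query;
    }

    public function scopeRecent($query)
    {
        return $query;
    }
}

/**
 * Sketch of reflection-based discovery: returns a label, the declared
 * $fillable properties, and scope names derived from scopeX() methods.
 */
function discoverModel(string $class): array
{
    $ref = new ReflectionClass($class);

    // Default property values are available without instantiating the class.
    $defaults = $ref->getDefaultProperties();
    $properties = $defaults['fillable'] ?? [];

    $scopes = [];
    foreach ($ref->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $name = $method->getName();
        // scopeActive -> active
        if (strlen($name) > 5 && strncmp($name, 'scope', 5) === 0) {
            $scopes[] = lcfirst(substr($name, 5));
        }
    }

    return [
        'label' => $ref->getShortName(),
        'properties' => $properties,
        'scopes' => $scopes,
    ];
}

$config = discoverModel(DemoCustomer::class);
```

Reading defaults through reflection avoids constructing the model, which is one plausible reason discovery results are worth caching.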
2. Dual-Storage Coordination
- DataIngestionService: Orchestrates writes to both stores with compensating transactions
- Neo4j (GraphStore): Node storage, relationships, pattern matching
- Qdrant (VectorStore): Vector embeddings, semantic search, metadata filtering
- Auto-Sync: Automatic synchronization via Laravel model events
3. RAG System
- ContextRetriever: Fetches similar queries + schema + examples
- PatternLibrary: Pre-defined query patterns for common questions
- QueryGenerator: LLM-powered Cypher generation with validation
- QueryExecutor: Safe query execution with timeouts and limits
- ResponseGenerator: Natural language explanations with insights
4. Security Layer
- CypherSanitizer: Injection prevention for labels, types, property keys
- RetryPolicy: Exponential backoff with jitter
- CircuitBreaker: Fail-fast pattern for cascading failure prevention
- SensitiveDataSanitizer: API key and credential redaction in logs
Key Features
Auto-Discovery from Eloquent Models
Eliminates duplication by extracting entity configuration directly from your models:
- Properties: Auto-detected from $fillable, $casts, $dates
- Relationships: Auto-discovered from belongsTo() methods
- Scopes: Auto-converted from scopeX() methods to Cypher patterns
- Embed Fields: Text fields automatically identified for vector embeddings
- Aliases: Generated from table names for semantic matching
Three-tier fallback: Explicit config > Legacy config file > Auto-discovery
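To make the scope conversion listed above more concrete, here is a minimal sketch that turns simple comparison conditions into a Cypher pattern. The function conditionsToCypher() and its input shape are assumptions for illustration, not the CypherScopeAdapter API. Unlike the generated Cypher shown in the overview example, this sketch passes values as query parameters rather than inlining them.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: convert [field, operator, value] conditions into a parameterized
 * Cypher query. Identifiers are validated; values are never inlined.
 *
 * @return array{cypher: string, params: array<string, mixed>}
 */
function conditionsToCypher(string $label, array $conditions): array
{
    $allowedOperators = ['=', '<>', '<', '<=', '>', '>='];
    $identifier = '/^[a-zA-Z_][a-zA-Z0-9_]*$/D';

    if (preg_match($identifier, $label) !== 1) {
        throw new InvalidArgumentException('Invalid label: ' . $label);
    }

    $clauses = [];
    $params = [];
    foreach ($conditions as $i => [$field, $operator, $value]) {
        if (preg_match($identifier, $field) !== 1) {
            throw new InvalidArgumentException('Invalid field: ' . $field);
        }
        if (!in_array($operator, $allowedOperators, true)) {
            throw new InvalidArgumentException('Unsupported operator: ' . $operator);
        }
        $param = 'p' . $i;
        $clauses[] = 'n.' . $field . ' ' . $operator . ' $' . $param;
        $params[$param] = $value;
    }

    $cypher = 'MATCH (n:' . $label . ')';
    if ($clauses !== []) {
        $cypher .= ' WHERE ' . implode(' AND ', $clauses);
    }

    return ['cypher' => $cypher . ' RETURN n', 'params' => $params];
}

// Equivalent of scopeActive(): where('status', 'active')
$result = conditionsToCypher('Customer', [['status', '=', 'active']]);
```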
Dual-Storage Coordination
Synchronized writes to Neo4j (graph) and Qdrant (vector) with consistency guarantees:
- Compensating Transactions: Automatic rollback on partial failure
- No Orphaned Data: Either both stores succeed or both roll back
- Independent Resilience: One store failing doesn't break the other
- Batch Operations: Efficient bulk ingestion
RAG-Powered Query Generation
Intelligent Cypher generation using retrieval-augmented generation:
- Context-Aware: Similar past queries inform new query generation
- Schema-Aware: Graph structure guides query construction
- Example-Based: Sample data provides reference patterns
- Pattern Matching: Pre-defined templates for common queries
- Validation: Syntax, safety, and complexity checks
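The context-aware, schema-aware, and example-based inputs above all end up in the LLM prompt. The following sketch shows one way such a prompt could be assembled. The function buildCypherPrompt(), the prompt wording, and the input shapes are hypothetical; the package's real prompt template is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: assemble a text-to-Cypher prompt from retrieved context.
 *
 * @param string[] $schemaLines e.g. "(:Customer {name, email, status})"
 * @param array<int, array{question: string, cypher: string}> $similar past question/query pairs
 */
function buildCypherPrompt(string $question, array $schemaLines, array $similar): string
{
    $parts = [];
    $parts[] = 'You translate questions into read-only Neo4j Cypher queries.';
    $parts[] = "Graph schema:\n" . implode("\n", $schemaLines);

    // Similar past queries act as few-shot examples when any were retrieved.
    if ($similar !== []) {
        $examples = array_map(
            static fn (array $ex): string => 'Q: ' . $ex['question'] . "\nCypher: " . $ex['cypher'],
            $similar
        );
        $parts[] = "Similar past queries:\n" . implode("\n\n", $examples);
    }

    $parts[] = 'Question: ' . $question . "\nCypher:";

    return implode("\n\n", $parts);
}

$prompt = buildCypherPrompt(
    'How many active customers do we have?',
    ['(:Customer {name, email, status})'],
    [['question' => 'List all customers', 'cypher' => 'MATCH (n:Customer) RETURN n']]
);
```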
Auto-Sync Capabilities
Zero-boilerplate synchronization via Laravel events:
- Event-Driven: Automatic sync on create, update, delete
- Async Support: Optional queue processing
- Configurable: Per-model, per-operation granularity
- Error Handling: Silent failure with logging or exception throwing
Security Architecture
The package implements defense-in-depth security with multiple layers of protection. All security features are enabled by default with no configuration required.
1. Injection Protection
Cypher Injection Prevention (CypherSanitizer):
- Validates all labels, relationship types, and property keys against strict patterns
- Regex: [a-zA-Z_][a-zA-Z0-9_]* (letters, digits, and underscores; must start with a letter or underscore)
- Blocks reserved Cypher keywords (MATCH, DELETE, DROP, CREATE, etc.)
- Maximum length validation (255 characters)
- Backtick escaping as additional defense layer
SQL Injection Prevention (SchemaInspector):
- Table/index name validation in SQLite PRAGMA queries
- Prevents malicious identifiers in auto-discovery schema introspection
- Parameter binding for MySQL/PostgreSQL
Example Protection:
// Automatic injection protection
CypherSanitizer::validateLabel("User}); DELETE (n) //");
// Throws: CypherInjectionException

CypherSanitizer::validateLabel("User_Profile");
// Returns: "User_Profile" ✓
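The rules above (pattern, reserved keywords, length limit, backtick escaping) can be sketched as a single standalone function. This is an illustration, not the CypherSanitizer source: validateCypherIdentifier() is a hypothetical name, the keyword list is a partial assumption, and it returns the backtick-quoted form rather than the bare identifier.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of identifier validation following the listed rules.
 * Returns the identifier wrapped in backticks, or throws.
 */
function validateCypherIdentifier(string $identifier): string
{
    // Partial, illustrative keyword list (assumption).
    $reserved = ['MATCH', 'DELETE', 'DROP', 'CREATE', 'MERGE', 'SET', 'REMOVE', 'RETURN'];

    if ($identifier === '' || strlen($identifier) > 255) {
        throw new InvalidArgumentException('Identifier must be 1-255 characters long.');
    }
    // The D modifier stops "$" from matching before a trailing newline.
    if (preg_match('/^[a-zA-Z_][a-zA-Z0-9_]*$/D', $identifier) !== 1) {
        throw new InvalidArgumentException('Identifier contains disallowed characters.');
    }
    if (in_array(strtoupper($identifier), $reserved, true)) {
        throw new InvalidArgumentException('Identifier is a reserved Cypher keyword.');
    }

    // No backtick can be present after the regex check, so quoting is safe.
    return '`' . $identifier . '`';
}

$safe = validateCypherIdentifier('User_Profile');
```

Anchoring the pattern at both ends matters: an unanchored match would accept "User}); DELETE (n) //" because it begins with a valid identifier.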
2. Data Consistency Guarantees
Compensating Transactions (DataIngestionService):
- Sequential writes with compensation for dual-store operations (not a distributed two-phase commit)
- Automatic rollback on vector store failure
- Automatic restoration on deletion failure
- Critical error logging when compensation fails
Transaction Flow:
1. Write to Neo4j → Success
2. Write to Qdrant → Failure
3. Rollback Neo4j → Success
4. Throw DataConsistencyException
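The four-step flow above can be sketched with two in-memory stores standing in for Neo4j and Qdrant. InMemoryStore and ingest() are hypothetical names, and a generic RuntimeException stands in for DataConsistencyException. The sketch covers only the creation of a new node; compensating an update would require restoring the previous state instead of deleting.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory store standing in for Neo4j or Qdrant.
final class InMemoryStore
{
    /** @var array<string, array> */
    public array $items = [];

    public function __construct(private bool $failOnWrite = false)
    {
    }

    public function put(string $id, array $data): void
    {
        if ($this->failOnWrite) {
            throw new RuntimeException('Store unavailable');
        }
        $this->items[$id] = $data;
    }

    public function delete(string $id): void
    {
        unset($this->items[$id]);
    }
}

/**
 * Sketch: write to the graph store, then the vector store. If the second
 * write fails, undo the first and surface the failure to the caller.
 */
function ingest(InMemoryStore $graph, InMemoryStore $vector, string $id, array $data): void
{
    $graph->put($id, $data);

    try {
        $vector->put($id, $data);
    } catch (Throwable $e) {
        // Compensating action: remove the node written a moment ago.
        $graph->delete($id);
        throw new RuntimeException('Ingestion rolled back: ' . $e->getMessage(), 0, $e);
    }
}

$graph = new InMemoryStore();
$vector = new InMemoryStore(failOnWrite: true);

try {
    ingest($graph, $vector, 'customer:1', ['name' => 'John Doe']);
    $outcome = 'stored';
} catch (RuntimeException $e) {
    $outcome = 'rolled back';
}
```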
3. Resilience & Fault Tolerance
Circuit Breaker Pattern (CircuitBreaker):
- States: CLOSED → OPEN → HALF_OPEN → CLOSED
- Prevents cascading failures
- Configurable failure threshold (default: 5 failures)
- Configurable recovery timeout (default: 30 seconds)
- Fail-fast when circuit open
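A minimal sketch of the state machine described above, using the stated defaults (5 failures, 30 seconds) as constructor defaults. SimpleCircuitBreaker is a hypothetical class written for illustration; the package's CircuitBreaker may track failures and time differently.

```php
<?php
declare(strict_types=1);

final class SimpleCircuitBreaker
{
    private string $state = 'CLOSED';
    private int $failures = 0;
    private float $openedAt = 0.0;

    public function __construct(
        private int $failureThreshold = 5,
        private float $recoveryTimeout = 30.0
    ) {
    }

    /** $now is injectable so timing can be tested without sleeping. */
    public function call(callable $operation, ?float $now = null): mixed
    {
        $now ??= microtime(true);

        if ($this->state === 'OPEN') {
            if ($now - $this->openedAt < $this->recoveryTimeout) {
                throw new RuntimeException('Circuit open: failing fast');
            }
            $this->state = 'HALF_OPEN'; // timeout elapsed: allow one trial call
        }

        try {
            $result = $operation();
        } catch (Throwable $e) {
            $this->failures++;
            // A failed trial call, or too many failures, opens the circuit.
            if ($this->state === 'HALF_OPEN' || $this->failures >= $this->failureThreshold) {
                $this->state = 'OPEN';
                $this->openedAt = $now;
            }
            throw $e;
        }

        // Any success closes the circuit and resets the failure counter.
        $this->state = 'CLOSED';
        $this->failures = 0;

        return $result;
    }

    public function state(): string
    {
        return $this->state;
    }
}
```

While the circuit is open, the wrapped operation is never invoked, which is what keeps a struggling dependency from being hammered by further calls.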
Retry Policy (RetryPolicy):
- Exponential backoff with jitter
- Prevents thundering herd problem
- Configurable max attempts (default: 3-5 depending on operation)
- Separate policies for API calls, database operations, network requests
Example:
// Automatic retry and circuit breaking
$neo4j = new Neo4jStore(); // Includes retry + circuit breaker
$result = $neo4j->createNode('User', $properties);
// Retries on transient failures, fails fast if circuit open
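For readers unfamiliar with exponential backoff with jitter, the sketch below shows the "full jitter" variant, where each wait is a random duration between zero and an exponentially growing cap. The function retryWithBackoff() and its delay parameters are assumptions; the README does not state which jitter strategy or delays RetryPolicy uses.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: retry with exponential backoff and full jitter.
 * $sleep is injectable so tests can avoid real delays.
 */
function retryWithBackoff(
    callable $operation,
    int $maxAttempts = 3,
    int $baseDelayMs = 100,
    int $maxDelayMs = 5000,
    ?callable $sleep = null
): mixed {
    $sleep ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // out of attempts: surface the last error
            }
            // The cap doubles per attempt (100, 200, 400, ...) up to $maxDelayMs.
            $cap = min($maxDelayMs, $baseDelayMs * (2 ** ($attempt - 1)));
            // Full jitter: a random wait in [0, cap] spreads out concurrent retries.
            $sleep(random_int(0, $cap));
        }
    }
}
```

Randomizing the wait is what addresses the thundering herd problem: clients that failed at the same moment do not all retry at the same moment.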
4. Sensitive Data Protection
Log Sanitization (SensitiveDataSanitizer):
- Automatic redaction of API keys, passwords, tokens, secrets
- Pattern detection for multiple formats:
  - OpenAI keys: sk-...
  - Anthropic keys: sk-ant-...
  - AWS credentials
  - Bearer tokens
  - Database passwords
- Stack trace sanitization
- Absolute path removal
Protected Patterns:
// Automatic sanitization
Log::error('API failed', SensitiveDataSanitizer::forLogging([
    'api_key' => 'sk-abc123...', // Logged as: ***REDACTED***
    'error' => $exception->getMessage(),
]));
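A simplified sketch of how such redaction could work, combining key-name matching with pattern matching inside string values. The function redactForLogging(), the key list, and both regular expressions are illustrative assumptions and cover only a subset of the formats listed above.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: recursively redact sensitive keys and key-like substrings.
 */
function redactForLogging(array $context): array
{
    $sensitiveKeys = ['api_key', 'password', 'token', 'secret', 'authorization'];
    $patterns = [
        '/sk-[A-Za-z0-9_-]{8,}/',           // sk-... and sk-ant-... style keys
        '/Bearer\s+[A-Za-z0-9._~+\/-]+=*/', // bearer tokens
    ];

    $clean = [];
    foreach ($context as $key => $value) {
        if (is_string($key) && in_array(strtolower($key), $sensitiveKeys, true)) {
            // Sensitive key name: drop the value regardless of its content.
            $clean[$key] = '***REDACTED***';
        } elseif (is_array($value)) {
            $clean[$key] = redactForLogging($value);
        } elseif (is_string($value)) {
            // Free text (e.g. exception messages) may embed a credential.
            $clean[$key] = preg_replace($patterns, '***REDACTED***', $value);
        } else {
            $clean[$key] = $value;
        }
    }

    return $clean;
}

$logContext = redactForLogging([
    'api_key' => 'sk-abc123',
    'error' => 'call failed with key sk-abcdefgh12345',
    'nested' => ['password' => 'hunter2', 'count' => 3],
]);
```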
5. Recursion & Resource Protection
Auto-Discovery Guards:
- Maximum stack depth: 5 levels
- Circular reference detection
- Automatic cycle breaking
- Deep merge protection: 10 level limit
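The deep merge limit can be illustrated with a depth-capped recursive merge. This sketch covers only that one guard; deepMerge() is a hypothetical helper, and replacing the value wholesale once the cap is reached is an assumed behavior, not something the README specifies.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: recursive merge of nested config arrays with a depth cap.
 * Once $maxDepth is reached, the override value replaces the base value.
 */
function deepMerge(array $base, array $override, int $maxDepth = 10, int $depth = 0): array
{
    foreach ($override as $key => $value) {
        $bothArrays = is_array($value) && isset($base[$key]) && is_array($base[$key]);

        if ($bothArrays && $depth < $maxDepth) {
            $base[$key] = deepMerge($base[$key], $value, $maxDepth, $depth + 1);
        } else {
            $base[$key] = $value;
        }
    }

    return $base;
}

$merged = deepMerge(
    ['graph' => ['label' => 'Customer', 'properties' => ['name']]],
    ['graph' => ['properties' => ['name', 'email']], 'aliases' => ['client']]
);
```

Note that list-style arrays such as properties are merged by index here, so the override list effectively wins for overlapping positions.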
Resource Limits:
- Query timeout: 30 seconds (configurable)
- Result limit: 100 rows (configurable)
- Max query complexity scoring
- Identifier length limits
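One way a result limit could be applied to generated Cypher is sketched below. How the package actually enforces its limits is not described in this README, so guardGeneratedCypher() and its keyword scan are purely illustrative. The scan is deliberately conservative and can reject harmless queries that merely mention a write keyword.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: reject write clauses and cap the number of returned rows.
 */
function guardGeneratedCypher(string $cypher, int $maxRows = 100): string
{
    // Normalize: drop surrounding whitespace and trailing semicolons.
    $cypher = rtrim(ltrim($cypher), "; \t\n\r");

    if (preg_match('/\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b/i', $cypher) === 1) {
        throw new InvalidArgumentException('Only read-only queries are allowed.');
    }

    if (preg_match('/\bLIMIT\s+(\d+)$/i', $cypher, $m) === 1) {
        // Lower an existing trailing LIMIT that exceeds the cap.
        if ((int) $m[1] > $maxRows) {
            $cypher = preg_replace('/\bLIMIT\s+\d+$/i', 'LIMIT ' . $maxRows, $cypher);
        }
    } else {
        $cypher .= ' LIMIT ' . $maxRows;
    }

    return $cypher;
}

$guarded = guardGeneratedCypher('MATCH (n:Customer) RETURN n;');
```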
Security Testing
The package includes comprehensive security test coverage:
- Injection Testing: Adversarial inputs, malicious patterns, edge cases
- Data Consistency Testing: Partial failure scenarios, rollback verification
- Resilience Testing: Retry logic, circuit breaker state transitions, timeout handling
- Sanitization Testing: API key patterns, stack traces, nested objects
All security tests are passing. See tests/Unit/StressTests/ for details.
Quick Start
Prerequisites
- PHP 8.1+
- Laravel 9.x+
- Neo4j 4.4+
- Qdrant 1.0+
- OpenAI or Anthropic API key
Installation
# Install package
composer require condoedge/ai

# Publish config (optional)
php artisan vendor:publish --tag=ai-config
Configuration
Add to .env:
# Neo4j
NEO4J_HOST=http://localhost:7474
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password

# Qdrant
QDRANT_HOST=http://localhost:6333

# LLM Provider (OpenAI or Anthropic)
OPENAI_API_KEY=sk-your-key
AI_LLM_PROVIDER=openai
AI_EMBEDDING_PROVIDER=openai
Basic Usage
// 1. Make models Nodeable
use Condoedge\Ai\Domain\Contracts\Nodeable;
use Condoedge\Ai\Domain\Traits\HasNodeableConfig;
use Illuminate\Database\Eloquent\Model;

class Customer extends Model implements Nodeable
{
    use HasNodeableConfig;

    protected $fillable = ['name', 'email', 'status'];

    public function orders()
    {
        return $this->hasMany(Order::class);
    }

    public function scopeActive($query)
    {
        return $query->where('status', 'active');
    }
}

// 2. Discover and generate config (run in a shell):
//      php artisan ai:discover
// This generates config/entities.php with discovered configuration:
// - Neo4j label, properties, relationships
// - Qdrant collection, embed fields
// - Aliases for natural language queries
// Review and customize config/entities.php as needed

// 3. Bulk ingest existing data (one-time setup, run in a shell):
//      php artisan ai:ingest
// This ingests all existing entities into Neo4j + Qdrant:
// - Processes in batches for efficiency
// - Shows progress bar
// - Reports success/failure counts

// 4. New data auto-syncs (after initial ingest)
$customer = Customer::create([
    'name' => 'John Doe',
    'email' => 'john@example.com',
    'status' => 'active',
]);
// Automatically stored in Neo4j + Qdrant

// 5. Ask questions
use Condoedge\Ai\Facades\AI;

$response = AI::chat("How many active customers do we have?");
// Generates Cypher, executes, returns natural language answer
Manual Configuration Override
use Condoedge\Ai\Domain\ValueObjects\NodeableConfig;

class Customer extends Model implements Nodeable
{
    use HasNodeableConfig;

    public function nodeableConfig(): NodeableConfig
    {
        return NodeableConfig::discover($this)
            ->embedFields(['name', 'bio']) // Override embed fields
            ->addAlias('client') // Add custom alias
            ->addRelationship('HAS_ORDER', 'Order', 'customer_id')
            ->disableVectorStore(); // Graph-only entity
    }
}
Project Structure
ai/
├── config/
│ ├── ai.php # Main package configuration
│ └── ai-patterns.php # Query pattern definitions
├── docs/
│ ├── ARCHITECTURE.md # Detailed technical architecture
│ └── GETTING-STARTED.md # User guide and tutorials
├── examples/
│ └── *.php # Working examples and demos
├── src/
│ ├── Contracts/ # Service interfaces
│ │ ├── DataIngestionServiceInterface.php
│ │ ├── GraphStoreInterface.php
│ │ ├── VectorStoreInterface.php
│ │ ├── LlmProviderInterface.php
│ │ └── ...
│ ├── Domain/ # Domain models and value objects
│ │ ├── Contracts/
│ │ │ └── Nodeable.php # Entity interface
│ │ ├── Traits/
│ │ │ └── HasNodeableConfig.php # Auto-sync + discovery
│ │ └── ValueObjects/
│ │ ├── GraphConfig.php
│ │ ├── VectorConfig.php
│ │ └── NodeableConfig.php
│ ├── Services/ # Core services
│ │ ├── Discovery/
│ │ │ ├── EntityAutoDiscovery.php
│ │ │ ├── CypherScopeAdapter.php
│ │ │ ├── SchemaInspector.php
│ │ │ └── ...
│ │ ├── Resilience/
│ │ │ ├── RetryPolicy.php
│ │ │ └── CircuitBreaker.php
│ │ ├── Security/
│ │ │ └── SensitiveDataSanitizer.php
│ │ ├── DataIngestionService.php
│ │ ├── ContextRetriever.php
│ │ ├── QueryGenerator.php
│ │ ├── QueryExecutor.php
│ │ └── ResponseGenerator.php
│ ├── GraphStore/ # Neo4j implementation
│ │ ├── Neo4jStore.php
│ │ └── CypherSanitizer.php
│ ├── VectorStore/ # Qdrant implementation
│ │ └── QdrantStore.php
│ ├── LlmProviders/ # LLM integrations
│ │ ├── OpenAiLlmProvider.php
│ │ └── AnthropicLlmProvider.php
│ ├── EmbeddingProviders/
│ │ ├── OpenAiEmbeddingProvider.php
│ │ └── AnthropicEmbeddingProvider.php
│ ├── Jobs/ # Queue jobs
│ │ ├── IngestEntityJob.php
│ │ ├── SyncEntityJob.php
│ │ └── RemoveEntityJob.php
│ ├── Exceptions/ # Custom exceptions
│ │ ├── CypherInjectionException.php
│ │ ├── DataConsistencyException.php
│ │ ├── CircuitBreakerOpenException.php
│ │ └── ...
│ └── Facades/
│ └── AI.php # Main facade
├── tests/
│ ├── Unit/ # Unit tests
│ │ ├── Domain/
│ │ ├── Services/
│ │ └── StressTests/ # Security & resilience tests
│ ├── Integration/ # Integration tests
│ │ ├── EntityAutoDiscoveryTest.php
│ │ ├── DualStorageCoordinationTest.php
│ │ └── ...
│ └── Fixtures/ # Test models
└── composer.json
Key Files & Purposes
Core Services:
- DataIngestionService.php: Dual-store coordination with compensating transactions
- EntityAutoDiscovery.php: Model introspection and config extraction
- CypherScopeAdapter.php: Eloquent scope to Cypher conversion
- ContextRetriever.php: RAG context fetching (similar queries + schema)
- QueryGenerator.php: LLM-powered Cypher generation with validation
- QueryExecutor.php: Safe query execution with timeouts
- ResponseGenerator.php: Natural language response generation
Security:
- CypherSanitizer.php: Injection prevention for Cypher identifiers
- SensitiveDataSanitizer.php: API key/credential redaction in logs
- RetryPolicy.php: Exponential backoff retry logic
- CircuitBreaker.php: Circuit breaker pattern for resilience
Configuration:
- HasNodeableConfig.php: Trait providing auto-sync and discovery
- NodeableConfig.php: Fluent builder for entity configuration
- GraphConfig.php, VectorConfig.php: Store-specific configurations
Testing
Running Tests
# All tests
composer test

# Unit tests only
composer test-unit

# Integration tests only
composer test-integration

# With coverage report
composer test-coverage
Test Organization
tests/
├── Unit/
│ ├── Domain/ # Domain model tests
│ ├── Services/
│ │ ├── Discovery/ # Auto-discovery tests
│ │ ├── Resilience/ # Retry + circuit breaker tests
│ │ └── Security/ # Sanitization tests
│ └── StressTests/
│ ├── AdversarialSecurityTest.php # Injection tests
│ ├── DualStorageFailureTest.php # Consistency tests
│ └── ...
├── Integration/
│ ├── EntityAutoDiscoveryTest.php # End-to-end discovery
│ ├── DualStorageCoordinationTest.php # Store coordination
│ └── RealBusinessScenarioTest.php # Business logic tests
└── Fixtures/
└── Test*.php # Test models
Test Coverage
- Unit tests: 150+ tests
- Integration tests: 20+ tests
- Security tests: 54 tests (all passing)
- 30 Cypher injection scenarios
- 5 data consistency scenarios
- 19 resilience scenarios (retry + circuit breaker)
Documentation
- Foundations Track (resources/docs/1.0/foundations/): Requirements, installation, infrastructure, configuration, and troubleshooting (rendered at /ai-docs/foundations).
- Usage & Extension Track (resources/docs/1.0/usage/): Quick start, AI facade APIs, ingestion/context guides, Laravel integration, testing, and extension playbooks.
- Internals & Architecture Track (resources/docs/1.0/internals/): Architecture diagrams, component deep dives, data flows, storage reference, resilience/security details.
- Examples: Working code samples in the examples/ directory.
- Tests: The test suite demonstrates usage patterns.
Artisan Commands
php artisan ai:discover
Auto-discover Nodeable entities and generate config/entities.php.
# Discover all Nodeable models
php artisan ai:discover

# Discover specific model
php artisan ai:discover --model="App\Models\Customer"

# Overwrite existing config
php artisan ai:discover --force

# Preview without writing
php artisan ai:discover --dry-run
What it does:
- Scans app/Models for classes implementing Nodeable
- Analyzes models to discover:
  - Neo4j label, properties, relationships
  - Qdrant collection, embed fields
  - Aliases for natural language queries
  - Eloquent scopes for Cypher conversion
- Generates/merges config/entities.php
When to run:
- Initial setup
- After adding new Nodeable models
- After changing model structure (new columns, relationships, scopes)
php artisan ai:ingest
Bulk ingest existing Nodeable entities into Neo4j and Qdrant.
# Ingest all entities
php artisan ai:ingest

# Ingest specific model
php artisan ai:ingest --model="App\Models\Customer"

# Clear stores before ingesting
php artisan ai:ingest --fresh

# Custom batch size (default: 100)
php artisan ai:ingest --chunk=500

# Preview without ingesting
php artisan ai:ingest --dry-run
What it does:
- Finds all Nodeable models
- Processes existing database records in batches
- Ingests into Neo4j (graph) + Qdrant (vectors)
- Shows progress bar and success/failure counts
When to run:
- Initial setup (when you have existing data)
- After migrating to the AI package
- To rebuild graph/vector stores from scratch (--fresh)
Note: After initial ingestion, new/updated entities auto-sync via model events.
Development
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Run tests: composer test
- Commit: git commit -m "Add amazing feature"
- Push: git push origin feature/amazing-feature
- Open a pull request
Code Standards
- PSR-12: PHP coding standard
- PHP 8.1+: Type hints, readonly properties, intersection types
- Interface-based: Depend on interfaces, not implementations
- Test Coverage: All new features must have tests
- Documentation: Update docs for API changes
Running Development Environment
# Install dependencies
composer install

# Run tests
composer test

# Generate coverage report
composer test-coverage
Technical Decisions
Why Neo4j + Qdrant?
Neo4j (Graph Database):
- Native graph storage optimized for relationship traversal
- Cypher query language provides expressive pattern matching
- Efficient multi-hop queries for complex relationships
- ACID transactions for data consistency
Qdrant (Vector Database):
- High-performance vector similarity search
- Metadata filtering alongside vector search
- Scales to millions of vectors
- Easy deployment (Docker, cloud)
Why Dual-Storage?
- Neo4j excels at relationship queries ("who is connected to whom")
- Qdrant excels at semantic search ("find similar entities")
- Together: Powerful hybrid queries combining structure and semantics
Why Compensating Transactions vs 2PC?
Two-Phase Commit (2PC) requires:
- Coordinator service
- Prepare phase locks
- Complex failure recovery
- Increased latency
Compensating Transactions provide:
- Simpler implementation (rollback on failure)
- No distributed coordinator needed
- Lower latency (no prepare phase)
- Adequate consistency for this use case
Trade-off: Brief window of inconsistency on failure (acceptable for AI indexing, not financial transactions)
Why Interface-Based Design?
Benefits:
- Testability: Easy to mock dependencies in unit tests
- Flexibility: Swap implementations (e.g., switch from OpenAI to Anthropic)
- Loose Coupling: Components depend on contracts, not concrete classes
- Laravel Integration: Works naturally with service container binding
Example:
// Bind interface to implementation
$this->app->singleton(GraphStoreInterface::class, Neo4jStore::class);

// Swap to different implementation
$this->app->singleton(GraphStoreInterface::class, ArangoDbStore::class);
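The testability benefit can be shown without a container. In the sketch below, NodeStore is a hypothetical, deliberately tiny contract (the package's GraphStoreInterface defines its own methods), InMemoryNodeStore is a test double, and CustomerImporter is an invented consumer that depends only on the contract.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal contract, for illustration only.
interface NodeStore
{
    public function createNode(string $label, array $properties): string;

    public function countNodes(string $label): int;
}

// In-memory test double: no database connection needed.
final class InMemoryNodeStore implements NodeStore
{
    /** @var array<string, array<string, array>> */
    private array $nodes = [];

    public function createNode(string $label, array $properties): string
    {
        $id = $label . ':' . (count($this->nodes[$label] ?? []) + 1);
        $this->nodes[$label][$id] = $properties;

        return $id;
    }

    public function countNodes(string $label): int
    {
        return count($this->nodes[$label] ?? []);
    }
}

// The consumer type-hints the contract, so any implementation can be injected.
final class CustomerImporter
{
    public function __construct(private NodeStore $store)
    {
    }

    public function import(array $rows): int
    {
        foreach ($rows as $row) {
            $this->store->createNode('Customer', $row);
        }

        return $this->store->countNodes('Customer');
    }
}

$importer = new CustomerImporter(new InMemoryNodeStore());
$count = $importer->import([['name' => 'John Doe'], ['name' => 'Jane Roe']]);
```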
Why Auto-Discovery vs Manual Config?
Manual Configuration Issues:
- Duplication between model definition and config file
- Config drift when models change
- Maintenance burden
- Error-prone
Auto-Discovery Benefits:
- Single source of truth (Eloquent model)
- Zero duplication
- Automatic updates when models change
- Laravel philosophy: Convention over configuration
Escape Hatch: Override via nodeableConfig() method when needed
License
MIT License - see LICENSE file for details
Support
- Issues: GitHub Issues for bug reports and feature requests
- Documentation: See resources/docs/1.0/ (rendered via LaRecipe at /ai-docs)
- Examples: Working code samples in the examples/ directory
Built with Laravel | Powered by RAG | Secured by Design