orangecat / feed
Generate product feed for AI Crawlers
Type: magento2-module
pkg:composer/orangecat/feed
Requires
- php: ^7.4 || ^8.1
- magento/framework: *
- orangecat/core: ^1.0.2
- orangecat/faqs: ^1.0.0
README
The Orangecat Feed module generates structured product feeds optimized for AI crawlers and large language models (LLMs). By providing data in a machine-readable format (JSON), it helps AI agents index, understand, and retrieve your product catalog information accurately.
Key Features
- AI-Optimized Output: Generates feeds in JSON, a structured format that AI crawlers and retrieval systems can parse directly.
- Automated Generation: Uses Magento's cron system to regenerate feeds on a schedule you define, so the published data stays current.
- Chunking Support: Capable of handling large catalogs by splitting feeds into smaller, manageable chunks (e.g., 500 products per file).
- Customizable Content:
  - Select specific product attributes to include (e.g., description, price, stock status).
  - Option to include product images.
- FAQ Integration: Seamlessly integrates with the Orangecat Faqs module to include product-specific FAQs in the feed, providing richer context for AI models.
- Multistore Support: Generates separate feeds for each store view, respecting localizations and currency settings.
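The chunking behaviour listed above can be illustrated with a short Python sketch. This is not the module's PHP implementation: the function names and the `<base_name>_<n>.json` naming scheme are assumptions made for the example. A chunk size of 0 is treated as "one file", matching the Products Per Chunk setting described under Configuration.

```python
import json
from pathlib import Path


def chunk_products(products, chunk_size):
    """Split a list of product records into lists of at most chunk_size items.

    A chunk_size of 0 keeps everything in a single chunk.
    """
    if chunk_size < 0:
        raise ValueError("chunk_size must be 0 or a positive integer")
    if chunk_size == 0:
        return [list(products)]
    return [products[i:i + chunk_size] for i in range(0, len(products), chunk_size)]


def write_chunks(products, chunk_size, out_dir, base_name="products"):
    """Write each chunk to its own JSON file and return the paths written.

    The "<base_name>_<n>.json" file naming is hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_products(products, chunk_size), start=1):
        path = out_dir / f"{base_name}_{index}.json"
        path.write_text(json.dumps(chunk, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

With 1,200 products and a chunk size of 500, this sketch writes three files holding 500, 500, and 200 products.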
Configuration
You can configure the module settings in Stores > Configuration > Orange Cat > Product Feed for AI.
General Settings
- Enable Feed Generation: Turn the automatic feed generation on or off.
- Cron Schedule: Define how often the feed should be regenerated using standard cron expression syntax (default: daily at 2 AM, which in cron syntax is 0 2 * * *).
- Feed Filename: Set a custom base name for your feed files (e.g., products).
- Products Per Chunk: Define the number of products per JSON file. Set to 0 to generate a single large file.
- Product Attributes: Select which Magento product attributes to include in the feed payload. SKU, Name, and Price are always included.
- Include Product Image: Toggle the inclusion of the main product image URL.
- Include FAQs: If the Orangecat Faqs module is installed, this option allows you to embed related FAQs directly into the product data object.
- Output Format: Choose the output format:
  - JSON: Standard structured data.
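To make the effect of these settings concrete, here is a minimal Python sketch of how one feed entry could be assembled. The module's real JSON schema is not documented here, so the key names (`sku`, `name`, `price`, `image`, `faqs`) and the `build_record` function are hypothetical. The only behaviour taken from the settings above is that SKU, Name, and Price are always present while everything else is opt-in.

```python
# Hypothetical key names; the module's actual JSON schema may differ.
REQUIRED_FIELDS = ("sku", "name", "price")


def build_record(product, attributes=(), include_image=False, faqs=None):
    """Build one feed entry from a product dict."""
    # SKU, Name, and Price are always included; a missing one raises KeyError.
    record = {field: product[field] for field in REQUIRED_FIELDS}
    # Copy only the optional attributes that were selected and are present.
    for code in attributes:
        if code in product and code not in record:
            record[code] = product[code]
    # The main image URL is added only when the toggle is on.
    if include_image and product.get("image"):
        record["image"] = product["image"]
    # FAQs are embedded directly in the product object when provided.
    if faqs:
        record["faqs"] = list(faqs)
    return record
```

Calling `build_record(product)` with no options returns only the three required fields, which corresponds to a configuration with no extra attributes, images, or FAQs selected.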
Log Cleanup
- Log Retention (Days): Automatically clean up old generation logs after a specified number of days to save database space.
For Developers
Feed Location
Generated feeds are stored in the pub/media/feed directory (or similar public path depending on configuration), making them easily accessible for external crawlers via HTTP.
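On the consuming side, a crawler or test script might merge the generated chunks back into one list. The Python sketch below reads from a local directory as a stand-in for HTTP access. It assumes each feed file is a JSON array of product objects and that chunk files share the configured base name; both points are assumptions, since the exact file layout is not specified here.

```python
import json
from pathlib import Path


def load_feed(feed_dir, base_name="products"):
    """Merge every '<base_name>*.json' file in feed_dir into one product list.

    Files are read in filename order. Assumes each file holds a JSON array.
    """
    merged = []
    for path in sorted(Path(feed_dir).glob(f"{base_name}*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            raise ValueError(f"{path.name}: expected a JSON array of products")
        merged.extend(data)
    return merged
```

For example, `load_feed("pub/media/feed")` would return the combined products from every matching file in that directory.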
Extensibility
The module uses a modular architecture for data collection:
- Data Collectors: You can implement additional data collectors to inject custom data into the feed by extending the module's service layer.
- Events: The module dispatches events during feed generation so that other modules can modify the data stream.
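The collector idea can be shown with a language-neutral sketch in Python. The module itself is written in PHP, and its actual interfaces and event names are not documented here, so every name below (`collect`, `core_collector`, `warranty_collector`) is hypothetical. The sketch only demonstrates the pattern: each collector contributes a few fields, and the results are merged into one record.

```python
from typing import Callable, List

# A collector takes a product dict and returns the fields it contributes.
Collector = Callable[[dict], dict]


def collect(product: dict, collectors: List[Collector]) -> dict:
    """Run each collector in order and merge its output into one record.

    Later collectors overwrite keys set by earlier ones.
    """
    record: dict = {}
    for collector in collectors:
        record.update(collector(product))
    return record


def core_collector(product: dict) -> dict:
    # Fields that every feed entry carries.
    return {"sku": product["sku"], "name": product["name"], "price": product["price"]}


def warranty_collector(product: dict) -> dict:
    # Hypothetical custom data injected by a third-party extension.
    return {"warranty_months": product.get("warranty_months", 0)}
```

Adding custom data then means appending one more collector to the list instead of editing the existing ones, which is the main appeal of this design.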