daikazu / robotstxt
Dynamically generate robots.txt content based on the current Laravel environment.
pkg:composer/daikazu/robotstxt
Requires
- php: ^8.4
- illuminate/contracts: ^11.0||^12.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- larastan/larastan: ^3.0
- laravel/pint: ^1.14
- nunomaduro/collision: ^8.8
- orchestra/testbench: ^10.0.0||^9.0.0
- pestphp/pest: ^4.0
- pestphp/pest-plugin-arch: ^4.0
- pestphp/pest-plugin-laravel: ^4.0
- phpstan/extension-installer: ^1.4
- phpstan/phpstan-deprecation-rules: ^2.0
- phpstan/phpstan-phpunit: ^2.0
- rector/rector: ^2.2
README
Dynamic robots.txt for your Laravel app.
A Laravel package for dynamically generating robots.txt files with environment-specific configurations. Control how search engines and AI crawlers interact with your site using modern features like Cloudflare's Content Signals Policy, per-environment rules, and flexible content signal directives.
Perfect for applications that need different robots.txt rules across environments (production, staging, local) or want granular control over AI training, search indexing, and content access.
Features
- Environment-Specific Configuration - Different robots.txt rules for production, staging, local, etc.
- Content Signals Support - Implement Cloudflare's Content Signals Policy to control AI training, search indexing, and AI content usage
- Flexible User-Agent Rules - Define global rules or per-agent directives (disallow, allow, content signals)
- Host Directive - Specify preferred domain for crawlers
- Sitemap Management - Automatically include sitemap URLs
- Custom Text - Add custom content to your robots.txt file
- Human-Readable Policies - Optional policy comment blocks with custom or default text
Installation
You can install the package via Composer:
composer require daikazu/robotstxt
You can publish the config file with:
php artisan vendor:publish --tag="robotstxt-config"
Nginx Configuration (Required for Production)
If requests to /robots.txt return a 404 status (even though the generated content is still visible), you need to configure Nginx to pass robots.txt requests to Laravel:
For Laravel Herd:
Add this to your site's Nginx config (via Herd UI or ~/.config/herd/Nginx/[site].conf):
location = /robots.txt {
    try_files $uri /index.php?$query_string;
    access_log off;
    log_not_found off;
}
For Laravel Forge/Vapor: Add the same location block to your Nginx configuration.
For custom servers: Add to your server block in your Nginx config file.
Then restart Nginx/Herd.
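To confirm the request is now handled by Laravel rather than served (or 404'd) by Nginx directly, you can check the response status with a quick header request (replace the URL with your own site):
curl -I https://your-site.test/robots.txt
A 200 response instead of a 404 indicates the dynamically generated file is being served.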
Usage
After installation, the package automatically registers a route at /robots.txt that serves your dynamically generated robots.txt file.
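If you want to cover this route in your test suite, a minimal Pest sketch like the one below can assert that it responds. This assumes Laravel's TestCase is bound in tests/Pest.php, and the "User-agent: *" assertion only holds if your active environment defines at least one rule, so adjust it to your own configuration:

// tests/Feature/RobotsTxtTest.php
it('serves the generated robots.txt', function () {
    // Hit the route registered by the package
    $response = $this->get('/robots.txt');

    $response->assertOk();
    // Adjust to whatever your environment's config is expected to emit
    $response->assertSee('User-agent: *');
});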
Basic Configuration
The default configuration file (config/robotstxt.php) provides environment-specific settings. Here's a simple example:
return [
    'environments' => [
        'production' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/admin', '/api'],
                    'allow' => ['/'],
                ],
            ],
            'sitemaps' => [
                'sitemap.xml',
            ],
        ],
    ],
];
This generates:
Sitemap: https://example.com/sitemap.xml
User-agent: *
Disallow: /admin
Disallow: /api
Allow: /
Content Signals (AI & Search Control)
Control how AI crawlers and search engines use your content with Cloudflare's Content Signals Policy:
'production' => [
    // Enable human-readable policy comment block
    'content_signals_policy' => [
        'enabled' => true,
        'custom_policy' => null, // or provide your own HEREDOC text
    ],

    // Global content signals (applied at top level)
    'content_signals' => [
        'search' => true,     // Allow search indexing
        'ai_input' => false,  // Block AI input/RAG
        'ai_train' => false,  // Block AI training
    ],

    'paths' => [
        '*' => [
            'disallow' => [],
            'allow' => ['/'],
        ],
    ],
],
Generates:
# As a condition of accessing this website, you agree to abide by the following
# content signals:
# [Full policy text...]
Content-Signal: search=yes, ai-input=no, ai-train=no
User-agent: *
Allow: /
Per-Agent Content Signals
You can also define content signals for specific user agents:
'paths' => [
    '*' => [
        'disallow' => [],
        'allow' => ['/'],
    ],
    'Googlebot' => [
        'content_signals' => [
            'search' => true,
            'ai_input' => true,
            'ai_train' => false,
        ],
        'disallow' => ['/private'],
        'allow' => ['/'],
    ],
],
Generates:
Content-Signal: search=yes, ai-input=no, ai-train=no
User-agent: *
Allow: /
User-agent: Googlebot
Content-Signal: search=yes, ai-input=yes, ai-train=no
Disallow: /private
Allow: /
Host Directive
Specify the preferred domain for crawlers:
'production' => [
    'host' => 'https://www.example.com',
    // ... other config
],
Generates:
Host: https://www.example.com
Custom Text
Add arbitrary custom content to the end of your robots.txt:
'production' => [
    'custom_text' => <<<'TEXT'
        # Custom crawl-delay for specific bots
        User-agent: Bingbot
        Crawl-delay: 1
        TEXT,
    // ... other config
],
Environment-Specific Rules
Define different rules for each environment:
return [
    'environments' => [
        'production' => [
            'paths' => [
                '*' => [
                    'disallow' => [],
                    'allow' => ['/'],
                ],
            ],
            'content_signals' => [
                'search' => true,
                'ai_input' => false,
                'ai_train' => false,
            ],
        ],
        'staging' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/'],
                ],
            ],
        ],
        'local' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/'],
                ],
            ],
        ],
    ],
];
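With this configuration, the staging and local environments should generate output along these lines, blocking all crawling:

User-agent: *
Disallow: /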
Content Signal Values
- true or 'yes' - Permission granted
- false or 'no' - Permission denied
- null - No preference specified (signal not included; see the example below)
Content Signal Types
- search - Building a search index and providing search results (excludes AI summaries)
- ai_input - Inputting content into AI models (RAG, grounding, AI Overviews)
- ai_train - Training or fine-tuning AI models
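As a sketch of how null behaves, a configuration like the following (hypothetical values) would be expected to emit only the signals that are explicitly set, so ai_input drops out of the generated line:

'content_signals' => [
    'search' => true,    // emitted as search=yes
    'ai_input' => null,  // no preference: not included in the output
    'ai_train' => false, // emitted as ai-train=no
],

Expected output along the lines of:

Content-Signal: search=yes, ai-train=no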
Complete Configuration Example
return [
    'environments' => [
        'production' => [
            // Content Signals Policy
            'content_signals_policy' => [
                'enabled' => true,
                'custom_policy' => null,
            ],

            // Global Content Signals
            'content_signals' => [
                'search' => true,
                'ai_input' => false,
                'ai_train' => false,
            ],

            // User-Agent Rules
            'paths' => [
                '*' => [
                    'disallow' => ['/admin', '/api'],
                    'allow' => ['/'],
                ],
                'Googlebot' => [
                    'content_signals' => [
                        'search' => true,
                        'ai_input' => true,
                        'ai_train' => false,
                    ],
                    'disallow' => [],
                    'allow' => ['/'],
                ],
            ],

            // Sitemaps
            'sitemaps' => [
                'sitemap.xml',
                'sitemap-news.xml',
            ],

            // Host
            'host' => 'https://www.example.com',

            // Custom Text
            'custom_text' => <<<'TEXT'
                # Additional custom directives
                User-agent: Bingbot
                Crawl-delay: 1
                TEXT,
        ],
    ],
];
Testing
composer test
Roadmap
- Crawl-delay Directive - Add native support for crawl-delay configuration per user-agent
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Credits
License
The MIT License (MIT). Please see License File for more information.