daikazu/robotstxt

Dynamically generate robots.txt content based on the current Laravel environment.

Dynamic robots.txt for your Laravel app.

A Laravel package for dynamically generating robots.txt files with environment-specific configurations. Control how search engines and AI crawlers interact with your site using modern features like Cloudflare's Content Signals Policy, per-environment rules, and flexible content signal directives.

Perfect for applications that need different robots.txt rules across environments (production, staging, local) or want granular control over AI training, search indexing, and content access.

Features

  • Environment-Specific Configuration - Different robots.txt rules for production, staging, local, etc.
  • Content Signals Support - Implement Cloudflare's Content Signals Policy to control AI training, search indexing, and AI content usage
  • Flexible User-Agent Rules - Define global rules or per-agent directives (disallow, allow, content signals)
  • Host Directive - Specify preferred domain for crawlers
  • Sitemap Management - Automatically include sitemap URLs
  • Custom Text - Add custom content to your robots.txt file
  • Human-Readable Policies - Optional policy comment blocks with custom or default text

Installation

You can install the package via Composer:

composer require daikazu/robotstxt

You can publish the config file with:

php artisan vendor:publish --tag="robotstxt-config"

Nginx Configuration (Required for Production)

If requests to /robots.txt return a 404 status (even though the content is still rendered), configure Nginx to pass robots.txt requests through to Laravel:

For Laravel Herd: Add this to your site's Nginx config (via Herd UI or ~/.config/herd/Nginx/[site].conf):

location = /robots.txt {
    try_files $uri /index.php?$query_string;
    access_log off;
    log_not_found off;
}

For Laravel Forge/Vapor: Add the same location block to your Nginx configuration.

For custom servers: Add the same location block to the server block in your Nginx config file.

Then restart Nginx/Herd.
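On a custom server managed with systemd, for example, you can validate and reload the configuration like this (adjust to how Nginx is managed on your host; Herd restarts via its own UI):

sudo nginx -t
sudo systemctl reload nginx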

Usage

After installation, the package automatically registers a route at /robots.txt that serves your dynamically generated robots.txt file.
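You can confirm the route is registered from the command line, for example:

php artisan route:list | grep robots.txt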

Basic Configuration

The default configuration file (config/robotstxt.php) provides environment-specific settings. Here's a simple example:

return [
    'environments' => [
        'production' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/admin', '/api'],
                    'allow' => ['/'],
                ],
            ],
            'sitemaps' => [
                'sitemap.xml',
            ],
        ],
    ],
];

This generates:

Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /admin
Disallow: /api
Allow: /

Content Signals (AI & Search Control)

Control how AI crawlers and search engines use your content with Cloudflare's Content Signals Policy:

'production' => [
    // Enable human-readable policy comment block
    'content_signals_policy' => [
        'enabled' => true,
        'custom_policy' => null, // or provide your own HEREDOC text
    ],

    // Global content signals (applied at top level)
    'content_signals' => [
        'search'   => true,   // Allow search indexing
        'ai_input' => false,  // Block AI input/RAG
        'ai_train' => false,  // Block AI training
    ],

    'paths' => [
        '*' => [
            'disallow' => [],
            'allow' => ['/'],
        ],
    ],
],

Generates:

# As a condition of accessing this website, you agree to abide by the following
# content signals:
# [Full policy text...]

Content-Signal: search=yes, ai-input=no, ai-train=no

User-agent: *
Allow: /

Per-Agent Content Signals

You can also define content signals for specific user agents:

'paths' => [
    '*' => [
        'disallow' => [],
        'allow' => ['/'],
    ],
    'Googlebot' => [
        'content_signals' => [
            'search'   => true,
            'ai_input' => true,
            'ai_train' => false,
        ],
        'disallow' => ['/private'],
        'allow' => ['/'],
    ],
],

Generates (with the global content signals from the previous example still set):

Content-Signal: search=yes, ai-input=no, ai-train=no

User-agent: *
Allow: /

User-agent: Googlebot
Content-Signal: search=yes, ai-input=yes, ai-train=no
Disallow: /private
Allow: /

Host Directive

Specify the preferred domain for crawlers:

'production' => [
    'host' => 'https://www.example.com',
    // ... other config
],

Generates:

Host: https://www.example.com

Custom Text

Add arbitrary custom content to the end of your robots.txt:

'production' => [
    'custom_text' => <<<'TEXT'
# Custom crawl-delay for specific bots
User-agent: Bingbot
Crawl-delay: 1
TEXT,
    // ... other config
],
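Because the custom text is appended verbatim to the end of the generated file, the example above ends the output with:

# Custom crawl-delay for specific bots
User-agent: Bingbot
Crawl-delay: 1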

Environment-Specific Rules

Define different rules for each environment:

return [
    'environments' => [
        'production' => [
            'paths' => [
                '*' => [
                    'disallow' => [],
                    'allow' => ['/'],
                ],
            ],
            'content_signals' => [
                'search' => true,
                'ai_input' => false,
                'ai_train' => false,
            ],
        ],
        'staging' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/'],
                ],
            ],
        ],
        'local' => [
            'paths' => [
                '*' => [
                    'disallow' => ['/'],
                ],
            ],
        ],
    ],
];
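With APP_ENV=staging or APP_ENV=local, for example, the configuration above generates a blanket disallow:

User-agent: *
Disallow: /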

Content Signal Values

  • true or 'yes' - Permission granted
  • false or 'no' - Permission denied
  • null - No preference specified (signal not included)

Content Signal Types

  • search - Building a search index and providing search results (excludes AI summaries)
  • ai_input - Inputting content into AI models (RAG, grounding, AI Overviews)
  • ai_train - Training or fine-tuning AI models
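Putting the values and types together: a signal set to null is simply left out of the generated line. For example (the values here are purely illustrative):

'content_signals' => [
    'search'   => 'yes',
    'ai_input' => null,   // no preference: omitted from the output
    'ai_train' => false,
],

would generate something like:

Content-Signal: search=yes, ai-train=no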

Complete Configuration Example

return [
    'environments' => [
        'production' => [
            // Content Signals Policy
            'content_signals_policy' => [
                'enabled' => true,
                'custom_policy' => null,
            ],

            // Global Content Signals
            'content_signals' => [
                'search'   => true,
                'ai_input' => false,
                'ai_train' => false,
            ],

            // User-Agent Rules
            'paths' => [
                '*' => [
                    'disallow' => ['/admin', '/api'],
                    'allow' => ['/'],
                ],
                'Googlebot' => [
                    'content_signals' => [
                        'search'   => true,
                        'ai_input' => true,
                        'ai_train' => false,
                    ],
                    'disallow' => [],
                    'allow' => ['/'],
                ],
            ],

            // Sitemaps
            'sitemaps' => [
                'sitemap.xml',
                'sitemap-news.xml',
            ],

            // Host
            'host' => 'https://www.example.com',

            // Custom Text
            'custom_text' => <<<'TEXT'
# Additional custom directives
User-agent: Bingbot
Crawl-delay: 1
TEXT,
        ],
    ],
];

Testing

composer test
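In your own application, you can also cover the generated output with a standard Laravel feature test. A minimal sketch (the class name and assertions are illustrative; adjust the expectations to your configured rules):

<?php

namespace Tests\Feature;

use Tests\TestCase;

class RobotsTxtTest extends TestCase
{
    public function test_robots_txt_is_generated_for_the_current_environment(): void
    {
        $response = $this->get('/robots.txt');

        $response->assertOk();

        // Adjust to match the rules defined for the environment under test.
        $response->assertSee('User-agent: *', escape: false);
    }
}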

Roadmap

  • Crawl-delay Directive - Add native support for crawl-delay configuration per user-agent

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

The MIT License (MIT). Please see License File for more information.