schenke-io/laravel-url-cleaner

Checks URLs and cleans them of SEO or tracking data

v1.0.0 2024-11-19 16:20 UTC

This package is auto-updated.

Last update: 2025-03-19 17:10:18 UTC



The Laravel URL Cleaner package sanitizes URLs by removing unnecessary SEO parameters, tracking information, and other clutter, ensuring clean and efficient URL handling in your Laravel applications.

To install just run:

  composer require schenke-io/laravel-url-cleaner

Here is a code example:

<?php

use SchenkeIo\LaravelUrlCleaner\UrlCleaner;


$shortUrl = (new UrlCleaner)->handle($longUrl);    

Operation principle

The core UrlCleaner class applies a series of specialized cleaner classes to a given URL, one after another. Each cleaner class performs one specific check or modification. Cleaning URLs this way serves the following purposes:

  • Reducing URL clutter: Removes unnecessary SEO parameters and tracking information.
  • Improving data storage efficiency: Stores cleaner, more concise URLs.
  • Enhancing performance: Optimizes URL processing and caching.
  • Securing sensitive information: Prevents exposure of tracking parameters.
  • Enhancing data analysis: Simplifies data analysis by removing noise from URLs.

These cleaner classes are highly extensible, allowing for customization and the creation of new modification types.
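The operation principle can be pictured as a simple pipeline. The following is a minimal sketch, not the package's actual implementation: the interface SketchCleaner, the classes DropFragment and DropUtmKeys, and the function runCleaners() are hypothetical names invented for illustration, and the URL is handled as a plain string rather than the package's own data object.

```php
<?php

// Hypothetical interface, for illustration only; not part of the package.
interface SketchCleaner
{
    public function clean(string $url): string;
}

// Drops the fragment (everything from the first '#').
final class DropFragment implements SketchCleaner
{
    public function clean(string $url): string
    {
        $pos = strpos($url, '#');
        return $pos === false ? $url : substr($url, 0, $pos);
    }
}

// Removes every query parameter whose key starts with "utm_".
final class DropUtmKeys implements SketchCleaner
{
    public function clean(string $url): string
    {
        $parts = explode('?', $url, 2);
        if (count($parts) < 2) {
            return $url; // no query string, nothing to do
        }
        parse_str($parts[1], $params);
        $kept = array_filter(
            $params,
            fn ($key) => !str_starts_with((string) $key, 'utm_'),
            ARRAY_FILTER_USE_KEY
        );
        return $kept === [] ? $parts[0] : $parts[0] . '?' . http_build_query($kept);
    }
}

/** Applies each cleaner in order; each one receives the previous result. */
function runCleaners(string $url, array $cleaners): string
{
    foreach ($cleaners as $cleaner) {
        $url = $cleaner->clean($url);
    }
    return $url;
}

$clean = runCleaners(
    'https://example.com/page?utm_source=mail&id=7#top',
    [new DropFragment(), new DropUtmKeys()]
);
// $clean is 'https://example.com/page?id=7'
```

Because every step has the same small interface, adding a new kind of cleaning only means adding one more class to the list.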

Config

A default configuration file can be installed and later modified. Install it with:

php artisan url-cleaner:install

A typical result could be:

[
    'cleaners' => [
        MarketingBroad::class,
        RemoveLongValues::class,
        PreventInvalidHost::class
    ],
    'max_length_value' => 40,
    'masks' => ['dd3','vv67'],
    'protected_keys' => ['search']   
]
| key | type | description | cleaner |
|---|---|---|---|
| cleaners | array | list of cleaner classes applied to the given URL | any |
| max_length_value | int | values longer than this are removed | RemoveLongValues |
| masks | array | additional masks to be used | RemoveConfigMasks |
| protected_keys | array | key names which are guarded against removal | any |
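To illustrate how max_length_value and protected_keys might interact, here is a minimal sketch. It is not the package's code: sketchRemoveLongValues() is a hypothetical helper, and it assumes that "longer than" means a strict comparison on the string length, that protected keys are compared literally, and that all values are scalar.

```php
<?php

/**
 * Hypothetical helper, not the package's code: drops query parameters whose
 * value is longer than $maxLength, unless the key is listed as protected.
 * Assumes scalar values only.
 */
function sketchRemoveLongValues(array $params, int $maxLength, array $protectedKeys): array
{
    return array_filter(
        $params,
        fn ($value, $key) => in_array((string) $key, $protectedKeys, true)
            || strlen((string) $value) <= $maxLength,
        ARRAY_FILTER_USE_BOTH
    );
}

$params = [
    'id'     => '42',
    'token'  => str_repeat('x', 64), // 64 characters, above the limit
    'search' => str_repeat('y', 64), // also too long, but protected
];

$result = sketchRemoveLongValues($params, 40, ['search']);
// $result keeps 'id' and 'search'; 'token' is dropped
```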

List of cleaner classes

| class name | # masks | description |
|---|---|---|
| Marketing00 | 68 | Manually collected list of parameters for cleaning |
| Marketing01 | 94 | tracking-query-params-registry from https://github.com/mpchadwick |
| Marketing02 | 43 | url-parameter-tracker-list from https://github.com/spekulatius |
| Marketing03 | 170 | Neat-URL from https://github.com/Smile4ever |
| Marketing04 | 91 | platform-url-click-id-parameters from https://github.com/henkisdabro |
| MarketingBroad | 226 | prioritize generic masks from all sources |
| MarketingNarrow | 309 | prioritize specific masks from all sources |
| MarketingUnique | 348 | all masks from all sources |
| PreventInvalidHost | - | do not allow URLs with invalid host names |
| PreventLocalhost | - | do not allow URLs from localhost |
| PreventNonHttps | - | do not allow URLs with a scheme other than https |
| PreventUserPassword | - | do not allow URLs containing a user and password |
| RemoveConfigMasks | - | remove keys defined in the config |
| RemoveLongValues | - | remove overly long parameters |
| RemoveSearch | - | remove typical search parameters |
| ShortAmazonProductUrl | - | Amazon product URL cleaner |
| SortParameters | - | sort the query parameters alphabetically |

The use of masks

Query parameters are removed by matching them against masks. A mask names a query key, optionally followed by @ and a domain; * acts as a wildcard in either part.

| Description | Example masks |
|---|---|
| exact match of one query key on any domain | utm_campaign |
| match of some keys on any domain | utm_*, *tm_* |
| exact match of one query key on one domain | utm_campaign@test.net |
| exact match of one query key on some domains | utm_campaign@test.*, utm_campaign@*test.* |
| match of some keys on one domain | utm_*@test.net, *x*@test.net |
| match of some keys on some domains | utm_*@test.*, *x*@*test.* |

Some examples are outlined in the table below.

| Mask | URL 1: test.com/?a=1&b=2 | URL 2: test.net/?a=1&abb=2 | URL 3: test2.com/?a=1&b=2 |
|---|---|---|---|
| a | test.com/?b=2 | test.net/?abb=2 | test2.com/?b=2 |
| a* | test.com/?b=2 | test.net/ | test2.com/?b=2 |
| a@test.com | test.com/?b=2 | test.net/?a=1&abb=2 | test2.com/?a=1&b=2 |
| a@test.* | test.com/?b=2 | test.net/?abb=2 | test2.com/?a=1&b=2 |
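The matching rules can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: the three functions below are hypothetical, a mask is read as key@domain (as in the list of example masks above), and * is assumed to match any run of characters, including none.

```php
<?php

/** Turns a wildcard pattern such as "utm_*" into an anchored regular expression. */
function sketchWildcardToRegex(string $pattern): string
{
    // preg_quote() escapes "*" as "\*"; turn that back into ".*"
    return '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
}

/** Hypothetical matcher: does $mask apply to query key $key on host $host? */
function sketchMaskMatches(string $mask, string $key, string $host): bool
{
    // A mask is either "key" or "key@host"; without "@" any host matches.
    [$keyPattern, $hostPattern] = array_pad(explode('@', $mask, 2), 2, '*');

    return preg_match(sketchWildcardToRegex($keyPattern), $key) === 1
        && preg_match(sketchWildcardToRegex($hostPattern), $host) === 1;
}

/** Removes every query parameter matched by at least one mask. */
function sketchApplyMasks(string $host, array $params, array $masks): array
{
    foreach (array_keys($params) as $key) {
        foreach ($masks as $mask) {
            if (sketchMaskMatches($mask, (string) $key, $host)) {
                unset($params[$key]);
                break; // one matching mask is enough
            }
        }
    }
    return $params;
}

$left = sketchApplyMasks('test.net', ['a' => '1', 'abb' => '2'], ['a']);
// $left is ['abb' => '2']: the exact mask "a" does not touch "abb"
```

With the mask a* instead, both parameters of the test.net URL would be removed, while a@test.com would leave that URL untouched because the host does not match.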

Build your own cleaner by extending special classes

To extend the list of cleaners, build your own cleaner classes and add them to the config file config/url-cleaner.php.

The following cleaners are prepared to be extended for custom applications:

Prevent domain names

Extend PreventLocalhost and override the $hostRegExes array with regular expressions matching the unwanted hostnames.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventLocalhost;

class MyCleaner extends PreventLocalhost {

    protected array $hostRegExes = [
        '/test\.com/',
        '/test\.net/',
    ];
    
}

Prevent schemes

Extend PreventNonHttps and override the $allowedSchemes array with the schemes you allow to pass.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventNonHttps;

class MyCleaner extends PreventNonHttps {

    protected array $allowedSchemes = [
        'https',
        'http',
        'sftp',
    ];
    
}

Use your own masks

Extend RemoveSearch and override the $masks array with the masks of the query keys you want removed.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\RemoveSearch;

class MyCleaner extends RemoveSearch {

    protected array $masks = [
        'utm_*',
        'test*',
        'q@test.net'
    ];
    
}

Rewrite urls

Extend ShortAmazonProductUrl and override the clean() method, using the class itself as a template.

<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\ShortAmazonProductUrl;
// UrlData must be imported as well; its namespace is not shown in this README

class MyCleaner extends ShortAmazonProductUrl {

    public function clean(UrlData &$urlData): void
    {
        // check if the hostname is right
        if (preg_match(/* regular expression   */, $urlData->host)) {
            // check for the path to be replaced
            if (preg_match(/* regular expression */, $urlData->path, $matches)) {
                
                // your code 

                $urlData->path = /* new path */;
                $urlData->fragment = '';  // clean if applicable
                $urlData->query = '';     // clean if applicable
                $urlData->parameter = []; // clean if applicable
            }
        }
    } 
}
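Once written, a custom cleaner would be listed in the config file next to the built-in ones. The fragment below is a sketch only: App\Cleaners\MyCleaner is a hypothetical class name, the namespace of MarketingBroad is assumed to be the same as that of the other cleaners shown above, and the file is assumed to return its array in the usual Laravel way.

```php
<?php

// config/url-cleaner.php (sketch; App\Cleaners\MyCleaner is a hypothetical class)
use App\Cleaners\MyCleaner;
use SchenkeIo\LaravelUrlCleaner\Cleaners\MarketingBroad; // namespace assumed

return [
    'cleaners' => [
        MarketingBroad::class,
        MyCleaner::class, // custom cleaner, listed like the built-in ones
    ],
    // the other keys (max_length_value, masks, protected_keys) are omitted here
];
```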

Data sources

Currently, the following sources are used: