schenke-io / laravel-url-cleaner
Checks URLs and cleans them of SEO and tracking data
Requires
- php: ^8.1|^8.2|^8.3
- ext-curl: *
- ext-json: *
- ext-simplexml: *
- archtechx/enums: ^1.1
- badges/poser: ^2.0|^3.0
- guzzlehttp/guzzle: ^7.0
- spatie/laravel-package-tools: ^1.0
Requires (Dev)
- laravel/framework: ^10.0|^11.0
- laravel/pint: ^1.18
- mockery/mockery: ^1.5
- orchestra/testbench: ^8.0|^9.0
- pestphp/pest: ^1.22|^2.0|^3.0
- phpstan/phpstan-phpunit: ^1.0
- spatie/ray: ^1.40
README
The Laravel URL Cleaner package sanitizes URLs by removing unnecessary SEO parameters, tracking information, and other clutter, ensuring clean and efficient URL handling in your Laravel applications.
To install just run:
composer require schenke-io/laravel-url-cleaner
Here is a code example:
<?php

use SchenkeIo\LaravelUrlCleaner\UrlCleaner;

$shortUrl = (new UrlCleaner)->handle($longUrl);
Operation principle
The core UrlCleaner class applies a series of specialized cleaner classes to a given URL, one after another. Each cleaner class performs one specific check or modification. Cleaning URLs this way serves the following purposes:
- Reducing URL clutter: Removes unnecessary SEO parameters and tracking information.
- Improving data storage efficiency: Stores cleaner, more concise URLs.
- Enhancing performance: Optimizes URL processing and caching.
- Securing sensitive information: Prevents exposure of tracking parameters.
- Enhancing data analysis: Simplifies data analysis by removing noise from URLs.
These cleaner classes are highly extensible, allowing for customization and the creation of new modification types.
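The chaining described above can be pictured with a minimal, self-contained sketch. It does not use the package's classes: `SketchCleaner`, `SketchDropKey`, `SketchDropFragment` and `sketchRunCleaners()` are hypothetical names invented for illustration, and the real `UrlCleaner` may be structured differently.

```php
<?php

// Hypothetical interface, not the package's API: every cleaner receives
// the parsed URL parts and returns the (possibly modified) parts.
interface SketchCleaner
{
    public function clean(array $parts): array;
}

// Removes one query key from the URL.
final class SketchDropKey implements SketchCleaner
{
    public function __construct(private string $key) {}

    public function clean(array $parts): array
    {
        parse_str($parts['query'] ?? '', $params);
        unset($params[$this->key]);
        $parts['query'] = http_build_query($params);

        return $parts;
    }
}

// Removes the fragment ("#...") from the URL.
final class SketchDropFragment implements SketchCleaner
{
    public function clean(array $parts): array
    {
        unset($parts['fragment']);

        return $parts;
    }
}

// Applies the cleaners in order and rebuilds the URL.
// Port, user and password are ignored to keep the sketch short.
function sketchRunCleaners(string $url, array $cleaners): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed URL, left untouched
    }
    foreach ($cleaners as $cleaner) {
        $parts = $cleaner->clean($parts);
    }
    $result = ($parts['scheme'] ?? 'https') . '://' . ($parts['host'] ?? '');
    $result .= $parts['path'] ?? '';
    if (($parts['query'] ?? '') !== '') {
        $result .= '?' . $parts['query'];
    }
    if (($parts['fragment'] ?? '') !== '') {
        $result .= '#' . $parts['fragment'];
    }

    return $result;
}
```

Calling `sketchRunCleaners('https://example.com/page?utm_source=x&id=5#top', [new SketchDropKey('utm_source'), new SketchDropFragment()])` returns `https://example.com/page?id=5`. Because each step only sees the output of the previous one, the order of the cleaners in the list matters.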
Config
A default configuration file can be installed and modified later. Install it with:
php artisan url-cleaner:install
A typical result could be:
[
    'cleaners' => [
        MarketingBroad::class,
        RemoveLongValues::class,
        PreventInvalidHost::class,
    ],
    'max_length_value' => 40,
    'masks' => ['dd3', 'vv67'],
    'protected_keys' => ['search'],
]
key | type | description | cleaner |
---|---|---|---|
cleaners | array | list of cleaner classes applied to the given URL | any |
max_length_value | int | values longer than this are removed by | RemoveLongValues |
masks | array | additional masks to be used | RemoveConfigMasks |
protected_keys | array | key names which are guarded against removal | any |
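How `max_length_value` and `protected_keys` could interact is sketched below. This is an assumption about the behaviour based on the table above, not the package's code; `sketchRemoveLongValues()` is a hypothetical helper.

```php
<?php

// Hypothetical helper: drops every parameter whose value is longer than
// $maxLength, unless its key is listed in $protectedKeys.
// Assumes scalar values (no nested parameters such as "a[]=1").
function sketchRemoveLongValues(array $params, int $maxLength, array $protectedKeys): array
{
    return array_filter(
        $params,
        fn ($value, $key) => in_array((string) $key, $protectedKeys, true)
            || strlen((string) $value) <= $maxLength,
        ARRAY_FILTER_USE_BOTH
    );
}

$params = [
    'search' => str_repeat('a', 50), // long, but protected
    'token'  => str_repeat('b', 50), // long and unprotected
    'id'     => '7',                 // short
];

// With the example config above, only "search" and "id" remain.
$kept = sketchRemoveLongValues($params, 40, ['search']);
```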
List of cleaner classes
class name | # masks | description |
---|---|---|
Marketing00 | 68 | Manually collected list of parameters for cleaning |
Marketing01 | 94 | tracking-query-params-registry from https://github.com/mpchadwick |
Marketing02 | 43 | url-parameter-tracker-list from https://github.com/spekulatius |
Marketing03 | 170 | Neat-URL from https://github.com/Smile4ever |
Marketing04 | 91 | platform-url-click-id-parameters from https://github.com/henkisdabro |
MarketingBroad | 226 | prioritize generic masks from all sources |
MarketingNarrow | 309 | prioritize specific masks from all sources |
MarketingUnique | 348 | all masks from all sources |
PreventInvalidHost | - | do not allow URLs with invalid host names |
PreventLocalhost | - | do not allow URLs from localhost |
PreventNonHttps | - | do not allow URLs with a scheme other than https |
PreventUserPassword | - | do not allow URLs containing a user name and password |
RemoveConfigMasks | - | remove keys defined in the config |
RemoveLongValues | - | remove overly long parameters. |
RemoveSearch | - | remove typical search parameters |
ShortAmazonProductUrl | - | Amazon product url cleaner |
SortParameters | - | sort the query parameters alphabetically |
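As an illustration of the last entry, alphabetical sorting can be sketched with PHP's built-in functions. `sketchSortQuery()` is a hypothetical helper and not the `SortParameters` implementation; it assumes simple `key=value` pairs.

```php
<?php

// Hypothetical helper: sorts the parameters of a query string by key.
function sketchSortQuery(string $query): string
{
    parse_str($query, $params);   // "b=2&a=1" becomes ['b' => '2', 'a' => '1']
    ksort($params, SORT_STRING);  // order by key name

    return http_build_query($params);
}
```

With this helper, `sketchSortQuery('b=2&a=1&c=3')` returns `a=1&b=2&c=3`, so two URLs that differ only in parameter order end up identical after cleaning.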
The use of masks
URL parameters are removed by matching them against masks. A mask names a query key, optionally followed by @ and a domain, and * acts as a wildcard in either part.
Description | Example mask |
---|---|
exact match of one query key on any domain | utm_campaign |
match of some keys on any domain | utm_* *tm_* |
exact match of one query key on one domain | utm_campaign@test.net |
exact match of one query key on some domains | utm_campaign@test.* utm_campaign@*test.* |
match of some keys on one domain | utm_*@test.net *x*@test.net |
match of some keys on some domains | utm_*@test.* *x*@*test.* |
Some examples are outlined in the table below.
Mask | URL 1 test.com/?a=1&b=2 | URL 2 test.net/?a=1&abb=2 | URL 3 test2.com/?a=1&b=2 |
---|---|---|---|
a | test.com/?b=2 | test.net/?abb=2 | test2.com/?b=2 |
a* | test.com/?b=2 | test.net/ | test2.com/?b=2 |
a@test.com | test.com/?b=2 | test.net/?a=1&abb=2 | test2.com/?a=1&b=2 |
a@test.* | test.com/?b=2 | test.net/?abb=2 | test2.com/?a=1&b=2 |
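The matching rules can be sketched in a few lines of plain PHP. This is an illustration only, assuming the `key@domain` order from the mask table and treating `*` as "any run of characters"; the function names are hypothetical and the package's matching may differ in detail.

```php
<?php

// Hypothetical helper: matches $subject against a pattern where "*" is a wildcard.
function sketchWildcardMatch(string $pattern, string $subject): bool
{
    // Quote regex metacharacters, then turn the quoted "*" back into ".*".
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/i';

    return preg_match($regex, $subject) === 1;
}

// A mask is "key" or "key@domain"; a missing domain part matches any host.
function sketchMaskMatches(string $mask, string $host, string $key): bool
{
    [$keyPattern, $hostPattern] = array_pad(explode('@', $mask, 2), 2, '*');

    return sketchWildcardMatch($keyPattern, $key)
        && sketchWildcardMatch($hostPattern, $host);
}

// Removes all query keys of $url that match $mask.
function sketchApplyMask(string $mask, string $url): string
{
    // The example URLs carry no scheme, so one is added for parse_url().
    $parts = parse_url('https://' . $url);
    parse_str($parts['query'] ?? '', $params);
    foreach (array_keys($params) as $key) {
        if (sketchMaskMatches($mask, $parts['host'], (string) $key)) {
            unset($params[$key]);
        }
    }
    $query = http_build_query($params);

    return $parts['host'] . ($parts['path'] ?? '/') . ($query === '' ? '' : '?' . $query);
}
```

For example, `sketchApplyMask('a*', 'test.net/?a=1&abb=2')` returns `test.net/`, while `sketchApplyMask('a@test.*', 'test2.com/?a=1&b=2')` leaves the URL unchanged because `test2.com` does not match `test.*`.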
Build your own cleaner by extending special classes
To extend the list of cleaners, build your own cleaner classes and add them to the cleaners array in the config file config/url-cleaner.php.
The following cleaners are prepared to be extended for custom applications:
Prevent domain names
Extend PreventLocalhost and override the $hostRegExes array with regular expressions matching unwanted hostnames.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventLocalhost;

class MyCleaner extends PreventLocalhost
{
    protected array $hostRegExes = [
        '/test\.com/',
        '/test\.net/',
    ];
}
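When writing such expressions, keep in mind that an unanchored pattern also matches inside longer host names. The sketch below shows this with plain `preg_match()`; `sketchHostIsBlocked()` is a hypothetical helper, and how the package itself evaluates `$hostRegExes` is not shown here.

```php
<?php

// Hypothetical helper: true if the URL's host matches one of the expressions.
function sketchHostIsBlocked(string $url, array $hostRegExes): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host found, nothing to test
    }
    foreach ($hostRegExes as $regex) {
        if (preg_match($regex, $host) === 1) {
            return true;
        }
    }

    return false;
}

// Unanchored: also matches "mytest.com".
$loose = sketchHostIsBlocked('https://mytest.com/', ['/test\.com/']);

// Anchored with ^ and $: only the exact host "test.com" matches.
$strict = sketchHostIsBlocked('https://mytest.com/', ['/^test\.com$/']);
```

Here `$loose` is `true` and `$strict` is `false`. If only exact host names should be rejected, anchoring the expressions is the safer choice.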
Prevent schemes
Extend PreventNonHttps and override the $allowedSchemes array with the schemes you want to allow.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\PreventNonHttps;

class MyCleaner extends PreventNonHttps
{
    protected array $allowedSchemes = [
        'https',
        'http',
        'sftp',
    ];
}
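The kind of check such a list implies can be sketched as follows. `sketchSchemeAllowed()` is a hypothetical helper for illustration, not the package's implementation.

```php
<?php

// Hypothetical helper: true if the URL's scheme is in the allow list.
function sketchSchemeAllowed(string $url, array $allowedSchemes): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false; // no scheme present
    }

    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($scheme), $allowedSchemes, true);
}

$allowed = ['https', 'http', 'sftp'];
$ok      = sketchSchemeAllowed('sftp://files.example.com/a.txt', $allowed);
$blocked = sketchSchemeAllowed('ftp://files.example.com/a.txt', $allowed);
```

In this sketch `$ok` is `true` and `$blocked` is `false`.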
Use your own masks
Extend RemoveSearch and override the $masks array with the masks for the parameters you want to remove.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\RemoveSearch;

class MyCleaner extends RemoveSearch
{
    protected array $masks = [
        'utm_*',
        'test*',
        'q@test.net',
    ];
}
Rewrite urls
Extend ShortAmazonProductUrl and override the clean() method, using the class as an example.
<?php

use SchenkeIo\LaravelUrlCleaner\Cleaners\ShortAmazonProductUrl;
// also import the package's UrlData class, which clean() receives

class MyCleaner extends ShortAmazonProductUrl
{
    public function clean(UrlData &$urlData): void
    {
        // check if the hostname is right
        if (preg_match(/* regular expression */, $urlData->host)) {
            // check for the path to be replaced
            if (preg_match(/* regular expression */, $urlData->path, $matches)) {
                // your code
                $urlData->path = /* new path */;
                $urlData->fragment = ''; // clean if applicable
                $urlData->query = ''; // clean if applicable
                $urlData->parameter = []; // clean if applicable
            }
        }
    }
}
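To make the placeholders more concrete, here is a standalone sketch of the host and path checks for an Amazon-style product URL. The regular expressions and the function `sketchShortenProductPath()` are assumptions chosen for illustration; they are not taken from `ShortAmazonProductUrl` and work on plain strings instead of `UrlData`.

```php
<?php

// Hypothetical helper: returns the shortened path, or null if the URL
// is not a product URL on an Amazon host.
function sketchShortenProductPath(string $host, string $path): ?string
{
    // Host check: "amazon.<tld>" with an optional subdomain, e.g. www.amazon.co.uk
    if (preg_match('/(^|\.)amazon\.[a-z.]{2,6}$/i', $host) !== 1) {
        return null;
    }
    // Path check: a 10-character product id after /dp/ or /gp/product/
    if (preg_match('#/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)#', $path, $matches) !== 1) {
        return null;
    }

    return '/dp/' . $matches[1];
}

$short = sketchShortenProductPath(
    'www.amazon.com',
    '/Some-Product-Name/dp/B08N5WRWNW/ref=sr_1_1'
);
```

Here `$short` is `/dp/B08N5WRWNW`. Inside a real cleaner, the returned value would be assigned to `$urlData->path`, with the query and fragment cleared as in the template above.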
Data sources
Currently, the following sources are used:
- https://docs.flyingpress.com/en/article/ignore-query-parameters-yfejfj/
- https://support.cloudways.com/en/articles/8437462-how-to-enable-ignore-query-string-for-varnish-cache
- https://github.com/mpchadwick/tracking-query-params-registry
- https://github.com/spekulatius/url-parameter-tracker-list
- https://github.com/Smile4ever/Neat-URL
- https://github.com/henkisdabro/platform-url-click-id-parameters
- https://data.iana.org/TLD/tlds-alpha-by-domain.txt