php-anonymizer / anonymizer
Library to remove confidential data from different data containers
Installs: 9 093
Dependents: 0
Suggesters: 0
Security: 0
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Requires
- php: ^8.2
- myclabs/deep-copy: ^1.12
- thecodingmachine/safe: ^2.5
Requires (Dev)
- ext-json: *
- ext-yaml: *
- ergebnis/composer-normalize: ^2.45
- fakerphp/faker: ^1.24
- friendsofphp/php-cs-fixer: ^3.68
- icanhazstring/composer-unused: dev-main
- infection/infection: ^0.29.6
- phpmd/phpmd: ^2.15
- phpstan/phpstan: ^1.12
- phpstan/phpstan-phpunit: ^1.4
- phpunit/phpunit: ^11
- rector/rector: ^1.2
- symfony/serializer: ^7.2
- thecodingmachine/phpstan-safe-rule: ^1.2
Suggests
- ext-json: JSON extension is required to use the JSON data encoder
- ext-yaml: YAML extension is required to use the YAML data encoder
- fakerphp/faker: Data library for generation of fake data
- symfony/serializer: Serializer component to normalize and denormalize data objects
This package is auto-updated.
Last update: 2025-03-16 15:19:23 UTC
README
00 Preliminary
Table of Contents
- 00 Preliminary
- 01 Basic usage
- 02 Writing definition rules
- 03 Using Faker as a data provider
- 04 Data Encoding
- 06 Extended Information
Purpose
This library is a simple data anonymization toolkit that allows to define rules for anonymizing data in a structured
way. By using this library it is possible to skip writing a lot of boilerplate code to navigate through your data
structures again and again. The library is designed to be flexible and extensible, so that it can be used in a wide
range of use cases. It also ships with support of fakerphp/faker
as a provider for randomized fake data.
Getting started
composer require toniliesche/anonymizer
01 Basic usage
Creating an Anonymizer instance
If you want to start with the most basic usage of modifying data in an array structure, all you have to do is to create
an instance of the Anonymizer
class, register a rule set by using registerRuleSet
and call the run
method with the
data you want to modify.
// examples/01_basic_usage.php <?php declare(strict_types=1); use PhpAnonymizer\Anonymizer\AnonymizerBuilder; $anonymizer = (new AnonymizerBuilder()) ->withDefaults() ->build(); $anonymizer->registerRuleSet( 'order', [ 'order.person.first_name', 'order.person.last_name', ], ); $data = [ 'order' => [ 'person' => [ 'first_name' => 'John', 'last_name' => 'Doe', ], ], ]; $anonymizedData = $anonymizer->run('order', $data); echo PHP_EOL . 'Original data:' . PHP_EOL; print_r($data); echo PHP_EOL . 'Anonymized data:' . PHP_EOL; print_r($anonymizedData);
Example Output
Original data:
Array
(
[order] => Array
(
[person] => Array
(
[first_name] => John
[last_name] => Doe
)
)
)
Anonymized data:
Array
(
[order] => Array
(
[person] => Array
(
[first_name] => ****
[last_name] => ***
)
)
)
02 Writing definition rules
02.01 Basic rule syntax
The default syntax when navigating the data to be anonymized is using dot notation. Every word separated by a dot
represents a level in the data structure. The following example shows how to write a rule for anonymizing the
first_name
and last_name
fields in the order.person
structure.
// examples/02_01_definitions_basic_rules.php $anonymizer->registerRuleSet( 'order', [ 'order.person.first_name', 'order.person.last_name', ], );
02.02 Array rule syntax
Additionally it is possible to make use of array notation to tell the anonymizer engine that there is a list of items at
a certain level. This can be realized by putting []
in front of a keyword.
// examples/02_02_definitions_array_rules.php $anonymizer->registerRuleSet( 'order', [ '[]orders.person.first_name', '[]orders.person.last_name', ], );
02.03 Property access syntax [complex rule parser only]
The previous examples all assumed that the data structure to be passed is an array. Apart from that this library also supports different ways of accessing object properties. This can be passed to any layer directly after the name of the property.
Note: The definition of the access method is optional. If omitted, the anonymizer will fall back to the configured default access method.
Example with direct property access on object. This methods requires the properties to be public and not readonly.
// examples/02_03_01_definitions_property_access_by_property.php $anonymizer->registerRuleSet( 'order', [ 'order.person.firstName[property]', 'order.person.lastName[property]', ], );
Example with property access via getter and setter method. This method requires the properties to have a matching getPropertyName and setPropertyName method.
// examples/02_03_02_definitions_property_access_by_setter.php $anonymizer->registerRuleSet( 'order', [ 'order.person.firstName[setter]', 'order.person.lastName[setter]', ], );
The safest way to access properties on objects, is to use reflection.
// examples/02_03_03_definitions_property_access_via_reflection.php $anonymizer->registerRuleSet( 'order', [ 'order.person.firstName[reflection]', 'order.person.lastName[reflection]', ], );
Of course it is also possible to mix these access methods in one rule set and make the array property access more verbose.
// 02_03_04_definitions_property_access_mixed.php $anonymizer->registerRuleSet( 'order', [ '[]orders[array].person[setter].firstName[property]', '[]orders[array].person[setter].lastName[property]', ], );
In case there are any more specific requirements for accessing object properties, it is possible to implement a custom
data accessor by implementing the PhpAnonymizer\Anonymizer\DataAccess\DataAccessorInterface
.
Supported access methods as of now are:
Access Method | Description |
---|---|
array |
Access array elements by key name |
property |
Access object properties by name |
reflection |
Access object properties via reflection classes |
setter |
Access object properties by getter and setter methods |
02.04 Fake data type annotation [complex rule parser only]
! Notice: This feature CANNOT be combined with the nested data type annotation.
Until now all that we have achieved, is to replace the data with starred out place holders that have the same length as the original data. As sometimes it is more desirable to replace the data with more real world-like fake data, it is possible to tell the Anonymizer which kind of data we want to set as a field's replacement.
Note: to get this feature working, the fakerphp/faker
library must be installed. See section
03 Using Faker as a data provider
for more information.
To introduce the use of fake data, you can add a type annotation to the property field with a precending #
symbol
within the square brackets, e.g. order.person.firstName[#firstName]
.
// examples/02_04_01_definitions_fake_data.php $anonymizer->registerRuleSet( 'order', [ 'order.person.firstName[#firstName]', 'order.person.lastName[#lastName]', ], );
Of course, also in this case it is possible to mix the fake data type annotations with the other access methods. In this
case, the fake data type must be preceded by the access method, e.g. order.person.firstName[property#firstName]
.
// examples/02_04_02_definitions_fake_data_with_property_access.php $anonymizer->registerRuleSet( 'order', [ 'order.person.firstName[property#firstName]', 'order.person.lastName[property#lastName]', ], );
02.05 Nested data type annotation [complex rule parser only]
! Notice: This feature CANNOT be combined with the fake data type annotation.
In some cases it can be necessary to define a way of anonymizing data that have been stored in a nested data type. This can happen when a string formatted field contains a complete json document for example. In this case we need two information to be defined: The type of the nested data (e.g. json) and the rule set that should handle the nested data (as this data is handled as a separate object).
The data type will be annotated after a leading ?
symbol, followed by a /
separated rule name, e.g. order.address[?json/address]
.
// examples/02_05_01_definitions_nested_data.php $anonymizer->registerRuleSet( 'order', [ 'order.address[?json/address]', ], );
And again, also in this case it is possible to mix the nested data type annotations with the other access methods. In this case, the fake data type must be preceded by the access method, e.g. order.address[property?json/address]
.
// examples/02_05_02_definitions_nested_data_with_property_access.php $anonymizer->registerRuleSet( 'order', [ 'order.address[property?json/address]', ], );
03 Using Faker as a data provider
Before using Faker as a data provider, you need to install the fakerphp/faker
library. This can be realized by issuing
a simple composer command.
composer require fakerphp/faker
Afterwards, you need to register an instance of Faker to the Anonymizer instance. There are different ways of achieving this.
03.01 Use default Faker instance of builder
By default it is possible to use the generic Faker instance that is shipped with the AnonymizerBuilder. This instance is
created with the default locale en_US
and all default providers.
// examples/03_01_default_faker_instance.php use PhpAnonymizer\Anonymizer\AnonymizerBuilder; use PhpAnonymizer\Anonymizer\Enum\NodeParser; $builder = (new AnonymizerBuilder()) ->withDefaults() ->withNodeParserType(NodeParser::COMPLEX->value) ->withFaker(true) ->build();
03.02 Use custom Faker instance
If you want to have more control over how the Faker instance is created, you can pass an instance of Faker\Generator
directly to the AnonymizerBuilder.
// examples/03_02_custom_faker_instance.php use PhpAnonymizer\Anonymizer\AnonymizerBuilder; use PhpAnonymizer\Anonymizer\Enum\NodeParser; use Faker\Factory; $faker = Factory::create('de_DE'); $builder = (new AnonymizerBuilder()) ->withDefaults() ->withNodeParserType(NodeParser::COMPLEX->value) ->withCustomFaker($faker) ->build();
03.03 Set seed for Faker instance
It is also possible to set a seed for the Faker instance. This can be useful if you want to have reproducible results. In our case we use a string as a keyword that will be hashed to an integer value (md5).
// examples/03_03_seeded_faker_instance.php use PhpAnonymizer\Anonymizer\AnonymizerBuilder; use PhpAnonymizer\Anonymizer\Enum\NodeParser; $builder = (new AnonymizerBuilder()) ->withDefaults() ->withNodeParserType(NodeParser::COMPLEX->value) ->withFaker(true) ->withFakerSeed('my_seed') ->build();
04 Data Encoding
The Analyzer
main service class supports the use of a data encoding class. The sole purpose of this data encoding
class is to make the input data accessible for transformation:
Encoder | Description | Input Encode | Output Encode | Input Decode | Output Decode |
---|---|---|---|---|---|
NoOpEncoder | does not change the input data | mixed |
mixed |
mixed |
mixed |
CloneEncoder | clones objects on decode | mixed |
mixed |
mixed |
mixed |
JsonEncoder | encodes data as JSON | array |
string |
string |
array |
YamlEncoder | encodes data as YAML | array |
string |
string |
array |
SymfonyEncoder | transforms objects to array | object |
array |
array |
object |
SymfonyToJsonEncoder | transforms object to json | object |
array |
array |
string |
SymfonyToArrayEncoder | transforms objects to array | object |
array |
array |
array |
ArrayToJsonEncoder | transforms array to json | array |
array |
array |
string |
04.01 NoOpEncoder
The NoOpEncoder
simply does nothing. It takes an argument and passes the same object back to the consumer on both
methods (encode
and decode
).
Notice: It will pass arguments by value, not by reference, so if you pass a non-object value, it will NOT update the input variable as long as you don't override it manually.
// examples/04_01_01_noop_encoder_array.php $data = [ 'order' => [ 'person' => [ 'first_name' => 'John', 'last_name' => 'Doe', ], ], ]; $anonymizedData = $anonymizer->run('order', $data, 'noop');
Example output for the noop encoder on an array.
Check $data == $anonymizedData
bool(false)
Check $data === $anonymizedData
bool(false)
// examples/04_01_02_noop_encoder_object.php $person = new Person( 'John', 'Doe' ); $data = [ 'order' => [ 'person' => $person, ], ]; $anonymizedData = $anonymizer->run('order', $data, 'noop');
Example output for the noop encoder on an object.
Check $data == $anonymizedData
bool(true)
Check $data === $anonymizedData
bool(true)
04.02 CloneEncoder
When calling the decode
method of the CloneEncoder
, this encoder creates a COPY of the input values.
By this mean, the Anonymizer will update the objects within the data to be anonymized on a new copy of the object, rather than manipulating the initial object.
As the clone
keyword only creates a shallow copy of the top level object, we use the very popular library
myclabs/deep-copy
to achieve a recursive cloning of all nested objects.
// examples/04_02_01_clone_encoder_change.php $anonymizer->registerRuleSet( 'order', [ 'person.firstName', 'person.lastName', ], DataAccess::AUTODETECT->value, ); $anonymizedData = $anonymizer->run('order', $data, 'clone');
Example output for the clone encoder after data has been changed.
Check $data == $anonymizedData
bool(false)
Check $data === $anonymizedData
bool(false)
In the next step, we change our rule set to not modify any data within the cloned object.
// examples/04_02_02_clone_encoder_no_change.php $anonymizer->registerRuleSet( 'order', [], DataAccess::AUTODETECT->value, ); $anonymizedData = $anonymizer->run('order', $data, 'clone');
Example output for the clone encoder after data hasn't been changed.
Check $data == $anonymizedData
bool(true)
Check $data === $anonymizedData
bool(false)
04.03 JsonEncoder
The JsonEncoder
is an encoder that can help you handle json data. With this encoder it is possible to modify sensitive
data directly within the json document's string representation.
For the encoder to work, the json
php extension is required (which is part of the php core since 8.0 anyway).
The decode
method will transform a json string
into an array
, the encode
method will transform an array
back
into json string
notation (single line without PRETTY_PRINT).
04.04 YamlEncoder
The YamlEncoder
is an encoder that can help you handle yaml data. With this encoder it is possible to modify sensitive
data directly within the yaml document's string representation.
For the encoder to work, the yaml
php extension is required. This can be installed via pecl, for instance. Some Linux
distributions also offer pre-compiled packages as an alternative to manual building already.
The decode
method will transform a yaml string
into an array
, the encode
method will transform an array
back
into yaml string
notation.
04.05 SymfonyEncoder
The last encoder is the SymfonyEncoder
. This encoder is a bit more complex than the others, as it is able to transform
objects into arrays and vice versa.
For this encoder to work, you will need to have the symfony/serializer
package installed and have to setup a
Normalizer and a Denormalizer that follow Symfony's NormalizerInterface and DenormalizerInterface (e.g.
ObjectNormalizer).
To make the SymfonyEncoder
work, it is essential that your object can be normalized and denormalized properly by using
these Normalizer and Denormalizer objects.
You can install the symfony/serializer
package via composer:
composer require symfony/serializer
06 Extended Information
06.01 Manual setup of Anonymizer
06.01.01 RuleSet parser
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\Parser\Node\ComplexRegexParser; use PhpAnonymizer\Anonymizer\Parser\RuleSet\DefaultRuleSetParser; // 06.01.01 RuleSet parser /** * RuleSetParser: * - must implement PhpAnonymizer\Anonymizer\Parser\RuleSet\RuleSetParserInterface * * DefaultRuleSetParser: * - takes optional argument $nodeParser: * - must implement PhpAnonymizer\Anonymizer\Parser\Node\NodeParserInterface * - defaults to PhpAnonymizer\Anonymizer\Parser\Node\SimpleRegexParser */ $ruleSetParser = new DefaultRuleSetParser( nodeParser: new ComplexRegexParser(), );
06.01.02 DependencyChecker
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\Dependency\DefaultDependencyChecker; // 06.01.02 DependencyChecker /** * DependencyChecker: * - must implement PhpAnonymizer\Anonymizer\Dependency\DependencyCheckerInterface * * DefaultDependencyChecker: * - takes no arguments * - needs composer to work */ $dependencyChecker = new DefaultDependencyChecker();
06.01.03 DataAccessProvider
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\DataAccess\ArrayDataAccess; use PhpAnonymizer\Anonymizer\DataAccess\Provider\DefaultDataAccessProvider; use PhpAnonymizer\Anonymizer\DataAccess\Provider\Factory\DefaultDataAccessProviderFactory; // 06.01.03 DataAccessProvider /** * DataAccessProvider: * - must implement PhpAnonymizer\Anonymizer\DataAccess\Provider\DataAccessProviderInterface * - can be created via factory * * DefaultDataAccessProvider: * - takes no arguments * - can have custom data access implementations registered via registerCustomDataAccess method */ $dataAccessProvider = new DefaultDataAccessProvider(); $dataAccessProvider->registerCustomDataAccess('custom', new ArrayDataAccess()); /** * DataAccessProviderFactory: * - must implement PhpAnonymizer\Anonymizer\DataAccess\Provider\Factory\DataAccessProviderFactoryInterface * * DefaultDataAccessProviderFactory: * - takes no arguments * - can have custom data access provider implementations registered via registerCustomDataAccessProvider method */ $provider = new DefaultDataAccessProviderFactory(); $provider->registerCustomDataAccessProvider('custom', $dataAccessProvider); $dataAccessProvider = $provider->getDataAccessProvider('custom');
06.01.04 DataGenerationProvider
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\DataEncoding\Provider\DefaultDataEncodingProvider; use PhpAnonymizer\Anonymizer\DataGeneration\FakerAwareStringGenerator; use PhpAnonymizer\Anonymizer\DataGeneration\Provider\DefaultDataGeneratorProvider; use PhpAnonymizer\Anonymizer\DataGeneration\Provider\Factory\DefaultDataGenerationProviderFactory; // 06.01.04 DataGenerationProvider /** * DataGenerationProvider: * - must implement PhpAnonymizer\Anonymizer\DataGeneration\Provider\DataGenerationProviderInterface * - can be created via factory * * DefaultDataGeneratorProvider: * - takes required parameter $generators: * - must be an array of PhpAnonymizer\Anonymizer\DataGeneration\DataGeneratorInterface * - takes optional argument $dependencyChecker: * - must implement PhpAnonymizer\Anonymizer\Dependency\DependencyCheckerInterface * - defaults to PhpAnonymizer\Anonymizer\Dependency\DefaultDependencyChecker * - can have custom data generator implementations registered via registerCustomDataGenerator method */ $dataGenerationProvider = new DefaultDataGeneratorProvider( generators: [ new FakerAwareStringGenerator(new StarMaskedStringGenerator()), ], dependencyChecker: $dependencyChecker, ); $dataGenerationProvider->registerCustomDataGenerator(new StarMaskedStringGenerator()); /** * DataGenerationProviderFactory: * - must implement PhpAnonymizer\Anonymizer\DataGeneration\Provider\Factory\DataGenerationProviderFactoryInterface * * DefaultDataGenerationProviderFactory: * - takes no arguments * - can have custom data generation provider implementations registered via registerCustomDataGenerationProvider method */ $dataGenerationProviderFactory = new DefaultDataGenerationProviderFactory(); $dataGenerationProviderFactory->registerCustomDataGenerationProvider('custom', $dataGenerationProvider); $dataGenerationProvider = $dataGenerationProviderFactory->getDataGenerationProvider('custom');
06.01.05 DataEncodingProvider
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\DataEncoding\JsonEncoder; use PhpAnonymizer\Anonymizer\DataEncoding\Provider\DefaultDataEncodingProvider; // 06.01.05 DataEncodingProvider /** * DataEncodingProvider: * - must implement PhpAnonymizer\Anonymizer\DataEncoding\Provider\DataEncodingProviderInterface * * DefaultDataEncodingProvider: * - takes optional argument $normalizer: * - must implement Symfony\Component\Serializer\Normalizer\NormalizerInterface * - defaults to null * - takes optional argument $denormalizer: * - must implement Symfony\Component\Serializer\Normalizer\DenormalizerInterface * - defaults to null * - takes optional argument $dependencyChecker: * - must implement PhpAnonymizer\Anonymizer\Dependency\DependencyCheckerInterface * - defaults to PhpAnonymizer\Anonymizer\Dependency\DefaultDependencyChecker * - can have custom data encoder implementations registered via registerCustomDataEncoder method */ $dataEncodingProvider = new DefaultDataEncodingProvider(); $dataEncodingProvider->registerCustomDataEncoder('custom', new JsonEncoder());
06.01.06 DataProcessor
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\Processor\DefaultDataProcessor; use PhpAnonymizer\Anonymizer\Processor\Factory\DefaultDataProcessorFactory; // 06.01.06 DataProcessor /** * DataProcessor: * - must implement PhpAnonymizer\Anonymizer\DataProcessing\DataProcessorInterface * - can be created via factory * * DefaultDataProcessor: * - takes required parameter $dataAccessProvider: * - must implement PhpAnonymizer\Anonymizer\DataAccess\Provider\DataAccessProviderInterface * - takes required parameter $dataGenerationProvider: * - must implement PhpAnonymizer\Anonymizer\DataGeneration\Provider\DataGenerationProviderInterface * - takes required parameter $dataEncodingProvider: * - must implement PhpAnonymizer\Anonymizer\DataEncoding\Provider\DataEncodingProviderInterface */ $dataProcessor = new DefaultDataProcessor( dataAccessProvider: $dataAccessProvider, dataGenerationProvider: $dataGenerationProvider, dataEncodingProvider: $dataEncodingProvider, ); /** * DataProcessorFactory: * - must implement PhpAnonymizer\Anonymizer\Processor\Factory\DataProcessorFactoryInterface * * DefaultDataProcessorFactory: * - takes no arguments * - can have custom data processor implementations registered via registerCustomDataProcessor method */ $processorFactory = new DefaultDataProcessorFactory(); $processorFactory->registerCustomDataProcessor('custom', $dataProcessor); $dataProcessor = $processorFactory->getDataProcessor('custom');
06.01.07 Anonymizer
// examples/06_01_manual_setup.php use PhpAnonymizer\Anonymizer\Anonymizer; // 06.01.07 Anonymizer /** * Anonymizer: * - must implement PhpAnonymizer\Anonymizer\AnonymizerInterface * * Anonymizer: * - takes required parameter $ruleSetParser: * - must implement PhpAnonymizer\Anonymizer\Parser\RuleSet\RuleSetParserInterface * - takes required parameter $dataProcessor: * - must implement PhpAnonymizer\Anonymizer\DataProcessing\DataProcessorInterface */ $anonymizer = new Anonymizer( ruleSetParser: $ruleSetParser, dataProcessor: $dataProcessor, );