ivuorinen/monolog-gdpr-filter

Monolog processor for GDPR masking with regex and dot-notation paths

Installs: 26

Dependents: 0

Suggesters: 0

Security: 0

Stars: 1

Watchers: 0

Forks: 0

Open Issues: 2

pkg:composer/ivuorinen/monolog-gdpr-filter

1.0.0 2025-07-28 12:14 UTC

README

A PHP library providing a Monolog processor for GDPR compliance. Mask, remove, or replace sensitive data in logs using regex patterns, field-level configuration, custom callbacks, and advanced features like streaming, rate limiting, and k-anonymity.

Features

Core Masking

  • Regex-based masking for patterns like SSNs, credit cards, emails, IPs, and more
  • Field-level masking using dot-notation paths with flexible configuration
  • Custom callbacks for advanced per-field masking logic
  • Data type masking to mask values based on their PHP type
  • Serialized data support for JSON, print_r, var_export, and serialize formats

Enterprise Features

  • Fluent builder API for readable processor configuration
  • Streaming processor for memory-efficient large file processing
  • Rate-limited audit logging to prevent log flooding
  • Plugin system for extensible pre/post-processing hooks
  • K-anonymity support for statistical privacy guarantees
  • Retry and recovery with configurable failure modes
  • Conditional masking based on log level, channel, or context

Framework Integration

  • Monolog 3.x compatible with ProcessorInterface implementation
  • Laravel integration with service provider, middleware, and console commands
  • Audit logging for compliance tracking and debugging

Requirements

  • PHP 8.2 or higher
  • Monolog 3.x

Installation

composer require ivuorinen/monolog-gdpr-filter

Quick Start

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Level;
use Ivuorinen\MonologGdprFilter\GdprProcessor;
use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

// Create processor with default GDPR patterns
$processor = new GdprProcessor(
    patterns: GdprProcessor::getDefaultPatterns(),
    fieldPaths: [
        'user.email' => FieldMaskConfig::remove(),
        'user.ssn' => FieldMaskConfig::replace('[REDACTED]'),
    ]
);

// Integrate with Monolog
$logger = new Logger('app');
$logger->pushHandler(new StreamHandler('app.log', Level::Warning));
$logger->pushProcessor($processor);

// Sensitive data is automatically masked
$logger->warning('User login', [
    'user' => [
        'email' => 'john@example.com',  // Will be removed
        'ssn' => '123-45-6789',         // Will be replaced with [REDACTED]
    ]
]);

Core Concepts

Regex Patterns

Define regex patterns to mask sensitive data in log messages and context values:

$patterns = [
    '/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/' => '***EMAIL***',
    '/\b\d{3}-\d{2}-\d{4}\b/' => '***SSN***',
    '/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/' => '***CARD***',
];

$processor = new GdprProcessor(patterns: $patterns);

Use GdprProcessor::getDefaultPatterns() for a comprehensive set of pre-configured patterns covering SSNs, credit cards, emails, phone numbers, IBANs, IP addresses, and more.

Field Path Masking (FieldMaskConfig)

Configure masking for specific fields using dot-notation paths:

use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

$fieldPaths = [
    // Remove field entirely from logs
    'user.password' => FieldMaskConfig::remove(),

    // Replace with static value
    'payment.card_number' => FieldMaskConfig::replace('[CARD]'),

    // Apply processor's regex patterns to this field
    'user.bio' => FieldMaskConfig::useProcessorPatterns(),

    // Apply custom regex pattern
    'user.phone' => FieldMaskConfig::regexMask('/\d{3}-\d{4}/', '***-****'),
];

Custom Callbacks

Provide custom masking functions for complex scenarios:

$customCallbacks = [
    'user.name' => fn($value) => strtoupper(substr($value, 0, 1)) . '***',
    'user.id' => fn($value) => hash('sha256', (string) $value),
];

$processor = new GdprProcessor(
    patterns: [],
    fieldPaths: [],
    customCallbacks: $customCallbacks
);

Basic Usage

Direct GdprProcessor Usage

use Ivuorinen\MonologGdprFilter\GdprProcessor;
use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

$processor = new GdprProcessor(
    patterns: GdprProcessor::getDefaultPatterns(),
    fieldPaths: [
        'user.ssn' => FieldMaskConfig::remove(),
        'payment.card' => FieldMaskConfig::replace('[REDACTED]'),
        'contact.email' => FieldMaskConfig::useProcessorPatterns(),
    ],
    customCallbacks: [
        'user.name' => fn($v) => strtoupper($v),
    ],
    auditLogger: function($path, $original, $masked) {
        // Log masking operations for compliance
        error_log("Masked: $path");
    },
    maxDepth: 100,
);

Using GdprProcessorBuilder (Recommended)

The builder provides a fluent, readable API:

use Ivuorinen\MonologGdprFilter\Builder\GdprProcessorBuilder;
use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

$processor = GdprProcessorBuilder::create()
    ->withDefaultPatterns()
    ->addPattern('/custom-secret-\w+/', '[SECRET]')
    ->addFieldPath('user.email', FieldMaskConfig::remove())
    ->addFieldPath('user.ssn', FieldMaskConfig::replace('[SSN]'))
    ->addCallback('user.id', fn($v) => hash('sha256', (string) $v))
    ->withMaxDepth(50)
    ->withAuditLogger(function($path, $original, $masked) {
        // Audit logging
    })
    ->build();

Advanced Features

Conditional Masking

Apply masking only when specific conditions are met:

use Ivuorinen\MonologGdprFilter\ConditionalRuleFactory;
use Monolog\Level;

$processor = new GdprProcessor(
    patterns: GdprProcessor::getDefaultPatterns(),
    conditionalRules: [
        // Only mask error-level logs
        'error_only' => ConditionalRuleFactory::createLevelBasedRule([Level::Error]),

        // Only mask specific channels
        'app_channel' => ConditionalRuleFactory::createChannelBasedRule(['app', 'security']),

        // Custom condition
        'has_user' => fn($record) => isset($record->context['user']),
    ]
);

Data Type Masking

Mask values based on their PHP type:

use Ivuorinen\MonologGdprFilter\MaskConstants;

$processor = new GdprProcessor(
    patterns: [],
    dataTypeMasks: [
        'integer' => MaskConstants::MASK_INT,
        'double' => MaskConstants::MASK_FLOAT,
        'boolean' => MaskConstants::MASK_BOOL,
    ]
);

Rate-Limited Audit Logging

Prevent audit log flooding in high-volume applications:

use Ivuorinen\MonologGdprFilter\RateLimitedAuditLogger;

$baseLogger = function($path, $original, $masked) {
    // Your audit logging logic
};

// Create rate-limited wrapper (100 logs per minute)
$rateLimitedLogger = new RateLimitedAuditLogger($baseLogger, 100, 60);

$processor = new GdprProcessor(
    patterns: GdprProcessor::getDefaultPatterns(),
    auditLogger: $rateLimitedLogger
);

// Available rate limit profiles via factory
$strictLogger = RateLimitedAuditLogger::create($baseLogger, 'strict');   // 50/min
$defaultLogger = RateLimitedAuditLogger::create($baseLogger, 'default'); // 100/min
$relaxedLogger = RateLimitedAuditLogger::create($baseLogger, 'relaxed'); // 200/min

Streaming Large Files

Process large log files with memory-efficient streaming:

use Ivuorinen\MonologGdprFilter\Streaming\StreamingProcessor;
use Ivuorinen\MonologGdprFilter\MaskingOrchestrator;

$orchestrator = new MaskingOrchestrator(GdprProcessor::getDefaultPatterns());
$streaming = new StreamingProcessor($orchestrator, chunkSize: 1000);

// Process file line by line
$lineParser = fn(string $line) => ['message' => $line, 'context' => []];

foreach ($streaming->processFile('large-app.log', $lineParser) as $maskedRecord) {
    // Write to output file or process further
    fwrite($output, $maskedRecord['message'] . "\n");
}

// Or process to file directly
$formatter = fn(array $record) => json_encode($record);
$count = $streaming->processToFile($records, 'masked-output.log', $formatter);

Laravel Integration

Service Provider

// app/Providers/AppServiceProvider.php
namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use Ivuorinen\MonologGdprFilter\GdprProcessor;
use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        $processor = new GdprProcessor(
            patterns: GdprProcessor::getDefaultPatterns(),
            fieldPaths: [
                'user.email' => FieldMaskConfig::remove(),
                'user.password' => FieldMaskConfig::remove(),
            ]
        );

        $this->app['log']->getLogger()->pushProcessor($processor);
    }
}

Tap Class

// app/Logging/GdprTap.php
namespace App\Logging;

use Monolog\Logger;
use Ivuorinen\MonologGdprFilter\GdprProcessor;
use Ivuorinen\MonologGdprFilter\FieldMaskConfig;

class GdprTap
{
    public function __invoke(Logger $logger): void
    {
        $processor = new GdprProcessor(
            patterns: GdprProcessor::getDefaultPatterns(),
            fieldPaths: [
                'user.email' => FieldMaskConfig::remove(),
                'payment.card' => FieldMaskConfig::replace('[CARD]'),
            ]
        );

        $logger->pushProcessor($processor);
    }
}

Reference in config/logging.php:

'channels' => [
    'single' => [
        'driver' => 'single',
        'path' => storage_path('logs/laravel.log'),
        'level' => 'debug',
        'tap' => [App\Logging\GdprTap::class],
    ],
],

Console Commands

The library provides Artisan commands for testing and debugging:

# Test a pattern against sample data
php artisan gdpr:test-pattern '/\b\d{3}-\d{2}-\d{4}\b/' 'SSN: 123-45-6789'

# Debug current GDPR configuration
php artisan gdpr:debug

Plugin System

Extend the processor with custom pre/post-processing hooks:

use Ivuorinen\MonologGdprFilter\Contracts\MaskingPluginInterface;
use Ivuorinen\MonologGdprFilter\Builder\GdprProcessorBuilder;

class CustomPlugin implements MaskingPluginInterface
{
    public function getName(): string
    {
        return 'custom-plugin';
    }

    public function getPriority(): int
    {
        return 10; // Lower = earlier execution
    }

    public function preProcessMessage(string $message): string
    {
        // Modify message before masking
        return $message;
    }

    public function postProcessMessage(string $message): string
    {
        // Modify message after masking
        return $message;
    }

    public function preProcessContext(array $context): array
    {
        return $context;
    }

    public function postProcessContext(array $context): array
    {
        return $context;
    }
}

$processor = GdprProcessorBuilder::create()
    ->withDefaultPatterns()
    ->addPlugin(new CustomPlugin())
    ->buildWithPlugins();

Default Patterns Reference

GdprProcessor::getDefaultPatterns() includes patterns for:

Category Data Types
Personal IDs Finnish SSN (HETU), US SSN, Passport numbers, National IDs
Financial Credit cards, IBAN, Bank account numbers
Contact Email addresses, Phone numbers (E.164)
Technical IPv4/IPv6 addresses, MAC addresses, API keys, Bearer tokens
Health Medicare numbers, European Health Insurance Card (EHIC)
Dates Birth dates in multiple formats

Performance Considerations

Pattern Optimization

Order patterns from most specific to most general:

// Recommended: specific patterns first
$patterns = [
    '/\b\d{3}-\d{2}-\d{4}\b/' => '***SSN***',     // Specific format
    '/\b\d+\b/' => '***NUMBER***',                // Generic fallback
];

Memory-Efficient Processing

For large datasets:

  • Use StreamingProcessor for file-based processing
  • Configure appropriate maxDepth to limit recursion
  • Use rate-limited audit logging to prevent memory growth

Pattern Caching

Patterns are validated and cached internally. For high-throughput applications, the library automatically caches compiled patterns.

Troubleshooting

Pattern Not Matching

// Test pattern in isolation
$pattern = '/your-pattern/';
if (preg_match($pattern, $testString)) {
    echo 'Pattern matches';
}

// Validate pattern safety
try {
    GdprProcessor::validatePatternsArray([
        '/your-pattern/' => '***MASKED***'
    ]);
} catch (PatternValidationException $e) {
    echo 'Invalid pattern: ' . $e->getMessage();
}

Performance Issues

  • Reduce pattern count to essential patterns only
  • Use field-specific masking instead of broad regex patterns
  • Profile with audit logging to identify slow operations

Audit Logger Issues

// Safe audit logging (never log original sensitive data)
$auditLogger = function($path, $original, $masked) {
    error_log(sprintf(
        'GDPR Audit: %s - type=%s, masked=%s',
        $path,
        gettype($original),
        $original !== $masked ? 'yes' : 'no'
    ));
};

Testing and Quality

# Run tests
composer test

# Run tests with coverage report
composer test:coverage

# Run all linters
composer lint

# Auto-fix code style issues
composer lint:fix

Security

  • All patterns are validated for safety before use to prevent regex injection attacks
  • The library includes ReDoS (Regular Expression Denial of Service) protection
  • Dangerous patterns with recursive structures or excessive backtracking are rejected

For security vulnerabilities, please see SECURITY.md for responsible disclosure guidelines.

Legal Disclaimer

This library helps mask and filter sensitive data for GDPR compliance, but it is your responsibility to ensure your application fully complies with all applicable legal requirements. This tool is provided as-is without warranty. Review your logging and data handling policies regularly with legal counsel.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for development setup and guidelines.

License

This project is licensed under the MIT License. See the LICENSE file for details.