hschletz / formotron
Form processing and validation
Requires
- php: ^8.1 || ^8.2 || ^8.3
- psr/container: ^1.0 || ^2.0
Requires (Dev)
- bamarni/composer-bin-plugin: ^1.8
- phpunit/phpunit: ^10
This package is auto-updated.
Last update: 2024-12-29 11:17:05 UTC
README
Formotron processes an array of input data into an application-defined data object. Processing rules are defined by the data object itself, via property types and custom attributes.
This documentation focuses on form data, but input data can be any associative array, including but not limited to:
- submitted form data ($_POST)
- URI query parameters ($_GET)
- database records
- decoded JSON data
Key features
- Easy to use, concise, reusable and self-documenting: Processing rules are defined by the data object itself, having all logic gathered in a single place.
- Always valid: The data object is instantiated, populated and validated in a single step. Invalid data will cause an exception. Calling code never sees an object in an invalid state.
- Extensible: Define your own processing rules and attach them to the data object via attributes.
- Integrates seamlessly: Write your own validators or use external validation libraries. Message formatting is left to the application, allowing usage of the application's established i18n framework.
- DI friendly: Data processors and validators are instantiated from a container, allowing them to have arbitrary dependencies injected.
- Robust: Formotron leverages PHP features like property types, attributes and enums to have validation rules enforced at language level where possible. Formotron does not rely on docblock comments.
Installation
Add Formotron to your project via composer:
composer require hschletz/formotron
Formotron requires an adequately configured PSR-11 compatible container to
instantiate transformers, validators and preprocessors. The concrete
implementation does not matter as long as it implements
Psr\Container\ContainerInterface. If your application already uses a
container, you can use just that.
Usage
The main interface is the Formotron\DataProcessor class. First, set up a
PSR-11 container and pass it to the constructor:
$dataProcessor = new \Formotron\DataProcessor($container);
Many containers are configured to resolve Psr\Container\ContainerInterface
to
itself. With proper configuration (autowiring or a factory), the form processor
can be instantiated from the container:
$dataProcessor = $container->get(\Formotron\DataProcessor::class);
In practice, the data processor will likely not be instantiated directly, but injected into a class by the container:

    // MyClass will be instantiated by the container
    class MyClass
    {
        public function __construct(private \Formotron\DataProcessor $dataProcessor)
        {
        }
    }
The DataProcessor class has a single public method process() which receives
the input data array and the name of a data object class. It returns a fully
populated instance of that class or throws a
Formotron\AssertionFailedException if input data is invalid.
$data = $dataProcessor->process($_POST, FormData::class);
Defining a data object
A data object is an object whose properties are populated from a corresponding input field. By default, for every property defined in the class, a key with the same name must be present in the input array. Its value will be assigned to the property. Example:
    class DataObject
    {
        public $foo;
        public $bar;
    }

    $input = [
        'foo' => 'value1',
        'bar' => 'value2',
    ];
    $dataObject = $dataProcessor->process($input, DataObject::class);
The data object is constructed via reflection. Properties need not be declared public. Protected and private properties can be populated too. A constructor is bypassed.
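The general technique can be illustrated with PHP's Reflection API. This is a minimal sketch of the mechanism described above, not Formotron's actual implementation; the `Account` class and the `populate()` helper are hypothetical names chosen for illustration.

```php
<?php
declare(strict_types=1);

final class Account
{
    private string $name;

    public function __construct()
    {
        // Never runs when the object is created without its constructor.
        throw new LogicException('constructor called');
    }

    public function getName(): string
    {
        return $this->name;
    }
}

/**
 * Hypothetical helper: creates an instance without calling the constructor
 * and assigns values to properties regardless of their visibility.
 */
function populate(string $className, array $input): object
{
    $class = new ReflectionClass($className);
    $object = $class->newInstanceWithoutConstructor();
    foreach ($input as $name => $value) {
        // Since PHP 8.1, private properties are writable via reflection
        // without calling setAccessible().
        $class->getProperty($name)->setValue($object, $value);
    }
    return $object;
}

$account = populate(Account::class, ['name' => 'alice']);
echo $account->getName(), "\n";
```

Because the constructor is skipped, any initialization a data object needs should be expressed through property defaults rather than constructor code.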
It's strongly recommended to declare a data type for properties to enable
additional functionality and to avoid unexpected behavior. If a distinct type is
not suitable, a property can be explicitly declared as mixed. This has
advantages over untyped properties with regard to handling default values (see
below).
Data type handling
string
String properties accept any input value that can be safely converted:
- string
- int, float (default PHP conversion, not locale-specific)
- objects with a
__toString()
method
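The list above can be approximated in plain PHP as follows. This is only a sketch of the stated rules: the function name `toStringStrict()` is hypothetical, and rejecting every other type (booleans, NULL, arrays) is an assumption about what "safely converted" excludes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper approximating the conversion rules listed above.
 * Returns the converted string or throws if the value is not convertible.
 */
function toStringStrict(mixed $value): string
{
    if (is_string($value)) {
        return $value;
    }
    // Stringable is implemented automatically by every class that
    // defines a __toString() method.
    if (is_int($value) || is_float($value) || $value instanceof Stringable) {
        return (string) $value; // default PHP conversion, not locale-specific
    }
    throw new InvalidArgumentException('not convertible: ' . get_debug_type($value));
}
```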
array
Array properties accept only array input values. Types of array values are not checked.
enum
Enum properties accept strings that match a defined enum value. This is very useful for constraining values to a limited set (for example, radio buttons and dropdowns with a static set of values). Valid input values are defined by the type of enum:
- Basic enums accept the symbolic name (case sensitive).
- Backed enums accept the backing value only, but not the name.
    enum Basic
    {
        case Foo;
        case Bar;
    }

    enum Backed: string
    {
        case Foo = 'foo';
        case Bar = 'bar';
    }

    class DataObject
    {
        public Basic $basic; // valid values: 'Foo', 'Bar'
        public Backed $backed; // valid values: 'foo', 'bar'
    }
class
Properties of a class type accept only instances of that class or a subclass.
mixed
Properties declared as mixed
or without a type accept any input value.
Other types
Other types are not supported yet. They will be implemented in the future. Until
then, use mixed
for any property with an unsupported type.
Missing input fields and default values
The input array must contain keys for all properties defined in the data object. Arrays with missing keys are invalid. This can be changed by setting a default value:
    class DataObject
    {
        public string $foo = 'bar';
    }

    $dataObject = $dataProcessor->process([], DataObject::class); // valid, $foo will be 'bar'.
If the key is present, its value will be processed as usual.
For untyped properties, a limitation in the Reflection API prevents distinguishing between an explicit and an implicit default of NULL. To avoid bugs resulting from missing keys, Formotron does not support a default of NULL for untyped properties, even where PHP would:
    class DataObject
    {
        public $foo; // implicit default NULL
        public $bar = null;
    }

    $dataObject = $dataProcessor->process([], DataObject::class); // Both keys would be treated as missing.
If you need a default of NULL and cannot declare a specific type, declare the
property as mixed
which does not suffer from this limitation.
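The difference can be observed directly with `ReflectionProperty::hasDefaultValue()`. Whether Formotron uses this exact method internally is an assumption; the sketch only demonstrates the PHP behavior behind the limitation. The class names and the `describeDefault()` helper are hypothetical.

```php
<?php
declare(strict_types=1);

class Untyped
{
    public $implicit;        // implicit default NULL
    public $explicit = null; // explicit default NULL
}

class Typed
{
    public mixed $noDefault;       // uninitialized, no default at all
    public mixed $explicit = null; // explicit default NULL
}

function describeDefault(string $class, string $property): string
{
    $reflection = new ReflectionProperty($class, $property);
    return $reflection->hasDefaultValue() ? 'default' : 'no default';
}

// Untyped properties look identical to reflection:
echo describeDefault(Untyped::class, 'implicit'), "\n";
echo describeDefault(Untyped::class, 'explicit'), "\n";
// Properties typed as mixed can be told apart:
echo describeDefault(Typed::class, 'noDefault'), "\n";
echo describeDefault(Typed::class, 'explicit'), "\n";
```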
Unmappable input fields
Input arrays with fields that cannot be mapped to a property in the data object are invalid. This helps prevent bugs where input data processing may be incomplete. If you need to ignore unmappable keys, remove them from the input array first.
Using different names for keys and properties
Sometimes, the direct mapping between property names and input array keys may
not be appropriate. A different name may be more meaningful in a different
context. The mapping can be changed by setting the Key
attribute on a
property.
    use Formotron\Attribute\Key;

    class DataObject
    {
        #[Key('bar')]
        public string $foo;
    }

    $dataObject = $dataProcessor->process(['bar' => 'baz'], DataObject::class);
In this example, $foo will receive its value from the bar field. foo is
now an invalid key, unless mapped to a different property. This may become
confusing though, and is not recommended.
Transforming values
Values from the input array may not be suitable for in-application representation. Further processing is often necessary, like
- trimming surrounding whitespace from a string
- converting empty strings to NULL
- converting a timestamp string to a DateTime or DateTimeImmutable object
- fetching an object from a database using the input value as key
This is supported by attaching a transformer to a property. A transformer is a
class implementing the Formotron\Transformer
interface:
interface Transformer { public function transform(mixed $value): mixed; }
Implementations can do anything with the input value: leave it as is, modify it, throw an exception, ... The returned value is still subject to the property's validation rules.
Transformers are attached to a property via the Transform
attribute. It takes
the name of a service – typically a class name – which will be pulled from the
container supplied to the DataProcessor's constructor. The container must
resolve the name to an object implementing the Transformer
interface.
    use Formotron\Attribute\Transform;

    // The container must resolve Trim::class to an instance of this class.
    class Trim implements Formotron\Transformer
    {
        public function transform(mixed $value): mixed
        {
            if (!is_string($value)) {
                throw new Formotron\AssertionFailedException('not a string');
            }
            return trim($value);
        }
    }

    class DataObject
    {
        #[Transform(Trim::class)]
        public string $foo;
    }
Only one transformer can be attached to a single property. Otherwise the order of execution could not be relied upon, and the outcome would become hard to predict. If you need multiple transformations (for example, trim whitespace first, only then convert an empty string to NULL), write a transformer that does all transformations in a single step.
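A combined transformer for the trim-then-NULL case might look like the sketch below. The class name `TrimToNull` is hypothetical. To keep the sketch runnable without the library, it omits `implements Formotron\Transformer` and throws `InvalidArgumentException` as a stand-in; in a real project the interface would be declared and `Formotron\AssertionFailedException` could be thrown instead.

```php
<?php
declare(strict_types=1);

// Hypothetical combined transformer. In a real project this class would
// declare `implements Formotron\Transformer`.
final class TrimToNull
{
    public function transform(mixed $value): mixed
    {
        if (!is_string($value)) {
            // Stand-in for Formotron\AssertionFailedException.
            throw new InvalidArgumentException('not a string');
        }
        // Step 1: trim surrounding whitespace.
        $trimmed = trim($value);
        // Step 2: only then convert an empty result to NULL.
        return $trimmed === '' ? null : $trimmed;
    }
}

$transformer = new TrimToNull();
var_dump($transformer->transform('  hello '));
var_dump($transformer->transform('   '));
```

A property using such a transformer has to be nullable (for example `?string`), because the transformed value is still checked against the property type.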
Transformers are run before validation. Only the output of the transformer is subject to validation rules, not the raw input value.
    use Formotron\Attribute\Transform;

    class UserMapper implements Formotron\Transformer
    {
        public function transform(mixed $value): mixed
        {
            // fetch_user_from_database() accepts a string/int key and returns
            // a corresponding User object, or throws an exception if the key is
            // invalid.
            return fetch_user_from_database($value);
        }
    }

    class DataObject
    {
        #[Transform(UserMapper::class)]
        public User $user;
    }
In this example, the user key must hold a string/int value (whatever the
database uses as key), and the data object will receive the resulting object,
which must be of the User class or a subclass of User.
Because validation occurs only after transformation, transformer implementations must account for potentially invalid data.
Validating values
Input data often comes from an untrusted source and must be validated before being consumed by the application. Even with trusted sources, sanity checks improve application robustness.
Basic validation is already achieved by declaring properties with a datatype. Formotron (and, as a last resort, PHP itself) will guarantee that the property is populated with a value that fits its datatype, or can be safely converted.
Validation often goes beyond simple type checks. An input value for a string property may be a string, but not satisfy a length constraint. A value may be constrained to a list of valid values that, unlike enums, is determined dynamically at runtime.
Validation rules can become rather complex. Arbitrary rules can be checked by a
validator. A validator is a class implementing the Formotron\Validator
interface:
interface Validator { public function getValidationErrors(mixed $value): array; }
Implementations receive the input value with all transformations already applied, do all necessary checks and return the result. The return value is a list of validation errors. If the input value is valid, an empty array must be returned. The input value may fail more than one check, in which case multiple errors can be reported simultaneously.
Individual error values can be anything like a simple string, a symbolic error code, or an error object holding a message template and message arguments. Errors are currently not further evaluated, so a simple debug message may be sufficient at the moment. Support for detailed error reporting is planned for the future. Until then, a validator may throw a custom exception which supplies detailed error information and can be caught and evaluated by the calling code.
Validators are attached to a property via the Assert
attribute. It takes the
name of a service – typically a class name – which will be pulled from the
container supplied to the DataProcessor's constructor. The container must
resolve the name to an object implementing the Validator
interface.
    use Formotron\Attribute\Assert;

    // The container must resolve MaxLength::class to an instance of this class.
    class MaxLength implements Formotron\Validator
    {
        public function getValidationErrors(mixed $value): array
        {
            if (is_string($value) && mb_strlen($value) <= 100) {
                return [];
            } else {
                return ['Maximum length exceeded'];
            }
        }
    }

    class DataObject
    {
        #[Assert(MaxLength::class)]
        public string $foo;
    }
Multiple validators can be attached to a single property, but the order of execution should not be relied upon. If order is significant, write a validator that does all necessary checks in a single step.
Unlike similar packages, Formotron does not ship with any validator
implementations, but only provides the validation framework. Many validators are
trivial and easy to implement. You can still use an external validation library
and wrap its functions in a Validator
object.
Preprocessing input data
Transformers and validators operate on individual data object properties and their associated input array elements, isolated and independent of each other. They have no knowledge of the context in which the input value was submitted.
Sometimes this is not sufficient. Some processing rules need access to the whole input array, or the array needs modification before individual fields can be processed. Common operations include
- removing keys that are not mapped to the data object
- providing non-static default values for missing keys
- transformations or validations that depend on more than one field
This could be done manually before calling DataProcessor::process()
, but that
would have several drawbacks:
- There is now a contract to follow, which must be explicitly documented and enforced.
- The extra processing is disconnected from the data object. Everything else is defined in the data object class via property types and attributes. While the actual implementation is located in transformer and validator objects, the general processing logic is outlined in the data object class, but the preprocessing logic is not.
- The calling code becomes more complex and difficult to test.
Instead, preprocessing should be done in a class that implements the
Formotron\PreProcessor
interface:
interface PreProcessor { public function process(array $formData): array; }
Implementations receive the input array (or the output of another preprocessor) and return an array that will be fed to the next preprocessor (if any) and finally used to populate the data object. The implementation can do anything: add or remove keys, modify values, replace the whole array, validate data and throw exceptions, ...
Preprocessors are attached to the data object class via the PreProcess
attribute. It takes the name of a service – typically a class name – which will
be pulled from the container supplied to the DataProcessor's constructor. The
container must resolve the name to an object implementing the PreProcessor
interface.
Multiple preprocessors can be attached to a single class, but the order of execution should not be relied upon. If order is significant, write a preprocessor that does all necessary operations in a single step.
The following example demonstrates the use of a preprocessor for CSRF protection. The form data contains a token, which is compared to the token stored in session data. (The actual implementation is not provided here.) Once validation has succeeded, the token is no longer useful and therefore not part of the data object.
    use Formotron\Attribute\PreProcess;

    // The container must resolve CsrfProtection::class to an instance of this class.
    class CsrfProtection implements Formotron\PreProcessor
    {
        public function process(array $formData): array
        {
            if (!is_token_valid($formData['token'] ?? '')) {
                throw new Formotron\AssertionFailedException('invalid token');
            }
            // The 'token' key is not mapped to a property and must be removed.
            unset($formData['token']);
            return $formData;
        }
    }

    #[PreProcess(CsrfProtection::class)]
    class DataObject
    {
        // Only fields of interest are defined here.
        // The token is only relevant for the preprocessor.
        public string $foo;
    }
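Another operation listed above, removing keys that are not mapped to the data object, could be sketched as follows. The class names `DropUnmappedKeys` and `ContactForm` are hypothetical, and passing the data object class name to the constructor is an assumption about how the container would be configured. The sketch omits `implements Formotron\PreProcessor` so that it runs without the library, and it does not account for properties renamed via the Key attribute.

```php
<?php
declare(strict_types=1);

// Hypothetical preprocessor. In a real project this class would declare
// `implements Formotron\PreProcessor`.
final class DropUnmappedKeys
{
    /** @param class-string $dataObjectClass */
    public function __construct(private string $dataObjectClass)
    {
    }

    public function process(array $formData): array
    {
        // Collect the names of all properties declared on the data object.
        $names = array_map(
            fn (ReflectionProperty $property): string => $property->getName(),
            (new ReflectionClass($this->dataObjectClass))->getProperties()
        );
        // Keep only input keys that have a property of the same name.
        return array_intersect_key($formData, array_flip($names));
    }
}

class ContactForm
{
    public string $name;
    public string $email;
}

$preProcessor = new DropUnmappedKeys(ContactForm::class);
$result = $preProcessor->process([
    'name' => 'A',
    'email' => 'a@example.com',
    'submit' => '1', // not mapped, will be dropped
]);
print_r($result);
```

Silently dropping keys gives up the protection described under "Unmappable input fields", so this is only appropriate when extra keys are expected.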
Error handling
Handling logical errors
Most errors reported by Formotron are failed sanity checks where input data is formally incorrect: a value has an unsuitable data type or does not map to an enum. This typically means that something went seriously wrong: A bug in the backend, or malformed input data that could not have originated from the expected form.
These errors are reported via a Formotron\AssertionFailedException
.
Transformers, validators and preprocessors can also throw this exception for
hard errors. In fact, Formotron does not catch any exceptions, so it's possible
to throw anything that is appropriate.
End-users are not expected to see these errors as a response to incorrect user action, like leaving a required field empty. In most cases, no special handling is required for these exceptions. The application's default exception handler is probably the most appropriate tool to handle them. This is a response to a bug or a technical error, and a standard message, optionally with a backtrace, will convey more useful debugging information than a beautifully formatted error message.
Handling invalid user input
Not all errors are caused by malfunction. Many forms are not immune to user errors: some required fields could be left blank or an entered string may be too long or too short.
Throwing an exception without special handling would be inappropriate for this kind of error. Invalid user input is typically handled by collecting errors and re-displaying the form with error messages.
Formotron does not yet support advanced error handling, but you can implement your own. You could throw a custom exception in a transformer, validator or preprocessor, and catch it:
    try {
        $formData = $dataProcessor->process($input, DataObject::class);
    } catch (MyCustomException $exception) {
        // Handle invalid user input
    }
This has the disadvantage of aborting on the first error. Additional errors would stay unnoticed until the form is re-submitted with valid data for the first field. As an alternative, instead of throwing an exception, you could collect errors somewhere and continue, and evaluate the errors afterwards. However, this breaks the promise of an always valid data object, and there is now a contract to follow. This can be avoided by wrapping this kind of error handling in a reusable method:
    function process(array $input, string $className)
    {
        $dataObject = $this->dataProcessor->process($input, $className);
        $errors = $this->getErrors(); // retrieve collected errors
        if ($errors) {
            // Report detailed errors to calling code
            throw new ValidationException($errors);
        }
        return $dataObject;
    }
This wrapper returns a valid data object or throws an exception for invalid input. Calling code can catch the special exception and evaluate the detailed error information contained within.
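One possible shape for the "collect errors somewhere" part is a small collector service shared by validators and the wrapper. Everything in this sketch is hypothetical (`ErrorCollector`, `ValidationException`, `processOrFail()`); the `$process` callable stands in for the call to `DataProcessor::process()` so that the sketch runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical service that validators write to instead of throwing.
final class ErrorCollector
{
    /** @var array<string, list<string>> */
    private array $errors = [];

    public function add(string $field, string $message): void
    {
        $this->errors[$field][] = $message;
    }

    /** Returns all collected errors and resets the collector. */
    public function takeAll(): array
    {
        $errors = $this->errors;
        $this->errors = [];
        return $errors;
    }
}

// Hypothetical exception carrying the detailed errors.
final class ValidationException extends RuntimeException
{
    public function __construct(public readonly array $errors)
    {
        parent::__construct('Invalid user input');
    }
}

function processOrFail(callable $process, ErrorCollector $collector): object
{
    $dataObject = $process();
    $errors = $collector->takeAll();
    if ($errors !== []) {
        throw new ValidationException($errors);
    }
    return $dataObject;
}

$collector = new ErrorCollector();
$caught = null;
try {
    processOrFail(function () use ($collector): object {
        // Simulates two validators reporting errors without aborting.
        $collector->add('name', 'Required');
        $collector->add('email', 'Invalid address');
        return new stdClass();
    }, $collector);
} catch (ValidationException $exception) {
    $caught = $exception->errors;
}
print_r($caught);
```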
Taking advantage of frontend form validation
Modern web applications often validate form data by themselves, either while typing or immediately before submission. While this frontend validation enhances the user experience, it does not remove the necessity for thorough backend validation. The submitted form data is still untrusted, and some validations can only be done in the backend.
However, with frontend validation in place, backend error handling can be simplified. When it can be assumed that the data has already been validated once and validation errors have been gracefully handled in the frontend, validation failures in the backend can be treated as unexpected errors, because something went wrong:
- Frontend validation has somehow been bypassed, which should not happen if the application is used as intended.
- Mismatch between validation rules: the backend rules are stricter than the frontend rules.
- Some other bug.
These unexpected situations have to be handled, but they don't have to be handled nicely. Like with logical errors, simply throwing an exception and having regular exception handling deal with it may be the best choice. Complex detailed handling of invalid user input is only necessary for failures that are not covered in the frontend.
There are many JavaScript libraries for validation in the frontend, but HTML form elements already provide attributes for basic constraints that are followed by modern browsers and only need to be verified in the backend.
Handling HTML form data
Submitted HTML form data can be straightforward (most elements produce just a string that contains the entered value), but sometimes surprising and annoying. Knowing how form element values are represented can make transformation and validation in Formotron quite simple.
Form data is generally transmitted as a map of string values. Many elements constrain the format of their values by their type. Additional constraints can be applied by HTML attributes. These constraints are an excellent starting point for validation in the backend, and it is recommended to make use of them as much as possible. Anything that does not fulfill these constraints can be handled as an unexpected error. More sophisticated error handling is mostly useful for additional constraints that cannot be expressed in HTML.
The procedures below only apply to single values. If a form element's name ends
with []
, the result will be an array of strings. Formotron can only verify
that the input value is an array. The array values themselves are not validated
automatically. This must be done by a custom validator that iterates over the
array and validates the values according to the rules below.
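A validator for such an array might look like the following sketch. The class name `NumericIdList` and the rule it checks (every element is a string of digits, as might come from checkboxes named `ids[]`) are hypothetical examples. The sketch omits `implements Formotron\Validator` so that it runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical validator. In a real project this class would declare
// `implements Formotron\Validator`.
final class NumericIdList
{
    public function getValidationErrors(mixed $value): array
    {
        if (!is_array($value)) {
            return ['Not an array'];
        }
        $errors = [];
        foreach ($value as $index => $item) {
            // Each element is validated individually.
            if (!is_string($item) || preg_match('/^[0-9]+$/D', $item) !== 1) {
                $errors[] = "Element $index is not a numeric ID";
            }
        }
        return $errors;
    }
}

$validator = new NumericIdList();
print_r($validator->getValidationErrors(['1', '22', 'x']));
```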
Most HTML inputs support the required
attribute which will by default prevent
form submission if no value is entered. The backend can assume required fields
to be present and non-empty. Formotron will throw an exception if the key is
missing and the data object property has no default value. As with all frontend
validation, it cannot be relied upon on its own.
Frontend validation via HTML attributes only affects default form submission. JavaScript code may send form data bypassing builtin validations.
Text, password, search and telephone inputs
<input type="text">
produces a string with the element's value. The
minlength
, maxlength
and pattern
attributes can constrain the content.
<input type="password">
, <input type="search">
and <input type="tel">
behave the same way in regards to submitted data. The only difference is how
browsers may style the element and control interaction with the user.
<textarea>
supports only the minlength
and maxlength
attributes, but not
pattern
.
Number and range inputs
<input type="number">
produces a string that can be parsed as a number or an
empty string. The max
, min
and step
attributes further constrain the range
of possible values.
The number is represented in a non-localized format without thousands separator.
Integer values for the step
attribute (default is 1) allow integers only.
Fractional step
values allow integer and fractional input values, the latter
with a dot as fractional separator, regardless of locale.
<input type="range">
behaves identically in regards to submitted data. The
only difference is how browsers display the element.
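A transformer for number inputs could be sketched as below. The class name `NumberInput` is hypothetical, mapping the empty string to NULL is one possible choice, and the regular expression is an approximation of the non-localized number format described above. The sketch omits `implements Formotron\Transformer` and uses `InvalidArgumentException` as a stand-in so that it runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical transformer. In a real project this class would declare
// `implements Formotron\Transformer`.
final class NumberInput
{
    // Optional minus sign, digits with optional fraction (dot separator),
    // optional exponent. No thousands separators, no locale handling.
    private const PATTERN = '/^-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?$/D';

    public function transform(mixed $value): mixed
    {
        if ($value === '') {
            return null; // nothing was entered
        }
        if (!is_string($value) || preg_match(self::PATTERN, $value) !== 1) {
            throw new InvalidArgumentException('not a number');
        }
        // Prefer an integer where the string is a plain integer.
        $int = filter_var($value, FILTER_VALIDATE_INT);
        return $int !== false ? $int : (float) $value;
    }
}

$numberInput = new NumberInput();
var_dump($numberInput->transform('42'));
var_dump($numberInput->transform('1.5'));
var_dump($numberInput->transform(''));
```

Range checks against min, max and step would still belong in a validator, since they depend on the individual form field.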
Hidden elements
<input type="hidden">
produces a string with the element's value.
Interpretation is up to the application. There are no client-side constraints.
Checkboxes
<input type="checkbox">
produces a string with the element's value
attribute
if the checkbox is checked. The field will be missing completely otherwise.
Generating a single checked/unchecked value is more complex than for other
element types.
Some applications and frameworks deal with this problem by adding a hidden element to the form with the same name as the checkbox element and the unchecked value, so that the key will always be present. This moves the logic to the form markup, which is not the best place for this implementation detail.
Formotron can handle checkbox data (and the lack thereof) without this hack. To handle the missing key, the unchecked value must be set as default on the property. A transformer must handle the checked value.
    use Formotron\Attribute\Transform;

    class Exists implements Formotron\Transformer
    {
        public function transform(mixed $value): mixed
        {
            // This is only encountered if the checkbox is checked, so we can ignore
            // the actual value unless this transformer may be used in a different
            // context.
            return true;
        }
    }

    class DataObject
    {
        #[Transform(Exists::class)]
        public $foo = false; // Default to unchecked value to allow missing key
    }
Checkboxes are often mapped to true
/false
, but any other pair of values can
be used the same way. When using a string, the transformer may not be necessary,
but the checked value must be validated. Alternatively, an enum with 2 values
can be used without transformers and validators:
    enum Checkbox
    {
        case Checked; // The checkbox element must set its value attribute to "Checked"
        case Unchecked;
    }

    class DataObject
    {
        public Checkbox $foo = Checkbox::Unchecked;
    }
Radio buttons
<input type="radio">
produces a string with the value of the selected element,
or no data at all if no button was selected. The latter can only happen if no
button was preselected via the checked
attribute. In that case the missing
array key must be accounted for, for example with a default value for the data
object property.
Radio button groups with a static set of values can be easily mapped to an enum. If a string property is used or the set of values is generated at runtime, validation must be done manually.
Dropdowns
<select>
produces a string with the value of the selected entry. Options with
a static set of values can be easily mapped to an enum. If a string property is
used or the set of values is generated at runtime, validation must be done
manually.
One option is always selected - the first if no option has the selected
attribute. An empty option gives the illusion of no selection, but its value (or
an empty string when it has no value
attribute) is submitted. This must be
accounted for when processing the value. The appropriate strategy depends on
your use case. The most straightforward strategies are:
- Define an enum, with the "empty" value as a member. Empty strings are supported by backed enums only, other values can be mapped to any enum type.
- Define the data object property as nullable, and apply a transformer to convert the "empty" value to NULL.
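The first strategy relies on the fact that a backed enum can use the empty string as a backing value. The sketch below shows only the plain PHP behavior; the `Salutation` enum and its cases are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical enum for a dropdown with an "empty" first option.
enum Salutation: string
{
    case None = ''; // the option without a value
    case Mr = 'mr';
    case Ms = 'ms';
}

// The submitted empty string maps to a regular enum member.
var_dump(Salutation::tryFrom('') === Salutation::None);
// Values outside the defined set do not map to any member.
var_dump(Salutation::tryFrom('dr'));
```

A property declared with this enum type then never needs to be nullable, and the "no selection" case is handled like any other member.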
E-mail addresses
<input type="email">
produces a string that contains a single e-mail address,
a comma-separated list of addresses (if the multiple
attribute is set) or an
empty string. The minlength
, maxlength
and pattern
attributes can
constrain the value further.
Input is validated to be a syntactically correct e-mail address, but the address
is not guaranteed to exist. The validation ruleset may also be incomplete.
Validating e-mail addresses is tricky, with some non-obvious rules and edge
cases. It is recommended to wrap an existing validation library (don't write
your own, it will most likely be incorrect) in a Validator
class and handle
failures more gracefully than just throwing an exception.
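As a minimal sketch of such a wrapper, the validator below delegates to PHP's built-in `filter_var()` with `FILTER_VALIDATE_EMAIL` rather than a third-party library. Its ruleset is not guaranteed to match what browsers accept, and it handles a single address only. The class name `EmailAddress` is hypothetical, and `implements Formotron\Validator` is omitted so that the sketch runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical validator. In a real project this class would declare
// `implements Formotron\Validator`.
final class EmailAddress
{
    public function getValidationErrors(mixed $value): array
    {
        // Delegate the syntax check to an existing implementation.
        if (!is_string($value) || filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            return ['Invalid e-mail address'];
        }
        return [];
    }
}

$emailValidator = new EmailAddress();
print_r($emailValidator->getValidationErrors('not-an-address'));
```

Swapping in a dedicated library later only requires changing the body of `getValidationErrors()`.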
URL
<input type="url">
produces a string that contains a syntactically valid URL or an
empty string. The minlength
, maxlength
and pattern
attributes can
constrain the value further.
Validation is limited to syntax checking. The URL does not have to lead to an existing resource, or even contain a meaningful scheme. The backend may need to validate the URL more thoroughly:
- Check for an adequate scheme (http or https in most cases)
- Try to contact the resource (for example, try to make an HTTP request)
Failures should be handled more gracefully than just throwing an exception.
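The scheme check can be sketched with PHP's built-in `parse_url()`. The class name `HttpUrl` is hypothetical, requiring a host component is an additional assumption, and `implements Formotron\Validator` is omitted so that the sketch runs without the library. Contacting the resource is not attempted here.

```php
<?php
declare(strict_types=1);

// Hypothetical validator. In a real project this class would declare
// `implements Formotron\Validator`.
final class HttpUrl
{
    public function getValidationErrors(mixed $value): array
    {
        if (!is_string($value)) {
            return ['Not a string'];
        }
        // parse_url() returns null for a missing component and false for
        // seriously malformed URLs, so anything but a string is a failure.
        $scheme = parse_url($value, PHP_URL_SCHEME);
        if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
            return ['URL must use http or https'];
        }
        if (!is_string(parse_url($value, PHP_URL_HOST))) {
            return ['URL has no host'];
        }
        return [];
    }
}

$urlValidator = new HttpUrl();
print_r($urlValidator->getValidationErrors('ftp://example.com/file'));
```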
Date/time pickers
Date/time pickers produce a string that can be parsed with the formats below, or an
empty string if no value was entered. The max, min and step attributes
further constrain the range of possible values.
- <input type="date">: Y-m-d
- <input type="time">: H:i or H:i:s depending on the step attribute
- <input type="datetime-local">: Y-m-d\TH:i or Y-m-d\TH:i:s depending on the step attribute
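These formats can be passed to `DateTimeImmutable::createFromFormat()`. The helper below is a sketch with a hypothetical name (`parseFormDate()`); it could form the body of a transformer. It does not handle the empty string or fractional seconds, and it interprets values in PHP's default time zone.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parses $value with the first matching format.
 *
 * @param list<string> $formats formats to try, in order
 */
function parseFormDate(string $value, array $formats): DateTimeImmutable
{
    foreach ($formats as $format) {
        // The "!" prefix resets fields that are not part of the format,
        // so a date without time gets 00:00:00 instead of the current time.
        $parsed = DateTimeImmutable::createFromFormat('!' . $format, $value);
        // Formatting back and comparing rejects overflowing values such as
        // 2024-02-30, which would otherwise roll over to March.
        if ($parsed !== false && $parsed->format($format) === $value) {
            return $parsed;
        }
    }
    throw new InvalidArgumentException('invalid date/time value');
}

$date = parseFormDate('2024-02-29', ['Y-m-d']);
$dateTime = parseFormDate('2024-12-29T11:17', ['Y-m-d\TH:i', 'Y-m-d\TH:i:s']);
echo $date->format('Y-m-d'), "\n";
echo $dateTime->format('Y-m-d H:i:s'), "\n";
```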
Month/week pickers
<input type="month">
and <input type="week">
are not recommended because
Firefox will just show a text input that allows entering arbitrary text. With
browsers that support it, they produce a string that can be parsed as Y-m
(month) or Y-\WW
(week) or an empty string. The max
, min
and step
attributes further constrain the range of possible values. Because of the
possibility of invalid input, validation failure should be handled more
gracefully than just by throwing an exception.
Color pickers
<input type="color">
produces a hash sign followed by a 6-digit hexadecimal
RGB value, like #aabbcc
, or an empty string.
Submit buttons
<input type="submit"> does not produce any data by default. If a name
attribute is provided, it will be present as a key, with the button's value
(or default text, if missing). However, the double role of the value attribute
as a label and form data is bad design, particularly with multilingual user
interfaces.
<button type="submit">
is a better alternative. It does not produce any data
by default. The name
and value
attributes behave the same way, but value
is used only for form data (empty by default), not as a label.
Either way, the name attribute should only be set to distinguish between
multiple buttons. With only 1 button, the resulting form data element would just
be meaningless noise which has to be removed before processing. By using multiple
buttons with the same name but different value, it is possible to determine
which button was clicked and take appropriate action:
    <button type="submit" name="action" value="this">Do This</button>
    <button type="submit" name="action" value="that">Do That</button>
Now the action
field can be evaluated to determine what to do with the form
data. An enum is a simple way to represent and validate the action.
It might be even simpler to assign different handlers to the buttons via the
formaction
attribute and keep the button out of the form data (no name
attribute).
Other elements
Some elements do not produce any form data and are mentioned here for completeness only:
- <input type="file"> is only useful for forms with multipart/form-data encoding, in which case uploaded files will be available via $_FILES and handled entirely differently. With this encoding, $_POST will not contain any data for this element.
- <input type="image"> adds the clicked coordinates of the image in the x and y keys, optionally prefixed by the name attribute. Not very useful unless you want to implement some sort of click map.
- <input type="reset"> does not produce any form data.