acdh-oeaw / acdh-repo
ARCHE repository backend
Requires
- php: ^8.1
- ext-gd: *
- acdh-oeaw/arche-lib: ^7
- guzzlehttp/guzzle: ^7
- php-amqplib/php-amqplib: ^3.1
- sweetrdf/simple-rdf: ^2
- zozlak/auth: ^3
- zozlak/http-accept: >=0.1.0 <1
- zozlak/logging: ^1
- zozlak/rdf-constants: ^1
This package is auto-updated.
Last update: 2024-10-23 12:00:59 UTC
README
The core component of the ARCHE repository solution responsible for the CRUD operations and transaction support.
Installation
composer require acdh-oeaw/arche-core
Deployment
See https://github.com/acdh-oeaw/arche-docker
Environment for development
An environment allowing you to edit code in your host system and run all the tests inside a docker container.
- Clone this repo and enter it
git clone https://github.com/acdh-oeaw/arche-core.git
cd arche-core
- Get all dependencies
composer update
- Build the docker image with the runtime environment
docker build -t arche-dev build/docker
- Run the runtime environment mounting the repository dir into it and wait until it's ready
docker run --name arche-dev -v `pwd`:/var/www/html -e USER_UID=`id -u` -e USER_GID=`id -g` -d arche-dev
docker logs -f arche-dev
wait until you see (timestamps will obviously differ):
2020-06-04 14:06:52,309 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: postgresql entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: rabbitmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: tika entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
then hit CTRL+c
- Enter the docker container and run tests inside it
docker exec -ti -u www-data arche-dev /bin/bash
and then inside the container
XDEBUG_MODE=coverage vendor/bin/phpunit
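The "wait until it's ready" step above can also be checked by a script instead of by eye. The following is a minimal sketch, assuming the four services shown in the sample log (apache2, postgresql, rabbitmq, tika) are the complete set to wait for. The names `running_services` and `environment_ready` are hypothetical helpers and not part of this repository.

```python
import re

# Assumption: these are all the services the container has to start,
# taken from the sample log output in this README.
REQUIRED_SERVICES = {"apache2", "postgresql", "rabbitmq", "tika"}

# Matches log lines like "... INFO success: apache2 entered RUNNING state, ..."
RUNNING_RE = re.compile(r"INFO success: (\S+) entered RUNNING state")

SAMPLE_LOG = """\
2020-06-04 14:06:52,309 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: postgresql entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: rabbitmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-04 14:06:52,309 INFO success: tika entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
"""


def running_services(log_text):
    """Return the set of service names reported as RUNNING in the log text."""
    return set(RUNNING_RE.findall(log_text))


def environment_ready(log_text):
    """True when every required service has been reported as RUNNING."""
    return REQUIRED_SERVICES <= running_services(log_text)


print(environment_ready(SAMPLE_LOG))
```

In practice the log text would come from the output of `docker logs arche-dev`, captured in whatever way suits your setup.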
Remarks:
- By default the development environment runs with PHP configured as an Apache mod_php module but it is also prepared to run PHP as FPM. To adjust the config run (in the host system)
docker exec arche-dev a2dissite mod_php
docker exec arche-dev a2ensite php_fpm
docker exec -w /root arche-dev supervisorctl restart apache2
Similarly, to get back to the mod_php config:
docker exec arche-dev a2dissite php_fpm
docker exec arche-dev a2ensite mod_php
docker exec -w /root arche-dev supervisorctl restart apache2
REST API documentation
- https://app.swaggerhub.com/apis/zozlak/arche (Swagger/OpenAPI)
- Guides on https://acdh-oeaw.github.io/arche-docs/
Architecture
Database structure
The main table is the resources one. It stores a list of all repository resources identified by their internal repo id (the id column) as well as transaction-handling data (columns transaction_id and state).
Metadata are divided into three tables according to the consistency checks applying to them.
- The identifiers table stores resources' identifiers (the repository assumes every resource may have many). The table enforces global uniqueness of identifiers. The RDF property storing the identifier comes implicitly from the repository's config.yaml ($.schema.id) and is not explicitly stored inside the database.
- The relations table stores all RDF triples having a URI as an object. It enforces (with a foreign key check) the existence of the repository resource an RDF triple points to.
- The metadata table stores all other RDF triples. This table puts no constraints on the data. Triples are stored in an RDF-like way: each row in the table represents a single triple.
  - For triple values which look like a proper number/date the value_n/value_t column stores the value cast to a number/timestamp. This allows for correct comparisons which would fail against string values.
  - The index on the value column is set up only on the first 1000 characters of the value. This is both for technical and performance reasons. An important consequence is that if you want to benefit from indexed search on the value column, you should state your condition as substring(value, 1, 1000) = 'yourValue'.
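The split into three tables can be illustrated with a small sketch. It is not the repository's implementation (which is PHP and SQL), only a model of the rules described above. `ID_PROPERTY` is a made-up stand-in for the value of $.schema.id, and the `Triple` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: placeholder for the identifier property configured
# under $.schema.id in config.yaml.
ID_PROPERTY = "https://example.org/hasIdentifier"


@dataclass
class Triple:
    resource_id: int      # internal repo id of the subject
    prop: str             # RDF property
    value: str            # object: a URI or a literal
    is_uri: bool = False  # whether the object is a URI


def target_table(triple):
    """Pick the table a triple would land in, following the rules above."""
    if triple.prop == ID_PROPERTY:
        return "identifiers"   # globally unique identifiers
    if triple.is_uri:
        return "relations"     # must point to an existing resource
    return "metadata"          # everything else, unconstrained


def numeric_value(value) -> Optional[float]:
    """Rough analogue of value_n: a number when the value parses as one, else None."""
    try:
        return float(value)
    except ValueError:
        return None


# Comparing numbers as strings gives the wrong order,
# which is what a separate numeric column avoids.
print("10" < "9")                                # True (string comparison)
print(numeric_value("10") < numeric_value("9"))  # False (numeric comparison)
```

The order of the checks matters in this sketch: identifiers are themselves URIs, so the identifier property has to be tested before the generic URI rule.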
Supplementary tables include:
- The transactions table which stores information about pending transactions.
- The metadata_history table which stores the history of metadata modifications. It's automatically filled in using triggers on the identifiers, relations and metadata tables.
- The full_text_search table storing a GiST index on tokenized metadata values and resources' text content, allowing for full text search (see the PostgreSQL documentation).
- The spatial_search table storing vector spatial data as PostGIS geography, allowing for spatial searches (see the PostGIS documentation).
- The raw table is used only for data migration from the previous ACDH-CH repository solution.
Helper functions and views
- The metadata_view gathers together triples from the identifiers, relations and metadata tables.
- The get_relatives() function allows easy finding of resources related to a given one with a given RDF property. Internally it uses a recursive query which could be difficult to write correctly on your own.
- The get_neighbors_metadata() and the get_relatives_metadata() functions allow for easy fetching of metadata triples of both a given resource and resources related to it, either by any single-hop RDF property (get_neighbors_metadata()) or with any number of hops of one selected metadata property (get_relatives_metadata()).
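To show the difference between "any single-hop property" and "any number of hops of one property", here is a minimal sketch of both traversals. It does not call or reproduce the database functions listed above. It uses an in-memory SQLite database as a stand-in for PostgreSQL, assumes a simplified relations table with columns id, target_id and property, follows relations in one direction only, and returns resource ids rather than metadata triples. The function names `relatives` and `neighbors` and the `ex:` property names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout: resource `id` points to resource `target_id`
# with the RDF property `property`.
conn.execute("CREATE TABLE relations (id INTEGER, target_id INTEGER, property TEXT)")
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?)",
    [
        (4, 3, "ex:isPartOf"),
        (3, 2, "ex:isPartOf"),
        (2, 1, "ex:isPartOf"),
        (4, 9, "ex:hasLicense"),
    ],
)


def relatives(conn, res_id, prop):
    """Ids reachable from res_id by following one property over any number of hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(id) AS (
            SELECT target_id FROM relations WHERE id = ? AND property = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT r.target_id
            FROM relations r JOIN walk w ON r.id = w.id
            WHERE r.property = ?
        )
        SELECT id FROM walk ORDER BY id
        """,
        (res_id, prop, prop),
    ).fetchall()
    return [row[0] for row in rows]


def neighbors(conn, res_id):
    """Ids reachable from res_id in a single hop over any property."""
    rows = conn.execute(
        "SELECT DISTINCT target_id FROM relations WHERE id = ? ORDER BY target_id",
        (res_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(relatives(conn, 4, "ex:isPartOf"))  # [1, 2, 3]
print(neighbors(conn, 4))                 # [3, 9]
```

The recursive part is where hand-written queries usually go wrong: without deduplication a cycle in the relations makes the query run forever, which is one reason to prefer a ready-made helper.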