keboola / table-backend-utils
Package allows to import files to Snowflake from multiple cloud storages
Installs: 39 771
Dependents: 4
Suggesters: 0
Security: 0
Stars: 0
Watchers: 18
Forks: 1
Open Issues: 2
Requires
- php: ^8.1
- ext-json: *
- ext-odbc: *
- ext-pdo: *
- doctrine/dbal: ^3.3
- google/cloud-bigquery: ^1.23
- keboola/common-exceptions: ^1
- keboola/php-datatypes: ^7.6
- keboola/php-utils: ^4.1
- keboola/retry: ^0.5.0
Requires (Dev)
- dev-main
- 2.8.1
- 2.8.0
- 2.7.0
- 2.6.1
- 2.6.0
- 2.5.2
- 2.5.1
- 2.5.0
- 2.4.0
- 2.3.0
- 2.2.5
- 2.2.4
- 2.2.3
- 2.2.2
- 2.2.1
- 2.2.0
- 2.1.0
- 2.0.1
- 2.0.0
- 1.15.1
- 1.15.0
- v1.14.1
- v1.14.0
- v1.13.0
- v1.12.0
- v1.11.0
- v1.10.0
- v1.9.0
- v1.8.0
- v1.7.0
- v1.6.0
- v1.5.0
- v1.4.2
- v1.4.1
- v1.4.0
- v1.3.0
- v1.2.4
- v1.2.3
- v1.2.2
- v1.2.1
- v1.2.0
- v1.1.0
- v1.0.1
- v1.0.0
- v0.20.0
- v0.19.0
- 0.18.3
- v0.18.2
- v0.18.1
- v0.18.0
- v0.17.0
- v0.16.1
- v0.16.0
- v0.15.1
- v0.15.0
- v0.14.0
- v0.13.0
- v0.12.0
- v0.11.0
- v0.10.0
- v0.9.1
- v0.9.0
- v0.8.1
- v0.8.0
- v0.7.0
- v0.6.0
- v0.5.1
- v0.5.0
- v0.4.3
- v0.4.2
- v0.4.1
- v0.4.0
- v0.3.2
- v0.3.1
- v0.3.0
- v0.2.0
- v0.1
- dev-jirka-ct-1424-drop-fractional-seconds
- dev-odin-KAB-312
- dev-dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7
- dev-zajca-big-256
- dev-zajca-ct-1642
- dev-zajca-ct-950-ignore-columns
- dev-zajca-fix-distinct-on-nonnative-tables-2
- dev-zajca-fix-wrong-tests
- dev-martinj-db-import-export-terraform-fix
- dev-martinj-fix-phpstan
- dev-CT-933-add-release-development-branch-tag-to-able-to-require-in-other-lib-in-process-of-programming
- dev-martinj-ct-1361-default-value-for-null-conversion-in-bq
- dev-jirka-ct-1271-add-alter-command
- dev-jirka-1271-new-command
- dev-jirka-ct-1331-add-protobuf-for-table-info-in-preview
- dev-zajca-ct-1301
- dev-CT-1169-put-column-definition-endpoint
- dev-erik-metadata-backend
- dev-BIG-208-create-table-definition-sql-injection
- dev-martin-GCP-445
- dev-CT-905-dont-run-build-if-create-tag-on-master-branch
- dev-adamvyborny-CM-727-php-datatypes-oracle
- dev-backup-cache
- dev-jirka-big-193-convert-load-exception
- dev-martin-build-ecr
- dev-zajca-tag.php
- dev-zajca-big-185
- dev-big-160-update-common
- dev-jirka-big-167-too-many-requests-exception
- dev-zajca-big-171
- dev-zajca-big-169
- dev-zajca-big-170
- dev-big-160
- dev-jirka-ct-1084-add-table-type-bq-td
- dev-BIG-126-external-buckets
- dev-BIG-126-external-buckets-2
- dev-big-153-roman-improve-type-hint
- dev-big-153-runtime-options
- dev-revert-84-zajca-ct-1118
- dev-zajca-BIG-155-ASCII
- dev-zajca-BIG-157
- dev-zajca-big-142
- dev-zajca-ct-1118
- dev-zajca-ct-1118-no-bc
- dev-zajca-ct-1128-1
- dev-zajca-fix-zero-length
- dev-zajca-new-err-code
- dev-jirka-ct-910-external-tables
- dev-zajca-kbc-1003
- dev-CM-569-ondra
- dev-disable-td
- dev-jirka-ct-924-re-enable-exasol-start-stop
- dev-zajca-CT-666-snflk-null
- dev-php81
- dev-CT-950-ignore_timestamp
- dev-ct-835-fixx-export-null
- dev-ct-843-fix-numeric-value-is-empty-string
- dev-roman-finish-release
- dev-CT-843-null-import
- dev-PST-631_SNFLK-add-missing-types
- dev-roman-add-release-phase
- dev-add-ie-lib-repo
- dev-test-build
- dev-CT-844
- dev-roman-testing-layers
- dev-odbc-test
- dev-roman-depending-build
- dev-roman-improve-docker
- dev-roman-add-workflows-ie-lib
- dev-roman-adopt-ie-lib
- dev-roman-add-insturction-about-adopt-repo
- dev-roman-fix-deprecating-warnings-in-table-utils
- dev-CT-807-php8
- dev-add-ci-to-php-table-backend-utils
- dev-add-php-table-backend-utils
- dev-roman-add-ie-lib
- dev-roman-split-php-table-backend-utils
- dev-zajca-kbc-2902
- dev-KBC-2942-add-php-table-backend-utils-to-the-monorepo
- dev-roman-fix-exa-version
- dev-zajca-showgrantsontable
- dev-zajca-fix-exasol-pk-create-table
- dev-zajca-init-auth
- dev-zajca-init
This package is auto-updated.
Last update: 2024-10-25 11:29:39 UTC
README
Common stuff for table backends (Snowflake|Synapse|Redshift) shared between apps.
Interfaces
Database
Keboola\TableBackendUtils\Database\DatabaseReflectionInterface
interface DatabaseReflectionInterface { public function getUsersNames(?string $like = null): array; public function getRolesNames(?string $like = null): array; }
Schema
Keboola\TableBackendUtils\Schema\SchemaReflectionInterface
Function to retrieve information's about schema:
interface SchemaReflectionInterface { public function getTablesNames(): array; public function getViewsNames(): array; }
Table
Keboola\TableBackendUtils\Table\TableReflectionInterface
Function to retrieve information's about table:
interface TableReflectionInterface { public function getColumnsNames(): array; public function getColumnsDefinitions(): ColumnCollection; public function getRowsCount(): int; public function getPrimaryKeysNames(): array; public function getTableStats(): TableStatsInterface; public function isTemporary(): bool; public function getDependentViews(): array; }
Keboola\TableBackendUtils\Table\TableQueryBuilderInterface
Queries to work with table:
interface TableQueryBuilderInterface { public const TIMESTAMP_COLUMN_NAME = '_timestamp'; public function getDropTableCommand(string $schemaName, string $tableName): string; public function getRenameTableCommand(string $schemaName, string $sourceTableName, string $newTableName): string; public function getTruncateTableCommand(string $schemaName, string $tableName): string; public function getCreateTempTableCommand( string $schemaName, string $tableName, ColumnCollection $columns ): string; public function getCreateTableCommand( string $schemaName, string $tableName, ColumnCollection $columns, array $primaryKeys = [] ): string; public function getCreateTableCommandFromDefinition( TableDefinitionInterface $definition, bool $definePrimaryKeys = self::CREATE_TABLE_WITHOUT_PRIMARY_KEYS ): string; }
Keboola\TableBackendUtils\Table\TableStatsInterface
Table statistics
interface TableStatsInterface { public function getDataSizeBytes(): int; public function getRowsCount(): int; }
Column
Keboola\TableBackendUtils\Column\ColumnInterface
Table column definition:
interface ColumnInterface { public function getColumnName(): string; public function getColumnDefinition(): Keboola\Datatype\Definition\DefinitionInterface; public static function createGenericColumn(string $columnName): self; }
View
Keboola\TableBackendUtils\View\ViewReflectionInterface
Function to retrieve information's about view:
interface ViewReflectionInterface { public function getDependentViews(): array; }
Auth
Keboola\TableBackendUtils\Auth\UserReflectionInterface
interface UserReflectionInterface { public function endAllSessions(): void; public function getAllSessionIds(): array; }
Keboola\TableBackendUtils\Auth\GrantQueryBuilderInterface
interface GrantQueryBuilderInterface { public function getGrantSql(GrantOptionsInterface $options): string; public function getRevokeSql(RevokeOptionsInterface $options): string; }
Development
Preparation
Set up envs:
cp .env.dist .env
Set TEST_PREFIX=
ENV
SYNAPSE
Create synapse server on Azure portal or using CLI.
set up env variables:
SYNAPSE_UID=UID SYNAPSE_PWD=xxxx SYNAPSE_DATABASE=synapse_db SYNAPSE_SERVER=<synapse>.database.windows.net
Teradata
Prepare Teradata servers on AWS/Azure and set following properties. See
create new database for tests:
CREATE DATABASE <nick>_utils_tests FROM dbc AS PERMANENT = 1e8, SPOOL = 1e8;
set up env variables:
TERADATA_HOST= TERADATA_PORT=1025 TERADATA_USERNAME= TERADATA_PASSWORD= TERADATA_DATABASE=
AWS
In order to download TD odbc driver, create a user who can access S3 bucket with the driver package.
set up env variables:
AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY=
Exasol
Run Exasol on your local machine in docker (for this case .env is preconfigured)
docker compose up -d exasol
Run Exasol server somewhere else and set up env variables:
EXASOL_HOST= EXASOL_USERNAME= EXASOL_PASSWORD=
issues:
-
on slower machines Exasol can take a lot of resources. Run
docker compose -f docker compose.yml -f docker compose.limits.yml up exasol
to limit memory and cpu -
If you are getting error
exadt::ERROR: the private interface with address '<ip>/16' either does not exist or is down.
https://github.com/exasol/docker-db/issues/45 Exasol saves you (container) current ip address and docker daemon is changing default bridge range. There are two ways to fix this:
- each reboot set you current ip range to exasol
docket-compose run --rm exasol exaconf modify-node -n 11 -p '<ip>/16'
this is have to be done each time when ip addresses are not align. - set docker default bridge to some fixed range: edit or create
/etc/docker/daemon.json
and set{"bip":"172.0.0.1/24"}
(or different range that suites you)
Snowflake
Prepare credentials for Snowflake access
CREATE ROLE "KEBOOLA_CI_TABLE_UTILS"; CREATE DATABASE "KEBOOLA_CI_TABLE_UTILS"; GRANT ALL PRIVILEGES ON DATABASE "KEBOOLA_CI_TABLE_UTILS" TO ROLE "KEBOOLA_CI_TABLE_UTILS"; GRANT USAGE ON WAREHOUSE "DEV" TO ROLE "KEBOOLA_CI_TABLE_UTILS"; CREATE USER "KEBOOLA_CI_TABLE_UTILS" PASSWORD = 'my_secret_password' DEFAULT_ROLE = "KEBOOLA_CI_TABLE_UTILS"; GRANT ROLE "KEBOOLA_CI_TABLE_UTILS" TO USER "KEBOOLA_CI_TABLE_UTILS";
set up env variables:
SNOWFLAKE_HOST=
SNOWFLAKE_PORT=443
SNOWFLAKE_USER=KEBOOLA_CI_TABLE_UTILS
SNOWFLAKE_PASSWORD='my_secret_password
SNOWFLAKE_DATABASE=KEBOOLA_CI_TABLE_UTILS
SNOWFLAKE_WAREHOUSE=
Bigquery
To prepare the backend you can use Terraform template.
You must have the resourcemanager.folders.create
permission for the organization.
# run in provisioning/local/BigQuery folder terraform init terraform apply -var organization_id=<your-org-id> -var backend_prefix=<your_backend_prefix> -var billing_account_id=<billing_account_id> # and enter name for your backend prefix for example your name, all resources will create with this prefx
After terraform apply ends go to the service project in folder created by terraform.
- go to the newly created service project, the project id are listed at the end of the terraform call. (service_project_id)
- click on IAM & Admin
- on left panel choose Service Accounts
- click on email of service account(there is only one)
- on to the top choose Keys and Add Key => Create new key
- select Key type JSON
- click on the Create button and the file will automatically download
- convert key to string
awk -v RS= '{$1=$1}1' <key_file>.json >> .env
- set content on last line of .env as variable
BQ_KEY_FILE
setup envs:
BQ_KEY_FILE=<the content of the downloaded json key file>
Tests
Run tests with following command.
docker compose run --rm dev composer tests
Unit and functional test can be run sepparetly
#unit test docker compose run --rm dev composer tests-unit #functional test docker compose run --rm dev composer tests-functional
Code quality check
#phplint docker compose run --rm dev composer phplint #phpcs docker compose run --rm dev composer phpcs #phpcbf docker compose run --rm dev composer phpcbf #phpstan docker compose run --rm dev composer phpstan
Full CI workflow
This command will run all checks load fixtures and run tests
docker compose run --rm dev composer ci
Teradata connection
You can use following method to get connection to Teradata for your queries
\Keboola\TableBackendUtils\Connection\Teradata\TeradataConnection::getConnection([ 'host' => getenv('TERADATA_HOST'), 'user' => getenv('TERADATA_USERNAME'), 'password' => getenv('TERADATA_PASSWORD'), 'port' => getenv('TERADATA_PORT'), 'dbname' => '', ]);
If you want to use connection via PHPStorm DataGrip or other DB client, remove port when you setting up connection in client. Otherwise test connection will fail.
License
MIT licensed, see LICENSE file.