TimescaleDB Postgres extension is considered to be used as time-series DB in the Audit log service.
It was decided the TimescaleDB will not be used for Audit log
GCP does not support this PostgreSQL extension - therefore a separate vendor would have be used to maintain this
Currently there are no data crunching/aggregation requirements for the audit log, where the time-series BD would provide a performance boost
1. Installation options
Hosted by Timescale cloud
The Timescale cloud is built on top of AWS.
Not free, so must pay for its service & support.
Install managed service of TimescaleDB on Public Cloud (AWS, Azure, GCP)
Not free.
Serviced by TimescaleDB team.
Self hosted
Install TimescaleDB on server by yourself.
Supported OS: Debian/Ubuntu, RHEL/CentOS, Windows, MacOS.
Using pre-built Docker image
Easy to get it up since it’s simple to startup a docker container.
timescale/timescaledb:2.7.0-pg13
Install in Kubernetes using Helm
Be able to customized.
Easy to install.
Support single-node as well as multiple-node for High Available.
Install using Docker image on AWS
Only support AWS.
NOTE:
Support distributed Postgres instances.
Install using GCP
Seems this extension is not supported as per https://cloud.google.com/sql/docs/postgres/extensions
2. Install TimescaleDB extension
NOTE: Install TimescaleDB extension if you are not using pre-built docker image.
$ CREATE database example; $ \c example # connect to 'example' database $ CREATE EXTENSION IF NOT EXISTS timescaledb;
Check if the TimescaleDB extension is installed:
$ \dx List of installed extensions ...
3. Create a hypertable
The hypertable as the core of TimescaleDB is an abstract layer on top of a couple of chunks. Each chunk is implemented using a standard database table.
Rows inserted into a hypertable are "routed" (partitioning) to the right chunk based on a time column.
Chunks are not visible to users, so users interact with TimescaleDB via hyptertable by using standard SQL.
Create a regular Postgres table:
CREATE TABLE stocks_real_time ( time TIMESTAMPTZ NOT NULL, symbol TEXT NOT NULL, price DOUBLE PRECISION NULL, day_volume INT NULL );
Convert the regular table to hypertable:
SELECT create_hypertable('stocks_real_time','time');
Partition the data with ‘time’ column.
NOTE: TimescaleDB automatically creates an index on the time column.
Some key arguments of create_hypertable():
argument | required | description |
---|---|---|
table name | y | the first argument is to specify what table to convert |
column name | y | the second argument is to specify the column for partitioning |
chunk_time_interval | n | Event time that each chunk covers. Must be > 0. Default is 7 days. Events in 7 days are stored in the same chunk (table). |
Read create_hypertable to know more arguments.
To verify if a table is a hypertable:
SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'stocks_real_time';
4. Use TimescaleDB
Inserting and querying data using standard SQL statements.
5. Data retention
Old time-series data are not valuable when time goes on, so they can be dropped.
Automatic data pruning
SELECT add_retention_policy('stocks_real_time', INTERVAL '3 weeks');
Drop all data older than three weeks.
Manual data pruning
SELECT drop_chunks('stocks_real_time', INTERVAL '3 weeks');
6. Questions
Question | Description | Answer |
---|---|---|
Vendor & Cost | Is Safi CTO open to using this? (Until June 22 it was agreed on using a single SQL and a single NoSQL DB). Jakub Ukrop (Unlicensed) will get the answer. | |
Environment | Where to install TimescaleDB ? On GCP’s vm or kubernetes ? For different environment, we need to use different installation. | |
Data retention | How long is the data kept ? The old data should be stored somewhere else ? Is it still valuable ? | |
Single/multi node |