TimescaleDB Postgres extension is considered to be used as time-series DB in the Audit log service.

It was decided the TimescaleDB will not be used for Audit log

  • GCP does not support this PostgreSQL extension - therefore a separate vendor would have be used to maintain this

  • Currently there are no data crunching/aggregation requirements for the audit log, where the time-series BD would provide a performance boost

1. Installation options

Hosted by Timescale cloud

The Timescale cloud is built on top of AWS.

Not free, so must pay for its service & support.

Install managed service of TimescaleDB on Public Cloud (AWS, Azure, GCP)

Not free.

Serviced by TimescaleDB team.

Self hosted

Install TimescaleDB on server by yourself.

Supported OS: Debian/Ubuntu, RHEL/CentOS, Windows, MacOS.

Using pre-built Docker image

Easy to get it up since it’s simple to startup a docker container.

timescale/timescaledb:2.7.0-pg13

Install in Kubernetes using Helm

Be able to customized.

Easy to install.

Support single-node as well as multiple-node for High Available.

Install using Docker image on AWS

Only support AWS.

NOTE:

Support distributed Postgres instances.

Install using GCP

Seems this extension is not supported as per https://cloud.google.com/sql/docs/postgres/extensions

2. Install TimescaleDB extension

NOTE: Install TimescaleDB extension if you are not using pre-built docker image.

$ CREATE database example;
$ \c example                 # connect to 'example' database
$ CREATE EXTENSION IF NOT EXISTS timescaledb;

Check if the TimescaleDB extension is installed:

$ \dx
List of installed extensions
...

3. Create a hypertable

The hypertable as the core of TimescaleDB is an abstract layer on top of a couple of chunks. Each chunk is implemented using a standard database table.

Rows inserted into a hypertable are "routed" (partitioning) to the right chunk based on a time column.

Chunks are not visible to users, so users interact with TimescaleDB via hyptertable by using standard SQL.

Create a regular Postgres table:

CREATE TABLE stocks_real_time (
  time TIMESTAMPTZ NOT NULL,
  symbol TEXT NOT NULL,
  price DOUBLE PRECISION NULL,
  day_volume INT NULL
);

Convert the regular table to hypertable:

SELECT create_hypertable('stocks_real_time','time');

Partition the data with ‘time’ column.

NOTE: TimescaleDB automatically creates an index on the time column.

Some key arguments of create_hypertable():

argument

required

description

table name

y

the first argument is to specify what table to convert

column name

y

the second argument is to specify the column for partitioning

chunk_time_interval

n

Event time that each chunk covers. Must be > 0. Default is 7 days. Events in 7 days are stored in the same chunk (table).

Read create_hypertable to know more arguments.

To verify if a table is a hypertable:

SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'stocks_real_time';

4. Use TimescaleDB

Inserting and querying data using standard SQL statements.

5. Data retention

Old time-series data are not valuable when time goes on, so they can be dropped.

Automatic data pruning

SELECT add_retention_policy('stocks_real_time', INTERVAL '3 weeks');

Drop all data older than three weeks.

Manual data pruning

SELECT drop_chunks('stocks_real_time', INTERVAL '3 weeks');

6. Questions

Question

Description

Answer

Vendor & Cost

Is Safi CTO open to using this? (Until June 22 it was agreed on using a single SQL and a single NoSQL DB). Jakub Ukrop (Unlicensed) will get the answer.

Environment

Where to install TimescaleDB ? On GCP’s vm or kubernetes ? For different environment, we need to use different installation.

Data retention

How long is the data kept ?

The old data should be stored somewhere else ? Is it still valuable ?

Single/multi node

7. References

  1. https://docs.timescale.com/

  2. https://github.com/timescale/timescaledb