Trying to find a database that can both store and retrieves your time-series data? There are many different databases today to handle time-series data. We started out using OpenTSDB but then switched to TimescaleDB. In this article, we examine the characteristics of time-series data and requirements for handling them.
We initially tried with OpenTSDB and met with a clumsy work on combining the metadata(stored in PostgreSQL) with Time Series data. To overcome the burden, we switched to TimeScaleDB. We will also compare Timescale DB against OpenTSDB on key technical features and architecture.
Time-series Data :
Let’s have a look at the basic definition of time-series data :
A discrete time-data sequence taken at successive equally spaced points in time-indexed in time order.
Data that collectively represents how a system, process, or behavior changes over time.
Types of Time-series Data :
- Seasonal effect (Seasonal Variation or Seasonal Fluctuations)
- Other Cyclic Changes (Cyclical Variation or Cyclic Fluctuations)
- Trend (Secular Trend or Long Term Variation)
- Other Irregular Variation (Irregular Fluctuations)
Some of the examples are Monitoring applications data, weather analysis data, Stock Market Data etc.,
In this digital environment, a great deal of data is gathered by various devices and applications. For example, current location, browsing data, personal fitness/metrics trackers etc. In this kind of scenario, it is really important to store the data for the overall population in an effective time-series database for future predictions/forecast.
fig1: Example of Time series data points
TimescaleDB Overview :
TimescaleDB is the first time-series database specifically designed for scale, ease of use, and complex queries. While TimescaleDB is an extension of PostgreSQL, it provides the following:
- Automatic partitioning across time and space (partitioning key)
- Full SQL support
- Easy to use; like a relational database
- Fast and parallel ingestion
fig 2: PostgreSQL and TimescaleDB – A Comparison of Insert Rates
As we can see in the above figure (fig 2), the insert rates go down as the Dataset size increases in PostgreSQL. While in TimeScaleDB, steady insertion rate is maintained irrespective of the size of the Dataset. Thus, the performance of the application that sits on top of TimescaleDB improves greatly.
TimescaleDB executes the query on Hypertable comprising of many Chunks partitioned by time and space which really look like regular tables.
“ Time-series data is largely immutable. Writes primarily occur as new appends to recent time intervals, not as updates to existing rows. Both read and writes have a natural partitioning across both time and space.”
-TimeScale DB Developers
Data Handling in TimescaleDB :
HyperTable Outlook: Abstracts the table as the Hypertable composed of many right-sized chunks partitioned as per Time and Space.
Optimized Query Execution: During Query Execution it checks whether only the necessary chunks are used for retrieval of data. This can be done by aggressive constraint exclusion.
Data Model: TimeScale DB follows the Wide table Model which helps in the estimate, measure, or note the similarity between data.
Benchmarking TimeScaleDB :
Our use case required running complex aggregation queries while also supporting simultaneous ingestion of incoming time-series data. To ensure that the chosen platform can handle this load, we did some benchmarking.
An ingestion application kept pumping in data into the database. We ran three types of queries that accesses a varying number of rows. Every query was run several times to ensure stable results. Execution time over a table varied only slightly. This is because the query hits only the selected number of chunks satisfying the filtering conditions. The results are shown in the table below. We will describe the individual queries in a future blog post.
Results of TimescaleDB benchmark:
|Records Retrieved||Query 1 Execution time(seconds)||Query 2 Execution Time(seconds)||Query 3 Execution time(seconds)|
TimeScaleDB(v0.4.2) Vs OpenTSDB(v2.3.0)
Based on our use case for handling and storing time-series data for IoT implementation in a Big Data environment, TimescaleDB compares better on features such as partitioning, data retention, access methods, and compatibility to scripting for automation. Moreover, ease of access and simple retrieval of data for further application makes it more convenient when compared to the other time-series databases that we have used.