How ShiftLeft Uses PostgreSQL Extension TimescaleDB


Time series are a major component of the ShiftLeft runtime experience. This is true for many other products and organizations too, but each case involves different characteristics and requirements. This post describes the requirements that we have to work with, how we use TimescaleDB to store and retrieve time series data, and the tooling we’ve developed to manage our infrastructure.

We have two types of time series data: metrics and vulnerability events. Metrics represent application events; the subset of those events that involve security issues are vulnerability events. In both cases, these time series have an ID, a timestamp, and a count.

Vulnerability events can also have an event sample that contains detailed information about the request that exercised a security vulnerability. In addition to those attributes, time series are keyed by an internal ID we call an SP ID, which essentially represents a customer project at a certain version. The data model for metrics is closely tied to the source-sink model of the Code Property Graph: for a given application, its methods, I/O, and data flows are organized into triggers, inputs, and outputs.

The figure below illustrates this model. It summarizes a few typical endpoints acting as triggers and the flows through which input data received at those endpoints eventually reaches an output such as a log. We rarely query time series data for a single metric; usually we need to query for lots of metrics that are related according to this data model.
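To make the shape of this data more concrete, here is a minimal sketch in Python of a data point and of a query that touches many related metrics at once. It is not our actual schema or query code: the `Point` class, its field names, the `kind` values, and the `totals_by_metric` helper are all hypothetical names chosen for illustration, and the in-memory loop only stands in for what would be a database query.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Point:
    """One time series data point (hypothetical field names)."""
    sp_id: str        # customer project at a certain version
    metric_id: str    # ID of the metric or vulnerability event
    kind: str         # assumed labels: "trigger", "input", or "output"
    ts: datetime      # timestamp of the observation
    count: int        # number of occurrences at that timestamp
    event_sample: Optional[dict] = None  # only set for vulnerability events


def totals_by_metric(
    points: Iterable[Point], sp_id: str, start: datetime, end: datetime
) -> Dict[str, int]:
    """Sum counts per metric for one SP ID over the window [start, end)."""
    totals: Dict[str, int] = defaultdict(int)
    for p in points:
        if p.sp_id == sp_id and start <= p.ts < end:
            totals[p.metric_id] += p.count
    return dict(totals)
```

The point of the sketch is the access pattern: a single call filters by SP ID and time window and returns counts for every related metric, rather than fetching one metric at a time.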