Scaling

Thanos: Long-Term Storage for Your Prometheus Metrics

Thanos is a project that turns your Prometheus installation into a highly available metrics system with unlimited storage capacity. At a very high level, it does this by deploying a sidecar alongside Prometheus that uploads the data blocks to any object storage. A store component downloads the blocks again and makes them accessible to a query component, which exposes the same API as Prometheus itself. Because the API is identical, this works nicely with Grafana.
Read more
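
Because the query component speaks the Prometheus HTTP API, existing clients and dashboards work against it unchanged. A minimal sketch in Python, assuming a Thanos Query endpoint at localhost:9090 (the address and the queried metric are placeholders):

```python
import requests

# Thanos Query exposes the standard Prometheus HTTP API, so this
# request works identically against a vanilla Prometheus server.
THANOS_QUERY_URL = "http://localhost:9090"  # assumed endpoint

def instant_query(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(
        f"{THANOS_QUERY_URL}/api/v1/query",
        params={"query": promql},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if body["status"] != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]

# Example: fetch the 'up' series; older blocks served out of object
# storage by the store component come back through the same endpoint.
for series in instant_query("up"):
    print(series["metric"], series["value"])
```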

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber’s software architecture consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products, along with complex configurations that affect these products at city and sub-city levels. To maintain our growth and architecture, Uber’s Observability team built a robust, scalable metrics and alerting pipeline responsible for detecting, mitigating, and notifying engineers of issues with their services as soon as they occur.
Read more

Istio Multicluster

Istio Multicluster is a feature of Istio, the basis of Red Hat OpenShift Service Mesh, that extends the service mesh across multiple Kubernetes or Red Hat OpenShift clusters. The primary goal of this feature is to enable control of services deployed across multiple clusters with a single control plane. The main requirement for Istio Multicluster to work is that the pods in the mesh and the Istio control plane can talk to each other.
Read more

Kubernetes Federation V2

With datacenters spread across the globe, users are increasingly looking at ways to spread their applications and services across multiple locales or clusters. This need is driven by multiple use cases: from providing high availability and spreading load across multiple clusters while remaining resilient to individual cluster failures, to avoiding provider lock-in by using hybrid cloud …
Read more
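
Federation V2 (KubeFed) models this with federated resource types: each carries a template for the underlying resource plus a placement list naming the member clusters it should run in. A hedged sketch using the Kubernetes Python client, assuming KubeFed’s types.kubefed.io/v1beta1 CRDs are installed and two member clusters named cluster1 and cluster2 have been joined:

```python
from kubernetes import client, config

# Connect to the host cluster where the KubeFed control plane runs.
config.load_kube_config()
api = client.CustomObjectsApi()

# A FederatedDeployment wraps an ordinary Deployment spec in a
# template and adds a placement section listing target clusters.
fed_deployment = {
    "apiVersion": "types.kubefed.io/v1beta1",  # assumed KubeFed API version
    "kind": "FederatedDeployment",
    "metadata": {"name": "nginx", "namespace": "demo"},
    "spec": {
        "template": {
            "spec": {
                "replicas": 2,
                "selector": {"matchLabels": {"app": "nginx"}},
                "template": {
                    "metadata": {"labels": {"app": "nginx"}},
                    "spec": {
                        "containers": [
                            {"name": "nginx", "image": "nginx:1.17"}
                        ]
                    },
                },
            }
        },
        # The KubeFed controller propagates the Deployment to these clusters.
        "placement": {
            "clusters": [{"name": "cluster1"}, {"name": "cluster2"}]
        },
    },
}

api.create_namespaced_custom_object(
    group="types.kubefed.io",
    version="v1beta1",
    namespace="demo",  # hypothetical federated namespace
    plural="federateddeployments",
    body=fed_deployment,
)
```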

Cape Technical Deep Dive

In this post, we’ll take a deep dive into the design of the Cape framework. First, we’ll discuss Cape’s architecture. Then we’ll look at the core scheduling component of the system. Throughout, we’ll focus the discussion on a few key design decisions. Before we begin, let’s touch on a few of our principles for developing and maintaining Cape. These principles are based on lessons learned from developing other systems at Dropbox, especially Cape’s predecessor, Livefill.
Read more

Intro to Apache Kafka and Kafka Streams for Event-Driven Microservices on DevNation Live

Scalability is a key issue for many growing organizations. That’s why many of them turn to Apache Kafka, a popular messaging and streaming platform. It is horizontally scalable, cloud-native, and versatile. It can serve as a traditional publish-and-subscribe messaging system, as a streaming platform, or as a distributed state store. Companies around the world use Apache Kafka to build real-time streaming applications, streaming data pipelines, and event-driven architectures. Source: redhat.com
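
To make the publish-and-subscribe role concrete, here is a minimal sketch using the kafka-python client (one client library among several; the broker address, topic, and consumer group are assumptions):

```python
from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"   # assumed broker address
TOPIC = "orders"            # hypothetical topic

# Producer: publish an event to the topic.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, key=b"order-42", value=b'{"status": "created"}')
producer.flush()

# Consumer: subscribe and process events as they arrive. Horizontal
# scaling comes from adding consumers to the same group; Kafka spreads
# the topic's partitions across them.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    group_id="order-service",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.key, message.value)
```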

How ShiftLeft Uses PostgreSQL Extension TimescaleDB

Time series are a major component of the ShiftLeft runtime experience. This is true for many other products and organizations too, but each case involves different characteristics and requirements. This post describes the requirements that we have to work with, how we use TimescaleDB to store and retrieve time series data, and the tooling we’ve developed to manage our infrastructure. We have two types of time series data: metrics and vulnerability events. Metrics represent application events, and a subset of those that involve security issues are vulnerability events.
Read more
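
TimescaleDB keeps time series in ordinary PostgreSQL tables that are converted into hypertables, which it then partitions by time behind the scenes. A minimal sketch with psycopg2; the connection string, table, and columns are hypothetical, not ShiftLeft’s actual schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=metrics user=postgres")  # assumed DSN
cur = conn.cursor()

# A plain PostgreSQL table for metric samples...
cur.execute("""
    CREATE TABLE IF NOT EXISTS app_metrics (
        time  TIMESTAMPTZ NOT NULL,
        app   TEXT        NOT NULL,
        value DOUBLE PRECISION
    );
""")

# ...becomes a hypertable that TimescaleDB chunks by time.
cur.execute(
    "SELECT create_hypertable('app_metrics', 'time', if_not_exists => TRUE);"
)

# Writes are plain SQL inserts.
cur.execute(
    "INSERT INTO app_metrics VALUES (now(), %s, %s);",
    ("checkout-service", 1.0),
)

# time_bucket() rolls samples up into fixed intervals, e.g. 5-minute averages.
cur.execute("""
    SELECT time_bucket('5 minutes', time) AS bucket, app, avg(value)
    FROM app_metrics
    GROUP BY bucket, app
    ORDER BY bucket;
""")
print(cur.fetchall())
conn.commit()
```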

Guide to scaling engineering organizations

Lessons learned from scaling Stripe’s engineering team. Source: stripe.com