How should pipelines be monitored?

For online serving systems it’s fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though? For a typical web application, high latency or error rates are the sort of thing you want to wake someone up about as they usually negatively affect the end-user’s experience. Request […]

MTTR is dead, long live CIRT

The game is changing for the IT ops community, which means the rules of the past make less and less sense. Organizations need accurate, understandable, and actionable metrics in the right context to measure operations performance and drive critical business transformation. The more customers use modern tools and the more variation in the types of […]

How to monitor Golden signals in Kubernetes

What are Golden signals metrics? How do you monitor golden signals in Kubernetes applications? Golden signals can help to detect issues of a microservices application. These signals are a reduced set of metrics that offer a wide view of a service from a user or consumer perspective, so you can detect potential problems that might […]

Infrastructure monitoring: Defense against surprise downtime

Infrastructure monitoring is an integral part of infrastructure management. It is an IT manager’s first line of defense against surprise downtime. Severe issues can inject considerable downtime to live infrastructure, sometimes causing heavy loss of money and material. Source: opensource

Kubernetes Metrics and Monitoring

This post explores the current state of metrics and monitoring in Kubernetes by walking through the gradual thought process that I experienced when learning this topic. Kubernetes needs some metrics for it’s basic out-of-the-box functionality, like autoscaling and scheduling. This is regardless of any monitoring solution you may want for the purpose of troubleshooting and […]

How Uber Monitors 4,000 Microservices

With 4,000 proprietary microservices and a growing number of open source systems that needed to be monitored, by late 2014 Uber was outgrowing its usage of Graphite and Nagios for metrics. They evaluated several technologies, including Atlas and OpenTSDB, but the fact that a growing number of open source systems were adding native support for […]

Monitoring Kubernetes, part 1: the challenges + data sources

Our industry has long been relying on microservice-based architecture to deliver software faster and safer. The advent and ubiquity of microservices naturally paved the way for container technology, empowering us to rethink how we build and deploy our applications. Docker exploded onto the scene in 2013, and, for companies focusing on modernizing their infrastructure and […]

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber’s software architectures consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products along with complex configurations that affect these products at city and sub-city levels. To maintain our growth and […]

Stack Overflow: How We Do Monitoring

What is monitoring? As far as I can tell, it means different things to different people. But we more or less agree on the concept. I think. Maybe. Let’s find out! Source: nickcraver

Loki: Prometheus-inspired, open source logging for cloud natives

Grafana is the defacto dashboarding solution for time-series data. It supports over 40 datasources (as of this writing), and the dashboarding story has matured considerably with new features, including the addition of teams and folders. We now want to move on from being a dashboarding solution to being an observability platform, to be the go-to […]