Say hello to a new AIOps platform: Wave of the future or just another drop in the ocean?

Are self-healing applications on our horizon? CA Technologies’ new AIOps platform hopes to bring us closer to the future of intelligent automation. What are the grand plans for AIOps, and is it a potential reality or just another buzzword? Artificial Intelligence for IT Operations, also known as AIOps, is gaining momentum as our machine learning capabilities and algorithms become more complex. On October 16, 2018, CA Technologies revealed a new AIOps platform that will help teams automate tasks intelligently.

Digging deeper into Kubernetes 1.12 – Two of the most promising features so far

This time around, I am happy to revisit two of the most important features of Kubernetes 1.12 and take a closer look at what’s under their hood and what the future prospects are. Source: jaxenter.com

Lumen: Custom, Self-Service Dashboarding For Netflix

Netflix generates a lot of data. One of the ways that we gain useful insights is by visualizing that data in dashboards which allow us to comprehend large amounts of information quickly. This is particularly important when operational issues arise as our engineers need to be able to quickly diagnose problem areas and work to correct them. Operational issues, however, are just one potential use case for dashboards at Netflix. We also use dashboards to track and chart key business metrics, compare the results of experiments, monitor real-time data, and even find out if burgers are on the menu for lunch.

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks in our driver-partner sign-up process. Over time, the need for more insights has resulted in over 100 petabytes of analytical data that needs to be cleaned, stored, and served with minimum latency through our Hadoop-based Big Data platform.

Time Series vs Logging vs Tracing

Monitoring, or the newer term ‘observability’, is a frequently misunderstood subject. I know this because I see forum posts where person A will ask a question and persons B, C and D will reply with some random nonsense. The landscape of monitoring tools is immense, so a comparison blog that covers everything would quickly get tedious. What I’ll do instead is explain the three types of monitoring that you need, then suggest the best tool to use in each case for Kubernetes.

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

Uber’s Advanced Technologies Group introduces Petastorm, an open source data access library enabling training and evaluation of deep learning models directly from multi-terabyte datasets in Apache Parquet format. In recent years, deep learning has taken a central role in solving a wide range of problems in pattern recognition. At Uber Advanced Technologies Group (ATG), we use deep learning to solve various problems in the autonomous driving space, since many of these are pattern recognition problems.

How we rolled out one of the largest Python 3 migrations ever

Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as pywin32. Though we’ve relied on Python 2 for many years (most recently, we used Python 2.

Netflix’s Production Technology = Voltron

Change management is hard. In everyday production, there are numerous factors working against embracing change. Limited preparation time, whole new show = whole new crew, innumerable planning variables, and the challenge of driving an operational plan based on creative instincts. These are problems that technology is not yet built to solve. Time, training, and education can and will make a dent in our efforts, but creative planning is nuanced, and by nature, human.

Architecture of Nautilus, the new Dropbox search engine

Search presents a unique challenge when it comes to Dropbox due to our massive scale—with hundreds of billions of pieces of content—and also due to the need for providing a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search. This is in contrast to web search engines, where the focus on personalization is almost entirely on the latter aspect, but over a corpus of documents that are largely the same for each user (localities aside).

Rethinking Netflix’s Edge Load Balancing

We briefly touched on some of the load balancing improvements we’ve recently been making in our Open Sourcing Zuul 2 post. In this post, we’ll go into more detail on the whys, hows and results of that work. On the Netflix Cloud Gateway team we are always working to help systems reduce errors, gain higher availability, and improve Netflix’s resilience to failures. We do this because even a low rate of errors at our scale of over a million requests per second can degrade the experience for our members, so every little bit helps.