Scaling Kubernetes to 2,500 Nodes

We’ve been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate, which make it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some […]

Applying Customer Feedback: How NLP & Deep Learning Improve Uber’s Maps

High-quality map data powers many aspects of the Uber trip experience. Services such as Search, Routing, and Estimated Time of Arrival (ETA) prediction rely on accurate map data to provide a safe, convenient, and efficient experience for riders, drivers, eaters, and delivery-partners. However, map data can become stale over time, reducing its quality. As […]

Say hello to a new AIOps platform: Wave of the future or just another drop in the ocean?

Are self-healing applications on our horizon? CA Technologies’ new AIOps platform hopes to bring us closer to the future of intelligent automation. What are the grand plans for AIOps and is it a potential reality or just another buzzword? Artificial Intelligence for IT Operations, also known as AIOps, gains momentum as our machine learning capabilities […]

Lumen: Custom, Self-Service Dashboarding For Netflix

Netflix generates a lot of data. One of the ways that we gain useful insights is by visualizing that data in dashboards, which allow us to comprehend large amounts of information quickly. This is particularly important when operational issues arise, as our engineers need to be able to quickly diagnose problem areas and work to […]

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks in our driver-partner sign-up process. Over time, the need for more insights has resulted in […]

Time Series vs Logging vs Tracing

Monitoring, or the newer term ‘observability’, is a frequently misunderstood subject. I know this because I see forum posts where person A will ask a question and persons B, C and D will reply with some random nonsense. The landscape of monitoring tools is immense, so a comparison blog that covers everything would quickly get […]

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

Uber’s Advanced Technologies Group introduces Petastorm, an open source data access library enabling training and evaluation of deep learning models directly from multi-terabyte datasets in Apache Parquet format. In recent years, deep learning has taken a central role in solving a wide range of problems in pattern recognition. At Uber Advanced Technologies Group (ATG), we […]

How we rolled out one of the largest Python 3 migrations ever

Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for […]

Netflix’s Production Technology = Voltron

Change management is hard. In everyday production, there are numerous factors working against embracing change. Limited preparation time, whole new show = whole new crew, innumerable planning variables, and the challenge of driving an operational plan based on creative instincts. These are problems that technology is not yet built to solve. Time, training, and education […]