Lumen: Custom, Self-Service Dashboarding For Netflix

Netflix generates a lot of data. One of the ways that we gain useful insights is by visualizing that data in dashboards which allow us to comprehend large amounts of information quickly. This is particularly important when operational issues arise as our engineers need to be able to quickly diagnose problem areas and work to […]

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks in our driver-partner sign-up process. Over time, the need for more insights has resulted in […]

Time Series vs Logging vs Tracing

Monitoring, or the newer term ‘observability’, is a frequently misunderstood subject. I know this because I see forum posts where person A will ask a question and persons B, C and D will reply with some random nonsense. The landscape of monitoring tools is immense so a comparison blog that covers everything would quickly get […]

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

Uber’s Advanced Technologies Group introduces Petastorm, an open source data access library enabling training and evaluation of deep learning models directly from multi-terabyte datasets in Apache Parquet format. In recent years, deep learning has taken a central role in solving a wide range of problems in pattern recognition. At Uber Advanced Technologies Group (ATG), we […]

How we rolled out one of the largest Python 3 migrations ever

Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for […]

Netflix’s Production Technology = Voltron

Change management is hard. In everyday production, there are numerous factors working against embracing change. Limited preparation time, whole new show = whole new crew, innumerable planning variables, and the challenge of driving an operational plan based on creative instincts. These are problems that technology is not yet built to solve. Time, training, and education […]

Architecture of Nautilus, the new Dropbox search engine

Search presents a unique challenge when it comes to Dropbox due to our massive scale—with hundreds of billions of pieces of content—and also due to the need for providing a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different […]

Rethinking Netflix’s Edge Load Balancing

We briefly touched on some of the load balancing improvements we’ve recently been making in our Open Sourcing Zuul 2 post. In this post, we’ll go into more detail on the whys, hows and results of that work. On the Netflix Cloud Gateway team we are always working to help systems reduce errors, gain higher […]

Using machine learning to index text from billions of images

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous. People have stored more than 20 billion image and PDF files in Dropbox. Of those files, 10-20% are photos of documents—like receipts and whiteboard images—as opposed to documents themselves. These are now candidates for automatic image text recognition. Similarly, 25% […]

Reboot Plugin for Linux in Ansible 2.7

New in Ansible 2.7 is a reboot action plugin for Linux. Long overdue, it’s finally easy to reboot Linux hosts with Ansible. This is the development story and all the details on how it works. Source: ansible