Using machine learning to index text from billions of images

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous. People have stored more than 20 billion image and PDF files in Dropbox. Of those files, 10-20% are photos of documents—like receipts and whiteboard images—as opposed to documents themselves. These are now candidates for automatic image text recognition. Similarly, 25% of these PDFs are scans of documents that are also candidates for automatic text recognition.

Reboot Plugin for Linux in Ansible 2.7

New in Ansible 2.7 is a reboot action plugin for Linux. Long overdue, it’s finally easy to reboot Linux hosts with Ansible. This is the development story and all the details on how it works. Source:

Dropbox traffic infrastructure: Edge network

In this post we will describe the Edge network part of Dropbox traffic infrastructure. This is an extended transcript of our NginxConf 2018 presentation. Around the same time last year we described low-level aspects of our infra in the Optimizing web servers for high throughput and low latency post. This time we’ll cover higher-level things like our points of presence around the world, GSLB, RUM DNS, L4 loadbalancers, nginx setup and its dynamic configuration, and a bit of gRPC proxying.