Uber AI Labs introduces a method for making image-processing neural networks faster and more accurate by leveraging JPEG representations. Neural networks, an important tool for processing data in a variety of industries, have grown from an academic research area into a cornerstone of industry over the last few years. Convolutional Neural Networks (CNNs) have been particularly useful for extracting information from images, whether classifying them, recognizing faces, or evaluating board positions in Go.
At Uber, we use CNNs for an assortment of purposes, from detecting objects and predicting their motion to processing petabytes of street-level and satellite images to improve our maps. When using a CNN, we care about how accurately it completes its task, and in many cases we also care about its speed. In the two examples above, a network that runs twice as fast could enable real-time rather than offline detection, or could process an enormous dataset in one week of data center time instead of two.
In this article, we describe an approach presented at NeurIPS 2018 for making CNNs smaller, faster, and more accurate all at the same time by hacking libjpeg and leveraging the internal image representations already used by JPEG, the popular image format. An earlier version of this work was presented as an ICLR workshop poster in June 2018. We also discuss some surprising insights about frequency space and color information and how they bear on network architecture design.