Using machine learning to index text from billions of images

October 13, 2018

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous. People have stored more than 20 billion image and PDF files in Dropbox. Of those files, 10-20% are photos of documents—like receipts and whiteboard images—as opposed to documents themselves.

These photos are now candidates for automatic image text recognition. Similarly, 25% of the PDFs are scans of documents, which makes them candidates for automatic text recognition as well.
