Ever since we launched Magic Pocket, our in-house multi-exabyte storage system, we’ve been continuously looking for opportunities to improve efficiency, while maintaining our high standards for reliability. Last year, we pushed the limits of storage density by being the first major tech company to adopt SMR storage. In this post, we’ll discuss another advance in storage technology at Dropbox: a new cold storage tier that’s optimized for less frequently accessed data.
This storage runs on the same SMR disks as our more active data and uses the same internal network. The access characteristics of a file at Dropbox vary heavily over time. Files are accessed very frequently within the first few hours of being uploaded, but significantly less frequently afterwards.
Here is the cumulative distribution function of file accesses for files uploaded in the last year. Over 40% of all file retrievals in Dropbox are for data uploaded in the last day, over 70% for data uploaded in the last month, and over 90% for data uploaded in the last year. This pattern is unsurprising.
A new upload triggers a number of internal systems that fetch the file in order to augment the user experience, such as performing OCR, parsing content to extract search tokens, or generating web previews for Office documents. Users also tend to share new documents, so a file is likely to be synced to other devices soon after upload. In general, people are much more likely to access files they uploaded recently than files they uploaded years ago.
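To make the cumulative distribution concrete, here is a minimal sketch of how per-age retrieval shares like the ones quoted above could be computed from an access log. It is not how Dropbox produced its numbers: the function name `retrieval_share_by_age`, the input shape (one file age per retrieval, meaning access time minus upload time), and the sample data are all assumptions made up for illustration.

```python
from bisect import bisect_right
from datetime import timedelta


def retrieval_share_by_age(ages, thresholds):
    """Fraction of retrievals whose file age at access time is <= each threshold.

    `ages` holds one timedelta per retrieval (hypothetical input format).
    """
    if not ages:
        raise ValueError("need at least one retrieval")
    ordered = sorted(ages)
    # bisect_right counts how many ages are <= the threshold.
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]


# Made-up sample data, not real Dropbox measurements.
sample_ages = [
    timedelta(minutes=5),
    timedelta(hours=2),
    timedelta(hours=20),
    timedelta(days=3),
    timedelta(days=10),
    timedelta(days=25),
    timedelta(days=200),
    timedelta(days=900),
]
thresholds = [timedelta(days=1), timedelta(days=30), timedelta(days=365)]

shares = retrieval_share_by_age(sample_ages, thresholds)
for label, share in zip(["day", "month", "year"], shares):
    print(f"retrievals for data uploaded in the last {label}: {share:.1%}")
```

Because each share counts every retrieval at or below its threshold, the values never decrease as the age window grows, which is what makes the result a cumulative distribution.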
Source: dropbox.com