Why databases use ordered indexes but programming uses hash tables

January 23, 2020

The traditional answer is that hash tables are designed to be efficient when storing data in memory, while B-Trees are designed for slower storage that is accessed in blocks. However, this is not a fundamental property of these data structures. There are hash tables designed to be used on disk (e.g. PostgreSQL’s hash index), many in-memory trees (e.g. Java’s TreeMap, C++’s map), and even in-memory B-Trees.

I think the most important answer is that B-Trees are more ‘general purpose,’ which results in lower ‘total cost’ for very large persistent data. In other words, even though they are slower for the single-value accesses that make up the majority of the workload, they are better when you consider rare operations and the cost of multiple indexes. In this article, I’ll briefly explain the high-level differences between hash tables and B-Trees, then discuss how persistent data has different needs than in-memory data.
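To make the ‘general purpose’ point concrete, here is a minimal sketch of one of those rare operations: a range query. It is an illustration under stated assumptions, not how any database implements its indexes. The sample data and the function names are hypothetical, and a sorted Python list searched with `bisect` stands in for an ordered index; it is not a B-Tree and only shares the property that keys are kept in order.

```python
import bisect

# Hypothetical sample data: user id -> name.
users = {42: "ann", 7: "bob", 19: "cat", 88: "dan", 23: "eve"}

# Hash table: an exact-key lookup is a single probe on average.
hash_index = dict(users)

# Stand-in for an ordered index: the keys kept in sorted order.
# (Assumption for illustration; a real B-Tree stores sorted keys in blocks.)
sorted_keys = sorted(users)


def range_query_ordered(lo, hi):
    """Return keys in [lo, hi] using two binary searches plus a slice."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def range_query_hash(lo, hi):
    """Return keys in [lo, hi] from the hash table.

    The hash table has no useful key order, so every key is examined
    and the matches are sorted afterwards.
    """
    return sorted(k for k in hash_index if lo <= k <= hi)


print(hash_index[42])               # point lookup: fine with either structure
print(range_query_ordered(10, 50))  # touches only the matching region
print(range_query_hash(10, 50))     # same answer, but scans all keys
```

Both range functions return the same keys. The difference is the work involved: the ordered version does two binary searches and reads only the matching slice, while the hash version has to look at every key, which is the kind of cost that matters more as the data gets very large.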

Finally, while I think these are the right defaults in most cases, I’ll try to argue that we should probably use more ordered data structures in memory and more hash tables in databases.

Source: evanjones.ca