It turns out that for integer hash tables implementations the identity function (f(x)=x) is preferred in most cases. This is a trade-off between speed and desirable properties such as uniform distribution that works well for integers. In essence the speed of attempted lookups is so fast that we can afford the increased number of collisions a naive hash function will give us.

For byte arrays such as strings this is not the case, and instead the opposite is true. Optimising the lookup algorithm is less important, and the properties of the hash function much more so.

To choose a good hash function we need to look at both speed, how quickly the hash is calculated, and how likely collisions are to occur in a given use case. We will start with looking at speed, or data throughput, of some of the most interesting potential candidates.

# The benchmark

In this micro-benchmark…

View original post 220 more words