Thursday, July 5, 2007

Hash Tables....

It's a fairly constant amusement to me the lack of forethought put into certain details when other details are obsessed over. A great example can be found in the realm of the hash table. Now, I don't want to rant about basic programming practice, but it's my blog so I can do what I want.

Here is the scenario: our intrepid programmer friend has spent much time developing a 32bit hash table key. Now, the hash key is in reality 4 characters, the last of which is least important and often duplicated. Satisfied that the hash key works well, our friend creates the hash class, and decides that 256 entries is a nice round number.

Thus happens the line of code: arrayindex = hashkey % 256;

In this case, the modulo has left nothing more than the last byte of the key as significant. The last byte of our hash key is the *least* important and most duplicated, so our friend has managed nothing more than to remove a few entries from a linear searched list.

That, however, pales in comparison to this blunder done by using a word-aligned pointer as a hash key. arrayindex = hashkey % 100;

Now, this code looks harmless enough, until you consider the small fact that 100 equals 25 * 4. The 4 factor cancels out nicely because the integer value of the pointer must be a multiple of 4, and at this point, the programmer has essentially created a 25 bucket hash table using 100 buckets worth of space.

The funny thing is - I found BOTH of these in real code. The point? If you are going to spend hours working on one portion of a data structure, keep in mind how the rest works to know if you are wasting that time or not...

No comments: