Implementing HashMaps in Python

Hash Maps should try to provide us with constant time look ups. Arrays are good at providing us with constant look up time.

Now only if we could convert value to some sort of index ?
A function to map value to an array index. Thats what the hash function does.

In a Hash-Map, the keys are an unordered collection of data (Set). So, we don't know their order and they are all unique.
The keys in hash-map gets into our hash-function and outputs the index of the array.

There are 2 ways of implementing this hash functon.

A unique index for each key

Thats not exactly possible, since whatever function we choose, its always possible that it generates the same index for 2 different keys.
But it's possible if we:

  • Implement our hash-function such that it gives out widely spread values.

  • Every time a collision occurs, modify the hash-function such that their is no collision for any value currently in our data sets.

Use buckets

Instead of storing only one value at each index, we store a list of values. This is known as bucket.
So, each index can store more than one value. In case we have to search for a key, and if it happens to be inside of a bucket, we'll have to perform linear search.
In worst case it's O(n), but its rarely the case.

So, we have a tradeoff. Unique index implementation is the best, if space isn't a constraint. Otherwise we'll have to use Buckets.

Lets implement a Hash-Table which'll have string as keys.

Hash function

How about ASCII(str[0])*31 + ASCII(str[1]) ?

Key should be atleast 2 characters

class HashTable(object):  
    def __init__(self):
        self.table = [None]*10000

    def store(self, string):
        """Input a string that's stored in
        the table."""
        hash_index = self.calculate_hash_value(string)
        if self.table[hash_index]:
            self.table[hash_index]= [string]

    def lookup(self, string):
        """Return the hash value if the
        string is already in the table.
        Return -1 otherwise."""
        index = self.calculate_hash_value(string)
        if self.table[index]:
            for each_key in self.table[index]:
                if each_key == string:
                    return string
        return -1

    def calculate_hash_value(self, string):
        """Helper function to calulate a
        hash value from a string."""
        return ord(string[0])*100+ord(string[1])

# Setup
hash_table = HashTable()

print hash_table.calculate_hash_value('Yask')

print hash_table.lookup('Yask')

# Test store'Yask')  
  print hash_table.lookup('Yask')