Hash tables or Dictionaries(as referred to in Python) are associative arrays. From
Wikipedia associative arrays are defined as a collection of key, value pairs such that each key appears at most once in the hash table.
Question: Why can't we use arrays?
Answer: Because, when we use arrays its difficult to find an element in the array since the searching will loop through all the elements in the array until the element is found. This will compromise efficiency if the array is large in size. This problem is solved in hash table as elements can be accessed quickly without looping through the entire array.
Okay lets dig deeper...
I am going to explain this one with an example that we see daily. In this example we are going to store all members of a family relations and their names in a hash.
So the hash name is
family_dict = {}
I am listing out all the elements I am going to store in it. It is going to contain wife, son, daughter, friend, father, mother....
All these relations have a name that we can call with. Now to build a hash we need keys and values. Identifying keys and values is the important thing because ultimately it will satisfy our need to use hash data structure.
In our hash we are going to store relations and their names. Before that one thing we all need to keep in mind while we build a hash is that keys should be unique and values we will not care about them until there is really a need.
So our family hash needs unique things as keys. Names can't be unique as many people can be named with same name. This is as simple as that. Therefore, our hash is going to contain relations as keys and names as its corresponding values.
Now there will be a question what if our relations can also be same like when we have many brothers/sisters. Simply we are going to manipulate our keys are brother1, sister2 to make them unique. Enough of theory now and we will start our implementation.
family_dict = {
"me": "Mr.x",
"father": "Mr.y",
"mother": "Mrs.z",
"son":"kid1",
"daughter1":"d1"
"daughter2":"d2",
"wife":"w1"
}
This seems simple. Each element mapped to its corresponding value. Now think of a situation where we might need to point same key to many values. For it lets assume Mr.x is a bit cheeky and has another
wife w2. One just can't put two wives in same house. So in computers what happens is when you add same key with different value like
"wife":"w2". Our hash would store only one key that would be the one added last. That is previous keys are forgotten or overridden when same keys are added.
To solve this issue MR.x would compromise with his family and come to an agreement to put two of them in the same hut. But how do we do it here? No delay just scroll down.
family_dict = {
"me": "Mr.x",
"father": "Mr.y",
"mother": "Mrs.z",
"son":"kid1",
"daughter1":"d1"
"daughter2":"d2",
"wife":"w1, w2"
}
If you observe clearly we just updated our hash key such that its value holds the previous one too. So in real time programming we should always check for existence of a key in hash if it already exists so as to make sure all our values being taken care of and none are overridden because of duplicate entry.
And this is how is is done. Now think of a hash which is going to store how many times a word occurs in the given text. For this we will store each word as a hash and its count as value. So everytime a word is revisited we will check for the value of the word and increment it by one.
if word not in wordcount:
wordcount[word] = 1
else:
wordcount[word] += 1
And finally we will print our hash:
#print the dictionary with sorted keys(words) and count as values
for k in sorted(wordcount):
print (k, wordcount[k])
This has been a bit long post but thanks for coming here.
Bonus:
Finding word frequency using Python