Data Structures- Hashmaps, Sets, Hash Tables, Hashing and Collisions
Observing hashmaps with python dictionaries
What is a Hashtable/Hashmap?
A hashtable is a data structure that with a collection of key-value pairs, where each key maps to a value, and the keys must be unique and hashable.
In Python, the built-in implementation of a hashtable is called a "dictionary".
The typical time complexity of a hashtable is O(1) in the average case for lookup, insertion, and deletion operations. However, in the worst-case scenario where there are hash collisions and many elements map to the same index in the underlying array, the time complexity can degrade to O(n), where n is the number of elements in the hashtable. To mitigate this, most hashtable implementations use techniques like chaining or open addressing to handle collisions.
What is Hashing and Collision?
Hashing is the process of mapping a given key to a value in a hash table or hashmap, using a hash function. The hash function takes the key as input and produces a hash value or hash code, which is then used to determine the index in the underlying array where the value is stored. The purpose of hashing is to provide a quick and efficient way to access data, by eliminating the need to search through an entire data structure to find a value.
However, it is possible for two different keys to map to the same hash value, resulting in a collision. When a collision occurs, there are different ways to resolve it, depending on the collision resolution strategy used.
Python's dictionary implementation is optimized to handle collisions efficiently, and the performance of the dictionary is generally very good, even in the presence of collisions. However, if the number of collisions is very high, the performance of the dictionary can degrade, so it is important to choose a good hash function that minimizes collisions when designing a Python dictionary.
What is a Set?
my_set = set([1, 2, 3, 2, 1])
print(my_set)
# In the output of the code, you'll notice that the duplicate elements from the list [1, 2, 3, 2, 1] are removed, and the resulting set contains only unique elements [1, 2, 3]. This is because sets only contain unique elements, and the set() function removes duplicates when creating a set from a list or other iterable.
# Sets are related to hashtables/HashMaps because both use a hash function to efficiently retrieve and store elements. In sets, the hash function is used to determine the location of an element in the set, while in hashtables/HashMaps, the hash function is used to map a key to an index in the underlying array. Both data structures offer fast lookup times, making them useful for storing and retrieving large amounts of data. Additionally, both data structures require that their elements be hashable in order to use the hash function.
lover_album = {
"title": "Lover",
"artist": "Taylor Swift",
"year": 2019,
"genre": ["Pop", "Synth-pop"],
"tracks": {
1: "I Forgot That You Existed",
2: "Cruel Summer",
3: "Lover",
4: "The Man",
5: "The Archer",
6: "I Think He Knows",
7: "Miss Americana & The Heartbreak Prince",
8: "Paper Rings",
9: "Cornelia Street",
10: "Death By A Thousand Cuts",
11: "London Boy",
12: "Soon You'll Get Better (feat. Dixie Chicks)",
13: "False God",
14: "You Need To Calm Down",
15: "Afterglow",
16: "Me! (feat. Brendon Urie of Panic! At The Disco)",
17: "It's Nice To Have A Friend",
18: "Daylight"
}
}
# What data structures do you see?
#
#
# Printing the dictionary
print(lover_album)
I see the dictionary and list data structures, list for the genres, dictionary for the tracks.
print(lover_album.get('tracks'))
# or
print(lover_album['tracks'])
Both the above methods work. They do the same thing.
print(lover_album.get('tracks')[4])
# or
print(lover_album['tracks'][4])
You can get specific tracks by using indexes.
lover_album["producer"] = ['Taylor Swift', 'Jack Antonoff', 'Joel Little', 'Taylor Swift', 'Louis Bell', 'Frank Dukes']
# What can you change to make sure there are no duplicate producers?
#
#
# Printing the dictionary
print(lover_album)
To make sure there are no duplicates, you can change it to be a set. In the above code, you can see that when I change it to be set, then there is only one Taylor Swift in the producers list because it removed the duplicates!
lover_album["tracks"].update({19: "All Of The Girls You Loved Before"})
# How would add an additional genre to the dictionary, like electropop?
#
#
# Printing the dictionary
print(lover_album)
You can use the append function to add a genre to the genres list within the dictionary.
for k,v in lover_album.items(): # iterate using a for loop for key and value
print(str(k) + ": " + str(v))
# Write your own code to print tracks in readable format
#
#
In the above code, you can see that I printed out the tracks in a readable format. I even included the track number.
def search():
search = input("What would you like to know about the album?")
if lover_album.get(search.lower()) == None:
print("Invalid Search")
else:
print(lover_album.get(search.lower()))
search()
# This is a very basic code segment, how can you improve upon this code?
#
#
To make this code segment better, you can give some options and give clearer errors. For instance, "your option was not a valid choice, please select a valid choice." And in general, there is no error checking which is not very optimal.
Hacks
- Answer ALL questions in the code segments
- Create a diagram or comparison illustration (Canva).
- What are the pro and cons of using this data structure?
- Dictionary vs List
- Expand upon the code given to you, possible improvements in comments
-
Build your own album showing features of a python dictionary
-
For Mr. Yeung's class: Justify your favorite Taylor Swift song, answer may effect seed
Starboy_album = {
"title": "Starboy",
"artist": "The Weeknd",
"year": 2016,
"genre": ["Rap", "Chill"],
"tracks": {
1: ["Starboy", 9],
2: ["Party Monster", 4],
3: ["False Alarm", 10],
4: ["Reminder", 6],
5: ["Rockin", 10],
6: ["Secrets", 8],
7: ["True colors", 5],
8: ["Stargirl interlude", 7],
9: ["sidewalks", 9],
10: ["Sixfeet under", 8],
11: ["Love to lay", 6],
12: ["A lonely night", 7],
13: ["Attention", 9],
14: ["Ordinary life", 8],
15: ["Nothing without you", 9],
16: ["All I know", 10],
17: ["Die for you", 6],
18: ["I feel it coming", 6]
}
}
# Printing the dictionary
print(Starboy_album)