Hash Tables
A Hash Table (also called Hash Map) is a data structure that maps keys to values for highly efficient lookups. By using a hash function to compute array indices, hash tables achieve average O(1) time complexity for search, insertion, and deletion operations.
Interactive Hash Table Visualization
Watch keys move through the hash function, see collisions resolved through separate chaining, and observe how search operations traverse chains to find values.
Hash Table Operations (Separate Chaining)
Insert Key-Value Pair
Add or update a key-value pair
Search Key
Find a key in the hash table
Hash Table Visualization
Hash Function
Sum of character codes % 7
hash(key) = Σ(char codes) % 7
Total Items: 0 | Load Factor: 0.00 | Buckets: 7
How Hash Tables Work
Converts a key into an array index. Our simple function sums character codes and uses modulo to ensure the result fits in our 7-bucket array.
Separate Chaining: When multiple keys hash to the same index (collision), we store them in a linked list within that bucket.
Average: O(1) for insert, search, and delete operations.
Worst case: O(n) when all keys hash to the same bucket.
Load Factor: 0.00 (items/buckets). Keep below 0.75 for good performance.
Core Components of Hash Tables
Array of Buckets
The underlying storage mechanism:
- • Fixed-size array of storage locations
- • Each bucket can hold one or more key-value pairs
- • Index computed by the hash function
- • Direct access to any bucket in O(1) time
Hash Function
The key to efficient mapping:
- • Converts keys into array indices
- • Deterministic: same key → same index
- • Uniform distribution minimizes collisions
- • Fast computation for performance
Collision Handling
When multiple keys map to same index:
- • Inevitable: more keys than buckets
- • Separate Chaining: linked lists in buckets
- • Open Addressing: find alternative slots
- • Strategy choice affects performance
Hash Functions & Their Importance
What Makes a Good Hash Function?
- • Uniform Distribution: Spreads keys evenly across buckets
- • Deterministic: Always produces the same result for the same input
- • Fast Computation: Should be computed quickly (O(1) or O(k))
- • Avalanche Effect: Small input changes cause large output changes
- • Low Collision Rate: Minimizes keys mapping to same index
Common Hash Functions:
Division Method: h(k) = k mod m
Multiplication Method: h(k) = ⌊m(kA mod 1)⌋
String Hashing: Σ(char_code * base^i) mod m
Load Factor & Performance
The load factor (α) is the ratio of stored elements to total buckets: α = n/m
Low Load Factor (α < 0.5)
Few collisions, excellent performance, but wastes memory
Moderate Load Factor (0.5 ≤ α ≤ 0.75)
Good balance of performance and memory usage (recommended)
High Load Factor (α > 0.75)
Many collisions, degraded performance, time to resize
Separate Chaining Collision Resolution
How Separate Chaining Works
- Each bucket contains a linked list (or chain) of elements
- Hash function determines which bucket to use
- Multiple keys that hash to the same index are stored in the same bucket's chain
- Search operation traverses the chain looking for the target key
- Insertion adds to the beginning or end of the chain
- Deletion removes from the chain and updates links
Advantages:
- • Simple to implement and understand
- • Handles any number of collisions
- • Performance degrades gracefully
- • Easy to delete elements
Memory Layout Example
Considerations:
- • Extra memory overhead for pointers
- • Cache performance may suffer with long chains
- • Worst case: all keys hash to same bucket → O(n)
Complexity Analysis
| Operation | Average Case | Worst Case | Space | Explanation |
|---|---|---|---|---|
| Search | O(1) | O(n) | O(1) | Hash to bucket + traverse chain (avg length = α) |
| Insertion | O(1) | O(n) | O(1) | Hash + check for duplicates + insert at head/tail |
| Deletion | O(1) | O(n) | O(1) | Hash + search in chain + remove node |
| Space Overhead | O(n) | O(n) | - | Bucket array + node pointers in chains |
Performance Factors
Average O(1) assumes:
- Good hash function with uniform distribution
- Reasonable load factor (α ≤ 0.75)
- Random key distribution
Worst O(n) occurs when:
- All keys hash to the same bucket
- Poor hash function or adversarial input
- Extremely high load factor
Hash Table Implementation with Separate Chaining
Complete hash table implementations using arrays of linked lists for collision resolution. Study how the hash function maps keys to indices and how chaining handles multiple keys in the same bucket.
class HashNode:
"""
Node class for storing key-value pairs in the hash table
Used in separate chaining collision resolution
"""
def __init__(self, key, value):
self.key = key
self.value = value
self.next = None
class HashTable:
"""
Hash Table implementation using separate chaining for collision resolution
Uses an array of linked lists (buckets) to handle collisions
"""
def __init__(self, initial_capacity=7):
self.capacity = initial_capacity
self.size = 0
# Initialize array of empty buckets (linked lists)
self.buckets = [None] * self.capacity
def _hash(self, key):
"""
Simple hash function: sum of character codes mod capacity
Time Complexity: O(k) where k is the length of key
"""
if isinstance(key, int):
return key % self.capacity
hash_value = 0
for char in str(key):
hash_value += ord(char)
return hash_value % self.capacity
def insert(self, key, value):
"""
Insert a key-value pair into the hash table
Time Complexity: O(1) average, O(n) worst case
Space Complexity: O(1)
"""
index = self._hash(key)
# If bucket is empty, create first node
if self.buckets[index] is None:
self.buckets[index] = HashNode(key, value)
self.size += 1
return
# Traverse the chain to check for existing key or find end
current = self.buckets[index]
while current:
# Update existing key
if current.key == key:
current.value = value
return
# Reached end of chain, add new node
if current.next is None:
current.next = HashNode(key, value)
self.size += 1
return
current = current.next
def search(self, key):
"""
Search for a key in the hash table
Time Complexity: O(1) average, O(n) worst case
Space Complexity: O(1)
"""
index = self._hash(key)
current = self.buckets[index]
# Traverse the chain looking for the key
while current:
if current.key == key:
return current.value
current = current.next
# Key not found
raise KeyError(f"Key '{key}' not found in hash table")
def delete(self, key):
"""
Delete a key-value pair from the hash table
Time Complexity: O(1) average, O(n) worst case
Space Complexity: O(1)
"""
index = self._hash(key)
current = self.buckets[index]
# Empty bucket
if current is None:
raise KeyError(f"Key '{key}' not found in hash table")
# Key is in the first node
if current.key == key:
self.buckets[index] = current.next
self.size -= 1
return current.value
# Search in the chain
while current.next:
if current.next.key == key:
deleted_value = current.next.value
current.next = current.next.next
self.size -= 1
return deleted_value
current = current.next
# Key not found
raise KeyError(f"Key '{key}' not found in hash table")
def get_load_factor(self):
"""Calculate the load factor (size / capacity)"""
return self.size / self.capacity
def display(self):
"""Display the hash table structure for debugging"""
print(f"Hash Table (Size: {self.size}, Capacity: {self.capacity})")
print(f"Load Factor: {self.get_load_factor():.2f}")
for i, head in enumerate(self.buckets):
print(f"Bucket {i}: ", end="")
current = head
while current:
print(f"({current.key}: {current.value})", end="")
if current.next:
print(" -> ", end="")
current = current.next
print(" -> None" if head else "Empty")
def __len__(self):
return self.size
def __contains__(self, key):
try:
self.search(key)
return True
except KeyError:
return False
# Example usage and testing
if __name__ == "__main__":
# Create hash table
ht = HashTable()
# Test insertions
ht.insert("apple", "A red fruit")
ht.insert("banana", "A yellow fruit")
ht.insert("cat", "A feline pet")
ht.insert("dog", "A canine pet")
ht.insert("elephant", "A large mammal")
print("After insertions:")
ht.display()
# Test searches
try:
print(f"\nSearching for 'apple': {ht.search('apple')}")
print(f"Searching for 'cat': {ht.search('cat')}")
print(f"'dog' in hash table: {'dog' in ht}")
except KeyError as e:
print(f"Search error: {e}")
# Test update
ht.insert("apple", "A green fruit") # Update existing key
print(f"\nAfter updating 'apple': {ht.search('apple')}")
# Test deletion
try:
deleted = ht.delete("banana")
print(f"\nDeleted 'banana': {deleted}")
ht.display()
except KeyError as e:
print(f"Delete error: {e}") Real-World Applications & Advanced Topics
Database Indexing
Hash indexes provide O(1) lookups for exact-match queries in database systems.
Caching Systems
In-memory caches like Redis use hash tables for fast key-value storage and retrieval.
Programming Languages
Built-in dictionaries (Python), objects (JavaScript), and maps (Java) are hash table implementations.
Symbol Tables
Compilers use hash tables to store variable names, function definitions, and scoping information.
Distributed Systems
Consistent hashing distributes data across nodes in distributed databases and load balancers.
Cryptography
Cryptographic hash functions provide data integrity, digital signatures, and blockchain proof-of-work.
Advanced Hash Table Concepts
Dynamic Resizing: Automatically resize the table when load factor exceeds threshold, rehashing all elements.
Robin Hood Hashing: Open addressing variant that minimizes variance in probe distances.
Cuckoo Hashing: Guarantees O(1) worst-case lookup time using two hash functions.
Bloom Filters: Space-efficient probabilistic data structure for set membership testing.
Consistent Hashing: Distributes keys across changing sets of servers with minimal rehashing.
Perfect Hashing: Specialized hash functions that guarantee no collisions for static key sets.