Hash Tables

A Hash Table (also called Hash Map) is a data structure that maps keys to values for highly efficient lookups. By using a hash function to compute array indices, hash tables achieve average O(1) time complexity for search, insertion, and deletion operations.

Interactive Hash Table Visualization

Watch keys move through the hash function, see collisions resolved through separate chaining, and observe how search operations traverse chains to find values.

Hash Table Operations (Separate Chaining)

Insert Key-Value Pair

Add or update a key-value pair

Search Key

Find a key in the hash table

1000ms

Hash Table Visualization

Hash Function

Sum of character codes % 7

hash(key) = Σ(char codes) % 7

Bucket 0
Empty
Bucket 1
Empty
Bucket 2
Empty
Bucket 3
Empty
Bucket 4
Empty
Bucket 5
Empty
Bucket 6
Empty

Total Items: 0 | Load Factor: 0.00 | Buckets: 7

How Hash Tables Work

Hash Function:

Converts a key into an array index. Our simple function sums character codes and uses modulo to ensure the result fits in our 7-bucket array.

Separate Chaining: When multiple keys hash to the same index (collision), we store them in a linked list within that bucket.

Performance:

Average: O(1) for insert, search, and delete operations.

Worst case: O(n) when all keys hash to the same bucket.

Load Factor: 0.00 (items/buckets). Keep below 0.75 for good performance.

Core Components of Hash Tables

Array of Buckets

The underlying storage mechanism:

  • Fixed-size array of storage locations
  • Each bucket can hold one or more key-value pairs
  • Index computed by the hash function
  • Direct access to any bucket in O(1) time

Hash Function

The key to efficient mapping:

  • Converts keys into array indices
  • Deterministic: same key → same index
  • Uniform distribution minimizes collisions
  • Fast computation for performance

Collision Handling

When multiple keys map to same index:

  • Inevitable: more keys than buckets
  • Separate Chaining: linked lists in buckets
  • Open Addressing: find alternative slots
  • Strategy choice affects performance

Hash Functions & Their Importance

What Makes a Good Hash Function?

  • Uniform Distribution: Spreads keys evenly across buckets
  • Deterministic: Always produces the same result for the same input
  • Fast Computation: Should be computed quickly (O(1) or O(k))
  • Avalanche Effect: Small input changes cause large output changes
  • Low Collision Rate: Minimizes keys mapping to same index

Common Hash Functions:

Division Method: h(k) = k mod m

Multiplication Method: h(k) = ⌊m(kA mod 1)⌋

String Hashing: Σ(char_code * base^i) mod m

Load Factor & Performance

The load factor (α) is the ratio of stored elements to total buckets: α = n/m

Low Load Factor (α < 0.5)

Few collisions, excellent performance, but wastes memory

Moderate Load Factor (0.5 ≤ α ≤ 0.75)

Good balance of performance and memory usage (recommended)

High Load Factor (α > 0.75)

Many collisions, degraded performance, time to resize

Separate Chaining Collision Resolution

How Separate Chaining Works

  1. Each bucket contains a linked list (or chain) of elements
  2. Hash function determines which bucket to use
  3. Multiple keys that hash to the same index are stored in the same bucket's chain
  4. Search operation traverses the chain looking for the target key
  5. Insertion adds to the beginning or end of the chain
  6. Deletion removes from the chain and updates links

Advantages:

  • • Simple to implement and understand
  • • Handles any number of collisions
  • • Performance degrades gracefully
  • • Easy to delete elements

Memory Layout Example

Bucket Array:
0: ["cat":"🐱"] → ["hat":"🎩"] → null
1: null
2: ["dog":"🐶"] → null
3: ["apple":"🍎"] → ["table":"📱"] → null
4: null
5: ["fish":"🐟"] → null
6: ["banana":"🍌"] → null

Considerations:

  • • Extra memory overhead for pointers
  • • Cache performance may suffer with long chains
  • • Worst case: all keys hash to same bucket → O(n)

Complexity Analysis

Operation Average Case Worst Case Space Explanation
Search O(1) O(n) O(1) Hash to bucket + traverse chain (avg length = α)
Insertion O(1) O(n) O(1) Hash + check for duplicates + insert at head/tail
Deletion O(1) O(n) O(1) Hash + search in chain + remove node
Space Overhead O(n) O(n) - Bucket array + node pointers in chains

Performance Factors

Average O(1) assumes:

  • Good hash function with uniform distribution
  • Reasonable load factor (α ≤ 0.75)
  • Random key distribution

Worst O(n) occurs when:

  • All keys hash to the same bucket
  • Poor hash function or adversarial input
  • Extremely high load factor

Hash Table Implementation with Separate Chaining

Complete hash table implementations using arrays of linked lists for collision resolution. Study how the hash function maps keys to indices and how chaining handles multiple keys in the same bucket.

class HashNode:
    """
    Node class for storing key-value pairs in the hash table
    Used in separate chaining collision resolution
    """
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

class HashTable:
    """
    Hash Table implementation using separate chaining for collision resolution
    Uses an array of linked lists (buckets) to handle collisions
    """
    
    def __init__(self, initial_capacity=7):
        self.capacity = initial_capacity
        self.size = 0
        # Initialize array of empty buckets (linked lists)
        self.buckets = [None] * self.capacity
    
    def _hash(self, key):
        """
        Simple hash function: sum of character codes mod capacity
        Time Complexity: O(k) where k is the length of key
        """
        if isinstance(key, int):
            return key % self.capacity
        
        hash_value = 0
        for char in str(key):
            hash_value += ord(char)
        return hash_value % self.capacity
    
    def insert(self, key, value):
        """
        Insert a key-value pair into the hash table
        Time Complexity: O(1) average, O(n) worst case
        Space Complexity: O(1)
        """
        index = self._hash(key)
        
        # If bucket is empty, create first node
        if self.buckets[index] is None:
            self.buckets[index] = HashNode(key, value)
            self.size += 1
            return
        
        # Traverse the chain to check for existing key or find end
        current = self.buckets[index]
        while current:
            # Update existing key
            if current.key == key:
                current.value = value
                return
            
            # Reached end of chain, add new node
            if current.next is None:
                current.next = HashNode(key, value)
                self.size += 1
                return
            
            current = current.next
    
    def search(self, key):
        """
        Search for a key in the hash table
        Time Complexity: O(1) average, O(n) worst case
        Space Complexity: O(1)
        """
        index = self._hash(key)
        current = self.buckets[index]
        
        # Traverse the chain looking for the key
        while current:
            if current.key == key:
                return current.value
            current = current.next
        
        # Key not found
        raise KeyError(f"Key '{key}' not found in hash table")
    
    def delete(self, key):
        """
        Delete a key-value pair from the hash table
        Time Complexity: O(1) average, O(n) worst case
        Space Complexity: O(1)
        """
        index = self._hash(key)
        current = self.buckets[index]
        
        # Empty bucket
        if current is None:
            raise KeyError(f"Key '{key}' not found in hash table")
        
        # Key is in the first node
        if current.key == key:
            self.buckets[index] = current.next
            self.size -= 1
            return current.value
        
        # Search in the chain
        while current.next:
            if current.next.key == key:
                deleted_value = current.next.value
                current.next = current.next.next
                self.size -= 1
                return deleted_value
            current = current.next
        
        # Key not found
        raise KeyError(f"Key '{key}' not found in hash table")
    
    def get_load_factor(self):
        """Calculate the load factor (size / capacity)"""
        return self.size / self.capacity
    
    def display(self):
        """Display the hash table structure for debugging"""
        print(f"Hash Table (Size: {self.size}, Capacity: {self.capacity})")
        print(f"Load Factor: {self.get_load_factor():.2f}")
        
        for i, head in enumerate(self.buckets):
            print(f"Bucket {i}: ", end="")
            current = head
            while current:
                print(f"({current.key}: {current.value})", end="")
                if current.next:
                    print(" -> ", end="")
                current = current.next
            print(" -> None" if head else "Empty")
    
    def __len__(self):
        return self.size
    
    def __contains__(self, key):
        try:
            self.search(key)
            return True
        except KeyError:
            return False

# Example usage and testing
if __name__ == "__main__":
    # Create hash table
    ht = HashTable()
    
    # Test insertions
    ht.insert("apple", "A red fruit")
    ht.insert("banana", "A yellow fruit")
    ht.insert("cat", "A feline pet")
    ht.insert("dog", "A canine pet")
    ht.insert("elephant", "A large mammal")
    
    print("After insertions:")
    ht.display()
    
    # Test searches
    try:
        print(f"\nSearching for 'apple': {ht.search('apple')}")
        print(f"Searching for 'cat': {ht.search('cat')}")
        print(f"'dog' in hash table: {'dog' in ht}")
    except KeyError as e:
        print(f"Search error: {e}")
    
    # Test update
    ht.insert("apple", "A green fruit")  # Update existing key
    print(f"\nAfter updating 'apple': {ht.search('apple')}")
    
    # Test deletion
    try:
        deleted = ht.delete("banana")
        print(f"\nDeleted 'banana': {deleted}")
        ht.display()
    except KeyError as e:
        print(f"Delete error: {e}")

Real-World Applications & Advanced Topics

Database Indexing

Hash indexes provide O(1) lookups for exact-match queries in database systems.

Caching Systems

In-memory caches like Redis use hash tables for fast key-value storage and retrieval.

Programming Languages

Built-in dictionaries (Python), objects (JavaScript), and maps (Java) are hash table implementations.

Symbol Tables

Compilers use hash tables to store variable names, function definitions, and scoping information.

Distributed Systems

Consistent hashing distributes data across nodes in distributed databases and load balancers.

Cryptography

Cryptographic hash functions provide data integrity, digital signatures, and blockchain proof-of-work.

Advanced Hash Table Concepts

Dynamic Resizing: Automatically resize the table when load factor exceeds threshold, rehashing all elements.

Robin Hood Hashing: Open addressing variant that minimizes variance in probe distances.

Cuckoo Hashing: Guarantees O(1) worst-case lookup time using two hash functions.

Bloom Filters: Space-efficient probabilistic data structure for set membership testing.

Consistent Hashing: Distributes keys across changing sets of servers with minimal rehashing.

Perfect Hashing: Specialized hash functions that guarantee no collisions for static key sets.