Python Learning the Design Principles of Dictionary Data Structures

Conclusion first:

Python’s dict uses a hash table with open addressing (specifically quadratic probing + pseudo-random hopping) under the hood.

Core Structure: A hash table is essentially an “array”.

Each dict corresponds to an underlying array (called table). Each slot in this array may hold a key-value pair.

text

index:   0  1  2  3  4  5  6  7
value:  [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

Any input key is hashed (hash(key)).

python

d["abc"] = 123
# Python calculates h = hash("abc") → gets an integer, e.g., 918273645
# slot = h % table_size → e.g., 918273645 % 8 = 5

Thus, the key is placed in slot 5.

text

index:   0  1  2  3  4  5       6  7
value:  [ ] [ ] [ ] [ ] [ ] [('abc',123)] [ ] [ ]

If the calculated slot is already occupied, the dict does not use chaining (like a linked list). Instead, it finds the next empty slot using Open Addressing.

text

slot 6 empty? → Put here.
slot 6 also taken? → Check slot 7.
...

The dict controls the “load factor” (usage ratio). For example, if slot usage exceeds ~2/3, it automatically expands (resizes). After expansion, there is much more space, collisions become rarer, thus dict performance remains O(1).

Resizing operation:

  • The table size is doubled.
  • All keys are rehashed and placed into the new table (rehash).

Considering specific application scenarios: Lookup and insertion operations in a dict have O(1) time complexity on average.

1. Lookup Process

Looking up d[key]:

  1. Calculate hash(key).
  2. Locate the slot: slot = hash % table_size.
  3. Check if the key in that slot is the target key.
    • Yes → Found.
    • No → Continue probing using the open addressing rule.

Therefore, the efficiency of this process depends on the underlying principle of the hash algorithm. The hash function generally involves bitwise operations + randomization.


2. Practical Code Examples

python

# 1. Create an empty dict
adict = {}

# 2. Insert a key-value pair
# The key is hashed, a slot is found (potentially via probing), and the pair is stored.
adict[key] = value

# 3. Check if a key exists
if i not in adict:  # Checks if 'i' is a key in 'adict'
    # This operation involves calculating hash(i) and probing the table.
    pass