In this article, I’ll introduce some simple methods that can improve the speed of Python for loops by 1.3x to 970x.
Python has a built-in utility called the timeit module. In the following sections, we’ll use it to measure the current and improved performance of loops.
For each method, we establish a baseline by running a test that includes executing the tested function 100K times (loops) over 10 test runs, then calculating the average time per loop (in nanoseconds, ns).
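As a rough sketch of what such a harness can look like (the function name, input, and counts here are illustrative and smaller than the article's 100K loops over 10 runs, for brevity):

```python
import timeit

def square_all(numbers):
    # Toy function under test: raise every number to the power 2.5 in a loop
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

numbers = list(range(10))

# timeit.repeat runs the callable `number` times per run, `repeat` runs total;
# we take the best run and divide to get an average time per call
runs = timeit.repeat(lambda: square_all(numbers), number=10_000, repeat=5)
per_loop_ns = min(runs) / 10_000 * 1e9
print(f"{per_loop_ns:.3f} ns per loop")
```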
Several Simple Methods
1. List Comprehension
```python
# Baseline version (inefficient way)
# Calculating the power of numbers
# without using a list comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version
# (using a list comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output
```
Results:
```text
# Summary Of Test Results
Baseline: 32.158 ns per loop
Improved: 16.040 ns per loop
% Improvement: 50.1 %
Speedup: 2.00x
```
Using list comprehension results in a 2x speedup.
2. Calculate Length Outside the Loop
If you need to iterate based on the length of a list, calculate it outside the for loop.
```python
# Baseline version (inefficient way)
# (length calculation inside the for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version
# (length calculation outside the for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list
```
By moving the list length calculation outside the for loop, speed increases by 1.6x; this might be a lesser-known trick.
```text
# Summary Of Test Results
Baseline: 112.135 ns per loop
Improved: 68.304 ns per loop
% Improvement: 39.1 %
Speedup: 1.64x
```
3. Use Sets
Use sets when performing comparisons in for loops.
```python
# Baseline version (inefficient way)
# (nested lookups using a for loop)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version
# (sets replace the nested lookups)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items
```
Using sets instead of nested for loops for comparisons results in a 498x speedup.
```text
# Summary Of Test Results
Baseline: 9047.078 ns per loop
Improved: 18.161 ns per loop
% Improvement: 99.8 %
Speedup: 498.17x
```
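As a side note, the same intersection can also be written with the `&` operator, which is equivalent to calling `intersection`; a quick sketch with made-up input lists:

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [4, 5, 6, 7]

# set.intersection and the & operator are equivalent;
# both do hashed membership tests in C rather than a Python-level scan
common = set(list_1) & set(list_2)
print(sorted(common))  # [4, 5]
```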
4. Skip Irrelevant Iterations
Avoid redundant calculations by skipping irrelevant iterations.
```python
# Baseline version (inefficient way)
# Finds the first even square in a list of numbers
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None  # No even square found

# Improved version
# (skips odd numbers, whose squares can never be even)
def function_do_something_v1(numbers):
    even_numbers = [n for n in numbers if n % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None  # No even square found
```
This method requires thoughtful design of the loop’s content, and the improvement may vary based on the actual scenario.
```text
# Summary Of Test Results
Baseline: 16.912 ns per loop
Improved: 8.697 ns per loop
% Improvement: 48.6 %
Speedup: 1.94x
```
5. Inline Code
In some cases, directly inlining the code of a simple function into the loop can enhance compactness and execution speed.
```python
# Example of inefficient code:
# a loop that calls the is_prime function n times
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    # Baseline version (inefficient way)
    # (calls the is_prime function n times)
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

def test_05_v1(n):
    # Improved version
    # (inlines the logic of the is_prime function)
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count
```
This approach yields a 1.3x improvement.
```text
# Summary Of Test Results
Baseline: 1271.188 ns per loop
Improved: 939.603 ns per loop
% Improvement: 26.1 %
Speedup: 1.35x
```
Why is that?
Calling functions involves overhead, such as pushing and popping variables on the stack, function lookups, and argument passing. When a simple function is repeatedly called within a loop, the overhead of function calls accumulates and impacts performance. Therefore, inlining the function’s code directly into the loop eliminates this overhead, potentially leading to significant speed improvements.
⚠️ However, it’s important to balance code readability with the frequency of function calls.
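A minimal way to see the trade-off itself (the names below are ours, not part of the benchmark suite): both loops compute the same sum, one through a function call per iteration and one with the addition inlined.

```python
def add_one(x):
    # Trivial function: the work is a single addition,
    # so per-call overhead dominates its cost
    return x + 1

def sum_with_calls(n):
    total = 0
    for _ in range(n):
        total = add_one(total)  # one function call per iteration
    return total

def sum_inlined(n):
    total = 0
    for _ in range(n):
        total = total + 1  # same work, no call overhead
    return total

print(sum_with_calls(1_000), sum_inlined(1_000))  # 1000 1000
```

Timing the two with timeit typically shows the inlined loop ahead, with the gap depending on Python version and how trivial the function body is.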
Some Handy Tricks
6. Avoid Repetition
Avoid repetitive calculations: some computations may be redundant and slow down the code. Instead, precompute values where applicable.
```python
def test_07_v0(n):
    # Baseline version (inefficient way)
    # Repeats the i * j calculation inside a nested loop
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

def test_07_v1(n):
    # Improved version
    # (uses precomputed values to speed things up)
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        result += sum(pv[i])
    return result
```
Results:
```text
# Summary Of Test Results
Baseline: 139.146 ns per loop
Improved: 92.325 ns per loop
% Improvement: 33.6 %
Speedup: 1.51x
```
7. Use Generators
Generators support lazy evaluation, meaning expressions inside them are evaluated only when you request the next value. This dynamic processing helps reduce memory usage and improve performance, especially with large datasets.
```python
def test_08_v0(n):
    # Baseline version (inefficient way)
    # (calculates the nth Fibonacci number using a list)
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

def test_08_v1(n):
    # Improved version
    # (generates Fibonacci numbers lazily with a generator)
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
```
A noticeable improvement is observed:
```text
# Summary Of Test Results
Baseline: 0.083 ns per loop
Improved: 0.004 ns per loop
% Improvement: 95.5 %
Speedup: 22.06x
```
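One caveat: calling a generator function like `test_08_v1` returns a generator object immediately, and the Fibonacci values are only computed as they are consumed, which accounts for much of the measured gap. A usage sketch (`fib_gen` mirrors `test_08_v1`):

```python
def fib_gen(n):
    # Generator version: yields Fibonacci numbers lazily, one per request
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

g = fib_gen(8)         # returns instantly; nothing computed yet
first = next(g)        # computation happens on demand
rest = list(g)         # consuming the rest does the remaining work
print([first] + rest)  # [0, 1, 1, 2, 3, 5, 8, 13]
```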
8. map() Function
Use Python’s built-in map() function. It allows processing and transforming all items in an iterable without using explicit for loops.
```python
def some_function_X(x):
    # Stands in for application logic that has to live in a
    # separate function; for this test it just returns the square
    return x**2

def test_09_v0(numbers):
    # Baseline version (inefficient way)
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

def test_09_v1(numbers):
    # Improved version
    # (using Python's built-in map() function)
    output = map(some_function_X, numbers)
    return output
```
Using Python’s built-in map() function instead of explicit for loops speeds up by 970x.
```text
# Summary Of Test Results
Baseline: 4.402 ns per loop
Improved: 0.005 ns per loop
% Improvement: 99.9 %
Speedup: 970.69x
```
Why is that?
The map() function is written in C and highly optimized, so its implicit inner loop is much more efficient than a regular Python for loop. Thus, speed is improved—or you could say Python is just too slow, ha!
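A caveat worth knowing: in Python 3, map() returns a lazy iterator, so the improved version above defers the actual computation until the result is consumed. For an end-to-end comparison, force evaluation with `list()`, as in this sketch:

```python
def square(x):
    return x ** 2

numbers = [1, 2, 3, 4]

lazy = map(square, numbers)  # returns instantly; no squares computed yet
result = list(lazy)          # the actual iteration happens here, driven in C
print(result)                # [1, 4, 9, 16]
```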
9. Use Memoization
The idea of memoization is to cache (or “memoize”) the results of expensive function calls and return them when the same inputs occur again. It reduces redundant computations and speeds up programs.
First, the inefficient version:
```python
# Example of inefficient code
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def test_10_v0(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci(i))
    return output
```
Then, we use Python’s built-in functools.lru_cache.
```python
# Example of efficient code
# using Python's functools.lru_cache
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

def test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))
    return output
```
Results:
```text
# Summary Of Test Results
Baseline: 63.664 ns per loop
Improved: 1.104 ns per loop
% Improvement: 98.3 %
Speedup: 57.69x
```
Using Python’s built-in functools.lru_cache for memoization speeds up by 57x.
How does lru_cache work?
“LRU” stands for “Least Recently Used.” lru_cache is a decorator that can be applied to functions to enable memoization. It stores the results of recent function calls in a cache and provides cached results when the same inputs appear again, saving computation time. When applied as a decorator, lru_cache allows an optional maxsize parameter, which determines the maximum size of the cache (i.e., how many different input values it stores results for). If maxsize is set to None, the LRU feature is disabled, and the cache can grow without constraint—which may consume significant memory. This is the simplest space-for-time optimization.
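A short sketch of `maxsize` in action, along with the cache statistics that lru_cache exposes through its `cache_info()` method:

```python
import functools

@functools.lru_cache(maxsize=128)  # cache results for up to 128 distinct inputs
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(20))           # 6765
info = fib.cache_info()  # named tuple: hits, misses, maxsize, currsize
print(info.hits, info.currsize)
```

The recursive calls repeatedly hit the cache, which is why `info.hits` is large even after a single top-level call.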
10. Vectorization
```python
import numpy as np

def test_11_v0(n):
    # Baseline version
    # (inefficient way of summing numbers in a range)
    output = 0
    for i in range(0, n):
        output = output + i
    return output

def test_11_v1(n):
    # Improved version
    # (efficient way of summing numbers in a range)
    output = np.sum(np.arange(n))
    return output
```
Vectorization is widely used in machine learning through data processing libraries such as NumPy and Pandas.
```text
# Summary Of Test Results
Baseline: 32.936 ns per loop
Improved: 1.171 ns per loop
% Improvement: 96.4 %
Speedup: 28.13x
```
11. Avoid Creating Intermediate Lists
Using filterfalse can help avoid creating intermediate lists, reducing memory usage.
```python
def test_12_v0(numbers):
    # Baseline version (inefficient way)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0,
                   range(1, i**2))))
    return filtered_data
```
Improved version using Python’s built-in itertools.filterfalse:
```python
from itertools import filterfalse

def test_12_v1(numbers):
    # Improved version
    # (using filterfalse)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filterfalse(lambda x: x % 5 != 0,
                        range(1, i**2))))
    return filtered_data
```
Depending on the use case, execution speed may not improve significantly, but avoiding intermediate lists reduces memory usage. Here, we achieved a 131x improvement.
```text
# Summary Of Test Results
Baseline: 333167.790 ns per loop
Improved: 2541.850 ns per loop
% Improvement: 99.2 %
Speedup: 131.07x
```
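Note that the improved version still materializes each chunk with `list(...)` before extending. Since `extend` accepts any iterable, the filterfalse iterator can be consumed directly, avoiding the per-chunk intermediate list entirely; a sketch (not the article's benchmarked code):

```python
from itertools import filterfalse

def multiples_of_5(numbers):
    # extend() consumes the filterfalse iterator lazily,
    # so no intermediate list is built for each chunk
    filtered_data = []
    for i in numbers:
        filtered_data.extend(
            filterfalse(lambda x: x % 5 != 0, range(1, i ** 2)))
    return filtered_data

print(multiples_of_5([4]))  # [5, 10, 15]
```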
12. Efficient String Concatenation
String concatenation using the + operator is slow and consumes more memory. Use join instead.
```python
def test_13_v0(l_strings):
    # Baseline version (inefficient way)
    # (concatenation using the += operator)
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

def test_13_v1(l_strings):
    # Improved version
    # (using join)
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)
```
The test required a simple way to generate a large list of strings, so we wrote a helper function to produce the necessary string list for the test.
```python
from faker import Faker

def generate_fake_names(count: int = 10000):
    # Helper function used to generate a
    # large-ish list of names
    fake = Faker()
    output_list = []
    for _ in range(count):
        output_list.append(fake.name())
    return output_list

l_strings = generate_fake_names(count=50000)
```
Results:
```text
# Summary Of Test Results
Baseline: 32.423 ns per loop
Improved: 21.051 ns per loop
% Improvement: 35.1 %
Speedup: 1.54x
```
Using the join function instead of the + operator results in a 1.5x speedup.

Why is join faster?
String concatenation using the + operator has a time complexity of O(n²), while using the join function has a time complexity of O(n).
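The quadratic behavior comes from strings being immutable: each += may copy the entire accumulated string, so building an n-character result can touch O(n²) characters in total, whereas join sizes the output once and copies each piece a single time. A minimal equivalence check:

```python
parts = ["ab", "cd", "ef"]

# += path: may reallocate and copy the growing string on each step
concat = ""
for s in parts:
    concat += s

# join path: one pass over the parts, one final allocation
joined = "".join(parts)

assert concat == joined == "abcdef"
```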
Summary
This article introduced several simple methods that improved Python for loop performance by 1.3x to 970x.

- Using Python’s built-in map() function instead of explicit for loops: 970x speedup.
- Using sets instead of nested for loops: 498x speedup.
- Using itertools.filterfalse: 131x speedup.
- Using lru_cache for memoization: 57x speedup.