12 Ways to Optimize Python Loops

david 12/12/2025

In this article, I’ll introduce some simple methods that can improve the speed of Python for loops by 1.3 to 970 times.

Python has a built-in utility called the timeit module. In the following sections, we’ll use it to measure the current and improved performance of loops.

For each method, we establish a baseline by executing the tested function 100K times (loops) over 10 test runs and reporting the average time per loop in nanoseconds (ns).
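As a sketch of what such a harness might look like (the helper name `bench` and its defaults are illustrative, not the article’s actual script):

```python
import timeit

def bench(func, *args, loops=100_000, runs=10):
    # Illustrative harness: run `func` `loops` times per run,
    # repeat `runs` times, and report the average time
    # per loop in nanoseconds.
    timer = timeit.Timer(lambda: func(*args))
    times = timer.repeat(repeat=runs, number=loops)
    return sum(times) / runs / loops * 1e9  # seconds -> ns per loop
```

For example, `bench(sorted, [3, 1, 2])` returns the average cost of one `sorted` call in nanoseconds.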


Several Simple Methods

1. List Comprehension

python

# Baseline version (Inefficient way)
# Calculating the power of numbers
# Without using List Comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version
# (Using List Comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output

Results:

text

# Summary Of Test Results
      Baseline: 32.158 ns per loop
      Improved: 16.040 ns per loop
% Improvement: 50.1 %
      Speedup: 2.00x

Using list comprehension results in a 2x speedup.


2. Calculate Length Outside the Loop

If you need to iterate based on the length of a list, calculate it outside the for loop.

python

# Baseline version (Inefficient way)
# (Length calculation inside for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version
# (Length calculation outside for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list

By moving the list length calculation outside the for loop, speed increases by 1.6x; this may be a lesser-known trick.

text

# Summary Of Test Results
      Baseline: 112.135 ns per loop
      Improved: 68.304 ns per loop
% Improvement: 39.1 %
      Speedup: 1.64x

3. Use Sets

Use sets when performing comparisons in for loops.

python

# Use for loops for nested lookups
def test_03_v0(list_1, list_2):
    # Baseline version (Inefficient way)
    # (nested lookups using for loop)
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

def test_03_v1(list_1, list_2):
    # Improved version
    # (sets to replace nested lookups)
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

Using sets instead of nested for loops for comparisons results in a 498x speedup.

text

# Summary Of Test Results
      Baseline: 9047.078 ns per loop
      Improved:   18.161 ns per loop
% Improvement: 99.8 %
      Speedup: 498.17x
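As a quick sanity check of the set approach: the `&` operator is shorthand for `intersection`, and the win comes from sets offering average O(1) membership tests versus O(n) for lists.

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [4, 5, 6]

# `&` is equivalent to set.intersection()
common = set(list_1) & set(list_2)
print(sorted(common))  # [4, 5]
```

Note that the result is a set, so element order is not preserved; sort it if order matters.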

4. Skip Irrelevant Iterations

Avoid redundant calculations by skipping irrelevant iterations.

python

# Example of inefficient code used to find
# the first even square in a list of numbers
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square

    return None  # No even square found

# Example of improved code that
# finds result without redundant computations
def function_do_something_v1(numbers):
    even_numbers = [n for n in numbers if n % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square

    return None  # No even square found

This method requires thoughtful design of the loop’s content, and the improvement may vary based on the actual scenario.

text

# Summary Of Test Results
      Baseline: 16.912 ns per loop
      Improved: 8.697 ns per loop
% Improvement: 48.6 %
      Speedup: 1.94x
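The same idea can be written more compactly with a generator expression and `next()`, which stops at the first match without building the filtered list at all (a sketch equivalent to the improved version above; the function name is illustrative):

```python
def first_even_square(numbers):
    # n * n is even exactly when n is even, so filter on n
    # and square only the first match
    return next((n * n for n in numbers if n % 2 == 0), None)

print(first_even_square([3, 7, 4, 9]))  # 16
print(first_even_square([3, 7, 9]))     # None
```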

5. Inline Code

In some cases, directly inlining the code of a simple function into the loop can enhance compactness and execution speed.

python

# Example of inefficient code
# Loop that calls the is_prime function n times.
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False

    return True

def test_05_v0(n):
    # Baseline version (Inefficient way)
    # (calls the is_prime function n times)
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

def test_05_v1(n):
    # Improved version
    # (inlines the logic of the is_prime function)
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

This approach yields a 1.3x improvement.

text

# Summary Of Test Results
      Baseline: 1271.188 ns per loop
      Improved: 939.603 ns per loop
% Improvement: 26.1 %
      Speedup: 1.35x

Why is that?

Calling functions involves overhead, such as pushing and popping variables on the stack, function lookups, and argument passing. When a simple function is repeatedly called within a loop, the overhead of function calls accumulates and impacts performance. Therefore, inlining the function’s code directly into the loop eliminates this overhead, potentially leading to significant speed improvements.

⚠️ However, it’s important to balance code readability with the frequency of function calls.
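You can measure the call overhead directly with timeit. The snippet below compares calling a trivial function against the same expression inlined; exact numbers will vary by machine and Python version.

```python
import timeit

def square(x):
    return x * x

# Identical arithmetic, with and without the call indirection
t_called = timeit.timeit("square(7)",
                         globals=globals(), number=100_000)
t_inline = timeit.timeit("x * x", setup="x = 7",
                         number=100_000)
print(f"called: {t_called:.4f}s  inline: {t_inline:.4f}s")
```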


Some Handy Tricks

6. Avoid Repetition

Consider avoiding repetitive calculations, as some computations may be redundant and slow down the code. Instead, consider precomputing values where applicable.

python

def test_07_v0(n):
    # Example of inefficient code
    # Repetitive calculation within nested loop
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

def test_07_v1(n):
    # Example of improved code
    # Utilize precomputed values to help speedup
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        result += sum(pv[i])  # same total as the inner loop above
    return result

Results:

text

# Summary Of Test Results
      Baseline: 139.146 ns per loop
      Improved: 92.325 ns per loop
% Improvement: 33.6 %
      Speedup: 1.51x
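In this particular benchmark the precomputation can be taken even further: since every i is multiplied by every j, the double sum factors into (Σi)·(Σj), i.e. the square of sum(range(n)). The sketch below is an illustrative extension, not the article’s benchmarked code:

```python
def sum_of_products(n):
    # sum(i * j for i in range(n) for j in range(n))
    # equals sum(range(n)) ** 2; use the closed form
    s = n * (n - 1) // 2
    return s * s

print(sum_of_products(10))  # 2025
```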

7. Use Generators

Generators support lazy evaluation, meaning expressions inside them are evaluated only when you request the next value. This dynamic processing helps reduce memory usage and improve performance, especially with large datasets.

python

def test_08_v0(n):
    # Baseline version (Inefficient way)
    # (Inefficiently calculates the nth Fibonacci
    # number using a list)
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

def test_08_v1(n):
    # Improved version
    # (Lazily yields the first n Fibonacci numbers
    # using a generator; values are produced on demand)
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

Noticeable improvement is observed:

text

# Summary Of Test Results
      Baseline: 0.083 ns per loop
      Improved: 0.004 ns per loop
% Improvement: 95.5 %
      Speedup: 22.06x
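One caveat about this comparison: calling a generator function only creates the generator object; its body runs as values are consumed. To actually obtain Fibonacci numbers from a generator like test_08_v1, iterate over it (the sketch below uses the same logic under an illustrative name):

```python
def fib_gen(n):
    # Lazily yields the first n Fibonacci numbers
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

print(list(fib_gen(10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```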

8. map() Function

Use Python’s built-in map() function. It allows processing and transforming all items in an iterable without using explicit for loops.

python

def some_function_X(x):
    # Stands in for application logic that needs
    # to live in its own function
    # (for this test, just return the square)
    return x**2

def test_09_v0(numbers):
    # Baseline version (Inefficient way)
    output = []
    for i in numbers:
        output.append(some_function_X(i))

    return output

def test_09_v1(numbers):
    # Improved version
    # (Using Python's built-in map() function)
    output = map(some_function_X, numbers)
    return output

Using Python’s built-in map() function instead of explicit for loops speeds up by 970x.

text

# Summary Of Test Results
      Baseline: 4.402 ns per loop
      Improved: 0.005 ns per loop
% Improvement: 99.9 %
      Speedup: 970.69x

Why is that?

The map() function is implemented in C and highly optimized, so its implicit inner loop is much more efficient than an explicit Python for loop. Put differently, the looping still happens; it just happens in C rather than in the interpreter.
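It is also worth knowing that map() is lazy: it returns an iterator and computes nothing until that iterator is consumed, which is part of why the measured gap is so dramatic here. For example:

```python
m = map(lambda x: x ** 2, range(5))
# At this point no squaring has happened yet;
# `m` is just an iterator object.
squares = list(m)  # consuming the iterator does the actual work
print(squares)  # [0, 1, 4, 9, 16]
```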


9. Use Memoization

The idea of memoization is to cache (or “memoize”) the results of expensive function calls and return them when the same inputs occur again. It reduces redundant computations and speeds up programs.

First, the inefficient version:

python

# Example of inefficient code
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def test_10_v0(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci(i))

    return output

Then, we use Python’s built-in functools.lru_cache.

python

# Example of efficient code
# Using Python's functools' lru_cache function
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

def test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))

    return output

Results:

text

# Summary Of Test Results
      Baseline: 63.664 ns per loop
      Improved: 1.104 ns per loop
% Improvement: 98.3 %
      Speedup: 57.69x

Using Python’s built-in functools.lru_cache for memoization speeds up by 57x.

How does lru_cache work?

“LRU” stands for “Least Recently Used.” lru_cache is a decorator that enables memoization: it stores the results of recent function calls and returns the cached result when the same inputs occur again, saving computation time. It accepts an optional maxsize parameter, which caps how many distinct inputs the cache keeps results for. If maxsize is set to None, LRU eviction is disabled and the cache can grow without bound, which may consume significant memory. This is a classic space-for-time trade-off.
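A minimal sketch of the decorator in action (maxsize=128 is just an example value), including the cache statistics it exposes via cache_info():

```python
import functools

@functools.lru_cache(maxsize=128)  # cache up to 128 results
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # hits, misses, maxsize, currsize
```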


10. Vectorization

python

import numpy as np

def test_11_v0(n):
    # Baseline version
    # (Inefficient way of summing numbers in a range)
    output = 0
    for i in range(0, n):
        output = output + i

    return output

def test_11_v1(n):
    # Improved version
    # (Efficient way of summing numbers in a range)
    output = np.sum(np.arange(n))
    return output

Vectorization hands the loop over to optimized, compiled code; it is the standard approach in data-processing and machine-learning libraries such as NumPy and Pandas.

text

# Summary Of Test Results
      Baseline: 32.936 ns per loop
      Improved: 1.171 ns per loop
% Improvement: 96.4 %
      Speedup: 28.13x

11. Avoid Creating Intermediate Lists

Using filterfalse can help avoid creating intermediate lists, reducing memory usage.

python

def test_12_v0(numbers):
    # Baseline version (Inefficient way)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0,
                    range(1, i**2))))
   
    return filtered_data

Improved version using Python’s built-in itertools.filterfalse:

python

from itertools import filterfalse

def test_12_v1(numbers):
    # Improved version
    # (using filterfalse)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(
            filterfalse(lambda x: x % 5 != 0,
                        range(1, i**2)))
        
    return filtered_data

The speed gain depends heavily on the use case, but avoiding intermediate lists always reduces peak memory usage. Here it also delivered a 131x speedup.

text

# Summary Of Test Results
      Baseline: 333167.790 ns per loop
      Improved: 2541.850 ns per loop
% Improvement: 99.2 %
      Speedup: 131.07x
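Since list.extend() accepts any iterable, the filterfalse iterator can be passed to it directly; and to avoid materializing anything until the caller asks, the whole pipeline can be expressed lazily with itertools.chain.from_iterable. A sketch (the function name is illustrative, and this is not the benchmarked code):

```python
from itertools import chain, filterfalse

def multiples_of_five(numbers):
    # Lazily chains the filtered ranges; nothing is
    # materialized until the result is iterated.
    return chain.from_iterable(
        filterfalse(lambda x: x % 5 != 0, range(1, i ** 2))
        for i in numbers
    )

print(list(multiples_of_five([4])))  # [5, 10, 15]
```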

12. Efficient String Concatenation

String concatenation using the + operator is slow and consumes more memory. Use join instead.

python

def test_13_v0(l_strings):
    # Baseline version (Inefficient way)
    # (concatenation using the += operator)
    output = ""
    for a_str in l_strings:
        output += a_str

    return output

def test_13_v1(l_strings):
    # Improved version
    # (using join)
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)

    return "".join(output_list)

The test required a simple way to generate a large list of strings, so we wrote a helper function to produce the necessary string list for the test.

python

from faker import Faker

def generate_fake_names(count: int = 10000):
    # Helper function used to generate a
    # large-ish list of names
    fake = Faker()
    output_list = []
    for _ in range(count):
        output_list.append(fake.name())

    return output_list

l_strings = generate_fake_names(count=50000)

Results:

text

# Summary Of Test Results
      Baseline: 32.423 ns per loop
      Improved: 21.051 ns per loop
% Improvement: 35.1 %
      Speedup: 1.54x

Using the join function instead of the + operator results in a 1.5x speedup. Why is join faster?

Repeated concatenation with the + operator creates a new string each time, copying everything accumulated so far, for O(n²) total work in the worst case. join computes the final length once and copies each piece a single time, which is O(n).
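A quick way to check this on your own machine (note that CPython sometimes optimizes += in place when the string has no other references, so the measured gap varies):

```python
import timeit

def concat_plus(strings):
    out = ""
    for s in strings:
        out += s  # may copy the accumulated string each time
    return out

def concat_join(strings):
    return "".join(strings)  # sizes once, copies each piece once

words = ["word-"] * 5_000
assert concat_plus(words) == concat_join(words)

t_plus = timeit.timeit(lambda: concat_plus(words), number=50)
t_join = timeit.timeit(lambda: concat_join(words), number=50)
print(f"+=: {t_plus:.4f}s  join: {t_join:.4f}s")
```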


Summary

This article introduced several simple methods that improved Python for loop performance from 1.3x to 970x.

  • Using Python’s built-in map() function instead of explicit for loops: 970x speedup.
  • Using sets instead of nested for loops: 498x speedup (Tip #3).
  • Using itertools.filterfalse: 131x speedup.
  • Using lru_cache for memoization: 57x speedup.