When a Python for loop becomes slow, you can often speed it up by running iterations in parallel across multiple CPU cores. This is useful for CPU-bound tasks where each loop iteration is independent.

1. multiprocessing.Pool

from multiprocessing import Pool, cpu_count

def task(x):
    return x * x  # simulate CPU work

if __name__ == "__main__":  # required on Windows/macOS, where workers are spawned
    items = list(range(1000))

    with Pool(cpu_count()) as p:
        results = p.map(task, items)

2. concurrent.futures

from concurrent.futures import ProcessPoolExecutor

def task(x):
    return x * x

if __name__ == "__main__":
    items = list(range(1000))  # same workload as above

    with ProcessPoolExecutor() as executor:  # defaults to one worker per CPU
        results = list(executor.map(task, items))
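
executor.map returns results in input order. When tasks finish at different speeds and you want each result as soon as it is ready, the same module offers submit and as_completed; a minimal sketch, not part of the original example:

from concurrent.futures import ProcessPoolExecutor, as_completed

def task(x):
    return x * x

if __name__ == "__main__":
    items = list(range(1000))

    with ProcessPoolExecutor() as executor:
        # submit returns a Future immediately; as_completed yields each
        # future as soon as its worker finishes, regardless of input order.
        futures = [executor.submit(task, x) for x in items]
        results = [f.result() for f in as_completed(futures)]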

Benchmark: Parallel vs Non-Parallel

Here’s a simple timing comparison that runs each task serially, then in parallel:

import time
from multiprocessing import Pool, cpu_count
import requests

def task(x):
    return x * x  # trivial CPU work

def task2(x):
    # Network I/O: each call waits on a full HTTP round trip
    response = requests.post("https://httpbin.org/post", data={"key": "value"}, timeout=5)
    return response.status_code

def benchmark(fn, items):
    # Serial
    start = time.perf_counter()
    serial_results = [fn(x) for x in items]
    serial_time = time.perf_counter() - start

    # Parallel
    start = time.perf_counter()
    with Pool(cpu_count()) as p:
        parallel_results = p.map(fn, items)
    parallel_time = time.perf_counter() - start

    print(f"Serial time:   {serial_time:.3f}s")
    print(f"Parallel time: {parallel_time:.3f}s")

if __name__ == "__main__":
    items = list(range(100))
    benchmark(task, items)
    benchmark(task2, items)

Results:

task (trivial CPU work):

Serial time:   1.024s
Parallel time: 2.650s

task2 (network I/O):

Serial time:   55.345s
Parallel time: 4.551s

The task2 result shows parallelism paying off: the call is I/O-bound, so the worker processes overlap their network waits. The task result shows the opposite outcome, and parallel code can be slower than serial for several practical reasons:

  • Process Startup & IPC Overhead
  • Task Is Too Small (Granularity Problem); see the chunksize sketch after this list
  • Data Copying Costs (arguments and results are pickled between processes)
  • Limited CPU Cores or CPU Throttling
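
One common mitigation for the granularity problem is batching: Pool.map accepts a chunksize argument that ships items to workers in batches, so the pickling/IPC cost is paid per batch instead of per item. A minimal sketch (the workload and chunk size here are illustrative):

from multiprocessing import Pool, cpu_count

def task(x):
    return x * x

if __name__ == "__main__":
    items = list(range(1000))

    with Pool(cpu_count()) as p:
        # 100 items per IPC round trip instead of one
        results = p.map(task, items, chunksize=100)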

Quick Diagnostic

If the work per iteration is much smaller than the per-task cost of pickling, IPC, and (amortized) worker startup:

task_time << process_overhead

then parallelizing makes things worse:

parallel_time > serial_time

This is why benchmarking is critical before parallelizing.
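
A quick way to measure that overhead on your machine is to time a do-nothing task through the pool; a rough sketch (numbers vary by OS and Python version):

import time
from multiprocessing import Pool, cpu_count

def noop(x):
    return x  # no real work, so any measured time is pure overhead

if __name__ == "__main__":
    start = time.perf_counter()
    with Pool(cpu_count()) as p:
        p.map(noop, range(100))
    overhead = time.perf_counter() - start
    # If one real iteration takes much less than overhead / 100,
    # parallelizing it this way is likely a net loss.
    print(f"Startup + IPC overhead for 100 no-op tasks: {overhead:.3f}s")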

Important Tips

  • Use multiprocessing, not threading, for CPU-bound tasks: the GIL serializes Python threads. (For I/O-bound work like task2, threads are fine, since blocked threads release the GIL.)
  • Parallel overhead can make small tasks slower.
  • Always benchmark before and after parallelizing.
  • For Pandas and NumPy, try vectorization first (see the sketch below).
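
For numeric loops in particular, vectorization often beats both the serial loop and multiprocessing, because the work runs in optimized C with no process overhead at all. A minimal sketch using the squaring task from above:

import numpy as np

items = np.arange(1_000_000)

# One array operation replaces the Python-level loop entirely;
# NumPy performs the multiplication in C across the whole array.
results = items * items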

Parallel processing can provide major speedups for heavy workloads — but only when the task size justifies the overhead.