For Loop to List Comprehension

Last night a friend of mine sent me a snippet he had quickly written for his class, to avoid pulling out a calculator and doing each calculation individually. My first thought when I saw the script was: "Why are you using for loops? What's going on with that f-string? Maybe it would be faster if a list comprehension were used?"

[Image: the code that uses a for loop]
[Image: my initial comments]

Mind you, the guy knows his Python. He's one of the best coders I know. He was just in a hurry to finish it quickly and didn't bother with anything fancy.

What actually is list comprehension? And why is it better than for loops?

To answer that question, let's look at two snippets of code: a for loop and a list comprehension.

values = [12, 13, 56, 54, 56]

# for loop
fitness = []

for x in values:
    fitness.append((15 * x) - x ** 2)

# list comprehension
fitness = [(15 * x) - x ** 2 for x in values]

As per Programiz, list comprehension is an elegant way to define and create lists based on existing lists. I'm satisfied with that definition, so let's move on.

Let's look at the differences. First, the for-loop version is much lengthier than the list comprehension. Second, on every single iteration of the for loop, Python has to look up and load the list's append method before calling it; the list comprehension skips that per-iteration lookup. But will this really affect its performance? We'll see.

So today, I went ahead and changed his code to use list comprehensions. Here is my code.

values = [12, 4, 1, 14, 7, 9]

fitness = [(15 * x) - x ** 2 for x in values]

total_fitness = sum(fitness)

fitness_ratios = [(x / total_fitness) * 100 for x in fitness]

for i, (v, f, ft) in enumerate(zip(values, fitness, fitness_ratios), start=1):
    print(f"{i} = {v}, {f}, {ft}")

Disclaimer: The article gets boring from here on out. It's mostly running some tests and me being unhappy about the results.

After writing this I decided to test its performance. What I tried at first was probably overkill: I tried to crunch 2 billion numbers by setting the values to a range from 1 to 2 billion. Go big or go home?

That didn't work very well. After a few minutes of running, I ended up with a MemoryError. So I decided to remove the print statements from the snippets, turn them into functions, and return the fitness and fitness_ratios values instead.
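As an aside of my own (not part of the original experiment): if you only need an aggregate like total_fitness rather than the full list, a generator expression keeps memory flat, because values are produced one at a time instead of being materialized into a giant list:

```python
# sum() consumes the generator lazily; no million-element list is ever built
total_fitness = sum((15 * x) - x ** 2 for x in range(1, 1_000_001))
print(total_fitness)
```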

import time

start = time.time()

#CODE GOES HERE

end = time.time()
print(end - start)
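For reference, here's a sketch of what the refactored snippets might look like as functions — the function names are mine, the article doesn't show this step. Either one can be dropped into the timing harness above in place of the CODE GOES HERE comment:

```python
def for_loop_version(values):
    # build both lists with explicit loops and append calls
    fitness = []
    for x in values:
        fitness.append((15 * x) - x ** 2)
    total_fitness = sum(fitness)
    fitness_ratios = []
    for x in fitness:
        fitness_ratios.append((x / total_fitness) * 100)
    return fitness, fitness_ratios

def comprehension_version(values):
    # same computation expressed as list comprehensions
    fitness = [(15 * x) - x ** 2 for x in values]
    total_fitness = sum(fitness)
    fitness_ratios = [(x / total_fitness) * 100 for x in fitness]
    return fitness, fitness_ratios
```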

This time I tried with 1 billion numbers. While running, the for loop was using around 10GB of memory. Yeah, you read that right: TEN. Talk about inefficient code. I was pretty sure mine wouldn't hit that.

Boy, was I wrong. Mine started at around 4GB but pretty quickly went up to 8GB, and even 10 at some point. Now the big question: did it take more time? Well, technically it didn't. But I was not at all satisfied with the results. So I changed it up a bit: removed everything that was unnecessary, moved the values outside the functions, and used 10,000 numbers. I ended up with the following:

For Loop: 9.643182277679443 seconds
List Comprehension: 8.324370861053467 seconds

I still believed this data was wrong, so I decided to spin up Jupyter and timeit the code. Below are the results.

[Image: timeit results for the list comprehension]
[Image: timeit results for the for loop]
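If you don't have Jupyter handy, the stdlib timeit module does the same measurement from a plain script. A minimal sketch, using 10,000 numbers like the run above (the numbers will differ by machine):

```python
import timeit

values = list(range(10_000))

def with_for_loop():
    fitness = []
    for x in values:
        fitness.append((15 * x) - x ** 2)
    return fitness

def with_comprehension():
    return [(15 * x) - x ** 2 for x in values]

# repeat() runs each function in several trials; taking the minimum filters
# out noise from other processes, which time.time() around one run cannot do
print(min(timeit.repeat(with_for_loop, number=100, repeat=5)))
print(min(timeit.repeat(with_comprehension, number=100, repeat=5)))
```

Unlike wrapping a single run with time.time(), timeit also disables garbage collection during measurement, which makes the comparison fairer.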

In conclusion, maybe I am doing this wrong, or the code can still be optimized. Or maybe list comprehensions are not as fast as I thought. But they are definitely more elegant than a for loop. I would choose lambdas and list comprehensions over a for loop any day.

Note: The above code is a partial implementation of a genetic algorithm. Normally, packages like NumPy are used for crunching billions of numbers, as its built-in vectorized functions are much better suited to such workloads.
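For completeness, here's a sketch of the same fitness calculation in NumPy (assuming NumPy is installed). The arithmetic is applied elementwise in C, so there is no per-element Python loop at all:

```python
import numpy as np

values = np.array([12, 4, 1, 14, 7, 9])

fitness = 15 * values - values ** 2          # elementwise, no Python-level loop
total_fitness = fitness.sum()
fitness_ratios = fitness / total_fitness * 100

print(fitness)          # [36 44 14 14 56 54]
print(fitness_ratios)
```

For billion-element inputs this is both faster and more memory-predictable than either pure-Python version, since the arrays are stored as contiguous machine numbers rather than lists of Python objects.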