Python for Network Engineers

Threads vs Processes vs Asyncio

by: George El., November 2020, Reading time: 6 minutes

In this post I will explain the differences between threads, processes, and asyncio, and which approach is best when you want to configure your network devices.

Concurrency vs Parallelism

Let's first explain the difference between concurrency and parallelism. Concurrency means that an application gives the impression of executing many tasks at the same time, while in reality only one task runs at any given moment. Because the CPU switches back and forth between tasks very quickly, they appear to run simultaneously. This is exactly what happens on a single-core CPU: many applications seem to run at once, but only one executes at any instant; the CPU simply switches between them in nanoseconds.

Parallelism, on the other hand, means that many tasks are executed truly at the same time. For that you need a multi-core CPU, a multi-CPU system, or both. In this case one application runs on one core while another runs on a different core at the same time.

[Figure: concurrency vs parallelism]

Multi-threading in Python

With regard to Python, because of the Global Interpreter Lock, aka GIL, only one thread executes at a time, even on a multi-core CPU. This design keeps the single-threaded interpreter fast. It is possible to remove the GIL, but the fine-grained locking that would have to replace it carries a performance penalty. In fact, Guido van Rossum has said he would accept removing the GIL if somebody could do it without impacting single-threaded performance.
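A quick way to see the GIL in action is to time a CPU-bound function run sequentially and then in two threads. This is an illustrative sketch (the function and the iteration count are arbitrary); on CPython the threaded run is typically no faster, because only one thread can hold the GIL at a time.

```python
import threading
import time

def count_down(n):
    # A pure-CPU loop; the GIL prevents two of these from running in parallel
    while n > 0:
        n -= 1

N = 5_000_000

# Run the work twice sequentially
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# Expect the threaded time to be about the same (often worse) than sequential
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```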

So when you run a multi-threaded Python application, only one thread runs at a time. However, because the CPU is very fast, if the tasks are not CPU-intensive it gives you the impression that they run in parallel. Threads have the advantage of being more lightweight than processes, and you can run 50-60 threads without problems. If you need to execute CPU-bound tasks, it is recommended to use multiprocessing instead, with the number of processes equal to the number of CPUs times the cores per CPU.

When you connect to routers and switches, a large amount of time is spent establishing the connection, usually milliseconds. That is plenty of time for the CPU, which works in nanoseconds, to do something else, so threads are the recommended choice here. In addition, because each thread talks to different equipment, you don't have race conditions and you don't need to implement locks.

There can be a problem, however. How long a thread runs before it is stopped and another is scheduled is determined by your operating system; you cannot control when one thread stops and another starts. Let's take a look at an example.

from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

def send_commands(router):
    print("router" + str(router) + " command1")
    print("router" + str(router) + " command2")
    print("router" + str(router) + " command3")

def main():
    pool = ThreadPool(3)            # 3 worker threads
    routers = list(range(1, 11))    # routers 1..10
    pool.map(send_commands, routers)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

In this example I create a pool of 3 threads that sends 3 commands to each of 10 routers. Let's run it and see the results:

router1 command1
router2 command1
router1 command2
router2 command2
router1 command3
router2 command3
router4 command1
router5 command1
router4 command2
router5 command2
router4 command3
router3 command1
router5 command3
router6 command1
router7 command1
router6 command2
router7 command2
router6 command3
router7 command3
router8 command1
router9 command1
router8 command2
router9 command2
router8 command3
router9 command3
router10 command1
router3 command2
router3 command3
router10 command2
router10 command3

You can see that each router's commands are not executed back to back. This matters if you have commands that must be executed together: if you use Netmiko or another library, make sure to send them in a single call.
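One common workaround (not from the original example) is to guard the critical section with a `threading.Lock`, so the scheduler cannot interleave another router's output in the middle of the three commands:

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool

print_lock = threading.Lock()

def send_commands(router):
    # Only one thread at a time may hold the lock, so the three
    # commands for a given router are never interleaved with another's
    with print_lock:
        print("router" + str(router) + " command1")
        print("router" + str(router) + " command2")
        print("router" + str(router) + " command3")

def main():
    pool = ThreadPool(3)
    pool.map(send_commands, range(1, 11))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
```

The trade-off is that the lock serializes the guarded section, so you lose some concurrency while it is held.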

As I said, it is not possible to predict when one thread will be stopped and another run. To control that, we have to use a different technique called cooperative multitasking, where the application itself decides when to switch between tasks. Many libraries implement this (curio, trio, gevent, Twisted, etc.), but we will look at asyncio, which is part of the Python 3 standard library. For this to work, your functions have to be non-blocking: a `while True` loop, for instance, will never give way to another function. A function is non-blocking if it follows the async/await paradigm. Let's look at an example.

In the following example we again execute 3 commands, but this time we create the tasks and run them with asyncio. To allow only 3 tasks at a time I use a semaphore. The method that takes the longest is the connect method, which in a real environment should be non-blocking; I simulate this with asyncio.sleep().

import asyncio
import random

async def connect_to_router(router):
    print("router" + str(router) + " connecting")
    # simulate a non-blocking connection that takes 1-4 seconds
    await asyncio.sleep(random.uniform(1, 4))

async def send_commands(router, sem):
    async with sem:                 # at most 3 routers at a time
        await connect_to_router(router)
        print("router" + str(router) + " command1")
        print("router" + str(router) + " command2")
        print("router" + str(router) + " command3")
        print("router" + str(router) + " quitting")
        return "router" + str(router) + " completed"

async def main():
    sem = asyncio.Semaphore(3)
    tasks = [asyncio.create_task(send_commands(router, sem))
             for router in range(1, 11)]
    for res in asyncio.as_completed(tasks):
        print(await res)

asyncio.run(main())

Let's see the results:

router1 connecting
router2 connecting
router3 connecting
router3 command1
router3 command2
router3 command3
router3 quitting
router4 connecting
router3 completed
router2 command1
router2 command2
router2 command3
router2 quitting
router5 connecting
router2 completed
router1 command1
router1 command2
router1 command3
router1 quitting
router6 connecting
router1 completed
router6 command1
router6 command2
router6 command3
router6 quitting
router7 connecting
router6 completed
router5 command1
router5 command2
router5 command3
router5 quitting
router8 connecting
router5 completed
router4 command1
router4 command2
router4 command3
router4 quitting
router9 connecting
router4 completed
router7 command1
router7 command2
router7 command3
router7 quitting
router10 connecting
router7 completed
router8 command1
router8 command2
router8 command3
router8 quitting
router8 completed
router9 command1
router9 command2
router9 command3
router9 quitting
router9 completed
router10 command1
router10 command2
router10 command3
router10 quitting
router10 completed

What we see is that although the connections are asynchronous, commands 1, 2, and 3 for each router are executed together. This happens because with asyncio we tell the system when to switch from one function to another.

Summary

  • Multiprocessing is mainly used when you have CPU-bound tasks. You can spawn (number of CPUs) x (cores per CPU) processes. Multithreading or asyncio gives you zero benefit in this case.
  • Multithreading is used when you have network-bound or I/O-bound tasks. For network engineers the tasks are usually network-bound, so it makes sense to use multiple threads. Around 50 threads is usually sufficient; beyond a certain number you will not see any significant gains. The OS decides when to stop one thread and start another, and you cannot change this. Also, in Python only one thread executes at a time.
  • Asyncio is also used when you have network-bound or I/O-bound tasks. It has the advantage that you decide when execution switches, but you have to use libraries and methods that support the asyncio paradigm.
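As a closing note, the standard-library concurrent.futures module (not covered above) wraps both threads and processes behind one interface, so switching between them is a one-line change from ThreadPoolExecutor to ProcessPoolExecutor. A minimal sketch, with `send_commands` as a placeholder for the real device work:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def send_commands(router):
    # Placeholder for connecting and pushing config to a device
    return "router" + str(router) + " completed"

# 3 worker threads, same limit as the earlier examples
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(send_commands, r) for r in range(1, 11)]
    for fut in as_completed(futures):
        print(fut.result())
```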