Threads vs Processes vs Asyncio
In this post I will explain the difference between threads, processes and asyncio, and which approach is best when you want to configure your devices.
Concurrency vs Parallelism
Let's first explain the difference between concurrency and parallelism. Concurrency means that an application gives the impression of executing many tasks at the same time, while in reality only one task runs at any given moment. Because the CPU switches back and forth between tasks very quickly, they appear to run simultaneously. This is exactly what happens in your OS on a single-core CPU: many applications seem to run at once, but only one executes at any given time, and the CPU switches between them so fast that you never notice.
Parallelism, on the other hand, means that many tasks are executed in parallel, that is, literally at the same time. For that you need a multi-core CPU, a multi-CPU system, or both. In this case one application runs on one core while another runs on a different core at the same time.
Multi-threading in Python
With regard to Python, because of the Global Interpreter Lock, aka GIL, only one thread executes at a time, even on a multi-core CPU. The GIL keeps the interpreter's memory management simple and single-threaded code fast. It is possible to remove the GIL, but the fine-grained locking needed to replace it carries a performance penalty. In fact, Guido van Rossum has said he would accept removing the GIL if somebody could do it without impacting single-threaded performance.
So when you run a multi-threaded Python application, only one thread is running at a time. However, because the CPU is very fast, if the tasks are not CPU-intensive it gives you the impression that they are executed in parallel. Threads have the advantage of being more lightweight than processes, and you can run 50-60 threads without problems. If you need to execute CPU-bound tasks, multiprocessing is recommended instead, with the number of processes roughly equal to the number of CPUs times the cores per CPU.
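As a rough illustration of why multiprocessing beats threads for CPU-bound work, the sketch below times the same pure-Python task on a thread pool and on a process pool (exact timings will vary by machine; `burn` is just a stand-in CPU-bound function):

```python
import time
from multiprocessing import Pool, cpu_count
from multiprocessing.dummy import Pool as ThreadPool  # thread-based pool

def burn(n):
    # CPU-bound work: pure Python arithmetic, so the GIL serializes threads
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(pool_cls, workers, work):
    # Run the work items through the given pool and return the elapsed time
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        pool.map(burn, work)
    return time.perf_counter() - start

if __name__ == '__main__':
    work = [500_000] * 4
    t_threads = timed(ThreadPool, 4, work)
    t_procs = timed(Pool, min(4, cpu_count()), work)
    print(f"threads:   {t_threads:.2f}s")
    print(f"processes: {t_procs:.2f}s")  # usually faster on a multi-core CPU
```

On a multi-core machine the process pool typically finishes faster, because each process has its own interpreter and its own GIL.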
When you connect to routers and switches, a big amount of time is spent establishing the connection, usually milliseconds; since the CPU works at nanosecond scale, that is plenty of time for it to do something else. Using threads is therefore the recommended choice here. In addition, because each thread talks to a different device, there is no shared state, so you don't have race conditions and you don't need to implement locks.
There can be a problem, however. How long a thread runs before it is paused and another is scheduled is determined by your operating system; you cannot control when one thread stops and another starts. Let's take a look at an example.
```python
from multiprocessing.dummy import Pool as ThreadPool

def send_commands(router):
    print("router" + str(router) + " command1")
    print("router" + str(router) + " command2")
    print("router" + str(router) + " command3")

def main():
    pool = ThreadPool(3)
    routers = list(range(1, 11))
    results = pool.map(send_commands, routers)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
```
In this example I create a pool of 3 threads that will send 3 commands to each of 10 routers. Let's run it and see the results:
```
router1 command1
router2 command1
router1 command2
router2 command2
router1 command3
router2 command3
router4 command1
router5 command1
router4 command2
router5 command2
router4 command3
router3 command1
router5 command3
router6 command1
router7 command1
router6 command2
router7 command2
router6 command3
router7 command3
router8 command1
router9 command1
router8 command2
router9 command2
router8 command3
router9 command3
router10 command1
router3 command2
router3 command3
router10 command2
router10 command3
```
You can see that each router's commands are interleaved with other routers' commands rather than executed back to back. This is important if you have commands that must be executed together: if you use netmiko or another library, make sure to send them as a single batch in one call.
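If you only need each device's block of commands to stay together, one option is to guard the per-device block with a lock, as sketched below (this keeps the printed output grouped; it is not a substitute for sending the commands as one batch, e.g. via a single netmiko send_config_set() call):

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool

print_lock = threading.Lock()

def send_commands(router):
    # The lock keeps each router's three commands together in the output;
    # only one thread can be inside this block at a time.
    with print_lock:
        print("router" + str(router) + " command1")
        print("router" + str(router) + " command2")
        print("router" + str(router) + " command3")

def main():
    with ThreadPool(3) as pool:
        pool.map(send_commands, list(range(1, 11)))

if __name__ == '__main__':
    main()
```

Note the trade-off: while one thread holds the lock, the other threads that reach the block have to wait, so you lose some concurrency.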
As I said, it is not possible to predict when one thread will be paused and another resumed. To control that ourselves, we have to use another technique called cooperative multitasking. This means that it is up to the application to decide how tasks share the CPU. There are many libraries that implement this, like curio, trio, gevent, twisted, etc., but we will look at asyncio, which is part of the Python 3 standard library. For this to work, your functions have to be non-blocking: a while True loop that never yields, for instance, will never give way to another function. A function cooperates by using the async/await syntax. Let's look at an example.
In the following example we again execute 3 commands, but this time we create the tasks and run them with asyncio. To allow only 3 tasks at a time I use a semaphore. The method that takes the longest is the connect method, which in a real environment would be non-blocking; I simulate this with asyncio.sleep().
```python
import asyncio
import random

async def connect_to_router(router):
    print("router" + str(router) + " connecting")
    await asyncio.sleep(random.uniform(1, 4))

async def send_commands(router, sem):
    async with sem:
        await connect_to_router(router)
        print("router" + str(router) + " command1")
        print("router" + str(router) + " command2")
        print("router" + str(router) + " command3")
        print("router" + str(router) + " quiting")
        return "router" + str(router) + " completed"

async def main():
    sem = asyncio.Semaphore(3)
    tasks = [asyncio.create_task(send_commands(router, sem))
             for router in range(1, 11)]
    for res in asyncio.as_completed(tasks):
        results = await res
        print(results)

asyncio.run(main())
```
Let's see the results:
```
router1 connecting
router2 connecting
router3 connecting
router3 command1
router3 command2
router3 command3
router3 quiting
router4 connecting
router3 completed
router2 command1
router2 command2
router2 command3
router2 quiting
router5 connecting
router2 completed
router1 command1
router1 command2
router1 command3
router1 quiting
router6 connecting
router1 completed
router6 command1
router6 command2
router6 command3
router6 quiting
router7 connecting
router6 completed
router5 command1
router5 command2
router5 command3
router5 quiting
router8 connecting
router5 completed
router4 command1
router4 command2
router4 command3
router4 quiting
router9 connecting
router4 completed
router7 command1
router7 command2
router7 command3
router7 quiting
router10 connecting
router7 completed
router8 command1
router8 command2
router8 command3
router8 quiting
router8 completed
router9 command1
router9 command2
router9 command3
router9 quiting
router9 completed
router10 command1
router10 command2
router10 command3
router10 quiting
router10 completed
```
What we see is that although the connections are asynchronous, commands 1, 2 and 3 of each router are always executed together. This happens because we tell the system when to switch from one function to another: control can only change hands at an await, and there is no await between the three print calls.
Summary
- Multiprocessing is mainly used when you have CPU-bound tasks. You can spawn (number of CPUs) x (cores per CPU) processes. Using multithreading or asyncio for CPU-bound work gives you zero benefit.
- Multithreading is used when you have network-bound or IO-bound tasks. In our case as network engineers the tasks are usually network-bound, so it makes sense to use multiple threads. Around 50 threads is usually sufficient; beyond a certain number you will not see any significant gains. The OS decides when to stop one thread and start another, and you cannot change this. Also, in Python only one thread executes at a time.
- Asyncio is also used when you have network-bound or IO-bound tasks. It has the advantage that you decide when execution switches, but you have to use libraries/methods that support the asyncio paradigm.
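When a library does not support asyncio (netmiko, for instance, is blocking), one common workaround is to push the blocking call onto the event loop's thread pool with run_in_executor. A minimal sketch, where fetch_version() is a hypothetical stand-in for any blocking call such as a netmiko send_command():

```python
import asyncio
import time

def fetch_version(router):
    # Hypothetical blocking call (e.g. a netmiko send_command()),
    # simulated here with time.sleep()
    time.sleep(0.1)
    return "router" + str(router) + " version 1.0"

async def main():
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool executor,
    # so the event loop stays free while they execute concurrently.
    tasks = [loop.run_in_executor(None, fetch_version, r) for r in range(1, 4)]
    for result in await asyncio.gather(*tasks):
        print(result)

asyncio.run(main())
```

This gives you thread-level concurrency for the blocking calls while the rest of your program keeps the asyncio model, at the cost of reintroducing OS-scheduled threads for those calls.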