Asynchronous Programing
Table of Contents
1 Motivation
I have stumbled upon several instances while answering on quora/SO, where someone is having trouble in their code due to incorrect understanding of asynchronous code flow. Sometimes the OP simply doesn't understand the mystical ritual of passing functions as parameters (as callbacks), while sometimes there is racing issue which makes the code work on local machines and fail miserably on production server.
In this post, I'll try to explain what asynchronous programing is, how control flows and later we'll try to implement a very crude framework which facilitates asynchronous callbacks.
2 What
Let's start with what we already know and what this post is not about:
2.1 The synchronous way
Here's a contrived example, which opens up a file and checks if first character is a vowel. If it is a vowel then it calculates \(\sqrt[3]{2}\) else it calculates \(\sqrt{2}\). And finally it calculates result 32
1: f = open('/home/user/rndFile.txt', 'r') 2: 3: first_line = f.readline() 4: first_char = first_line[0] 5: 6: if first_char in ['a', 'e', 'i', 'o', 'u']: 7: result1 = 2 ** (1/3.0) 8: else: 9: result1 = 2 ** (1/2.0) 10: 11: result2 = 3 ** 2
We can easily guess what code is doing by looking at it. It goes over each line sequencially after executing the previous one. Let's dive deeper in this example and see what is going on behind the scenes.
2.1.1 Anatomy
f = open('/home/user/rndFile.txt', 'r')
: requests Operating System to lookup a file using specified path in read-only mode.first_line = f.readline()
: requests OS again to read the file into memory.first_char = first_line[0]
: copies some content of memory location pointed byfirst_line
into memory location pointed byfirst_char
if first_char in ['a', 'e', 'i', 'o', 'u']:
requires the CPU to compare several memory location and then choose a branch based on that.result1 = 2 ** (1/3.0)
andresult1 = 2 ** (1/2.0)
involves processing to be done by computation circuitary such as ALU or FPU.result2 = 3 ** 2
: same as above.
2.1.2 The infinite space between words1
Reading files, making network calls and doing CPU intensive work is indiscriminately interwoven into each other. But the time required to execute each differs dramatically. Here's a table of approximate timing1 for various operations and computer time translated to arbitary seconds:
1 CPU cycle | 0.3 ns | 1s |
Level 1 cache access | 0.9 ns | 3s |
Level 2 cache access | 2.8 ns | 9 s |
Level 3 cache access | 12.9 ns | 43 s |
Main memory access | 120 ns | 6 min |
Solid-state disk I/O | 50-150 μs | 2-6 days |
Rotational disk I/O | 1-10 ms | 1-12 months |
Internet: SF to NYC | 40 ms | 4 years |
Internet: SF to UK | 81 ms | 8 years |
Internet: SF to Australia | 183 ms | 19 years |
Now with this perspective, let's revisit our earlier example.
- Line 1 and 3 will consume few hundred μ seconds because of the disk IO.
- Line 4 and 6 which deals with Main memory will require few hundreds of nano seconds.
- And finally lines 7, 9 and 11 require few CPU cycles and thus will be done in less than a nano second.
2.1.3 Blocking IO
Since we have seen in the reference frame of processor, IO takes forever to return data; what does the processor do during that period? It simply idles 2.
Figure 1: what should I kill next?
Note the instruction on line 11. It doesn't depend upon the data read from the file, yet its computation is blocked due to sequencial execution of intructions.
2.2 The Asynchronous way
Instead of waiting for the blocking operation, we can proceed with the computation by specificing what we intend to do after the blocking operation. We can specify this by providing a callback which is executed whenever the operation succeeds.
2.2.1 Callbacks
These callbacks can be provided as anonymous functions (λ) or named functions.
Let's rewrite our earlier example pretending open
and readline
to be asynchronous
operations.
1: def calculateResult1(first_char): 2: if first_char in ['a', 'e', 'i', 'o', 'u']: 3: result1 = 2 ** (1/3.0) 4: else: 5: result1 = 2 ** (1/2.0) 6: 7: f = open('/home/user/rndFile.txt', 8: 'r', 9: lambda f: f.readline(lambda first_line: calculateResult1(first_line[0]))) 10: 11: result2 = 3 ** 2 12: 13: # Note: this is not a valid Python code, 'open' is not asynchronous in Python. 14: # Do Not Try this at home.
Here's the diff between the two examples in English:
- We have moved the code which processes the data read from the file into a named function
calculateResult1
. - Additionally there are 2 more lambdas3 (anonymous functions) which handles reading a line and invoking
calculateResult1
using first character of the line.
2.2.2 Async-effect
Now instead of blocking on open
call, our execution proceeds to execute line 11
after registering a callback with the open
and readline
operation.
To conclude our journey in understanding asynchronous programing, let's have example which can be fiddled with. Fireup your JavaScript console4 and paste the following code.
var request = new XMLHttpRequest(); request.open('GET', 'https://anuragpeshne.github.io/essays/index.html', true) request.onload = function() { if (request.status >= 200 && request.status < 400) { console.log("Success! " + request.responseText.length + " characters received"); } else { console.log("Server returned error " + request.status); } } request.onerror = function() { console.log("Connection Error"); } console.log("before making the call"); request.send(); console.log("after making the call");
Here request.send()
is indeed asynchronous. It is network version of open
call that we saw in previous examples. We have registered two callbacks here,
one onload
and other onerror
.
Note that the execution order of last three lines is indeterministic. If request is successful, order may be:
before making the call After making the call Success! xxx characters received
What would happen if url is changed to google.com5, or browser is disconnected from internet. This is something you should definitely try at home.
3 How
And now we can turn our attention to how asynchronous callbacks can be implemented. This can be really useful in getting deep knowledge about asynchronous programming.
3.1 TODO Read more about Event Loop
3.2 TODO Read about multi threading/thread pool in Python
3.3 TODO Non Blocking system calls:
3.3.1 TODO select
3.3.2 TODO poll
3.4 TODO SO question
3.5 TODO Talk
Footnotes:
: Of course, the processor doesn't actually idle; The OS scheduler puts the thread in blocked state and schedules another process.
: calculateResult1
could have also been a lambda but Python Lambdas are
syntactically restricted to a single expressions.
: if you are on Chrome or Firefox right click over any whitespace in this page and click 'Inspect [Element]'. Now head over the 'Console' tab.
: changing to other origin should result in an error because of the Same origin policy enforced by the browsers.