Asynchronous Programing

Table of Contents

DRAFT

1 Motivation

I have stumbled upon several instances while answering on quora/SO, where someone is having trouble in their code due to incorrect understanding of asynchronous code flow. Sometimes the OP simply doesn't understand the mystical ritual of passing functions as parameters (as callbacks), while sometimes there is racing issue which makes the code work on local machines and fail miserably on production server.

In this post, I'll try to explain what asynchronous programing is, how control flows and later we'll try to implement a very crude framework which facilitates asynchronous callbacks.

2 What

Let's start with what we already know and what this post is not about:

2.1 The synchronous way

Here's a contrived example, which opens up a file and checks if first character is a vowel. If it is a vowel then it calculates \(\sqrt[3]{2}\) else it calculates \(\sqrt{2}\). And finally it calculates result 32

 1: f = open('/home/user/rndFile.txt', 'r')
 2: 
 3: first_line = f.readline()
 4: first_char = first_line[0]
 5: 
 6: if first_char in ['a', 'e', 'i', 'o', 'u']:
 7:     result1 = 2 ** (1/3.0)
 8: else:
 9:     result1 = 2 ** (1/2.0)
10: 
11: result2 = 3 ** 2

We can easily guess what code is doing by looking at it. It goes over each line sequencially after executing the previous one. Let's dive deeper in this example and see what is going on behind the scenes.

2.1.1 Anatomy

  • f = open('/home/user/rndFile.txt', 'r'): requests Operating System to lookup a file using specified path in read-only mode.
  • first_line = f.readline(): requests OS again to read the file into memory.
  • first_char = first_line[0]: copies some content of memory location pointed by first_line into memory location pointed by first_char
  • if first_char in ['a', 'e', 'i', 'o', 'u']: requires the CPU to compare several memory location and then choose a branch based on that.
  • result1 = 2 ** (1/3.0) and result1 = 2 ** (1/2.0) involves processing to be done by computation circuitary such as ALU or FPU.
  • result2 = 3 ** 2: same as above.

2.1.2 The infinite space between words1

Reading files, making network calls and doing CPU intensive work is indiscriminately interwoven into each other. But the time required to execute each differs dramatically. Here's a table of approximate timing1 for various operations and computer time translated to arbitary seconds:

1 CPU cycle 0.3 ns 1s
Level 1 cache access 0.9 ns 3s
Level 2 cache access 2.8 ns 9 s
Level 3 cache access 12.9 ns 43 s
Main memory access 120 ns 6 min
Solid-state disk I/O 50-150 μs 2-6 days
Rotational disk I/O 1-10 ms 1-12 months
Internet: SF to NYC 40 ms 4 years
Internet: SF to UK 81 ms 8 years
Internet: SF to Australia 183 ms 19 years

Now with this perspective, let's revisit our earlier example.

  • Line 1 and 3 will consume few hundred μ seconds because of the disk IO.
  • Line 4 and 6 which deals with Main memory will require few hundreds of nano seconds.
  • And finally lines 7, 9 and 11 require few CPU cycles and thus will be done in less than a nano second.

2.1.3 Blocking IO

Since we have seen in the reference frame of processor, IO takes forever to return data; what does the processor do during that period? It simply idles 2.

wolf.gif

Figure 1: what should I kill next?

Note the instruction on line 11. It doesn't depend upon the data read from the file, yet its computation is blocked due to sequencial execution of intructions.

2.2 The Asynchronous way

Instead of waiting for the blocking operation, we can proceed with the computation by specificing what we intend to do after the blocking operation. We can specify this by providing a callback which is executed whenever the operation succeeds.

2.2.1 Callbacks

These callbacks can be provided as anonymous functions (λ) or named functions. Let's rewrite our earlier example pretending open and readline to be asynchronous operations.

 1: def calculateResult1(first_char):
 2:     if first_char in ['a', 'e', 'i', 'o', 'u']:
 3:         result1 = 2 ** (1/3.0)
 4:     else:
 5:         result1 = 2 ** (1/2.0)
 6: 
 7: f = open('/home/user/rndFile.txt',
 8:          'r',
 9:          lambda f: f.readline(lambda first_line: calculateResult1(first_line[0])))
10: 
11: result2 = 3 ** 2
12: 
13: # Note: this is not a valid Python code, 'open' is not asynchronous in Python.
14: # Do Not Try this at home.

Here's the diff between the two examples in English:

  1. We have moved the code which processes the data read from the file into a named function calculateResult1.
  2. Additionally there are 2 more lambdas3 (anonymous functions) which handles reading a line and invoking calculateResult1 using first character of the line.

2.2.2 Async-effect

Now instead of blocking on open call, our execution proceeds to execute line 11 after registering a callback with the open and readline operation.

To conclude our journey in understanding asynchronous programing, let's have example which can be fiddled with. Fireup your JavaScript console4 and paste the following code.

var request = new XMLHttpRequest();
request.open('GET', 'https://anuragpeshne.github.io/essays/index.html', true)

request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
        console.log("Success! " + request.responseText.length + " characters received");
    } else {
        console.log("Server returned error " + request.status);
    }
}

request.onerror = function() {
    console.log("Connection Error");
}

console.log("before making the call");
request.send();
console.log("after making the call");

Here request.send() is indeed asynchronous. It is network version of open call that we saw in previous examples. We have registered two callbacks here, one onload and other onerror.

Note that the execution order of last three lines is indeterministic. If request is successful, order may be:

before making the call
After making the call
Success! xxx characters received

What would happen if url is changed to google.com5, or browser is disconnected from internet. This is something you should definitely try at home.

3 How

And now we can turn our attention to how asynchronous callbacks can be implemented. This can be really useful in getting deep knowledge about asynchronous programming.

3.1 TODO Read more about Event Loop

3.2 TODO Read about multi threading/thread pool in Python

3.3 TODO Non Blocking system calls:

3.3.1 TODO select

3.3.2 TODO poll

3.4 TODO SO question

3.5 TODO Talk

Footnotes:

2

: Of course, the processor doesn't actually idle; The OS scheduler puts the thread in blocked state and schedules another process.

3

: calculateResult1 could have also been a lambda but Python Lambdas are syntactically restricted to a single expressions.

4

: if you are on Chrome or Firefox right click over any whitespace in this page and click 'Inspect [Element]'. Now head over the 'Console' tab.

5

: changing to other origin should result in an error because of the Same origin policy enforced by the browsers.

Date: 2016-02-07

Author: Anurag Peshne

Created: 2017-04-22 Sat 10:04

Emacs 25.1.1 (Org mode 9.0.5)

Validate