tcp to c10k

Table of Contents

Brief Background

Get's get straight to the point, here's my bucket list for next couple of years: the things which I want to build from scratch:

  • A Compiler
  • A Web Server
  • Something that runs on bare metal; (maybe a super primitive OS)

It's over 4 months since I started with grad school at UF. With the Networking and Concurrent Programming courses this semester, implementing a web server seems the most natural choice. This is my attempt to take on the 2nd item in the list.

Why Clojure

Since this is a great opportunity to learn a new language, I considered Haskell and Clojure to implement this project.I chose Clojure over Haskell1, because:

As of now, Clojure is turning out to be a great decision.

About this essay

I'm planning to use this essay as a journal to the project httpj. I'll record the achievements and lessons learned in a diary fashion so that this may help someone else to implement a HTTP server and most importantly, help me remember things I learned. 'Adventures of HTTP Server Implementation' would have been more appropriate title for this essay. Just another note, I'm more interested in making a concurrent server than a perfect HTTP specification compliant server. Thus, I'll favor implementations that result in simpler, faster server than complete RFC compliant server. I'm using this board to manage tasks.

Getting Started

Echo Server

SCHEDULED: <2016-11-02 Wed>

The first stage is to make a simple server which accepts TCP connection and simply echos back everything sent to it. This was pretty straight forward, but a good exercise to get acquainted with the Clojure syntax and Java interop.

Here's the commit and similar thing in Java.

Concurrent Connections

SCHEDULED: <2016-11-05 Sat>

The next logical step was to make it accept multiple concurrent TCP connections. And the Java thing to do would have been to create a new thread to handle the new connection. Creating thread takes time and resources, and hence thread pool (something similar to ThreadPoolExecutor2) to manage and reuse threads to handle the clients is recommended. We can use Java Interop feature of Clojure to port this to Clojure.

I went to through the chapter Concurrency of the book 'Clojure for the brave and true' and found out various ways to make the program concurrent. The first and the easiest way is to create a future object. Right now, to get the project started we'll use this construct; but we'll revisit concurrency after we have worked on the protocol implementation to a fair extent.

TODO Persistent Data Structure

HTTP request and response

Right now, our server does nothing interesting other than greeting the world. Let's change that; let's parse the request and try to figure out what client is requesting. Since we are trying to make a static server, most of our requests would be GET request, so let's start with them3.

A GET request has a structure similar to4:

GET /html/index.html HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.49.1
Accept: */*
<CRLF>

In general the HTTP request format consists of an initial line followed by several headers in key: value format, followed by CRLF (carriage return \r and line feed \n) and then followed by message body (which may contain data for POST method).

Other HTTP requests, such as POST, and HTTP response have similar format with some differences in number and type of headers.

If HTTP/1.0 is used, it's easy to know the end of request/response; socket is closed after each transfer, which makes implementing HTTP server and client a bit easy but makes the whole process a lot in efficient5.

Let's serve HTTP!

Now that we know the request and response format, we can start servering HTTP, and by serve, I literally mean serve HTTP. We can use nc6 to manually talk HTTP:

$ nc -lk 8080
# -l make nc *listen* to socket
# -k nc keep listening

# now point browser to 'http://localhost:8080'
# browser blocks, and we can see the request browser made on nc stdout
# type in HTTP response directly into nc
HTTP/1.1 200 OK
Content-Length: 12

Hello World

# after the last blank line, the loading gears in browsers should stop and
# 'Hello World' should appear on browser!

Parsing GET request

parse-request parses the HTTP request and returns a map {:headLine :headers} where headers in itself is map. Parsing is straightforward splitting the string over ": ".

(defn parse-reqest
"parses and returns request obj"
[in]
(let [inp (line-seq in)
      head-line (parse-head-line (first inp))
      headers (loop [cur-inp (rest inp) list []]
                (if (or (= (first cur-inp) "") (nil? (first cur-inp))) list
                    (recur (rest cur-inp)
                           (conj list (apply hash-map
                                             (str/split (first cur-inp) #": "))))))]
  {:headLine head-line, :headers headers}))

Most of the code is obvious. The only thing which isn't is loop - recur construct. line-seq returns a lazy sequence of lines from socket. We are recurring over the list (Lisp style), but instead of doing a function call, we are using this construct because Java doesn't provide tail call optimization7 and recur is a hack to save stacks.

Generating response

  • HTTP 1.0 v/s HTTP 1.1
    • Content-Length and socket close.
  • Thread first macro, ->

File Server

A subject so complicated, it needs a level 2 heading.

Footnotes:

1

Haskell is recommended if one wants to learn FP, so I'll build something else to learn it.

2

Java Docs for Threadpoolexecutor

3

I'm following along this, brilliant yet very simple HTTP tutorial.

4

RFC 2616 HTTP request format

5

TCP Connection establishment is a long process. Hello; (Hello) Acknowledged; (Hello Acknowledged) Acknowledged.

6

man pages describe nc as: arbitrary TCP and UDP connections and listens. It pipes socket to stdin and stdout

7

Here Brian Goetz mentions they are working to get tail call optimization to JVM (but not soon)

Creative Commons License

Date: 2016-10-30

Author: Anurag Peshne

Emacs 25.2.2 (Org mode 9.1.14)

Validate