tcp to c10k
Table of Contents
Brief Background
Get's get straight to the point, here's my bucket list for next couple of years: the things which I want to build from scratch:
- A Compiler
- A Web Server
- Something that runs on bare metal; (maybe a super primitive OS)
It's over 4 months since I started with grad school at UF. With the Networking and Concurrent Programming courses this semester, implementing a web server seems the most natural choice. This is my attempt to take on the 2nd item in the list.
Why Clojure
Since this is a great opportunity to learn a new language, I considered Haskell and Clojure to implement this project.I chose Clojure over Haskell1, because:
- Clojure is a Lisp; It's long since I worked through Little Schemer and I was craving to write some Lisp
- Rich Hickey
- Couple of threads mentioning Clojure to be more 'practical'1
As of now, Clojure is turning out to be a great decision.
About this essay
I'm planning to use this essay as a journal to the project httpj. I'll record the achievements and lessons learned in a diary fashion so that this may help someone else to implement a HTTP server and most importantly, help me remember things I learned. 'Adventures of HTTP Server Implementation' would have been more appropriate title for this essay. Just another note, I'm more interested in making a concurrent server than a perfect HTTP specification compliant server. Thus, I'll favor implementations that result in simpler, faster server than complete RFC compliant server. I'm using this board to manage tasks.
Getting Started
Echo Server
The first stage is to make a simple server which accepts TCP connection and simply echos back everything sent to it. This was pretty straight forward, but a good exercise to get acquainted with the Clojure syntax and Java interop.
Here's the commit and similar thing in Java.
Concurrent Connections
The next logical step was to make it accept multiple concurrent TCP connections.
And the Java thing to do would have been to create a new thread to handle the new
connection. Creating thread takes time and resources, and hence thread pool (something
similar to ThreadPoolExecutor
2) to manage and reuse threads to handle the clients
is recommended. We can use Java Interop feature of Clojure to port this to
Clojure.
I went to through the chapter Concurrency of the book 'Clojure for the brave and true' and found out various ways to make the program concurrent. The first and the easiest way is to create a future object. Right now, to get the project started we'll use this construct; but we'll revisit concurrency after we have worked on the protocol implementation to a fair extent.
TODO Persistent Data Structure
HTTP request and response
Right now, our server does nothing interesting other than greeting the world.
Let's change that; let's parse the request and try to figure out what client
is requesting. Since we are trying to make a static server, most of our requests
would be GET
request, so let's start with them3.
A GET
request has a structure similar to4:
GET /html/index.html HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.49.1
Accept: */*
<CRLF>
In general the HTTP request format consists of an initial line followed by several
headers in key: value
format, followed by CRLF
(carriage return \r
and
line feed \n
) and then followed by message body (which may contain data for
POST
method).
Other HTTP requests, such as POST
, and HTTP response have similar format with
some differences in number and type of headers.
If HTTP/1.0 is used, it's easy to know the end of request/response; socket is closed after each transfer, which makes implementing HTTP server and client a bit easy but makes the whole process a lot in efficient5.
Let's serve HTTP!
Now that we know the request and response format, we can start servering HTTP,
and by serve, I literally mean serve HTTP.
We can use nc
6 to manually talk HTTP:
$ nc -lk 8080 # -l make nc *listen* to socket # -k nc keep listening # now point browser to 'http://localhost:8080' # browser blocks, and we can see the request browser made on nc stdout # type in HTTP response directly into nc HTTP/1.1 200 OK Content-Length: 12 Hello World # after the last blank line, the loading gears in browsers should stop and # 'Hello World' should appear on browser!
Parsing GET
request
parse-request
parses the HTTP request and returns a map {:headLine :headers}
where headers
in itself is map. Parsing is straightforward splitting the string
over ": "
.
(defn parse-reqest "parses and returns request obj" [in] (let [inp (line-seq in) head-line (parse-head-line (first inp)) headers (loop [cur-inp (rest inp) list []] (if (or (= (first cur-inp) "") (nil? (first cur-inp))) list (recur (rest cur-inp) (conj list (apply hash-map (str/split (first cur-inp) #": "))))))] {:headLine head-line, :headers headers}))
Most of the code is obvious. The only thing which isn't is loop - recur
construct.
line-seq
returns a lazy sequence of lines from socket. We are recurring over
the list (Lisp style), but instead of doing a function call, we are using this
construct because Java doesn't provide tail call optimization7 and recur
is a
hack to save stacks.
File Server
A subject so complicated, it needs a level 2 heading.
Footnotes:
Haskell is recommended if one wants to learn FP, so I'll build something else to learn it.
TCP Connection establishment is a long process. Hello; (Hello) Acknowledged; (Hello Acknowledged) Acknowledged.
man pages describe nc
as: arbitrary TCP and UDP connections and listens
. It pipes socket to stdin
and stdout