2013-07-10

HTTP/2 ... Wait! There's more!

Seems like the first post on HTTP/2 really hit a nerve. Lots of folks on Twitter are responding with many variations of "WTF!", "Holy Crap!" and "Ooo, Pretty!". Almost everyone agrees that HTTP/2 is a whole lot more complicated than HTTP/1.1, but opinions about whether that complexity is worth it vary dramatically. Personally, my feelings on it are mixed. Let me break down a few more details to explain why I feel that some parts are really great and other parts really aren't.

Let's start the negotiations!

If you recall from my previous post, I said that when an HTTP/2 client establishes a new connection with a server, it needs to send a very specific sequence of prefix octets in order to establish the connection. Currently, that sequence of octets is: 50 52 49 20 2a 20 48 54 54 50 2f 32 2e 30 0d 0a 0d 0a 53 4d 0d 0a 0d 0a. What I did not explain, however, is why those octets are required. The answer, my friends, is pretty awful in my opinion.

In the current HTTP/2 draft there is a section called Starting HTTP/2.0. It deals with the establishment of a new HTTP/2.0 connection, and it runs about four pages in length... that fact alone should start raising some red flags.

The way it works is not so simple. You see, HTTP/1.1 is primarily a TCP/IP-based protocol, and like all TCP/IP-based protocols, communication happens over a TCP port. The default TCP port for HTTP/1.1 is port 80. This is all well established... everyone just knows that TCP port 80 is always used for HTTP traffic. Well, here's the thing: HTTP/2.0 also uses port 80 as the default. In other words, it's now possible to have the text-based HTTP version 1.1 *OR* the binary-framing-based HTTP version 2.0 over the same default port. When establishing the TCP connection, you need a way of communicating which version of the protocol you're using.

There are several scenarios here:

  1. The client knows in advance that the server supports HTTP/2, so it opens the connection and immediately starts sending HTTP/2 frames... (that's essentially what I showed in the example in my previous post)
  2. Or, the client does not know if the server supports HTTP/2, but it knows it supports HTTP/1.1, so the client sends an HTTP/1.1 Upgrade Request hoping that the server will respond correctly and upgrade the connection...
  3. OR, the client decides it's going to use TLS over the TCP connection, in which case it will use TLS Application Layer Protocol Negotiation (ALPN) to determine whether or not the server supports HTTP/2.

Still with me so far? If so, good job. What this all means is that the current HTTP/2 specification defines 3 separate ways of determining whether or not you're able to use HTTP/2 on any given TCP connection, depending on what prior knowledge you have of the server's capabilities. (Oh, and it's not really the capabilities of the origin server you're worried about, it's the capabilities of whatever the next hop in the network path happens to be... because you might be talking to an HTTP caching proxy, reverse proxy, router, or whatever, but let's not get ahead of ourselves.)

So what's with this special sequence of octets that MUST be sent by the client at the start of every HTTP/2.0 connection? Well, since HTTP/2 traffic is carried over port 80 (or port 443 if TLS is being used), and because there is so much existing infrastructure out there running HTTP/1.1 on port 80, this magic sequence of octets is designed to force most (not all) HTTP/1.1-only intermediaries to respond with an error if the HTTP/2 sequence is not understood.

Let's look at the octet sequence to understand how this works. The sequence again is: 50 52 49 20 2a 20 48 54 54 50 2f 32 2e 30 0d 0a 0d 0a 53 4d 0d 0a 0d 0a. If we convert that to ASCII, we get "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"... in other words, the sequence is a not-so-cleverly-disguised, tongue-in-cheek, malformed HTTP/1.1-style request that is intended to be understood only by HTTP/2 implementations. Limited testing has shown that a lot of existing servers seem to respond with an error if this sequence is sent... so if the server accepts the sequence without throwing a fit, we assume it's safe to go ahead and start sending HTTP/2 frames. It's roughly the equivalent of a blind man honking the horn as he drives through a busy intersection during rush hour.
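
If you want to check that decoding for yourself, here's a trivial two-liner (Python here and in the other sketches below, purely for illustration):

  # Decode the connection header octets from the draft as ASCII.
  preface = bytes.fromhex(
      "50 52 49 20 2a 20 48 54 54 50 2f 32 2e 30"
      " 0d 0a 0d 0a 53 4d 0d 0a 0d 0a")
  print(repr(preface.decode("ascii")))   # 'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'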

Now, here's the thing: I *HAVE* to send that sequence of octets even if I already know beyond a shadow of a doubt that the endpoint I'm connecting to supports HTTP/2. Which, in my opinion, is just plain silly.

But it gets worse.

Suppose I connect to a server that I know supports HTTP/1.1 and I want to use HTTP/2 instead. I would send it an upgrade request which looks something like:


  GET /default.htm HTTP/1.1
  Host: server.example.com
  Connection: Upgrade, HTTP2-Settings
  Upgrade: HTTP/2.0
  HTTP2-Settings: 

If the server understands this upgrade and speaks fluent HTTP/2, it would respond with something like:


  HTTP/1.1 101 Switching Protocols
  Connection: Upgrade
  Upgrade: HTTP/2.0

  [ HTTP/2.0 connection ...

After which it would begin answering the GET request for /default.htm as if it had received an HTTP/2-formatted GET request.

Oh, and after the server responds with its 101 Switching Protocols, the client would STILL have to send the magical sequence of octets along with another SETTINGS frame to the server... despite having already established that HTTP/2 is being used, and despite having already sent SETTINGS to the server using the newly created HTTP2-Settings request header.
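
To make that concrete, here's a rough sketch of the client side of that dance over a plain socket. The host name is the example one from above, the HTTP2-Settings value is left blank just as it is in the example, and the actual encoding of the HTTP/2 SETTINGS frame is out of scope here:

  import socket

  # The 24 magic octets from earlier: "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
  PREFACE = bytes.fromhex(
      "505249202a20485454502f322e300d0a0d0a534d0d0a0d0a")

  sock = socket.create_connection(("server.example.com", 80))
  sock.sendall(
      b"GET /default.htm HTTP/1.1\r\n"
      b"Host: server.example.com\r\n"
      b"Connection: Upgrade, HTTP2-Settings\r\n"
      b"Upgrade: HTTP/2.0\r\n"
      b"HTTP2-Settings: \r\n"     # value omitted, as in the example above
      b"\r\n")

  if sock.recv(4096).startswith(b"HTTP/1.1 101"):
      # Both sides now agree on HTTP/2, yet the client must still send the
      # magic octets followed by another SETTINGS frame before anything else.
      sock.sendall(PREFACE)
      # ... send SETTINGS frame, then continue with HTTP/2 framing ...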

But wait! There's more! If I do decide to use TLS for my connection rather than plain ole HTTP/2 over TCP/IP, I would use ALPN to negotiate the use of HTTP/2 instead of the Upgrade mechanism above. And once HTTP/2 has been negotiated... and both sides are well aware that HTTP/2 is being used, I am still required to send that silly magic sequence of octets.

And all this is just to establish the HTTP/2 connection in the first place! Why is it this complicated? The only reason is that it was decided HTTP/2 absolutely must use the same default ports as HTTP/1.1... which, honestly, does not make any real sense to me. What would be easier? (1) Defining new default TCP/IP ports for HTTP/2 and HTTP/2 over TLS, (2) creating new URL schemes http2 and https2, and (3) using DNS records to aid discovery. It shouldn't be any more complicated than that.
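
For what it's worth, the DNS idea doesn't have to be exotic. Here's a purely hypothetical sketch, assuming an SRV record named _http2._tcp (my invention, not anything in the draft) and the third-party dnspython package:

  import dns.resolver

  # If the record exists, connect to the advertised host and port and speak
  # HTTP/2 from the very first byte -- no Upgrade dance required.
  for record in dns.resolver.resolve("_http2._tcp.example.com", "SRV"):
      print(record.target, record.port)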

The State of Header Compression

In my previous post (and in other postings on this blog) I talked a little bit about header compression in HTTP/2. Let's dig a bit more into how that works according to the current HTTP/2 implementers draft.

First, imagine that for every HTTP/2 connection, each endpoint maintains two pieces of state for each direction: a Header Table and a Reference Set. The Outbound Header Table and Outbound Reference Set are used to maintain the state of outgoing header fields (for the client, these are request headers). The Inbound Header Table and Inbound Reference Set are used to maintain the state of incoming header fields (for the client, response headers). So, for each side of the connection, we have:


 CLIENT                              SERVER

 Outbound                            Inbound
   Header Table   =================>   Header Table
   Reference Set                       Reference Set
   
 Inbound                             Outbound
   Header Table   <=================   Header Table
   Reference Set                       Reference Set

When a client sends an HTTP request to the server, the set of headers is processed and encoded into a set of instructions detailing how the Header Table and the Reference Set are to be modified by both endpoints. Each endpoint establishes a maximum size for its Inbound Header Table, and the corresponding outbound encoder MUST NOT exceed that limit. This gives the receiving endpoint a degree of control over how much state the sending endpoint is allowed to ask the receiver to hold on to.

The Header Table contains (name,value) pairs with an associated integer index. As new (name,value) pairs are added to the header table, the size of the table is checked, and if the size limit is exceeded, the least recently written (name,value) pairs are removed until the size falls back within the limits. Items can be added to the table either at the end (called "Incremental Indexing") or replaced by index (called "Substitution Indexing"). There are some additional specific rules for how the index is managed that I won't get into here. The main thing to know is that while the receiving endpoint gets to decide how much data to store in the Header Table, the sending endpoint decides what (name,value) pairs will be stored and when. The receiving endpoint is required to do whatever the sending endpoint asks in order to maintain proper connection state.
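
Here's a toy model of that bookkeeping. I'm assuming, for simplicity, that an entry's size is just the byte length of its name plus its value (the draft's actual accounting adds some per-entry overhead), but the eviction behavior is the same idea:

  class HeaderTable:
      """Toy model of one direction's header table."""

      def __init__(self, max_size):
          self.max_size = max_size
          self.entries = []                 # (name, value) pairs, oldest first

      def _size(self):
          return sum(len(n) + len(v) for n, v in self.entries)

      def _evict(self):
          # Drop the least recently written entries until we fit again.
          while self.entries and self._size() > self.max_size:
              self.entries.pop(0)

      def incremental_add(self, name, value):
          # "Incremental Indexing": append the new entry at the end.
          self.entries.append((name, value))
          self._evict()

      def substitute(self, index, name, value):
          # "Substitution Indexing": replace the entry at an existing index.
          self.entries[index] = (name, value)
          self._evict()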

So what about this Reference Set thing? Well, the Reference Set captures the currently active set of header fields. For those of you familiar with HTTP/1.1, you may be asking yourself: What in the flipping hell does that mean? Well, allow me to explain.

In HTTP/1.1, request header fields are highly repetitive and wasteful of bytes. The exact same data is sent over and over and over and over again. Think of things like User-Agent strings or Cookies. These things are large and repetitive and you have to send them with every single request, regardless of whether or not the values change. This is called "Statelessness".

Within HTTP/2, the Reference Set idea is meant to eliminate this kind of waste. When you establish a new connection, the Reference Set is empty. Then you send an initial set of header fields to the server. All of those headers are added to the Reference Set and to the Header Table (I'm simplifying it a bit; the headers might not be added to the Reference Set or the Header Table, but that's jumping ahead). Then, later on over the same connection, you send another request that has a different set of headers. Using the Reference Set, all you actually send over the wire is a set of instructions that tells the server which items in the current Reference Set to drop and which ones to add. For instance, the on-the-wire serialization would say something like, "Turn off header #1, add header #2, and add this new header 'foo=bar'". In other words, you'd only have to send headers once per connection unless (a) the value changes or (b) the (name,value) gets dropped out of the Header Table due to size constraints. Since each endpoint maintains a synchronized view of the Reference Set and Header Table, all the client needs to do is make sure the instructions it sends to the server are correct.
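
If you like to think in code, here's the gist of it, glossing over the indexed-versus-literal details and using made-up header names and values:

  # Headers currently "switched on" after the first request:
  reference_set = {
      (":path", "/index.html"),
      ("host", "server.example.com"),
      ("user-agent", "ExampleBrowser/1.0 (a big, repetitive string)"),
  }

  # Headers for the next request on the same connection:
  next_request = {
      (":path", "/style.css"),
      ("host", "server.example.com"),
      ("user-agent", "ExampleBrowser/1.0 (a big, repetitive string)"),
  }

  to_switch_off = reference_set - next_request   # just (":path", "/index.html")
  to_switch_on  = next_request - reference_set   # just (":path", "/style.css")

  # Only the two :path entries cost any bytes; the big, repetitive Host and
  # User-Agent fields ride along for free via the shared state.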

So, to summarize: the client maintains a header table and reference set, takes a set of header fields and generates a list of instructions on how to modify the header table and reference set, and sends those instructions to the server, which uses them to modify its own view of the header table and reference set, then uses the reference set to reconstruct the currently active set of headers. Sound easy enough?

Oh, and so it's clear, that process is happening in both directions over a single TCP connection (there are two synchronized pairs of Header Tables and Reference Sets being handled).

There are a few nuances to this that are worth mentioning. When working with the Header Table, the sender can choose one of four possible ways of sending any given header field, depending on (a) whether the header is already in the header table, (b) whether the header is in the Reference Set, (c) whether the sender wants the header to be in the header table, (d) how the sender wishes to manage the header table, and (e) what constraints the receiving end has placed on the size of the header table.

The four possible types of encodings are:

  • Indexed -- which means, the header is already in the header table and can be identified using the header table integer index. If the header is already in the Reference Set, sending an Indexed representation will switch it off. If the header is not already in the Reference Set, this will switch it on.
  • Literal without Indexing -- which means, the header (name,value) likely is not already in the header table and shouldn't be added now. Just use the header for this one request and don't add it to the Reference Set.
  • Literal with Incremental Indexing -- which means, the header (name,value) likely is not already in the header table and should be added to the end. Once it's added, add it to the Reference Set.
  • Literal with Substitution Indexing -- which means, I want to replace an existing entry in the header table with this new (name,value) pair. Once replaced, add it to the Reference Set.

Now, it's possible that with each of the Literal * representations, a header with the same name is already contained in the Header Table, so the encoding syntax gives us the option of either encoding the name directly, or referencing the index position of that already indexed name. So, really then, there are seven different ways of representing a header:

  • Indexed
  • Literal without Indexing and Literal Name
  • Literal without Indexing and Indexed Name
  • Literal with Incremental Indexing and Literal Name
  • Literal with Incremental Indexing and Indexed Name
  • Literal with Substitution Indexing and Literal Name
  • Literal with Substitution Indexing and Indexed Name

Ok, deep breath... hope you're still with me... Let's continue

Given a set of request headers, then, when I encode the HTTP request I have to make some choices (a rough sketch of this decision logic follows the list below).

  • Do I want this header field indexed in the compression state? Yes or No.
    • If no, encode it as a Literal without Indexing. There are many great reasons to choose this option. For instance, it's probably not a good idea to add Authorization header fields to the stored compression state, especially if there are passwords in the value. That's just asking for trouble down the road. Another good reason is: what if the value changes frequently? Date header fields are a good example.
    • If yes, move to the next step...
  • Is the header field already included in the header table? Yes or No.
    • If yes, use an Indexed representation.
    • If no, move to the next step...
  • Should the header field be added at the end of the header table or should it replace an existing one?
    • If added at the end, use Literal with Incremental Indexing
    • If replaced, use Literal with Substitution Indexing
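
In code, roughly (this is my own simplification: header_table here is assumed to be a list of (name, value) pairs, and the never_index list is an example policy of mine, not something the draft mandates):

  def choose_representation(name, value, header_table,
                            never_index=("authorization", "date")):
      name_is_indexed = any(n == name for n, _ in header_table)
      if name in never_index:
          # Sensitive or rapidly changing: keep it out of the shared state.
          return ("literal-without-indexing",
                  "indexed-name" if name_is_indexed else "literal-name")
      if (name, value) in header_table:
          return ("indexed",)
      # Whether to append or substitute is an encoder policy decision;
      # this sketch always appends.
      return ("literal-with-incremental-indexing",
              "indexed-name" if name_is_indexed else "literal-name")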

The next thing to consider is the fact that the Header Table is pre-populated with a number of common header field names and a handful of values. These are intended to provide even greater optimization of common cases. Unfortunately, there are two separate pre-populated header tables, one for request fields and one for response fields. Hopefully that will change soon.

All in all, the stateful header compression mechanism does provide significant savings in the number of bytes sent across the wire (an order of magnitude in some cases)... and there's been a lot of great work done to implement and improve the scheme over the past six months. It does work, it does save bytes, but it comes at a cost in terms of implementation complexity and the introduction of connection-persistent state that existing HTTP infrastructure just may not be able to cope with currently.

I've been spending the past year attempting to come up with an alternative approach that uses efficient binary encodings of header values to reduce the number of bytes sent. For instance, in HTTP/1.1 Date fields consume 29 bytes of data. We can encode the same information in as few as 6-8 bytes. Numeric headers such as Content-Length and response status are currently encoded as sequences of ASCII characters in HTTP/1.1. Using a variable-width binary encoding, we can encode the same values using significantly fewer bytes. I call this approach "Binary Optimized Header Encoding" or "BOHE" (pronounced "bow"). Over the past few months I've been attempting to find a balance between using BOHE and a less complicated stateful header compression mechanism. A current description of that work can be found here. Truth be told, however, I'd be much happier ditching stateful header compression completely in favor of just passing literal binary-optimized encodings around. Yeah, it means sending a few more bytes over the wire but that's OK in my book.
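
To give a flavor of what I mean (this illustrates the general idea only; it is not the actual BOHE wire format, which spends a few more bytes than the raw values shown here):

  import calendar
  import struct
  from email.utils import parsedate

  # An HTTP/1.1 date value: 29 ASCII bytes on the wire.
  date_text = "Wed, 10 Jul 2013 21:47:35 GMT"
  epoch = calendar.timegm(parsedate(date_text))
  date_binary = struct.pack(">I", epoch)         # the same instant in 4 bytes

  def encode_varint(value):
      """Variable-width encoding: 7 bits of the integer per byte."""
      out = bytearray()
      while True:
          byte = value & 0x7F
          value >>= 7
          if value:
              out.append(byte | 0x80)
          else:
              out.append(byte)
              return bytes(out)

  content_length = encode_varint(1354)           # 2 bytes vs. 4 ASCII digits

  print(len(date_text), len(date_binary))        # 29 4
  print(len(str(1354)), len(content_length))     # 4 2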

Don't get me wrong, it's not all bad!

Many people who have responded to my initial post have commented to the effect that switching to a binary protocol is a bad thing. I disagree with that sentiment. While there are definitely things I don't like in HTTP/2 currently, I think the binary framing model is fantastic, and the ability to multiplex multiple requests is a huge advantage. I just think it needs to be significantly less complicated.

HTTP/2 traffic ought to be done over a separate dedicated TCP/IP port and not port 80. Doing so eliminates all the handshake complexity.

HTTP/2 ought not to have stateful header compression.

Server Push ought to be moved out of the core specification and defined as an optional protocol extension... later, once we're done with the core stuff.

Flow control and priority ought to be further simplified and better defined (right now you can have both connection-level flow control AND stream-level flow control, and flow control only applies to DATA frames, so a creative implementation can easily bypass it by creating new non-flow-controlled frame types). Priority values, on the other hand, are assigned using a 31-bit value space... which seems insanely broad to me.
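
For reference, the shape of that two-level accounting is roughly this (a minimal sketch under my own simplifications; initial_window is just a parameter here rather than the draft's actual default):

  class FlowControl:
      """Two-level flow control: one connection window plus per-stream windows."""

      def __init__(self, initial_window):
          self.initial_window = initial_window
          self.connection_window = initial_window
          self.stream_windows = {}              # stream id -> remaining window

      def can_send_data(self, stream_id, length):
          # A DATA frame must fit within BOTH windows; every other frame
          # type is not flow controlled at all.
          stream_window = self.stream_windows.get(stream_id, self.initial_window)
          return length <= stream_window and length <= self.connection_window

      def record_data_sent(self, stream_id, length):
          self.stream_windows[stream_id] = (
              self.stream_windows.get(stream_id, self.initial_window) - length)
          self.connection_window -= length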

The extensibility model needs to be strictly defined. In other words, the spec either needs to do a really good job of explaining how the protocol is extended or it needs to rule out extensions completely; otherwise it's just asking for trouble. For instance, the spec currently has this notion that some frames are hop-by-hop while others are end-to-end. Unfortunately, it does not say how to determine whether extension frames (new frames not defined by the spec) are hop-by-hop or end-to-end. The only thing the spec states is that new frames can be created and registered and that implementations must ignore frames they don't understand.

In other words, in my opinion, there's a ton more work that needs to be done to get this protocol right. The current draft is just an "Implementation Draft", which means folks are just starting to get their feet wet with the protocol as currently defined. Things can still change. My hope is that the change will be for the better, but the jury is still out on that.