chmod 777 @self
- open source, open standards, and the open web
2013-04-18
Impact 2013
2013-03-18
New Domain...
2013-03-14
What's Next for RSS and Atom?
Simple answer: Nothing really.
RSS and Atom are good, stable, useful specifications that have served us well over the last decade. The fact that Google Reader is going away is unfortunate, but it doesn't really matter very much in the big picture. The way people use the Web is continuing to evolve, and that means things that are useful today may become deprecated tomorrow. That doesn't make them any less useful for the application cases they were designed for; it just means those application cases aren't nearly as important as they used to be. It happens.
2013-03-12
A Proposal: Multiple Request-URIs in HTTP/2
Following up on some recent discussion on the httpbis mailing list, I have just published draft-snell-httpbis-mget-00, which details a proposal for allowing Idempotent and Safe HTTP requests to contain multiple, independent request URIs.
For example, right now in HTTP 1.1, you must send a separate request for every resource you wish to retrieve from a server. For the majority of GET operations, these requests tend to duplicate much of the exact same information (same cookies, same authorization, same user-agent, etc). Sending all of this duplicate information is wasteful and unnecessary. Because we are reworking the way header fields are encoded within HTTP 2.0, we have the opportunity to make other kinds of improvements and optimizations as well. Allowing for multiple request URIs within a single request is an example of such an optimization.
Here's an example request:
| Field | Value |
|---|---|
| :method | GET |
| :path | "/images/foo.png", "/images/bar.png" |
| :host | example.org |
| if-none-match | "etag-for-foo.png", "etag-for-bar.png" |
This is a single HTTP/2 GET request for two separate image resources. This one request is equivalent to two separate, simultaneous HTTP 1.1 GET requests, one for each of the resources:
GET /images/foo.png HTTP/1.1
Host: example.org
If-None-Match: "etag-for-foo.png", "etag-for-bar.png"
GET /images/bar.png HTTP/1.1
Host: example.org
If-None-Match: "etag-for-foo.png", "etag-for-bar.png"
The responses to these requests would be delivered using separately identified HTTP/2 response streams, each with their own status codes, response headers and data frames so that intermediate caches could simply "Do The Right Thing".
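To make the equivalence concrete, here is a minimal Python sketch that expands one multi-URI request into the individual requests it stands for. The dictionary layout and the function names `expand_request` and `to_http11` are assumptions made for illustration only; they are not the wire format or any API defined by the draft.

```python
# Hypothetical sketch: the dict layout below is an illustration,
# not the encoding defined by draft-snell-httpbis-mget-00.

def expand_request(headers):
    """Return one header dict per path; every other field is shared."""
    paths = headers[":path"]
    shared = {k: v for k, v in headers.items() if k != ":path"}
    return [{":path": p, **shared} for p in paths]


def to_http11(request):
    """Render one expanded request as HTTP/1.1 text."""
    lines = [
        f"{request[':method']} {request[':path']} HTTP/1.1",
        f"Host: {request[':host']}",
    ]
    for name, value in request.items():
        if not name.startswith(":"):
            # "if-none-match" becomes "If-None-Match"
            lines.append(f"{name.title()}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


multi = {
    ":method": "GET",
    ":path": ["/images/foo.png", "/images/bar.png"],
    ":host": "example.org",
    "if-none-match": '"etag-for-foo.png", "etag-for-bar.png"',
}

for req in expand_request(multi):
    print(to_http11(req), end="")
```

Every field other than the path is copied unchanged into each expanded request, which is exactly the duplication the proposal tries to avoid sending.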
Multiple Request URIs would be permitted for all HTTP methods that are known to be both Idempotent and Safe (GET, HEAD and OPTIONS, for instance). They would also be allowed for the DELETE method, which is idempotent but not safe. Multiple Request URIs would NOT be allowed for operations like PUT, POST and PATCH, which are not safe (and, in the case of POST and PATCH, not idempotent either) and simply do not make any sense to express with multiple request URIs.
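The rule can be written down as a small check. This is only a sketch: the method classification follows the usual HTTP semantics (PUT is idempotent but not safe, so it still fails the combined test), and the function name `allows_multiple_uris` is made up for this example.

```python
# Assumption: the sets follow the standard HTTP method definitions.
SAFE = {"GET", "HEAD", "OPTIONS"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}


def allows_multiple_uris(method):
    """True if the proposal would permit several request URIs for this method."""
    m = method.upper()
    # Safe and idempotent methods qualify; DELETE is the one explicit exception.
    return (m in SAFE and m in IDEMPOTENT) or m == "DELETE"


for method in ("GET", "HEAD", "OPTIONS", "DELETE", "PUT", "POST", "PATCH"):
    print(method, allows_multiple_uris(method))
```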
The advantages of the Multiple Request proposal ought to be fairly obvious... For one, User-Agents would be able to selectively bundle requests for multiple resources (e.g. all images on a page) into a single outbound request. Two, no new HTTP methods are required so there is no confusion about when to use GET vs. Multi-GET etc. Three, the existing caching model just continues to work. Four, we ought to be able to realize significant resource reduction by eliminating bits sent over the wire.
Note that this is just a proposal at this stage and there is lots of higher priority work that has yet to be done in HTTP/2 but I wanted to at least get this on the table for discussion and review. If you have comments, questions or concerns, you can direct your feedback to the httpbis mailing list or as comments on this post.
2013-02-05
Additional Link Relations is done...
Happy to report that the Additional Link Relations specification has been approved as an RFC. It's in the RFC Editor queue now, so it'll take some time to get assigned a number. Here's the draft that was approved:
http://tools.ietf.org/html/draft-snell-additional-link-relations-07
What does it do? It defines five new fairly useful link relation types.
- rel="about" is used "to refer to a resource that is the subject or topic of the link's context"
- rel="preview" is used "to refer to a resource that serves as a preview of the link's context" (think, pointer to thumbnails or video trailers, etc)
- rel="type" is used to "indicate that the context resource is an instance of the resource identified by the target IRI"
- rel="privacy-policy" is used "to refer to a resource describing the privacy policy associated with the link's context"
- rel="terms-of-service" is used "to refer to a resource describing the Terms of Service associated with the link's context"
Doing Things Better
Recently I've been spending a lot of time looking at issues of HTTP header encodings and providing input to the IETF HTTP working group that is currently putting together the specification for HTTP version 2.0. There is definitely a lot of work still to be done to pull everything together.
While going through various experiments with possible binary encoding of header values, just to collect data on complexity etc, I explored a number of Headers that are in common use today and was struck by the fact that we, as a community, have done a generally crappy job of maintaining good engineering standards.
Take the P3P Header for example... P3P is a specification for the expression of Privacy Policy Preferences produced by the W3C. Many browsers and websites support P3P policies and there are quite a few governments that give significant weight to stated policies. These policies can be expressed either in XML or, when included in the header of an HTTP message, using a so-called "Compact" encoding. Here's an example taken from a real website:
CP="CAO DSP LAW CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa OUR BUS IND UNI COM NAV INT"
You'll have to refer to the P3P spec for details on what each of the various tokens means. The thing that strikes me about this is that while this string of characters is definitely less verbose than the XML version of a P3P policy, the fact that someone actually labeled this as being "compact" is something that I find quite funny and pathetic. Why is that? Because if you break it down, there are 79 bytes used to express only 18 discrete points of data. Can we do better? Of course we can.
Let's start by mapping out all of P3P's defined three letter tokens. There is a fixed set of them in the P3P specification. Let's dump them out in an array, in the order they are listed in the spec:
['NOI','ALL','CAO','IDC',
'OTI','NON','DSP','COR',
'MON','LAW','NID','CUR',
'ADM','DEV','TAI','PSA',
'PSD','IVA','IVD','CON',
'HIS','TEL','OTP','OUR',
'DEL','SAM','UNR','PUB',
'OTR','NOR','STP','LEG',
'BUS','IND','PHY','ONL',
'UNI','PUR','FIN','COM',
'NAV','INT','DEM','CNT',
'STA','POL','HEA','PRE',
'LOC','GOV','OTC','TST']
Note that in the example, some of the three letter tokens have an additional lower case "a" appended to them. These are "audience flags". The P3P spec defines that there are three basic audience flags, labeled as "a", "i" and "o". Again, refer to the P3P spec for details on what exactly those mean.
Now, let's set a goal that we are going to use just a single byte to represent each token. We can do this by simply taking the index of the token from the table above and adding 1. For instance, NOI = 1, ALL = 2, CAO = 3, etc. Let's then shift that value two bits to the left and use the two freed low-order bits for the audience flags: a = 1, i = 2, o = 3 (0 means no flag). For example, to represent the token "PSDa", it's essentially ((index('PSD') + 1) << 2) | 1.
If we apply this strategy to the entire example above, then encode it as hex, we end up with:
0C1C283135393D4145494D60848894A0A4A8
We drop from 79 bytes down to a much more compact 36 bytes of hex text (or just 18 bytes if sent as raw binary) without losing a single bit of information. We also end up with something that is easier and far more efficient to parse. That's a compact encoding.
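The whole scheme fits in a few lines of Python. This is a sketch of the approach described above rather than a finished codec: the names `encode_policy` and `decode_policy` are made up for this example, and it assumes every item is a three-letter token optionally followed by a single audience flag.

```python
# Token table, in the same order as the array above.
TOKENS = [
    "NOI", "ALL", "CAO", "IDC", "OTI", "NON", "DSP", "COR",
    "MON", "LAW", "NID", "CUR", "ADM", "DEV", "TAI", "PSA",
    "PSD", "IVA", "IVD", "CON", "HIS", "TEL", "OTP", "OUR",
    "DEL", "SAM", "UNR", "PUB", "OTR", "NOR", "STP", "LEG",
    "BUS", "IND", "PHY", "ONL", "UNI", "PUR", "FIN", "COM",
    "NAV", "INT", "DEM", "CNT", "STA", "POL", "HEA", "PRE",
    "LOC", "GOV", "OTC", "TST",
]
FLAG_CHARS = ["", "a", "i", "o"]  # the position in this list is the 2-bit flag value


def encode_policy(policy):
    """Pack each token into one byte: (index + 1) << 2, OR'ed with the flag."""
    out = bytearray()
    for item in policy.split():
        name, flag = item[:3], item[3:]
        out.append(((TOKENS.index(name) + 1) << 2) | FLAG_CHARS.index(flag))
    return bytes(out)


def decode_policy(data):
    """Reverse the packing: high six bits are the token, low two the flag."""
    return " ".join(TOKENS[(b >> 2) - 1] + FLAG_CHARS[b & 3] for b in data)


policy = ("CAO DSP LAW CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa "
          "OUR BUS IND UNI COM NAV INT")
encoded = encode_policy(policy)
print(encoded.hex().upper())
print(len(policy), "characters ->", len(encoded), "raw bytes")
assert decode_policy(encoded) == policy  # round trip loses nothing
```

With 52 tokens the largest possible value is (52 << 2) | 3 = 211, so every token plus its flag always fits in a single byte.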
There are certainly a broad number of other headers in HTTP messages that are equally wasteful. Date headers, for example, account for a surprisingly large amount of wasted space in HTTP messages. A typical Date header, for instance, usually weighs in at around 29 bytes per instance. Our tests show that we can, alternatively, encode the same information using only 4-6 bytes. Hopefully as we continue through this process of defining HTTP version 2.0 we will be able to optimize much of the encoding so that we're not being so wasteful.
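As a rough illustration of the Date case, the sketch below packs a standard HTTP date into a 4-byte unsigned integer holding seconds since the Unix epoch. That particular encoding is an assumption chosen for this example; it is not necessarily the one used in the tests mentioned above.

```python
import struct
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def encode_http_date(value):
    """Pack an HTTP date string into 4 bytes (big-endian seconds since epoch)."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # Treat a date without zone information as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return struct.pack(">I", int(dt.timestamp()))


def decode_http_date(data):
    """Unpack the 4-byte value back into the usual textual form."""
    (seconds,) = struct.unpack(">I", data)
    return formatdate(seconds, usegmt=True)


text = "Sun, 06 Nov 1994 08:49:37 GMT"
packed = encode_http_date(text)
print(len(text), "bytes ->", len(packed), "bytes")
assert decode_http_date(packed) == text  # round trip is lossless
```

An unsigned 32-bit counter of seconds runs out in 2106, which is one reason an encoding might spend an extra byte or two.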
Lucky 13
The new Lucky Thirteen attack on TLS is definitely an intriguing read. There are a number of practical limitations with the approach that make it generally less of a threat to everyday use, but the technique used is certainly novel and highlights a number of very specific weaknesses in HTTPS. For those of us involved in the effort to define the next version of HTTP, there is one very important lesson that I take away: we *really* need to adopt proof-of-possession mechanisms when it comes to authentication, cookies, etc.
The key characteristic of Lucky Thirteen is that it is a multi-session attack. For properly implemented HTTP stacks, every iteration of the attack requires the establishment of a new TLS session. For the victim, this would tend to just come across like increased network latency. More often than not they would shrug it off with a simple explanation that "The network is just slow today!". What's really happening is that a man in the middle is repeatedly attempting to reverse engineer plain text secrets from an encrypted packet.
While there are a number of suggested solutions, such as the use of stream ciphers as opposed to block ciphers, it cannot be overlooked that the only reason these kinds of attacks are attractive is that the data contained in the encrypted packet retains its value over time. For instance, a username and password transmitted as base64-encoded text, which is effectively in the clear, is a wonderfully tempting target. What's that? It's protected by TLS? That's OK, let me just run it through this Lucky Thirteen process and bingo, pretty good chance we'll have the password in no time.
If, however, we moved away from session cookies, basic auth, and even static API keys passed along in headers or query strings and moved exclusively to cryptographic proof-of-possession algorithms where each message contained a one-time use token, by the time the attacker is able to reverse engineer the plain text from the cipher, the information retrieved is no longer relevant or useful.
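Here is one way such a scheme could look, sketched in Python with an HMAC over the request plus a one-time nonce. Everything in it is an assumption made for illustration (the field names, the 300 second window, the in-memory nonce store, the `sign_request` and `Verifier` names); it is not a vetted protocol and not something taken from any HTTP/2 draft.

```python
import hashlib
import hmac
import secrets
import time


def sign_request(key, method, path):
    """Client side: prove possession of key without ever sending it."""
    nonce = secrets.token_hex(16)          # fresh for every request
    timestamp = str(int(time.time()))
    message = "\n".join([method, path, nonce, timestamp]).encode()
    mac = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"nonce": nonce, "timestamp": timestamp, "mac": mac}


class Verifier:
    """Server side: accept each proof at most once, and only while fresh."""

    def __init__(self, key, max_age=300):
        self.key = key
        self.max_age = max_age
        self.seen = set()                  # nonces already accepted

    def verify(self, method, path, proof):
        if proof["nonce"] in self.seen:
            return False                   # replayed token
        if abs(time.time() - int(proof["timestamp"])) > self.max_age:
            return False                   # too old to be useful
        message = "\n".join(
            [method, path, proof["nonce"], proof["timestamp"]]).encode()
        expected = hmac.new(self.key, message, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, proof["mac"]):
            return False                   # wrong key or altered request
        self.seen.add(proof["nonce"])
        return True


key = secrets.token_bytes(32)              # shared out of band
verifier = Verifier(key)
proof = sign_request(key, "GET", "/images/foo.png")
print(verifier.verify("GET", "/images/foo.png", proof))  # first use
print(verifier.verify("GET", "/images/foo.png", proof))  # same proof replayed
```

The long-lived secret never crosses the wire, and a captured proof is rejected as soon as it has been used once or has aged out.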
This is a problem that just needs to be fixed. Period. Basic Auth should be deprecated. Use of cookies for persistent authentication should be deprecated. We know all of this already so let's just do it.