2012-04-16

On the "profile" link relation...


Erik Wilde just published a new Internet-Draft that proposes a "profile" link relation... prompting Mark Nottingham to sing its praises here... Here's what the abstract says...
This specification defines the 'profile' link relation type that allows resource representations to indicate that they are following one or more profiles.  A profile is defined to not alter the semantics of the resource representation itself, but to allow clients to learn about additional semantics (constraints, conventions, extensions) that are associated with the resource representation, in addition to those defined by the media type and possibly other mechanisms. - Erik W.
The draft itself is, unfortunately, quite sparse on examples, so I'll show one from Mark N.'s post:
HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.org/profiles/myjsonstuff>; rel="profile"

{"stuff": 0,
 "moar-stuff": 1
}

Essentially, what the profile link is saying is that the bit of JSON contained in the payload of the response conforms to some set of conventions identified by the URI "http://example.org/profiles/myjsonstuff"... which can be anything, really.
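
To make that concrete, here is a minimal Python sketch of how a client might pull the profile targets out of a Link header. The function name profile_links is hypothetical, and the parsing is deliberately simplified: it assumes no commas or semicolons appear inside quoted parameter values, so it is not a complete implementation of the Link header grammar.

```python
import re

# Matches one link-value of the form <target>; param=value; ...
# Simplification (assumption): no ',' or ';' inside quoted parameter values.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def profile_links(link_header):
    """Return the targets of all links whose rel includes "profile"."""
    targets = []
    for target, params in _LINK_RE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "rel":
                # rel may carry several space-separated relation types
                rels = value.strip('"').split()
                if "profile" in rels:
                    targets.append(target)
    return targets


header = '<http://example.org/profiles/myjsonstuff>; rel="profile"'
print(profile_links(header))  # ['http://example.org/profiles/myjsonstuff']
```

Note that the client still has to know, out of band, what the profile URI means; the header only tells it which conventions are being claimed.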

The argument that Mark N. gives in favor of this approach is that using the Link header in this way allows us to avoid creating yet another MIME Media Type that identifies the semantics associated with the content. The fear is that as more and more people begin to build applications on top of generic data formats like JSON, the number of media types that folks want to register may explode.
For better or worse, everyone and his dog is minting “RESTful” APIs. One byproduct of this is the need to identify formats, so they’re going off and creating new media types. Lots of them. Sometimes, for every variation of their syntax (e.g., if a new element is added) some people feel the compulsion to identify it with a media type. This quickly gets unwieldy, obviously, and registering that many media types won’t do. - Mark N.
That said, while I am generally in favor of allowing developers to specifically identify the conventions and specifications they're following in their messages, I'm not sure that the profile link relation, as defined, is quite the right approach to take.

First of all, I disagree with the notion that lots of MIME Media Types are a bad thing. There may need to be better registration processes and better practices for minting them, but lots of MIME Media Types, by itself, is not necessarily a bad thing. People just need to be smarter about how they're used. Defining a "profile" link does not actually eliminate the problem that Mark is afraid of; it simply defers it to a different extensibility mechanism. 

For instance, let's take a scenario similar to what Mark describes in the quote above. Let's imagine that I have a JSON-based data format. Let's call it OpenSocialData version 1.0 and let's say that it defines a data structure for a name that includes the properties "familyName" and "givenName". There are three approaches I can take to identifying this format in my responses...

A. I can simply use "application/json" as the media type and leave it up to the application to figure out the rest. This is less than ideal, but it does work and lots of people are doing this in practice today. This is not behavior we want to encourage, however, because it makes the data payload processing model ambiguous and relies too much on out-of-band information.
HTTP/1.1 200 OK
Content-Type: application/json
B. I can mint a new media type, e.g. "application/opensocialdata-1.0+json". This is good in that it clearly identifies the semantics of the data payload.
HTTP/1.1 200 OK
Content-Type: application/opensocialdata-1.0+json
C. I can use "application/json" as the media type and add a "profile" link as specified by the new internet-draft. Let's suppose that the profile link target is something like "http://example.org/opensocialdata/1.0". 
HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.org/opensocialdata/1.0>; rel="profile"

Ok, so let's iterate on the scenario a bit... and add a new property to our name data structure... let's say, "middleName". The fear, as Mark describes, is that I would mint yet another media type for my data structure... and indeed, many people would certainly do so. So let's say I iterate the version of my specification to 1.1... and I want to communicate to folks that I'm using the new 1.1 specification rather than 1.0.

A. If I went with approach A before, things don't change, which isn't exactly helpful.
HTTP/1.1 200 OK
Content-Type: application/json
B. If I went with approach B before, and I simply create a new media type, then I achieve the goal of notifying users that the new format is in use... 
HTTP/1.1 200 OK
Content-Type: application/opensocialdata-1.1+json

C. If I went with approach C before, I would need to iterate on the identified profile IRI in order to communicate that the new version is being used.
HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.org/opensocialdata/1.1>; rel="profile"

The question is: what benefit does option C have over option B? Well, yes, the registration process for new media types leaves a lot to be desired -- as someone who has created and attempted to register new media types, I can certainly attest to the pain involved. However, in either case, the proliferation that Mark speaks about is going to happen... we're either creating new media types or we're creating new profile IRIs. Which is worse? Which is better?

Well, I would argue that despite the pain, using the media type is best because of all the infrastructure that already exists around media types today. For instance, suppose we went with option C and used a profile link to identify the semantics of the payload. Now let's suppose that I have a client application that implements version 1.0 of the data format. It doesn't support version 1.1. Using the profile link approach, how does the client indicate to the server which version of the data format it wants?

Today, using MIME Media Types, we have the Accept request header...
GET /some/bit/o/data HTTP/1.1
Accept: application/opensocialdata-1.0+json
Later on, if I create a client that implements version 1.1 of the data format, it can easily communicate that fact to the server...
GET /some/bit/o/data HTTP/1.1
Accept: application/opensocialdata-1.1+json
Using profile links, there is no equivalent mechanism defined. This is a problem.
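
As a rough illustration of why the Accept header matters here, the following Python sketch shows a server choosing a representation from the media types a client lists. The function name negotiate and the AVAILABLE list are hypothetical, and the sketch ignores q-values and wildcards for brevity, so it is not full HTTP content negotiation.

```python
def negotiate(accept_header, available):
    """Return the first available media type the client listed, or None.

    Simplification (assumption): q-values and wildcards such as */* are
    ignored; media type parameters are dropped before matching.
    """
    wanted = [item.split(";")[0].strip().lower()
              for item in accept_header.split(",")]
    for media_type in wanted:
        if media_type in available:
            return media_type
    return None


# Hypothetical set of representations the server can produce.
AVAILABLE = [
    "application/opensocialdata-1.1+json",
    "application/opensocialdata-1.0+json",
]

# A 1.0-only client asks for exactly what it understands.
print(negotiate("application/opensocialdata-1.0+json", AVAILABLE))
```

A server could answer 406 Not Acceptable when negotiate returns None. With a profile link there is nothing for the client to put in the request, so no equivalent selection step is possible.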

What's more, lots of existing software is already designed around the use of media types. Take the Java javax.activation.DataHandler class, for instance. For all its warts, it has an existing method for accessing the MIME Media Type of the associated data; it does not have a method for accessing an arbitrary collection of profile links associated with it. That would mean that for many applications, the profile information would simply get dropped on the floor. I can't imagine that would make Joe at Hugecorp very happy at all.

So while profile links are interesting, I don't think they're quite the solution we need. Here's what I think should happen:

Mark's quite justified fear is that folks will go on trying to mint new MIME Media Types for every variation of their data format. Add a new field? New media type! Wrong solution. A new MIME Media Type SHOULD only be minted when a non-backwards compatible version of the data format is produced. For backwards compatible versions, an optional media type parameter should be used.

For instance, in my previous example, the addition of a new optional property to the name data structure is a backwards compatible change. There's no reason why I couldn't have defined my media type as "application/opensocialdata-1+json"... note that I dropped the ".0" off the end of the version identifier portion of that. As a best practice, I would state that all point versions (i.e. 1.1, 1.2, 1.3, etc) MUST be backwards compatible. No breaking changes allowed, so that I can be reasonably assured that someone processing any resource identified as "application/opensocialdata-1+json" isn't going to break, regardless of the specific point version used.

Let's suppose, however, that I need a specific point version... it has to be 1.1 and not 1.0, for instance. Then, simply define the media type with an optional media type parameter, e.g. "application/opensocialdata-1+json; v=1.1". This approach would mean that I would not have to go about registering new media types for every iteration of my data format... only for new major versions (e.g. application/opensocialdata-2+json).

This approach has the advantage of working with existing infrastructure... for instance, I can easily specify parameters with the Accept header....
GET /some/bit/o/data HTTP/1.1
Accept: application/opensocialdata-1+json;v=1.1
If you really really need the ability to specify a URI identifying a profile associated with the resource... then you can also use an optional media type parameter... e.g. 
GET /some/bit/o/data HTTP/1.1
Accept: application/opensocialdata-1+json;v=1.1;profile="http://.../foo"
Yes, it makes the media type quite a bit more difficult to read, but it works with existing stuff. 
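
Here is a minimal Python sketch of how a recipient might handle such a media type, assuming the convention described above: the major version lives in the type name and decides compatibility, while "v" and "profile" are optional hints. The names parse_media_type and can_process are hypothetical, and the naive split on ";" assumes no semicolons occur inside quoted parameter values.

```python
def parse_media_type(value):
    """Split 'type/subtype; k=v; k2="v2"' into (media_type, params)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            # Drop surrounding whitespace and optional double quotes.
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def can_process(content_type, supported="application/opensocialdata-1+json"):
    """A client built for major version 1 accepts any point version."""
    media_type, _params = parse_media_type(content_type)
    return media_type == supported


header = 'application/opensocialdata-1+json;v=1.1;profile="http://.../foo"'
media_type, params = parse_media_type(header)
print(media_type)           # application/opensocialdata-1+json
print(params.get("v"))      # 1.1
print(can_process(header))  # True
```

The point of the design is that can_process never looks at "v": a 1.0 client can safely consume a 1.1 payload, and only code that needs a specific point version has to inspect the parameter.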

That, however, is not the only argument Mark makes... Link Relations like profile have the added benefit of working within data formats in addition to within the HTTP request... for instance, I could use the "profile" link relation inside an atom:link element:
<atom:link rel="profile" href="http://.../foo" />
The example Mark gives is using the profile link within a JSON structure...
{
  "thing": { "a": 1, "b": 2 },   
  "owner": {
    "_profile": "http://example.net/name", 
    "firstName": "Bob",
    "lastName": "Roberts" 
  }
}
The "_profile" property is a naming convention where the underscore (_) character identifies the property as a link and the "profile" part identifies the link relation. In this example, the idea is that the "_profile" property identifies the semantics to which the containing JSON object complies.

Now that's a use case I can buy into, which is why I had previously proposed the "implements" link relation: http://tools.ietf.org/html/draft-snell-additional-link-relations-01.html#section-2

Used within a data format the way Mark suggests in his example, "_implements" would have identical semantics to "_profile" ... specifying a link to a resource that defines the implemented semantics. The key difference is that when rel="implements" is used within the HTTP request, it describes the semantics implemented by the request as a whole and not just the payload.

In other words, if I said...
POST /collection HTTP/1.1
Content-Type: application/atom+xml; type=entry
Link: <http://.../rfc5023>; rel="implements"
The "implements" link is specifying that this POST request conforms to RFC5023 (Atom Publishing Protocol). The "implements" link says nothing about the actual payload of the request... that's handled appropriately by the Content-Type header. Specifically: "implements" identifies the semantics implemented by the thing that contains the "implements" link, which in the above case, is the HTTP request...

If, however, I said...
{
  "thing": { "a": 1, "b": 2 },
  "owner": {
     "_implements": "http://example.net/name",
     "firstName": "Bob",
     "lastName": "Roberts" 
  }
}
What I'm saying is that the containing JSON object conforms to the specification "http://example.net/name" (again, identical semantics to "_profile"). 
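
As a sketch of how a consumer might act on this convention, the Python function below walks a parsed JSON document and reports every object that carries an "_<rel>" property. The name find_links and the "$.owner" path notation are my own illustrative choices, not something defined by either draft.

```python
import json


def find_links(node, rel, path="$"):
    """Yield (path, target) for every object carrying an "_<rel>" property."""
    if isinstance(node, dict):
        key = "_" + rel
        if key in node:
            yield path, node[key]
        # Recurse into child values so nested objects are found too.
        for name, child in node.items():
            yield from find_links(child, rel, f"{path}.{name}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from find_links(child, rel, f"{path}[{i}]")


doc = json.loads("""
{
  "thing": { "a": 1, "b": 2 },
  "owner": {
    "_implements": "http://example.net/name",
    "firstName": "Bob",
    "lastName": "Roberts"
  }
}
""")

print(list(find_links(doc, "implements")))
# [('$.owner', 'http://example.net/name')]
```

Passing "profile" instead of "implements" would find "_profile" properties the same way, which is the sense in which the two have identical semantics inside a data format.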

So to summarize:

1. Use media types to identify the semantics of the payload... just use those media types more intelligently.
2. Use "implements" to identify the semantics implemented by a containing object.