So, here’s a thought that occurred rather spontaneously this afternoon.  What if URLs had a codified set of rules, akin to linguistic grammar?  Right now there is a minimal required form, that of protocol://host/<arbitrary-and-optional-stuff>, which is perfectly reasonable in a limited fashion.  What I’m suggesting is that <arbitrary-and-optional-stuff> have a set of rules as well.

Earlier this afternoon, I sent a link to a friend of mine, but it was a link that I had never visited, nor did I have any interest in ever visiting it.  So when he asked me a couple of questions about it, I had to pause.  At that moment it occurred to me that, from my knowledge of the site in question, I had been able to craft the URL through the same basic process by which one crafts any sort of linguistic construct.  That is to say, I knew the grammar employed by the site and was thus able to use those rules to build the URL without having any prior knowledge of its contents, nor even of its actual existence.

In essence, you can think of each host (that is, the www.somewhere.com portion of a URL) and, to some extent, its protocol (i.e. http or ftp) as identifying which language the URL is crafted in.  The problem, then, is that there are hundreds, thousands, maybe even hundreds of thousands of “languages” on the internet, and you have to familiarize yourself with each one before you can start crafting valid URLs in it.
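
To make the “language” idea a little more concrete, here is a minimal Python sketch, assuming a hypothetical GRAMMARS registry that maps a (scheme, host) pair to the name of the grammar that site speaks.  The registry, its entries and the helper names are made up purely for illustration; only urlsplit comes from Python’s standard library.

```python
from urllib.parse import urlsplit

# Hypothetical registry: each (scheme, host) pair identifies the "language"
# the rest of the URL is written in. Entries are illustrative only.
GRAMMARS = {
    ("http", "www.tuaw.com"): "weblog",
    ("ftp", "ftp.example.org"): "file-archive",
}


def url_language(url):
    """Return the grammar name for a URL, or None if the site is unfamiliar."""
    parts = urlsplit(url)
    # .hostname drops any port and lowercases the host name.
    return GRAMMARS.get((parts.scheme, parts.hostname))


def path_words(url):
    """Split the <arbitrary-and-optional-stuff> into its individual words."""
    return [word for word in urlsplit(url).path.split("/") if word]
```

The point is simply that, until you know which grammar a host speaks, the words in the path are just opaque strings.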

The advantage of having a unified grammar is that we could, for instance, presume that since The Unofficial Apple Weblog is a blog (obviously!), it has tags and categories, and even authors.  Now, forgive my probably inaccurate usage of the various parts of speech, and let’s assume a grammar like “/adjective/noun/<action>.”  Knowing the grammar, and being familiar with the site, I can almost automatically tell you that if I wanted to know more about one of their authors, say Victor Agreda Jr., I could presumably just head over to http://www.tuaw.com/author/victor-agreda-jr, or that to view the ‘Gaming’ category I could go to http://www.tuaw.com/category/gaming.  Well, except that TUAW uses ‘blogger’ instead of ‘author.’  But http://www.tuaw.com/category/gaming is perfectly valid.
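
Crafting a URL from a known grammar is mechanical enough to write down.  Here is a rough Python sketch, assuming a simple hypothetical “/kind/name[/action]” shape (kind being the ‘author’ or ‘category’ part) and a slug rule of lowercase words joined by hyphens; neither the function names nor the slug rule are anything TUAW actually documents.

```python
import re


def slugify(name):
    """Lowercase a human-readable name and join its words with hyphens.

    Assumed slug rule: keep runs of a-z and 0-9, drop everything else.
    """
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "-".join(words)


def craft_url(host, kind, name, action=None, scheme="http"):
    """Build a URL following a hypothetical /kind/name[/action] grammar."""
    segments = [kind, slugify(name)]
    if action is not None:
        segments.append(action)
    return f"{scheme}://{host}/" + "/".join(segments)
```

With these rules, craft_url("www.tuaw.com", "author", "Victor Agreda Jr.") yields http://www.tuaw.com/author/victor-agreda-jr, which is exactly the guess that misses on TUAW, since the site says ‘blogger’ there.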

Now that I’m batting one for two, what could be done to resolve the author-vs-blogger issue?  Well, in linguistics we have synonyms: two words which essentially mean the same thing.  If we’re defining a grammatical structure, then why not use synonyms as well?  It wouldn’t be hard to manage a set of synonyms, so that when I say author and the site in question knows blogger, it assumes (based on the context that it’s a weblog, and that I’m looking for written works from a specific person) that blogger and author are interchangeable.
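
A synonym table along these lines might be sketched in Python as follows, assuming hypothetical synonym groups keyed by the kind of site, since ‘author’ and ‘blogger’ are only interchangeable in the context of a weblog.  The table contents and the resolve_word name are illustrative assumptions, not an existing standard.

```python
# Hypothetical synonym groups, keyed by the kind of site; a word only has
# synonyms within the context it appears in.
SYNONYMS = {
    "weblog": [
        {"author", "blogger", "writer"},
        {"category", "section"},
    ],
}


def resolve_word(requested, site_type, site_vocabulary):
    """Translate the visitor's word into the word this site actually uses.

    Returns the requested word if the site knows it, otherwise the first word
    in site_vocabulary that shares a synonym group with it, otherwise None.
    """
    if requested in site_vocabulary:
        return requested
    for group in SYNONYMS.get(site_type, []):
        if requested in group:
            for word in site_vocabulary:
                if word in group:
                    return word
    return None
```

So a request for /author/victor-agreda-jr on a weblog whose vocabulary is ["blogger", "category", "tag"] would be understood as /blogger/victor-agreda-jr, while the same word on a site of some other type gets no such help.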

The <action> component of the URL is where the software really comes through.  Once the software knows what it’s performing the action on (with the default action typically being some variation on ‘view’), we can tell it what action needs to be performed.  Going back to a hypothetical blog example, we can presume that /post/url-linguistics is a specific post (“URL Linguistics”) and that it has a default action of “view” because, the majority of the time, what we’ll want to do with a post is simply to view it.  But what else can we do to it?  We could edit it.  We could delete it.  Maybe we could move it, or even clone it.  In each case, to perform the action in question, we could say ‘/post/url-linguistics/edit’ or ‘/post/url-linguistics/clone’ and the backend system could, provided it has implemented that action, simply comply.
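
The backend side of this might look something like the following Python sketch, assuming a hypothetical ACTIONS table listing which verbs each noun implements and a path shape of /noun/identifier[/action].  The table and parse_request are made-up names; a real CMS would also check permissions before honoring ‘edit’ or ‘delete’.

```python
# Hypothetical per-noun action table: the verbs each content type implements.
ACTIONS = {
    "post": {"view", "edit", "delete", "move", "clone"},
}

# When the URL names no action, assume the visitor just wants to look.
DEFAULT_ACTION = "view"


def parse_request(path):
    """Split /noun/identifier[/action] into its parts, defaulting the action."""
    words = [word for word in path.split("/") if word]
    if len(words) not in (2, 3):
        raise ValueError(f"expected /noun/identifier[/action], got {path!r}")
    noun, identifier = words[0], words[1]
    action = words[2] if len(words) == 3 else DEFAULT_ACTION
    # Only comply if this noun actually implements the requested action.
    if action not in ACTIONS.get(noun, set()):
        raise LookupError(f"{noun!r} does not implement {action!r}")
    return noun, identifier, action
```

Here parse_request("/post/url-linguistics") returns ("post", "url-linguistics", "view"), while asking for an unimplemented verb raises LookupError rather than guessing.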

What are some other advantages to this approach?  For one, plugin authors would have a known structure for URLs with which to work.  Visitors, of course, can make educated guesses about how to get what they want without having to navigate through portions of a site that don’t interest them.  It also suggests a model for development, grouping actions together by content type, so your framework might include, for instance, a Post class, an Action class, and a set of actions that the Post class can accept, each of them an extension of the Action class that implements a PostAction interface.
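
One possible shape for that model, sketched in Python: Post, Action and PostAction come from the paragraph above, while ViewPost, ClonePost, register and handle are hypothetical names invented for illustration.  Python has no interfaces as such, so PostAction is modeled here as an abstract subclass of Action.

```python
from abc import ABC, abstractmethod


class Action(ABC):
    """Base class for anything that can be done to a piece of content."""

    name = ""

    @abstractmethod
    def perform(self, target):
        """Carry out the action on target and return the result."""


class PostAction(Action):
    """Stand-in for a PostAction interface: actions that operate on a Post."""


class Post:
    """A content type that dispatches URL actions to registered PostActions."""

    _actions = {}  # class-level registry shared by all posts

    def __init__(self, slug, title):
        self.slug = slug
        self.title = title

    @classmethod
    def register(cls, action_cls):
        # Only actions implementing the PostAction interface are accepted.
        if not issubclass(action_cls, PostAction):
            raise TypeError(f"{action_cls.__name__} is not a PostAction")
        cls._actions[action_cls.name] = action_cls
        return action_cls  # lets register double as a class decorator

    def handle(self, action_name="view"):
        action_cls = self._actions.get(action_name)
        if action_cls is None:
            raise LookupError(f"post does not support {action_name!r}")
        return action_cls().perform(self)


@Post.register
class ViewPost(PostAction):
    name = "view"

    def perform(self, post):
        return f"<h1>{post.title}</h1>"


@Post.register
class ClonePost(PostAction):
    name = "clone"

    def perform(self, post):
        return Post(post.slug + "-copy", post.title)
```

Adding a new verb is then just a matter of writing another PostAction subclass and registering it; Post itself never changes.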

Now it becomes trivial to extend the framework by just creating new actions using your existing model, and with the addition of a simple API to enumerate the available adjectives, nouns, etc., the doors open up for cross-platform and third-party plugin development.  The plugin author simply architects the plugin to support a given set of fallbacks, and then builds for the best-case scenario.  If the platform the plugin is running on supports a given action, the plugin uses it.  If not, it can gracefully handle that situation by either disabling functionality or falling back to a supported action.  Then, at a later date, an upgrade to the platform might include a new set of actions that existing plugins already support, so once the functionality is in place the plugin will “magically” take advantage of it.
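
The fallback logic on the plugin side could be as small as this Python sketch, assuming a hypothetical enumeration API that reports which actions each noun supports on a given install (modeled here as a plain dict).  The ‘export’ action and the function names are illustrative assumptions only.

```python
def available_actions(platform, noun):
    """Hypothetical enumeration API: the actions this install supports for noun."""
    return platform.get(noun, frozenset())


def choose_action(platform, noun, preferences):
    """Return the first preferred action the platform supports, else None.

    preferences runs from best case to last-resort fallback; None tells the
    plugin to disable the feature.
    """
    supported = available_actions(platform, noun)
    for action in preferences:
        if action in supported:
            return action
    return None


# An older install, and the same install after an upgrade adds "export".
OLD_PLATFORM = {"post": frozenset({"view", "edit"})}
NEW_PLATFORM = {"post": frozenset({"view", "edit", "export"})}
```

A plugin that prefers ("export", "view") gets "view" on OLD_PLATFORM and, without any change to the plugin, "export" on NEW_PLATFORM; asking about a noun the platform doesn’t know returns None, and the plugin switches that feature off.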

There’s a lot of room for discussion around this model, as I’ve prepared this almost purely as a mental exercise (a sort of thought experiment, if you will), yet I feel that its merits are sufficient to justify further exploration.  That said, I am sufficiently isolated from the community that any serious development of standards or RFCs along these lines will not be of my own hand, nor is it likely that I have the resources to make any meaningful contribution to such an undertaking other than the occasional outburst, such as this one.  Maybe one day I’ll have the time and energy to build a proof-of-concept CMS myself that implements the bulk of my ramblings, but even this will be at the mercy of greater powers.  Only time will tell…