I have just uploaded the latest draft of the ODD specification to OpenDD.net, so pop over and take a look.

Since the last release of the draft I’ve done a fair amount of work to simplify the format even further: simplifying terminology, clearing up some inconsistencies and dropping namespaces altogether.

You’ll notice that we still don’t define any terms. As Ben touched on in a recent post, we decided not to confuse the format by trying to tie it to any one application, while keeping it as easy as possible to actually use. I’ll cover this in more detail a bit later…

So, let’s talk about how I’m using ODD to implement full data import and export in the upcoming release of Elgg 1.

For those who don’t already know, Elgg is an open source social networking application engine. The previous version has been downloaded over 100K times, and import and export was one of the most frequently requested enhancements.

Export

Export was a fairly trivial matter. The new version of Elgg employs a flexible event system, so all I had to do was trigger an “export” event.

This event is passed a GUID – an identifier for the thing you are exporting – and elements of the system (and third-party plugins) can listen for this event and react accordingly.

The event is essentially asking all parts of the Elgg application – core and plugins – “Tell me all you know about X”. The export listens to the answers and converts them into an ODD document that looks something like this:

<odd version="1.0" generated="Wed, 30 Apr 2008 22:21:55 +0100">


<entity uuid="http://example.com/odd/78/" class="object" subclass="blog" published="Fri, 18 Apr 2008 11:45:50 +0100" />


<metadata uuid="http://example.com/odd/78/attr/owner_uuid/" entity_uuid="http://example.com/odd/78/" name="owner_uuid" published="Fri, 18 Apr 2008 11:45:50 +0100" >http://example.com/odd/77/</metadata>


<metadata uuid="http://example.com/odd/78/attr/title/" entity_uuid="http://example.com/odd/78/" name="title" published="Fri, 18 Apr 2008 11:45:50 +0100" >test</metadata>


<metadata uuid="http://example.com/odd/78/attr/description/" entity_uuid="http://example.com/odd/78/" name="post" published="Fri, 18 Apr 2008 11:45:50 +0100" >First post</metadata>


<metadata uuid="http://example.com/odd/78/metadata/35/" entity_uuid="http://example.com/odd/78/" name="tags" type="metadata" owner_uuid="http://example.com/odd/77/" published="Fri, 18 Apr 2008 11:45:50 +0100" >wibble</metadata>


</odd>

Here we see an entity (in this case a blog post), and some details about it (the metadata).
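To make the “tell me all you know about X” idea concrete, here is a minimal Python sketch of the pattern. It is not Elgg’s code: the `register`/`trigger` registry, the two listener functions and the tuple format they return are all made up for illustration, and only a couple of the attributes from the document above are reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-process event registry; Elgg's real event API is not shown here.
_listeners = {}

def register(event, handler):
    """Subscribe a handler (core or plugin) to a named event."""
    _listeners.setdefault(event, []).append(handler)

def trigger(event, guid):
    """Ask every listener what it knows about `guid` and collect the answers."""
    answers = []
    for handler in _listeners.get(event, []):
        answers.extend(handler(guid))
    return answers

def core_export(guid):
    # Illustrative "core" listener: answers with the entity and its title.
    base = f"http://example.com/odd/{guid}/"
    return [
        ("entity", {"uuid": base, "class": "object", "subclass": "blog"}, None),
        ("metadata", {"uuid": base + "attr/title/", "entity_uuid": base,
                      "name": "title"}, "test"),
    ]

def tags_plugin_export(guid):
    # Illustrative "plugin" listener: answers with a tag it stored itself.
    base = f"http://example.com/odd/{guid}/"
    return [
        ("metadata", {"uuid": base + "metadata/35/", "entity_uuid": base,
                      "name": "tags", "type": "metadata"}, "wibble"),
    ]

def export(guid):
    """Turn the collected answers into a flat ODD document."""
    root = ET.Element("odd", {"version": "1.0"})
    for tag, attrs, text in trigger("export", guid):
        element = ET.SubElement(root, tag, attrs)
        element.text = text
    return ET.tostring(root, encoding="unicode")

register("export", core_export)
register("export", tags_plugin_export)
print(export(78))
```

The exporter never needs to know which plugins exist. Each one contributes its own tags, and because the format is flat the answers can simply be appended one after another.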

Import

Import is traditionally the more complicated part of the equation, but ODD is trivial to parse: each tag is atomic and represents exactly one thing. This is a big advantage for anyone implementing a reader, since it makes the whole thing pretty much stateless.

ODD tags arrive, whether as a file to import or as a live feed, and an event is triggered. This event passes around the tag and essentially asks the question “Does anyone know how to handle this?”.

The stateless nature of ODD also means that you don’t have to process the entire file, which makes it a trivial matter to implement a reader using a SAX parser.
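As a rough illustration of the reader side, here is a sketch using Python’s standard `xml.sax` module. The `OddHandler` class and the `handlers` dictionary are assumptions of mine, standing in for the “does anyone know how to handle this?” event. Each element is handed over as soon as its closing tag arrives, so nothing is held in memory beyond the tag currently being read.

```python
import io
import xml.sax

class OddHandler(xml.sax.ContentHandler):
    """Hands each ODD element to its handlers as soon as it has been read."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = handlers  # tag name -> list of callables (hypothetical registry)
        self.current = None       # (tag, attributes) of the element being read
        self.text = []

    def startElement(self, name, attrs):
        if name != "odd":         # the root is only a wrapper
            self.current = (name, dict(attrs.items()))
            self.text = []

    def characters(self, content):
        if self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.current is not None and name == self.current[0]:
            tag, attrs = self.current
            # "Does anyone know how to handle this?"
            for handler in self.handlers.get(tag, []):
                handler(attrs, "".join(self.text))
            self.current = None

seen = []
handlers = {
    "entity": [lambda attrs, text: seen.append(("entity", attrs["uuid"]))],
    "metadata": [lambda attrs, text: seen.append((attrs["name"], text))],
}

sample = b"""<odd version="1.0">
<entity uuid="http://example.com/odd/78/" class="object" subclass="blog" />
<metadata uuid="http://example.com/odd/78/attr/title/" entity_uuid="http://example.com/odd/78/" name="title">test</metadata>
</odd>"""

xml.sax.parse(io.BytesIO(sample), OddHandler(handlers))
print(seen)
```

Tags nobody has registered for are simply ignored, which is how I would expect unknown extensions to behave in a reader like this.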

That just about covers it. I’ll be posting some example code in a few days (workload permitting) so hopefully people can start getting stuck in. If you want to get involved in development, please head over to the ODD group.

A final note: I will be in San Francisco all next week, so if you are in the Bay Area and feel like having a chat about ODD or Elgg, then please get in touch!

Data portability is a bit of a hot topic at the moment, and a recent article in the Economist illustrates that this is becoming seen as an issue outside the technical blogging crowd.

So it seems like a good time for me to blog about one of the funky things I’ve been working on recently, the Open Data Definition (ODD).

So, what is ODD?

ODD is an XML-based data exchange format which is designed to be simple to implement and use. It consists of a framework and an extension format defining keywords.

The development of ODD fell out of a need for import/export functionality in the new version of Elgg. Import/export was one of the most requested features for previous Elgg incarnations, but we quickly realised that it could be converted into something that had many more uses.

As covered in the Economist article, data silos don’t cut it anymore. Users want to be able to move their accounts between social networks and have friends on different networks without having thousands of accounts (an issue we looked at solving in a slightly different way with explode).

When looking for a solution we did look at adapting one or more of the existing data portability solutions, but to say that none of them seemed suitable is something of an understatement. Many seemed to fall foul of a common problem: they are just too damn complicated for widespread adoption!

ODD, as mentioned previously, is XML-based. When making it I used Elgg 1’s object model as a guide and reduced things down to their lowest common denominators, so we have three main components – Entities, Metadata and Relationships.

These components are atomic, and the format itself has virtually no nesting. This is slightly unconventional, but it makes the format easy to parse, supports partial import/export and makes it easy to extend the format to support the live pinging of updates.

This gives us:

Entity

Entities are “things”, for example a web log post or a user account. The entity has a “class” attribute to specify what type of entity it is and can be subclassed.

All entities are identified by a UUID. This is important, and I’ll get on to that later.

Metadata

Metadata provides information about an entity as a name/value pair. Optionally, you can give a type to specify the type of metadata – e.g. attribute or annotation.

Relationship

As the name suggests, a relationship defines the relationship between two entities. To do this it uses a “verb” (as defined in the extension format mentioned above). Doing it this way permits setting and un-setting operations – for example, friend & unfriend, join & leave.

The UUID

An important concept in all this is the UUID.

The UUID is a URL which must point to an ODD representation of the thing it represents. I think this is quite a powerful concept since it permits truly distributed networks to be built.

An example

To give you an idea of how this might look, here’s an example ODD document.

<odd>
<header version="1.0" extension="SN:1.0" generated="...." />

<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/49" name="content">I saw Sindy at the mall today, she thinks she's all that, but she's not all that... I'm going to cry now and listen to Emo music.</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>

Pretty simple, I hope you’ll agree!

The only extra thing to note is the header element, which simply gives some version information about the framework and extension being used.
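Because every metadata tag carries the UUID of the entity it describes, putting the pieces back together is just a lookup. Here is a small Python sketch of that, using the standard `xml.etree.ElementTree` module on a trimmed copy of the example. The `collect` function and the dictionary layout it returns are my own invention for illustration, not part of the spec.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ODD_DOC = """<odd>
<header version="1.0" extension="SN:1.0" generated="...." />
<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>"""

def collect(doc):
    """Group metadata values under the entity they point at."""
    root = ET.fromstring(doc)
    entities = {}
    for element in root.iter("entity"):
        entities[element.get("uuid")] = {
            "class": element.get("class"),
            "subclass": element.get("subclass"),
            "metadata": defaultdict(list),  # a name may repeat, e.g. "tag"
        }
    for element in root.iter("metadata"):
        owner = entities.get(element.get("entity_uuid"))
        if owner is not None:  # skip metadata for entities not in this document
            owner["metadata"][element.get("name")].append(element.text)
    return entities

post = collect(ODD_DOC)["http://foo.com/export/34/"]
print(post["subclass"], post["metadata"]["tag"])
```

This version reads the whole document for brevity. Since each tag is self-describing, the same grouping could be done one tag at a time as they arrive.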

I will be giving a brief presentation about this at the Oxford Geek Night on the 22nd and will be answering questions after the event (and again in San Francisco on May 7th). Feel free to come along!

In the meantime, have a look at http://www.opendd.net

Elgg 1 introduces some important changes under the hood. Perhaps the most important of these is the new object model.

In a nutshell, Elgg 1’s object model is a simplification of what we’ve done with Elgg 0.x (from now on called Elgg classic), reducing things to their essential components.

In Elgg 1 you have at the highest level three things:

  • Entities: Things in the system; users, blog posts, etc…
  • Metadata: Information about an Entity (called Extenders in Elgg 1).
  • Relationships: Define how one object is related to another.

Conceptually this is very clean but also very flexible. Because entities, relationships and metadata have a consistent interface we can do some very cute things.

One thing in particular – arbitrary mixed-type feeds – was pretty much impossible in Elgg classic and now becomes very easy indeed.

Don’t know what I mean? Well, suppose you were looking for blog posts tagged with “Firefly”. In Elgg classic you could have these listed out in a feed.

Fine.

But suppose you wanted to show videos or music tagged with “Firefly” on the same page? What if you want to write a plugin that displays Flash games or stores files on S3 and want them to show up in the same stream?

All very easy now. Cute eh?
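For the curious, here is roughly why it becomes easy, sketched in Python rather than Elgg’s own code. The `Entity` class, the `feed` function and the sample data are all hypothetical. The point is only that when every entity exposes its metadata the same way, one query can cover every subclass, including ones added later by plugins.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    guid: int
    subclass: str  # e.g. "blog" or "video"; the values here are illustrative
    metadata: dict = field(default_factory=dict)

def feed(entities, tag):
    """Return every entity carrying `tag`, whatever its subclass."""
    return [e for e in entities if tag in e.metadata.get("tags", [])]

store = [
    Entity(1, "blog", {"title": "Serenity review", "tags": ["Firefly"]}),
    Entity(2, "video", {"title": "Trailer", "tags": ["Firefly", "trailer"]}),
    Entity(3, "file", {"title": "Holiday photos", "tags": ["holiday"]}),
]

for item in feed(store, "Firefly"):
    print(item.subclass, item.metadata["title"])
```

A plugin that adds a new subclass only has to store its tags as metadata, and its items show up in the same stream without `feed` changing at all.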

The above is a rather simplistic example of what is possible. I have hinted at some other applications a few posts back…