Summary: Building a distributed social network

Posted on July 10, 2014

This post, requested by Ben Werdmuller, pulls together a number of earlier posts in order to better document the federated, cross platform, friend/follow and signon mechanism stuff I’ve been hacking on recently. It’ll summarise the posts together with my latest thoughts, although I do encourage you to read the originals as well, since there’s a fair amount of detail there.

Federated/distributed social networking is something I (and many other people) have been kicking around for a little while. When working on Elgg, I was involved in a bunch of conversations where we explored getting the various Elgg sites to talk to each other, but it never really got anywhere at the time.

Times move on, and now I think we have a chance to really get somewhere; kicking about the Known code has given me a nice experimental platform to play with and there are now some distributed social tools and protocols that are seeing wide adoption (PuSH, MF2 etc), which is going to be very helpful.

Post-Snowden, of course, it is now clear that target dispersal, combined with widespread encryption, is required to keep our private lives safe from surveillance. Getting our everyday social interactions out of centralised data-mining facilities is now a basic requirement for safeguarding our essential liberties.

Initial requirements

Going into this then, I wanted to start building the parts of a distributed social network, and I wanted to set some loose guidelines of what I’d like to see.

Distributed: There should be no central server anywhere in the ecosystem. Ideally transactions should occur peer to peer between nodes, rather than be orchestrated by a central body.
Cross platform: I don’t want to mandate the use of one specific platform. You can’t call yourself a distributed/federated social network if you can only federate between nodes running the same software! That’s a monoculture, and we know those are bad.
Simple, open protocols: I don’t want to spend days building this, and if necessary I want to be able to test using the command line and curl.
URLs, from a UX standpoint, are a bad way to identify people (lessons learnt from OpenID). I may need to reference user profiles by URL, but every time you force someone to type one in, God kills a kitten.

Friending and profile discovery

Original posts here and here.

The first step towards building a social network of any kind is to have the ability to add your friends to your network, and a distributed network is no different.

Here, and in my reference implementations on Known and Elgg, I am adopting the uni-directional “follow” idea of friendship (like Twitter follow) rather than the bi-directional, transactional Facebook model, since this was the minimum I needed to make this work, and, in my mind at least, it better fits how “friending” works in the real world.

So then, friending works by having an endpoint on your site which is passed the URL you want to add as a friend. To make this easy, and to avoid typing URLs, both my reference Known and Elgg implementations contain a bookmarklet which you can add to your browser button bar.

Alice visits Bob’s website or profile and clicks on the button. Alice’s site then retrieves Bob’s site and parses it for whatever user details can be found on the page – looking for name, profile picture and the URL of their profile. This is made possible through the use of Microformats, especially MF2.

Microformats are simple bits of markup that are invisible to someone who just looks at a webpage, but which allow a computer to understand the meaning of things on a page: for example, that a certain bit of text is a person’s name, or that one link points to their profile picture and another to their profile URL. Additionally, since this is just text on a page, there is no requirement for that page to be “special” in any way, i.e. it could be a plain static page, with no special headers and no need for a script to generate it.

Here is an example of how a user may be marked up:


    
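    <div class="h-card">
        <img class="u-photo" src="..." />
        <span class="p-name">Marcus Povey</span> aka mapkyca
        <a class="u-email" href="mailto:marcus@marcus-povey.co.uk">marcus@marcus-povey.co.uk</a>
        <a class="u-url" href="...">My profile.</a>
    </div>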

This markup can then be easily processed using one of the many libraries out there; if you’re using PHP, I highly recommend Barnaby Walters’ PHP-MF2 library. In the above example I create a block that identifies a person (h-card), then give their photo, full name, email address and a URL relating to them. This is probably enough information to be getting on with, but you can of course extract more profile information if the markup is there.
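To make the parsing step concrete, here is a deliberately minimal, standard-library-only Python sketch that pulls name, photo, email and URL out of flat (non-nested) h-cards. It illustrates the idea only; it is not a Microformats2 parser (it ignores nested microformats, implied properties and most property types), so a real site should use a proper MF2 library such as the one above. The example.com URLs are placeholders.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the depth count
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect a few h-card properties from flat (non-nested) h-cards."""

    def __init__(self):
        super().__init__()
        self.cards = []       # finished h-cards, one dict per card
        self._card = None     # the h-card currently being filled in
        self._depth = 0       # element depth inside the current h-card
        self._capture = None  # (depth, text parts) while inside a p-name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None:
            if "h-card" in classes and tag not in VOID_TAGS:
                self._card = {"name": [], "photo": [], "email": [], "url": []}
                self._depth = 1
            return
        if tag not in VOID_TAGS:
            self._depth += 1
        href, src = attrs.get("href"), attrs.get("src")
        if "u-photo" in classes and src:
            self._card["photo"].append(src)
        if "u-email" in classes and href:
            self._card["email"].append(href[7:] if href.startswith("mailto:") else href)
        if "u-url" in classes and href:
            self._card["url"].append(href)
        if "p-name" in classes and tag not in VOID_TAGS:
            self._capture = (self._depth, [])

    def handle_data(self, data):
        if self._capture is not None:
            self._capture[1].append(data)

    def handle_endtag(self, tag):
        if self._card is None or tag in VOID_TAGS:
            return
        if self._capture is not None and self._capture[0] == self._depth:
            # Collapse runs of whitespace in the captured name text
            self._card["name"].append(" ".join("".join(self._capture[1]).split()))
            self._capture = None
        self._depth -= 1
        if self._depth == 0:
            self.cards.append(self._card)
            self._card = None


page = """
<div class="h-card">
  <img class="u-photo" src="https://example.com/photo.jpg" />
  <span class="p-name">Marcus Povey</span> aka mapkyca
  <a class="u-email" href="mailto:marcus@marcus-povey.co.uk">marcus@marcus-povey.co.uk</a>
  <a class="u-url" href="https://example.com/profile">My profile.</a>
</div>
"""

parser = HCardParser()
parser.feed(page)
parser.close()
print(parser.cards)
```

Each card comes back as a dict of property lists, which is roughly the shape MF2 parsers use, so the later steps do not care which parser produced it.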

Since a given page may contain multiple marked-up people (especially if Alice clicks the “add friend” button while on a news feed), my reference implementations present a list of users who may be added, after first removing any duplicates (based on the URL of their profile), and Alice is also given the opportunity to fill in or amend any scraped information. If more than one URL is given for an entry, you need some way of reconciling them; I just render them as a dropdown so that Alice can choose Bob’s primary profile, but I’m sure there’s a cleverer way.

Once Alice is happy, she can add Bob as a friend, and her site can do any post-friending work: subscribing to Bob’s PuSH hub (if one is specified), or generating access credentials for Bob.

So, in summary, distributed friending works like this:

1. Alice sends Bob’s page URL to her magic friending endpoint (using a browser bookmarklet).
2. Alice’s site examines the page at that URL for MF2 marked-up h-card entries.
3. Alice is presented with a unique list of h-card entries (where uniqueness is defined on normalised profile URLs).
4. Alice adds Bob as a friend and triggers any post-friending events.
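The uniqueness rule above depends on what “normalised” means, which is left open. This Python sketch picks one plausible set of rules (lower-case scheme and host, drop default ports, fragments and trailing slashes); those rules, and the choice to key each card on its first URL, are assumptions for illustration. Cards are dicts of property lists, as an MF2 parser would produce.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_profile_url(url):
    """Reduce an absolute profile URL to a canonical form (assumed rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    # The query string is kept; the fragment (#...) is dropped
    return urlunsplit((scheme, host, path, parts.query, ""))


def unique_cards(cards):
    """Keep the first card seen for each normalised profile URL."""
    unique = {}
    for card in cards:
        urls = card.get("url") or []
        if not urls:
            continue  # nothing to key on; a real UI might let Alice fill it in
        unique.setdefault(normalise_profile_url(urls[0]), card)
    return list(unique.values())


cards = [
    {"name": ["Bob"], "url": ["https://Example.com/bob/"]},
    {"name": ["Bob Smith"], "url": ["https://example.com:443/bob#me"]},
    {"name": ["Carol"], "url": ["https://carol.example.org"]},
]
for card in unique_cards(cards):
    print(card["name"][0], normalise_profile_url(card["url"][0]))
```

Here the first two cards collapse into one because they differ only in case, default port, trailing slash and fragment; keeping the first one seen is an arbitrary choice that the “amend scraped information” step can correct.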

Listening to Bob

After Alice adds Bob as a friend, she wants to be told when Bob updates his site. In Known this is accomplished by performing a PubSubHubbub discovery and subscription when the “friend” event is triggered (step 4 above).

I won’t go into too much detail as to how a PuSH subscription handshake works, since there’s more complete implementation information in the spec, but in summary, when Alice successfully adds Bob as a friend, her site does the following:

1. Alice’s site looks for a feed URL on Bob’s site.
2. Her site retrieves that URL and looks in it for a “self” link (the canonical permalink for the feed of Bob’s updates).
3. Her site then retrieves the self URL and looks for any declared PuSH hubs to subscribe to.
4. If a hub is found, her site records (in memory) that a subscription to this hub is pending, then sends the subscription request.
5. At some later point, Bob’s hub pings Alice’s PuSH endpoint with a success or failure message.
6. Alice’s PuSH endpoint matches this request against the list of pending requests she’s made, and if the security tokens match, she can consider herself subscribed.
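The subscriber’s side of that handshake might look roughly like this in Python. It is a sketch, not a complete PubSubHubbub subscriber: it reads the “self” and “hub” links from an already-fetched feed, builds the form body for the subscription POST, and answers the hub’s verification request by echoing hub.challenge. Carrying the security token as a query parameter on the callback URL, the in-memory PENDING and SUBSCRIBED stores and all example.com URLs are assumptions; the actual HTTP fetches and POSTs are left out.

```python
import hmac
import secrets
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

PENDING = {}        # topic URL -> token for a subscription we have requested
SUBSCRIBED = set()  # topic URLs the hub has confirmed


def discover_hub_and_self(feed_xml):
    """Return (self URL, [hub URLs]) from <link rel="..."> elements in a feed."""
    self_url, hubs = None, []
    for el in ET.fromstring(feed_xml).iter():
        # Match Atom <link> and namespaced <atom:link> in RSS alike
        if el.tag.rsplit("}", 1)[-1] != "link" or not el.get("href"):
            continue
        if el.get("rel") == "hub":
            hubs.append(el.get("href"))
        elif el.get("rel") == "self" and self_url is None:
            self_url = el.get("href")
    return self_url, hubs


def build_subscribe_request(topic, callback):
    """Record a pending subscription and return the form body to POST to the hub."""
    token = secrets.token_urlsafe(16)
    PENDING[topic] = token
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        # Assumes the callback URL has no query string of its own
        "hub.callback": callback + "?" + urlencode({"token": token}),
    })


def handle_verification(query_string):
    """Answer the hub's GET to our callback; returns (HTTP status, body)."""
    q = {k: v[0] for k, v in parse_qs(query_string).items()}
    topic, mode = q.get("hub.topic"), q.get("hub.mode")
    expected = PENDING.get(topic)
    if expected is None or not hmac.compare_digest(
            expected.encode(), q.get("token", "").encode()):
        return 404, ""  # not a request we made, or the token doesn't match
    del PENDING[topic]
    if mode == "subscribe":
        SUBSCRIBED.add(topic)
        return 200, q.get("hub.challenge", "")  # echoing the challenge confirms intent
    return 200, ""  # e.g. hub.mode=denied: the hub refused the subscription


FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="https://bob.example.com/feed" />
  <link rel="hub" href="https://hub.example.com/" />
</feed>"""

topic, hubs = discover_hub_and_self(FEED)
body = build_subscribe_request(topic, "https://alice.example.com/push")
print(hubs[0], body)
```

Using a random per-subscription token means a forged verification request for a topic Alice never asked for, or with the wrong token, is simply refused.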

Once subscribed, Alice’s endpoint will be pinged by Bob’s hub every time he makes an update. Alice’s site can then decide what to do with that information; perhaps Alice can use it to maintain a news feed, or send out an email update, whatever.

Friend only/private posts & friend signon

Original posts here and here, here and finally here.

So far, all we’ve really done is create a fancy RSS reader. The next step in creating a truly distributed social network is the ability to create posts which only your friends (or a selected subset of them) can see, but the wider internet cannot.

On centralised social networks this is trivial, since all users are local and can be identified in one of the many time-honoured and straightforward ways, and once identified, content that they’re not permitted to access can easily be hidden. On a distributed social network, this becomes much more difficult.

Fundamentally, it’s a problem of credential exchange.

There are many techniques you could deploy to solve this problem, and most of them are not mutually exclusive. One approach might be for Alice’s site to generate a random password and email it to Bob (since we likely have Bob’s email address from his h-card). Personally, I don’t think this is terribly clean.

So, I humbly put forward my thoughts on using OpenPGP keys as an identity mechanism…

OpenPGP signin

My spec for this can be found in these two posts, but in short it works as follows:

Bob generates (or adds) a PGP key pair for his profile, and publishes his public key in one or more of the following ways *(discussion: Bob’s site needs access to a private key in order to generate signatures, so this key material should be kept secure. It may be best to generate a new key pair for exclusive use by Bob’s site, but I do rather like tying Bob’s profile and Bob’s email together and identifying both cryptographically with the same key)*:
1. Via an HTTP Link header, with a rel of “key”, e.g. Link: <https://example.com/bob/pubkey.asc>; rel="key"
2. Via a LINK tag in the HTML head, e.g. <link href="https://example.com/bob/pubkey.asc" rel="key" />
3. Via an anchor tag within the page body with rel="key", e.g. <a href="https://example.com/bob/pubkey.asc" rel="key">My Key</a>
4. By pasting the key into the body of the page, and giving it a class of “key”, e.g.
  
  <pre class="key"> -----BEGIN PGP PUBLIC KEY BLOCK----- .... -----END PGP PUBLIC KEY BLOCK----- </pre>
When Alice successfully adds Bob as a friend, her site attempts to extract the public key from his page. If found, her site saves the public key against Bob’s newly created user.
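A rough Python sketch of that extraction step, covering the publication methods listed above. It only locates candidates: absolute key URLs from the Link header and from <link> or <a> elements with rel="key", followed by inline <pre class="key"> blocks, in the order the methods are listed. Fetching the key and importing it (for example into a GnuPG keyring) is left to the caller; the naive comma-splitting of the Link header and the example URLs are assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def keys_from_link_header(header, base):
    """Return key URLs from a Link header value whose rel includes key."""
    found = []
    for part in header.split(","):  # naive: assumes no commas inside the URLs
        m = re.match(r'\s*<([^>]*)>(.*)', part)
        if not m:
            continue
        rel = re.search(r'rel\s*=\s*(?:"([^"]*)"|([^\s;]+))', m.group(2))
        if rel and "key" in (rel.group(1) or rel.group(2)).split():
            found.append(urljoin(base, m.group(1)))
    return found


class KeyFinder(HTMLParser):
    """Collect rel="key" links and inline <pre class="key"> blocks from a page."""

    def __init__(self, base):
        super().__init__()
        self.base, self.urls, self.inline = base, [], []
        self._pre = None  # text parts while inside <pre class="key">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if tag in ("link", "a") and "key" in rels and attrs.get("href"):
            self.urls.append(urljoin(self.base, attrs["href"]))
        elif tag == "pre" and "key" in (attrs.get("class") or "").split():
            self._pre = []

    def handle_data(self, data):
        if self._pre is not None:
            self._pre.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre is not None:
            self.inline.append("".join(self._pre).strip())
            self._pre = None


def find_key_sources(page_url, link_header, html):
    """Return [("url", ...) or ("inline", ...)] in the order the methods are listed."""
    finder = KeyFinder(page_url)
    finder.feed(html)
    finder.close()
    sources = [("url", u) for u in keys_from_link_header(link_header or "", page_url)]
    sources += [("url", u) for u in finder.urls]
    sources += [("inline", k) for k in finder.inline]
    return sources


page = """<html><head><link rel="key" href="/bob/pubkey.asc" /></head>
<body><pre class="key">-----BEGIN PGP PUBLIC KEY BLOCK-----
....
-----END PGP PUBLIC KEY BLOCK-----</pre></body></html>"""
sources = find_key_sources("https://example.com/bob",
                           '<https://example.com/bob/pubkey.asc>; rel="key"', page)
for kind, value in sources:
    print(kind, value.splitlines()[0])
```

Relative hrefs are resolved against Bob’s page URL, so “/bob/pubkey.asc” and the absolute URL in the header end up identical.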

Now, some time later Alice creates a post, and she only wants Bob to be able to see it, so she…

1. Creates a new post and adds Bob’s user to the ACL.
2. If Bob has also added Alice as a friend, Bob’s site is notified by PuSH that Alice’s site has been updated. (Because it’s a private post, we don’t push the content itself, although conceivably we could encrypt it with the public keys of Bob and anyone else who has access. This bit is a little out of scope at the moment.)
3. Bob visits Alice’s site and identifies himself by clicking a bookmarklet. The bookmarklet passes the URL of the page on Alice’s site to Bob’s site, which produces a signed request and sends it to Alice’s site.
4. Alice’s site verifies that the signature is valid, and that it was made with Bob’s key.

The signature sent by Bob’s site is formed over a message containing:

The current date and time in ISO 8601 format, as produced by PHP’s date('c', time())
Bob’s profile URL
The URL of the resource on Alice’s site that Bob is requesting.
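The exact byte layout of the signed message isn’t pinned down above, so this Python sketch assumes one: the three fields joined by newlines, in the order listed. It also assumes the message is clearsigned by shelling out to a locally installed gpg, so that Alice’s site can read the fields back out of the signed text; the key ID and example URLs are placeholders. Python’s isoformat() on a UTC datetime gives the same shape as PHP’s date('c'), e.g. 2014-07-10T12:00:00+00:00.

```python
import subprocess
from datetime import datetime, timezone


def build_login_message(profile_url, resource_url, now=None):
    """Timestamp, profile URL and requested URL, one per line (assumed layout)."""
    now = now or datetime.now(timezone.utc)
    # ISO 8601 with a numeric UTC offset, like PHP's date('c') on a UTC server
    stamp = now.astimezone(timezone.utc).isoformat(timespec="seconds")
    return "\n".join([stamp, profile_url, resource_url]) + "\n"


def clearsign(message, key_id):
    """Clearsign with the local gpg, reading stdin and writing stdout.

    --batch stops gpg prompting on the terminal, so this assumes the key can be
    used non-interactively (e.g. via an already-unlocked gpg-agent).
    """
    result = subprocess.run(
        ["gpg", "--batch", "--local-user", key_id, "--clearsign"],
        input=message.encode("utf-8"), capture_output=True, check=True)
    return result.stdout.decode("utf-8")


msg = build_login_message("https://bob.example.com/",
                          "https://alice.example.com/post/1")
print(msg)
# signed = clearsign(msg, "BOB_KEY_ID")  # needs gpg and Bob's private key
```

The same message can be produced and signed by hand with date, echo and gpg --clearsign, which keeps the command-line-and-curl requirement intact.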

Alice’s site should, on receiving this:

1. Verify that the signature is valid and the contents unmodified.
2. Verify that it was signed by Bob’s key.
3. Verify that the resource being requested is on Alice’s site.
4. Check that the timestamp is valid and within an acceptable range of the current time.
5. Combine the timestamp, profile URL and resource URL into a nonce, to guard against replay attacks.
6. Check that this request hasn’t been seen before by querying the nonce generated in step 5 against the nonce store, then add it to the store.
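Alice’s side of those checks, minus the cryptography, could be sketched in Python as follows. It assumes the signature has already been verified (for example by gpg or an OpenPGP library), yielding the signed text plus the signer’s key fingerprint. The known_keys mapping (profile URL to fingerprint, saved at friending time), the newline-separated message layout, the five-minute tolerance and the in-memory nonce store are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

MAX_SKEW = timedelta(minutes=5)  # assumed tolerance for clock drift


class LoginRejected(Exception):
    pass


class NonceStore:
    """In-memory nonce store; a real site would persist this somewhere."""

    def __init__(self):
        self._seen = {}  # nonce -> timestamp carried in the message

    def check_and_add(self, nonce, stamp, now):
        # A nonce whose timestamp is outside the window can never pass the
        # timestamp check again, so it is safe to forget it.
        for old, when in list(self._seen.items()):
            if now - when > MAX_SKEW:
                del self._seen[old]
        if nonce in self._seen:
            return False
        self._seen[nonce] = stamp
        return True


def check_login(message, signer_fingerprint, known_keys, site_host, store, now=None):
    """Run the post-signature checks; return the profile URL or raise LoginRejected."""
    now = now or datetime.now(timezone.utc)
    try:
        stamp_text, profile_url, resource_url = message.strip().splitlines()
        stamp = datetime.fromisoformat(stamp_text)
    except ValueError:
        raise LoginRejected("malformed message") from None
    if stamp.tzinfo is None:
        raise LoginRejected("timestamp has no UTC offset")
    if known_keys.get(profile_url) != signer_fingerprint:
        raise LoginRejected("not signed by the key saved for this profile")
    if (urlsplit(resource_url).hostname or "") != site_host:
        raise LoginRejected("requested resource is not on this site")
    if abs(now - stamp) > MAX_SKEW:
        raise LoginRejected("timestamp outside the allowed window")
    nonce = (stamp_text, profile_url, resource_url)
    if not store.check_and_add(nonce, stamp, now):
        raise LoginRejected("request has been seen before (replay)")
    return profile_url


store = NonceStore()
keys = {"https://bob.example.com/": "0123ABCD"}  # hypothetical fingerprint
msg = ("2014-07-10T12:00:00+00:00\nhttps://bob.example.com/\n"
       "https://alice.example.com/post/1\n")
now = datetime(2014, 7, 10, 12, 1, tzinfo=timezone.utc)
print(check_login(msg, "0123ABCD", keys, "alice.example.com", store, now))
try:
    check_login(msg, "0123ABCD", keys, "alice.example.com", store, now)
except LoginRejected as err:
    print("rejected:", err)
```

Because a nonce is only kept while its timestamp could still pass the window check, the store stays small instead of growing forever.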

If all the above passes, Alice’s site lets Bob access the restricted resource (and optionally, logs Bob in to the site, allowing him access to any other resources he has access to).

Moving forward

So far I’ve demonstrated this working in a small distributed social network made up of Known, Elgg and WordPress users, as well as PGP signon from the same, plus shell scripts and JavaScript.

Nothing here requires anything particularly special to get up and running, and I’m hopeful that after all this has been revved a few times it’ll be pretty robust.

I’d be interested in your thoughts!

OpenPGP Login spec: Countering replay attacks

Posted on May 30, 2014

Marcus Povey

Yesterday, I wrote a post outlining a draft specification for a possible way to handle login on a distributed social network, together with a reference implementation for Known.

I got some really positive feedback, including someone pointing out a potential replay vulnerability with the protocol as it stands.

I admit I had overlooked replay as an attack vector (oops!), but since peer review is exactly why open standards are more secure than proprietary standards, I thought I’d kick off the discussion now!

The Replay problem

Alice wants to see something that Bob has written, so she logs in according to the protocol; however, Eve is listening to the exchange and records the login. Later, she sends the same data back to Bob. Bob sees the signature, sees that it is valid, and logs Eve in as Alice.

Worse, Eve could send the same packet of data to Clare and David’s sites as well, all without needing access to Alice’s key.

Eve needs to be able to intercept Alice’s login session, which is largely impractical if HTTPS has been deployed, but since HTTPS can’t always be counted on, I’d like to improve the protocol.

Countermeasures

Largely, countermeasures to a replay attack take the form of creating the signature over something non-repeatable and algorithmically verifiable that Alice can generate and Bob can check.

This may be an algorithmically generated hash, a timestamp, or simply a random number; alternatively, the receiver can record whether it has seen a specific signature before.

My specific implementation has an additional wrinkle in that it has to function over a distributed network, in which nodes don’t necessarily talk to each other (so we can’t check whether a signature or random number has been seen before: Bob might have seen it, but Clare and David won’t have).

I also want to avoid adding too much complexity, so I’d like to avoid, if I can, any sort of multi-stage handshake; for example, hitting an endpoint on the server to obtain a random session ID, then signing that and sending it back. Basically, I’d still like to be able to talk to a server using Unix command line tools (gpg and curl) if I can!

Proposed revision

Currently, when Alice logs in to Bob’s site, Alice signs her profile URL using her key and sends it to Bob. Bob then uses this profile URL to verify that Alice is someone with access to Bob’s site/post, and then uses the signature to verify that it is indeed Alice who’s attempting to log in.

What I propose, is that in addition to forming the signature over Alice’s profile URL, she also forms it over the URL of the page she is trying to see, and also the current time in GMT.

Including the requested URL in the signature allows Bob to verify that the request is for access to his site. If Eve sent this packet to Clare or Dave, it could easily be discarded as invalid.

Adding the timestamp allows Bob to check that this isn’t an old packet being replayed. Since any implementation needs a small tolerance (perhaps a few minutes either side) to allow for clock drift, a timestamp alone leaves a small window in which Eve could replay the login. To counter this, Bob’s implementation should remember, for a short while, the timestamps received for Alice, and if the same one is seen twice, invalidate all of Alice’s sessions.
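A minimal Python sketch of that rule, assuming a five-minute tolerance and hypothetical in-memory session bookkeeping: timestamps are remembered per profile URL only while they are inside the tolerance window, and a repeated timestamp revokes every session for that profile rather than guessing which request came from Eve.

```python
from datetime import datetime, timedelta, timezone

TOLERANCE = timedelta(minutes=5)  # assumed clock-drift allowance


class ReplayGuard:
    def __init__(self):
        self.recent = {}    # profile URL -> set of timestamps seen recently
        self.sessions = {}  # profile URL -> set of session IDs (hypothetical)

    def login(self, profile_url, stamp, now, session_id):
        """Return True and open a session if this login should be accepted."""
        if abs(now - stamp) > TOLERANCE:
            return False  # too old, or too far in the future
        seen = self.recent.setdefault(profile_url, set())
        # Timestamps outside the window would fail the check above anyway
        seen -= {s for s in seen if now - s > TOLERANCE}
        if stamp in seen:
            # Same timestamp twice: we can't tell Alice from Eve, so log out both
            self.sessions.pop(profile_url, None)
            return False
        seen.add(stamp)
        self.sessions.setdefault(profile_url, set()).add(session_id)
        return True


guard = ReplayGuard()
alice = "https://alice.example.com/"
sent = datetime(2014, 5, 30, 12, 0, tzinfo=timezone.utc)
now = sent + timedelta(seconds=30)
print(guard.login(alice, sent, now, "session-1"))  # True: first use
print(guard.login(alice, sent, now, "session-2"))  # False: replayed
print(alice in guard.sessions)                     # False: all sessions revoked
```

Because this keys on the timestamp alone, two genuine logins by Alice within the same second would also collide; the summary post’s nonce of timestamp, profile URL and requested URL narrows that.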

Why invalidate all of Alice’s sessions when we see the same timestamp twice? Can’t we just assume that the second packet is Eve’s?
Sadly not – sophisticated attackers are able to attack from a position physically close to you, so Eve’s login may be received first. In the situation where two identical login requests are received, it is probably safer to treat both as invalid.

Perhaps a sophisticated implementation could delay Alice’s first login for a few seconds (after verifying) to see if any duplicates are received, and only proceed if there are none. This would limit the need to permanently store timestamps against a user’s account, but may be more complex from an implementation point of view.

Why use a timestamp rather than a random number?

I was going back and forth on this… a random number (nonce) would remove the vulnerability window, but it would require Bob’s site to store every number it has ever seen, so in the end I opted not to take this approach.

I’d be interested in your thoughts, so please, leave a comment!

Friend only posts and OpenPGP sign-in on a distributed social network

Posted on May 29, 2014

Marcus Povey

Distributed social networks, tools that give you all the social and political benefits of the siloed networks (Google+, Facebook, etc.) without being a massive honeypot for surveillance and data mining, are, in my view, the way we should be heading.

In this model, public posts are easy (that’s just the web), but limiting posts so that they can only be seen by a limited number of your friends is somewhat harder. On Elgg and similar systems, the standard solution was to make everyone create an account and profile on your node. This is, to a large extent, the traditional approach, but it leaves you with multiple profiles around the internet (with multiple passwords to remember) which are, crucially, controlled by a third party.

This is a bad thing, and in the post Snowden world, a downright dangerous thing.

I’ve previously discussed a possible approach to providing distributed signon using OpenPGP keys as an identity mechanism, and I’ve finally got around to fleshing this out and building a prototype, now that distributed friending is in Idno/Known core.

Protocol overview

1. Two user profiles: Alice and Bob.
2. Alice and Bob generate, or otherwise associate, a PGP key pair with their users. (For the most part, only public keys are used. You only need to store the private key on the server if you’re automating the process of signing in, and if you can store your private key in your browser, there is ultimately no need to store private keys on the server at all.)
3. Alice adds Bob as a friend, and Alice’s site checks Bob’s profile for his public key (see “Public key discovery” below).
4. Rinse and repeat for Clare, Dave, Emma, Fred, etc.
5. Alice writes a post that she only wants Bob to see. She lists Bob’s profile URL as an approved viewer.
6. Bob visits the private post and identifies himself by signing his profile URL with his key, then POSTing the ASCII-armoured signature, as the signature parameter, to the post URL.
7. Alice verifies the signature, confirms that the key’s fingerprint matches Bob’s key, and if so, lets Bob see the post.

Public key discovery

Bob makes his public key available by putting it on his web server, and making it easily discoverable to Alice in one or more of the following ways:

Via an HTTP Link header, with a rel of “key”, e.g. Link: <https://example.com/bob/pubkey.asc>; rel="key"
Via a LINK tag in the HTML head, e.g. <link href="https://example.com/bob/pubkey.asc" rel="key" />
Via an anchor tag within the page body with rel="key", e.g. <a href="https://example.com/bob/pubkey.asc" rel="key">My Key</a>
By pasting the key into the body of the page, and giving it a class of “key”, e.g.

<pre class="key"> -----BEGIN PGP PUBLIC KEY BLOCK----- .... -----END PGP PUBLIC KEY BLOCK----- </pre>

Identifying Bob

When Bob wants to see the post that Alice has made, he identifies himself by making a POST request to that page containing his signed profile URL. Alice then checks the profile URL against those she has allowed access, and verifies both that the signature is correct and that the fingerprint belongs to Bob’s key.

Alice may want to store these access details in a session so she can give Bob access to other resources (logging Bob in, in effect), but this is not strictly necessary.

Other methods are available…

So, why not use OAuth, or signed HTTP requests?

Well, first of all, none of these authentication methods are mutually exclusive, so there’s no reason why you can’t use multiple techniques.

Second, we’re using very standard tools (GPG, POST requests, etc.) and standard formats, bolted together. Among other things, this means that although this example (and the Idno implementation) uses a website to do the signing in, this isn’t really required: you can sign in and see a private post just as easily using curl and gpg from the command line, if you so require.

Finally, this is entirely distributed, and unlike some implementations of OAuth, or even things like IndieAuth, it requires no central authority to vouch for you. Update: Aaron points out that the latest versions of IndieAuth don’t require a central authority.

Idno reference implementation

I have written a plugin that implements this protocol for Idno. In addition to the basic spec, the Idno plugin has the following enhancements, which you may want to consider as well.

Firstly, it uses OpenPGP.js to generate the keypair on the client machine. This preserves server entropy, making it better for hosted environments.

Secondly, the plugin provides you with a bookmarklet, which makes signing in to a compatible site nothing more than a button click.

Please kick both the Idno implementation and the overall spec about, and let me know what you think!

» Visit the project on Github...

Marcus Povey

Time, Space, and Plexiglas
