Takahe / 5th Dec 2022

Understanding A Protocol

Yesterday I pushed out the 0.5.0 release of Takahē, and while there's plenty left to do, this release is somewhat of a milestone in its own right, as it essentially marks the point where I've implemented enough of ActivityPub to shift focus.

With the implementation of image posting in this release, there are now only a few things left at a protocol level that I know I'm missing:

Custom emoji (these are custom per-server and a mapping of name-to-image comes with each post)
Reply fan-out to the original author's followers
Pinned posts on profiles (and collections in general)
Shared inbox delivery (to reduce fan-out requests)

My current aim is to get Takahē to a point where a few small communities can run on it (including takahe.social), and while these are nice, they are not critical for that. The reply fan-out is probably most important, but is also the easiest given what we have written already.

Instead, it's now time to shift and focus on stability and efficiency. My general tactic for big new projects like this is an initial "spike" period, where I am more focused on pushing out code with a roughly correct architecture rather than focusing on query efficiency, caching or the like, and to then shift gears into more of a "polish" period.

Takahē is actually pretty useable for me as a daily driver for the @takahe@jointakahe.org account - sure, I find a few bugs here or there, but it's honestly not bad. That means, to me, it's time to shift focus a bit more towards polishing.

The other big missing features for a community at this point are probably mobile app support (which I plan to provide by implementing a Mastodon-compatible client API) and better moderation features (reporting and user blocking, in addition to the existing server blocking).

So, I'm going to focus on adding those, polishing, and improving efficiency; there are now quite a few other contributors to the project who have been helping out with bugfixes, efficiency, and plenty more, which is helping a great deal.

I'll also be sending out a few invitations to takahe.social to use that as a testbed as the first small community; nothing like dogfooding your own software to see what it needs (as well as asking some existing Mastodon admins for their thoughts, if they are gracious enough to lend me some of their time).

Still, though, getting to this point is quite a big deal - I feel like I've learned a lot about ActivityPub and its related specifications by implementing them. So let's talk about it a little bit.

Fan-Out

ActivityPub is all about "fan-out" - the process of getting posts from their authors to their followers. At the basic level, this means one HTTP request per follower to deliver it to their inbox - but there are some efficiency gains to be made with "shared inboxes", where you can push things on a per-server basis rather than per-user.
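As a rough sketch of the difference shared inboxes make, the snippet below plans the deliveries for a list of followers. The `Follower` type, its fields, and the example URLs are hypothetical and are not Takahē's actual models; the only idea taken from the protocol is that an actor may advertise a per-server shared inbox alongside its own inbox.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Follower:
    # Hypothetical shape, for illustration only.
    inbox: str                          # the follower's personal inbox URL
    shared_inbox: Optional[str] = None  # per-server inbox, if one is advertised


def plan_deliveries(followers: list[Follower]) -> set[str]:
    """Return the distinct inbox URLs a post needs to be sent to."""
    targets = set()
    for follower in followers:
        # Prefer the shared inbox, so each server is only contacted once.
        targets.add(follower.shared_inbox or follower.inbox)
    return targets


followers = [
    Follower("https://a.example/users/1/inbox", "https://a.example/inbox"),
    Follower("https://a.example/users/2/inbox", "https://a.example/inbox"),
    Follower("https://b.example/users/3/inbox"),
]
print(len(plan_deliveries(followers)))  # 2 requests instead of 3
```

The saving grows with the number of followers who share a server, and disappears when every follower is on a different one.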

Obviously, doing this is noisy and takes a lot of requests, and it has to be done in background workers - especially as the server on the other end might be down when you try to send the message over, and you need to retry.
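One common way to handle those retries is exponential backoff, sketched below. This is an assumption about how a worker might schedule retries, not a description of what Takahē does; the function name and the default numbers (a one-minute base, a one-day cap, six attempts) are made up for illustration.

```python
import random
from typing import Optional


def retry_delays(
    max_attempts: int = 6,
    base: float = 60.0,      # seconds before the first retry (assumed value)
    cap: float = 86400.0,    # never wait longer than a day (assumed value)
    jitter: float = 0.1,     # +/- 10% randomness so retries do not synchronise
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return the wait, in seconds, before each retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap, base * 2 ** attempt)
        delays.append(delay * (1 + rng.uniform(-jitter, jitter)))
    return delays


print(retry_delays(max_attempts=4, jitter=0.0))  # [60.0, 120.0, 240.0, 480.0]
```

A background worker would sleep (or reschedule the job) for each delay in turn and give up once the list is exhausted.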

Plus, whenever you reply to someone, that reply is then sent to every one of their followers so that it can appear in reply threads. This means there's an increasing amplification effect as you get more and more followers, and your server spends a lot of its life just sending requests to, and receiving requests from, other servers.

There are other aspects to fan-out, though; there was an excellent blog post about that last month that outlines the problems with link previews. See, when you post a message to a server's inbox, Mastodon (at least) goes and fetches any image attachments, and tries to generate web previews for any links. If people have mobile clients, some of those will also try to fetch previews. This does not end well for unprepared servers - and for those links, they could just be someone's random blog.

Takahē does not do this prefetching yet - we'll likely never do it for the link previews (but some clients connected to us will, once there's a client API). Post attachments and profile images are a different story - we need to at least proxy those for user privacy, but we can hopefully make it a caching proxy. If users have their timelines open, there's not a big difference between a caching proxy and prefetching for the source server, either.
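To make the caching proxy idea concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the `CachingProxy` class, the one-hour TTL, and the injected `fetch` function stand in for whatever storage and HTTP client a real server would use.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Fetch each remote URL at most once per TTL window."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl: float = 3600.0,  # assumed cache lifetime, in seconds
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # served locally, source server untouched
        body = self._fetch(url)    # only this path reaches the source server
        self._cache[url] = (now, body)
        return body


calls = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    calls.append(url)
    return b"image-bytes"


proxy = CachingProxy(fake_fetch)
proxy.get("https://remote.example/media/1.png")
proxy.get("https://remote.example/media/1.png")
print(len(calls))  # 1: the second read is served from the cache
```

However many local users view the attachment, the source server sees one request per TTL window from this proxy, which is why it ends up looking much like prefetching from the source's point of view.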

How do we solve this? Well, bundling some of this data into the original post is one idea; having shared caching proxies split between multiple servers is another potential one as well.

That brings us, though, to the push-pull of scaling that's at the heart of ActivityPub.

Two Axes Of Scaling

In a previous post about ActivityPub and Takahē, I referred to the fact that there are two "axes" of scaling available in the protocol:

Having more people per server/instance
Running more servers/instances

Both of these have their pros and cons, and it's hard to go all in on one of them - having a million people on one server is difficult to scale for that individual server (you have to start building it as its own distributed system), but having a million servers makes the fan-out problem even worse (say hello to massive prefetching loads and shared inboxes being not very useful).

It seems to me like a bit of a separation between domain, moderator, and caching store is needed - Takahē already lets multiple domains be on a single server, but the server moderation and caching are scoped just to that one server.

I do believe that sharing moderation across domains is a very important scaling step; this doesn't have to purely be "multiple domains on the same server", either - I think there's scope for a moderation API where you can have a team of professional (volunteer or paid) moderators look after multiple servers.

Sharing caching and previews is also important, though; if there were just ten or so link preview caches around, and all servers used one of them, then we still avoid centralisation while massively lowering the load on the target of links.

That Transport Layer

I both love that ActivityPub is all over HTTP, and hate it.

On the plus side, it means there's all manner of pre-existing load balancers, gateways, frameworks and more at our disposal. Plus, every programming language on Earth has some way of slinging JSON over HTTP.

On the negative side, it's wildly inefficient. There's a lot of overhead for each individual call, Accept headers have to be bandied around everywhere, and there's a lot of HTTP implementation variation that has to be accounted for.
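As an example of the Accept header dance, fetching an actor or object means asking for the ActivityPub media types explicitly, otherwise many servers will hand back an HTML page instead. The sketch below only builds the request, using Python's standard library; the helper name and the example URL are hypothetical.

```python
import urllib.request

# The two media types the ActivityPub specification names for object retrieval.
ACTIVITY_JSON = "application/activity+json"
LD_JSON = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def build_actor_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an ActivityPub object."""
    return urllib.request.Request(
        url,
        headers={"Accept": f"{ACTIVITY_JSON}, {LD_JSON}"},
        method="GET",
    )


request = build_actor_request("https://example.com/users/alice")
print(request.get_header("Accept"))
```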

If, magically, I could change it - would I go to something like SMTP, with its own port and protocol? I'm not entirely sure, to be honest - I do like the ease of entry with HTTP, and it does mean there's a lot of framing and encoding already agreed. Maybe HTTP as a base protocol with an optional TCP alternate for high-traffic servers to talk to each other over.

The one thing I would get rid of, though, is JSON-LD. If you're not aware, ActivityPub is not just JSON - it's JSON-LD, which has schemas, namespaces, expanded and compacted forms, transforms, and all manner of other stuff. You need to transform each message to a canonical form before you parse it!
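A small taste of why that matters: the same logical message can arrive in different but equivalent shapes, for instance with a single-item list collapsed to a bare value. The sketch below is emphatically not a JSON-LD processor, just a hypothetical helper showing the kind of normalisation a plain-JSON parser ends up needing for a single field.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a value that may be absent, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def recipients(message: dict) -> list:
    # "to" may be a bare string or a list, depending on the sender.
    return as_list(message.get("to"))


PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
single = {"type": "Create", "to": PUBLIC}
multiple = {"type": "Create", "to": [PUBLIC]}
print(recipients(single) == recipients(multiple))  # True
```

Full canonicalisation also has to deal with contexts and namespaced keys, which is where a proper JSON-LD library comes in.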

I get the idea, but I was never an RDF fan (it's just JSON RDF, basically) and it just makes everything so much more complex. A plain JSON specification with known keys would have been better, I think, though I was not there when the spec was written, so I'm sure there's more context I lack.

I do want to stress, though, that while I am not a huge fan of the transport layer, I think the object model is quite decent. If we could get preferredDomain in there along with some proper multi-size image support, I would be even happier than my usual buoyant self.

Difference Is Strength

A virtual Mastodon monopoly is not good for almost anyone, I think - I'm actually quite excited for Tumblr to implement ActivityPub, because it stands a chance of forcing protocol changes and improvements to be discussed, rather than directed almost entirely by one project.

If we can get Takahē to even 5% of active users on the Fediverse, that would be a significant impact, too. I'm not sure we'll get there, but I do at least hope the attempt will also place its own bit of pressure on the protocol in terms of evolving and trying to fix some of the scaling issues we're all sailing directly towards.

How to do that responsibly is another question - I would ideally like to make sure we have a server that is designed to easily handle things like DMCA requests, GDPR requests, and the awful spectre of terrorist content and CSAM. I'm looking into starting a fund to pay for some legal and compliance consultations on this front; it's the sort of work that admins should not each have to do themselves, and I'd love us to have a server designed to handle the requirements easily, and written guides as to how to do it.

Still, for me, the focus right now is on growing Takahē and hopefully fostering some communities under its wing, and that means improving stability and efficiency, and working closely with people who want to use us to run communities. It's also about slowly fostering a set of people who look after it with sensible governance, so I'm not needed as a decision-making leader forever.

I'm not yet focused on people migrating servers from Mastodon - supporting that is eventually on the roadmap, and I've reserved the appropriate URL patterns so actor/object URLs can move over seamlessly, but it's still a lot of work, and we're not ready for that quite yet.

If you're interested in helping out with Takahē, do pop over to our Discord or email me at andrew@aeracode.org and mention what you'd like to help out with - there's a large number of areas we need help with, not just coding!