Takahe / 16th Jan 2023

Takahē 0.7

Takahē 0.7 is out today, and things are really humming along now; this release marks the point where we've built enough moderation and community features that I'm happy to open up takahe.social to registrations, albeit with a cap on user numbers.

We've also launched a Patreon for Takahē, in a quest to make development and operation of Takahē more sustainable - and to work towards paying some people to help out with the less exciting work like triaging tickets, user support, and moderation of takahe.social. If you want to volunteer directly, that's covered in our Contributing docs.

There are some interesting technical topics I want to dig into today, though - it's been a little while since my last blog post, and ActivityPub and friends continue to surprise.

Snowflake IDs

The biggest visible change in 0.7 is that we've switched to Snowflake-style IDs - 63-bit integers with a timestamp offset baked into the start, a random segment in the middle (that we will eventually replace with sequence data) and a 3-bit type signifier at the end.

It turns out that the Mastodon Client API contract requires that IDs be time-sortable and have a single ID namespace between Posts (Statuses) and Boosts (Reblogs). In Mastodon, Boosts are just a special kind of Status, but in Takahē, we model Boosts along with Likes as a PostInteraction, since this more closely matches ActivityPub, and because it leaves us room to implement more interactions in future (like emoji reactions).

Thus, we were more or less forced into using Snowflake IDs, and more specifically, Snowflake IDs that carry type information, so you can sort a list of Posts and PostInteractions sensibly without conflicts.
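As a rough illustration of why the type bits matter, the sketch below merges hypothetical `Post` and `PostInteraction` objects into one list ordered purely by ID. The type values (0 for posts, 1 for interactions), the bit widths and the deterministic `pack` helper are assumptions chosen so the example is reproducible; they are not taken from Takahē's code.

```python
from dataclasses import dataclass

# Hypothetical type signifiers stored in the low 3 bits (assumption)
TYPE_POST = 0b000
TYPE_INTERACTION = 0b001
TYPE_BITS = 3
RANDOM_BITS = 19


def pack(offset_ms: int, random_part: int, type_id: int) -> int:
    """Deterministic packer (no randomness) so the example is reproducible."""
    return (offset_ms << (RANDOM_BITS + TYPE_BITS)) | (random_part << TYPE_BITS) | type_id


@dataclass
class Post:
    id: int
    text: str


@dataclass
class PostInteraction:
    id: int
    kind: str  # e.g. "boost" or "like"
    post_id: int


def home_timeline(posts: list[Post], interactions: list[PostInteraction]) -> list:
    """Merge both kinds into one list, newest first, ordered only by ID."""
    return sorted([*posts, *interactions], key=lambda item: item.id, reverse=True)


def kind_of(snowflake: int) -> str:
    """Recover which table an ID belongs to from its type bits."""
    type_id = snowflake & ((1 << TYPE_BITS) - 1)
    return {TYPE_POST: "post", TYPE_INTERACTION: "interaction"}[type_id]
```

If a post and a boost happen to share a millisecond and a random segment, the boost's ID is exactly one greater than the post's, so the two never collide and still sort deterministically in one namespace.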

That said, this change was one I had been considering for a while, as using a plain autoincrementing ID for public-facing ID numbers is generally a bad idea: it leaks a little information about your server (such as how many objects it has created), while Snowflake IDs merely leak the time the object was created (which is public as part of the object anyway).

We also switched to Snowflake IDs for several other models that are user-facing, like Identity, for similar reasons. Internal tables, like FanOut and InboxMessage, remain as autoincrementing PKs however.

Posts From The Past

Another change in 0.7 is that we've switched the Home timeline from being ordered by published date - the date when the post was originally written on its origin server - to our own created date, which is the date when Takahē first ingested the post.

This may seem somewhat counterintuitive at first, but the nature of ActivityPub means that posts can take a while to arrive. Someone with 10,000 followers on a slightly underpowered Mastodon server might cause enough backlog in the task queue that it takes 10 or 20 minutes for a copy of the post to make it to our inbox - and if a server is offline for a bit and comes back, it can be even longer.

If you have a timeline that moves relatively fast, inserting a post written 20 minutes ago in the timeline at the "20 minutes ago" spot might mean it never even appears to the user - we generally want to make sure the top of the timeline is "things you've not seen yet". That's why we switched it to created (received) ordering instead, which incidentally matches Mastodon's behaviour.

We did make a change to our timeline builder so it doesn't add things to the Home timeline if they were received more than a day after they were published - posts often reach us this late, either via boosts of old posts, replies to old posts (we have to go and fetch reply parents), or people putting a post URL into the search feature.
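The two rules described above (order by received time, and skip anything that arrived more than a day after publication) can be sketched in a few lines of Python. `IncomingPost`, `should_add_to_home` and `build_home` are hypothetical names; only the one-day threshold and the created-versus-published distinction come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# One-day threshold from the text; everything else here is illustrative
MAX_INGEST_DELAY = timedelta(days=1)


@dataclass
class IncomingPost:
    url: str
    published: datetime  # when the origin server says it was written
    created: datetime    # when we first ingested it


def should_add_to_home(post: IncomingPost) -> bool:
    """Skip posts that arrive more than a day after they were published."""
    return post.created - post.published <= MAX_INGEST_DELAY


def build_home(posts: list[IncomingPost]) -> list[IncomingPost]:
    """Order by when we received posts, newest first, not by published date."""
    eligible = [p for p in posts if should_add_to_home(p)]
    return sorted(eligible, key=lambda p: p.created, reverse=True)
```

With this ordering, a post written 20 minutes ago that only just arrived lands at the top of the timeline rather than being buried 20 minutes down.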

We'll leave things like this for now and see if there are any more tweaks to be made; one that comes to mind is that we probably should not add a post to your timeline if it was published before you followed the person, as right now there's a window where their day-old posts could appear on your timeline right after you follow them.

Trimming

Stator, our reconciliation loop worker system, has got a few improvements, the most notable one being the ability for us to say "delete this item after a certain duration in a state".

This means we can now purge old InboxMessages, FanOuts, and more automatically once they're done processing; they stay around for a day or so in case they're needed for debugging and then self-delete.

The next adventure in automatic trimming will be posts and identities; ideally, we don't want to keep remote posts around forever, nor do we want to have a copy of every single identity on the Fediverse (the RSA keypairs chew up quite a bit of table space).

This is a harder problem; Mastodon chose to limit its home timeline to only 400 items, but we chose to make ours more open-ended, so there's no obvious cutoff point built into the design. My current plan is to have a configurable time horizon, defaulting to 30 days, after which all remote posts will be trimmed out and identities with no remaining posts will also be removed.
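The planned time-horizon trimming could look roughly like this. It is a sketch of the plan as described, not existing Takahē code; `RemotePost`, `trim` and the string identity handles are hypothetical, and the 30-day default matches the proposed default horizon.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RemotePost:
    id: int
    author: str  # identity handle, e.g. "@alice@example.com"
    created: datetime


def trim(
    posts: list[RemotePost],
    identities: set[str],
    now: datetime,
    horizon: timedelta = timedelta(days=30),
) -> tuple[list[RemotePost], set[str]]:
    """Drop remote posts older than the horizon, then identities left with none."""
    cutoff = now - horizon
    kept_posts = [p for p in posts if p.created >= cutoff]
    authors_left = {p.author for p in kept_posts}
    kept_identities = {i for i in identities if i in authors_left}
    return kept_posts, kept_identities
```

A real implementation would presumably also need to spare identities that local users follow or are followed by, even if none of their posts survive; that refinement is left out here.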

This ties into the conversation happening right now about Fediverse scraping and archiving; we probably need to work on some protocol extensions so that servers can at least state how long remote servers may store their posts, and what they may do with them.

Of course, the nature of an open protocol is that a bad actor can just ignore all of that and archive things anyway - I think we need to be very clear to Fediverse users that everything they say is somewhat public. We can still try to make it as ephemeral as possible, though, when desired.

Client APIs

0.7 also brings a decent set of improvements to our Mastodon Client API support, and we've taken on Elk as one of our primary testing targets for it - not only because it's a lovely client, but also because the developers have been nice to me and my quest to figure out incompatibilities!

The API for Mastodon is essentially just its internal models translated out via a Rails API framework, so it's not super easy to adapt to in Django, but we're getting there. I've been using django-ninja so far, since it has a lot of the FastAPI goodness with more Django in it, but it has a few niggles about the way it does parameters and routing that mean I'll probably end up writing my own version at some point (which has been an inevitability for years, really - this just provides a concrete test-bed to do it on).

The Mastodon API also has a wonderful feature where you can submit to most POST and PATCH endpoints using urlencoded data, multipart data, or JSON bodies, and Django has been especially unhappy about unioning all those together. If you want to see some bad code, go take a gander at credentials_update in the api/views/accounts.py file.
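To show what unioning the body formats involves, here is a framework-agnostic Python sketch that normalises JSON and urlencoded bodies into one dict using only the standard library. The function name `parse_body` is hypothetical. The `[]`-suffixed keys follow the way the Mastodon API documents array parameters (for example `media_ids[]`). Multipart is deliberately left out: Django already parses multipart POST bodies into `request.POST` and `request.FILES`, though it does not do so for PATCH requests, which is part of what makes the union awkward.

```python
import json
from urllib.parse import parse_qsl


def parse_body(content_type: str, body: bytes) -> dict:
    """Normalise a JSON or urlencoded request body into one plain dict."""
    # Ignore parameters such as "; charset=utf-8"
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/json":
        data = json.loads(body or b"{}")
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        return data
    if mime == "application/x-www-form-urlencoded":
        result: dict = {}
        for key, value in parse_qsl(body.decode("utf-8"), keep_blank_values=True):
            if key.endswith("[]"):
                # Array-style keys, e.g. media_ids[]=1&media_ids[]=2
                result.setdefault(key[:-2], []).append(value)
            else:
                result[key] = value
        return result
    raise ValueError(f"unsupported content type: {mime}")
```

Note that urlencoded values always come back as strings, while the same field sent as JSON may be a number or boolean, so a view still has to coerce types after this step.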

We're still missing a few endpoints - the Conversations/DM one being the most obvious - but it's now very usable day to day, and it's been nice to get notifications for mentions of @takahe on my phone.

Moderation Features

Finally, let's talk about moderation features. My plan was always to wait on opening up takahe.social until we had what I called the "core moderation features" in, and we finally have them:

With these, I feel relatively comfortable growing a community, and I hope it will be a good start for others who want to use Takahē too. There's still more to add - IP and email address blocks are definitely on my list, I'd like an auto-report system for certain words or links, and we probably need support for a CAPTCHA on the signup page - but it's a very good start.

I'm still strongly of the opinion that the challenge of building a sustainable Fediverse is one of making the moderation and communities scale, rather than the underlying protocol or technology, and I think there's still a lot to be done here even once we catch up with Mastodon's current moderation tooling.

Next Steps

So, what's next from here? The first thing is handling a few more content types; we can currently receive polls and votes and sort of show them, but I'd like to shore this up some more. We also now display inbound video, but I want to let our users upload it too, which means processing it in the background.

After that, there is a request for 2FA, which I do want to get in as it's a pretty important security feature, as well as other reputation/visibility features like verified profile fields and the ability to manually approve followers.

I think the most notable change we'll have for 0.8, though, is really taking advantage of the Domains system. While right now you can have multiple domains hosted and have people choose their suffix, there's a lot more potential here, including:

Given my initial goal was to allow people to have very separate-feeling domains while sharing the same underlying infrastructure, I think these will all work well towards that. One of my initial plans for Takahē was a Geocities-style home page per domain - essentially personal websites with ActivityPub as the underlying mechanism - and while I think we're in a bit of a different place right now, I still like some of those ideas.

If you're interested in helping out with Takahē, please read our Contributing guide, come hang out with us on Discord, send some donations our way on Patreon, or maybe just sign up for an account on takahe.social (if we haven't hit the signup cap yet!) and take it for a spin!