Takahē: An efficient ActivityPub Server for small installs with multiple domains (jointakahe.org)
233 points by todsacerdoti on Nov 24, 2022 | 114 comments



Here are three specifically interesting things about Takahē:

1. The "multiple domains" feature. I'm running my own Mastodon instance right now purely so I can have my simonwillison.net domain as my identifier there (and protect myself from losing my identifier if the server I am using shuts down). This feels pretty wasteful! I'd much rather be able to point my domain at a Takahē instance shared with some of my friends, each with their own domains for it.

2. It's a Django app that's taking full advantage of the async features that have been added in the most recent releases of that framework. Async is a perfect match for ActivityPub due to the need to send thousands of outbound HTTP requests when publishing a message. And Takahē creator Andrew Godwin is the perfect person to build this because he's been driving the integration of async into Django for the past four years: https://www.aeracode.org/2018/06/04/django-async-roadmap/

3. The way it handles task queueing is super interesting. I've not fully got my head around it yet but it's the part of the codebase called Stator and it's modeled on things like the Kubernetes reconciliation loop - Andrew wrote a bit more about that here: https://www.aeracode.org/2022/11/14/takahe-new-server/ - Stator code is here: https://github.com/jointakahe/takahe/blob/main/stator/runner...
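A minimal sketch of points 2 and 3 together, assuming nothing about Takahē's real code: the `Post` model, the state names and `deliver()` are hypothetical, and the HTTP request is simulated with `asyncio.sleep(0)`. Each object records the state it is in, and a reconciliation loop keeps advancing anything not yet in a terminal state. The `new` transition fans the activity out to every inbox concurrently, with a semaphore capping how many requests are in flight. As I understand Andrew's post, Stator keeps state on database rows and runs as a separate process; this in-memory version does not attempt that.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical model: each row records which state it is in."""
    id: int
    inboxes: list[str]
    state: str = "new"  # new -> fanned_out -> done
    delivered: set[str] = field(default_factory=set)


async def deliver(post: Post, inbox: str, limit: asyncio.Semaphore) -> None:
    # Stand-in for signing and POSTing the activity to one remote inbox.
    async with limit:
        await asyncio.sleep(0)  # simulated network I/O
        post.delivered.add(inbox)


async def handle_new(post: Post, limit: asyncio.Semaphore) -> str:
    # Fan out to every inbox concurrently; the semaphore caps in-flight requests.
    await asyncio.gather(*(deliver(post, inbox, limit) for inbox in post.inboxes))
    return "fanned_out"


async def handle_fanned_out(post: Post, limit: asyncio.Semaphore) -> str:
    # Nothing left to do in this sketch; a real server might clean up here.
    return "done"


# Non-terminal states and the coroutine that advances each one.
TRANSITIONS = {"new": handle_new, "fanned_out": handle_fanned_out}


async def reconcile(posts: list[Post], max_concurrency: int = 100) -> int:
    """Advance posts until none is in a non-terminal state; return the pass count."""
    limit = asyncio.Semaphore(max_concurrency)
    passes = 0
    while True:
        pending = [p for p in posts if p.state in TRANSITIONS]
        if not pending:
            return passes
        passes += 1
        for post in pending:
            post.state = await TRANSITIONS[post.state](post, limit)
```

Running `asyncio.run(reconcile([Post(1, inboxes)]))` moves the post from `new` to `fanned_out` to `done` in two passes. Because each step only looks at the current state, a persisted version could resume after a crash simply by running the loop again, which is the appeal of the reconciliation-loop design.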


This is really interesting. Thanks for writing this comment and sharing.

Async is good for lots of IO work and managing independent tasks with low coupling.

I am interested in task scheduling and asynchronous code. I am also interested in programming language development, parallelism, simultaneity without parallelism, and cooperative and preemptive scheduling.

As an experiment inspired by Protothreads (a C library for implementing cooperative multitasking with a switch statement) I recently implemented async/await in Java as a giant switch statement and a while loop.

Provided that each coroutine runs only once, the amount of memory used should not grow. The goal is to be stackless.
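The Protothreads idea can be sketched in Python rather than Java (an illustration of the technique, not the commenter's code; `CountdownTask` and `run_round_robin` are hypothetical names). The coroutine's local state lives in object fields, a `pc` field records where to resume, and `resume()` is a switch (an `if`/`elif` chain) inside a `while` loop. Returning from `resume()` plays the role of `await`, so nothing stays on the call stack between steps.

```python
from __future__ import annotations


class CountdownTask:
    """Hypothetical stackless coroutine: all state lives in fields, none on the stack."""

    def __init__(self, n: int) -> None:
        self.pc = 0  # "program counter": which case to jump to on the next resume
        self.n = n
        self.log: list[str] = []

    def resume(self) -> bool:
        """Run until the next yield point; return False once the task has finished."""
        while True:
            if self.pc == 0:
                self.log.append("start")
                self.pc = 1
            elif self.pc == 1:
                if self.n == 0:
                    self.pc = 2
                    continue
                self.log.append(f"tick {self.n}")
                self.n -= 1
                return True  # the "await": hand control back to the scheduler
            elif self.pc == 2:
                self.log.append("done")
                self.pc = 3
                return False
            else:
                return False  # already finished


def run_round_robin(tasks: list[CountdownTask]) -> None:
    """Cooperative scheduler: resume each live task in turn until all finish."""
    live = list(tasks)
    while live:
        live = [task for task in live if task.resume()]
```

Each task's memory is fixed at construction time, so running many of them interleaved does not grow any stack.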

I began writing a programming language that looks similar to JavaScript but targets an imaginary interpreter that is multithreaded. I hope to think of how to represent async await so that the high level language can target the interpreter. I need to think of the code I need to codegen to implement async/await.

I played around with C++ coroutines, but someone told me that the approach I used is not C++20.

Code is at https://GitHub.com/samsquire/multiversion-concurrency-contro...

The reconciliation loop idea sounds interesting.


You can use your domain on Mastodon via WebFinger, you don’t need to necessarily self host it.


That's what I'm doing right now - my personal site is https://simonwillison.net/ and serves WebFinger - my Mastodon instance is https://fedi.simonwillison.net/ which is hosted by https://masto.host/ (just because I don't want to sysadmin it).

The problem is that you can't have multiple domains point to a single Mastodon instance. I'd like to share my single instance with friends who can bring their own domain name.

Basically the problem is that current Mastodon only supports single settings for the LOCAL_DOMAIN and WEB_DOMAIN.

More details on how mine works here: https://til.simonwillison.net/mastodon/custom-domain-mastodo...
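A minimal sketch of the WebFinger half of that setup, assuming a Mastodon-style actor URL (`/users/<name>`) and hypothetical domains: `example.net` is the handle's domain (Mastodon's `LOCAL_DOMAIN`) and `fedi.example.net` is the host running the software (`WEB_DOMAIN`). The handle domain only has to answer `/.well-known/webfinger`; the returned `self` link points remote servers at the actor that actually lives on the other host.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse

# Hypothetical deployment: handles look like alice@example.net, but the
# server software runs on fedi.example.net.
HANDLE_DOMAIN = "example.net"     # Mastodon calls this LOCAL_DOMAIN
SERVER_HOST = "fedi.example.net"  # ...and this WEB_DOMAIN
USERS = {"alice"}


def webfinger(url: str) -> tuple[int, dict]:
    """Answer GET /.well-known/webfinger?resource=acct:user@domain.

    The body would be served as application/jrd+json.
    """
    resource = parse_qs(urlparse(url).query).get("resource", [""])[0]
    if not resource.startswith("acct:"):
        return 400, {}
    user, _, domain = resource[len("acct:"):].partition("@")
    if domain != HANDLE_DOMAIN or user not in USERS:
        return 404, {}
    actor = f"https://{SERVER_HOST}/users/{user}"
    return 200, {
        "subject": resource,
        "aliases": [actor],
        "links": [
            # Remote servers follow this link to fetch the ActivityPub actor.
            {"rel": "self", "type": "application/activity+json", "href": actor},
            {"rel": "http://webfinger.net/rel/profile-page", "type": "text/html",
             "href": f"https://{SERVER_HOST}/@{user}"},
        ],
    }
```

Supporting several handle domains on one server would need that single `HANDLE_DOMAIN` to become a lookup table, which is the part stock Mastodon lacks.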


There is an open GitHub issue to add that functionality! If anyone is able to help your work would be appreciated!

https://github.com/mastodon/mastodon/issues/2668


Ahh, okay, this makes more sense now - appreciate the breakdown!


I know of an organization that just advertised their new Mastodon instance as being at social.[domain].com. Is it too late now for them to start using WebFinger and advertise Mastodon handles at simply [domain].com?


They would have to change all their addresses, but Mastodon's account migration feature (which advertises that an account has moved) might be enough, depending on their needs.


Perhaps naive but is it possible to create some sort of Mastodon proxy that exists independent of any specific instance? Rather than run your own instance or point to a shared instance, a proxy could be a fairly simple system that uses DNS records (?) to route requests to the appropriate instance -- much like email.


Unfortunately that doesn't quite work with out-of-the-box Mastodon.

I'm running a bit of a proxy at https://simonwillison.net/.well-known/webfinger?resource=acc... but it still needs to point to my own dedicated instance, just because Mastodon can't have multiple domains pointed at a single instance of the software yet.

I'm using this pattern (also shared by Andrew, before he started to spin up Takahē) https://aeracode.org/2022/11/01/fediverse-custom-domains/
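One way to sketch the proxy idea, assuming a hypothetical static routing table (a real deployment might read it from DNS TXT records instead): the proxy answers WebFinger for many domains by redirecting to the WebFinger endpoint of whichever instance hosts that domain. RFC 7033 allows a WebFinger resource to redirect, as long as the target is an https URL.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical routing table: handle domain -> instance that hosts it.
ROUTES = {
    "alice.example": "social.example.org",
    "bob.example": "social.example.org",
    "carol.example": "other.example.net",
}


def route_webfinger(url: str) -> tuple[int, str | None]:
    """Return (HTTP status, Location header) for a WebFinger request to the proxy."""
    resource = parse_qs(urlparse(url).query).get("resource", [""])[0]
    _, _, domain = resource.partition("@")
    instance = ROUTES.get(domain)
    if instance is None:
        return 404, None
    # RFC 7033 permits redirects as long as the target is an https URL.
    query = urlencode({"resource": resource})
    return 302, f"https://{instance}/.well-known/webfinger?{query}"
```

As noted above, the redirect alone is not enough: the target instance still has to accept accounts whose handles use `alice.example`, which stock Mastodon can't do for more than one domain.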


Found Andrew Godwin on mastodon: @andrew@aeracode.org


Ahh this is so exciting to see so much happening in this space all of a sudden! My quest to get a personal instance running has been a long slog.

I had been working on an ActivityPub server in Node.js/TypeScript for a while before the Twitter migration. It's got most of the features I'd want in a small server but it's basically bring-your-own-client at the moment.

https://github.com/michaelcpuckett/activitypub-core

Finding all the resources to build a complete server that can interact with other instances isn't easy, so maybe this can help someone. The spec is well worded, but the checklist is confusing, the test server is down, Mastodon has its own rules, etc. Plus you have to have at least a cursory knowledge of JSON-LD/RDF.
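For anyone starting out, here is a minimal sketch of the kind of JSON-LD document involved: an actor object as a Python dict. The field names (`inbox`, `outbox`, `preferredUsername`, and `publicKey` with `owner` and `publicKeyPem`) are the ones Mastodon-compatible servers commonly expect; the username, host and `/users/<name>` URL layout are assumptions, and the key is a placeholder rather than a real PEM. Mastodon uses the `publicKey` block to verify HTTP Signatures on incoming requests, one of the "own rules" that isn't obvious from the ActivityPub spec alone.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
SECURITY_CONTEXT = "https://w3id.org/security/v1"


def build_actor(username: str, host: str, public_key_pem: str) -> dict:
    """Minimal Person actor, to be served as application/activity+json."""
    actor_id = f"https://{host}/users/{username}"  # assumed URL layout
    return {
        "@context": [AS_CONTEXT, SECURITY_CONTEXT],
        "id": actor_id,
        "type": "Person",
        "preferredUsername": username,
        "inbox": f"{actor_id}/inbox",
        "outbox": f"{actor_id}/outbox",
        "followers": f"{actor_id}/followers",
        # Remote servers fetch this key to check HTTP Signatures from this actor.
        "publicKey": {
            "id": f"{actor_id}#main-key",
            "owner": actor_id,
            "publicKeyPem": public_key_pem,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_actor("alice", "example.net", "<PEM placeholder>"), indent=2))
```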


Your project looks super interesting.

I had the idea of running a single-user server on Cloudflare Workers and using D1 (their SQLite-based database). A lightweight JS/TS implementation would be perfect. It looks like you have Postgres planned; it would probably be possible to expand from that to SQLite.


I work on an ActivityPub server in Go that supports an sqlite backend. Check my bio for details.


Cool idea! I'll start looking into that.


Yeah, that’s what I’d like too (or using one of the other edge compute services).


>Ahh this is so exciting to see so much happening in this space all of a sudden!

It's like Elon unknowingly funded this space!


Never underestimate HN's pettiness and spite!

Now it's mainstream to work on a cool technology that's been around for a while!

Oh, and everyone can act like they weren't badmouthing the tech and saying it wasn't going to work before.

Visit any Mastodon thread here before Elon's Twitter and it's nothing but negativity.


well, there's more than one person on this website.

also, a lot of these projects have been on a slow simmer for a long time, and are only just now starting to become complete and interesting.

edit: though it does seem to be true that takahe's initial commit was nov 5 :) and personally i don't consider it complete and interesting yet


Yes there are multiple people here, but the general sentiment was negative among most threads.

Go ahead and look over any Mastodon thread a year ago or before.

Generally it was dismissed with "oh it's too niche" or "moderation will be too difficult".

People ignored the communities already on it and the tech overall.

Only when people got pissy about Elon running Twitter instead of hedge funds did the general sentiment here change.

It wasn't about the tech, and not even about Elon specifically, it was the Twitter safe space got taken away.

But now hopefully the people who want that safe space will isolate themselves in mastodon instances that block all others and we can all live in peace from them.


I suspect you may be surprised which instances get islanded off from the rest of the fedi.

Or rather, which ones already are.


I've been in the mastodon world for longer than the Twitter drama, so I'm well aware.

When you block other instances you realize you are islanding yourself off, right? Not the other way around.

Everyone is federated until you block, so you are isolating yourself from the norm when you block others.

It's not too noticeable as the English-speaking instances are small currently, but those who don't want wolfballs and friends, or this or that, are more isolated than those who do.

It makes sense that those who want to hide views from others are the outliers. Those who want to be open and allowing of diverse thought are more interoperable.


If the majority of servers federate with each other and block the same servers, wouldn't the larger group be the "not islanded" group?

My argument is that if a server behaves in a way that the majority of large servers block it, then it is islanded, not the others.


Is the server ActivityPub Client to Server compliant?


I think this refers to handling JSON ActivityStreams objects at the `/outbox` endpoint for a logged-in user, and then broadcasting those out appropriately. If so, then yes that's the only API that's used. It also handles the uploadMedia endpoint and a few other details that are included in the spec.

I have tried unsuccessfully so far to set up an OAuth provider server along with it, so that you could log in on your phone, etc.
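A minimal sketch of the outbox side of client-to-server, under stated assumptions: the `ACTIVITY_TYPES` set is a hypothetical subset, ids are random UUID paths, and delivery to followers is left out. It follows the spec rule that a bare object (such as a `Note`) POSTed to the outbox must be wrapped in a `Create` activity, with the object's `to`/`bto`/`cc`/`bcc`/`audience` addressing copied onto that activity.

```python
import uuid

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Hypothetical subset of ActivityStreams activity types this server accepts.
ACTIVITY_TYPES = {"Create", "Update", "Delete", "Follow", "Like", "Announce", "Undo"}
ADDRESSING = ("to", "bto", "cc", "bcc", "audience")


def accept_outbox_post(actor_id: str, body: dict) -> dict:
    """Normalize a client's POST to /outbox into an activity with a fresh id."""
    if body.get("type") in ACTIVITY_TYPES:
        activity = dict(body, actor=actor_id)
    else:
        # Bare object such as a Note: wrap it in a Create activity...
        obj = dict(body, id=f"{actor_id}/objects/{uuid.uuid4()}", attributedTo=actor_id)
        activity = {"type": "Create", "actor": actor_id, "object": obj}
        # ...and copy its addressing onto the Create, as the spec requires.
        for key in ADDRESSING:
            if key in body:
                activity[key] = body[key]
    activity["@context"] = AS_CONTEXT
    activity["id"] = f"{actor_id}/activities/{uuid.uuid4()}"
    return activity
```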


That is great news, one of the few implementations that does it. :D Do you have a demo server set up anywhere? Mine (based on my own activitypub code) is at https://federated.id :D


> Features on the long-term roadmap:
> “Since you were gone” optional algorithmic timeline

That's exciting! The fediverse is severely lacking algorithmic curation, presumably due to the belief that it's inherently evil (I'd strongly disagree; what's bad is merely the algorithm not being user-controllable).


Fully agree, the algorithmic timeline (sprinkling in some likes and comments from other people that might be interesting to you) is one of Twitter's best features, even if many people (who mostly use third-party clients for that reason) would not agree.


Did you know that the "# Explore" section on the right of Mastodon already lists posts that are gaining traction? It also lists trending news.

It says:

"These posts from this and other servers in the decentralized network are gaining traction on this server right now."

I don't know what the logic is, but on big servers it's listing a lot of content.


That's very different from the Twitter timeline though, which shows you "good" content from people you already follow that happened since the last time you used Twitter. So if you refresh a bunch of times you'll always see more interesting tweets / likes / comments.


Yes, you are right, but IMO it's already algorithmic curation, like the parent said. It's just not from your timeline but from the whole fediverse / the server you are on.

It's probably not that difficult to add one based on your feed if there is one globalized already.
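A "since you were gone" timeline doesn't need much machinery. A minimal sketch, assuming a hypothetical `Post` record with engagement counters and an arbitrary scoring formula: keep only posts from accounts you follow made after your last visit, then rank them by engagement. Exposing `boost_weight` and `limit` as parameters is one way to keep such a heuristic user-controllable.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical timeline entry with engagement counters."""
    author: str
    created: datetime
    likes: int
    boosts: int
    replies: int


def since_you_were_gone(posts: list[Post], following: set[str], last_seen: datetime,
                        limit: int = 3, boost_weight: float = 2.0) -> list[Post]:
    """Top posts from followed accounts since last_seen, ranked by engagement."""
    candidates = [p for p in posts if p.author in following and p.created > last_seen]

    def score(p: Post) -> float:
        # Arbitrary illustrative weighting; a user-facing setting could tune it.
        return p.likes + boost_weight * p.boosts + p.replies

    return sorted(candidates, key=score, reverse=True)[:limit]
```

The home timeline itself can stay chronological; this would only feed an optional view.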


To me, the main difference to Twitter seems to be that you have to explicitly go to the "Explore" section to look for trending content, leaving your "default" home timeline chronological.

So if you want to check out what's the current buzz, you can, but you won't unknowingly be missing posts from those you follow (which seems to be the common complaint on Twitter).


I don't think those results are personalized which makes them worthless compared to twitter's suggestions.


The word "algorithm" has suffered wild semantic drift at the hands of journalists. Let's see if we can start to fix that now by making sure that on HN and in adjacent communities of all places we use the appropriate words for the thing we are talking about.

We are talking about heuristics here, not algorithms.


I would say that ship has sailed, just like "crypto" doesn't mean cryptography in the previous sense anymore.


Crypto has its own meanings and cryptocurrency also has some of those other meanings (not just the historical connection to encoding).

I especially like some of the biological metaphors:

https://www.cdc.gov/parasites/crypto/index.html

https://en.m.wikipedia.org/wiki/Cryptozoology

https://en.m.wikipedia.org/wiki/Aggressive_mimicry#Mimesis “cryptic aggressive mimicry is where the predator mimics an organism that its prey is indifferent to” i.e. wolf in sheep’s clothing.


Algorithm is the correct word. Why don't you think it is?


For the reason I already said.


Looks interesting.

Why does it need a PostgreSQL server? For just a handful of users, isn't SQLite the leaner, yet sufficient choice?

How does it compare to GoToSocial, which requires 50-100MB of RAM? They are also in alpha stage, and I like their approach of keeping the web UI separate.


Author here - it's just to reduce support surface area. I know I'll need PostgreSQL's full text indexing and GIN indexes for hashtags/search eventually, and I probably also want to use some of the upsert and other specialised queries, and it's easier to just target one DB I know is very capable.

For reference, when I say "small to medium", in my head that means "up to about 1,000 people right now".


That sounds like a very low number; I would never have guessed. Is Mastodon so heavy?


People were getting priced out of hosting an instance with "only" 10-20k users, and the instance hosting services quote plans for up to 4k users, with the 4k end costing more than US$100/month. Even the "low end" 1-200 user instances come with 4 cores, 5 TB of monthly bandwidth, etc.

The general sense I have got is that Mastodon (the default software, at least) is extremely resource-heavy for relatively low user counts. My assumption/hope was that the bulk of this is because the server software hasn't ever really been under sufficient pressure to improve, and Takahē seems to indicate that there's at least some room for improvement on the server side (i.e. performance problems aren't entirely protocol/architecture problems).


SQLite has full text search and upsert.

GIN indexes sound cool; perhaps you can get away without using them, however, and instead support two DB backends?

If you want to accommodate "small" and not just medium, that would be great! ;-)
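For example, a hashtag usage counter is a one-statement upsert in SQLite (version 3.24 or later), using the `ON CONFLICT ... DO UPDATE` syntax that SQLite modeled on PostgreSQL's. The table layout is a hypothetical illustration, not Takahē's schema.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical table: one row per hashtag with a usage counter.
    db.execute("CREATE TABLE hashtag (name TEXT PRIMARY KEY, uses INTEGER NOT NULL)")
    return db


def record_hashtag(db: sqlite3.Connection, name: str) -> None:
    # Insert the tag, or bump the existing row's counter (upsert, SQLite >= 3.24).
    db.execute(
        "INSERT INTO hashtag (name, uses) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET uses = uses + 1",
        (name.lower(),),
    )
```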


I was poking into this a bit yesterday.

Is there any advantage to using a traditional DB as opposed to a graph DB, since JSON-LD is just a text representation of graph nodes?

I was thinking the easiest path would be to have the server deal with all the ActivityPub stuff and expose something like a GraphQL interface for a bring-your-own-client implementation. Of all the stuff they've shoehorned GraphQL into, this seems like a valid fit, like they were made for each other.

Anyhoo, just my random thoughts…
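To make the graph view concrete, here is a minimal sketch that flattens nested ActivityStreams JSON into (subject, predicate, object) triples, roughly what a graph store would hold. It is a simplification: real JSON-LD processing expands each key to a full IRI using the `@context` (for instance with a JSON-LD library), whereas here keys are used as-is and objects without an `id` get made-up blank-node names like `_:b0`.

```python
from __future__ import annotations

from itertools import count


def to_triples(doc: dict) -> list[tuple[str, str, str]]:
    """Flatten ActivityStreams JSON into (subject, predicate, object) triples."""
    triples: list[tuple[str, str, str]] = []
    blank_ids = count()

    def visit(node: dict) -> str:
        # Nodes without an "id" get a made-up blank-node name such as "_:b0".
        subject = node.get("id") or f"_:b{next(blank_ids)}"
        for key, value in node.items():
            if key in ("id", "@context"):
                continue
            for item in (value if isinstance(value, list) else [value]):
                # Nested objects become nodes of their own, linked by this edge.
                obj = visit(item) if isinstance(item, dict) else str(item)
                triples.append((subject, key, obj))
        return subject

    visit(doc)
    return triples
```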


For better or worse, many servers are targeting Mastodon API compatibility to be able to leverage the existing clients. Adding GraphQL increases surface area without solving the bigger issue of creating the clients.


I didn’t get as far as looking into the Mastodon API for clients, but that makes perfect sense; I just assumed it was an overlay on the more general API.

Mostly I was thinking about how one could implement something in the most efficient way, and graph databases/GraphQL were literally designed for this stuff.


if they don't have PSQL specific queries, it might be a trivial change: https://github.com/jointakahe/takahe/blob/main/takahe/settin...


I tried swapping that for SQLite and successfully ran the test suite about a week ago, but I've not tried that again against the large number of more recent changes.
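A minimal sketch of making that swap switchable, assuming a hypothetical `DB_ENGINE` environment variable and default names; it does not reproduce Takahē's actual settings file. The two `ENGINE` strings are Django's built-in SQLite and PostgreSQL backends. Any PostgreSQL-only features (GIN indexes, specialised queries) would still need their own fallbacks.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build Django's DATABASES setting from a hypothetical DB_ENGINE variable."""
    if env.get("DB_ENGINE", "postgresql") == "sqlite":
        return {"default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": env.get("DB_NAME", "db.sqlite3"),
        }}
    return {"default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("DB_NAME", "app"),
        "USER": env.get("DB_USER", ""),
        "PASSWORD": env.get("DB_PASSWORD", ""),
        "HOST": env.get("DB_HOST", "localhost"),
        "PORT": env.get("DB_PORT", "5432"),
    }}


# In settings.py this would simply be:
DATABASES = database_settings()
```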


I wonder if Postlite would work.

Actually that looks more like an interactive client.... https://news.ycombinator.com/item?id=30875837


SQLite is magical and incredibly lean, but it is not leaner than Postgres if you need real database features. You end up reimplementing a lot of features in code that belong in the db.


What kind of features are you talking about here?

This doesn't match my experience from the last few years. SQLite in WAL mode is extremely capable.

The only thing I really miss from PostgreSQL is that PostgreSQL has more built-in functions for things like date handling - but SQLite custom functions are very easy to register when you need them.
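For reference, both features are a couple of lines with Python's built-in `sqlite3` module: `PRAGMA journal_mode=WAL` switches a file database to write-ahead logging (concurrent readers alongside one writer), and `Connection.create_function()` registers a Python callable for use in SQL. The `iso_weekday` function is a hypothetical example of the kind of date helper PostgreSQL has built in.

```python
import os
import sqlite3
import tempfile
from datetime import datetime


def open_db(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Write-ahead logging: readers keep working while one writer commits.
    mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    # Register a Python callable as an SQL function: (name, number of args, callable).
    db.create_function("iso_weekday", 1, lambda ts: datetime.fromisoformat(ts).isoweekday())
    return db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_db(os.path.join(tmp, "demo.db"))
        db.execute("CREATE TABLE post (created TEXT)")
        db.execute("INSERT INTO post VALUES ('2022-11-24T10:00:00')")
        print(db.execute("SELECT iso_weekday(created) FROM post").fetchone()[0])  # 4 (Thursday)
        db.close()
```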


Constraints and validation, for example. Efficient JSON storage, etc.


SQLite has check constraints - and recent versions can have STRICT tables: https://www.sqlite.org/stricttables.html

It also has excellent JSON features. JSON may be stored as text rather than a binary format like JSONB in PostgreSQL, but the SQLite JSON functions crunch through it at multiple GBs per second, so it doesn't seem to matter.
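A minimal sketch combining those, assuming a hypothetical `post` table and SQLite's JSON functions (built in since 3.38 and available in most earlier Python builds through the JSON1 extension). `CHECK` constraints validate the visibility value and the JSON text, `STRICT` is added only when the library is new enough (3.37+), and `json_extract()` queries inside the stored activity.

```python
import json
import sqlite3


def make_post_table(db: sqlite3.Connection) -> None:
    # STRICT tables reject values of the wrong type; only available in SQLite 3.37+.
    strict = " STRICT" if sqlite3.sqlite_version_info >= (3, 37, 0) else ""
    db.execute(
        "CREATE TABLE post ("
        "id INTEGER PRIMARY KEY, "
        "visibility TEXT NOT NULL CHECK (visibility IN ('public', 'unlisted', 'followers')), "
        "activity TEXT NOT NULL CHECK (json_valid(activity))"
        ")" + strict
    )


def insert_post(db: sqlite3.Connection, visibility: str, activity: dict) -> bool:
    """Insert a post; return False if a constraint rejects it."""
    try:
        db.execute("INSERT INTO post (visibility, activity) VALUES (?, ?)",
                   (visibility, json.dumps(activity)))
        return True
    except sqlite3.IntegrityError:
        return False
```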


Didn't know about strict tables, very cool!


Nice to see a Python/Django implementation of ActivityPub. Having a nice, lean implementation of ActivityPub that I can customize to my liking is the only thing that keeps me from using the Fediverse more regularly. So I am watching the space closely.

What I find a bit unfortunate about Takahe is the coupling with Docker.

An even leaner ActivityPub implementation seems to be MicroBlogPub. I have not yet managed to set it up though.

Anybody interested in collaborating on a MicroBlogPub install script that turns a fresh Ubuntu installation (or container) into a running MicroBlogPub instance?


It's not coupled with Docker. Docker is purely one suggested way of running it - it's a classic Django app so running it directly on Ubuntu should work the same as any other Django application.


Great!

When I saw "Prerequisites: Something that can run Docker/OCI images" in the documentation, my interpretation was that containers are needed. It also says "You need to run at least two copies of the Docker image". Maybe you want to change the wording a bit.

I would also collaborate on writing a setup script for Takahe then!

I really like to write a setup script instead of following manual installation guides. So for every software I try, my first step is to write a script that turns a fresh Debian installation into a running instance. (MicroBlogPub needs Python 3.10 which is not in Debian stable, so I would use Ubuntu)


Hmm.. does not look good for the non-Docker setup. The developer replied with "I am deliberately avoiding offering a non-Docker install path" and closed the issue:

https://github.com/jointakahe/takahe/issues/44

Creating a non-Docker fork would then probably be an uphill battle.


Full quote:

> So, I am deliberately avoiding offering a non-Docker install path that is supported right now as it leads to a lot of support burden with different OS package versions and the like!

That doesn't mean you can't write and share a script for people who want to install it without Docker.

It means that he doesn't want to take responsibility for non-Docker installation scripts as part of the official documentation (yet), because if he did that he'd be on the hook to keep researching and updating those scripts in the future.


The non-Docker install path for almost any project is to run the individual scripts from the Dockerfile, plus adding the CMD to systemd or similar.
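A rough sketch of that approach, assuming a simple Dockerfile: it pulls out the `RUN` commands (joining backslash-continued lines) and the final `CMD`, which you would then run by hand and put in a systemd unit's `ExecStart=`. It deliberately ignores `ENV`, `COPY`, `WORKDIR` and the rest, which still need translating manually, and the function name is hypothetical.

```python
from __future__ import annotations

import json
import shlex


def dockerfile_to_steps(text: str) -> tuple[list[str], str | None]:
    """Return (commands from RUN lines, command from the last CMD)."""
    # First join backslash-continued lines into single logical lines.
    logical: list[str] = []
    buf = ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("\\"):
            buf += stripped[:-1].rstrip() + " "
            continue
        logical.append(buf + stripped)
        buf = ""

    runs: list[str] = []
    cmd: str | None = None
    for line in logical:
        keyword, _, rest = line.partition(" ")
        keyword, rest = keyword.upper(), rest.strip()
        if keyword == "RUN":
            runs.append(rest)
        elif keyword == "CMD":
            # Exec form is a JSON array; shell form is used as written.
            cmd = shlex.join(json.loads(rest)) if rest.startswith("[") else rest
    return runs, cmd
```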


> What I find a bit unfortunate about Takahe is the coupling with Docker.

While I don't love it, it's very understandable for a single-dev application. Anything else involves blizzards of questions and bugs filed by people using their distro's version of Django vs. their downloaded version of Django, and the many versions of distros, and the many conventions for Python environments, and...

Exhausting.


Are there any ActivityPub benchmarks comparing various implementations of Mastodon-compatible instances, i.e. ones written in different languages, etc.?

For instance, Go seems to be around an order of magnitude faster than Ruby, and I think I've seen a Golang implementation of ActivityPub somewhere. https://programming-language-benchmarks.vercel.app/go-vs-rub...


BSD3, yay - even though I’m worthless at python, love to see an implementation that is not AGPL


Surprised it hasn't been attacked over this yet as there's so much needless hand wringing about anything non-AGPL being a threat to the anti-capitalism views of the Fediverse


Django does many things but “efficient” is not one of them.


Mastodon currently needs 2GB of RAM. Takahē can run in a lot less than that.


Can I run it on a Raspberry Pi 3 with 1GB of RAM?


Let's find out.


You’re wrong. Just because it’s not the absolute peak of efficiency, written in C with asm routines to talk to the db, doesn’t mean it’s not efficient.


https://www.techempower.com/benchmarks/#section=data-r21&tes...

Django ranks #137 out of 142 across numerous web frameworks and languages. It’s literally one of the least performant options that exist.


Performance and efficiency aren’t the same thing. Django does a lot of things other frameworks ranked here don’t do.

Such framework rankings are also utterly irrelevant when you want something widely used enough to easily find contributors and integrations. That restricts you quite a bit more than “any so called framework that just handles http”.

Did you even look at the top performers on that page? This is number 2: https://github.com/Xudong-Huang/may_minihttp


mastodon is ruby on rails, just slightly more "efficient" than django according to those benchmarks


> Django does many things but “efficient” is not one of them.

It depends on how you code. I wrote a user instance in django and I'm happy with its performance.


+1 for adopting django. with django's roots in journalism it somehow feels a natural building block of federated information exchange


I see honk hasn't been mentioned on this thread. It's also an activitypub server which is very lightweight (golang) and easy to set up your own server. https://humungus.tedunangst.com/r/honk


https://humungus.tedunangst.com/r/honk/v/tip/f/web.go reads like it has either been sent through an obfuscator or belongs to the church of TempleOS



It's unfortunate, because Honk appears to be well designed otherwise, but I found it difficult enough to grok the idiosyncratic naming conventions that I gave up.


> butwhatabout(mdaniel)

I see the sibling comment about obfuscation, but not sure I follow either of you. Is this code not clear?

To me the code reads with humor and creativity, while every bit as self-evident as a Gary Larson Far Side cartoon on second glance. I mean, what else is nomoroboto going to do than what it does?

I've never seen this tone in the wild before, but got a kick out of it, might even find it refreshing maintaining it.


squints I can't tell if this is trolling or not

Anyway, you're right, all code should be written in haiku form, to maximize creativity and succinctness, plus keeping methods short! True elite coders ensure variable names are always a prime number of characters


    allinjest(originate(mdaniel),j)
// but not entirely jest ... "emptiness" and "thelistingoftheontologies" ... it's delightful!


Brief discussion of Takahe in the TalkPython podcast here https://youtu.be/LhBfMoR3bvI?t=2369


I'm very interested in this federated renaissance happening, but having trouble understanding how all the pieces fit together. Is there a good overview I can read? I think ActivityPub is the (a?) protocol, and Mastodon is one particular implementation of it, just like the software linked here? Are there other relevant or competing protocols? How does Matrix fit in exactly? What about identity? Is OpenID a part of all this somehow?


The Fediverse is the loose network of servers that exchange data with the ActivityPub protocol. Mastodon is a server that implements a chunk of AP. Mastodon also specifies a client API that is not AP, but is fairly often implemented by other packages because it's convenient.

AP is not entirely Twitter-style microblogging. It can be used to exchange (data or links) photos, video, audio, documents, invitations and meeting appointments. The default privacy assumption for all AP content is that it is basically public.

Matrix is not built on AP. Matrix is a real-time communications protocol suitable for private messages and public chats. Its mission appears to be to bridge every other protocol, so there's at least one Matrix-ActivityPub bridge module, MXToot.

OpenID is not part of these. Some Matrix servers can use OpenID for authentication. As far as I know, no ActivityPub servers currently use OpenID.
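To make the client API distinction concrete: posting through the Mastodon client API is a form-encoded `POST /api/v1/statuses` with an OAuth bearer token, whereas ActivityPub's own client-to-server route would be POSTing an activity to the actor's outbox. A minimal sketch that builds (but does not send) such a request with the standard library; the instance name and token are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_status_request(instance: str, token: str, text: str,
                         visibility: str = "public") -> Request:
    """Build (but don't send) a Mastodon client-API request that posts a status."""
    body = urlencode({"status": text, "visibility": visibility}).encode()
    return Request(
        f"https://{instance}/api/v1/statuses",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # OAuth access token for the account
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

# Sending it would be urllib.request.urlopen(req); not done here.
```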


Matrix is actually switching to use OpenID natively: https://areweoidcyet.com


Your take is much more authoritative than mine!


Nice project! I like the goals, and hope we see more good ActivityPub options like this. Just a note that the web UI isn’t formatting correctly on iOS!


Takahē are interesting birds too. There is a related bird, the pūkeko, which was also blown to NZ but at a different time. It has relatives in Australia and South America also. It was thought to be extinct at one point, caused by predation by introduced pests, and introduced deer eating the grasses they rely on for food. Now there is a population of about 400?


Love that you’ve named it after an utterly charming little bird


[flagged]


Counterpoint: If the fediverse has any hope of achieving critical mass, it can’t confuse users by constantly breaking federation for every minor disagreement.


One more reason to break federation.

The federation was always meant to build communities around shared interests and values.

If I wanted the twitter "experience" I would just use twitter.

If I wanted a "free for all" environment I would be on 4chan.


Has that happened?

I was trying to understand how it works in practice. It would be relevant to picking which instance to use.


Constantly, it's great. There's big air-gaps between the US Lefty and US Righty fediverses. If you want to talk to both sides of that divide then you'll want to have accounts on multiple servers. If you decide to only be on one side then it's easy to migrate your accounts around between servers.


If I host my own instance, can I federate with both 'halves'?

Or will associating with one likely get my instance blocked from federating with the other?


At least in the past I know that would get you blocked as it acted as a bridge to allow tweets from the blocked instance to get through.


> Has that happened?

Yes. Some are over things that should be relatively uncontroversial, like loli-hosting instances. Others are more likely to garner upset on HN.


I think this is a prime example of why the fediverse won't even come close to replacing Twitter for Twitter users.


Nah. There are various people who call for policies like this, but then most reasonable instance admins ignore them, and the fediverse continues to operate.

The same thing has happened in the free software movement in general; some folks have called for copyleft-only, and the rest of the world has largely ignored them and is fine with shipping BSD licensed software along with GPL licensed software.


And now we have gazillions of phones going into the trash, unusable because of closed drivers making continued usage unviable. Nice.


There are constantly threats like that in the Fediverse. For instance, some threatened to defederate if Tumblr implements ActivityPub and doesn't remove ads and whatnot.

In reality, the biggest servers still federate with most servers. With the latest changes in Mastodon you can now see which servers they don't federate with. It's not a huge list, and if you browse the ones they don't federate with, it's pretty obvious why.

A lot of those defederated services advertise themselves as some sort of free-speech absolutist alternatives.


I don't think OP's comment is in any way representative of the fediverse.


My point was more so that this kind of attitude among the group of people who are technically adept enough to run servers is unattractive to the average user, who couldn't care less about the license of the open source software they're using.


Good.


I have come here to rail against (pun intended) the use of the name Takahē for a piece of software. The author is well-intentioned and there is some aptness to the name, but many people here in Aotearoa / New Zealand are sensitive to the use of the names of our taonga / treasures for businesses, technologies and other objects. Specifically, the takahē is an indigenous species taonga as described in https://www.taiuru.maori.nz/branding/. As such, it would be best to consult with the Ngāi Tahu people before using the takahē’s name in this manner. In this link from NZ’s Department of Conservation (https://www.doc.govt.nz/news/media-releases/2021-media-relea...), you can see that NZers take this sort of iwi partnership seriously.

More broadly, I find it sad when the names of natural species and features are adopted in the business and technology world without any deep connection. A canonical case would be Amazon the company, which has prospered and become a household name while the Amazon itself, with its people and ecosystems, has suffered and declined. An egregious case relevant to NZ is Kiwi Farms.

The trend of using species names in technology perhaps started with the O’Reilly books. The argument can be raised that such use raises awareness of endangered species such as the takahē. But perhaps that is best left to other means, for fear that the mauri of a species should be captured and harmed.


As with most things it's not that black and white. Ngāi Tahu are actually the biggest polluters of the water supply where I live as they cut down acres of forest to create massive dairy farms. The creator of this software picked a really great name since it actually promotes NZ conservation. I've donated my time and money to plant native species in NZ and I find your position ridiculous. Does everyone need to run their language past some woke authority now to make sure it doesn't offend some guys on the internet? If you want to help out with conservation why don't you spend your time volunteering for one of the many tree planting charities?


What makes you think I don't, mate?


The intentions of the creator of the product do seem genuine and worthy, but it's still cultural appropriation without consultation or acknowledgement of the taonga behind the name.

Just because you disagree doesn't make it "woke".


> cultural appropriation

Is a woke term. By the way they mention how you can donate to a Takahe recovery programme here https://jointakahe.org/about/


I think we can aim for criticism that’s a little higher brow. Whataboutism and dismissal of a position because it’s ‘woke’ is pretty unhelpful.


People have every right not to buy into racist concepts like cultural appropriation and reject them without further arguments.


I'm tangata whenua and quite frankly this is ridiculous - Ngāi Tahu doesn't speak for all Māori. I'm perfectly fine with it.


Just to be clear, its actually the bird itself I care about. I think its identity is something that deserves respect and shouldn't just be randomly adopted by someone who thinks it is cool. I am glad that some Māori people are asserting a degree of hegemony over the use of NZ names and identities and I appreciate their intent. If you think this is ridiculous, just try and call your technology product Walmart, Amazon, The Warehouse or even the All Blacks and see what happens.


Good thing birds aren't running competing businesses in the same industry then.


I think that it’s worth a shot opening an issue on their GitHub page.

Also see this: https://fedi.aeracode.org/@andrew/109400227857910033



