Issue: Architecture on ActivityPub
Priority 3 Created 2023-02-09
If you use ActivityPub for crawling, it could cover the whole Fediverse and not only Mastodon, since anyone with a Fediverse account can use it and no API access is required. Basically, if you ran a single-user crawler bot instance where people could sign up to be followed by the crawler, the federated timeline of that instance would be the index itself, without much effort. If you install Elasticsearch on that instance, it's even searchable. If it is connected to a one-way relay, small instances could use it as a data source to populate their federated timelines.
Considering how efficiently ActivityPub and ActivityStreams handle this, I imagine that with increased usage it might be the only viable solution. Otherwise you have to crawl tens of thousands of accounts individually via the API instead of getting the posts delivered with duplicates automatically eliminated.
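To illustrate the "delivered with duplicates eliminated" point, here is a minimal sketch in Python of what the crawler instance's timeline could do with incoming activities. It is not how Mastodon or any particular server implements it. The `TimelineIndex` class and the example URLs are hypothetical, and it assumes posts arrive as ActivityStreams `Create` activities whose `object` carries a unique `id`:

```python
from typing import Any, Dict, List


class TimelineIndex:
    """In-memory stand-in for the crawler instance's federated timeline (hypothetical)."""

    def __init__(self) -> None:
        # Maps the post's ActivityPub object id to the post object itself.
        self.posts: Dict[str, Dict[str, Any]] = {}

    def receive(self, activity: Dict[str, Any]) -> bool:
        """Store the object of a Create activity; return True only if it was new."""
        if activity.get("type") != "Create":
            return False
        obj = activity.get("object")
        if not isinstance(obj, dict) or "id" not in obj:
            return False
        if obj["id"] in self.posts:
            return False  # the same post was delivered again, so skip it
        self.posts[obj["id"]] = obj
        return True

    def search(self, term: str) -> List[str]:
        """Naive substring search over post content (a real setup would use a search engine)."""
        needle = term.lower()
        return [
            post_id
            for post_id, obj in self.posts.items()
            if needle in str(obj.get("content", "")).lower()
        ]


# Example activity with made-up URLs, shaped like an ActivityStreams Create.
example_activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://social.example/users/alice",
    "object": {
        "id": "https://social.example/users/alice/statuses/1",
        "type": "Note",
        "content": "Hello Fediverse",
    },
}

index = TimelineIndex()
first_delivery = index.receive(example_activity)
second_delivery = index.receive(example_activity)  # e.g. arriving again via a relay
```

Because every post has a globally unique `id`, the same post arriving twice (directly and through a relay, say) collapses into one entry without any extra crawling logic.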
Another idea might be to use activitypub-php:
https://github.com/pterotype-project/activitypub-php
That way your site becomes its own instance and gets the posts pushed efficiently via ActivityStreams. The site can then follow its subscribers by itself.
I'm just suggesting things because, in my experience, I'm quite sceptical about the API approach.
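For reference, "the site follows its subscribers" boils down to sending a `Follow` activity to each subscriber's inbox. The sketch below is in Python rather than PHP and does not use activitypub-php. The `build_follow` helper and both URLs are hypothetical, and the actual delivery step is left out (it would be an HTTP POST to the subscriber's inbox, which servers such as Mastodon expect to be signed):

```python
import json
import uuid
from typing import Any, Dict


def build_follow(actor: str, target: str) -> Dict[str, Any]:
    """Build an ActivityStreams Follow activity from `actor` to `target` (hypothetical helper)."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        # The id scheme is an assumption; it only needs to be unique for this actor.
        "id": f"{actor}#follows/{uuid.uuid4()}",
        "type": "Follow",
        "actor": actor,
        "object": target,
    }


# Made-up URLs: the index site's actor and one subscriber's actor.
follow = build_follow(
    actor="https://index.example/actor",
    target="https://social.example/users/alice",
)
payload = json.dumps(follow)  # body that would be POSTed to the subscriber's inbox
```

Once the subscriber's server accepts the follow, their public posts get pushed to the site's inbox, so no per-account polling is needed.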
Another plugin (PHP WordPress)