A Better Mastodon Client – Tim Kellogg

Last night I had an idea and went ahead and built it. I’d like to tell you about it. Find the source code here.
I use Mastodon as my primary social media. I like it because of the sheer density of good information in my feed. So
much good conversation happens on Mastodon. But my timeline is getting a little out of control.
Mastodon lets me follow hashtags, like #LLMs or #AI, at which point my timeline gets all toots that my server
(hachyderm.io) handled that were tagged accordingly. It’s not a huge amount, but hachyderm is fairly large so I get a
good number of toots, probably 1,000-1,500 per day. It’s getting hard to keep up with.
I should be able to automate this!
So here’s my idea: a streamlit dashboard that
- downloads the latest toots in my timeline
- caches them in SQLite
- generates embeddings for each toot
- does k-means clustering to group them by similar topic
- uses an LLM to summarize each cluster of toots
- uses tailscale so I can view it on my phone
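The caching step in the list above could look roughly like this. It is a minimal sketch, not the code in core.py: the `toots` table, its columns, and the `open_cache` and `cache_toots` helpers are all assumptions made up for illustration. The idea is just that re-downloading an overlapping slice of the timeline should not create duplicate rows.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the toot cache. Table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS toots (
            id         TEXT PRIMARY KEY,  -- status id as returned by the server
            created_at TEXT NOT NULL,
            content    TEXT NOT NULL
        )
        """
    )
    return conn


def cache_toots(conn, toots):
    """Store toots, skipping ids already cached. Returns the number of new rows."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO toots (id, created_at, content) VALUES (?, ?, ?)",
        [(t["id"], t["created_at"], t["content"]) for t in toots],
    )
    conn.commit()
    return conn.total_changes - before
```

Because `id` is the primary key and the insert uses `OR IGNORE`, calling `cache_toots` twice with the same batch adds rows only the first time.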
I chose streamlit because it’s quick and dirty. I figure this isn’t going to be great on the first
pass, so streamlit should help me iterate quickly to make it work better for me.
The great thing about Mastodon is it’s completely open source, so the API is open and always will be,
unlike Twitter/X or the other platforms that have been locking down. FWIW I do think the fediverse is the
long-term right model for social media, for a variety of reasons.
Embeddings
A quick note: embeddings are a numeric representation of text that corresponds to the meaning of the text.
I like to think of it as an “AI secret language”, in that it’s the representation that large language models use to
work with the text. We’re using a clustering algorithm here to group similar toots, but there are lots of other things
you can do with embeddings too!
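To make “numeric representation of meaning” a bit more concrete, here is a toy sketch. The three vectors are made up by hand and have only 3 dimensions (real embedding models return far longer vectors), and cosine similarity is just one common way to compare them, not necessarily what this project does internally. The point is only that texts about similar topics should end up with vectors pointing in similar directions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made stand-ins for embeddings; real ones come from an embedding model.
toot_llm = [0.9, 0.1, 0.0]
toot_ai = [0.8, 0.2, 0.1]
toot_gardening = [0.0, 0.1, 0.9]

similar = cosine_similarity(toot_llm, toot_ai)
different = cosine_similarity(toot_llm, toot_gardening)
```

Here `similar` comes out much larger than `different`, which is the property a clustering algorithm relies on when it groups toots by topic.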
Building It
I went from “oh! I have an idea” to a working solution in about 3.5 hours. I used Github Copilot, specifically
with the chat feature (CMD+I, type “create a SQLite DB with a toots table”). It’s incredible how quickly you
can try out ideas.
If you want to take a peek:
- The UI (dashboard.py)
- The SQLite DB (core.py)
- Download timeline (core.py): I used requests, no special client
- Generate embeddings (core.py): I used OpenAI’s text-embedding-ada-002. It’s cheap and easy to set up.
- K-means clustering (science.py): scikit-learn makes this super easy, just 4 lines.
- Summarize clusters (science.py): I used gpt-3.5-turbo because it’s cheap-ish and good enough
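For a sense of how short the clustering step can be with scikit-learn, here is a sketch. It is not the code from science.py: the `cluster_embeddings` helper, the toy 2-dimensional vectors, and the choice of `n_clusters=2` are all assumptions for illustration. In the real pipeline the input would be one embedding per toot.

```python
from sklearn.cluster import KMeans


def cluster_embeddings(embeddings, n_clusters):
    """Return one cluster label per embedding (hypothetical helper)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(embeddings).tolist()


# Toy stand-ins for toot embeddings: two obvious groups.
toy_embeddings = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
labels = cluster_embeddings(toy_embeddings, n_clusters=2)
```

The first three vectors land in one cluster and the last three in the other. Which group gets label 0 and which gets label 1 is arbitrary, so downstream code should treat labels as group ids and nothing more.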
The streamlit dashboard displays the clusters as an expander container. When the dashboard loads
you see a list of cluster descriptions and you can choose which to dive into.
The toots are displayed poorly, imo; it could use a lot of work. I’d also like to be able to favorite and retoot
from this UI, at which point I could probably use it as my primary client for my right-after-I-wake-up browsing.
I’ve used it for a few hours and I like being able to skip over big stretches of my timeline with relative
confidence that I know what I’m skipping. I’m in control again.
On a more philosophical note, I like the idea of social media algorithms but I hate the implementations.
Viewing social media in timeline order is far too noisy. Algorithms that curate my feed make it a lot more manageable.
On the other hand, I don’t know how X or Instagram are curating my feed. As far as I can tell, they’re optimizing
for their own profit, which feels manipulative. I want my feed to serve me, not the other way around.
What do you think? How could it be improved?
Next: I wrote a followup to this post, about open source and societal alignment.