a personal media (movies, tv episodes, video games, albums, listens) feed using HPI
A personal feed/website using HPI
Live at https://sean.fish/feed/
This uses:

- `python`: to get my data using HPI, and to cleanup/enrich it with some local data/cached API requests. `my_feed index` is called in the `index` script, which syncs a JSON file up to the server, which the `backend` can combine into the sqlite database
- `golang`: basic REST API to let the frontend paginate through the data, with authenticated endpoints for updating the sqlite database
- `typescript`: public-facing frontend; requests data from the backend, lets the user filter/order/search the data

If a data source isn't mentioned here, it's likely a module in HPI.
I periodically index all my data in the background:
```
Extracting my_feed.sources.listens.history...
Extracting my_feed.sources.listens.history: 5388 items (took 0.14 seconds)
Extracting my_feed.sources.games.steam...
Extracting my_feed.sources.games.steam: 285 items (took 0.01 seconds)
Extracting my_feed.sources.games.osrs...
Extracting my_feed.sources.games.osrs: 924 items (took 0.03 seconds)
Extracting my_feed.sources.games.game_center...
Extracting my_feed.sources.games.game_center: 141 items (took 0.02 seconds)
Extracting my_feed.sources.games.grouvee...
Extracting my_feed.sources.games.grouvee: 243 items (took 0.15 seconds)
Extracting my_feed.sources.games.chess...
Extracting my_feed.sources.games.chess: 681 items (took 2.98 seconds)
Extracting my_feed.sources.trakt.history...
Extracting my_feed.sources.trakt.history: 15327 items (took 11.51 seconds)
Extracting my_feed.sources.mpv.history...
Extracting my_feed.sources.mpv.history: 13807 items (took 13.67 seconds)
Extracting my_feed.sources.nextalbums.history...
Extracting my_feed.sources.nextalbums.history: 1938 items (took 2.36 seconds)
Extracting my_feed.sources.mal.history...
Extracting my_feed.sources.mal.history: 20865 items (took 3.58 seconds)
Total: 59599 items
Writing to 'backend/data/1644267551.json'
```
... which then gets synced up and combined into the sqlite database on the backend; all handled by the `index` script.
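That combine step lives in the Go backend; as a rough sketch of the idea in python (the table schema and field names here are assumptions for illustration, not the real backend's):

```python
import json
import sqlite3


def combine_into_db(json_path: str, db_path: str) -> int:
    """Merge feed items from a JSON dump into a sqlite database,
    skipping IDs that are already present. A sketch of the idea only;
    the real backend is a Go server with its own schema."""
    with open(json_path) as f:
        items = json.load(f)
    conn = sqlite3.connect(db_path)
    # 'when' is quoted since WHEN is a sqlite keyword
    conn.execute(
        'CREATE TABLE IF NOT EXISTS feed '
        '(id TEXT PRIMARY KEY, ftype TEXT, title TEXT, "when" REAL, data TEXT)'
    )
    before = conn.total_changes
    for item in items:
        # INSERT OR IGNORE deduplicates against already-synced IDs
        conn.execute(
            'INSERT OR IGNORE INTO feed (id, ftype, title, "when", data) '
            "VALUES (?, ?, ?, ?, ?)",
            (item["id"], item.get("ftype"), item.get("title"),
             item.get("when"), json.dumps(item)),
        )
    added = conn.total_changes - before
    conn.commit()
    conn.close()
    return added
```

Since it returns only the count of newly inserted rows, re-syncing the same dump is a no-op.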
That has a front-end so I can view/filter/sort the data as an infinitely scrollable list.
Served with nginx in prod, like:

```nginx
location /feed/ {
    proxy_pass http://127.0.0.1:4500/feed;
}

location /feed/_next/ {
    # required since the above proxy_pass doesn't end with '/'
    proxy_pass http://127.0.0.1:4500/feed/_next/;
}

location /feed_api/ {
    proxy_pass http://127.0.0.1:5100/;
}
```
For the python library:

```bash
git clone https://github.com/seanbreckenridge/my_feed
pip install -e ./my_feed
```

... installs `my_feed` (or `python3 -m my_feed`)
This uses the HPI config structure (which you'd probably already have set up if you're using this).
To install dependencies for the servers, check the frontend and backend directories.
So, in `~/.config/my/my/config/feed.py`, create a top-level `sources` function, which yields each source function:
```python
from typing import Iterator, Callable, TYPE_CHECKING

if TYPE_CHECKING:
    from my_feed.sources.model import FeedItem


def sources() -> Iterator[Callable[[], Iterator["FeedItem"]]]:
    # yields functions which, when called, yield FeedItems
    from my_feed.sources import games

    yield games.steam
    yield games.osrs
    yield games.game_center
    yield games.grouvee
    yield games.chess

    from my_feed.sources import (
        trakt,
        listens,
        nextalbums,
        mal,
        mpv,
        facebook_spotify_listens,
    )

    yield trakt.history
    yield listens.history
    yield nextalbums.history
    yield mal.history
    yield mpv.history
    yield facebook_spotify_listens.history
```
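Anything that yields `FeedItem`s can be added to `sources`. As an illustration, a hypothetical custom source might look like this (the stand-in dataclass below is just for the example; the real `FeedItem` lives in `my_feed.sources.model` and has more fields):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator


# stand-in for my_feed.sources.model.FeedItem, just for illustration;
# the real dataclass has additional fields
@dataclass
class FeedItem:
    id: str
    ftype: str
    title: str
    when: datetime


def blog_posts() -> Iterator[FeedItem]:
    """A hypothetical source function: yield one FeedItem per event.
    To use it, add `yield blog_posts` inside your sources function."""
    yield FeedItem(
        id="blog_2022_01_01_some-post",
        ftype="blog_post",
        title="Some Post",
        when=datetime(2022, 1, 1, tzinfo=timezone.utc),
    )
```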
The `index` script in this repo:

- updates the `my.time.tz.via_location` cache, so that timezones can be estimated for some of the data sources here
- uses `rsync` for some images hosted here
- requests the `/data/ids` endpoint on the server, which returns a list of known IDs (those are used to filter out duplicates before syncing)
- runs `my_feed index` to save json objects to a local file
- uses `scp` to copy that file up, then hits an authenticated endpoint (`/check`), which makes the server process the json files, updating the local sqlite database

To blur images, `my_feed index` accepts a `-B` flag, which lets you match against the `id`, `title`, or `image_url` with an `fnmatch` or a regex. Those are placed in a file, one per line, for example:
```
id:*up_2009_*
title:*up_2009_*
image_url:*up_2009_*
id_regex:.*up_2009_.*
title_regex:.*up_2009_.*
image_url_regex:.*up_2009_.*
```
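Interpreting those lines is straightforward: each is an attribute name (optionally with a `_regex` suffix), a colon, and a pattern. A sketch of how one line could be matched against an item, assuming the semantics above (this is not the repo's actual implementation):

```python
import re
from fnmatch import fnmatch


def should_blur(line: str, *, id: str, title: str, image_url: str) -> bool:
    """Check one pattern line (e.g. 'title:*up_2009_*') against a feed
    item's attributes. A sketch of the matching described above, not
    my_feed's actual code."""
    attrs = {"id": id, "title": title, "image_url": image_url}
    prefix, _, pattern = line.partition(":")
    if prefix.endswith("_regex"):
        # e.g. 'id_regex' matches the 'id' attribute with a regex
        value = attrs[prefix[: -len("_regex")]]
        return re.search(pattern, value) is not None
    # otherwise, glob-style matching with fnmatch
    return fnmatch(attrs[prefix], pattern)
```

For example, `should_blur("title:*up_2009_*", id="", title="movie_up_2009_poster", image_url="")` returns `True`.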
`my_feed` has a couple options that have developed over time, letting me ignore specific IDs (if I know they're already in the database) or ignore sources which take a while to process (only doing those once a week or so):
```
Usage: my_feed index [OPTIONS] [OUTPUT]

Options:
  --echo / --no-echo           Print feed items as they're computed
  -i, --include-sources TEXT   A comma delimited list of substrings of sources
                               to include. e.g. 'mpv,trakt,listens'
  -e, --exclude-sources TEXT   A comma delimited list of substrings of sources
                               to exclude. e.g. 'mpv,trakt,listens'
  -E, --exclude-id-file PATH   A json file containing a list of IDs to
                               exclude, from the /data/ids endpoint. reduces
                               amount of data to sync to the server
  -C, --write-count-to PATH    Write the number of items to this file
  -B, --blur-images-file PATH  A file containing a list of image URLs to blur,
                               one per line
  --help                       Show this message and exit.
```
`feed_check` updates some of my data which changes more often (music (both mpv and listenbrainz), tv shows (trakt), chess, albums), by comparing the IDs of the latest items in the remote database to the corresponding live data sources.

This is pretty personal, as it relies on my anacron-like `bgproc` tool to handle updating data periodically.
So all of these follow some pattern like (e.g. for `chess`):

- grab the `end_time` of the last couple items from the `my_feed` database (using the same JSON endpoints the frontend uses)
- request my recent games from the chess.com API using `chess_export`
- if the data is expired (the most recent `end_time` is not in the first page of the API), then:
  - expire the `evry` tag for the job that updates my chess games
  - run `bgproc_on_machine` to update all the expired data
  - re-index and sync up to the `my_feed` database on my server

`feed_check` runs once every 15 minutes, so my data is never more than ~15 minutes out of date.
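The decision each check makes boils down to a set comparison. A minimal sketch, assuming the check treats data as expired when the live source's first page contains IDs the remote database doesn't know about (a hypothetical helper, not `feed_check`'s actual code):

```python
from typing import Iterable


def is_expired(recent_remote_ids: Iterable[str],
               live_first_page_ids: Iterable[str]) -> bool:
    """True if the live data source has items that haven't made it into
    the remote my_feed database yet, i.e. the cached data is stale and
    the corresponding update job should be re-run."""
    remote = set(recent_remote_ids)
    return any(live_id not in remote for live_id in live_first_page_ids)
```

When this is true for a source, its `evry` tag gets expired so the next background run re-indexes and re-syncs it.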
Example output:
```
[I 230921 15:44:15 feed_check:213] Checking 'check_albums'
[I 230921 15:44:18 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=album
[I 230921 15:44:18 feed_check:213] Checking 'check_trakt'
[D 230921 15:44:18 export:32] Requesting 'https://api-v2launch.trakt.tv/users/purplepinapples/history?limit=100&page=1'...
[D 230921 15:44:20 export:46] First item: {'id': 9230963378, 'watched_at': '2023-09-21T08:03:23.000Z', 'action': 'watch', 'type': 'episode', 'episode': {'season': 1, 'number': 1, 'title': 'ROMANCE DAWN', 'ids': {'trakt': 5437335, 'tvdb': 8651297, 'imdb': 'tt11748904', 'tmdb': 2454621, 'tvrage': None}}, 'show': {'title': 'ONE PIECE', 'year': 2023, 'ids': {'trakt': 184618, 'slug': 'one-piece-2023', 'tvdb': 392276, 'imdb': 'tt11737520', 'tmdb': 111110, 'tvrage': None}}}
[I 230921 15:44:20 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=trakt_history_movie,trakt_history_episode
[I 230921 15:44:21 feed_check:213] Checking 'check_chess'
[I 230921 15:44:21 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=chess
Requesting https://api.chess.com/pub/player/seanbreckenridge/games/archives
Requesting https://api.chess.com/pub/player/seanbreckenridge/games/2023/09
[I 230921 15:44:22 feed_check:213] Checking 'check_mpv'
[I 230921 15:44:23 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[I 230921 15:44:23 feed_check:213] Checking 'check_listens'
[I 230921 15:44:23 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[D 230921 15:44:25 export:62] Requesting https://api.listenbrainz.org/1/user/seanbreckenridge/listens?count=100
[D 230921 15:44:25 export:84] Have 100, now searching for listens before 2023-09-11 04:39:08...
[I 230921 15:44:25 feed_check:213] Checking 'check_mal'
[I 230921 15:44:25 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=50&ftype=anime,anime_episode
Expired: mpv.history
removed '/home/sean/.local/share/evry/data/my-feed-index-bg'
2023-09-21T15-44-35:bg-feed-index:running my_feed index...
Indexing...
```
This also has the upside of updating my local data whenever there are any changes to the data sources, which means any scripts using the corresponding HPI
modules also stay up to date.