A Dockerized RSS feed fetcher for NLP work, using asyncio
MIT License
An ongoing attempt at tying together various ML techniques for trending topic and sentiment analysis, coupled with some experimental Python async coding, a distributed architecture, EventSource and lots of Docker goodness.
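EventSource is the browser side of Server-Sent Events (SSE), the mechanism web.py uses for live status updates. As a rough sketch of what travels over the wire, the helper below frames a payload as an SSE message: optional `id:` and `event:` fields, one `data:` line per line of payload, and a blank line to terminate the event. The function name `format_sse` and the `metrics` event name are hypothetical and not taken from the project.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Hypothetical helper: frame a payload as a Server-Sent Events message.

    Each field goes on its own line, and a blank line terminates the event,
    as the SSE wire format requires.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Non-string payloads (e.g. metric dicts) are serialized as JSON.
    payload = data if isinstance(data, str) else json.dumps(data)
    # Multi-line payloads must be split across several "data:" fields;
    # an empty payload still yields a single empty "data:" field.
    for chunk in payload.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse({"feeds_fetched": 42}, event="metrics"), end="")
```

On the browser side, an `EventSource` object listening for the `metrics` event would receive the JSON string in its `data` field.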
I needed a readily available corpus for doing text analytics and sentiment analysis, so I decided to make one from my RSS feeds.
Things escalated quickly from there on several fronts:
- docker-compose
- asyncio/uvloop (as well as Sanic for the web front-end)
This was originally the "dumb" part of the pipeline: the corpus was fed into Azure ML and the Cognitive Services APIs for the nice stuff, so this project started out focusing mostly on fetching, parsing and filing away feeds.
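As an illustration of the "parsing" step, the sketch below splits an RSS 2.0 document into per-item records using only `xml.etree.ElementTree` from the standard library. It is an assumption for illustration, not the project's own parser.py: the function name `parse_rss_items`, the record fields and the sample feed are hypothetical, and Atom feeds (which use a different, namespaced structure) are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed used only for this illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
      <guid>https://example.com/1</guid>
      <description>Hello, world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
      <description>More text</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Split an RSS 2.0 document into a list of per-item dicts (sketch only)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iterfind("./channel/item"):
        link = item.findtext("link", default="").strip()
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": link,
            # Fall back to the link when the item has no <guid>.
            "guid": item.findtext("guid", default=link).strip(),
            "summary": item.findtext("description", default="").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["guid"], entry["title"])
```

Keying items on the guid (or the link, as a fallback) gives each item a stable identifier, which is useful when filing them away so that refetching a feed does not duplicate entries.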
It's now a rather more complex beast than I originally bargained for. Besides acting as a technology demonstrator for a number of things (including odds and ends like how to bundle NLTK datasets inside Docker), it is currently pushing the envelope on my Python Docker containers, which now feature Python 3.6.3 atop Ubuntu LTS.
auth0 support
- import.py is a one-shot OPML importer (you should place your own feeds.opml in the root directory)
- metrics.py keeps tabs on various stats and pushes them out every few seconds
- scheduler.py iterates through the database and queues feeds for fetching
- fetcher.py fetches feeds and stores them in DocumentDB/MongoDB
- parser.py parses updated feeds into separate items and performs further processing on them (see langkit.py)
- cortana.py (WIP) will do topic detection and sentiment analysis
- web.py provides a simple web front-end for live status updates via SSE
Processes are written to leverage asyncio/uvloop and interact via Redis (they previously used ZeroMQ, which I moved away from as I started experimenting with deploying this on Swarm and an Azure VM scale set).
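A minimal sketch of the scheduler-to-fetcher hand-off, assuming `asyncio.Queue` objects inside a single process as a stand-in for the Redis queues that connect the real worker processes. All function names here are hypothetical, `fetch()` is a stub that performs no network I/O, uvloop is left out to keep the example stdlib-only, and `asyncio.run`/`asyncio.create_task` require Python 3.7 or newer (newer than the 3.6.3 mentioned above).

```python
import asyncio

# Assumption: asyncio.Queue stands in for the Redis queues used between
# the separate scheduler/fetcher/parser processes in the real project.


async def scheduler(feed_urls, fetch_queue):
    """Queue every known feed for fetching (hypothetical scheduler.py role)."""
    for url in feed_urls:
        await fetch_queue.put(url)


async def fetch(url):
    """Hypothetical stub; a real worker would perform an HTTP GET here."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"<rss><!-- contents of {url} --></rss>"


async def fetcher(fetch_queue, parse_queue):
    """Consume URLs, fetch them and hand the raw documents to the parser."""
    while True:
        url = await fetch_queue.get()
        try:
            body = await fetch(url)
            await parse_queue.put((url, body))
        finally:
            fetch_queue.task_done()


async def run_pipeline(feed_urls, workers=3):
    fetch_queue = asyncio.Queue()
    parse_queue = asyncio.Queue()
    tasks = [asyncio.create_task(fetcher(fetch_queue, parse_queue))
             for _ in range(workers)]
    await scheduler(feed_urls, fetch_queue)
    await fetch_queue.join()  # wait until every queued URL has been handled
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    results = []
    while not parse_queue.empty():
        results.append(parse_queue.get_nowait())
    return results


if __name__ == "__main__":
    urls = ["https://example.com/a.xml", "https://example.com/b.xml"]
    for url, body in asyncio.run(run_pipeline(urls)):
        print(url, len(body))
```

Swapping the in-process queues for Redis lists (for example, pushing with `LPUSH` and blocking on `BRPOP`) is one way each stage could run as its own container, which matches the multi-process layout described above.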
A Docker Compose file is supplied for running the entire stack locally. You can bump it to version 3 of the Compose file format and get things running on Swarm if you manually push the images to a private registry first; I'll automate that once things are a little more stable.