Python script to fetch GitHub repos metadata.
MIT License
This is a Python script to fetch metadata for the top-rated repositories on GitHub. "Rated" here means "having the most stargazers". This project (successfully) deals with multiple quirks of working with the GitHub API.
As of Nov. 12, 2016, all repos with >= 50 stars can be collected.
PyGithub must be installed:
pip3 install -r requirements.txt
You also need a personal access token to use the GitHub API at its full rate limit (Settings -> Personal access tokens).
python3 github_stars.py -i abcdefabcdefabcdefabcdefabcdefabcdefabcd -o repos.pickle
See --help for optional arguments. You can change "pickle" to "json".
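There are two stages. In the first stage, we plan how to fetch data from the Search API. The Search API returns at most 1000 results per query, but with the "updated" dual-order hack (running the same query sorted by "updated" in both ascending and descending order) we can retrieve up to 2000 results from a single query. So we probe for star intervals that yield fewer than 2000 results, e.g. 50..50, 90..91 or 356..371. The second stage performs the actual bulk API requests.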
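The planning stage can be sketched roughly as follows. This is a minimal illustration, not the code in github_stars.py: the names plan_intervals, widest_high, star_query and merge_dual_order are hypothetical, and count_repos stands in for whatever call reports the total number of search results for a star range. The only GitHub-specific detail used is the stars:LOW..HIGH search qualifier.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: 1000 results per sort direction, two directions per query.
MAX_PER_QUERY = 2000


def star_query(low: int, high: int) -> str:
    """Build the search qualifier for an inclusive star range."""
    return f"stars:{low}..{high}"


def widest_high(count_repos: Callable[[int, int], int],
                low: int, max_stars: int, limit: int) -> int:
    """Binary-search the largest `high` with count_repos(low, high) < limit.

    If even low..low is over the limit, `low` is returned anyway, because
    a single star value cannot be split any further by this scheme.
    """
    lo, hi = low, max_stars
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_repos(low, mid) < limit:
            lo = mid          # low..mid still fits, try a wider interval
        else:
            hi = mid - 1      # too many results, shrink the interval
    return lo


def plan_intervals(count_repos: Callable[[int, int], int],
                   min_stars: int, max_stars: int,
                   limit: int = MAX_PER_QUERY) -> List[Tuple[int, int]]:
    """Split min_stars..max_stars into consecutive inclusive intervals."""
    intervals = []
    low = min_stars
    while low <= max_stars:
        high = widest_high(count_repos, low, max_stars, limit)
        intervals.append((low, high))
        low = high + 1
    return intervals


def merge_dual_order(ascending: Iterable[Dict],
                     descending: Iterable[Dict]) -> List[Dict]:
    """Merge the two sort directions, dropping repos seen in both.

    Assumes each result is a dict with a unique "id" key.
    """
    seen = set()
    merged = []
    for repo in list(ascending) + list(descending):
        if repo["id"] not in seen:
            seen.add(repo["id"])
            merged.append(repo)
    return merged


if __name__ == "__main__":
    # Made-up histogram (stars -> number of repos), used instead of the API.
    histogram = {50: 1500, 51: 400, 52: 300, 53: 1900, 54: 50, 55: 10}

    def fake_count(low: int, high: int) -> int:
        return sum(n for stars, n in histogram.items() if low <= stars <= high)

    for low, high in plan_intervals(fake_count, 50, 55):
        print(star_query(low, high), fake_count(low, high))
```

Binary search is used here only to keep the number of probe queries low; it relies on the result count never decreasing as the interval widens. In the second stage, each planned interval would be queried twice (ascending and descending) and the two result lists merged.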
MIT.