DXY-COVID-19-Crawler

COVID-19/2019-nCoV Realtime Infection Crawler and API


COVID-19/2019-nCoV Infection Data Realtime Crawler

Simplified Chinese | English

A realtime crawler for COVID-19/2019-nCoV infection data; the data source is Ding Xiang Yuan (DXY).

Please avoid deploying redundant crawlers: flooding DXY with requests occupies its bandwidth and may prevent other users in need from getting the data in time.

I have prepared an API for you to build visualizations and analyses; it is free to use and has no usage limits.

API: https://lab.isaaclin.cn/nCoV

Remarks:

  1. The API returns both Chinese and English versions of city names.
    For more information, please refer to Issue #61.
  2. Due to the server's limited bandwidth, starting from March 19, 2020,
    /nCoV/api/overall and /nCoV/api/area no longer return time-series data.
    You can fetch time-series data from the json folder of the data warehouse.
    If you call the API with latest=0, please modify your request parameters accordingly;
    otherwise, no modification is needed.
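As a sketch of how a client might call the API described above: the snippet below builds a request URL for the /nCoV/api/area endpoint with latest=1 and extracts records from a response. The `province` parameter name and the top-level `results` key are assumptions about the response schema; check the API page for the authoritative format.

```python
from urllib.parse import urlencode

BASE = "https://lab.isaaclin.cn/nCoV/api"

def build_area_url(latest=1, province=None):
    """Build a request URL for the /nCoV/api/area endpoint.

    latest=1 asks for only the most recent record per region; latest=0
    (time series) is no longer served, so use the data warehouse for
    historical data instead.
    """
    params = {"latest": latest}
    if province is not None:
        params["province"] = province  # assumed parameter name
    return f"{BASE}/area?{urlencode(params)}"

def extract_results(payload):
    """Pull the list of records out of a decoded JSON response.

    Assumes the API wraps records in a top-level "results" list.
    """
    return payload.get("results", [])

# Offline demonstration with a made-up payload (field names are
# illustrative, not guaranteed to match the real response):
sample = {"results": [{"provinceName": "Hubei", "confirmedCount": 67800}],
          "success": True}
url = build_area_url(latest=1)
records = extract_results(sample)
```

Fetching `url` with any HTTP client and passing the decoded JSON to `extract_results` yields one record per region.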

This project is released under the MIT open source license. If you use the API, please cite this project in yours.

Researchers

Recently, many college teachers and students have contacted me hoping to use these data for scientific research. However, not everyone is familiar with APIs and the JSON format, so I deployed a data warehouse that publishes the latest data in CSV format, which most software can load and process easily.
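The warehouse CSVs can be loaded with standard tooling; here is a minimal stdlib sketch. The column names (`provinceEnglishName`, `province_confirmedCount`, `updateTime`) are illustrative stand-ins, so check the actual file headers in the data warehouse before relying on them.

```python
import csv
import io

# A tiny stand-in for a warehouse CSV file; the real files define
# their own headers -- treat these column names as illustrative only.
sample_csv = """provinceEnglishName,province_confirmedCount,updateTime
Hubei,67800,2020-03-19 09:00:00
Zhejiang,1232,2020-03-19 09:00:00
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(sample_csv)
# Example aggregation: sum confirmed counts across provinces.
total = sum(int(r["province_confirmedCount"]) for r in rows)
```

For real use, replace `sample_csv` with the contents of a file downloaded from the warehouse.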

Description

The deployed crawler crawls the data every minute, stores it in MongoDB, and saves every historical data update. I hope this will be helpful in the future when backtracking the disease.

The attributes are described on the API page.

Noise Data

At present, some time-series data for Zhejiang and Hubei have been found to contain noise. The likely cause is mistakes made while recording manually processed data.

The crawler simply records what it sees and does not handle noisy data in any way. Therefore, if you use the data for scientific research, please preprocess and clean it properly.

In the meantime, I have opened an issue where you can report potentially noisy data. I will check and remove such records periodically.
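One simple cleaning heuristic for the kind of noise described above: cumulative counts should never decrease, so a point that falls below the running maximum usually signals a recording error. This is one possible preprocessing step, not the project's official cleaning procedure.

```python
def drop_nonmonotonic(series):
    """Filter a cumulative time series, dropping points that fall
    below the running maximum.

    `series` is a list of (timestamp, count) pairs sorted by time.
    Returns a new list containing only the non-decreasing points.
    """
    cleaned, running_max = [], float("-inf")
    for ts, count in series:
        if count >= running_max:
            cleaned.append((ts, count))
            running_max = count
        # else: discard the suspicious point
    return cleaned

# The count 9 at t=3 drops below the earlier 15, so it is removed.
raw = [(1, 10), (2, 15), (3, 9), (4, 20)]
cleaned = drop_nonmonotonic(raw)
```

Whether to drop, interpolate, or manually correct such points depends on your analysis, so treat this as a starting point.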

Reference

  1. If you would like to analyze the data with R,
    you can refer to pzhaonet/ncovr.
    That project loads data into R directly from either the GitHub data warehouse or the API.

Research

All scientific research results are for reference only.

  1. yijunwang0805/YijunWang

Demonstration

  1. Website: https://ncov.deepeye.tech/
    Time-series data visualization.
  2. pzhaonet/ncov
    Website: https://ncov2020.org
  3. cuihuan/2020_wuhan
    Visualization: http://cuihuan.net/wuhan/news.html
  4. hack-fang/nCov
    Visualization: http://yiqing.ahusmart.com/
  5. ohdarling/2019-nCoV-Charts
    Visualization: https://2019-ncov-trends.tk/
  6. quadpixels/quadpixels.github.io
    Visualization: https://quadpixels.github.io/
  7. lzxue/yiqingditu
    Visualization: https://lzxue.github.io/yiqingditu/
  8. covid19viz/covid19viz.github.io
    Visualization: https://covid19viz.github.io/
  9. biluochun/data-ncov
    Visualization: https://biluochun.github.io/data-ncov/index.html
  10. Moyck/2019NCOV
  11. Mistletoer/NCP-historical-data-visualization-2019-nCoV-

Donation

No donation is needed.

Medical resources are in short supply throughout mainland China. If you want to donate, please donate through the Red Cross or other officially recognized donation platforms; they can make better use of funds and supplies to help those in need.

Wish you all the best.