crawlab

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

BSD-3-CLAUSE License

Stars
11.3K

crawlab - v0.4.1

Published by tikazyq almost 5 years ago

Features / Enhancement

  • Spiderfile Optimization. Stages changed from dictionary to array. #358
  • Baidu Tongji Update.

Bug Fixes

  • Unable to display schedule tasks. #353
  • Duplicate node registration. #334
crawlab - v0.4.0

Published by tikazyq almost 5 years ago

  • Configurable Spider
  • New Execute Task Mode (All Nodes)
  • Bug Fixes
crawlab - v0.3.5

Published by tikazyq almost 5 years ago

Features / Enhancement

  • Graceful Shutdown.
  • Node Info Optimization.
  • Append System Environment Variables to Tasks.
  • Auto Refresh Task Log.
  • Enable HTTPS Deployment.

Bug Fixes

  • Unable to fetch spider list info in schedule jobs.
  • Unable to fetch node info from worker nodes.
  • Unable to select node when trying to run spider tasks.
  • Unable to fetch result count when result volume is large. #260
  • Node issue in schedule tasks. #244
crawlab - v0.3.4

Published by wo10378931 about 5 years ago

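1. Fix: non-custom spiders could not be viewed in the frontend.
2. Fix: killing the main process did not kill its child processes.
3. Fix: incorrect status when a spider exits abnormally.
4. Fix: incorrect status after a process is killed.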

crawlab - V0.3.3

Published by wo10378931 about 5 years ago

1. Fix: error caused by special characters in the MongoDB password.
2. Fix: error when creating the temporary directory.
3. Fix: incorrect MD5 value check.

crawlab - V0.3.2

Published by wo10378931 about 5 years ago

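1. Refactored the spider synchronization flow to sync spiders directly from GridFS.
2. Fix: spider logs could not be retrieved.
3. Fix: spiders could not be synchronized.
4. Fix: spiders could not be deleted.
5. Fix: task status could not be stopped properly.
6. Improved search in the spider list.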

crawlab - v0.3.1

Published by tikazyq about 5 years ago

Features / Enhancement

  • Docker Image Optimization. Split the Docker image further into master, worker, and frontend images based on Alpine.
  • Unit Tests. Covered part of the backend code with unit tests.
  • Frontend Optimization. Optimized the login page, button sizes, and upload UI hints.
  • More Flexible Node Registration. Allow users to pass a variable as the key for node registration instead of the default MAC address.

Bug Fixes

  • Uploading Large Spider Files Error. Memory crash issue when uploading large spider files. #150
  • Unable to Sync Spiders. Fixed by raising the write permission level when synchronizing spider files. #114
  • Spider Page Issue. Fixed by removing the "Site" field. #112
  • Node Display Issue. Nodes do not display correctly when running docker containers on multiple machines. #99
crawlab - v0.3.0

Published by tikazyq about 5 years ago

Features / Enhancement

  • Golang Backend: Refactored the backend from Python to Golang for better stability and performance.
  • Node Network Graph: Visualization of node topology.
  • Node System Info: View node system info, including OS, CPUs, and executables.
  • Node Monitoring Enhancement: Nodes are monitored and registered through Redis.
  • File Management: Edit spider files online, with code highlighting.
  • Login/Register/User Management: Users must log in to use Crawlab; adds user registration, user management, and some role-based authorization.
  • Automatic Spider Deployment: Spiders are deployed/synchronized to all online nodes automatically.
  • Smaller Docker Image: Reduced the Docker image size from 1.3G to ~700M by applying a multi-stage build.

Bug Fixes

  • Node Status. Node status did not change even after the node went offline. #87
  • Spider Deployment Error. Fixed through Automatic Spider Deployment #83
  • Node not showing. Nodes did not appear as online. #81
  • Cron Job not working. Fixed through new Golang backend #64
  • Flower Error. Fixed through new Golang backend #57
crawlab - v0.2.4

Published by tikazyq about 5 years ago

Features / Enhancement

  • Documentation: Better and much more detailed documentation.
  • Better Crontab: Build crontab expressions through the crontab UI.
  • Better Performance: Switched from the native Flask engine to gunicorn. #78

Bug Fixes

  • Deleting Spider. Deleting a spider now removes not only the database record but also the related folder, tasks, and schedules. #69
  • MongoDB Auth. Allow users to specify authenticationDatabase when connecting to MongoDB. #68
  • Windows Compatibility. Added eventlet to requirements.txt. #59
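
To illustrate the MongoDB Auth item above: in a MongoDB connection URI, the authentication database is given by the `authSource` option. The snippet below is a minimal sketch of building such a URI, not Crawlab's actual code; the function name `build_mongo_uri` and all host, user, and database values are hypothetical. It also percent-encodes the credentials, since special characters in a password would otherwise break the URI.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user, password, host, port, db, auth_db):
    """Build a MongoDB URI that authenticates against auth_db (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@', ':' and '/' are safe.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # authSource names the database that holds the user's credentials.
    query = urlencode({"authSource": auth_db})
    return f"mongodb://{credentials}@{host}:{port}/{db}?{query}"


# Example values are assumptions for illustration only.
uri = build_mongo_uri("crawler", "p@ss:word/1", "localhost", 27017, "crawlab_test", "admin")
print(uri)
```

The resulting string can be passed to any MongoDB client that accepts a standard connection URI.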
crawlab - Docker

Published by tikazyq over 5 years ago

  • Docker
  • CLI
  • Upload Spider
  • Edit Fields on Preview
crawlab - Automatic Extract Fields

Published by tikazyq over 5 years ago

  • Automatic Extract Fields
  • Download Results
  • Baidu Tongji
crawlab - Configurable Spider

Published by tikazyq over 5 years ago

  • Configurable Spider
  • Site List
crawlab - Advanced Analytics

Published by tikazyq over 5 years ago

  • Advanced stats (Spider Analytics)
  • Sites data
  • More spiders
crawlab - Basic Stats

Published by tikazyq over 5 years ago

  • Basic Stats
  • Advanced Stats
  • Near-realtime Task Info
  • Scheduled Tasks