Team Caelum OREGON STATE UNIVERSITY Senior Project!!!
Repo for capstone project
visual_web_crawler
|-Crawler        implementation of crawler algorithm (DEV)
|-crawler_app    web application code (PROD)
|-deploy         ansible code for deploying to server
|-[dev files]    Jupyter notebook sprints, dev requirements, etc.
create crawler virtual env
# use -p to pick a specific python interpreter; this example uses the one from miniconda
cas@ubuntu:~/working_dir/visual_web_crawler$ virtualenv -p /home/cas/miniconda/bin/python crawler
activate crawler virtual env
cas@ubuntu:~/working_dir/visual_web_crawler$ source crawler/bin/activate
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ ls
crawler web_crawler_POC.ipynb README.md requirements.txt
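Before installing requirements, it can help to confirm that the interpreter in use really is the one from the virtualenv. A minimal Python sketch using only the standard library; the helper name in_virtualenv is illustrative and not part of this repo, and the check is a heuristic that does not detect plain conda environments:

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

With the crawler env active, the printed interpreter path should point into crawler/bin.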
install requirements
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ pip install -r requirements.txt
get geckodriver for selenium
# download release v0.19.1 (check the geckodriver releases page for a newer version)
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ curl -LO https://github.com/mozilla/geckodriver/releases/download/v0.19.1/geckodriver-v0.19.1-linux64.tar.gz
# untar/unzip
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ gunzip geckodriver-v0.19.1-linux64.tar.gz
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ tar -xvf geckodriver-v0.19.1-linux64.tar
# remove tarball
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ rm geckodriver-v0.19.1-linux64.tar
# move it into the virtualenv bin so it is on PATH while the env is active
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ mv geckodriver crawler/bin/
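The gunzip, tar, and mv steps above could also be scripted. The following is a rough sketch, not part of this repo: the function names install_geckodriver and find_on_path and their arguments are hypothetical, and it assumes the tarball has already been downloaded and holds a top-level member named geckodriver.

```python
import os
import shutil
import stat
import tarfile


def install_geckodriver(archive_path, bin_dir, member="geckodriver"):
    """Extract one member from a .tar.gz archive into bin_dir and mark it executable."""
    target = os.path.join(bin_dir, member)
    with tarfile.open(archive_path, "r:gz") as archive:
        # extractfile raises KeyError if the member is absent from the archive.
        source = archive.extractfile(member)
        if source is None:
            raise ValueError(f"{member} is not a regular file in {archive_path}")
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    # Add execute bits for user, group, and other (like chmod +x).
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


def find_on_path(name="geckodriver"):
    """Return the full path of name if it resolves on PATH, else None."""
    return shutil.which(name)
```

With the virtualenv active, calling install_geckodriver("geckodriver-v0.19.1-linux64.tar.gz", "crawler/bin") and then find_on_path() should report a path under crawler/bin. Unlike the shell steps, this sketch leaves the downloaded archive in place.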
launch jupyter to open the notebook (web_crawler_POC.ipynb)
(crawler) cas@ubuntu:~/working_dir/visual_web_crawler$ jupyter notebook