$/> cd <project_directory>
OS X & Linux:
$ sudo apt-get install python3-pip   (Debian/Ubuntu only; on OS X, pip3 normally comes bundled with Python 3)
$ pip3 install virtualenv
$ virtualenv -p /usr/bin/python3 <virtualenv_name>
$ source <virtualenv_name>/bin/activate
Windows:
> pip install virtualenv
> virtualenv <virtualenv_name>
> <virtualenv_name>\Scripts\activate
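To confirm that the environment is active before installing anything into it, a small check like the one below can help. It is a hypothetical helper, not part of this project. It relies on the fact that inside an environment created by virtualenv 20+ or Python's built-in venv, sys.prefix differs from sys.base_prefix (older virtualenv releases set sys.real_prefix instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtual environment."""
    # virtualenv 20+ and the stdlib venv module point sys.prefix at the
    # environment while sys.base_prefix keeps the base installation path.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Legacy virtualenv (< 20) recorded the base installation in sys.real_prefix.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate script first.")
```

Run it with the same python3 / python command used above; if it reports no active environment, repeat the activate step.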
Install the project requirements:
OS X & Linux:
$ pip3 install -r requirements.txt
Windows:
> pip install -r requirements.txt
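The database credentials the programs use are kept in warehouse_config.cfg. Assuming that file is in the INI format read by Python's configparser, with a [DATABASE] section holding host, port, dbname, user and password keys (the section and key names are guesses, not confirmed by this README), a sketch like this can check that your edited credentials are readable. The load_db_config function is hypothetical.

```python
import configparser

# Hypothetical layout of warehouse_config.cfg assumed by this sketch:
#
#   [DATABASE]
#   host = <your_host>
#   port = <your_port>
#   dbname = <your_database>
#   user = <your_user>
#   password = <your_password>


def load_db_config(path: str = "warehouse_config.cfg", section: str = "DATABASE") -> dict:
    """Read database credentials from an INI-style file into a plain dict."""
    # interpolation=None keeps values such as passwords containing '%' literal.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips missing files silently and returns the list it did parse.
    if not parser.read(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] is missing from {path}")
    db = parser[section]
    return {
        "host": db.get("host"),
        "port": db.getint("port"),  # None if the key is absent
        "dbname": db.get("dbname"),
        "user": db.get("user"),
        "password": db.get("password"),
    }
```

For example, printing `{k: v for k, v in load_db_config().items() if k != "password"}` from the project folder shows what was parsed without echoing the password to the terminal.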
Replace the database credentials stored in warehouse_config.cfg with your own. After this, you can run the programs.
Open a terminal in the source folder of this project (if you created a virtual environment before installing requirements.txt, make sure it is activated). Then:
OS X & Linux:
$ python3 scraper.py
Windows:
> python scraper.py
This step writes a CSV file, which the etl.py module then processes.
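Because the CSV file is the hand-off between the two programs, it can be worth inspecting before running etl.py. The sketch below reads it with the standard csv module. The function names are hypothetical, the UTF-8 encoding is an assumption, and no column names are assumed since this README does not say what scraper.py writes.

```python
from __future__ import annotations

import csv
from pathlib import Path


def read_scraped_rows(csv_path: str) -> list[dict[str, str]]:
    """Load a CSV file as a list of dicts keyed by the header row."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run scraper.py first")
    # The csv module documentation recommends opening files with newline="".
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def summarize(rows: list[dict[str, str]]) -> str:
    """Return a one-line summary: row count and column names."""
    columns = list(rows[0].keys()) if rows else []
    return f"{len(rows)} rows, columns: {columns}"
```

Usage: `print(summarize(read_scraped_rows("<scraper_output>.csv")))`, where the file name is a placeholder for whatever scraper.py actually produces.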
OS X & Linux:
$ python3 etl.py
Windows:
> python etl.py
This step writes a Parquet file, creates the tables, and loads the data into them in the database configured in warehouse_config.cfg.
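The create-and-load part of this step commonly follows a simple pattern: create a table whose columns match the CSV header, then insert all rows in one transaction. The sketch below illustrates that pattern with Python's built-in sqlite3 module standing in for the real target database, which this README does not name; it is not how etl.py is implemented. A production warehouse would use its own driver, column types and bulk-load commands, and the Parquet output is left out because it needs a third-party library such as pyarrow. The function names and the table name are hypothetical.

```python
import csv
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_table(conn: sqlite3.Connection, csv_path: str, table: str) -> int:
    """Create `table` from the CSV header (all TEXT columns) and insert every row.

    Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    columns = ", ".join(f"{quote_ident(name)} TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Using the connection as a context manager commits on success and
    # rolls back if any statement fails.
    with conn:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})")
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
        )
    return len(rows)
```

For example, `load_csv_into_table(sqlite3.connect("staging.db"), "<scraper_output>.csv", "scraped_items")` creates and fills a local table; the database file and table name here are illustrative only.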
...in progress