Course Notebooks for Python and Spark for Big Data
Course Outline:
Course Introduction
- Promo/Intro Video
- Course Curriculum Overview
- Introduction to Spark, RDDs, and Spark 2.0
Course Set-up
- Set-up Overview
- EC2 Installation Guide
- Local Installation Guide with VirtualBox
- Databricks Notebooks
- Unix Command Line Basics and Jupyter Notebook Overview
Spark DataFrames
- Spark DataFrames Section Introduction
- Spark DataFrame Basics
- Spark DataFrame Operations
- Groupby and Aggregate Functions
- Missing Data
- Dates and Timestamps
Spark DataFrame Project
- DataFrame Project Exercise
- DataFrame Project Exercise Solutions
Machine Learning
- Introduction to Machine Learning and ISLR
- Machine Learning with Spark and Python and MLlib
- Consulting Project Approach Overview
Linear Regression
- Introduction to Linear Regression
- Discussion on Data Transformations
- Linear Regression with PySpark Example (Car Data)
- Linear Regression Consulting Project (Housing Data)
- Linear Regression Consulting Project Solution
Logistic Regression
- Introduction to Logistic Regression
- Logistic Regression Example
- Logistic Regression Consulting Project (Customer Churn)
- Logistic Regression Consulting Project Solution
Tree Methods
- Introduction to Tree Methods
- Decision Tree and Random Forest Example
- Random Forest Classification Consulting Project - Dog Food Data
- RF Classification Consulting Project Solutions
- RF Regression Project (Facebook Data)
Clustering
- Introduction to K-means Clustering
- Clustering Example - Iris Dataset
- Clustering Consulting Project - Customer Segmentation (Fake Data)
- Clustering Consulting Project Solutions
Recommender System
- Introduction to Recommender Systems and Collaborative Filtering
- Code Along Project - MovieLens Dataset
- Possible Consulting Project - Company Service Reviews
Natural Language Processing
- Introduction to Project/NLP/Naive Bayes Model
- What are pipelines?
- Code Along
Spark
- Introduction to Spark
- Spark Code-along!