mahmoudparsian/data-algorithms-with-spark

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia




All programs are tested with the following software:

| Spark | Python | Scala | Java |
|-------|--------|-------|------|
| Apache Spark 3.4.0 | Python 3.10.5 | Scala 2.13 | Java 11 |

| Chapter | Title |
|---------|-------|
| Glossary | Glossary of Big Data, MapReduce, Spark |
| Chapter 1 | Introduction to Data Algorithms |
| Chapter 2 | Transformations in Action |
| Chapter 3 | Mapper Transformations |
| Chapter 4 | Reductions in Spark |
| Chapter 5 | Partitioning Data |
| Chapter 6 | Graph Algorithms |
| Chapter 7 | Interacting with External Data Sources |
| Chapter 8 | Ranking Algorithms |
| Chapter 9 | Fundamental Data Design Patterns |
| Chapter 10 | Common Data Design Patterns |
| Chapter 11 | Join Design Patterns |
| Chapter 12 | Feature Engineering in PySpark |

| Bonus Chapter | Title / Description |
|---------------|---------------------|
| Glossary | Glossary of Big Data, MapReduce, Spark |
| Word Count | Solutions for Word Count using RDDs and DataFrames |
| Anagrams | Find words that are anagrams of each other |
| Lambda Expressions | Using Lambda Expressions in PySpark programs |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| K-mers | K-mers for DNA Sequences |
| Correlation | All vs. All Correlation |
| Mapping Partitions | mapPartitions() Complete Example |
| UDF | User-Defined Function Examples |
| DataFrames Transformations | Examples of Creating and Transforming DataFrames |
| DataFrames Tutorials | DataFrames Tutorials: from collections and CSV text files |
| Join Operations | Examples of Joining RDDs and DataFrames |
| PySpark Tutorial 101 | Examples of Using PySpark RDDs and DataFrames |
| Physical Data Partitioning | Tutorial on Physical Data Partitioning |
| Monoids and Combiners | Monoid as a Design Principle |

Data Algorithms with Spark