Sabareh/The-Forex-Data-Pipeline

The Forex Data Pipeline collects, processes, and prepares currency exchange rate data for downstream machine-learning pipelines. It fetches currency rates from an external API, transforms them with PySpark, and loads the processed data into a Hive table stored on the Hadoop Distributed File System (HDFS). The goal is to provide clean, structured currency rate data that subsequent machine-learning workflows can consume directly.

  • Data Extraction: The pipeline connects to an external API to retrieve real-time currency exchange rates, so each run captures the most recent rates available.

  • Data Transformation: PySpark handles data wrangling and transformation, cleansing the raw data and structuring it for analysis.

  • Hive Integration: The processed data is stored in a Hive table within HDFS, which keeps the prepared currency rate data available for efficient storage and retrieval.

  • ML Integration: Because the output is clean and well structured, downstream machine-learning pipelines can use it to build predictive models without further preparation.
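The transformation step described above can be illustrated with a minimal sketch in plain Python. It is not the code in this repository: the payload shape (`base`, `date`, `rates`), the sample values, and the `flatten_rates` helper are all assumptions made for illustration, since the API and its response format are not specified here. The sketch turns one nested response into one flat row per currency pair and drops unusable rates.

```python
import json

# Hypothetical API response; field names and values are illustrative
# assumptions, not the format of the API used by this repository.
SAMPLE_RESPONSE = """
{"base": "EUR", "date": "2024-01-15",
 "rates": {"USD": 1.095, "GBP": 0.86, "JPY": 159.8}}
"""


def flatten_rates(payload):
    """Return one flat row per quote currency, skipping unusable rates."""
    rows = []
    for quote, rate in sorted(payload.get("rates", {}).items()):
        # Cleansing: ignore missing, non-numeric, or non-positive rates.
        if not isinstance(rate, (int, float)) or rate <= 0:
            continue
        rows.append(
            {
                "base": payload["base"],
                "quote": quote,
                "rate": float(rate),
                "rate_date": payload["date"],
            }
        )
    return rows


if __name__ == "__main__":
    for row in flatten_rates(json.loads(SAMPLE_RESPONSE)):
        print(row)
```

In a PySpark job, rows shaped like these could be passed to `spark.createDataFrame(rows)` and then written to a Hive table with `DataFrame.write.saveAsTable`, assuming a Spark session with Hive support enabled; the table name and schema would depend on the actual pipeline.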

The repository is organized as follows:

  • /forex_data_pipeline_final.py: Contains the source code for the Forex Data Pipeline.

To use the Forex Data Pipeline:

  1. Clone this repository to your local machine:
git clone https://.com/sabareh/forex-data-pipeline.git
