An end-to-end guide to data extraction, transformation and loading (ETL) from a local website using Python and the Beautiful Soup library.

The complete source code can be found on either GitHub,
https://github.com/JacksonCakes/dataengineering/blob/main/ETL_for_Malaysia's_14th_General_Election_for_the_14th_Selangor_State_Legislative_Assembly.ipynb
or Google Colab
https://colab.research.google.com/drive/1wrN3WV8KPLe00ufySLygaeNP3B3bTzRx?usp=sharing#scrollTo=zkawnYsZ9fRl

What is ETL?

Extraction, transformation and loading (ETL) is one of the major workflows in the field of data engineering.

Usually, ETL involves integrating data from various sources into a single, usable, centralized data warehouse that serves purposes such as insight analysis or business intelligence.
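
To make the three stages concrete, here is a minimal sketch of an ETL pipeline using only the Python standard library. It is not the code from the notebook linked above: the two in-memory sources, the sample records, the `results` table and the function names are all hypothetical, chosen only to show two differently formatted sources being combined into one store.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources; a real pipeline would read files, APIs or web pages.
CSV_SOURCE = "name,votes\nAlice, 1200\nBob,950\n"
JSON_SOURCE = '[{"name": "Carol", "votes": "1,430"}]'


def extract():
    """Extract: pull raw records from both sources as a single list of dicts."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows + json_rows


def transform(rows):
    """Transform: trim whitespace and convert vote counts to integers."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        # Remove thousands separators and stray spaces before converting.
        votes = int(str(row["votes"]).replace(",", "").strip())
        cleaned.append((name, votes))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into one central table."""
    conn.execute("CREATE TABLE IF NOT EXISTS results (name TEXT, votes INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", records)
    conn.commit()


def run_pipeline():
    # An in-memory SQLite database stands in for the data warehouse here.
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    for row in conn.execute("SELECT name, votes FROM results ORDER BY votes DESC"):
        print(row)
```

Keeping each stage in its own function means a source can be swapped (for example, a scraped web page instead of the CSV string) without touching the transformation or loading code.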

Extraction

Data extraction is the first step in ETL. It involves retrieving data from multiple sources for further processing, storage or…

Jackson Kek

Computer Science Student | Aspiring Data Scientist | I Post what I Practice
