❗The content presented here is sourced directly from the Udemy platform. For full course details, including enrollment information, click the 'Go to class' link on our website.
Updated on September 5, 2023
Skills and Knowledge Acquisition:
Participants in the "Data Engineering using Kafka and Spark Structured Streaming" course will acquire the following skills and knowledge:
Environment Setup: Learn how to set up a self-supported lab environment with Hadoop, Hive, Spark, and Kafka on a single-node Linux-based system, providing the foundation for data engineering tasks.
Kafka Fundamentals: Gain a deep understanding of Kafka, including creating Kafka topics, producing and consuming messages, and using Kafka Connect for data ingestion from web server logs into Kafka topics.
Data Ingestion: Explore both directions of data ingestion: moving web server logs into Kafka topics, and moving data from Kafka topics into HDFS, which acts as the sink.
Spark Structured Streaming: Understand the key concepts of Spark Structured Streaming, Spark's stream processing engine built on the DataFrame API.
Streaming Pipeline Development: Develop streaming pipelines that consume data from Kafka topics using Spark Structured Streaming, process the data, and write it to different target destinations.
Incremental Data Processing: Learn how to handle incremental data processing efficiently using Spark Structured Streaming.
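As a rough illustration of the pipeline and incremental-processing skills listed above, the following PySpark sketch reads from a Kafka topic and appends the records to Parquet files. It is not taken from the course material. The broker address, the topic name `retail_logs`, and both directory paths are hypothetical placeholders, and the sketch assumes a Spark session started with the Kafka connector package (spark-sql-kafka) available.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the options for Spark's Kafka source (values are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Read the topic from the beginning on the very first run only;
        # later runs resume from the offsets stored in the checkpoint.
        "startingOffsets": "earliest",
    }


def start_kafka_to_files(spark, options, target_dir, checkpoint_dir):
    """Start a streaming query that copies Kafka messages to Parquet files.

    `spark` is an existing SparkSession; the function returns the running
    StreamingQuery so the caller can await or stop it.
    """
    reader = spark.readStream.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)

    # Kafka delivers the payload as bytes, so cast it to a string column.
    messages = reader.load().selectExpr(
        "CAST(value AS STRING) AS value", "timestamp"
    )

    return (
        messages.writeStream.format("parquet")
        .option("path", target_dir)
        # The checkpoint records processed offsets, so each run picks up
        # only the data that arrived since the previous one.
        .option("checkpointLocation", checkpoint_dir)
        .outputMode("append")
        .trigger(processingTime="30 seconds")
        .start()
    )


# Hypothetical usage, assuming a SparkSession named `spark`:
# opts = kafka_source_options("localhost:9092", "retail_logs")
# query = start_kafka_to_files(spark, opts, "/tmp/retail_logs", "/tmp/retail_logs_chk")
# query.awaitTermination()
```

The checkpoint directory is what makes the load incremental: restarting the query with the same checkpoint continues from the last committed Kafka offsets rather than rereading the topic.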
Course Contribution to Professional Growth:
This course offers significant contributions to professional growth:
Data Engineering Proficiency: Participants will learn to build streaming data pipelines, a skill in high demand across industries.
Hands-on Experience: The course provides hands-on experience in setting up the environment and working with Kafka and Spark Structured Streaming, enhancing practical skills.
Real-world Application: Learning to build streaming pipelines prepares professionals for real-world data engineering tasks, making them valuable contributors to data-centric projects.
Problem-Solving Skills: Participants will develop problem-solving skills related to data engineering challenges and gain the ability to design and implement efficient data processing solutions.
Suitability for Preparing for Further Education:
The "Data Engineering using Kafka and Spark Structured Streaming" course is suitable for individuals preparing for further education or seeking to deepen their knowledge in the field of data engineering:
Graduate Studies: Students pursuing advanced degrees in data engineering, computer science, or related fields can use this course as a foundation for deeper exploration of data engineering technologies.
Certification: Those planning to pursue certifications related to data engineering or real-time data processing can benefit from this course as a preparation resource.
Professional Development: IT professionals looking to expand their knowledge of data engineering, Kafka, and Spark Structured Streaming can use this course to enhance their expertise and prepare for further career advancement.
Course Syllabus
Introduction
Getting Started with Kafka
Data Ingestion using Kafka Connect
Overview of Spark Structured Streaming
Kafka and Spark Structured Streaming Integration
Incremental Loads using Spark Structured Streaming
Setting up Environment using AWS Cloud9
Setting up Environment - Overview of GCP and Provision Ubuntu VM
Setup Single Node Hadoop Cluster
Setup Hive and Spark
Setup Single Node Kafka Cluster