❗The content presented here is sourced directly from the Coursera platform.
Updated on June 30, 2023
This course, Big Data Integration and Processing, provides an introduction to the fundamentals of data integration and processing in the context of big data. Upon completion of the course, students will be able to retrieve data from example databases and big data management systems, describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications, identify when a big data problem needs data integration, and execute simple big data integration and processing on Hadoop and Spark platforms.
No prior programming experience is necessary to take this course, although you will need to be able to install applications and use a virtual machine to complete the hands-on assignments. Hardware requirements are a 64-bit quad-core processor (VT-x or AMD-V support recommended), 8 GB of RAM, and 20 GB of free disk space. Software requirements are Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+, or CentOS 6+, plus VirtualBox 5+. A high-speed internet connection is also necessary, because students will be downloading files up to 4 GB in size. Completion of Intro to Big Data is recommended.
[Applications]
Upon completion of this course, students should be able to apply the knowledge and skills acquired to integrate and process big data. They should be able to identify when a big data problem needs data integration and execute simple big data integration and processing on Hadoop and Spark platforms. Additionally, students should be able to retrieve data from example databases and big data management systems, as well as describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications.
[Career Path]
Job Position Path: Big Data Integration and Processing Engineer
Description: Big Data Integration and Processing Engineers are responsible for designing, developing, and maintaining data integration and processing solutions for large-scale analytical applications. They must have a deep understanding of data management operations and the big data processing patterns needed to utilize them. They must be able to retrieve data from databases and big data management systems, and to carry out big data integration and processing on Hadoop and Spark platforms.
Development Trend: The demand for Big Data Integration and Processing Engineers is expected to grow significantly in the coming years as businesses increasingly rely on data-driven decision making. Companies are investing heavily in big data technologies, and the need for skilled professionals to manage and process this data is growing. As the technology continues to evolve, Big Data Integration and Processing Engineers will need to stay up to date with the latest trends and technologies in order to remain competitive.
[Education Path]
The recommended educational path for learners interested in Big Data Integration and Processing is a Bachelor's degree in Computer Science or a related field. This degree will provide learners with the foundational knowledge and skills needed to understand and work with big data. The curriculum typically includes courses in programming, data structures, algorithms, databases, operating systems, computer networks, and software engineering. Additionally, courses in mathematics, statistics, and machine learning are often included.
The current trend for this degree is a stronger focus on applying big data technologies, such as Hadoop and Spark, to real-world problems, through courses in data mining, data visualization, and data analytics. Courses in cloud computing, distributed systems, and artificial intelligence are also becoming increasingly important. As the field of big data continues to evolve, the curriculum of this degree will continue to adapt to the changing needs of the industry.
Course Syllabus
Welcome to Big Data Integration and Processing
Welcome to the third course in the Big Data Specialization. This week you will be introduced to basic concepts in big data integration and processing. You will be guided through installing the Cloudera VM, downloading the data sets to be used for this course, and learning how to run the Jupyter server.
Retrieving Big Data (Part 1)
This module covers the various aspects of data retrieval and relational querying. You will also be introduced to the Postgres database.
Retrieving Big Data (Part 2)
This module covers the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. You will be introduced to MongoDB and Aerospike, and you will learn how to use Pandas to retrieve data from them.
Big Data Integration
In this module you will be introduced to data integration tools including Splunk and Datameer, and you will gain some practical insight into how information integration processes are carried out.
Processing Big Data
This module introduces learners to big data pipelines and workflows, as well as to the processing and analysis of big data using Apache Spark.
Big Data Analytics using Spark
In this module, you will go deeper into big data processing by learning the inner workings of the Spark Core. You will be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
Learn By Doing: Putting MongoDB and Spark to Work
In this module you will get some practical hands-on experience applying what you learned about Spark and MongoDB to analyze Twitter data.
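One common way to analyze Twitter data with Spark is to count hashtags using the flatMap, map, and reduceByKey pattern. The sketch below is a minimal, single-machine stand-in written in plain Python, assuming tweets arrive as MongoDB-style documents with a `text` field. The sample tweets, the field names, and the helper functions `extract_hashtags` and `count_hashtags` are hypothetical and are not taken from the course materials; the code uses no Spark or MongoDB APIs.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical tweet documents, shaped like records one might read from a
# MongoDB collection (field names are illustrative, not the course's schema).
tweets = [
    {"user": "alice", "text": "Loving #spark and #bigdata today"},
    {"user": "bob", "text": "#MongoDB plus #Spark makes a nice pipeline"},
    {"user": "carol", "text": "No hashtags here"},
]


def extract_hashtags(doc):
    """'flatMap' step: one document in, zero or more lowercase hashtags out."""
    return [
        word.lower().strip(".,!?")
        for word in doc["text"].split()
        if word.startswith("#")
    ]


def count_hashtags(docs):
    """Emulate flatMap -> map to (tag, 1) -> reduceByKey(add) on one machine."""
    tags = chain.from_iterable(extract_hashtags(d) for d in docs)
    pairs = ((tag, 1) for tag in tags)  # 'map' step: pair each tag with a 1
    counts = defaultdict(int)
    for tag, one in pairs:  # 'reduceByKey' step: sum the 1s per hashtag
        counts[tag] += one
    return dict(counts)


if __name__ == "__main__":
    # Print hashtags by descending count, breaking ties alphabetically.
    ranked = sorted(count_hashtags(tweets).items(), key=lambda kv: (-kv[1], kv[0]))
    for tag, n in ranked:
        print(tag, n)
```

Running the file prints `#spark 2`, then `#bigdata 1` and `#mongodb 1`. In PySpark, the same steps would correspond roughly to `rdd.flatMap(...)`, `.map(lambda tag: (tag, 1))`, and `.reduceByKey(operator.add)`, with Spark distributing the work across partitions instead of looping on a single machine.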