Spark Fundamentals

A solid understanding of, and experience with, the core tools in any field promotes excellence and innovation. Apache Spark, a general engine for large-scale data processing, is such a tool in the big data realm. This learning path covers the fundamentals of Spark's design and its everyday application.

About this learning path

Ever waited overnight for a report to run, only to come back to your computer in the morning and find it still running? When the heat is on and you have a deadline, that is a sign something is not working. As data sets grow larger and larger, you need to be fluent in the right tools to meet your commitments. This learning path is your opportunity to learn about Spark from industry leaders. It provides hands-on exercises and projects to build your confidence with this tool set.

Come along and start your journey to receiving the following badges: Spark – Level 1 and Spark – Level 2.

Courses

Spark Fundamentals I

Effort: 5 Hours
Level: Beginner
Available in: English
About the course

Ignite your interest in Spark with an introduction to the core concepts that make this general-purpose processing engine an essential tool for working with big data.

Spark Fundamentals II

Effort: 4 Hours
Level: Intermediate
Available in: English
About the course

Building on your foundational knowledge of Spark, take this opportunity to move your skills to the next level. With a focus on Spark's Resilient Distributed Dataset (RDD) operations, this course exposes you to concepts that are critical to your success in this field.

Spark MLlib

Effort: 5 Hours
Level: Beginner
Available in: English
About the course

Spark provides a machine learning library known as MLlib. Spark MLlib offers machine learning algorithms for classification, regression, clustering, and collaborative filtering. It also includes tools for featurization, pipelines, and persistence, along with utilities for linear algebra, statistics, and data handling.

Exploring Spark’s GraphX

Effort: 5 Hours
Level: Beginner
Available in: English
About the course

Spark provides a graph-parallel computation library in GraphX. Graph-parallel computation is a paradigm that lets you represent your data as vertices and edges. Spark GraphX provides a set of fundamental operators, in addition to a growing collection of algorithms and builders, to simplify graph analytics tasks.
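
To give a feel for the vertices-and-edges representation mentioned above, here is a minimal sketch in plain Python. It does not use GraphX or any Spark API; the sample data and the out_degrees helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical sample data: each vertex is an id mapped to a property (a name).
vertices = {1: "alice", 2: "bob", 3: "carol"}

# Each directed edge is a (source id, destination id, property) triple.
edges = [(1, 2, "follows"), (2, 3, "follows"), (1, 3, "follows")]


def out_degrees(edge_list):
    """Count the outgoing edges of each source vertex id."""
    return Counter(src for src, _dst, _prop in edge_list)


degrees = out_degrees(edges)
for vertex_id, name in vertices.items():
    # Vertices with no outgoing edges are reported with a degree of 0.
    print(name, degrees.get(vertex_id, 0))
```

A graph library typically distributes this kind of per-vertex computation across a cluster; the sketch only shows the shape of the data on a single machine.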

Analyzing Big Data in R using Apache Spark

Effort: 4 Hours
Level: Beginner
Available in: English
About the course

A single firework launched into the night sky is only so effective, but when many are clustered together, magic happens and the sky comes to life. In this course, learn how to apply a similar type of magic when working with Apache Spark to analyze big data in R. Don't blink or you might miss it!

Complete the Spark Fundamentals learning path

Our learning paths are designed so that each course builds on the concepts covered in the courses before it. We recommend completing the courses in the order outlined in this learning path to get the most out of your investment of time. If you like what you see here, come and discover other learning paths and browse our course catalog.