Cloudera Analyst and Engineer

Date | Duration | Location | Fees
27 May - 31 May 2024 | 5 Days | Virtual Online | $3,450
10 Nov - 14 Nov 2024 | 5 Days | Virtual Online | $3,450

This course introduces participants to the intricacies of the Cloudera platform, covering data processing, management, analysis, and visualization. It ranges from the fundamental components of Cloudera’s architecture to advanced data engineering and machine learning techniques.

By the end of the course, you’ll be able to:

  • Explain how the open-source ecosystem of big data tools addresses challenges not met by traditional RDBMSs
  • Use Apache Hive and Apache Impala to provide SQL access to data
  • Apply Hive and Impala syntax and data formats, including functions and subqueries
  • Create, modify, and delete tables, views, and databases; load data; and store results of queries
  • Create and use partitions and different file formats
  • Combine two or more datasets using JOIN or UNION, as appropriate
  • Explain what analytic and windowing functions are, and use them
  • Store and query complex or nested data structures
  • Process and analyze semi-structured and unstructured data

This course is designed for:

  • Data analysts
  • Business intelligence specialists
  • Developers
  • System architects
  • Database administrators

Day One

  • The Motivation for Hadoop
  • Hadoop Overview
  • Data Storage: HDFS
  • Distributed Data Processing: YARN, MapReduce, and Spark
  • Data Processing and Analysis: Pig, Hive, and Impala

Day Two

  • Database Integration: Sqoop
  • Other Hadoop Data Tools
  • What Is Hive?
  • What Is Impala?
  • Why Use Hive and Impala?
  • Schema and Data Storage

Day Three

  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Using Hue to Execute Queries
  • Using Beeline (Hive’s Shell)

Day Four

  • Operators
  • Scalar Functions
  • Aggregate Functions
  • Creating Databases and Tables
  • Loading Data
  • Altering Databases and Tables

Day Five

  • Partitioning Tables
  • Loading Data into Partitioned Tables
  • When to Use Partitioning
  • Choosing a File Format
  • UNION and Joins
  • Other Analytic Functions
  • Using Regular Expressions with Hive and Impala
  • Processing Text Data with SerDes in Hive
  • Sentiment Analysis and n-grams