Course Outline
1: HDFS (17%)
- Explain the roles of HDFS Daemons.
- Describe the standard operation of an Apache Hadoop cluster regarding both data storage and processing.
- Recognize key characteristics of modern computing systems that necessitate a solution like Apache Hadoop.
- Outline the primary objectives of HDFS Design.
- Select appropriate use cases for HDFS Federation based on specific scenarios.
- Identify the components and daemons required for an HDFS HA-Quorum cluster.
- Evaluate the role of HDFS security mechanisms, specifically Kerberos.
- Select the optimal data serialization method for a given scenario.
- Describe the pathways for file read and write operations.
- Identify commands for manipulating files using the Hadoop File System Shell.
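To make the storage objectives above concrete, here is a minimal sketch of the block-and-replication arithmetic behind HDFS storage planning. The 128 MB block size and replication factor of 3 are common CDH 5 defaults used purely for illustration; both are configurable per cluster.

```python
import math

# Illustrative defaults only; real values come from hdfs-site.xml
# (dfs.blocksize and dfs.replication).
BLOCK_SIZE_MB = 128
REPLICATION = 3

def hdfs_footprint(file_size_mb):
    """Return (block_count, raw_storage_mb) for a single file.

    HDFS splits a file into fixed-size blocks and stores each block
    REPLICATION times across the cluster, so the raw disk consumed is
    roughly file size x replication factor.
    """
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    raw_storage_mb = file_size_mb * REPLICATION
    return blocks, raw_storage_mb

# A 1 GB (1024 MB) file becomes 8 blocks and consumes about 3 GB of raw disk.
print(hdfs_footprint(1024))  # (8, 3072)
```

The same arithmetic underlies capacity discussions later in the course: usable cluster storage is raw storage divided by the replication factor.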
2: YARN and MapReduce version 2 (MRv2) (17%)
- Comprehend the impact of upgrading a cluster from Hadoop 1 to Hadoop 2 on cluster settings.
- Understand the deployment of MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
- Grasp the fundamental design strategy of MapReduce v2 (MRv2).
- Determine how YARN manages resource allocations.
- Identify the workflow of a MapReduce job executing on YARN.
- Identify necessary file modifications to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
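As a sketch of how YARN handles resource allocations, the snippet below mimics the scheduler's normalization of container memory requests: requests are rounded up to a multiple of the minimum allocation and capped at the maximum. The 1024 MB / 8192 MB figures are the usual defaults for `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb`, shown here only for illustration.

```python
import math

# Illustrative defaults; actual values are set in yarn-site.xml.
MIN_ALLOC_MB = 1024
MAX_ALLOC_MB = 8192

def normalize_request(requested_mb):
    """Round a container memory request up to the nearest multiple of the
    minimum allocation, then clamp it into [min, max] - a simplified model
    of how the YARN scheduler normalizes requests."""
    rounded = math.ceil(requested_mb / MIN_ALLOC_MB) * MIN_ALLOC_MB
    return min(max(rounded, MIN_ALLOC_MB), MAX_ALLOC_MB)

print(normalize_request(1500))  # 2048 - rounded up to 2 x 1024 MB
```

A request for 1500 MB therefore actually consumes a 2048 MB container, which matters when reasoning about how many containers fit on a NodeManager.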
3: Hadoop Cluster Planning (16%)
- Identify key considerations when selecting hardware and operating systems for hosting an Apache Hadoop cluster.
- Analyze options for selecting an operating system.
- Understand kernel tuning and disk swapping mechanisms.
- Identify hardware configurations suitable for a given scenario and workload pattern.
- Determine the ecosystem components required for a cluster to meet SLA requirements in a given scenario.
- Cluster Sizing: Identify workload specifics, including CPU, memory, storage, and disk I/O, based on a scenario and execution frequency.
- Disk Sizing and Configuration: Understand JBOD versus RAID, SANs, virtualization, and disk sizing requirements within a cluster.
- Network Topologies: Understand network usage in Hadoop (for HDFS and MapReduce) and propose or identify essential network design components for a given scenario.
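The cluster-sizing bullets above reduce to back-of-the-envelope arithmetic like the following sketch. Every figure here (replication factor, 25% headroom for intermediate data, 48 TB of usable disk per node) is a hypothetical assumption for illustration, not a recommendation.

```python
import math

def nodes_for_capacity(daily_ingest_tb, retention_days, replication=3,
                       overhead=1.25, disk_per_node_tb=48):
    """Rough node count needed to hold a given data volume.

    overhead leaves headroom for temporary/intermediate MapReduce data;
    all defaults are illustrative assumptions, not sizing guidance.
    """
    raw_tb = daily_ingest_tb * retention_days * replication * overhead
    return math.ceil(raw_tb / disk_per_node_tb)

# e.g. 1 TB/day retained for a year at 3x replication:
print(nodes_for_capacity(1, 365))  # 29 nodes under these assumptions
```

Real sizing also weighs CPU, memory, and disk I/O against the workload pattern, as the bullets above note; storage is only the first constraint to check.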
4: Hadoop Cluster Installation and Administration (25%)
- Identify cluster resilience against disk and machine failures in a given scenario.
- Analyze logging configuration and the format of logging configuration files.
- Understand the fundamentals of Hadoop metrics and cluster health monitoring.
- Identify the functions and purposes of available cluster monitoring tools.
- Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
- Identify the functions and purposes of available tools for managing the Apache Hadoop file system.
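For the logging-configuration objective above, Hadoop daemons are configured through a log4j properties file. The fragment below is a minimal sketch of the typical layout; the paths, sizes, and appender name are illustrative values, not defaults you should copy verbatim.

```properties
# Sketch of a Hadoop log4j.properties fragment (values illustrative).
hadoop.root.logger=INFO,RFA
hadoop.log.dir=/var/log/hadoop
hadoop.log.file=hadoop.log

log4j.rootLogger=${hadoop.root.logger}

# Rolling file appender: caps file size and keeps a bounded backlog.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Being able to read this format - logger levels, appenders, and the conversion pattern - is what the "format of logging configuration files" bullet refers to.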
5: Resource Management (10%)
- Understand the overarching design goals of each Hadoop scheduler.
- Determine how the FIFO Scheduler allocates cluster resources in a given scenario.
- Determine how the Fair Scheduler allocates cluster resources under YARN in a given scenario.
- Determine how the Capacity Scheduler allocates cluster resources in a given scenario.
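The Fair Scheduler bullet above can be illustrated with a simplified "water-filling" model: capacity is split evenly among running applications, no application receives more than it demands, and leftover capacity is redistributed among the still-unsatisfied applications. This is a teaching sketch, not the actual Fair Scheduler algorithm (which also handles queues, weights, and minimum shares).

```python
def fair_shares(cluster_memory_mb, demands):
    """Simplified fair-share allocation (water-filling).

    demands maps app name -> requested memory (MB). Apps that ask for
    less than an even split get exactly what they asked for; the freed
    capacity is shared evenly among the rest.
    """
    shares = {app: 0 for app in demands}
    remaining = dict(demands)
    capacity = cluster_memory_mb
    while capacity > 0 and remaining:
        even = capacity / len(remaining)
        satisfied = {a: d for a, d in remaining.items() if d <= even}
        if not satisfied:
            # No one is under the even split: everyone left gets an
            # equal share and the capacity is exhausted.
            for app in remaining:
                shares[app] += even
            break
        for app, demand in satisfied.items():
            shares[app] += demand
            capacity -= demand
            del remaining[app]
    return shares

# 100 GB-equivalent cluster, three apps: the small app is fully
# satisfied, the two large apps split what remains evenly.
print(fair_shares(100, {"a": 20, "b": 50, "c": 60}))
```

Contrast this with FIFO (the first job takes everything it needs, in order) and the Capacity Scheduler (guaranteed per-queue capacities with elastic sharing).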
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Understand methods for monitoring cluster Daemons.
- Identify and monitor CPU usage on master nodes.
- Describe how to monitor swap and memory allocation on all nodes.
- Identify methods to view and manage Hadoop’s log files.
- Interpret log files effectively.
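As a small aid to the log-interpretation objective, the snippet below parses the typical Hadoop daemon log line layout produced by the log4j pattern `%d{ISO8601} %p %c: %m` (timestamp, level, logger class, message). The sample line is fabricated for illustration.

```python
import re

# Matches lines like:
# 2024-01-15 10:32:01,123 WARN org.apache.hadoop....DataNode: <message>
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict with ts/level/logger/msg, or None if the line does
    not match the expected layout (e.g. a stack-trace continuation)."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

sample = ("2024-01-15 10:32:01,123 WARN "
          "org.apache.hadoop.hdfs.server.datanode.DataNode: "
          "Slow BlockReceiver write packet to mirror")
rec = parse_line(sample)
print(rec["level"], rec["logger"])
```

Filtering on the `level` and `logger` fields like this is often the quickest way to separate daemon health warnings from routine INFO chatter.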
Requirements
- Foundational Linux administration skills
- Basic programming proficiency
35 Hours
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely appreciated the trainer's high level of competence.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
What I liked most was the trainer giving real-life examples.