Course Outline

Big Data Overview:

  • Definition of Big Data
  • Reasons behind the growing popularity of Big Data
  • Case studies on Big Data
  • Key characteristics of Big Data
  • Solutions for managing Big Data

Hadoop and Its Components:

  • Understanding Hadoop and its core components
  • Hadoop architecture and the types of data it can store and process
  • A brief history of Hadoop, including the companies that use it and why they adopted it
  • Detailed explanation of the Hadoop framework and its components
  • HDFS (Hadoop Distributed File System) and how its read and write operations work
  • Procedures for setting up a Hadoop cluster in each mode: standalone, pseudo-distributed, and fully distributed (multi-node)

(This section covers setting up a Hadoop cluster on VirtualBox, KVM, or VMware virtual machines, configuring the required network settings, starting the Hadoop daemons, and testing the cluster.)

  • Overview of the MapReduce framework and its operational mechanisms
  • Executing MapReduce jobs on a Hadoop cluster
  • Concepts of replication, mirroring, and rack awareness within Hadoop clusters

Hadoop Cluster Planning:

  • Strategies for planning your Hadoop cluster
  • Matching hardware and software choices to the cluster's requirements
  • Analyzing workloads to plan a cluster that minimizes failures and performs well

Introduction to MapR and Its Advantages:

  • Overview of MapR and its architecture
  • Understanding and working with the MapR Control System, MapR volumes, snapshots, and mirrors
  • Planning a cluster specifically for MapR environments
  • Comparing MapR with Apache Hadoop and other Hadoop distributions
  • MapR installation and cluster deployment processes

Cluster Setup and Administration:

  • Managing services, nodes, snapshots, mirrored volumes, and remote clusters
  • Understanding and managing nodes
  • Understanding Hadoop components and installing them alongside MapR services
  • Accessing cluster data, including via NFS, and managing services and nodes
  • Managing data through volumes
  • Managing users and groups, and assigning roles to nodes
  • Commissioning and decommissioning nodes
  • Cluster administration and performance monitoring, including configuring and analyzing performance metrics
  • Administering MapR security
  • Understanding and working with M7 native storage for MapR tables
  • Configuring and tuning the cluster for optimal performance

Cluster Upgrade and Integration with Other Environments:

  • Types of upgrades and how to upgrade the MapR software version
  • Configuring the MapR cluster to access an HDFS cluster
  • Setting up a MapR cluster on Amazon Elastic MapReduce

All topics include demonstrations and practice sessions that give learners hands-on experience with the technology.

Requirements

  • Basic knowledge of Linux file systems
  • Basic Java programming skills
  • Familiarity with Apache Hadoop (recommended)

Duration: 28 hours
