Course Outline

Introduction to Data Analysis and Big Data

  • What Defines Big Data as "Big"?
    • The four Vs: Volume, Velocity, Variety, Veracity
  • Limits of Traditional Data Processing
  • Distributed Processing
  • Statistical Analysis
  • Types of Machine Learning Analysis
  • Data Visualization

Big Data Roles and Responsibilities

  • Administrators
  • Developers
  • Data Analysts

Languages Used for Data Analysis

  • R Language
    • Why R for Data Analysis?
    • Data manipulation, calculation, and graphical display
  • Python
    • Why Python for Data Analysis?
    • Manipulating, processing, cleaning, and crunching data

Approaches to Data Analysis

  • Statistical Analysis
    • Time series analysis
    • Forecasting using Correlation and Regression models
    • Inferential Statistics (estimation)
    • Descriptive Statistics in Big Data sets (e.g., calculating means)
  • Machine Learning
    • Supervised vs. unsupervised learning
    • Classification and clustering
    • Estimating the cost of specific methods
    • Filtering
  • Natural Language Processing
    • Processing text
    • Understanding the meaning of the text
    • Automatic text generation
    • Sentiment analysis / topic analysis
  • Computer Vision
    • Acquiring, processing, analyzing, and understanding images
    • Reconstructing, interpreting, and understanding 3D scenes
    • Using image data to inform decisions
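
As an illustration of the descriptive-statistics item above (calculating means on large data sets), here is a minimal sketch, assuming the data arrives in chunks that do not all fit in memory at once. The function names `partial_sum_count` and `combined_mean` and the sample values are hypothetical, chosen only for this example. Each chunk is reduced to a (sum, count) pair, and the pairs are then combined into one overall mean.

```python
from typing import Iterable, List, Tuple


def partial_sum_count(chunk: Iterable[float]) -> Tuple[float, int]:
    """Reduce one chunk of values to a (sum, count) pair."""
    total = 0.0
    count = 0
    for value in chunk:
        total += value
        count += 1
    return total, count


def combined_mean(partials: Iterable[Tuple[float, int]]) -> float:
    """Combine per-chunk (sum, count) pairs into a single mean."""
    total = 0.0
    count = 0
    for chunk_sum, chunk_count in partials:
        total += chunk_sum
        count += chunk_count
    if count == 0:
        raise ValueError("cannot compute the mean of zero values")
    return total / count


# Hypothetical sample data, split into three chunks of unequal size.
chunks: List[List[float]] = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
partials = [partial_sum_count(chunk) for chunk in chunks]
overall_mean = combined_mean(partials)
print(overall_mean)
```

Averaging the per-chunk means directly would give a wrong answer when chunks differ in size, which is why the sketch carries the counts along with the sums.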

Big Data Infrastructure

  • Data Storage
    • Relational databases (SQL)
      • MySQL
      • PostgreSQL
      • Oracle
    • Non-relational databases (NoSQL)
      • Cassandra
      • MongoDB
      • Neo4j
    • Understanding the nuances of database models
      • Hierarchical databases
      • Object-oriented databases
      • Document-oriented databases
      • Graph-oriented databases
      • Others
  • Distributed Processing
    • Hadoop
      • HDFS as a distributed filesystem
      • MapReduce for distributed processing
    • Spark
      • Comprehensive in-memory cluster computing framework for large-scale data processing
      • Structured streaming
      • Spark SQL
      • Machine learning with MLlib
      • Graph processing with GraphX
  • Scalability
    • Public cloud
      • AWS, Google Cloud, Alibaba Cloud (Aliyun), etc.
    • Private cloud
      • OpenStack, Cloud Foundry, etc.
    • Auto-scalability
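
To make the MapReduce item above more concrete, the following is a rough single-machine sketch of the map, shuffle, and reduce phases, written in plain Python. It does not use Hadoop or any Hadoop API. The function names and the sample lines are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a collection of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input standing in for lines read from a distributed file.
sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the property that lets this model scale.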

Selecting the Appropriate Solution for the Problem

The Future of Big Data

Summary and Next Steps

Requirements

  • A foundational understanding of mathematics
  • A foundational understanding of programming
  • A foundational understanding of databases

Target Audience

  • Developers / Programmers
  • IT Consultants

Duration

  • 35 Hours
