Address: Sector 67, Gurugram
Email: query@onlydata.in
Contact: +91 9216977371
Course Details
Big Data with Hadoop and Spark
Duration: 65 hrs
Learning Outcomes
Navigate the Hadoop ecosystem and optimize its use
Ingest data using Sqoop, Flume, and Kafka
Implement partitioning, bucketing, and indexing in Hive
Work with RDDs in Apache Spark
Implement User-Defined Functions (UDF) and User-Defined Aggregate Functions (UDAF) in Spark
Perform DataFrame operations in Spark using SQL queries
Process real-time streaming data
Topics Covered
Introduction to Big Data and Hadoop
Hadoop Distributed File System
Fundamentals of MapReduce
Building a MapReduce Program
Advanced MapReduce
Distributed Cache and Job Chaining
Hadoop Scheduler
Introduction to Apache Hive
Hive Architecture and Data Types
Hive Data Serialization and Optimization
Pig – Data Analysis Tool
Data Ingestion
Data Lake vs. Data Warehouse
Apache Kafka Architecture
Apache Flume
Introduction to YARN
YARN Infrastructure and Architecture
Hive Analytics UDF and UDAF
Functions, OOP, and Modules in Python
Apache Spark Framework
Introduction to PySpark Shell
Working with Spark RDDs
Introduction to Machine Learning
Machine Learning using Spark ML
Applications of Machine Learning
Introduction to Spark ML
Traditional Computing Methods and Their Drawbacks
Real-Time Processing of Big Data
Data Processing Architectures
Introduction to DStreams
Introduction to Spark Structured Streaming
Structured Streaming APIs
Introduction to Spark GraphX
Algorithms in Spark
Pregel API
GraphFrames