Exam Sections
These are the current exam sections and the
percentage of the exam devoted to these topics.
1. Core Hadoop Concepts (CCD-410: 25% | CCD-470: 33%)
Objectives
· Recognize and identify Apache Hadoop daemons and how they
function in both data storage and processing under CDH3 and CDH4.
· Understand how Apache Hadoop exploits data locality,
including rack placement policy.
· Given a big data scenario, determine the challenges to
large-scale computational models and how distributed systems attempt to
overcome the various challenges posed by the scenario.
· Identify the role and use of both MapReduce v1 (MRv1) and
MapReduce v2 (MRv2 / YARN) daemons.
Section Study Resources
· CDH4 update including MapReduce v2 (MRv2).
2. Storing Files in Hadoop (7%)
Objectives
· Analyze the benefits and challenges of the HDFS
architecture.
· Analyze how HDFS implements file sizes, block sizes, and
block abstraction.
· Understand default replication values and storage
requirements for replication.
· Determine how HDFS stores, reads, and writes files.
· Given a sample architecture, determine how HDFS handles
hardware failure.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd edition: Chapter 3
· Hadoop Operations: Chapter 2
· Hadoop in Practice: Appendix C: HDFS Dissected
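The block-size and replication objectives above come down to simple arithmetic, which the following minimal sketch illustrates. The function name hdfs_footprint is hypothetical, and the 128 MB block size is an assumption (the value is configurable and its default differs between Hadoop versions); a replication factor of 3 is the usual default. The sketch relies on the fact that a final, partially filled block only occupies as much disk space as the data it holds.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size, block_size=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    block_size is an assumed, configurable value; replication=3 is the
    usual default replication factor.
    """
    # Ceiling division: a partial final block still counts as one block.
    blocks = (file_size + block_size - 1) // block_size
    # Every byte of the file is stored once per replica; the final block
    # is not padded out to the full block size.
    raw_bytes = file_size * replication
    return blocks, raw_bytes

blocks, raw = hdfs_footprint(300 * MB)
print(blocks, raw // MB)
```

Under these assumptions a 300 MB file is split into three blocks (128 MB, 128 MB, and 44 MB) and consumes 900 MB of raw disk across the cluster.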
3. Job Configuration and Submission (7%)
Objectives
· Construct proper job configuration parameters.
· Identify the correct procedures for MapReduce job
submission.
· Understand how to use various commands in job submission.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 5
4. Job Execution Environment (10%)
Objectives
· Given a MapReduce job, determine the lifecycle of a
Mapper and the lifecycle of a Reducer.
· Understand the key fault tolerance principles at work in
a MapReduce job.
· Identify the role of Apache Hadoop Classes, Interfaces,
and Methods.
· Understand how speculative execution exploits differences
in machine configurations and capabilities in a parallel environment and how
and when it runs.
Section Study Resources
· Hadoop in Action: Chapter 3
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
5. Input and Output (6%)
Objectives
· Given a sample job, analyze and determine the correct
InputFormat and OutputFormat to select based on job requirements.
· Understand the role of the RecordReader, and of sequence
files and compression.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 7
· Hadoop in Action: Chapter 3
· Hadoop in Practice: Chapter 3
6. Job Lifecycle (18%)
Objectives
· Analyze the order of operations in a MapReduce job.
· Analyze how data moves through a job.
· Understand how partitioners and combiners function, and
recognize appropriate use cases for each.
· Recognize the processes and role of the sort and
shuffle process.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
· Hadoop in Practice: Techniques in section 6.4
· Two blog posts from Philippe Adjiman’s Hadoop Tutorial
Series
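To make the order of operations in this section concrete, here is a small in-memory simulation of a word count job: map, combine, partition, sort, then reduce. It is only a sketch and uses no Hadoop APIs; every function name is hypothetical. The partition function is a deterministic stand-in for Hadoop's default HashPartitioner, which takes the key's hash code modulo the number of reduce tasks.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit (word, 1) for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def combine(pairs):
    # Combiner: local pre-aggregation of a single mapper's output.
    # It must not change the final result, because the framework may
    # run it zero, one, or several times.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    # Stand-in for a hash partitioner: every occurrence of a key is
    # routed to the same reducer.
    return sum(key.encode("utf-8")) % num_reducers

def run_job(splits, num_reducers=2):
    # One simulated mapper per input split.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in map_phase(line)]
        for key, value in combine(mapped):
            # Shuffle: send each pair to the reducer chosen by the partitioner.
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    # Sort keys within each partition, then reduce each key's values.
    return [
        [(key, sum(grouped[key])) for key in sorted(grouped)]
        for grouped in reducer_inputs
    ]

for part in run_job([["the cat sat", "the dog"], ["the cat ran"]]):
    print(part)
```

Each inner list plays the role of one reducer's output file: keys are sorted within a partition, but there is no global ordering across partitions.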
7. Data Processing (6%)
Objectives
· Analyze and determine the relationship of input keys to
output keys in terms of both type and number, the sorting of keys, and the
sorting of values.
· Given sample input data, identify the number, type, and
value of emitted keys and values from the Mappers as well as the emitted data
from each Reducer and the number and contents of the output file(s).
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 7 on
Input Formats and Output Formats
· Hadoop in Practice: Chapter 3
8. Key and Value Types (6%)
Objectives
· Given a scenario, analyze and determine which of Hadoop’s
data types for keys and values are appropriate for the job.
· Understand common key and value types in the MapReduce
framework and the interfaces they implement.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 4
· Hadoop in Practice: Chapter 3
9. Common Algorithms and Design Patterns (7%)
Objectives
· Evaluate whether an algorithm is well-suited for
expression in MapReduce.
· Understand the implementation, limitations, and strategies
for joining datasets in MapReduce.
· Analyze the role of DistributedCache and Counters.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapter 8
· Hadoop in Practice: Chapters 4, 5, 7
· Hadoop in Action: Chapter 5.2
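One common strategy behind the join objective above is the reduce-side join: each mapper tags its records with their source dataset and emits them under the join key, and the reducer pairs up the records that arrive under the same key. The sketch below simulates this in plain Python without any Hadoop APIs; the function name, the "U"/"O" tags, and the sample data are all hypothetical.

```python
from collections import defaultdict

def reduce_side_join(users, orders):
    """Inner join of (user_id, name) with (user_id, item) on user_id."""
    # Map phase: tag each record with its source and key it by user_id.
    tagged = [(uid, ("U", name)) for uid, name in users]
    tagged += [(uid, ("O", item)) for uid, item in orders]

    # Shuffle phase: all records sharing a key reach the same reducer call.
    groups = defaultdict(list)
    for key, record in tagged:
        groups[key].append(record)

    # Reduce phase: separate the two sources, then emit their cross product.
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "U"]
        items = [value for tag, value in groups[key] if tag == "O"]
        for name in names:
            for item in items:
                joined.append((key, name, item))
    return joined

users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "mug")]
print(reduce_side_join(users, orders))
```

The limitation this illustrates is that both datasets travel through the shuffle in full, and the reducer buffers all records for a key. When one dataset is small enough to fit in memory, a map-side join that ships it to every mapper (for example via DistributedCache) avoids that cost.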
10. The Hadoop Ecosystem (8%)
Objectives
· Analyze a workflow scenario and determine how and when to
leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop, and
Oozie.
· Understand how Hadoop Streaming might apply to a job
workflow.
Section Study Resources
· Hadoop: The Definitive Guide, 3rd Edition: Chapters 11,
12, 14, 15
· Hadoop in Practice: Chapters 10, 11
· Hadoop in Action: Chapters 10, 11
· Each project in the Hadoop ecosystem has at least one
book devoted to it. The exam does not require deep knowledge of programming
in Hive, Pig, Sqoop, Cloudera Manager, Flume, etc., but rather an understanding
of how those projects contribute to an overall big data ecosystem.
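For the Hadoop Streaming objective above, it may help to see the contract in miniature: a streaming mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below models both as Python generators over lines so it can run locally; the function names are hypothetical, and in a real job each would be a script reading sys.stdin.

```python
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word, as a streaming mapper would print.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal keys are adjacent and
    # groupby can sum each run of counts.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local dry run: sorted() stands in for the framework's sort and shuffle.
mapped = list(mapper(["a b a", "b c"]))
print(list(reducer(sorted(mapped))))
```

Because the contract is just text on standard input and output, the same logic could be written in any language, which is why Streaming suits workflows that already rely on existing scripts.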