Friday, April 25, 2014

Complex Hadoop Interview Question

Is Hadoop designed for real-time systems?

No. Hadoop was initially designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch rather than real-time work. To be fair, this was only strictly true in Hadoop's early days, and there are now plenty of ways to use Hadoop in a more real-time manner.

First, it's important to define what you mean by real-time. You may be interested in stream processing, or you may want to run queries on your data that return results in real time.

Hadoop does not natively provide stream processing, but other projects integrate with it easily:

Storm-YARN allows you to use Storm on your Hadoop cluster via YARN.
Spark (through Spark Streaming) integrates with HDFS and lets you process streaming data in near real time.

For real-time queries there are also several projects which use Hadoop:

Impala from Cloudera uses HDFS but bypasses MapReduce altogether because there's too much overhead otherwise.
Apache Drill is another project that integrates with Hadoop to provide real-time query capabilities.
The Stinger project aims to make Hive itself more real-time.

There are probably other projects that would fit into the list of "Making Hadoop real-time", but these are the most well-known ones.

As you can see, Hadoop is moving steadily towards real-time use and, even though it wasn't designed for that, there are plenty of ways to extend it for real-time purposes.

What are the types of tables in Hive?

How can we optimize Hive tables?

How can we optimize a MapReduce job?

What kind of data will you have?

What is the size of the cluster?

What is the size of the data?

What is the Distributed Cache in the MapReduce framework?

The distributed cache is an important feature provided by the MapReduce framework. It can cache text files, archives, and JARs that an application needs, which improves performance. The application registers the files to cache on the JobConf object. Before the job's tasks run, the framework copies the specified files to the nodes where those tasks execute. Each file is copied only once per job, and archives are unpacked on the nodes. The application specifies each file's location as an http:// or hdfs:// URL.
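The pattern can be illustrated with a minimal Python sketch. This is a simulation, not Hadoop's API: the function names (localize_once, load_lookup, map_task), the countries.csv file, and the node1 directory are all hypothetical. A small lookup file is copied once to a simulated node-local directory, and every map task on that node reads the local copy.

```python
import shutil
import tempfile
from pathlib import Path


def localize_once(source: Path, node_dir: Path) -> Path:
    """Copy the cache file into a node's local directory, at most once per job."""
    target = node_dir / source.name
    if not target.exists():
        shutil.copy(source, target)
    return target


def load_lookup(local_file: Path) -> dict:
    """Parse the local copy into a dict (done once per task, before mapping)."""
    lookup = {}
    for line in local_file.read_text().splitlines():
        code, name = line.split(",", 1)
        lookup[code] = name
    return lookup


def map_task(records, local_file: Path):
    """Replace each record's country code with its name using the cached lookup."""
    lookup = load_lookup(local_file)
    return [(lookup.get(code, "UNKNOWN"), amount) for code, amount in records]


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "countries.csv"  # stands in for an hdfs:// path (assumption)
        shared.write_text("IN,India\nUS,United States\n")
        node_dir = root / "node1"  # stands in for one worker's local disk
        node_dir.mkdir()
        local_copy = localize_once(shared, node_dir)
        # Two map tasks on the same node reuse the same local copy.
        first = map_task([("IN", 10), ("US", 5)], local_copy)
        second = map_task([("FR", 7)], localize_once(shared, node_dir))
        return first + second


if __name__ == "__main__":
    print(run_demo())
```

The second call to localize_once finds the file already present and does not copy it again, which mirrors the "copied once per job" behaviour described above.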

HBase vs RDBMS
HBase is a database, but its implementation is completely different from that of an RDBMS. HBase is a distributed, column-oriented, versioned data store. It became a Hadoop ecosystem project and helps Hadoop overcome the challenges of random reads and writes. HDFS is the underlying layer for HBase and provides fault tolerance and linear scalability. HBase saves data as key-value pairs and has built-in support for dynamically adding columns to an existing column family. HBase is not relational and does not support SQL.

An RDBMS follows Codd's 12 rules. RDBMSs are designed around a strictly fixed schema. They are row-oriented databases and are not natively designed for distributed scalability. An RDBMS supports secondary indexes and improves data retrieval through SQL. RDBMSs also have very good, easy-to-use support for complex joins and aggregate functions.
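To contrast the two models, here is a toy Python sketch of the HBase-style logical data model described above: versioned cells addressed by row key and "family:qualifier", with new columns accepted inside an existing column family. The class name TinyTable and its methods are hypothetical and are not the HBase client API.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of an HBase-style table (illustration only)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> "family:qualifier" -> list of (timestamp, value)
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted inside an existing family.
        self.rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        cells = self.rows.get(row_key, {}).get(column, [])
        newest_first = sorted(cells, key=lambda cell: cell[0], reverse=True)
        return [value for _, value in newest_first[:versions]]


table = TinyTable(["info"])
table.put("user1", "info:email", "a@example.com", timestamp=1)
table.put("user1", "info:email", "b@example.com", timestamp=2)
# A brand-new column in the existing "info" family needs no schema change.
table.put("user1", "info:phone", "555-0100", timestamp=3)

print(table.get("user1", "info:email"))              # ['b@example.com']
print(table.get("user1", "info:email", versions=2))  # ['b@example.com', 'a@example.com']
```

In an RDBMS, adding the phone column would typically require altering the table definition first; here only an unknown column family is rejected.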

What are map-side join and reduce-side join?
Two large datasets can also be joined in MapReduce programming. A join performed in the map phase is called a map-side join, while a join performed in the reduce phase is called a reduce-side join. First, why would we need to join data in MapReduce? Suppose dataset A holds master data and dataset B holds transactional data (A and B are just labels for reference), and we need to join them on a common key to produce a result. It is important to realize that if the master dataset is small, we can share it through side-data techniques (passing key-value pairs in the job configuration, or using the distributed cache). We use a MapReduce join only when both datasets are too big for those techniques.
Writing joins directly in MapReduce is not the recommended approach. The same problem can be addressed with higher-level frameworks such as Hive or Cascading. If you do have to write one yourself, the methods below can be used.
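As an illustration of the reduce-side variant, the following Python sketch simulates the three steps in memory: mappers tag each record with its source dataset, the shuffle groups records by join key, and the reducer pairs A records with B records. It is a sketch under stated assumptions, not Hadoop code; the function names and the sample customers/orders data are hypothetical.

```python
from collections import defaultdict


def map_tagged(records, tag):
    """Map phase: emit (join_key, (tag, payload)) for every input record."""
    for key, payload in records:
        yield key, (tag, payload)


def reduce_join(key, tagged_values):
    """Reduce phase: pair every A record with every B record for one key."""
    a_side = [payload for tag, payload in tagged_values if tag == "A"]
    b_side = [payload for tag, payload in tagged_values if tag == "B"]
    for a in a_side:
        for b in b_side:
            yield key, a, b


def reduce_side_join(dataset_a, dataset_b):
    # Simulated shuffle: group all tagged map output by join key.
    groups = defaultdict(list)
    for key, value in map_tagged(dataset_a, "A"):
        groups[key].append(value)
    for key, value in map_tagged(dataset_b, "B"):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


customers = [(1, "Asha"), (2, "Ben")]               # dataset A: master data
orders = [(1, "book"), (1, "pen"), (3, "lamp")]     # dataset B: transactions
print(reduce_side_join(customers, orders))
# [(1, 'Asha', 'book'), (1, 'Asha', 'pen')]
```

Keys that appear in only one dataset produce no output, so this sketch behaves like an inner join.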

Map-side join
A map-side join performs the join before the data reaches the map function. It has strong prerequisites:

1. Data should be partitioned and sorted in a particular way.
2. Each input dataset should be divided into the same number of partitions.
3. Both datasets must be sorted by the same key.
4. All the records for a particular key must reside in the same partition.
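The prerequisites above are what make a simple merge possible. The Python sketch below assumes both inputs are already split into the same number of partitions, sorted by the join key, with each key confined to one partition; it then merges each pair of partitions independently, as one map task per pair would. The function names and sample data are hypothetical, and this is a simulation rather than Hadoop's own join implementation.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of records with this key on the right side.
            run_end = j
            while run_end < len(right) and right[run_end][0] == left_key:
                run_end += 1
            # Pair every left record carrying this key with the whole run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, run_end):
                    joined.append((left_key, left[i][1], right[k][1]))
                i += 1
            j = run_end
    return joined


def map_side_join(left_partitions, right_partitions):
    """Run one merge per partition pair, with no shuffle or reduce phase."""
    if len(left_partitions) != len(right_partitions):
        raise ValueError("both inputs need the same number of partitions")
    joined = []
    for left, right in zip(left_partitions, right_partitions):
        joined.extend(merge_join(left, right))
    return joined


# Both inputs are partitioned by key % 2 and sorted by key inside each partition.
left_partitions = [[(2, "B"), (4, "D")], [(1, "A"), (3, "C")]]
right_partitions = [[(2, "x"), (2, "y")], [(3, "z"), (5, "w")]]
print(map_side_join(left_partitions, right_partitions))
# [(2, 'B', 'x'), (2, 'B', 'y'), (3, 'C', 'z')]
```

If any prerequisite is violated, for example a key split across two partitions, matching records never meet in the same merge and the result is silently incomplete.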

What is shuffling in MapReduce?
As soon as map tasks start to complete, the reducers begin communicating with them: map output is sent to the reducers, which are waiting for that data to process. Meanwhile, the data nodes are still processing other tasks. This transfer of the mappers' output to the reducers is known as shuffling.
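A small in-memory Python sketch of that transfer follows. It assumes a toy partitioner (the byte sum of the key modulo the reducer count) purely so the result is deterministic; the function names and sample word-count data are hypothetical, and real shuffling also involves network transfer, spilling, and merging that are not modelled here.

```python
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Stand-in partitioner: a deterministic toy hash modulo the reducer count."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from every mapper to one reducer,
    then group each reducer's input by key in sorted key order."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_output in map_outputs:  # one entry per finished map task
        for key, value in mapper_output:
            reducer_inputs[partition_for(key, num_reducers)][key].append(value)
    return [sorted(groups.items()) for groups in reducer_inputs]


map_outputs = [
    [("a", 1), ("b", 1)],  # output of mapper 0
    [("a", 1), ("c", 1)],  # output of mapper 1
]
print(shuffle(map_outputs, num_reducers=2))
# [[('b', [1])], [('a', [1, 1]), ('c', [1])]]
```

Both occurrences of "a" end up in the same reducer's input even though they came from different mappers.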

What is partitioning?
Partitioning is the process of identifying the reducer instance that will receive a given piece of mapper output. Before a mapper emits a (key, value) pair, it identifies the reducer that will receive it. All values for a given key, no matter which mapper generated them, must go to the same reducer.
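The usual way to guarantee this is to hash the key and take the result modulo the number of reducers; Hadoop's default HashPartitioner follows the same idea using the key's hashCode. The Python sketch below is an illustration with hypothetical function names, and it uses zlib.crc32 as an assumed stand-in for a stable hash, since Python's built-in hash() for strings varies between processes.

```python
import zlib


def partition_for(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key: stable hash modulo the reducer count."""
    # zlib.crc32 is an assumption of this sketch, chosen because it gives
    # the same value for the same key in every process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Split one mapper's output into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return buckets


# Two different mappers emit the same key; both choose the same reducer.
mapper_0 = partition_map_output([("hadoop", 1), ("hive", 1)], num_reducers=4)
mapper_1 = partition_map_output([("hadoop", 1), ("hbase", 1)], num_reducers=4)
target = partition_for("hadoop", 4)
print(("hadoop", 1) in mapper_0[target], ("hadoop", 1) in mapper_1[target])
# True True
```

Because the reducer index depends only on the key and the reducer count, no coordination between mappers is needed.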

Difference between Hive managed tables and external tables
Managed tables are completely managed by Hive: Hive stores the table's data in its own warehouse directory and, when the table is dropped, Hive itself removes that data from the warehouse. By contrast, an external table is created with the EXTERNAL keyword at table creation time, and Hive does not copy any data into the warehouse. When an external table is dropped, the data remains intact.

External Tables: An external table refers to the data that is outside of the warehouse directory.
CREATE EXTERNAL TABLE table_name (col string)
LOCATION '/user/husr/';
LOAD DATA INPATH '/user/husr/data.txt' INTO TABLE table_name;

In case of external tables, Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.
Note: Hive does not check whether the external table location exists or not at the time the external table is created.

Normal Tables: Hive manages the normal tables created and moves the data into its warehouse directory.
As an example, consider the table creation and loading of data into the table.
CREATE TABLE table_name (col string);

LOAD DATA INPATH '/user/husr/data.txt' INTO TABLE table_name;
