Friday, June 28, 2013

Why HBase and Why Hive

 
HBase is not a replacement for Hadoop, and Hive is not a replacement for HBase; it totally depends on your use case. That said, you can efficiently put or fetch data to/from HBase by writing MapReduce jobs, or you can write sequential programs using the HBase API, such as the Java client, to put or fetch the data. But we use Hadoop, HBase etc. to deal with gigantic amounts of data, so sequential access doesn't make much sense there; normal sequential programs would be highly inefficient when your data is too huge.
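For small, targeted reads and writes, though, the Java client API is all you need. Here is a minimal sketch using the HBase client as it looked around this time; the table name "users", column family "cf" and qualifier "name" are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "users"); // hypothetical table

        // Write one cell: row key -> column family:qualifier -> value
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("John"));
        table.put(put);

        // Random read of the same row, something plain HDFS cannot do
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));

        table.close();
    }
}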
 
From the question's point of view, Hadoop has 2 main components:
 
  1. HDFS - a distributed file system
  2. MapReduce - a processing framework.
Like any other FS, HDFS provides us storage, but in a fault-tolerant manner with high throughput and a lower risk of data loss (because of the replication). But, being a FS, HDFS lacks random read and write access. This is where HBase comes into the picture. It's a distributed, scalable, big data store, modelled after Google's BigTable. It stores data as key/value pairs.
 
Now Hive. It provides us data warehousing facilities on top of an existing Hadoop cluster. Along with that, it provides an SQL-like interface which makes your work easier in case you are coming from an SQL background. You can create tables in Hive and store data there. You can even map your existing HBase tables to Hive and operate on them.
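As a rough sketch of how that looks (the table names, columns and paths here are all made up), you can create a native Hive table and query it with plain SQL-like statements, and you can expose an existing HBase table to Hive through the HBase storage handler:

-- A native Hive table backed by files in HDFS
CREATE TABLE pageviews (url STRING, hits INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/data/pageviews.tsv' INTO TABLE pageviews;

SELECT url, SUM(hits) FROM pageviews GROUP BY url;

-- Mapping an existing HBase table into Hive
-- (the HBase table name and column family are assumptions)
CREATE EXTERNAL TABLE hbase_pageviews (rowkey STRING, hits INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:hits")
TBLPROPERTIES ("hbase.table.name" = "pageviews");

Behind the scenes Hive compiles these queries into MapReduce jobs, so you get the scalability of the cluster without writing any Java.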
 
Pig, on the other hand, is basically a dataflow language that allows us to process enormous amounts of data very easily and quickly. Pig basically has 2 parts: the Pig Interpreter and the language, 'PigLatin'. You write Pig scripts in PigLatin and process them using the Pig interpreter. Pig makes our life a lot easier; writing MapReduce directly is not always easy, and in fact in some cases it can really become a pain.
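To give a feel for PigLatin, here is the classic word count as a short script (the input and output paths are just placeholders):

lines  = LOAD 'input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO 'wordcount_out';

These five lines replace a full Java MapReduce program like the one sketched at the end of this post.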
 
HBase's internals allow fast random reads and writes, which is crucial for real-time data handling, whereas Hadoop with MapReduce is suited to batch processing of large amounts of data.
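For comparison with the Pig script above, here is a minimal sketch of the same word count written directly as a Java MapReduce job (the class names are placeholders; input and output HDFS paths come in as arguments):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in its input split
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package this into a jar and launch it with the hadoop jar command, passing the input and output paths as arguments.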
