Name: Hadoop: The Definitive Guide
SKU: 286779
Price: 106.00 AED
Availability: InStock
Rating: 4.888888888888889 (9 reviews)

Description

Hadoop: The Definitive Guide: 9781449311520: Computer Science Books @ desertcart.com

Reviews

A**R

My Experience Getting Certified In Hadoop

This book is the single best source to begin your career in Big Data development. However, it should not be your first entry point; starting here will frustrate you. This review hopes to help juniors and newbies who want to enter the big data world. The Cloudera CCD-410 certification ranges from tough to very tough. Period.

TRAINING: You are not required to take a training course. I took a relatively inexpensive one ($300) from edureka.in, an online training website in India. They give a good overview at 10,000 feet and are very good for the price, but nowhere close to enough to get certified. Check out their first session, available for free on YouTube. They do have steps to install your own VM, a simple project, Hive, Pig, etc. If time and money permit, I strongly suggest going to the official Cloudera training. It costs about $3000 and includes a free test voucher, so effectively about $2700. It saves you months of preparation time and gives you a distinct advantage over your peers that should pay for itself. Install the VM and try a few commands, including Pig and Hive commands. Also try Amazon Elastic MapReduce, which reduces a lot of manual typing and lets you focus on the coding itself.

LEARNING FROM THIS BOOK: After a training course, start with this book. The first eight chapters are critical (approximately 300 out of 550 pages). If you are smart, sharp and young, expect to read these eight chapters about three times; more is just fine. Add some time to read the rest of the chapters once or twice before the test, along with all the external links. If you are a busy professional, give yourself a six-month window to take the test. Knowing Java is a definite plus. Buy the Cloudera mock examination ($125) once you are comfortable and familiar with MapReduce. It is a nice resource: it explains every answer and links to where you can get more information. Just as an FYI, the real test was far more complex and difficult.

SCENARIOS BASED ON MAPREDUCE CODE: You will need to go through the example code and understand what each line does, why it is there, and what happens if you comment out a line. As an example:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
return job.waitForCompletion(false) ? 0 : -1;

> What does waitForCompletion mean?
> Is the reduce phase mandatory or optional?
> How many files will running a map job produce?
> Will the code compile, or will it fail at run time because of the data types?
> What will happen if you run the same job twice?
> What happens to the map data after the job?
> How does Hadoop handle huge files that cross block boundaries?
> What happens if you do not explicitly set a mapper or reducer?
> Will a combiner help in a given scenario?
> Which daemon decides the number of map tasks to run?
> How does Hadoop handle the blocks when a node crashes?

SCENARIOS BASED ON HIVEQL: This is an extension of the previous scenarios. You get a small table and a simple SQL query (example: select stationid, max(temp) from tableX group by stationid). The answer choices are four sets of MapReduce code and you have to choose the right one. Expect to read and understand the MapReduce code that emulates a distinct, a sum, an average, a max, a min, etc.

According to the Cloudera website, these are the percentages of questions:

CHAPTER 3: 17 percent
CHAPTER 4: 6 percent
CHAPTER 5: 7 percent
CHAPTER 6: 18 percent
CHAPTER 7: 6 percent
CHAPTER 8: 7 percent
PIG/HIVE/SQOOP/ZooKeeper: 8 percent combined (no HBase)

Chapter 2 is not listed in the breakdown but is very important. Expect several questions from that chapter, since it gives a good overview. The remainder comes from the links that Cloudera suggests you read and get familiar with. Sqoop import syntax, creating a Hive table via Sqoop, and creating and populating a Hive table via Sqoop are must-knows.

WHY GET CERTIFIED: I have heard the tiring argument that certification is purely academic. Tell that to your doctor or your dentist. Sound fundamentals are the foundation of real-world experience. Big Data is no different. Understanding the basics will give you confidence; experience will follow while you keep your client happy.

WHY BIG DATA: My interest in Big Data was sparked by the Harvard Business Review article claiming that "Data Scientist" was the hottest job of the 21st century. Follow that by googling "Rayid Ghani", said to be the data scientist behind Obama's second-term victory.
hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1

OTHER CHOICES:
> Coursera provides a free course, "Introduction To Data Science". I signed up for their first batch but could not finish because of office commitments.
> Search YouTube for "Stanford University Hadoop" by Amr Awadallah.

I was impressed with these books; you might like them too.
> Big Data: A Revolution That Will Transform How We Live, Work and Think
> Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
> Data Science for Business: What you need to know about data mining and data-analytic thinking

SUMMARY: Some day Big Data will become a commodity skillset, but not now. I did a search on Glassdoor to see the demand for Hadoop versus some other hot skills. Hadoop is head and shoulders above the rest.

Hadoop: 30,011 postings (Apr 2014)
Oracle DBA: 9,227 postings (a perpetually hot skillset)
Salesforce: 9,968 postings

Please post any questions in the comment section and I will certainly try to answer them.
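
For readers wondering what "MapReduce that emulates a max" looks like, here is a minimal sketch in plain Java. It is not Hadoop code and not taken from the book or the exam: it only mimics the map, shuffle and reduce steps in memory for a query like select stationid, max(temp) from tableX, with a group by on stationid assumed. The class name MaxTempSketch, the "stationid,temp" record layout and the sample rows are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTempSketch {

    // "Map" step: parse one "stationid,temp" record into a (key, value) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // "Shuffle" step: group all values by key, which the framework would do
    // between the map and reduce phases.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: collapse the values for one key into their maximum.
    static int reduce(List<Integer> temps) {
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Runs the three steps in order over an in-memory "table".
    static Map<String, Integer> maxByStation(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows of tableX: stationid,temp
        List<String> tableX = List.of("S1,22", "S2,31", "S1,27", "S2,18");
        System.out.println(maxByStation(tableX)); // prints {S1=27, S2=31}
    }
}
```

Swapping the body of reduce for a sum or a count gives the other aggregates mentioned above; a distinct would emit each key once and ignore the values.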

12.04.2014

A**I

good book on a complex subject

Hadoop is a pretty complex technology for even seasoned engineers to grasp and appreciate fully. Attempting to explain its core concepts and usage in a book is no small feat, but I think the author did an admirable job of capturing the essence of Hadoop and the surrounding landscape. What makes Hadoop so fascinating, yet so hard to fully grasp, is that you need to understand its complementary technologies to truly understand what Hadoop is and why it is so popular.

Can this book serve as a beginner's guide? I am not sure. I have read a few Hadoop blogs and articles and have some prior "hello world" setup experience with Hadoop, and yet I couldn't always follow the book. It is definitely not a beginner's book with foolproof, detailed instructions to set up and run every example. It is, however, an excellent book for introducing users to the world of Hadoop: what Hadoop really is, what it involves, and the set of complementary technologies that integrate with or build on top of Hadoop and make it even more useful. I walked away from this book with a much better understanding of the inner workings of Hadoop (HDFS, MapReduce), a solid grasp of its surrounding technologies (Pig, Hive, HBase) and a much better appreciation of the power of Hadoop, especially when used alongside those technologies.

This is not a beginner's introductory book, nor does it cover high-level data analysis or BI solution scenarios. It is also not an admin/configuration guide for setting up, designing and maintaining complex Hadoop clusters. But if you read this book with the right expectations, you won't be disappointed.

My take on the current state of Hadoop is that it is still in its infancy, with an overly complex set of technologies that function at a pretty low level. In due time, Hadoop will form the backbone of distributed technology but will be pretty much shielded from, and invisible to, most users. Higher-level data analysis solutions and real-time queries will be the new rage, powered by Hadoop in the background. I am looking forward to the next battleground!

17.04.2013

B**.

Most up to date Hadoop Reference Guide

Open source projects have a very fast release cycle, so books can be obsolete in as little as 4-6 months. Thankfully, the 3rd edition is very up to date, even as of Aug 2012. One thing I find interesting is that even the Hadoop documentation on the Apache site uses the old MapReduce APIs (org.apache.hadoop.mapred.*), while the new APIs (org.apache.hadoop.mapreduce.*) have been around for a while now. Thankfully this book uses the new APIs, and also specifically lists the differences between the new and the old APIs. (Note: the new and old APIs have nothing to do with YARN, the new framework.) I also like the coverage of other Hadoop ecosystem projects like Hive, Pig, HBase, ZooKeeper, etc. In short, if you want to learn, use, administer or manage Hadoop, this book is a must.

18.08.2012

R**H

The Bible, but not a Tutorial

This is the best reference out there regarding Hadoop, but do not mistake it for a tutorial -- it's not really meant to be read cover to cover. If you want that, I've heard good things about Chuck Lam's "Hadoop in Action". Now that you know what it isn't, here's what this book is: a comprehensive, "roll up your sleeves, here's some Java" deep dive into Hadoop. It covers the basics as well as advanced topics and gives a brief tour of the supporting projects (like Hive, Pig, etc). No single book will do Hadoop justice, but this book is the best attempt so far. If you only have enough cheddar to buy a single book, this is the one you should own.

11.12.2012

A**N

I hope to use Hadoop in a production environment in the future. With this book you can understand what big data processing is and how to use it.

25.03.2013

D**E

This book can be called the "reference bible" for Hadoop, which makes it a must for every serious Hadoop user, whether beginner or advanced. What "Java ist auch eine Insel" is for general Java programming, this book is for Hadoop. It also covers optional tools and add-ons such as Hive, Pig, Sqoop, etc.

6.12.2013

S**Y

This book is the best for learning Hadoop basics. It also arrived as promised, in good condition.

23.01.2017

S**V

Good

9.06.2016

P**A

I've found this book to be the perfect kickstarter for a novice in Hadoop (prior technical skills are a prerequisite, though). It features balanced coverage of concepts, architecture, planning, programming, deployment, administration and, to a small extent, tuning. I think the book does the technology a good service by covering it from a multitude of angles to show its integration capabilities and demonstrate its versatility. I like the author's ability to lead readers with various thinking habits (relational, OOAD, dimensional) into discovering Hadoop by starting from their own comfortable perspective. If there is only one thing I'd improve: make clearer the importance of writing a smart Map Reducer, provide a formal definition of what makes a function compliant with the Combiner, and provide tips and tricks on writing effective Map Reducers.
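
On the Combiner point, a commonly cited rule of thumb (stated here as background, not as a quote from the book) is that a function is safe to run as a combiner when applying it to partial groups of map output gives the same final answer as applying it once to everything, which in practice means it is commutative and associative. The sketch below is plain Java with no Hadoop dependency; the class name CombinerSketch, the method names and the two sample splits are illustrative assumptions. It shows max passing that check, a naive mean failing it, and a (sum, count) variant that repairs the mean.

```java
import java.util.List;

public class CombinerSketch {

    static int max(List<Integer> values) {
        int m = Integer.MIN_VALUE;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    static double mean(List<Integer> values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Combiner-style max: take the max of each split, then the max of those.
    static int maxWithCombiner(List<Integer> split1, List<Integer> split2) {
        return max(List.of(max(split1), max(split2)));
    }

    // Naive combiner-style mean: the mean of per-split means.
    // This is wrong in general because splits can have different sizes.
    static double naiveMeanWithCombiner(List<Integer> split1, List<Integer> split2) {
        return (mean(split1) + mean(split2)) / 2.0;
    }

    // Repaired mean: each split contributes a partial (sum, count), and the
    // division happens only once at the end.
    static double meanWithSumCount(List<Integer> split1, List<Integer> split2) {
        long sum = 0;
        long count = 0;
        for (List<Integer> split : List.of(split1, split2)) {
            for (int v : split) {
                sum += v;
            }
            count += split.size();
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical map outputs for one key, produced on two different nodes.
        List<Integer> split1 = List.of(0, 20, 10);
        List<Integer> split2 = List.of(25, 15);

        System.out.println(maxWithCombiner(split1, split2));       // 25, same as max over all five values
        System.out.println(naiveMeanWithCombiner(split1, split2)); // 15.0, but the true mean is 14.0
        System.out.println(meanWithSumCount(split1, split2));      // 14.0
    }
}
```

The takeaway for scenario-style questions is that a combiner is an optimization the framework may or may not apply, so the job's result should not depend on it.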

12.11.2012

Best Sellers Rank	#618,392 in Books; #28 in Parallel Computer Programming; #87 in Java Programming; #111 in Database Storage & Design
Customer Reviews	4.2 out of 5 stars (130)
Dimensions	7 x 1.5 x 9.19 inches
Edition	Third
ISBN-10	1449311520
ISBN-13	978-1449311520
Item Weight	2.32 pounds
Language	English
Print length	688 pages
Publication date	June 12, 2012
Publisher	Yahoo Press
