This blog describes how Hortonworks Sandbox helps in learning Hadoop quickly. Big Data analytics has to deal with very large-scale data-sets, and Apache Hadoop provides an open-source software framework for managing them. Apache Hadoop consists of four modules: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. It is quite difficult to learn Hadoop and understand these modules unless you refer to the right set of resources.
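To make the MapReduce module concrete, here is a minimal local Python sketch of the classic word-count idea: a map phase emits (word, 1) pairs and a reduce phase sums them per word. This runs in plain Python, not on Hadoop itself; the function names and sample lines are illustrative only.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for each word in the line
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: sum the counts for each word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Hadoop stores data in HDFS", "MapReduce processes data in parallel"]
pairs = [p for line in lines for p in mapper(line)]
print(reducer(pairs)["data"])  # -> 2
```

On a real cluster, Hadoop distributes the map and reduce work across nodes and handles shuffling the intermediate pairs; the logic per record is the same shape as above.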
I used Hortonworks Sandbox to do a quick prototype of how Hadoop works and found it to be quite useful and easy. A lot of training material and videos are also available on their site. For the installation:
- Go to the Hortonworks download page
- The installation size is 2.5GB; the minimum RAM requirement is 4GB (8GB preferred)
- Windows XP/7/8 or Mac OS X with virtualization enabled in the BIOS. Supported browsers: Chrome 25+, IE 9+ (except IE 10), and Safari 6+
- A virtualization product, preferably VirtualBox, needs to be installed first. The Sandbox image is packaged with CentOS Linux. After download, this image can be loaded into VirtualBox and is ready to use.
- Log in with the user-id root and the password hadoop
- When all the components are loaded and configuration is complete, the screen shows the IP address and port number, e.g. http://127.0.0.1:8888 . Enter this address in the browser and you will get the GUI, which lists all the Hadoop components and projects.
- Pick one Hadoop example project from the tutorials and follow the instructions provided to do the prototyping.
- The final result of the prototyping can be exported to Excel using an ODBC connection.
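As a rough illustration of the ODBC step, the snippet below builds a connection string for the Hortonworks Hive ODBC driver. The driver name, host, and port here are assumptions that depend on your local driver installation; with a driver installed, `pyodbc.connect(conn_str)` would open the connection, and Excel can then pull the query results through its ODBC data source.

```python
# Sketch only: driver name, host, and port are assumptions and must
# match your installed Hortonworks Hive ODBC driver configuration.
params = {
    "DRIVER": "{Hortonworks Hive ODBC Driver}",
    "HOST": "127.0.0.1",
    "PORT": "10000",          # commonly the HiveServer port (assumed)
    "HiveServerType": "2",
}
conn_str = ";".join(f"{k}={v}" for k, v in params.items())
print(conn_str)
```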