Most scripting languages (PHP, Python, Perl, Ruby, Bash) work well. Any language that can read from stdin, write to stdout, and parse tab and newline characters will do: Hadoop Streaming simply pipes the string representations of key-value pairs, concatenated with a tab, to an arbitrary program, which must be executable on each TaskTracker node.
Java is, of course, supported natively.
Note, however, that heartbeats from child nodes will not be sent to the parent nodes when scripting languages are used.
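As a minimal sketch of the stdin/stdout protocol described above, here is a hypothetical Hadoop Streaming word-count mapper in Python (the script and its logic are illustrative, not part of the Hadoop distribution):

```python
#!/usr/bin/env python3
# Minimal word-count mapper for Hadoop Streaming.
# Streaming feeds input lines on stdin; we emit "key<TAB>value" on stdout.
import sys

def map_lines(lines):
    """Yield (word, 1) pairs for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

if __name__ == "__main__":
    for key, value in map_lines(sys.stdin):
        # Hadoop Streaming splits each output line at the first tab:
        # everything before it is the key, everything after is the value.
        print(f"{key}\t{value}")
```

Such a script would be passed to the streaming jar as the `-mapper` argument and must be marked executable on every node, as noted above.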
Question : How does Hadoop process large volumes of data?
1. Hadoop uses a lot of machines in parallel. This optimizes data processing.
2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Hadoop uses sophisticated caching techniques on the namenode to speed up the processing of data.
A basic design principle of Hadoop is to eliminate data copying between datanodes: the computation is moved to where the data resides rather than the other way around.
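The idea of processing partitions in parallel can be illustrated with a single-machine analogy in Python; this is only a sketch of the principle, not how Hadoop itself is implemented (Hadoop distributes the partitions across machines rather than across local processes):

```python
# Single-machine analogy of Hadoop's approach: split the data into
# partitions and let independent workers process each partition in
# parallel, combining only the small per-partition results.
from multiprocessing import Pool

def count_words(partition):
    """Process one partition locally; no bulk data moves between workers."""
    return sum(len(line.split()) for line in partition)

def total_words(partitions):
    """Fan partitions out to a pool of workers, then combine the counts."""
    with Pool() as pool:
        return sum(pool.map(count_words, partitions))

if __name__ == "__main__":
    data = [["big data on node one"], ["more lines", "on node two"]]
    print(total_words(data))
```

Only the tiny per-partition counts travel back to be combined, which mirrors why shipping computation to data scales better than shipping data to computation.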
Refer HadoopExam.com Recorded Training Module : 2 and 3
Question : What are sequence files and why are they important?
1. Sequence files are binary-format files that are compressed and splittable. They are often used in high-performance MapReduce jobs.
2. Sequence files are a type of file in the Hadoop framework that allow data to be sorted.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
Explanation: Hadoop is able to split data between different nodes gracefully while keeping it compressed. Sequence files contain special sync markers that allow the data to be split across the entire cluster.
The sequence file format supported by Hadoop breaks a file into blocks and then, optionally, compresses the blocks in a splittable way.
It is also worth noting that, internally, the temporary outputs of maps are stored using SequenceFile. SequenceFile provides Writer, Reader, and Sorter classes for writing, reading, and sorting, respectively.
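The role of the sync markers mentioned above can be illustrated with a deliberately simplified, hypothetical format in Python. This is not the real SequenceFile binary layout; it only shows the resynchronization idea: a reader that starts at an arbitrary byte offset scans forward to the next marker and resumes on a record boundary.

```python
# Simplified illustration of sync-marker-based splitting.
# NOT the real SequenceFile layout; just the resynchronization idea.
SYNC = b"\x00SYNC\x00"  # hypothetical marker, assumed unique in the stream

def write_records(records):
    """Serialize records, separating each with a sync marker."""
    return SYNC.join(r.encode() for r in records)

def read_from_offset(data, offset):
    """Start at an arbitrary offset, skip to the next sync marker,
    then read whole records - this is what lets a split begin mid-file."""
    idx = data.find(SYNC, offset)
    if idx == -1:
        return []
    return [r.decode() for r in data[idx + len(SYNC):].split(SYNC)]

data = write_records(["rec1", "rec2", "rec3"])
# A reader assigned a split starting at byte 2 lands inside "rec1",
# skips forward to the next marker, and reads the remaining records.
print(read_from_offset(data, 2))  # -> ['rec2', 'rec3']
```

The real SequenceFile interleaves a 16-byte sync marker between record blocks for the same purpose, which is why the format remains splittable even when the blocks are compressed.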