Correct Answer : Explanation: You can use streaming either for rapid prototyping using sed/awk, or for full-blown MapReduce deployments. Note that the streaming feature does not include C++ programs; these are supported through a similar feature called Pipes.
Be aware that streaming may introduce a performance penalty: the framework still creates JVMs for tasks, and scripted programs may run more slowly. Streaming may also improve performance in some cases; for example, code implementing the map and reduce functions may perform better than the equivalent Java.
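As an illustration of the streaming model described above, a minimal word-count mapper might look like the following. This is only a sketch: the script name wordcount_mapper.py and the word-count task itself are assumptions for illustration, not part of the question.

#!/usr/bin/env python
# wordcount_mapper.py - a minimal, hypothetical Hadoop Streaming mapper.
# The PipeMap task feeds each input record to this script on standard input;
# the script emits tab-separated key-value pairs on standard output.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit "word<TAB>1"; the framework will sort and shuffle these
        # pairs to the reducers.
        print("%s\t%d" % (word, 1))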
Question : Put the following in order of processing (Hadoop Streaming):
A. The PipeMap task processes input from your input files/directories and passes it to your script as standard input.
B. The map function processes key-value pairs one record at a time in an input split (just as in a normal MapReduce job).
C. You write your output to standard output, which is wired into the standard input of the PipeMap task.
D. The PipeMap task then processes intermediate results from your map function, and the Hadoop framework sorts and shuffles the data to the reducers.
E. PipeReduce sends these intermediate results to its standard out, which is wired to the standard input of your reduce script.
F. After your reduce script processes a record from standard input, it may write to its standard output (which is wired to the PipeReduce standard input).
G. The PipeReduce program then collects all the output and writes it to the output directory.
Correct Answer : A, B, C, D, E, F, G. Explanation: The PipeMap task processes input from your input files/directories and passes it to your script as standard input. Your map function processes key-value pairs one record at a time in an input split (just as in a normal MapReduce job). You write your output to standard output, which is wired into the standard input of the PipeMap task. The PipeMap task then processes intermediate results from your map function, and the Hadoop framework sorts and shuffles the data to the reducers.
The same data flow mechanism occurs on the reduce side. PipeReduce sends these intermediate results to its standard out, which is wired to the standard input of your reduce script. After your reduce script processes a record from standard input, it may write to its standard output (which is wired to the PipeReduce standard input). The PipeReduce program will then collect all the output and write it to the output directory.
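To make the reduce-side flow concrete, here is a matching reducer sketch (again a hypothetical Python script, wordcount_reducer.py, paired with the mapper above). Because the framework sorts the intermediate data before it reaches PipeReduce, all records for a given key arrive as consecutive lines on the script's standard input.

#!/usr/bin/env python
# wordcount_reducer.py - a minimal, hypothetical Hadoop Streaming reducer.
# PipeReduce writes the sorted intermediate records to this script's standard
# input; whatever the script prints to standard output is collected by
# PipeReduce and written to the job's output directory.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Such a job is typically submitted with the streaming jar, for example: hadoop jar hadoop-streaming-*.jar -input <input dir> -output <output dir> -mapper wordcount_mapper.py -reducer wordcount_reducer.py -file wordcount_mapper.py -file wordcount_reducer.py (the jar path and available options vary by Hadoop version and distribution).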
1. Pig is more powerful and allows certain types of data manipulation not possible with MapReduce.
2. Pig has the same capabilities as MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
4. Pig provides the additional capability of letting you control the flow of multiple MapReduce jobs and chain MapReduce jobs, which is not possible with MapReduce alone.
1. Input file splits may cross line breaks. A line that crosses file splits is ignored.
2. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
4. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
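For intuition about how the standard TextInputFormat behavior resolves a line that crosses a split boundary, here is a small self-contained simulation (plain Python, not Hadoop code; the sample data and split boundaries are made up). It mimics the LineRecordReader rule: a reader whose split does not start at byte 0 first skips ahead to the next full line boundary, so a line that straddles a split boundary is read exactly once, by the reader of the split that contains the beginning of that line.

data = b"alpha\nbravo charlie\ndelta\n"        # newline-terminated sample records
splits = [(0, 10), (10, 20), (20, len(data))]  # made-up byte ranges

def read_split(data, start, end):
    """Return the lines attributed to the split [start, end)."""
    pos = start
    if start != 0:
        # Like LineRecordReader: back up one byte and discard everything up to
        # the next newline, so a line that merely *begins* at the split start
        # is kept, while a partial line is left to the previous split's reader.
        pos = data.index(b"\n", start - 1) + 1
    lines = []
    while pos < end:
        nl = data.index(b"\n", pos)
        # Read to the end of the line even if it extends past the split end.
        lines.append(data[pos:nl].decode())
        pos = nl + 1
    return lines

for start, end in splits:
    print((start, end), read_split(data, start, end))

# The line "bravo charlie" straddles byte 10 but is read only by the first
# split, whose range contains the start of the line; the second split yields
# nothing and the third reads "delta".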
1. Increase the parameter that controls the minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
4. Write a custom FileInputFormat and override the method isSplitable to always return false.
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.