Question : All keys used for intermediate output from mappers must: 1. Implement a splittable compression algorithm. 2. Be a subclass of FileInputFormat. 3. Implement WritableComparable. 4. Override isSplitable. 5. Implement a comparator for speedy sorting.
Correct Answer : 3. Explanation: The MapReduce framework operates exclusively on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
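As a minimal sketch of such a key (the class name and fields are illustrative, not from the source), a custom type implementing WritableComparable satisfies both requirements: it can be serialized between the map and reduce phases, and the framework can sort it:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A custom key type usable as intermediate mapper output: it is
// serializable (Writable) and sortable (Comparable), as the framework requires.
public class IpTimestampKey implements WritableComparable<IpTimestampKey> {
    private String ip = "";     // hypothetical fields for illustration
    private long timestamp;

    public IpTimestampKey() {}  // Hadoop needs a no-arg constructor for deserialization

    public IpTimestampKey(String ip, long timestamp) {
        this.ip = ip;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(ip);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        ip = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(IpTimestampKey other) {
        int cmp = ip.compareTo(other.ip);
        return (cmp != 0) ? cmp : Long.compare(timestamp, other.timestamp);
    }

    // hashCode must be consistent with equals so that the default
    // HashPartitioner routes equal keys to the same reducer.
    @Override
    public int hashCode() {
        return ip.hashCode() * 31 + Long.hashCode(timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IpTimestampKey)) return false;
        IpTimestampKey k = (IpTimestampKey) o;
        return ip.equals(k.ip) && timestamp == k.timestamp;
    }
}
```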
Question : Which Hadoop component is responsible for managing the distributed file system metadata?
Correct Answer : The NameNode. Explanation: Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways: In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.
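As a sketch of how a client rides out a failover under such an Active/Passive pair (the nameservice "mycluster", the NameNode IDs "nn1"/"nn2", and the host names are illustrative placeholders; in a real deployment these properties live in hdfs-site.xml rather than in code):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        // Set programmatically here only to keep the sketch self-contained.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // The failover proxy provider lets the client discover which
        // NameNode is currently active and retry against the other one.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client addresses the logical nameservice, not a single host,
        // so a NameNode failover is transparent to it.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```

A graceful, administrator-initiated failover of the kind described above can then be triggered with `hdfs haadmin -failover nn1 nn2`.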
Question : You need to move a file titled "weblogs" into HDFS. When you try to copy the file, the copy fails. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
1. Increase the block size on all current files in HDFS. 2. Increase the block size on your remaining files. 3. […] 4. Increase the amount of memory for the NameNode. 5. Increase the number of disks (or size) for the NameNode.
Correct Answer : 4. Explanation: NameNode: the core metadata server of Hadoop. This is the most critical piece of the system and, prior to HDFS High Availability, there could only be one of these. It stores both the file system image and the file system journal. The NameNode keeps all of the filesystem layout information (files, blocks, directories, permissions, etc.) and the block locations. The filesystem layout is persisted on disk and the block locations are kept solely in memory. When a client opens a file, the NameNode tells the client the locations of all the blocks in the file; the client then no longer needs to communicate with the NameNode for data transfer.
NameNode sizing: we recommend at least 8GB of RAM (the minimum is 2GB), preferably 16GB or more. A rough rule of thumb is 1GB of heap per 100TB of raw disk space; the actual requirement is closer to 1GB per million objects (files, directories, and blocks). The CPU requirement is any modern multi-core server CPU; typically the NameNode will only use 2-5% of your CPU. As this is a single point of failure, the most important requirement is reliable hardware rather than high-performance hardware. We suggest a node with redundant power supplies and at least 2 hard drives.
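By the rule of thumb above, a namespace of 10 million files, directories, and blocks needs roughly 10GB of NameNode heap; once that heap is exhausted, new files cannot be created even though the DataNode disks have free space, which is why adding NameNode memory is the remedy. For reference, a minimal sketch of the copy itself via the Java FileSystem API (the paths are illustrative placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyWeblogs {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS from the classpath configuration (core-site.xml).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Local source and HDFS destination; both paths are illustrative.
        Path src = new Path("/var/log/weblogs");
        Path dst = new Path("/user/hadoop/weblogs");

        // Equivalent to: hdfs dfs -put /var/log/weblogs /user/hadoop/weblogs
        fs.copyFromLocalFile(src, dst);
        System.out.println("Copied " + src + " to " + dst);
        fs.close();
    }
}
```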
Question : How does Hadoop process large volumes of data? 1. Hadoop uses a lot of machines in parallel. This optimizes data processing. 2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware. 3. Hadoop ships the code to the data instead of sending the data to the code. 4. Hadoop uses sophisticated caching techniques on the NameNode to speed processing of data.
Question : What are sequence files and why are they important? 1. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance MapReduce jobs. 2. Sequence files are a type of file in the Hadoop framework that allows data to be sorted. 3. […] 4. All of the above.
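A minimal sketch of writing and then reading a block-compressed SequenceFile (the output path and record contents are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq");  // illustrative path

        // Write a few key/value records; block compression keeps the file
        // compact while leaving it splittable for MapReduce.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            for (int i = 0; i < 3; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        }

        // Read the records back in order.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```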