Question : Select the correct statement regarding the reducer
1. The number of reducers is defined as part of the Job Configuration 2. All values of the same key can be processed by multiple reducers 3. … 4. 1, 2 and 3 are correct 5. 1 and 3 are correct
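Option 1 refers to setting the reducer count in the job configuration. As a rough sketch, assuming a Hadoop 2.x cluster and a driver run through ToolRunner (both assumptions; the jar and class names below are placeholders), the count can be supplied on the command line:
hadoop jar myjob.jar com.example.MyDriver -D mapreduce.job.reduces=10 /input /output
Note that in standard MapReduce all values for a given key are delivered to a single reducer.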
Question : Distributing the values associated with each key, in sorted order, to the reducer is known as? 1. Map and Reduce 2. Shuffle and Sort 3. … 4. None of the above
Ans : 1 Exp : When exporting a table from Hive, the data file uses the delimiters from the table. Because table3 wasn't created with specific delimiters, it uses the default Hive delimiter, which is \001 (Control-A). When the file is imported into R as a CSV, there will be only one column, because the file isn't actually comma-delimited.
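One way to see the default delimiter for yourself is to inspect the raw bytes of the exported file; a minimal sketch, assuming the default warehouse location and a part file named 000000_0 (both assumptions):
hadoop fs -get /user/hive/warehouse/table3/000000_0 table3.txt
od -c table3.txt | head
The Control-A field delimiter (octal 001) shows up in the od output rather than a comma.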
Question : You use Sqoop to import a table from your RDBMS into HDFS. You know that Sqoop typically instantiates four Mappers. However, after the table import, you notice that five Mappers have run, there are five output files in HDFS, and one of the output files is empty. Why? 1. The administrator has set the sqoop.num.maps property on the slave nodes to 7 2. Some Map tasks failed and had to be rerun 3. The table had no evenly distributed numeric primary key, so Sqoop made a best-guess split of the data and one of the resulting splits contained no rows 4. The HDFS block size was set to a very small value, resulting in more Mappers than usual running 5. The table was modified by a user of the RDBMS as Sqoop was running
Ans : 3 Exp : If some Map task attempts failed, they would be rerun but no data from the failed task attempts would be stored on disk. There is no sqoop.num.maps property. Sqoop typically reads the table in a single transaction, so modifying the data would have no effect; and the HDFS block size is irrelevant to the number of files created. The correct answer is that by default, Sqoop uses the table's primary key to determine how to split the data. If there is no numeric primary key, Sqoop will make a best-guess attempt at how the data is distributed, and may run more than its default four Mappers, although some may end up not actually reading any data.
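When the default split behaviour produces uneven or empty splits, the split column and the Mapper count can be set explicitly; a minimal sketch, with the connection string, table, credentials and column name as placeholders:
sqoop import --connect jdbc:mysql://dbhost/mydb --table mytable --username user --password pass --split-by id --num-mappers 4
--split-by chooses the column used to partition the import, and --num-mappers (or -m) fixes the number of parallel Map tasks.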
Question : Using Apache Sqoop, you can import data into
Ans : 5 Exp : Apache Sqoop can be used to import data from any relational DB into HDFS, Hive or HBase. To import data into HDFS, use the sqoop import command and specify the relational DB table and connection parameters:
sqoop import --connect "JDBC connection string" --table "tablename" --username "username" --password "password"
This will import the data and store it as a comma-delimited text file in a directory in HDFS. To import data into Hive, use the sqoop import command and add the --hive-import option:
sqoop import --connect "JDBC connection string" --table "tablename" --username "username" --password "password" --hive-import
This will import the data into a Hive table with the appropriate data types for each column.
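The explanation also mentions HBase as a target; a minimal sketch of that variant, with the HBase table and column family names below (hbase_table, cf) chosen only for illustration:
sqoop import --connect "JDBC connection string" --table "tablename" --username "username" --password "password" --hbase-table hbase_table --column-family cf --hbase-create-table
--hbase-create-table asks Sqoop to create the target HBase table if it does not already exist.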
Question : You decide to use Hive to process data in HDFS. You have not created any Hive tables until now. Hive is configured with its default settings. You run the following commands from the Hive shell:
Ans : 2 Exp : When you create a database named HADOOPEXAM in Hive, Hive creates a subdirectory of its warehouse directory named HADOOPEXAM.db. All tables are placed in subdirectories of HADOOPEXAM.db; each subdirectory is named after its table.
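A minimal sketch of how this can be checked from the shell, assuming the default warehouse location /user/hive/warehouse (an assumption; it is set by hive.metastore.warehouse.dir) and a throwaway database and table created here only for illustration:
hive -e "CREATE DATABASE hadoopexam; CREATE TABLE hadoopexam.users (id INT, name STRING);"
hadoop fs -ls /user/hive/warehouse/hadoopexam.db
The listing shows one subdirectory per table, each named after its table.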
Question : For HadoopExam.com user profiles you need to analyze a large number of small JPEG files. Each file is no more than 3 kB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to group the files into a single archive. The toolkit that will be used to process the files is written in Ruby and requires administrator privileges to run. Which of the following file formats should you select to build your archive?
Exp : The two formats best suited to merging small files into larger archives for processing in Hadoop are Avro and SequenceFiles. Avro has Ruby bindings; SequenceFiles are only supported in Java.
JSON, TIFF, and MPEG are not appropriate formats for archives. JSON is also not an appropriate format for image data.
Question : SequenceFiles are flat files consisting of binary key/value pairs. SequenceFile provides Writer, Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively. There are three SequenceFile Writers, based on the SequenceFile.CompressionType used to compress key/value pairs. You have created a SequenceFile (MAIN.PROFILE.log) with custom key and value types. What command displays the contents of a SequenceFile named MAIN.PROFILE.log in your terminal in human-readable format?
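The command typically used for this is hadoop fs -text, which decompresses SequenceFile records and prints them using the key and value classes' toString() methods; a minimal sketch, assuming the custom Writable classes are packaged in a jar (the jar path below is a placeholder) and made visible via HADOOP_CLASSPATH:
export HADOOP_CLASSPATH=/path/to/custom-types.jar
hadoop fs -text MAIN.PROFILE.log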
1. Disable speculative execution for the data insert job 2. Enable speculative execution for the data insert job 3. … 4. Configure only a single mapper for the data insert job
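These options concern speculative execution; a minimal sketch of disabling it for a single job from the command line, assuming Hadoop 2.x property names and a ToolRunner-based driver (both assumptions; the jar and class names are placeholders):
hadoop jar myjob.jar com.example.MyDriver -D mapreduce.map.speculative=false -D mapreduce.reduce.speculative=false /input /output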