Question : Place the following steps in order of execution for the MapR Direct Shuffle workflow in YARN.
A. After that, the Node Manager on each node launches containers using information about the node's local volume from the LocalVolumeAuxiliaryService.
B. The Application Master service initializes the application by calling initializeApplication() on the LocalVolumeAuxiliaryService.
C. The Application Master service requests task containers from the Resource Manager.
D. Then the Resource Manager sends the Application Master information that the Application Master uses to request containers from the Node Manager.
Correct Answer : B, C, D, A
Explanation: The MapR Direct Shuffle workflow in YARN proceeds as follows:
1. The Application Master service initializes the application by calling initializeApplication() on the LocalVolumeAuxiliaryService.
2. The Application Master service requests task containers from the Resource Manager.
3. The Resource Manager sends the Application Master information that the Application Master uses to request containers from the Node Manager.
4. The Node Manager on each node launches containers using information about the node's local volume from the LocalVolumeAuxiliaryService.
5. Data from map tasks is saved in the Application Master for later use in Task Completion events, which are requested by reduce tasks.
6. As map tasks complete, map outputs and map-side spills are written to the local volumes on the map task nodes, generating Task Completion events.
7. Reduce tasks fetch Task Completion events from the Application Master. The Task Completion events include information on the location of map output data, enabling reduce tasks to copy data from the map output locations.
8. Reduce tasks read the map output information; spills and interim merges are written to local volumes on the reduce task nodes.
9. Finally, the Application Master calls stopApplication() on the LocalVolumeAuxiliaryService to clean up data on the local volume.
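Because the shuffle data lands on MapR-FS local volumes rather than on raw local disk, one way to inspect those volumes on a running cluster is a sketch like the following (the grep pattern is an assumption; local volume names vary by node and release):
maprcli volume list -columns volumename,mountdir | grep local
# node-local shuffle volumes typically appear with names such as mapr.<hostname>.local.mapred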
Question : You have Hadoop ecosystem components that use MapReduce under the hood. How can you define which version of MapReduce to use, either classic or YARN?
1. Set the default_mode parameter in the /opt/mapr/conf/hadoop_version configuration file of each component.
2. While submitting the job, use the maprcli command and set the mode argument.
Correct Answer :
Explanation: You can set the MapReduce mode with an environment variable on a client or cluster node. The MapReduce mode set in the environment variable overrides the default_mode set on the client node and on the cluster. Using this method, you can open multiple terminals and set each shell to use a different mode; therefore, you can run both MapReduce v1 and MapReduce v2 jobs from the same client machine at the same time. For information on how to set the environment variable for an ecosystem service, see Managing the MapReduce Mode for Ecosystem Components. To set the MapReduce mode with an environment variable, open a terminal on the client node and enter one of the following commands in the shell:
export MAPR_MAPREDUCE_MODE=yarn
export MAPR_MAPREDUCE_MODE=classic
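For example, a session that submits the same example job in each mode might look like the following sketch (the example jar paths and the input/output directories are illustrative, not authoritative):
export MAPR_MAPREDUCE_MODE=yarn      # this shell now submits MRv2 applications
hadoop jar /opt/mapr/hadoop/hadoop-*/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/mapr/in /user/mapr/out-yarn
export MAPR_MAPREDUCE_MODE=classic   # the same shell now submits MRv1 jobs instead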
You can use the maprcli command line or the MapR Control System (MCS) to change the MapReduce mode for the entire cluster. The maprcli command line and the MCS use Central Configuration to update all the nodes in the cluster with the MapReduce mode. To set the cluster MapReduce mode using maprcli, run the following command:
maprcli cluster mapreduce set -mode <classic|yarn>
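A minimal sketch of setting and then verifying the cluster-wide mode (the get subcommand is assumed to be available, as on MapR 4.x):
maprcli cluster mapreduce set -mode yarn
maprcli cluster mapreduce get      # prints the current cluster-wide MapReduce mode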
You can edit the hadoop_version file on a MapR client node to configure the default MapReduce mode for all jobs and applications that you submit from the client node. The mode that you set on the client overrides the MapReduce mode of the cluster. For information about how this impacts ecosystem clients, see Managing the MapReduce Mode for Ecosystem Components. To set the MapReduce mode on the client node, open the hadoop_version file in /opt/mapr/conf/ and edit the default_mode parameter. The following values are valid for the default_mode parameter:
default_mode=classic specifies that the client submits MapReduce v1 jobs to the cluster.
default_mode=yarn specifies that the client submits MapReduce v2 applications to the cluster.
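As a sketch, a hadoop_version file on a client configured for YARN might look like this; default_mode is the parameter described above, while the other keys and the version numbers are illustrative assumptions:
$ cat /opt/mapr/conf/hadoop_version
classic_version=0.20.2
yarn_version=2.4.1
default_mode=yarn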
You can set the MapReduce mode in the command line to override the MapReduce mode set in an environment variable, and the default mode of the client node and the cluster. Using this method, you can run MapReduce v1 and MapReduce v2 jobs one after the other from the same shell.
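On MapR 4.x clients this per-command override is exposed through mode-specific launcher scripts; the names below (hadoop1 for classic, hadoop2 for YARN), the jar, and the driver class are assumptions to verify against your client version:
hadoop2 jar my-app.jar com.example.MyDriver /in /out   # submit this one job as a YARN (MRv2) application
hadoop1 jar my-app.jar com.example.MyDriver /in /out   # submit this one job as a classic MRv1 job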
To set the MapReduce mode for ecosystem components that connect directly to the cluster: Open a terminal on the client node. Enter one of the following commands in the shell:
export MAPR_MAPREDUCE_MODE=yarn
export MAPR_MAPREDUCE_MODE=classic
Then launch the ecosystem client and submit the job or application.
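For instance, to run a Hive query whose underlying MapReduce work should run on YARN (the table name is hypothetical):
export MAPR_MAPREDUCE_MODE=yarn
hive -e "SELECT COUNT(*) FROM web_logs;"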
Question : In a YARN cluster, which of the following components can you use to monitor your job?
Correct Answer :
Explanation: The history server REST APIs allow the user to get status on finished applications. YARN has a single MapReduce JobHistory Server, which typically runs on a master node. As indicated by its name, the function of the MapReduce JobHistory Server is to store and serve a history of the MapReduce jobs that were run on the cluster.
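For example, the JobHistory Server REST API exposes finished jobs under /ws/v1/history; a quick check with a placeholder hostname:
curl http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs
# returns a JSON list of finished MapReduce jobs (job id, state, user, start and finish times)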
You need only one JobHistory Server. It can run on any node you like, including a dedicated node of its own, but it traditionally runs on the same node as the ResourceManager. The one history server is declared in mapred-site.xml:
mapreduce.jobhistory.address: MapReduce JobHistory Server host:port. Default port is 10020.
mapreduce.jobhistory.webapp.address: MapReduce JobHistory Server Web UI host:port. Default port is 19888.
mapreduce.jobhistory.intermediate-done-dir: Directory where history files are written by MapReduce jobs (in HDFS). Default is /mr-history/tmp.
mapreduce.jobhistory.done-dir: Directory where history files are managed by the MR JobHistory Server (in HDFS). Default is /mr-history/done.
You access the history via the JobHistory Server REST API; you do not access the internal history files directly. For casual browsing, the history is also available in the ResourceManager web UI.
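As an illustration, the two address properties above could be declared in mapred-site.xml as in this sketch (the entries go inside the <configuration> element; the hostname is a placeholder):
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver.example.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver.example.com:19888</value>
</property>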