Question : You are working with a social networking company that has a huge volume of unstructured and semi-structured data such as video files, audio files, images, HTML pages and PDFs. Which of the following technologies would be suitable for storing this data and keeping it available online in a highly available manner?
1. Spark
2. RDBMS
3. Hadoop
4. IBM Netezza
Correct Answer : 3 Explanation: Hadoop's HDFS can store very large volumes of files in any format, including unstructured and semi-structured data, and it keeps that data highly available by replicating each block across multiple DataNodes.
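For illustration, here is a minimal sketch of storing such a file in HDFS with the Java FileSystem API; the NameNode URI (hdfs://namenode:8020), the local source file and the target path are assumed placeholder values, not part of the question.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsStoreExample {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode address; normally taken from core-site.xml (fs.defaultFS)
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
            // Copy an unstructured file (e.g. a video) from local disk into HDFS
            Path target = new Path("/data/media/sample-video.mp4");
            fs.copyFromLocalFile(new Path("sample-video.mp4"), target);

            // Keep 3 replicas of every block so the file stays readable
            // even if individual DataNodes fail (high availability of the data)
            fs.setReplication(target, (short) 3);
        }
    }
}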
Question : You are working as a Big Data Solution Architect. You have implemented a Hadoop-based solution in your organization, and it has been working fine for the last few years. Now the user base is suddenly increasing, and new requirements are coming in for tools such as Flume and Spark Streaming. How would you evolve this solution? Storage is not an issue with the cluster.
1. Increase the number of DataNodes in the Hadoop cluster
2. Start using Oozie workflows for your existing and new jobs
3. Implement YARN to decouple MapReduce and resource management
4. Integrate an RDBMS-based system to manage new resources in the Hadoop cluster
Correct Answer : 3 Explanation: As the architectural center of Hadoop, YARN enhances a Hadoop compute cluster in the following ways:
Multi-tenancy : YARN allows multiple access engines (either open-source or proprietary) to use Hadoop as the common standard for batch, interactive and real-time engines that can simultaneously access the same data set.
Multi-tenant data processing improves an enterprise's return on its Hadoop investments.
Cluster utilization : YARN's dynamic allocation of cluster resources improves utilization over the more static MapReduce rules used in early versions of Hadoop.
Scalability : Data center processing power continues to expand rapidly. YARN's ResourceManager focuses exclusively on scheduling and keeps pace as clusters expand to thousands of nodes managing petabytes of data.
Compatibility : Existing MapReduce applications developed for Hadoop 1 can run on YARN without any disruption to processes that already work.
The given problem is primarily one of cluster utilization: implementing YARN decouples resource management from MapReduce, so new engines such as Spark Streaming can share the same cluster and data alongside the existing jobs.
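To make the cluster-utilization point concrete, the sketch below asks the ResourceManager for per-node capacity and current usage through the YarnClient API. It assumes the hadoop-yarn-client dependency is available and that yarn-site.xml is on the classpath; it is an illustration, not part of the original explanation.

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterUtilizationReport {
    public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration()); // reads yarn-site.xml from the classpath
        client.start();

        // One line per live NodeManager: total capacity versus resources in use,
        // i.e. the utilization that YARN's dynamic allocation is meant to improve
        for (NodeReport node : client.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId()
                    + " capacity=" + node.getCapability()
                    + " used=" + node.getUsed());
        }
        client.stop();
    }
}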
Question : Which of the following is NOT a collaboration of various organizations to drive innovation and standardization across big data technologies?
1. Fair Scheduler allows assigning guaranteed minimum shares to queues
2. If a queue does not need its full guaranteed share, the excess will not be split between other running apps
3. It is also possible to limit the number of running apps per user and per queue
4. 1 and 3
5. 1, 2 and 3
1. Open a remote terminal to the node running the ApplicationMaster and kill the JVM.
2. yarn application -kill "application_id"
3. Use CTRL-C from the terminal where the MapReduce job was started.
4. hadoop datanode -rollback
5. rmadmin -refreshQueues
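Among these options, yarn application -kill "application_id" (option 2) is the documented command-line way to stop a running YARN application. As a hedged illustration, the sketch below does the same thing programmatically through the YarnClient API; the application id is a placeholder, and ApplicationId.fromString requires Hadoop 2.8 or later.

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class KillYarnApplication {
    public static void main(String[] args) throws Exception {
        // Placeholder id; in practice pass the real id, e.g. taken from `yarn application -list`
        ApplicationId appId = ApplicationId.fromString("application_1700000000000_0001");

        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();
        client.killApplication(appId); // equivalent to: yarn application -kill <application_id>
        client.stop();
    }
}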