Question : Which of the following skills does a data scientist require?
A. Web designing to best present the visual results of algorithms
B. Creativity
C. Good programming skills
D. Strong mathematics and statistics skills
E. Database administration skills
1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E
Correct Answer : 2 Explanation: A data scientist needs a combination of skills. To solve complex problems he or she must be creative, able to find new solutions, and able to make use of existing data. Good programming skills are also required; tools currently in use include SAS, R, Python, Spark, Java and SPSS, with new technologies appearing all the time. Applying existing and new algorithms through machine learning or AI requires strong mathematics and statistics skills (often a weak point for programmers). A further useful skill is working with visualization tools such as Qlik and Tableau.
Question : Which of the following steps would you perform in the discovery phase?
A. Identify the data sources for the project
B. Analyze the raw data and its format and structure
C. Determine which tools are required for the project
D. Determine the network capacity required
E. Determine the Unix server capacity required
Correct Answer : 5 Explanation: During the discovery phase you need to determine, as early as possible, how many resources are required. For this you can involve various stakeholders such as the software engineering team, DBAs, network engineers and system administrators, and establish whether those resources are already available or need to be procured. You also need to identify the sources of the data and the tools and software required to execute the project.
Question : Which of the following tools can be used to load and clean huge volumes of data?
1. Spark GraphX
2. Cloudera Knox
3. Apache Hadoop
4. Oracle MySQL
5. Qlik
Correct Answer : 3 Explanation: Hadoop can transform data into the required format through custom MapReduce code. You can also use Sqoop (SQL-to-Hadoop) to load data from an RDBMS into HDFS, and Flume to load file-based data; under the hood, Sqoop runs MapReduce jobs to copy the data from the source system into HDFS. Transformation logic can be applied during the data load itself, so cleaning can happen as part of loading (see the sketch below).
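To illustrate applying transformation logic during the load, here is a minimal sketch of a map-only MapReduce job that cleans comma-separated records as they are written back to HDFS. The class names, the assumed five-column record layout and the input/output paths are hypothetical examples, not part of the question.

```java
// Minimal sketch: a map-only MapReduce job that cleans CSV records during loading.
// The expected column count and the paths passed on the command line are assumptions.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanCsvJob {

    // The mapper drops malformed rows and trims each field; with no reducer,
    // the cleaned records are written straight back to HDFS.
    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int EXPECTED_COLUMNS = 5; // assumed record layout

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_COLUMNS) {
                return; // skip malformed record
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    cleaned.append(',');
                }
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-csv-load");
        job.setJarByClass(CleanCsvJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: transform while loading
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw data directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // cleaned data directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

For RDBMS sources the same load can often be done without custom code, for example with sqoop import --connect jdbc:mysql://dbhost/sales --table customers --target-dir /data/customers, where the connection details are illustrative only.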