Question : You are working as the chief data scientist at Arinika Inc, a market research company. You have a team of data scientists who know Python and Machine Learning. Which tool can your team use with their existing skill sets, so that the learning curve is reduced? 1. Big Insight
Correct Answer : Apache Spark. Explanation: Apache Spark is the open standard for flexible in-memory data processing that enables batch, real-time, and advanced analytics on the Apache Hadoop platform.
Simple, yet rich, APIs for Java, Scala, and Python open up data for interactive discovery and iterative development of applications. Through shared common code, data scientists and developers can increase productivity with rapid prototyping for batch and streaming applications, using the language and third-party tools on which they already rely.
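As an illustration of this rapid prototyping in Python, here is a minimal PySpark sketch; the file name and column name are hypothetical, not taken from the question:

from pyspark.sql import SparkSession

# Start a local Spark session (assumes PySpark is installed)
spark = SparkSession.builder.appName("rapid-prototype").getOrCreate()

# Load a hypothetical CSV of survey responses into a DataFrame
df = spark.read.csv("survey_responses.csv", header=True, inferSchema=True)

# Interactive discovery: aggregate and inspect in a few lines of Python
df.groupBy("region").count().show()

spark.stop()

The same DataFrame code runs unchanged in batch jobs or interactive shells, which is the productivity point the explanation is making.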
Question : You are working as Chief Data Architect at a media company where, every day, billions of viewers watch its media programs (news, movies, etc.). You have a well-established setup through which you continuously receive customer behaviour data, such as viewing habits and peak usage. Your company has various advertisers, so you need to segment the customer data together with public data, such as voter registration records, so that more accurately targeted campaigns for specific demographics can be launched. Which technologies will be required in such a scenario?
Correct Answer : SPSS Statistics and IBM PureData System for Analytics. Explanation: SPSS Statistics has several statistical algorithms for creating segmentation:
TwoStep, K-Means, Hierarchical, Tree, Discriminant, and Nearest Neighbor.
These are the most commonly used clustering and classification algorithms. You can also add a neural network to that list, but in SPSS Statistics that algorithm is listed separately. Each of these algorithms has strengths and weaknesses, depending on the amount of data you have, the type or characteristics of the variables, and your end purpose in classifying the data. This explanation concentrates on two of the algorithms: K-Means and Tree (Tree in this case is more broadly known as Decision Trees).
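As a minimal sketch of K-Means segmentation in Python (using scikit-learn rather than SPSS; the viewer features and segment count are hypothetical):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical viewer features: [hours watched per week, peak-hour share]
X = np.array([[2.0, 0.10], [3.5, 0.20], [20.0, 0.80],
              [18.0, 0.90], [1.0, 0.05], [22.0, 0.85]])

# Scale features so neither one dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Cluster viewers into two segments (e.g. casual vs. heavy viewers)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_)  # segment assignment for each viewer

In practice the number of clusters is itself a modelling decision, which is exactly the kind of strength/weakness trade-off the explanation refers to.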
IBM PureData System for Analytics is a purpose-built, standards-based data warehouse and analytics appliance that integrates database, server, storage and analytics into an easy-to-manage system. It is designed for high-speed analysis of big data volumes, scaling into the petabytes.
Question : You have decided to store data so that it takes less space and is efficiently processed by the Hadoop framework. Hence, you decided to go with the Parquet data format. You still want to compress this Parquet data; which is the default codec for compressing Parquet data? 1. Gzip
Correct Answer : Snappy. Explanation: The supported compression types for Parquet are UNCOMPRESSED, GZIP, and SNAPPY, and Snappy is the default.
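As a sketch, assuming a PySpark environment, the Parquet codec can also be set explicitly when writing (the DataFrame contents and output path are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-compression").getOrCreate()

# Snappy is already the default codec for Parquet; setting it makes the choice explicit
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

# Hypothetical DataFrame written out as Snappy-compressed Parquet
df = spark.createDataFrame([(1, "news"), (2, "movies")], ["id", "genre"])
df.write.mode("overwrite").parquet("/tmp/viewership_parquet")

spark.stop()

Snappy trades a slightly larger file size than Gzip for much faster compression and decompression, which suits Hadoop-scale processing.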