Question : A Data Scientist is assigned to build a model from a reporting data warehouse. The warehouse contains data collected from many sources and transformed through a complex, multi-stage ETL process. What is a concern the data scientist should have about the data?
1. It is too processed 2. It is not structured 3. It is not normalized 4. It is too centralized
Correct Answer : 1 Explanation: Prior to conducting data analysis, the required data must be collected and processed to extract the useful information. The degree of initial processing and data preparation depends on the volume of data, as well as how straightforward it is to understand the structure of the data. Highly processed data may lose some important information: each stage of a complex, multi-stage ETL process can filter, aggregate, or transform away details that a model might need.
Question : Which word or phrase completes the statement? Emphasis color is to standard color as _______ .
1. Main message is to key findings 2. Frequent item set is to item 3. Main message is to context 4. Pie chart is to proportions
Correct Answer : 3 Explanation: Our brains are compelled to find meaning, whether it is intended or not. Because the eyes are attracted to bright and high-contrast colors, viewers will derive meaning from something that stands out. When you use color for emphasis, it's like shouting that this object or element has the greatest value. At the Lynda.com site, for example, bright yellow is used to prominently display the most important message. By analogy, emphasis color stands out against standard color in the same way that the main message stands out against its surrounding context.
Question : Which data asset is an example of semi-structured data?
1. XML data file 2. Database table 3. Webserver log 4. News article
Correct Answer : 1 Explanation: 5.3. Semi-Structured Data
The idea predates XML but not HTML. Data is available electronically in database systems, in file systems (e.g., bibliographic data, Web data), and in data exchange formats (e.g., EDI, scientific data). Semi-structured data is an attempt to reconcile the database and document "worlds". It is organised in semantic entities, and similar entities are grouped together. However, entities in the same group may not have the same attributes, the order of attributes is not necessarily important, not all attributes may be required, and the size and type of the same attributes in a group may differ.
5.4. Example of Semi-Structured Data
name: Peter Wood
email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk
name:
  first name: Mark
  last name: Levene
email: mark@dcs.bbk.ac.uk
name: Alex Poulovassilis
affiliation: Birkbeck
5.5. Semi-Structured Data Models
Semi-structured data models are based on labelled graphs rather than labelled trees, and are used for data exchange among, and integration of, heterogeneous data sources. Schema information is in the edge labels, which is why such data is sometimes called schemaless or self-describing. Data is stored at the leaves.
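As a rough illustration of why an XML file counts as semi-structured, the three example records can be written as XML and inspected with Python's standard library. This is only a sketch: the element names (`people`, `person`, `name`, `first`, `last`, `email`, `affiliation`) and the `describe` helper are assumptions made for this example, not part of the original explanation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the three example records.
# All element names are assumed for illustration only.
XML_DOC = """
<people>
  <person>
    <name>Peter Wood</name>
    <email>ptw@dcs.bbk.ac.uk</email>
    <email>p.wood@bbk.ac.uk</email>
  </person>
  <person>
    <name><first>Mark</first><last>Levene</last></name>
    <email>mark@dcs.bbk.ac.uk</email>
  </person>
  <person>
    <name>Alex Poulovassilis</name>
    <affiliation>Birkbeck</affiliation>
  </person>
</people>
"""


def describe(person):
    """Return the distinct child tags of a <person> and its email values."""
    tags = sorted({child.tag for child in person})
    emails = [e.text for e in person.findall("email")]
    return tags, emails


root = ET.fromstring(XML_DOC.strip())
summary = [describe(p) for p in root.findall("person")]

# Entities in the same group carry different attributes:
# the first has two emails, the third has none but has an affiliation.
for tags, emails in summary:
    print(tags, emails)
```

The tags travel with the data (self-describing), and each `person` has a different set of children, which a fixed-schema database table would not allow without null columns or extra tables.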
1. Decrease the number of measures used 2. Increase the number of clusters 3. Decrease the number of clusters 4. Identify additional measures to add to the analysis
1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv 2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000 3. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv 4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv