Explanation: Types of quasi-structured data and examples of each
- Totally unstructured data: Google search results cover all websites, but are hard to categorize further without access to the Google database itself.
- Intuitive structure: my wordtree algorithm accepts any pasted text and yields a network map based on similarity of language within the text, as well as proximity of words to each other within the text. But it is not "tagged" the way YouTube and Flickr track content in images.
- Emergent structure: algorithms that extract the main idea of groups of stories.
- Pseudo-structuring: looking at content and assigning structure to all possible variations of a single document type, as I did with the auditing tool.
- Guess, apply a rule, and refine: the algorithm tries an approach and refines it iteratively based on user feedback. If the feedback is automated in the form of a score on the result, this approach becomes evolutionary programming.
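The "intuitive structure" idea can be sketched minimally as a word-proximity network built from pasted text. The wordtree algorithm itself is not shown here; this is a generic co-occurrence sketch, with an assumed window size of four words:

```python
from collections import Counter

def cooccurrence_edges(text, window=4):
    """Count how often each pair of words appears within `window`
    positions of each other in the pasted text."""
    words = [w.lower().strip(".,") for w in text.split()]
    edges = Counter()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if w != words[j]:
                # store the pair in a canonical order so (a, b) == (b, a)
                edges[tuple(sorted((w, words[j])))] += 1
    return edges

edges = cooccurrence_edges("big data needs big tools and big ideas")
```

The resulting edge weights are exactly the kind of "untagged" structure described above: nothing is labeled, but heavily weighted pairs reveal which words cluster together.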
These strategies for structuring Big Data have come about as a consequence of two trends. First - 100 times more content is added online each year than the sum of all books ever written in history. Second - most of this content is structured by institutions that for various reasons don't want to release the fully annotated version of the information. So pragmatic programmers like me build "wrappers" to restructure the parts that are available. Eventually there will be a universal wrapper for all content about financial records, and another one for all organization reports. These data sets will organize content into clusters that are similar enough for us to study patterns on a global scale. That's when "big data" begins to get interesting. Today, we're in the early stages of deconstructing the structure so that we can reconstruct larger data sets from the individual parts that each have unique yet "incompatible" structures. It is like taking apart all the cars in a junk yard so we can categorize all the parts and deliver them to customers that want to build fresh cars. You see cars go in and cars go out, but a lot happens in between.
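A "wrapper" in this sense can be as small as one function per source that maps its idiosyncratic fields onto a shared schema. The field names below are hypothetical, chosen only to show two incompatible structures converging into one studyable data set:

```python
# Each wrapper knows one source's quirks and emits a shared schema.
# Field names ("company_name", "orgName", etc.) are invented examples.

def wrap_source_a(rec):
    return {"org": rec["company_name"], "revenue": rec["rev_usd"]}

def wrap_source_b(rec):
    # source B nests the amount and stores it as a string
    return {"org": rec["orgName"], "revenue": float(rec["revenue"]["amount"])}

unified = [
    wrap_source_a({"company_name": "Acme", "rev_usd": 1200.0}),
    wrap_source_b({"orgName": "Globex", "revenue": {"amount": "3400"}}),
]
```

Once every source has a wrapper, the unified list can be clustered and queried as if the records had always shared one structure.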
Last year, if someone had asked you to track all the work you do on your computer, you would probably have filled out a survey (like the "time tracking" reports I fill out monthly at work). In the future your computer will fill them out for you and in greater detail, and these data will be "mashable" with other reporting systems. This will not happen because two systems are built to work together, but because someone builds a third system that allows the two to share information. Eventually we will build "genetic algorithms" that write programs to re-organize data into usable structures regardless of how the original data was structured. This is going to happen in the next ten years, and we will ask ourselves why we didn't do it sooner.
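The "guess, apply a rule, and refine" mode described earlier can be sketched with an automated score standing in for user feedback. Here the rule being guessed is which delimiter structures a time-tracking log, a deliberately tiny stand-in for a program that reorganizes arbitrarily structured data:

```python
def field_count_score(delim, lines):
    """Automated feedback: a delimiter scores well when it splits
    every line into the same number (> 1) of fields."""
    counts = [len(line.split(delim)) for line in lines]
    return counts[0] if len(set(counts)) == 1 and counts[0] > 1 else 0

def guess_and_refine(lines, candidates=",;|:\t"):
    """Try each candidate rule and keep whichever the score prefers."""
    best, best_score = None, 0
    for guess in candidates:
        s = field_count_score(guess, lines)
        if s > best_score:
            best, best_score = guess, s
    return best

# Hypothetical time-tracking export with an unknown delimiter.
time_log = ["2024-01-02|code review|1.5", "2024-01-03|design doc|2.0"]
best = guess_and_refine(time_log)
```

Replace the exhaustive loop with random mutation of the rule and this becomes the evolutionary-programming variant: the score is the selection pressure.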
Question : What would be considered "Big Data"?
1. An OLAP Cube containing customer demographic information about 100,000,000 customers
2. Aggregated statistical data stored in a relational database table
Explanation: Information sets that approach the size of all information known about "X". For example, instead of a sample of e-books, it means a comprehensive set of all e-books ever written (~70% to N=ALL). Big Data sets are noisier, yet they do not require us to know beforehand what questions we will pose of them; we can drill down into a Big Data set and ask arbitrary questions. This approach complements classical statistics, which relies on random sampling to eliminate bias. Big Data instead assumes bias and quantifies the biases in the data set, so that they can be detected, inspected, and corrected.
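Quantifying and then correcting a known bias can be sketched with a toy reweighting example. The groups, shares, and ratings below are invented purely for illustration:

```python
# Full population (N = ALL): 70% "short" books rated 3.0, 30% "long" rated 4.0.
population = [("short", 3.0)] * 70 + [("long", 4.0)] * 30
# Biased sample: "long" books are over-represented.
sample = [("short", 3.0)] * 20 + [("long", 4.0)] * 40

naive = sum(r for _, r in sample) / len(sample)  # distorted by the skew

# Quantify the bias: compare each group's sample share to its population
# share, then reweight every record to undo the drift.
pop_share = {"short": 0.70, "long": 0.30}
samp_share = {"short": 20 / 60, "long": 40 / 60}
weights = {g: pop_share[g] / samp_share[g] for g in pop_share}

corrected = (sum(weights[g] * r for g, r in sample)
             / sum(weights[g] for g, _ in sample))
```

The naive sample mean is pulled toward the over-represented group, while the reweighted mean recovers the population value of 3.3; measuring the bias is what makes the correction possible.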
Question : When creating a presentation for a technical audience, what is the main objective?
Explanation: Using visualization for data exploration is different from presenting results to stakeholders, and not every type of plot is suitable for every audience. Most of the plots presented earlier detail the data as clearly as possible so that data scientists can identify structures and relationships; these graphs are technical in nature and best suited to technical audiences. Nontechnical stakeholders, however, generally prefer simple, clear graphics that focus on the message rather than the data.
When presenting to a technical audience such as data scientists and analysts, focus on how the work was done. Discuss how the team accomplished the goals and the choices it made in selecting models or analyzing the data. Share analytical methods and decision-making processes so other analysts can learn from them for future projects. Describe methods, techniques, and technologies used, as this technical audience will be interested in learning about these details and considering whether the approach makes sense in this case and whether it can be extended to other, similar projects. Plan to provide specifics related to model accuracy and speed, such as how well the model will perform in a production environment.
1. The sample space is partitioned into a set of mutually exclusive events {A1, A2, ..., An}.
2. Within the sample space, there exists an event B, for which P(B) > 0.
3. Access Mostly Uused Products by 50000+ Subscribers
4. In all above cases
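The conditions listed translate directly into a small calculation: given a partition {A1, A2, A3} and an event B with P(B) > 0, the law of total probability yields P(B), and Bayes' theorem yields each posterior P(Ai | B). The numbers below are illustrative only:

```python
# Priors over a partition of the sample space: mutually exclusive,
# exhaustive, so they must sum to 1.
priors = [0.5, 0.3, 0.2]        # P(A_i)
likelihoods = [0.1, 0.4, 0.5]   # P(B | A_i)

# Law of total probability: P(B) = sum_i P(A_i) * P(B | A_i); must be > 0.
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' theorem: P(A_i | B) = P(A_i) * P(B | A_i) / P(B).
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]
```

Because the Ai partition the sample space, the posteriors again sum to 1, which is a quick sanity check on any Bayes calculation.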