
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : Which functionality do regular expressions provide?

1. increased numerical precision
2. underflow prevention
3. text pattern matching
4. decreased processing complexity

Correct Answer : 3
Explanation: A regular expression is a method used in programming for pattern matching. Regular expressions provide a flexible and concise means to match strings of text. For example, a regular expression could be used to search through large volumes of text and change all occurrences of "cat" to "dog".
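The "cat" to "dog" replacement described above could be sketched in Python as follows. The sample sentence and the use of the `\b` word-boundary marker are assumptions added for illustration; without the boundary, the "cat" inside "category" would also be changed.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "The cat sat with another cat; the category stays."

# \b marks a word boundary, so "category" is left untouched.
replaced = re.sub(r"\bcat\b", "dog", text)
print(replaced)
```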

Regular expressions are used in syntax highlighting systems, in data validation, and in search engines such as Google to find an algorithmic match for the query a user is asking.

Regular expressions are also known as regex or regexp for short. Utilities, text editors, and programming languages use regular expressions to search and manipulate patterns of text. Some languages, such as Tcl, AWK, Perl, and Ruby, integrate regular expressions into the core of the language, while others, such as Java, C++, and C, provide them through libraries. As a result, implementations differ in subtle ways, and a regular expression that works well in one application may not work in another.

Regular expressions can be incredibly powerful. Essentially, if a pattern can be defined, a regular expression can be created for it. A simple pattern might find every sentence that ends in "that" and replace the word with "which". The pattern could get more complex by making the same replacement only on the 3rd and 5th occurrences of a match, or more complicated still by using different sets of matching characters depending on the frequency and location of previous matches.
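Replacing only the 3rd and 5th occurrences can be sketched with a replacement callback that counts matches. The helper name `replace_nth` and the sample string are hypothetical. This is one possible approach, not the only one.

```python
import re
from itertools import count

def replace_nth(pattern, replacement, text, positions):
    """Replace only the matches whose 1-based index is in `positions`."""
    counter = count(1)

    def substitute(match):
        # Keep the original text unless this is one of the chosen occurrences.
        return replacement if next(counter) in positions else match.group(0)

    return re.sub(pattern, substitute, text)

# Hypothetical sample input with six occurrences of "that".
sample = "that that that that that that"
result = replace_nth(r"\bthat\b", "which", sample, {3, 5})
print(result)
```

`re.sub` calls the callback once per match, in order from left to right, so the counter tracks the occurrence number.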

The three main components of a regular expression are anchors, which specify the position of a pattern in relation to a line of text; character sets, which match one or more characters in a single position; and modifiers, which specify the number of times the previous character set is repeated.
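A small sketch may make the three components concrete. The pattern and the candidate strings below are invented for illustration: it accepts two uppercase letters followed by one or more digits.

```python
import re

# ^ and $ are anchors: the pattern must span the whole line.
# [A-Z] and [0-9] are character sets: each matches one character.
# {2} and + are modifiers: they repeat the preceding character set.
pattern = re.compile(r"^[A-Z]{2}[0-9]+$")

# Hypothetical candidate strings.
for candidate in ["AB123", "ab123", "AB", "XAB123"]:
    print(candidate, bool(pattern.match(candidate)))
```

Only "AB123" matches. The others fail on the character set, the modifier, or the anchors respectively.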

The operations that help in building regular expressions are:
Quantification: Quantifiers dictate how often the preceding element is allowed to occur.
Grouping: Parentheses define the scope and precedence of operators.
Boolean conditions: An OR or AND condition can be stated for operators and groups.

Regular expression engines use algorithms based on a deterministic finite automaton (DFA) or a non-deterministic finite automaton (NFA) to match a string. In an NFA, a given pair of state and input symbol may have several possible next states, while in a DFA each pair of state and input symbol has exactly one next state.
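To illustrate the deterministic case, here is a minimal hand-written DFA for the language of the regular expression `ab*`. The state names and the transition table are assumptions made for this sketch. Real regex engines build and run their automata quite differently.

```python
import re

# Transition table: each (state, symbol) pair maps to exactly one next state.
# Pairs that are absent from the table lead to rejection.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
}
ACCEPTING = {"seen_a"}

def dfa_accepts(text):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

# The DFA should agree with the library's own matcher on these samples.
for sample in ["a", "abbb", "", "b", "aba"]:
    print(repr(sample), dfa_accepts(sample), bool(re.fullmatch(r"ab*", sample)))
```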





Question : When creating a project sponsor presentation, what is the main objective?
1. Show that you met the project goals
2. Show how you met the project goals
3. Show how well the model will meet the SLA (service level agreement)
4. Clearly describe the methods and techniques used

Correct Answer : 1
Explanation:
Goals are high-level statements that provide the overall context for what the project is trying to accomplish. Let's look at an example and some of the characteristics of a goal statement. One of the goals of a project might be to "increase the overall satisfaction levels for clients calling the company helpdesk with support needs". Key software project goals include:

Functionality. This is the number-one driver in most software projects, which are obsessed with meeting functional requirements. And for good reason too. Software is a tool meant to solve a problem. Solving the problem invariably involves getting something done in a particular way.

Usability. People like programs to be easy to use, especially when it comes to conducting electronic commerce. Usability is very important, because without it software becomes too much of a hassle to work with. Usability affects reliability too, because human error often leads to software failure. The problem is, security mechanisms, including most uses of cryptography, elaborate login procedures, tedious audit requirements, and so on, often cause usability to plummet. Security concerns regularly impact convenience too. Security people often deride cookies, but cookies are a great user-friendly programming tool!

Efficiency. People tend to want to squeeze every ounce out of their software (even when efficiency isn't needed). Efficiency can be an important goal, although it usually trades off against simplicity. Security often comes with significant overhead. Waiting around while a remote server somewhere authenticates you is no fun, but it can be necessary.

Time-to-market. "Internet time" happens. Sometimes the very survival of an enterprise requires getting mind share fast in a quickly evolving area. Unfortunately, the first thing to go in a software project with severe time constraints is any attention to software risk management. Design, analysis, and testing corners are cut with regularity. This often introduces grave security risks. Fortunately, building secure software does not have to be slow. Given the right level of expertise, a secure system can sometimes be designed more quickly than an ad hoc cousin system with little or no security.

Simplicity. The good thing about simplicity is that it is a good idea for both software projects and security. Everyone agrees that keeping it simple is good advice.






Question : The average purchase size from your online sales site is $, . The customer experience team
believes a certain adjustment of the website will increase sales. A pilot study on a few hundred
customers showed an increase in average purchase size of $1.47, with a significance level of
p=0.1.
The team runs a larger study, of a few thousand customers. The second study shows an
increased average purchase size of $0.74, with a significance level of 0.03. What is your
assessment of this study?


1. The change in purchase size is not practically important, and the good p-value of the second
study is probably a result of the large study size.
2. The change in purchase size is small, but may aggregate up to a large increase in profits over
the entire customer base.
3. The difference in the change in purchase size between the two studies is troubling; the team
should run another, larger study.
4. The p-value of the second study shows a statistically significant change in purchase size. The
new website is an improvement.



Correct Answer : 1
Explanation: A large sample can make even a very small effect statistically significant, so a low p-value alone does not show that a change is practically important. The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor's value are related to changes in the response variable. Conversely, a larger (insignificant) p-value suggests that changes in the predictor are not associated with changes in the response. Significance of the estimated coefficients: are the t-statistics greater than 2 in magnitude, corresponding to p-values less than 0.05? If they are not, you should probably try to refit the model with the least significant variable excluded, which is the "backward stepwise" approach to model refinement.

Remember that the t-statistic is just the estimated coefficient divided by its own standard error. Thus, it measures "how many standard deviations from zero" the estimated coefficient is, and it is used to test the hypothesis that the true value of the coefficient is non-zero, in order to confirm that the independent variable really belongs in the model.

The p-value is the probability of observing a t-statistic that large or larger in magnitude given the null hypothesis that the true coefficient value is zero. If the p-value is greater than 0.05, which occurs roughly when the t-statistic is less than 2 in absolute value, this means that the coefficient may be only "accidentally" significant.

There's nothing magical about the 0.05 criterion, but in practice it usually turns out that a variable whose estimated coefficient has a p-value of greater than 0.05 can be dropped from the model without affecting the error measures very much; try it and see.
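The relationship between the t-statistic and the p-value can be sketched as below. Two assumptions apply: the coefficient and standard-error values are hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math

def t_statistic(coefficient, standard_error):
    """Estimated coefficient divided by its own standard error."""
    return coefficient / standard_error

def two_sided_p_value(t):
    # Normal approximation (an assumption): P(|Z| >= |t|) for a standard normal Z.
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical coefficient with shrinking standard errors, as happens
# when the sample size grows while the estimated effect stays the same.
for coef, se in [(1.0, 1.0), (1.0, 0.5), (1.0, 0.25)]:
    t = t_statistic(coef, se)
    print(f"t = {t:.2f}, p = {two_sided_p_value(t):.4f}")
```

With the effect held fixed, the smaller standard error of a larger study alone pushes the p-value below 0.05.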


Related Questions


Question : You are using the Apriori algorithm to determine the likelihood that a person who owns a home
has a good credit score. You have determined that the confidence for the rules used in the
algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are
homeowners". What can you determine from the lift calculation?


1. Support for the association is low
2. Leverage of the rules is low
3. [...]
4. The rule is true




Question : Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
The minimum support is 25%. Which rule has a confidence equal to 50%?

1. {bread} => {milk}
2. {bread, milk} => {cheese}
3. [...]
4. {bread} => {cheese}
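Support and confidence for the visible candidate rules can be checked with a short sketch. The helper names `support` and `confidence` are hypothetical, and the definitions used are the standard ones: support is the fraction of transactions containing an itemset, and confidence is support(X and Y together) divided by support(X).

```python
# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

rules = [
    ({"bread"}, {"milk"}),
    ({"bread", "milk"}, {"cheese"}),
    ({"bread"}, {"cheese"}),
]
for lhs, rhs in rules:
    print(sorted(lhs), "=>", sorted(rhs),
          f"support={support(lhs | rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```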



Question : Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?

1. The data is unformatted.
2. There is not enough data to create a test set.
3. [...]
4. There are categorical variables in the model.




Question : What is an appropriate data visualization to use in a presentation for an analyst audience?

1. Pie chart
2. ROC curve
3. [...]
4. Stacked bar chart



Question : When would you use GROUP BY ROLLUP clause in your OLAP query?

1. where only the subtotals are to be included in the output
2. where only the grand totals are to be included in the output
3. [...] in the output
4. where all subtotals and grand totals are to be included in the output


Question : Which type of numeric value does a logistic regression model estimate?
1. A p-value
2. Any integer
3. [...]
4. Any real number