Correct Answer : Explanation: You can create histograms with the function hist(x), where x is a numeric vector of values to be plotted. The option freq=FALSE plots probability densities instead of frequencies, and the option breaks= controls the number of bins. Histograms are used very often in public health to show the distributions of your independent and dependent variables. Although the basic command for histograms in R, hist(), is simple, getting your histogram to look exactly the way you want takes getting to know a few of the plot's options. Here I present ways to customize your histogram for your needs.
First, I want to point out that ggplot2 is an R package that does some amazing graphics, including histograms. I will do a post on ggplot2 in the coming year. However, the hist() function in base R is really easy and fast, and does the job for most of your histogram needs. If you want to do complicated histograms, though, I would recommend reading up on ggplot2.
Okay, so for our purposes today, instead of importing data, I'll create some normally distributed data myself. In R, you can generate normal data using the rnorm() function:
BMI <- rnorm(n = 1000, mean = 24.2, sd = 2.2)  # 1000 draws from a normal with mean 24.2 and SD 2.2
So now we have some BMI data, and the basic histogram plot that comes out of R looks like this:
hist(BMI)
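To put the two options mentioned above into practice, here is a minimal sketch (using the simulated BMI vector from above) that plots densities instead of counts and asks for roughly 20 bins; the title and axis label are just illustrative choices:

hist(BMI, freq = FALSE, breaks = 20,   # densities instead of counts, ~20 bins
     main = "Histogram of BMI", xlab = "BMI")
lines(density(BMI))                    # overlay a smoothed density curve

Note that breaks= is treated as a suggestion, so the exact number of bins R draws may differ slightly from what you ask for.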
Question : Review the following code:
SELECT pn, vn, SUM(prc*qty) FROM sale GROUP BY CUBE(pn, vn) ORDER BY 1, 2, 3;
Which combinations of subtotals do you expect the query to return?
Correct Answer : Explanation: Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross-tabulation reports, and a ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy. In addition to the subtotals generated by the ROLLUP extension, the CUBE extension generates subtotals for all combinations of the dimensions specified. If n is the number of columns listed in the CUBE, there will be 2^n subtotal combinations.
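To make the 2^n expansion concrete for the query in the question (a sketch, assuming the sale table from above), CUBE(pn, vn) is equivalent to spelling out all 2^2 = 4 grouping sets by hand:

-- produces the same result as GROUP BY CUBE(pn, vn)
SELECT pn, vn, SUM(prc*qty)
FROM sale
GROUP BY GROUPING SETS ((pn, vn), (pn), (vn), ())
ORDER BY 1, 2, 3;

The four grouping sets yield subtotals by (pn, vn), by pn alone, by vn alone, and a single grand-total row.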
SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY CUBE (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
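In the subtotal rows of a CUBE result, the collapsed columns come back as NULL, which is ambiguous when the underlying data itself contains NULLs. A common refinement (a sketch against the same dimension_tab) is the standard GROUPING() function, which returns 1 when a column is NULL because it was aggregated away:

SELECT fact_1_id,
       fact_2_id,
       GROUPING(fact_1_id) AS agg_fact_1,  -- 1 in rows where fact_1_id was rolled up
       GROUPING(fact_2_id) AS agg_fact_2,  -- 1 in rows where fact_2_id was rolled up
       SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;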
Correct Answer : Explanation: MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. There are a number of data analytics solutions that support the MapReduce principle and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and therefore need traditional analytics solutions to provide in-depth analysis of their business-critical data.
MADlib is a scalable in-database analytics library that features sophisticated mathematical algorithms for SQL-based systems. MADlib was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). It can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time series data, which Hadoop could not process appropriately, greatly extending its capabilities for building prediction systems.

Like Analyst First, MADlib benefits from the support of the good folk at EMC-Greenplum. In particular, EMC-Greenplum's Australian team is responsible for developing the support vector machines, Latent Dirichlet Allocation (a technique for doing topic discovery in unstructured text), and sparse vectors modules.

The "MAD Skills" paper behind the library takes on "data warehousing" and "business intelligence" as outmoded, low-tech approaches to getting value out of Big Data. Instead, it advocates a "Magnetic, Agile, Deep" (MAD) approach to data that shifts the locus of power from what Brian Dolan calls the "DBA priesthood" to the statisticians and analysts who actually like to crunch the numbers. This is a good thing, on many fronts. The paper describes a state-of-the-art parallel data warehouse that sits on 800TB of disk, using 40 dual-processor dual-core Sun Thumper boxes, and it presents a set of general-purpose, hardcore, massively parallel statistical methods for big data. They're expressed in SQL (OMG!) but could be easily translated to MapReduce if that's your bag.
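To give a feel for what in-database analytics looks like in practice, here is a minimal sketch of training a linear regression model with MADlib's linregr_train(); it assumes MADlib is installed in a schema named madlib and uses a hypothetical houses table with price, tax, bath, and size columns:

-- train: writes the model (coefficients, r2, etc.) into houses_linregr
SELECT madlib.linregr_train(
    'houses',                       -- source table (hypothetical)
    'houses_linregr',               -- output model table
    'price',                        -- dependent variable
    'ARRAY[1, tax, bath, size]'     -- independent variables; the 1 is the intercept term
);

-- inspect the fitted coefficients and goodness of fit
SELECT coef, r2 FROM houses_linregr;

Everything happens inside the database: the training data never leaves the SQL store, which is exactly the "move the analytics to the data" idea described above.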