1.2 Types of Data

Big data

Refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of big data may require software simultaneously running in parallel on many different computers.

Which of the following consists of discrete data?

Number of suitcases on a plane

There are 17,246,372 high school students in the United States. In a study of 8505 U.S. high school students 16 years of age or older, 44.5% of them said that they texted while driving at least once during the previous 30 days (based on data in "Texting While Driving and Other Risky Motor Vehicle Behaviors Among U.S. High School Students"). Identify the parameter and the statistic.

Parameter: the population size of all 17,246,372 high school students is a parameter, because it is the size of the entire population of all high school students in the United States. If we somehow knew the percentage of all 17,246,372 who reported they had texted while driving, that percentage would also be a parameter. Statistic: The value of 44.5% is a statistic, because it is based on the sample, not on the entire population.
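
As a rough check on the numbers in this card, the sketch below recovers the approximate count of students behind the 44.5% statistic and shows how small the sample is relative to the population. The variable names are illustrative only and do not come from the study.

```python
# Values quoted in the card above.
population_size = 17_246_372   # parameter: size of the entire population
sample_size = 8505             # number of students actually surveyed
sample_proportion = 0.445      # statistic: computed from the sample only

# Approximate number of sampled students who reported texting while driving.
# The published 44.5% is rounded, so this count is only approximate.
approx_texters = round(sample_size * sample_proportion)

# Fraction of the population that was actually surveyed.
sampling_fraction = sample_size / population_size

print(approx_texters)
print(f"{sampling_fraction:.4%}")
```

The sample covers well under 0.1% of the population, which is why 44.5% describes the sample (a statistic) rather than all high school students (a parameter).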

In a survey of 1812 women in Virginia, 43% said that they wash their hands after attending a sporting event. A. Identify the sample and the population. B. Is the given percentage a statistic or a parameter?

A. Population: all women in Virginia. Sample: the 1812 women surveyed. B. The given value is a statistic because the numerical measurement describes a characteristic of a sample.

Which of the following is NOT a level of measurement? Quantitative Ratio Nominal Ordinal

Quantitative

The ages (in years) of subjects enrolled in a clinical trial

Quantitative data

Levels of measurement (ratio, interval, ordinal, nominal): brief descriptions and examples

Ratio: there is a natural zero starting point and ratios make sense. Example: heights, lengths, distances, volumes. Interval: differences are meaningful, but there is no natural zero starting point and ratios are meaningless. Example: body temperatures in degrees Fahrenheit or Celsius. Ordinal: data can be arranged in order, but differences either can't be found or are meaningless. Example: ranks of colleges in U.S. News & World Report. Nominal: categories only. Data cannot be arranged in order. Example: eye colors.
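
One way to remember the four levels is as a ladder in which each level allows everything the level below it allows. The sketch below encodes that idea; the `Level` enum and the helper functions are hypothetical names chosen for illustration, not part of any statistics library.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def can_order(level):
    # Ordering is possible at every level except nominal.
    return level >= Level.ORDINAL


def differences_meaningful(level):
    # Subtraction is meaningful only at the interval and ratio levels.
    return level >= Level.INTERVAL


def ratios_meaningful(level):
    # "Twice as much" requires a natural zero, so only the ratio level qualifies.
    return level == Level.RATIO


# Examples taken from the card above.
examples = {
    "eye colors": Level.NOMINAL,
    "college ranks": Level.ORDINAL,
    "body temperatures (°F)": Level.INTERVAL,
    "heights": Level.RATIO,
}

for name, level in examples.items():
    print(name, level.name, can_order(level),
          differences_meaningful(level), ratios_meaningful(level))
```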

Examples of data at the ratio level of measurement. Note the presence of the natural zero value, and also note the use of meaningful ratios of "twice" and "three times."

1. Height of students: Heights of 180 cm and 90 cm for a high school student and a preschool student (0 cm represents no height, and 180 cm is twice as tall as 90 cm.) 2. Class times: The times of 50 min and 100 min for a statistics class (0 min represents no class time, and 100 min is twice as long as 50 min.)

Examples of sample data at the nominal level of measurement

1. Yes/no/undecided: survey responses of yes, no, and undecided. 2. For an item on a survey, respondents are given a choice of possible answers, and they are coded as follows: "I agree" is coded as 1; "I disagree" is coded as 2; "I don't care" is coded as 3; "I refuse to answer" is coded as 4; "go away and stop bothering me" is coded as 5. The numbers 1, 2, 3, 4, 5 don't measure or count anything.

Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have a weight of 1873.7 kg.

A continuous data set because there are infinitely many possible values and those values cannot be counted.

Missing completely at random

A data value is missing completely at random if the likelihood of it being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value.

Missing not at random

A data value is missing not at random if the missing value is related to the reason that it is missing.
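
The difference between the two kinds of missing data can be seen in a small simulation. This is only a sketch under made-up assumptions: the values are uniform random numbers between 0 and 100, and the 30%, 70, and 80% settings are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical measurements between 0 and 100.
values = [rng.uniform(0, 100) for _ in range(10_000)]

# Missing completely at random: every value has the same 30% chance
# of being missing, regardless of how large or small it is.
mcar_observed = [v for v in values if rng.random() >= 0.30]

# Missing not at random: values above 70 go missing 80% of the time,
# so the chance of being missing depends on the value itself.
mnar_observed = [v for v in values if not (v > 70 and rng.random() < 0.80)]

print(round(mean(values), 1))         # mean of the complete data
print(round(mean(mcar_observed), 1))  # stays close to the complete-data mean
print(round(mean(mnar_observed), 1))  # pulled noticeably downward
```

In this setup the values that are missing completely at random leave the mean roughly unchanged, while the values missing not at random bias the mean of what remains.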

The genders (male/female) of subjects enrolled in a clinical trial

Categorical data as labels

The identification numbers 1,2,3....,25 are assigned randomly to the 25 subjects in a clinical trial.

Categorical data as numbers (Those numbers are substitutes for names. They don't measure or count anything, so they are categorical data, not quantitative data.)

Nominal level of measurement

Characterized by data that consist of names, labels, or categories only. It is not possible to arrange the data in some order (such as low to high).

Quantitative (or numerical) data

Consists of numbers representing counts or measurements

When the typical patient has blood drawn as a part of a routine examination, the volume of blood drawn is between 0 mL and 50 mL.

Continuous data (There are infinitely many values between 0 mL and 50 mL. Because it is impossible to count the number of different possible values on such a continuous scale, these amounts are continuous data.)

Example of sample data at the ordinal level of measurement

Course grades: A biostatistics professor assigns grades of A, B, C, D, or F. These grades can be arranged in order, but we can't determine the differences between the grades. For example, we know that A is higher than B (so there is an ordering), but we cannot subtract B from A (so the difference cannot be found).

Which of the following is associated with a parameter?

Data that were obtained from an entire population.

Each of several physicians plans to count the number of physical examinations given during the next full week.

Discrete data of the finite type (The data are discrete because they are finite numbers, such as 27 and 46, that result from a counting process.)

Researchers plan to test the accuracy of a blood typing test by repeating the process of submitting a sample of the same blood (type O+) until the test yields an error.

Discrete data of the infinite type (It is possible that each researcher could repeat this test forever without ever getting an error, but they can still count the number of tests as they proceed. The collection of number of tests is countable, because you can count them, even though the counting can go on forever.)

Interval level of measurement

Data are at this level if they can be arranged in order and differences between data values can be found and are meaningful, but there is no natural zero starting point at which none of the quantity is present.

Ratio level of measurement

Data are at this level if they can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). For data at this level, differences and ratios are both meaningful.

Ordinal level of measurement

Data are at this level if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.

Data science

Involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as biology and epidemiology)

Which level of measurement consists of categories only, where the data cannot be arranged in an ordering scheme?

Nominal

Discrete data

Results when the data values are quantitative and the number of values is finite or "countable." (If there are infinitely many values, the collection of those values is countable if it is possible to count them individually, such as the number of tosses of a coin before getting tails or the number of births in Houston before getting a male.)
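
The coin-toss example of a countable but unbounded set of values can be sketched as a short simulation. The function name `tosses_until_tails` and the seed are arbitrary choices for illustration.

```python
import random


def tosses_until_tails(rng):
    """Count coin tosses up to and including the first tails."""
    count = 0
    while True:
        count += 1
        if rng.random() < 0.5:  # treat this outcome as tails
            return count


rng = random.Random(1)  # fixed seed so the run is repeatable
results = [tosses_until_tails(rng) for _ in range(10)]
print(results)
```

There is no upper limit on how many tosses a run might need, yet every result is a whole number that was reached by counting, which is what makes the data discrete.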

Among the subjects surveyed as part of a health interview survey, several subjects are randomly selected and their heights are recorded.

Since heights ARE NOT countable, the data are from a CONTINUOUS data set.

In a study of weight gains by college students in their freshman year, researchers record the amounts of time spent online by randomly selected students.

Since the amounts of time spent online ARE NOT countable, the data are from a CONTINUOUS data set.

A sample of married couples is randomly selected and the number of children in each family is recorded.

Since the numbers of children ARE countable, the data are from a DISCRETE data set.

The emergency room of a medical center records the number of stitches used for patients in a week.

Since the numbers of stitches used ARE countable, the data are from a DISCRETE data set.

A sample of seniors is selected and it's found that 50% own a television.

Statistic because the value is a numerical measurement describing a characteristic of a sample.

Examples that illustrate the interval level of measurement

Temperatures: Body temperatures of 98.2°F and 98.8°F are examples of data at the interval level of measurement. Those values are ordered, and we can determine the difference of 0.6°F. However, there is no natural zero starting point. The value of 0°F might seem like a starting point, but it is arbitrary and does not represent the total absence of heat. Years: The years 1492 and 1776 can be arranged in order, and the difference of 284 years can be found and is meaningful. However, time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point representing "no time."
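
A quick calculation shows why differences survive a change of temperature scale but ratios do not. The two body temperatures come from the card above; the 40°F and 80°F pair is a made-up example added for illustration.

```python
import math


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


# Differences: the gap between the two body temperatures is the same
# physical gap on either scale (0.6 °F corresponds to about 0.33 °C).
a_f, b_f = 98.2, 98.8
diff_f = b_f - a_f
diff_c = fahrenheit_to_celsius(b_f) - fahrenheit_to_celsius(a_f)
print(round(diff_f, 2), round(diff_c, 2))

# Ratios: 80 °F looks like "twice" 40 °F, but the same two temperatures
# in Celsius give a completely different ratio, so "twice as hot" is
# an artifact of where the scale puts its zero.
low_f, high_f = 40.0, 80.0
ratio_f = high_f / low_f
ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)
print(ratio_f, round(ratio_c, 2))
```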

Caution ⚠️

The concept of countable data plays a key role in the preceding definitions, but it is not a particularly easy concept to understand. Continuous data can be measured, but not counted. If you select a particular data value from continuous data, there is no "next" data value.

Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a set of data, car rankings are represented as 1 for first, 2 for second, and 3 for third. The average (mean) of the 782 car rankings is 1.3.

The data are at the ORDINAL level of measurement. Such data should not be used for calculations such as an average.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Also, explain what is wrong with the given calculation. Four medications are coded as follows: medication A (1); medication B (1202); medication C (74); medication D (795). The average (mean) of those numbers is 518.

The data are at the nominal level of measurement because the data cannot be ordered. The data are used for identification, and they do not represent measurements or counts of anything. It would not make sense to compute an average (mean).
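
The sketch below reproduces the mean of 518 and then shows why it carries no information: relabeling the same four medications with different codes changes the "average" completely. The recoded values and the list of prescriptions are hypothetical, added only for illustration.

```python
from collections import Counter
from statistics import mean

# Codes from the card above.
codes = {"A": 1, "B": 1202, "C": 74, "D": 795}
print(mean(list(codes.values())))    # 518

# The same medications with arbitrary new labels (hypothetical).
recoded = {"A": 1, "B": 2, "C": 3, "D": 4}
print(mean(list(recoded.values())))  # 2.5

# For nominal data, counts and the most frequent category are meaningful.
prescriptions = ["A", "C", "A", "D", "B", "A"]  # hypothetical records
counts = Counter(prescriptions)
print(counts.most_common(1)[0])      # ('A', 3)
```

Because the codes are only labels, a summary such as a frequency count or the most common category is appropriate, whereas a mean is not.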

A research team has received ALL brains from those deceased in a natural disaster which had been donated for research. The research team found that the average (mean) for the volume of the brains they had received was 1101.7 cm³. They came to this conclusion after measuring the brains they had received.

The given value is a PARAMETER because the numerical measurement describes a characteristic of a POPULATION.

In a study of all the babies born at five different hospitals in Massachusetts, it was found that 52% of the babies were girls.

The given value is a PARAMETER because the numerical measurement describes a characteristic of a POPULATION.

In a study of all the babies born at five different hospitals in Massachusetts, it was found that the average (mean) weight at birth of those babies born at those hospitals was 2948.3 grams.

The given value is a PARAMETER because the numerical measurement describes a characteristic of a POPULATION.

A research team has received a collection of brains from those deceased in a natural disaster which had been donated for research. The research team found that the average (mean) for the volume of the brains that they had received was 1051.8 cm³. They came to this conclusion after measuring the brains they had received.

The given value is a STATISTIC because the numerical measurement describes a characteristic of a SAMPLE.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Blood lead levels of low, medium, and high.

The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. A research project on the effectiveness of appendectomies includes the charges (dollars) for those procedures that were conducted the past year.

The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Height of subject (cm)

The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point.

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. A research project on the effectiveness of heart transplants begins with the number of hospitals that provide heart transplants.

The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point.

Grammar: fewer versus less

When describing smaller amounts, it is correct grammar to use fewer for discrete amounts and less for continuous amounts. It is correct to say that we drink fewer cans of cola and that, in the process, we drink less cola. The numbers of cans of cola are discrete data, whereas the volume amounts of cola are continuous data.

Parameter

A numerical measurement describing some characteristic of a population. Hint in alliteration: Population Parameter.

Statistic

A numerical measurement describing some characteristic of a sample. Hint in alliteration: Sample Statistic.

Categorical (or qualitative or attribute) data

Consist of names or labels (not numbers representing counts or measurements). Caution ⚠️: categorical data are sometimes coded with numbers, with those numbers replacing names. Although such numbers might appear to be quantitative, they are actually categorical data.

Quantitative data can be further described by distinguishing between

discrete and continuous types

Which of the following would be classified as categorical data?

hair color

Examples of jobs

• Facebook: data scientist • IBM: data scientist • PayPal: data scientist

Examples of applications of big data

• Attempts to forecast flu epidemics by analyzing Internet searches of flu symptoms • The Spatiotemporal Epidemiological Modeler developed by IBM provides a means for using a variety of data that are correlated with disease data.

Examples of data set magnitudes

•terabytes (10^12 bytes) of data •petabytes (10^15 bytes) of data •exabytes (10^18 bytes) of data •zettabytes (10^21 bytes) of data •yottabytes (10^24 bytes) of data
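
To get a feel for these magnitudes, the sketch below expresses a raw byte count in the largest of the listed units that fits. The helper name `describe_bytes` and the sample inputs are illustrative assumptions.

```python
# Decimal magnitudes listed in the card above, smallest first.
MAGNITUDES = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def describe_bytes(n_bytes):
    """Express a byte count in the largest listed unit that does not exceed it."""
    best = None
    for name, size in MAGNITUDES.items():  # dicts keep insertion order
        if n_bytes >= size:
            best = (name, size)
    if best is None:
        return f"{n_bytes} bytes"  # smaller than a terabyte
    name, size = best
    amount = n_bytes / size
    unit = name if amount == 1 else name + "s"
    return f"{amount:g} {unit}"


print(describe_bytes(5 * 10**15))  # 5 petabytes
print(describe_bytes(10**12))      # 1 terabyte
print(describe_bytes(500))         # 500 bytes
```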

