Statistics 101

Constructing a Frequency Distribution

1. Decide on the number of classes (between 5 and 20). 2. Find the class width as follows: a. Divide the range by the number of classes. b. Round up to the next convenient number. 3. Find the class limits. 4. Find the frequency for each class.

What are the 2 main branches of statistics?

1.) Descriptive statistics 2.) Inferential statistics

Name each level of measurement for which data can be qualitative.

1.) Ordinal 2.) Nominal

Suppose a survey of 969 homeowners found that 22​% bought flood insurance. Which part of the survey represents the descriptive branch of​ statistics? Make an inference based on the results of the survey. Choose the best statement of the descriptive statistic in the problem.

22% of homeowners in the sample bought flood insurance.

Use the given minimum and maximum data​ entries, and the number of​ classes, to find the class​ width, the lower class​ limits, and the upper class limits. minimum=8​, maximum=93​, 6 classes

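Range = 93 - 8 = 85. Class width = 85 / 6 ≈ 14.17, rounded up to 15. Lower class limits: 8, 23, 38, 53, 68, 83. Upper class limits: 22, 37, 52, 67, 82, 97.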
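The class-width arithmetic can be checked with a short Python sketch. The function name `class_limits` is hypothetical, and the sketch assumes whole-number data, so each upper limit is one less than the next lower limit.

```python
import math

def class_limits(minimum, maximum, num_classes):
    """Return (width, lower_limits, upper_limits) for whole-number data."""
    data_range = maximum - minimum
    # Divide the range by the number of classes and round up.
    width = math.ceil(data_range / num_classes)
    # If the division came out even, move to the next whole number
    # so that the maximum entry still fits in the last class.
    if width * num_classes <= data_range:
        width += 1
    lower = [minimum + i * width for i in range(num_classes)]
    upper = [lo + width - 1 for lo in lower]
    return width, lower, upper

width, lower, upper = class_limits(8, 93, 6)
print(width)  # 15
print(lower)  # [8, 23, 38, 53, 68, 83]
print(upper)  # [22, 37, 52, 67, 82, 97]
```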

What is the difference between a census and a​ sampling?

A census includes the entire population. A sampling includes only part of the population.

Experiment

A researcher deliberately applies a treatment before observing the responses. The treatment is applied to part of a population, called a treatment group, and the responses are observed.

Observational Study

A researcher does not influence the responses. A researcher observes and measures characteristics of interest of part of a population but does not change existing conditions. For example, measuring the amount of time people spend on various activities such as paid work, childcare, and socializing.

How is a sample related to a population?

A sample is a subset of a population.

blocks

Are groups of subjects with similar characteristics.

Nominal level of measurement

Are qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level.

Ordinal level of measurement

Are qualitative or quantitative. Data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful.

Ratio level of measurement

Are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data entries can be formed so that one data entry can be meaningfully expressed as a multiple of another.

Qualitative data

Attributes, labels, non-numerical entries.

interval level of measurement

Can be ordered, and meaningful differences between data entries can be calculated. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero.

Quantitative data

Consists of numbers that are measurements or counts.

Data

Consists of information from observations, counts, measurements, or responses.

Dot Plot

Each data entry is plotted, using a point, above a horizontal axis.

T or F - A population is the collection of some outcomes, responses, measurements, or counts that are of interest.

False. A population is the collection of all outcomes, responses, measurements, or counts that are of interest.

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. Using a systematic sample guarantees that members of each group within a population will be sampled.

False. Using a stratified sample guarantees that members of each group within a population will be sampled.

placebo

Harmless, fake treatment that is made to look like the real treatment.

After constructing a relative frequency distribution summarizing IQ scores of college students, what should be the sum of the relative frequencies?

If percentages are used, the sum should be 100%. If proportions are used, the sum of the relative frequencies should be 1.

Why should the number of classes in a frequency distribution be between 5 and 20?

If the number of classes in a frequency distribution is not between 5 and 20, it may be difficult to detect any patterns.

A. Time (in minutes) it takes a sample of employees to drive to work

Match the plot with a possible description of the sample.

Suppose a survey of 969 homeowners found that 22​% bought flood insurance. Which part of the survey represents the descriptive branch of​ statistics? Make an inference based on the results of the survey. Choose the best inference from the given information.

Most homeowners do not buy flood insurance.

Double-blind experiment

Neither the experimenter nor the subjects know whether the subjects are receiving a treatment or a placebo. The experimenter is informed after all the data have been collected.

The jersey numbers for players on a baseball team are listed below. 20, 6, 50, 16, 14, 5, 12, 17, 21, 24, 13, 15, 23, 8, 19, 7, 25, 10, 22, 31, 34, 36, 33, 18, 4, 71, 2 Identify the level of measurement of the data set. Explain your reasoning.

Nominal. The data are categorized using​ numbers, but no mathematical computations can be made.

Placebo effect

Occurs when a subject reacts favorably to a placebo when in fact the subject has been given fake treatment.

Confounding Variable

Occurs when an experiment cannot tell the difference between the effects of different factors on the variable.

The top five books on the best seller list last year are shown below. 1. Threat Vector 2. Spring Fever 3. Kiss The Dead 4. The Forgotten 5. Gone Girl Identify the level of measurement of the data set. Explain your reasoning

Ordinal. The data can be arranged in order, but the differences between the data entries are not meaningful.

Determine whether the data set is a population or a sample. Explain your reasoning. The age of each member of the House of Representatives.

Population, because it is a collection of ages for all members of the House of Representatives.

In a​ poll, 1,005 adults in a country were asked whether they favor or oppose the use of​ "federal tax dollars to fund medical research using stem cells obtained from human​ embryos." Among the​ respondents, 47​% said that they were in favor. Identify the population and the sample.

Population: All the adults in the country.

a.) There are 7 classes. b.) The least frequency is about 25. The greatest frequency is about 300. c.) The class width is 10. d.) What pattern does the graph show? About half of the employees' salaries are between $50,000 and $69,000.

Use the frequency histogram to complete the following parts. (a) Determine the number of classes. (b) Estimate the greatest and least frequencies. (c) Determine the class width. (d) Describe any patterns with the data.

What is replication in an​ experiment? Why is replication​ important?

Replication is repetition of an experiment under the same or similar conditions. Replication is important because it enhances the validity of the results.

Determine whether the data set is a population or a sample. Explain your reasoning. The salaries of 5 baseball players on a team of 25.

Sample, because the collection of salaries for 5 baseball players is a subset of all baseball players on the team.

In a​ poll, 1,005 adults in a country were asked whether they favor or oppose the use of​ "federal tax dollars to fund medical research using stem cells obtained from human​ embryos." Among the​ respondents, 47​% said that they were in favor. Identify the population and the sample.

Sample: The 1,005 adults selected.

Identify the sampling techniques​ used, and discuss potential sources of bias​ (if any). Explain. Chosen at​ random, 700 customers at a department store are contacted and asked their opinions of the service they received. What type of sampling is​ used?

Simple random sampling is used, since the business is selecting from its customers at random, and all samples of 700 customers have an equal chance of being selected.

In​ 1965, researchers used random digit dialing to call 1200 people and ask what obstacles kept them from attending town hall meetings. What type of sampling was​ used?

Simple random sampling was​ used, since each number had an equal chance of being​ dialed, so all samples of 1200 phone numbers had an equal chance of being selected.

Determine whether the underlined numerical value is a parameter or a statistic. Explain your reasoning. In a poll of a sample of 12,000 adults in a certain city, 12% said that they left work before 6pm.

Statistic, because the value 12% describes a sample of 12,000 adults, not the entire population of the city.

Completely randomized design

Subjects are assigned to different treatment groups through random selection.

Class width

The distance between lower (or upper) limits of consecutive classes.

A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females​ 30-35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six​ months, the​ subjects' symptoms are studied and compared. Answer parts​ (a) through​ (c) below. ​(a) Identify the experimental units and treatments used in this experiment. Choose the correct answer below.

The experimental units are the​ 30- to​ 35-year-old females being given the treatment. The treatment is the new allergy drug.

Ratio

The graph to the right shows the number of cities with certain average annual rainfall amounts (in inches). Identify the level of measurement of the data listed on the horizontal and vertical axes in the graph

​(c) How could this experiment be designed to be a​ double-blind? Choose the correct answer below.

The study would be a double-blind study if neither the researcher nor the patients knew which patients received the real drug or the placebo.

Midpoint of a class

The sum of the lower and upper limits of the class divided by 2.

Determine whether the variable is qualitative or quantitative. Explain your reasoning. State of residence

The variable is qualitative because state of residence describes an attribute or characteristic.

Determine whether the variable is qualitative or quantitative. Explain your reasoning. Time in hours that a light bulb lasts

The variable is quantitative because time is found by measuring or counting.

What potential sources of bias are​ present, if​ any? Select all that apply

The wording of the question asked to the customers may influence them towards a particular response. The results would not be usable in this case.

(b) Identify a potential problem with the experiment design being used and suggest a way to improve it. Choose the correct answer below.

There may be a bias on the part of the researcher if the researcher knows which patients were given the real drug.

Determine whether the data set is a population or a sample. Explain your reasoning. The ages of 13 members of a legislature.

This is a sample, because it is a collection of ages for some members of the legislature.

The ages of car owners in the country.

Use the Venn Diagram to identify which is the population.

The ages of car owners in the country who have a garage.

Use the Venn Diagram to identify which is the sample.

Frequency polygon

Use the same horizontal and vertical scales that were used in the histogram. Plot points that represent the midpoint and frequency of each class and connect the points in order from left to right. Because the graph should begin and end on the horizontal axis, extend the left side to one class width before the first class midpoint and extend the right side to one class width after the last class midpoint.
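As a rough illustration, the points of a frequency polygon can be computed from the class limits and frequencies. The function name `frequency_polygon_points` and the sample classes are hypothetical, and the sketch assumes all classes have the same width.

```python
def frequency_polygon_points(classes):
    """classes: list of (lower_limit, upper_limit, frequency), equal widths."""
    # Class width is the distance between consecutive lower limits.
    width = classes[1][0] - classes[0][0]
    # Midpoint of a class = (lower limit + upper limit) / 2.
    midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
    points = [(m, f) for m, (_, _, f) in zip(midpoints, classes)]
    # Extend one class width to the left and right so the graph
    # begins and ends on the horizontal axis.
    start = (midpoints[0] - width, 0)
    end = (midpoints[-1] + width, 0)
    return [start] + points + [end]

# Hypothetical classes: (lower limit, upper limit, frequency)
classes = [(47, 57, 2), (58, 68, 2), (69, 79, 3)]
print(frequency_polygon_points(classes))
# [(41.0, 0), (52.0, 2), (63.0, 2), (74.0, 3), (85.0, 0)]
```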

Blinding

Used to help minimize the placebo effect. Is a technique where the subjects do not know whether they are receiving treatment or a placebo.

Relative frequency histogram

Has the same shape and the same horizontal scale as the corresponding frequency histogram. The difference is that the vertical scale measures relative frequencies, NOT frequencies.

Census

Is a count or measure of an entire population.

Parameter

Is a numerical description of a population characteristic.

The frequency f of a class

Is the number of data entries in the class.

Stratified sample

Members of the population are divided into two or more subsets, called strata, that share a similar characteristic such as age, gender, ethnicity, or political preference. A sample is then randomly selected from each stratum. Example: for a stratified sample of people living in West Ridge County, households are divided into income groups: Group 1: low income, Group 2: middle income, Group 3: high income.
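A minimal sketch of stratified sampling in Python, assuming the strata are already known. The function name `stratified_sample` and the household labels are made up for illustration; `random.Random.sample` draws without replacement inside each stratum.

```python
import random

def stratified_sample(strata, per_stratum, rng=None):
    """Randomly select `per_stratum` members from every stratum."""
    rng = rng or random.Random()
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical households grouped by income level.
households = {
    "low income": ["L1", "L2", "L3", "L4"],
    "middle income": ["M1", "M2", "M3", "M4", "M5"],
    "high income": ["H1", "H2", "H3"],
}
sample = stratified_sample(households, 2, random.Random(0))
print(sample)  # two households from each income group
```

Because every stratum contributes members, each group is guaranteed to be represented in the sample.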

Control group

no treatment is applied

Cumulative frequency

The sum of the frequencies of that class and all previous classes.
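Cumulative frequencies are running totals, which `itertools.accumulate` computes directly. The class frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [2, 2, 3, 2, 2]

# Each entry is the sum of that class's frequency and all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 4, 7, 9, 11]
```

The last cumulative frequency equals the sample size.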

Sample size

Is the number of subjects in a study.

Paired data sets

for example: A data set contains the costs of an item and a second data set contains sales amounts for the item at each cost.

Pie Chart

Is a circle that is divided into sectors that represent categories. The area of each sector is proportional to the frequency of each category.

Frequency histogram

Is a bar graph that represents the frequency distribution of a data set. A histogram has the following properties: 1. The horizontal scale is quantitative and measures the data values. 2. The vertical scale measures the frequencies of the classes. 3. Consecutive bars must touch.

Sampling

Is a count or measure of part of a population and is more commonly used in statistical studies.

Cumulative frequency graph (ogive)

Is a line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis.

Statistic

Is a numerical description of a sample characteristic.

Randomization

Is a process of randomly assigning subjects to different treatment groups.
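One simple way to randomize is to shuffle the subjects and split the shuffled list. This is only a sketch: the function name `randomize` and the subject IDs are hypothetical, and it assumes two groups of (nearly) equal size.

```python
import random

def randomize(subjects, rng=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = rng or random.Random()
    shuffled = list(subjects)   # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subject_ids = list(range(1, 21))  # hypothetical subjects numbered 1..20
treatment, control = randomize(subject_ids, random.Random(0))
print(len(treatment), len(control))  # 10 10
```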

Systematic sample

Is a sample in which each member of the population is assigned a number. The members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number.
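The selection rule can be sketched in Python as a random starting position followed by a fixed step. The function name `systematic_sample` and the numbered population are assumptions made for illustration.

```python
import random

def systematic_sample(population, interval, rng=None):
    """Pick a random start, then take every `interval`-th member."""
    rng = rng or random.Random()
    start = rng.randrange(interval)   # random index in 0..interval-1
    return population[start::interval]

members = list(range(1, 101))  # hypothetical population numbered 1..100
sample = systematic_sample(members, 10, random.Random(0))
print(sample)  # ten members, each 10 apart
```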

A simple random sample

Is a sample in which every possible sample of the same size has the same chance of being selected.
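In Python, `random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches this definition. The function name `simple_random_sample` and the customer list are hypothetical.

```python
import random

def simple_random_sample(population, n, rng=None):
    """Draw n distinct members; every size-n subset is equally likely."""
    rng = rng or random.Random()
    return rng.sample(population, n)

customers = [f"customer_{i}" for i in range(1, 51)]  # hypothetical list
sample = simple_random_sample(customers, 5, random.Random(0))
print(sample)  # five distinct customers
```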

Sample

Is a subset, or part, of a population.

Chapter 2: Frequency Distribution

Is a table that shows classes or intervals of data entries with a count of the number of entries in each class.

Pareto Chart

Is a vertical bar graph in which the height of each bar represents frequency or relative frequency. The bars are positioned in order of decreasing height, with the tallest bar at the left.

Survey

Is an investigation of one or more characteristics of a population. Most often, surveys are carried out on people by asking them questions.

Inferential statistics

Is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.

Descriptive statistics

Is the branch of statistics that involves the organization, summarization, and display of data.

Population

Is the collection of all outcomes, responses, measurements, or counts that are of interest.

Range

Is the difference between the maximum and minimum data entries.

Replication

Is the repetition of an experiment under the same or similar conditions.

Statistics

Is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

Simulation

Is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Car companies use dummies to study the effects of crashes on humans.

A random sample

Is a sample in which every member of the population has an equal chance of being selected.

Class limits

The least and greatest numbers that can belong to a class.

An example of constructing a frequency distribution

The maximum value is 98 and the minimum value is 47. Range = 98 - 47 = 51. Class width = 51 / 5 classes = 10.2, rounded up to 11. Lower class limits: start with 47, then 47 + 11 = 58, 58 + 11 = 69, 69 + 11 = 80, 80 + 11 = 91.
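Once the lower limits and class width are known, tallying the frequencies is mechanical. The sketch below uses the limits from this example; the function name `frequency_distribution` and the list of scores are hypothetical, and whole-number data are assumed.

```python
from collections import Counter

def frequency_distribution(data, lower_limits, width):
    """Return a list of (lower_limit, upper_limit, frequency) rows."""
    first = lower_limits[0]
    # Integer division gives the index of the class each entry falls in.
    counts = Counter((x - first) // width for x in data)
    return [(lo, lo + width - 1, counts.get(i, 0))
            for i, lo in enumerate(lower_limits)]

# Hypothetical scores with minimum 47 and maximum 98.
scores = [47, 52, 58, 63, 69, 70, 75, 80, 88, 91, 98]
table = frequency_distribution(scores, [47, 58, 69, 80, 91], 11)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
# 47-57: 2
# 58-68: 2
# 69-79: 3
# 80-90: 2
# 91-101: 2
```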

Class boundaries

The numbers that separate classes without forming gaps between them.

Relative frequency

The portion or percentage of the data that falls in that class. Relative frequency = (class frequency) / (sample size).
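A short sketch of the formula, using hypothetical class frequencies. The function name `relative_frequencies` is made up for illustration.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the sample size."""
    sample_size = sum(frequencies)
    return [f / sample_size for f in frequencies]

frequencies = [2, 2, 3, 2, 2]  # hypothetical class frequencies
relative = relative_frequencies(frequencies)
print([round(r, 3) for r in relative])  # [0.182, 0.182, 0.273, 0.182, 0.182]
```

Expressed as proportions, the relative frequencies add up to 1 (up to floating-point rounding).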

Determine if the survey question is biased. If the question is​ biased, suggest a better wording. Why is drinking water good for​ you?

The question is biased. The wording​ "How do you think drinking water affects your​ health?" would be better.

Determine whether the following statement is true or false. If it is​ false, rewrite it as a true statement. For data at the interval​ level, you cannot calculate meaningful differences between data entries

The statement is false. A true statement is​ "For data at the interval​ level, you can calculate meaningful differences between data​ entries."

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. The midpoint of a class is the sum of its lower and upper limits divided by two.

The statement is true.

Cluster sample

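When the population naturally falls into subgroups, each having similar characteristics, a cluster sample may be more effective. The population is divided into groups, called clusters, and all of the members in one or more (but not all) of the clusters are selected. Examples: different branches of the same bank; different sections of the same course.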
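A minimal sketch of cluster sampling, in which whole clusters are chosen at random and every member of a chosen cluster is included. The function name `cluster_sample` and the branch data are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical customers grouped by bank branch.
branches = {
    "Branch A": ["a1", "a2", "a3"],
    "Branch B": ["b1", "b2"],
    "Branch C": ["c1", "c2", "c3", "c4"],
    "Branch D": ["d1", "d2"],
}
sample = cluster_sample(branches, 2, random.Random(0))
print(sample)  # every customer from two randomly chosen branches
```

Unlike a stratified sample, which takes some members from every group, this takes all members from only some groups.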

Matched pair design

Where subjects are paired up according to similarities.

Scatter plot

Where the ordered pairs are graphed as points in a coordinate plane. A scatter plot is used to show the relationship between two quantitative variables.

In terms of displaying​ data, how is a​ stem-and-leaf plot similar to a dot​ plot?

a. Both plots can be used to identify unusual data findings. b. Both plots show how data are distributed. d. Both plots can be used to determine specific data entries.

The following appear on a physician's intake form. Identify the level of measurement of the data. (a) Allergies (b) Temperature (c) Age (d) Change in health (scale of −5 to 5)

a. Nominal b. Interval c. Ratio d. Ordinal

What potential sources of bias were​ present, if​ any? Select all that apply

a.) Individuals may have not been available when the researchers were calling. Those individuals that were available may have not been representative of the population. b.) Telephone sampling only includes people who had telephones. People who owned telephones may have been older or wealthier on​ average, and may not have been representative of the entire population. d.) Individuals may have refused to participate in the sample. This may have made the sample less representative of the population.

The temperatures (in °F) of air samples taken simultaneously over a glacier are shown below. 18.4, 22.2, 22.1, 18.9, 18.2, 18.8, 19.8, 20.5, 17.7 Determine whether the data are qualitative or quantitative and identify the data set's level of measurement. a.) Are the data qualitative or quantitative? b.) What is the data set's level of measurement?

a.) Quantitative b.) Interval

Determine whether the study is an observational study or an experiment. Explain. In a survey of 1060 adults in a​ country, 58​% said the​ country's leader should release all medical information that might affect their ability to serve. The study is ______ because it ________ a treatment to the adults.

a.) observational b.) does not apply

The region of a country with the longest life expectancy for the past six years is shown below. Southeast, Southwest, Southwest, Southwest, Southwest, Eastern Determine whether the data are qualitative or quantitative and identify the data set's level of measurement. a.) Are the data quantitative or qualitative? b.) What is the data set's level of measurement?

a.) qualitative b.) Nominal

Stem-and-leaf plot

Is an example of exploratory data analysis (EDA). Each number is separated into a stem (for instance, the entry's leftmost digits) and a leaf (for instance, the rightmost digit).
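Splitting entries into stems and leaves can be sketched as follows. The function name `stem_and_leaf` and the data are hypothetical, and the sketch assumes non-negative whole numbers with the rightmost digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for x in sorted(data):
        plot[x // 10].append(x % 10)
    return dict(plot)

entries = [47, 52, 58, 63, 69, 70, 75]  # hypothetical data
for stem, leaves in stem_and_leaf(entries).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 4 | 7
# 5 | 2 8
# 6 | 3 9
# 7 | 0 5
```

Because each leaf is kept, the original data entries can still be read off the plot.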

