Statistics Chapter 1
Sample
A part of the population selected for study.
Categorical Variable
A categorical variable has values that describe a 'quality' or 'characteristic' of a data unit, like 'what type' or 'which category'. Its categories are mutually exclusive (each observation falls in one category or another, never both) and exhaustive (the categories include all possible options). Categorical variables are therefore qualitative variables and tend to be represented by a non-numeric value.
Multivariate data
Multivariate data result when we record two or more attributes for each individual or object in the data set.
Numerical Variable
Numeric variables have values that describe a measurable quantity as a number, like 'how many' or 'how much'. Therefore numeric variables are quantitative variables. They can be discrete or continuous.
Continuous Variable
A continuous variable is a numeric variable whose observations can take any value within an interval of real numbers. The recorded value can be as fine as the instrument of measurement allows. Examples of continuous variables include height, time, age, and temperature.
Univariate Data Set
A data set consisting of observations on a single characteristic is a univariate data set. Examples include the number of students in a class, a university's mascot, or the brand of a computer. A univariate data set is categorical (also called qualitative) if the individual observations are categorical responses, whether recorded with words or, sometimes, numerals. A univariate data set is numerical (also called quantitative) if each observation is a number. Numerical variables typically have units such as mpg, $, minutes, or seconds.
Discrete Variable
A discrete variable is a numeric variable whose observations take values from a set of distinct whole values, typically obtained by counting. A discrete variable cannot take a fractional value between one value and the next closest value. Examples of discrete variables include the number of registered cars, the number of business locations, and the number of children in a family, all of which are measured in whole units, e.g. 1, 2, or 3 cars.
Data
Data is a collection of observations on one or more variables.
Numerical Data: Discrete and Continuous
Discrete numerical data are isolated points on the number line, with a countable number of possible values. Examples include # of credit hours, # of people in a household, and the price of milk (counted in whole cents). Continuous numerical data can take any value in an entire interval along the number line. Examples include temperature and length.
Frequency Distribution
If we are dealing with a categorical data set, a common way to present the data is in the form of a table. This is called a frequency distribution. A frequency distribution is a table that displays the possible categories along with the associated frequencies and/or relative frequencies. The frequency for a particular category is the number of times the category appears in the data set. The relative frequency is that count divided by the total number of observations. If the table includes relative frequencies, it is sometimes referred to as a relative frequency distribution.
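As a rough illustration of the definition above, the following Python sketch builds a frequency distribution with relative frequencies. The function name `frequency_distribution` and the list of computer brands are hypothetical, made up only for this example.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {category: (frequency, relative_frequency)} for categorical data."""
    counts = Counter(observations)
    total = len(observations)
    return {category: (count, count / total) for category, count in counts.items()}

# Hypothetical responses to "Which computer brand do you own?"
brands = ["Dell", "Apple", "Dell", "HP", "Apple", "Dell", "HP", "Dell"]
table = frequency_distribution(brands)

# Print the table: one row per category
print(f"{'Category':<10}{'Frequency':>10}{'Relative':>10}")
for category, (count, relative) in sorted(table.items()):
    print(f"{category:<10}{count:>10}{relative:>10.3f}")
```

The relative frequencies in such a table should add up to 1, which is a quick check that no observation was dropped.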
Bivariate Data
If we wanted to record the height and weight of each student, then we would gather pairs of values, one pair per student. This is known as a bivariate data set. Bivariate data are needed for finding the correlation, the least-squares regression line, etc.
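The height and weight example can be sketched in Python as follows. This is a minimal sketch using the usual textbook formulas for the correlation coefficient r and the least-squares slope and intercept; the function name `correlation_and_line` and the five data pairs are hypothetical.

```python
from math import sqrt

def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)            # correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return r, slope, intercept

# Hypothetical heights (inches) and weights (pounds) for five students
heights = [62, 65, 68, 70, 73]
weights = [120, 140, 150, 155, 175]

r, slope, intercept = correlation_and_line(heights, weights)
print(f"r = {r:.3f}, weight = {intercept:.1f} + {slope:.2f} * height")
```

Each height is paired with the weight at the same position in the list, which is what makes the data set bivariate rather than two separate univariate data sets.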
Variability
Variability refers to how 'spread out' a group of scores is. Statistical methods allow us to collect, describe, analyze, and draw conclusions from data that vary in this way.
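To make 'spread out' concrete, the sketch below compares two hypothetical sets of quiz scores that share the same mean but differ in spread. The range, sample variance, and sample standard deviation are common measures of spread; they are not named in these notes and are used here only as an illustration.

```python
import statistics

# Two hypothetical sets of quiz scores with the same mean but different spread
tight = [78, 79, 80, 81, 82]
wide = [60, 70, 80, 90, 100]

def spread_summary(scores):
    """Return the mean and three common measures of spread."""
    return {
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
        "sample_variance": statistics.variance(scores),
        "sample_std_dev": statistics.stdev(scores),
    }

for name, scores in [("tight", tight), ("wide", wide)]:
    print(name, spread_summary(scores))
```

Both groups average 80, so the mean alone cannot tell them apart; the measures of spread can.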
Data Analysis
The 3 tasks of statistics are collecting, summarizing, and analyzing data.
Evaluating a Research Study
The 6 data analysis steps can also be used as a guide for evaluating published research studies. The following questions should be addressed as part of an evaluation: - What were the researchers trying to learn? What questions motivated their research? - Was relevant information collected? - Were the right things measured? - Were the data collected in a sensible way? - Were the data summarized in an appropriate way? - Was an appropriate method of analysis used, given the type of data and how the data were collected? - Are the conclusions drawn by the researchers supported by the data analysis?
Inferential Statistics
The branch of stats that involves generalizing from a sample to the population from which the sample was selected and assessing the reliability of such generalizations. When generalizing, we do run the risk of an incorrect conclusion.
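A small simulation can illustrate generalizing from a sample to a population. Everything in this sketch is an assumption made for illustration: the population is simulated heights in cm, the sample size is 100, and the standard error is used as one rough gauge of reliability (it is not defined in these notes).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 10,000 individuals
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw a simple random sample and use its mean to estimate the population mean
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

# Standard error: a rough gauge of how far the estimate may be from the truth
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"standard error:  {standard_error:.2f}")
```

The sample mean will usually be close to the population mean but not equal to it, which is the risk of an incorrect conclusion that comes with any generalization.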
Descriptive Statistics
The branch of stats that includes methods for organizing and summarizing data. This is usually done by the use of tables, graphs, or numerical summaries.
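As a hedged illustration of organizing and summarizing data, the sketch below produces a few numerical summaries and a simple text graph. The data (number of children in 12 families) are hypothetical, and the choice of summaries is only one of many reasonable options.

```python
import statistics
from collections import Counter

# Hypothetical number of children in each of 12 families
children = [0, 2, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

# Numerical summaries
summary = {
    "n": len(children),
    "mean": statistics.mean(children),
    "median": statistics.median(children),
    "min": min(children),
    "max": max(children),
}
print(summary)

# A simple text graph: one star per family with that many children
counts = Counter(children)
for value in range(min(children), max(children) + 1):
    print(f"{value} | {'*' * counts[value]}")
```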
Population
The entire collection of individuals or measurements about which information is desired.
Process of Data Analysis
There are six steps. 1. Understanding the nature of the problem - the goal of the research and what questions we want to answer. 2. Deciding what to measure and how to measure it - define the variables to be studied and develop appropriate methods for determining their values. 3. Data collection - This step is crucial. Decide whether an existing data source is adequate or whether new data must be collected. 4. Data summarization and preliminary analysis - using graphs and numbers. 5. Formal data analysis - Select and apply statistical methods. 6. Interpretation of results - Often leads to the formulation of new research questions, which leads us back to the first step.
Variable
A variable is a characteristic whose value may change from one observation to another.
Statistics
We will need to understand and use data to make decisions. To do this, we must be able to: 1. Decide whether existing data is adequate or whether additional information is required. 2. If necessary, collect more information in a reasonable and thoughtful way. 3. Summarize the available data in a useful and informative manner. 4. Analyze the available data. 5. Draw conclusions, make decisions, and assess the risk of an incorrect decision.