Chapter 1
Population parameter
A characteristic or measure of a population.
POPULATION VERSUS SAMPLE
A population consists of all items of interest in a statistical problem. A sample is a subset of the population. We analyze sample data and calculate a sample statistic to make inferences about the unknown population parameter.
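As a minimal sketch of the population-versus-sample idea, the Python snippet below builds a made-up population, draws a random sample from it, and compares the sample statistic (the sample mean) with the population parameter (the population mean). The population values, the sample size of 100, and all variable names are assumptions chosen for illustration only; in practice the population parameter is usually unknown.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up measurements
population = [random.gauss(50, 10) for _ in range(10_000)]

# Population parameter (normally unknown; computable here only
# because we generated the whole population ourselves)
population_mean = statistics.fmean(population)

# Sample: a subset of the population, drawn without replacement
sample = random.sample(population, 100)

# Sample statistic: used to make an inference about the parameter
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

A different random sample would give a slightly different sample mean, which is why the sample statistic is treated as a random variable.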
Quantitative variable
A variable that assumes meaningful numerical values. It may be discrete or continuous (interval and ratio scales).
CROSS-SECTIONAL DATA AND TIME SERIES DATA
Cross-sectional data contain values of a characteristic of many subjects at the same point or approximately the same point in time. Time series data contain values of a characteristic of a subject over time.
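The difference between the two kinds of data can be sketched with two small Python dictionaries. The store names, quarters, and sales figures are hypothetical values invented for illustration.

```python
# Cross-sectional data: one characteristic (sales) recorded for
# many subjects (stores) at the same point in time.
cross_sectional = {
    "Store A": 120,
    "Store B": 95,
    "Store C": 143,
}

# Time series data: the same characteristic recorded for one
# subject (Store A) over several time periods.
time_series = {
    "2023-Q1": 120,
    "2023-Q2": 131,
    "2023-Q3": 127,
    "2023-Q4": 150,
}

print(f"{len(cross_sectional)} subjects observed at one point in time")
print(f"1 subject observed over {len(time_series)} periods")
```

In the first dictionary the keys are subjects; in the second the keys are time periods, so their order matters.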
Three steps are essential for doing good statistics.
First, we have to find the right data, which are both complete and free of misrepresentation. Second, we must use the appropriate statistical tools, depending on the data at hand. Finally, an important ingredient of a well-executed statistical analysis is to communicate numerical information clearly in written language.
STRUCTURED DATA, UNSTRUCTURED DATA, AND BIG DATA
Structured data reside in a predefined row-column format, while unstructured data do not conform to a predefined row-column format. The term big data is used to describe a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data processing tools. The availability of big data, however, does not necessarily imply complete (population) data.
Population
The complete collection of items of interest in a statistical problem.
Cross-sectional data
refers to data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
Time series data
refers to data collected over several time periods focusing on certain groups of people, specific events, or objects. Time series can include hourly, daily, weekly, monthly, quarterly, or annual observations.
Descriptive statistics
refers to the summary of important aspects of a data set. This includes collecting data, organizing the data, and then presenting the data in the form of charts and tables. In addition, we often calculate numerical measures that summarize, for instance, the data's typical value and the data's variability.
Statistics
the methodology of extracting useful information from a data set.
Qualitative variable
A variable for which we use labels or names to identify the distinguishing characteristic of each observation. Examples include race, profession, type of business, the manufacturer of a car, and so on (nominal and ordinal scales).
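A short Python sketch can show why the distinction between qualitative and quantitative variables matters: labels can be counted, while numerical values can be averaged. The observations below are hypothetical, and the field names `profession` and `age` are chosen only for illustration.

```python
from collections import Counter
import statistics

# Hypothetical observations with one qualitative variable
# (profession) and one quantitative variable (age).
observations = [
    {"profession": "nurse", "age": 34},
    {"profession": "teacher", "age": 41},
    {"profession": "nurse", "age": 29},
    {"profession": "engineer", "age": 38},
]

# Qualitative variable: summarize by counting each label.
profession_counts = Counter(obs["profession"] for obs in observations)

# Quantitative variable: arithmetic such as the mean is meaningful.
mean_age = statistics.fmean(obs["age"] for obs in observations)

print(profession_counts)
print(f"mean age: {mean_age:.1f}")
```

Computing a mean of the profession labels would make no sense, which is the practical consequence of the variable being qualitative.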
Examples of descriptive statistics
The unemployment rate, the president's approval rating, the Dow Jones Industrial Average, batting averages, the crime rate, and the divorce rate.
*Mean*: the average of the scores.
*Standard deviation*: a measure of how far the scores typically deviate from the mean.
*Variance*: how the values are dispersed around the mean; the larger the variance, the greater the dispersion of scores.
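The mean, variance, and standard deviation can be computed with Python's standard `statistics` module. The scores below are made up for illustration. The sketch shows both the population versions (dividing by n) and the sample versions (dividing by n - 1); which one applies depends on whether the data are the whole population or a sample.

```python
import statistics

# Hypothetical scores, invented for illustration
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean: the average of the scores
mean_score = statistics.fmean(scores)

# Treating the scores as the whole population (divide by n)
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

# Treating the scores as a sample (divide by n - 1)
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(f"mean:                {mean_score:.3f}")
print(f"population variance: {population_variance:.3f}")
print(f"population std dev:  {population_sd:.3f}")
print(f"sample variance:     {sample_variance:.3f}")
print(f"sample std dev:      {sample_sd:.3f}")
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the scores themselves.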
Variable
A characteristic of interest that differs in kind or degree among various observations.
sample statistic
a random variable used to estimate the unknown population parameter of interest
Sample
A subset of the population. We rely on sample data in order to make inferences about various characteristics of the population.
inferential statistics
drawing conclusions about a large set of data, called a population, based on a smaller set of sample data.
Statistics is used
to make informed decisions based on data.