BSTAT TEST 1 REVIEW


In inferential statistics, what do we call the large set of data about which conclusions are drawn?

population

Numerical variable

A variable that assumes meaningful numerical values for observations.

Continuous (random) variable

A variable that assumes uncountable values in an interval.

Categorical variable

A variable that uses labels or names to identify the distinguishing characteristics of observations.

What are two reasons we rely on sample data rather than population data?

1. Obtaining information on the entire population is expensive.
2. It is impossible to examine every member of the population.

Population parameter

A characteristic of a population; a number that describes something about an entire group or population.

Variable

A general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree.

Line Chart

A graph that connects the consecutive observations of a numerical variable with a line.

Bar chart

A graph that depicts the frequency or relative frequency of each category of a categorical variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted.

Histogram

A graphical depiction of a frequency or a relative frequency distribution for a numerical variable; series of rectangles where the width and height of each rectangle represent the interval width and frequency (or relative frequency) of the respective interval.

Discrete (random) variable

A variable that assumes a countable number of values.

big data

A massive volume of both structured and unstructured data that are often difficult to manage, process, and analyze using traditional data processing tools.

Scatterplot with a categorical variable

A modification of a basic scatterplot that incorporates a categorical variable.

Pie chart

A segmented circle portraying the categories and relative sizes for a categorical variable.

Frequency distribution

A table that groups the observations of a variable into categories or intervals and records the number of observations that fall into each category or interval.
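As a rough illustration of this definition, the sketch below groups a numerical variable into equal-width intervals and counts the observations in each. The function name, the sample values, and the choice of left-closed intervals [lower, upper) are all assumptions made for the example; a textbook may define the interval limits differently.

```python
def frequency_distribution(values, start, width, n_intervals):
    """Return (lower, upper, count) for each interval [lower, upper)."""
    counts = [0] * n_intervals
    for v in values:
        idx = int((v - start) // width)  # which interval v falls into
        if 0 <= idx < n_intervals:       # ignore values outside the table
            counts[idx] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_intervals)
    ]

# Hypothetical observations, not taken from the source.
prices = [12, 18, 25, 31, 33, 47, 49, 50, 58, 59]
table = frequency_distribution(prices, start=10, width=10, n_intervals=5)

for lower, upper, count in table:
    # Relative frequency = count divided by the number of observations.
    print(f"{lower} up to {upper}: {count} ({count / len(prices):.2f})")
```

Dividing each count by the total number of observations gives the relative frequency distribution.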

Contingency table

A table that shows frequencies for two categorical variables, x and y, where each cell represents a mutually exclusive combination of the pair of x and y observations.
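A minimal sketch of how such a table could be tallied from paired observations. The variable names and the six (x, y) pairs are hypothetical and serve only to show that each pair lands in exactly one cell.

```python
from collections import Counter

def contingency_table(pairs):
    """Tally (x, y) pairs into a rows-by-columns table of frequencies."""
    counts = Counter(pairs)
    rows = sorted({x for x, _ in pairs})
    cols = sorted({y for _, y in pairs})
    table = [[counts[(x, y)] for y in cols] for x in rows]
    return rows, cols, table

# Hypothetical observations: (age group, purchased?).
observations = [
    ("Under 35", "Yes"), ("Under 35", "No"), ("35 and over", "Yes"),
    ("Under 35", "Yes"), ("35 and over", "No"), ("35 and over", "No"),
]
rows, cols, table = contingency_table(observations)

print("", *cols, sep="\t")
for label, row in zip(rows, table):
    print(label, *row, sep="\t")
```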

Stem-and-leaf diagram

A visual method of displaying quantitative data where each value of a data set is separated into two parts: a stem, which consists of the leftmost digits, and a leaf, which consists of the last digit.
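The split into stem and leaf can be sketched as follows, assuming non-negative integer data so that the leaf is the ones digit and the stem is everything to its left. The function name and the sample ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leftmost digits) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 48 -> stem 4, leaf 8
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical observations, not taken from the source.
ages = [36, 48, 51, 36, 45, 62, 57, 44]
diagram = stem_and_leaf(ages)

for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```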

Myers-Briggs assessment breaks down personality types into four categories

Analyst, Diplomat, Explorer, Sentinel

structured data

Data that conform to a predefined row-column format.

what is one example of interval scale?

Fahrenheit scale for temperatures

Polygon

For a numerical variable, a graph of a frequency or relative frequency distribution in which lines connect a series of neighboring points, where each point represents the midpoint of a particular interval and its associated frequency or relative frequency.

Stacked column chart

Graph of a contingency table; depicts more than one categorical variable and allows for the comparison of composition within each category.

approximation formula to find the width of each interval

(Maximum - Minimum) / number of intervals
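A small worked example of the approximation, computed as the range (maximum minus minimum) divided by the number of intervals. The data are hypothetical, and rounding the result up to a convenient whole number is a common practice assumed here rather than something the card states.

```python
import math

def approximate_interval_width(values, n_intervals):
    """Range of the data divided by the desired number of intervals."""
    return (max(values) - min(values)) / n_intervals

# Hypothetical observations, not taken from the source.
prices = [12, 18, 25, 31, 33, 47, 49, 50, 58, 59]

width = approximate_interval_width(prices, 5)  # (59 - 12) / 5
rounded = math.ceil(width)                     # round up to a whole number
print(width, rounded)
```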

Interval scale

Observations of a variable can be categorized and ranked, and differences between observations are meaningful.

Ratio scale

Observations of a variable can be categorized and ranked, differences between observations are meaningful, and a true zero point (origin) exists.

Ordinal scale

Observations of a variable can be categorized and ranked.

Nominal scale

Observations of a variable differ merely by name or label.

Volume

One of the V's describing big data; an immense amount of data is compiled from a single source or a wide range of sources.

Variety

One of the V's describing big data; data come in all types, forms, and granularity.

Velocity

One of the V's describing big data; data from a variety of sources get generated at a rapid speed.

Value

One of the V's describing big data; information derived from big data should have value.

Veracity

One of the V's describing big data; refers to the credibility and quality of data.

Subsetting

The process of extracting a portion of a data set.
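For instance, subsetting might keep only the observations that satisfy a condition. The records and field names below are hypothetical.

```python
# Hypothetical data set: one dictionary per observation.
employees = [
    {"id": 101, "industry": "Retail", "wage": 25.0},
    {"id": 102, "industry": "Tech", "wage": 31.5},
    {"id": 103, "industry": "Tech", "wage": 28.0},
    {"id": 104, "industry": "Finance", "wage": 40.0},
]

# Keep only the rows for one industry.
tech_only = [row for row in employees if row["industry"] == "Tech"]

# Keep only selected variables (columns) from every row.
ids_and_wages = [{"id": r["id"], "wage": r["wage"]} for r in employees]

print(tech_only)
print(ids_and_wages)
```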

Omission strategy

When missing values exist, this strategy recommends excluding these observations from subsequent analysis.

Imputation strategy

When missing values exist, this strategy recommends replacing them with some reasonable imputed values.
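The omission and imputation strategies can be contrasted in a short sketch. Missing values are represented by `None`, and the mean is used as the imputed value; both choices are assumptions for illustration, since the card only calls for "some reasonable imputed values".

```python
from statistics import mean

def omit_missing(values):
    """Omission strategy: drop observations with missing values."""
    return [v for v in values if v is not None]

def impute_with_mean(values):
    """Imputation strategy: replace missing values with the mean
    of the observed values (one possible choice of imputed value)."""
    fill = mean(omit_missing(values))
    return [fill if v is None else v for v in values]

# Hypothetical observations; None marks a missing value.
wages = [20, None, 30, 40, None]

omitted = omit_missing(wages)
imputed = impute_with_mean(wages)
print(omitted)
print(imputed)
```

Omission shrinks the data set, while imputation keeps every observation at the cost of introducing estimated values.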

Scatterplot

a graphed cluster of dots, each of which represents the values of two variables

Ogive

a line graph of a cumulative frequency or cumulative relative frequency distribution.
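The values plotted on an ogive are running totals of the interval frequencies. The sketch below computes the cumulative relative frequencies for a hypothetical set of five interval counts; plotting them against the upper limit of each interval would give the ogive.

```python
from itertools import accumulate

def cumulative_relative_frequencies(frequencies):
    """Running total of the frequencies, divided by the overall total."""
    total = sum(frequencies)
    return [running / total for running in accumulate(frequencies)]

# Hypothetical interval frequencies, not taken from the source.
frequencies = [2, 1, 2, 2, 3]
points = cumulative_relative_frequencies(frequencies)
print(points)
```

The final value is always 1, because every observation has been counted by the last interval.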

what is a sample?

a subset of the population

what does population consist of?

all items of interest in a statistical problem

what are variables classified as?

categorical (qualitative) or numerical (quantitative)

what are nominal and ordinal scales used for?

categorical variables

COUNTIF function

counts the cells that meet a given criterion; for example, the number of employees in each industry
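A rough Python analogue of this kind of conditional count, using exact matching only. It is not a reimplementation of the Excel function, and the industry labels are hypothetical.

```python
from collections import Counter

def count_if(cells, criterion):
    """Count the cells whose value equals the criterion."""
    return sum(1 for cell in cells if cell == criterion)

# Hypothetical Industry column, not taken from the source.
industries = ["Retail", "Tech", "Retail", "Finance", "Tech", "Retail"]

retail_count = count_if(industries, "Retail")
per_industry = Counter(industries)  # counts for every category at once
print(retail_count, dict(per_industry))
```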

Cross-sectional data

data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time

Time series data

data collected over several time periods focusing on certain groups of people, specific events, or objects.

2 branches of statistics

descriptive statistics and inferential statistics

what are the 2 types of numerical variables?

discrete or continuous.

unstructured data (or unmodeled data)

data that do not conform to a predefined row-column format

What is inferential statistics?

drawing conclusions about a large set of data (a population) based on a smaller set of sample data.

COUNTA

counts the number of cells that are not empty and is therefore applicable to all four variables.

Which branch of statistics has seen phenomenal growth?

inferential statistics

distribution is not symmetric

it is either positively skewed or negatively skewed

symmetric distribution

mirror image of itself on both sides of its center

one of four major measurement scales

nominal, ordinal, interval, or ratio

what are examples of structured data?

numbers, dates, and groups of words and numbers, typically stored in a tabular format.

what are interval and ratio scales used for?

numerical variables

ratio scale

represents the strongest level of measurement.

what do we call the smaller set of data drawn from the population?

sample data

descriptive statistics

a summary of the important aspects of a data set. Ex: collecting data, organizing the data, and then presenting the data in the form of charts and tables.

COUNT function

counts the number of cells that contain numeric observations and, therefore, can only apply to the EmployeeID and Wage variables.

Excel functions COUNT and COUNTA

to inspect the number of observations in each column
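To make the difference between counting numeric cells and counting non-empty cells concrete, here is a rough Python analogue. It only approximates the Excel behavior, and the three columns below (with `None` standing in for an empty cell) are hypothetical.

```python
def count_numeric(cells):
    """Analogue of COUNT: cells holding numbers (booleans excluded)."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

def count_non_empty(cells):
    """Analogue of COUNTA: cells that are not empty."""
    return sum(1 for c in cells if c is not None and c != "")

# Hypothetical columns; None marks an empty cell.
employee_id = [101, 102, 103, 104]
industry = ["Retail", None, "Tech", "Finance"]
wage = [25.0, 31.5, None, 28.0]

print(count_numeric(employee_id), count_numeric(industry), count_numeric(wage))
print(count_non_empty(employee_id), count_non_empty(industry), count_non_empty(wage))
```

A column whose count falls short of the number of rows contains missing observations.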

what are examples of unstructured data?

written reports, e-mail messages, doctor's notes, or open-ended survey responses etc.

