What is Data Modelling
Simple Regression
A predictor of one variable from another, based on the line of best fit through the observed data. An approach for modelling the relationship between a dependent variable y and a single explanatory (independent) variable x. Linear regression fits a straight line through the set of n points so that the sum of squared residuals (the vertical distances between the data points and the fitted line) is as small as possible. The slope of the fitted line equals the correlation between y and x multiplied by the ratio of their standard deviations (SD of y divided by SD of x). The intercept is chosen so that the line passes through the point of means (mean of x, mean of y) of the data.
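The slope and intercept relationships on this card can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the arrays `x` and `y` are made-up values for illustration and do not come from the source.

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope = correlation between x and y, scaled by (SD of y / SD of x)
r = np.corrcoef(x, y)[0, 1]
slope = r * y.std(ddof=1) / x.std(ddof=1)

# Intercept chosen so the line passes through (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

# Cross-check against NumPy's own least-squares straight-line fit
ls_slope, ls_intercept = np.polyfit(x, y, 1)

print(slope, intercept)
print(ls_slope, ls_intercept)
```

Both printed pairs should agree up to floating-point rounding, since the two routes describe the same least-squares line.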
Multiple Regression
For more than one explanatory variable, the process is called multiple linear regression.
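As a rough sketch of what multiple linear regression looks like in practice, the example below fits one outcome on two explanatory variables with NumPy's least-squares solver. The variable names and data are hypothetical; the outcome is built from known coefficients with no noise, so the fit should recover them.

```python
import numpy as np

# Hypothetical data: two explanatory variables and one outcome
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2  # exact relationship, no noise added

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates: [intercept, slope for x1, slope for x2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

With real data the outcome would include error, and the estimates would only approximate the underlying coefficients.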
Regression
Predicts Y from X. Describes the relation between selected values of X and observed values of Y, from which the most probable value of Y can be predicted for any value of X.
Y
The outcome, dependent variable, criterion variable
X
The explanatory (independent) variable, the predictor
Residuals
The vertical distances from the line to the points; they are the "left-over" variation after a regression line is fit. Residual = observed y - predicted y. The relationship between the data points and the model is: data = model + error.
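The two identities on this card (residual = observed y - predicted y, and data = model + error) can be illustrated with a short sketch. It assumes NumPy and uses invented data; the line is fitted by ordinary least squares via `np.polyfit`.

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Fit a straight line by least squares
slope, intercept = np.polyfit(x, y, 1)

# Model part: the predicted y for each observed x
predicted = intercept + slope * x

# Error part: residual = observed y - predicted y
residuals = y - predicted

# data = model + error
reconstructed = predicted + residuals

print(residuals)
print(residuals.sum())  # close to zero when the fitted line includes an intercept
```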
Univariate
_______ one dependent variable
Questions about the Data
Differently phrased questions, whether about exploration or about the relationships among variables, use the same underlying statistical approach and reach the same conclusions about significance.
Causation
Only the research design can demonstrate causality, because statistical analyses are essentially correlational. Analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusory or false relationships, so caution is advisable; for example, correlation does not imply ________.
Statistical Analysis
Examines relationships among variables and seeks to understand the nature of those relationships; statistical analyses are correlational in nature. Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modelling and analysing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').
Models
A statistical representation of a relationship between variables, often shown in a diagram such as a scatterplot. How well the line fits the data tells us how accurate the model is.
GLM
General linear model. The t-test, regression, ANOVA and correlation are all statistical procedures that fall under it; at bottom, these procedures are the same.
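One way to see that these procedures share a framework is to run an independent-samples t-test as a regression. The sketch below assumes NumPy, equal-variance (pooled) t-test formulas, and made-up scores for two groups; group membership is coded as a 0/1 dummy variable.

```python
import numpy as np

# Hypothetical scores for two groups
group_a = np.array([4.0, 5.0, 6.0, 5.0, 4.0])
group_b = np.array([7.0, 8.0, 6.0, 9.0, 7.0])
n_a, n_b = len(group_a), len(group_b)

# Independent-samples t statistic using the pooled variance
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
t_test = (group_b.mean() - group_a.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# The same data as a regression of y on a 0/1 dummy (0 = group a, 1 = group b)
y = np.concatenate([group_a, group_b])
dummy = np.concatenate([np.zeros(n_a), np.ones(n_b)])
X = np.column_stack([np.ones_like(dummy), dummy])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# t statistic for the slope: estimate divided by its standard error
resid = y - X @ coef
sigma2 = resid @ resid / (len(y) - 2)
se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_reg = coef[1] / se_slope

print(t_test, t_reg)
```

The slope equals the difference between the group means, and the two t statistics agree up to rounding.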
Linear Regression
The goal is prediction, forecasting, or error reduction. Linear regression is used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is given without its accompanying value of y, the fitted model can be used to predict that value of y. A statistical method for finding the "line of best fit".
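The fit-then-predict workflow on this card can be sketched in a few lines. This assumes NumPy; the data and the helper name `predict` are hypothetical. The observed points lie exactly on y = 1 + 2x, so the prediction for a new x of 6 should come out at about 13.

```python
import numpy as np

# Hypothetical observed data lying exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit the line of best fit to the observed data
slope, intercept = np.polyfit(x, y, 1)


def predict(new_x):
    """Predict y for a new x value using the fitted line (illustrative helper)."""
    return intercept + slope * new_x


# A new x value that arrives without an accompanying y
prediction = predict(6.0)
print(prediction)
```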
Modelling
The level of measurement is irrelevant: predictors can be continuously measured variables or categorical ones. From a statistical view, the nature of the data underlying the predictors is irrelevant. This gives the flexibility to answer an array of questions within the same basic framework.