Experimental Analysis and Techniques Final PLSC 351

Logarithmic (transform data)

The logarithmic transformation is particularly useful when the data has a wide range of values and is positively skewed. - Used when factor effects are multiplicative rather than additive - Add a constant so zero values can be transformed - Excel: LOG(value + 1) - Removes big gaps --> better distribution
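As a rough illustration of the Excel formula on this card, the sketch below adds a constant of 1 and takes a base-10 log (Excel's LOG defaults to base 10). The function name `log_transform` and the sample values are made up for this example.

```python
import math

def log_transform(values, constant=1.0):
    """Return log10(value + constant); the constant keeps zeros defined."""
    return [math.log10(v + constant) for v in values]

# Hypothetical, positively skewed data spanning several orders of magnitude.
raw = [0, 9, 99, 999]
transformed = log_transform(raw)
print(transformed)
```

After the transformation the large gaps between raw values shrink to evenly spaced steps.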

Square Root (transform data)

The square root transformation is particularly useful when the data has a skewed distribution - Used when variances = means - Add a small constant - Excel: SQRT(value + 0.5)
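A minimal sketch of the square root transformation, assuming the small constant added before taking the root is 0.5. The function name `sqrt_transform` and the counts are hypothetical.

```python
import math

def sqrt_transform(values, constant=0.5):
    """Return sqrt(value + constant) for each value."""
    return [math.sqrt(v + constant) for v in values]

# Hypothetical count data (e.g., insects per leaf).
counts = [0, 1, 4, 9, 25]
sqrt_counts = sqrt_transform(counts)
print(sqrt_counts)
```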

Post Hoc test for ANOVA

Tukey HSD test *best*
- Controls risk of alpha error
- Sample sizes are not even
Student-Newman-Keuls test
- Called a multiple range test
- Controls risk of alpha error (a little less than Tukey)
Least significant difference (LSD)
- Most likely to commit alpha error
- Doesn't pool risk, so alpha error inflates
- Looked down upon
Scheffé's multiple contrasts
- More conservative
- Frequently used w/ uneven sample sizes
Dunnett's test
- Tests for differences between a control & other treatments
- Raises risk of making a beta error

Two-Factor ANOVA

We want to determine whether two different factors, soil type and water availability, have an effect on the yield of tomato plants. Ho: There was no interaction between nitrogen and cultivar on plant tissue weight. Ha: There was an interaction between nitrogen and cultivar on plant tissue weight. General form of Ho: There is no interaction between independent variable A and independent variable B on the dependent variable.

ANOVA Model 2: Random-Effects Model

A random-effects ANOVA model is used when the groups being compared are not predetermined, but rather are randomly selected from a larger population - Levels of a factor are random - Not common in agriculture - Multiple comparisons are complicated Ex. Year, Block

Precision

closeness of repeated measurements

Accuracy

nearness to the actual value being measured

Distribution

the arrangement & Frequency of data points - often represented as a curve

ANOVA Model 3: Mixed Model

when there are both fixed and random factors that are expected to influence the outcome variable. - Both Fixed & random effects present - Blocks common in agriculture - formulas change *correct formula for F to get proper P* Ex. Fertilizer rate & Year

Sample = s

- A subset of a population used for analysis - needs to represent the population - random sampling is important to avoid bias

Descriptive stats

- Central Tendency - Dispersion - Standard Deviation - Coefficient of variation Stats lacking dispersion descriptions are dangerous (naked stats)
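To show why a mean should travel with a dispersion measure, here is a small sketch using Python's `statistics` module. The helper name `describe` and the yield values are invented for illustration; the standard deviation is the sample version (n - 1 in the denominator).

```python
import statistics

def describe(sample):
    """Summarize central tendency and dispersion for one sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "sd": sd,
        "cv_percent": sd / mean * 100,  # coefficient of variation
    }

# Hypothetical yields from five plots.
yields = [4, 8, 6, 5, 7]
summary = describe(yields)
print(summary)
```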

Data Collection

- Collect proper data - does data measure what I want to observe? - is data collection feasible? - is sample large enough to be meaningful? - Make sure the values are legible (no ink) - Check accuracy of data entry

Nonparametric ANOVA

Kruskal-Wallis Test
If you have normal data, an ANOVA test is more powerful than a Kruskal-Wallis Test. If you have non-normal data, a Kruskal-Wallis Test may be more powerful than an ANOVA.
- Randomized and independent treatments and samples
- Samples have the same dispersion and shape
Nemenyi Test
If you reject the null hypothesis, you can move on to a nonparametric multiple comparisons test. It uses rank data to calculate differences between treatments in a manner similar to the Tukey Test.
1. Find the sum of rankings for each group
2. Set up comparisons (start with the largest difference)
3. Calculate differences
4. Compute a standard error (SE) for the experiment
5. Calculate the q-value for each difference
6. Look up the critical value for q
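The rank-based idea behind this card can be sketched as follows: pool the data, rank it (ties share the average rank), sum the ranks per group, and compute the Kruskal-Wallis H statistic. This is a bare-bones illustration with no tie correction; the function names and data are hypothetical, and it stops at the rank sums that a Nemenyi comparison would start from.

```python
def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Return (rank sum per group, H statistic), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return rank_sums, h

# Hypothetical measurements for three treatments.
rank_sums, h_stat = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rank_sums, h_stat)
```

H is then compared against a chi-square critical value with (number of groups - 1) degrees of freedom.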

Block design (RCBD w/o replication)

- Randomized Complete Block Design without replication - Controls for error - Blocks set up perpendicular to gradient of variation - If conditions are uniform, using blocks lowers the power of statistical tests - Every block contains every treatment
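One way to picture "every block contains every treatment" is to shuffle the full treatment list separately inside each block. The sketch below does that; `rcbd_layout`, the treatment labels, and the seed are placeholders chosen for the example.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # randomize plot order within this block only
        layout.append(order)
    return layout

# Hypothetical trial: 4 treatments in 3 blocks, one plot per treatment per block.
layout = rcbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```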

Replication

- Required for meaningful conclusions - Within-experiment repetition - Experimental unit must be identified - Experimental units must be independent - Use equal sample sizes

Software

- Specialized programs for data handling and analysis - Ability to handle complex designs - Can run a wide range of tests

Pseudoreplication

- The illusion of replication Ex. Measuring mpg in 1 car

Completely Randomized Design (CRD)

- Units can be anywhere - (need uniform conditions) when low variation exists in experimental units or the environment it is very efficient - susceptible to deceptive outcomes when conditions are not uniform

Chi-Square Test of Independence

- Used w/ 2 variables - Does 1 variable influence another variable? - Ho: no relationship between variables - Ha: there is a relationship between the variables - df = (# of categories in first variable - 1) * (# of categories in second variable - 1) - Chi-Squared best suited for higher sample sizes (1000+)
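A short sketch of the test of independence: expected counts come from row total x column total / grand total, and the degrees of freedom are (rows - 1) x (columns - 1). The function name and the 2 x 2 counts (drink size vs. mood) are invented for illustration.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = large/small drink, columns = happy/sad.
observed = [[30, 10],
            [20, 40]]
chi2_ind, df_ind = chi_square_independence(observed)
print(chi2_ind, df_ind)
```

The statistic is then compared with a chi-square critical value at the computed df.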

Chi-Square Goodness of Fit Test

- Used w/ one variable - Ho: sample fits a specified distribution - df = # of categories - 1 Ex. Apple trees sold on central coast (t-bud, cleft)
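For the goodness of fit case, the expected count in each category is the sample size times the hypothesized proportion. The sketch below assumes a 50/50 split between two graft types; the function name, counts, and proportions are hypothetical.

```python
def chi_square_gof(observed, expected_proportions):
    """Return (chi-square statistic, df) against hypothesized proportions."""
    n = sum(observed)
    chi2 = sum(
        (o - n * p) ** 2 / (n * p)
        for o, p in zip(observed, expected_proportions)
    )
    df = len(observed) - 1
    return chi2, df

# Hypothetical sales: 60 t-bud trees and 40 cleft-graft trees, Ho = 50/50.
chi2_gof, df_gof = chi_square_gof([60, 40], [0.5, 0.5])
print(chi2_gof, df_gof)
```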

Hidden Variables

- Variables not tested - could be meaningless or very impactful

Nonparametric tests for Two-Sample Dependent T-test

- Wilcoxon single-sample test - Wilcoxon matched pairs test - Mann-Whitney U test (for two independent samples)

Population = p

- an exclusive group for which conclusions are desired - small or large Ex. Lizards in CA, Martin Guitars

Non-nominal data (transform data)

- data that is not categorical 1. Logarithmic 2. Square root 3. Arcsine

Randomized Complete Block Design (RCBD w/ replication)

- each block has multiple replications

Data Handling

- look for patterns in data sets - outliers - trends - scatter plots are a good way to see trends

Analyzing Nominal data

- nominal data can be thought of as count data - each experimental unit is counted (placed in a category) - Chi-square is one key concept of analysis of counts *nominal = counts of experimental units that fit into categories*

Correlation coefficient (r)

- how closely the 2 variables relate to each other - scale of (1 to -1) (0 = no correlation) - can be calculated by values or ranks - looking to see how well the data fits a slope
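A minimal sketch of r calculated from values (the Pearson form). The function name and data are made up; running the same calculation on ranks instead of raw values would give the rank-based (Spearman) version.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_positive, r_negative)
```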

Sample Size

- sample size required to observe a difference can be calculated before or after the test - uses calculated or estimated parameters - G*Power to predict sample sizes - choosing: previous experience, calculations, 4 is very common *more is not always better*

Arcsine (transform data)

- useful when the data represents proportions or percentages, Ex. The proportion of people who prefer one brand over another or the percentage of a population that has a certain characteristic.
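The arcsine transformation is commonly written as the arcsine of the square root of the proportion, which is the form sketched here (result in radians). The function name and proportions are hypothetical; percentages would first need dividing by 100.

```python
import math

def arcsine_transform(proportions):
    """Return asin(sqrt(p)) in radians for each proportion p in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical proportions (e.g., share of seeds germinated per tray).
props = [0.0, 0.25, 1.0]
arcsine_values = arcsine_transform(props)
print(arcsine_values)
```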

Correlation & regression

- very similar - some statistical overlap w/ tests - goals are different *we look at simple linear correlation/regression* - 2 factors w/ ratio & interval data - nominal data pairs ratio & interval data together - examine relationships between factors

Randomized complete block design

- w/ no replication - can't evaluate an interaction - only 2 sets of hypotheses Ex. independent factor 1: fertilizer, independent factor 2: block

ANOVA Model 1: Fixed-effects model

- Classic - Levels of a factor are specifically chosen - Uses formulas we have discussed in class - Can be done again Ex. Fertilizer rates, Cultivars, Irrigation Rates, pesticide application

Analysis of Variance (ANOVA)

- Developed by R.A. Fisher - Tests if means of samples are the same - Power of ANOVA is higher when sample sizes are equal - Variances of samples are homogeneous - Error is randomly and independently distributed - Main effects are additive (dramatic change = alters test)

Chi-Squared (relationship or not)

Large / small drinks vs. happy / sad

Independent T-Test

One-Sample: Assesses how well a sample represents a hypothetical population
Two-Sample: Assesses the likelihood that 2 samples came from the same population
- Ratio or Interval Data - Normal distribution - Samples are independent of each other - Samples have equal variances

ANOVA & Tukey

Strawberry harvest (weight) measurements for 2 cultivars w/ 3 irrigation methods

2 sample independent t test

Strawberry harvest (weight) for 2 cultivars

Ways experiments get ruined

1. Lack of randomization 2. Lack of replication 3. Wrong Variables 4. Poor data collection procedures / techniques 5. Disturbance 6. environmental - input variation that is not evenly distributed

Central Tendency

1. Mean 2. Median 3. Mode 4. Quantiles (percentile/decile/quartiles)

Scientific Method

1. Observation 2. Conjecture 3. Testing 4. Analysis 5. Retesting

Dispersion

1. range (high & low, high - low) 2. variance (mean sum of squares)

Nonparametric ANOVA (If the data is not normal)

1. transform data 2. use nonparametric tests - doesn't have normal distribution - lacking the estimation of parameters - based on ranks

2-Factor ANOVA w/ RCBD with replication

6 blocks w/ 5 radish cultivars

Alternate hypothesis (Ha)

A statement expressing opposition to the null hypothesis - results differ

Null Hypothesis (Ho)

A statement expressing that no difference / relationship was observed Options: reject or fail to reject

T-Distribution

Allows samples to be tested when the population mean and standard deviation are unknown - works well w/ small samples - if t-score increases the p-value decreases

Correlation

Artichoke sales & lawsuits in SLO

Stats

Branch of math dealing with the collection and presentation of masses of numerical data

Two-Sample Independent T-Test

Ex. Compare the average height of two different varieties of tomato plants, variety A and variety B. (Ho) is that there is no difference in the average height between the two varieties of tomato plants. - Samples are not related to each other
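As a sketch of the calculation behind this card, the code below computes the pooled-variance t statistic for two independent samples, which assumes equal variances. The function name and plant heights are invented, and the p-value lookup is left to a t table or statistics package.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Return (t statistic, df) for an equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variances (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, df

# Hypothetical heights (cm) for variety A and variety B.
t_ind, df_ind_t = pooled_t([10, 12, 14], [13, 15, 17])
print(t_ind, df_ind_t)
```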

Single-Factor/One-Way ANOVA

Ex. Compare the growth rates of three different varieties of corn plants, labeled A, B, and C. (Ho) is that there is no difference in the mean height of the three varieties of corn plants. - Samples must be independent - Equal variances
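The F ratio for a one-way ANOVA is the between-group mean square divided by the within-group mean square. The sketch below works that out by hand; the function name and heights for varieties A, B, and C are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical heights for corn varieties A, B, and C.
f_ratio, df_between, df_within = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio, df_between, df_within)
```

A significant F only says that at least one mean differs; a post hoc test identifies which ones.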

Two-Sample Dependent T-test

Ex. We want to determine whether a new fertilizer has an effect on the growth rate of tomato plants. The measurements are paired because each plant has two measurements: one before applying the fertilizer and one after applying the fertilizer. - Used to test the difference between 2 dependent sample means - Ratio or Interval Data - Normally distributed population - Samples must be correlated to each other
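Because the samples are paired, the test works on the per-plant differences rather than on the two samples separately. A minimal sketch, with a hypothetical function name and made-up before/after growth values:

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, df) computed from the paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical growth rates for the same four plants.
before = [10, 12, 11, 13]
after = [12, 14, 12, 16]
t_paired, df_paired = paired_t(before, after)
print(t_paired, df_paired)
```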

Tukey HSD Test

HSD = Honestly Significant Difference. We have conducted a Single-Factor ANOVA on the yield of three different varieties of potato plants, labeled A, B, and C. We have found that there is a statistically significant difference in the mean yield of the three varieties. To determine which specific varieties are different from each other, we would conduct Tukey's HSD test as a follow-up analysis.

Sample Validity

If population data is known, samples can be compared

Fields in ag:

- Designing in orchards/ fields/ plots - be aware of error - remember importance of randomization & independence

Random sampling

- Fruit harvest
  - What is meaningful?
  - What is practical?
  - How can error be controlled?
- Sampling
  - Boxes, crates, stems, sorting machines
  - Be aware of bias
- Sampling techniques
  - Machine selected
  - Assign #'s (random selection)
  - Blind draw
  - Develop a sampling procedure

Randomization

- Helps to avoid bias - Essential for the validity of a test - Should not replace common sense - Use random # generators - Consider set-up, data collection & harvesting
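A random number generator can handle treatment assignment directly. The sketch below assigns an equal number of plots to each treatment in a completely randomized way; the function name, plot labels, treatment names, and seed are placeholders for illustration.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    random.Random(seed).shuffle(labels)
    return dict(zip(units, labels))

# Hypothetical trial: 12 plots, 3 treatments, 4 replicates each.
plots = [f"plot{i}" for i in range(1, 13)]
assignment = assign_treatments(plots, ["control", "low", "high"], seed=42)
print(assignment)
```

Fixing the seed makes the layout reproducible for record keeping; leaving it as `None` draws a fresh randomization each run.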

Continuous

- Interval and Ratio Data - Decimals used - Reading between the point Ex. Height, Weight

Discrete

- Interval and Ratio Data - Whole #'s used - Fixed points Ex. Leaves

T-tests

- Interval or Ratio data - Samples come from a population w/ normal distribution

independent variable

- Known Variables - Input Variable - treatment " " - predictor " " - explaining factors Ex. Cultivars, Academic Standing, Fertilizer ppm

Regression

- Measures how two variables relate to each other (scale: unlimited)
- Shows the change in the Y variable that occurs for one unit of the X variable
- Range is infinite
Ex. Each avocado is worth two bucks
- Graphed, shows perfect predictive power
- β = 2
- Real experimental data is not that predictable, but we try to find the closest correlation
Ex. Sparrow age and wing length
- For every day the sparrow lives, its wing gains 0.27 cm (β = 0.27)
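The slope β can be estimated by ordinary least squares. This sketch reproduces the avocado example, where every extra avocado adds two dollars; the function name and data points are assumptions made for the illustration.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: number of avocados bought vs. total cost in dollars.
slope, intercept = least_squares_line([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept)
```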

Coefficient of Determination (r^2)

- Measures the proportion of the variation in the dependent variable that is associated with the variation of the independent variable (scale 0-1, nothing-100%) - Lowercase r^2 = simple regression - Adjusted r^2 is more conservative - Determining how much you can explain
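For a simple linear regression, r^2 can be computed as 1 minus the residual sum of squares over the total sum of squares, and the adjusted value shrinks it according to sample size. The sketch below assumes one predictor; the function name and data are hypothetical.

```python
def r_squared(x, y):
    """Return (r^2, adjusted r^2) for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    adjusted = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return r2, adjusted

# Hypothetical paired observations.
r2, adjusted_r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r2, adjusted_r2)
```

Here the line explains about 60% of the variation in y, and the adjusted figure is lower, as the card notes.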

Correlation

- Measuring a linear relationship between two INDEPENDENT variables, no dependency implied - 2 variables - pos. correlation: one increases = other increases - neg. correlation: one increases = other decreases

Error

- Natural variation - Should be prevented - Use a control

Ratio/Proportions

- Nominal data (categorical) - categories based on quality NOT a measurement - relationship between data points is NOT numerical - an attribute (ex. color, sex, location) - status (dead/alive, right/left) - Experimental design - think about observation & experimental units - data handling - confirm proper design - check test assumptions (normality, variances, effects) - transforming is an option (arcsine transformation) - if assumptions met, general linear models (GLM) - consulting a statistician is recommended - data can be handled as counts & chi-square tests

Deceptive Claims

- Only positive results recorded - Overreaching conclusions (out of context) - Summary data misleading - No replication - Pseudoreplication - Experimental malpractice

Dependent variables

- Outcome Variable - output " " - trying to explain / predict / describe

Inductive/Inferential function

- Predicting - Past results might not be the same as future results

ordinal data (categorical data)

- Qualitative - Categories based on logical order Ex. (1st, 2nd, 3rd, 4th year) Likert Scale: (strongly dislike -> strongly like)

nominal data (categorical data)

- Qualitative - categories based on quality - Not a measurement Ex. Sex, Color, Location, Alive/Dead, Right/Left

ratio data (numerical data)

- Quantitative - Constant interval size - True 0 Ex. Height, Weight, Petal Count

Interval Data (Numerical Data)

- Quantitative - #'s that have meaning - Consistent interval size - No true zero Ex. °C, °F

