Statistics 1
Mode (typvärdet)
the value that occurs most frequently in a dataset
Mutually Exclusive Variables
mutually exclusive variables are variables that represent distinct categories or events, where no outcome can belong to more than one category simultaneously. In other words, the categories represented by mutually exclusive variables do not overlap.
Interval scale
has equal intervals between values but lacks a true zero point. Temperature measured in Celsius or Fahrenheit is a common example. Differences between values are meaningful, but ratios are not. Use it when both order and equal differences matter but ratios do not.
The Sharpe ratio shows
how well the return compensates for risk. The larger the Sharpe ratio, the better the asset compensates the investor for risk.
When are P-values used
in hypothesis testing to determine the significance of the results. A small p-value suggests that the observed results are unlikely under the null hypothesis and provides evidence to reject it.
How to calculate lambda for exponential distributions
lambda = 1/mean, that is, the inverse of the mean
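The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_lambda` and the waiting times are hypothetical, and the sample mean is used as a stand-in for the distribution's mean.

```python
from statistics import mean


def estimate_lambda(waiting_times):
    """Estimate the rate of an exponential distribution as 1 / mean."""
    return 1.0 / mean(waiting_times)


# Hypothetical waiting times (in minutes) between events.
waiting_times = [2.0, 4.0, 6.0, 8.0]

# The mean waiting time is 5.0 minutes, so the rate is 0.2 events per minute.
rate = estimate_lambda(waiting_times)
print(rate)
```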
Type II error
failing to reject the null hypothesis when it is false
Z-score
measures the number of standard deviations a data point is from the mean of a dataset. Z-scores are typically used when the population standard deviation is known. They are commonly used in standard normal distribution calculations and in comparing data from different distributions.
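As a rough sketch, a z-score is the distance from the mean divided by the standard deviation. The function name `z_score` and the numbers below are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical example: a value of 130 in a population
# with mean 100 and standard deviation 15.
z = z_score(130.0, 100.0, 15.0)
print(z)  # 2.0, i.e. two standard deviations above the mean
```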
Hypergeometric distribution
models the number of successes in a fixed number of draws without replacement from a finite population containing a specified number of successes and failures.
Binomial Distribution
models the number of successes in a fixed number of independent Bernoulli trials, where each trial has the same probability of success.
Exponential Distribution
often used to model the time until the next event occurs in a sequence of independent events that occur at a constant rate
Ratio scale
the highest level of measurement. It has a true zero point, where zero means the absence of the attribute being measured. Examples include height, weight, and income.
When are t-values used
used in situations where the population standard deviation is unknown and must be estimated from the sample, typically in smaller sample sizes.
Multiplication rule
used to calculate the probability of the intersection of two or more events. For independent events, P(A ∩ B) = P(A) · P(B).
Poisson Distribution
used to model the number of events that occur in a fixed interval of time or space. It's characterized by the fact that events occur independently of each other and at a constant rate.
T-value
used when the population standard deviation is unknown and is estimated from the sample. T-values are used in t-tests, which test hypotheses about means, for example when comparing the means of two groups.
When are Z-scores used
used when working with a known population standard deviation, especially in cases where data is normally distributed.
Exhaustive variables
variables that cover all possible outcomes or events in a given scenario. In other words, when you have a set of exhaustive variables, every possible outcome must fall into one of the categories represented by those variables. There are no outcomes left unaccounted for
Descriptive measures for random variables
expected value, variance, standard deviation
When should you use Poisson distribution
when modeling the number of events occurring in a fixed interval of time or space.
When should you use binomial distribution
when modeling the number of successes in a fixed number of independent trials.
When should you use exponential distribution
when modeling the time until the next event occurs in a sequence of independent events.
When should you use Hypergeometric distribution
when sampling is done without replacement from a finite population containing a specified number of successes and failures.
Mean absolute deviation (MAD)
is the average of the absolute differences between the observations and the mean
The central limit theorem (CLT) states that
sample means are approximately normally distributed if the samples are sufficiently large
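Empirical rule
for a normal distribution, about 0.6826, 0.9544 and 0.9973 of the observations lie within 1, 2 and 3 standard deviations of the mean, respectively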
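These proportions can be checked numerically for a standard normal variable. The sketch below uses the error function from Python's `math` module; the helper name `prob_within` is made up for illustration. Values read from a printed z-table can differ in the last digit because of rounding.

```python
import math


def prob_within(k):
    """P(|Z| <= k) for a standard normal Z, computed via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```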
Addition rule
states that the probability of the union of two or more mutually exclusive events is equal to the sum of their individual probabilities.
Three cases for the population variances when there are two populations
1. 𝜎1^2 and 𝜎2^2 are known: standard normal distribution. 2. 𝜎1^2 and 𝜎2^2 are unknown but assumed equal: 𝑡-distribution with df = 𝑛1 + 𝑛2 − 2. 3. 𝜎1^2 and 𝜎2^2 are unknown and not assumed equal: 𝑡-distribution with approximate df.
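The degrees of freedom in cases 2 and 3 can be sketched as follows. The card does not say which approximation is meant in case 3; the Welch-Satterthwaite formula used here is an assumption, chosen because it is the usual one. The function names are hypothetical.

```python
def pooled_df(n1, n2):
    """Case 2: variances unknown but assumed equal."""
    return n1 + n2 - 2


def welch_df(s1_sq, n1, s2_sq, n2):
    """Case 3: approximate df via the Welch-Satterthwaite formula (assumed).

    s1_sq and s2_sq are the sample variances of the two samples.
    """
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical samples: sizes 10 and 15, sample variances 4.0 and 9.0.
print(pooled_df(10, 15))
print(welch_df(4.0, 10, 9.0, 15))
```

With equal sample sizes and equal sample variances, the approximate df coincides with 𝑛1 + 𝑛2 − 2.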
We may choose to call an observation an outlier if its z-score is
greater than 3 or less than −3.
Exhaustive events
If all possible outcomes of an experiment are found among the events
Mutually exclusive events
If they have no outcomes of the experiment in common
P-value
Represents the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. A smaller p-value suggests stronger evidence against the null hypothesis, which can lead to its rejection. P-values are used in hypothesis testing to determine the significance of the results.
Explain Chebyshev's theorem
The theorem states that for any dataset, regardless of its distribution, at least 1−1/𝑘^2 of the data lies within 𝑘 standard deviations of the mean, where 𝑘 is any positive number greater than 1. For example, if 𝑘=2, then at least 1−1/2^2=3/4 or 75% of the data lies within 2 standard deviations of the mean. If 𝑘=3, then at least 1−1/3^2=8/9 or 88.9% of the data lies within 3 standard deviations of the mean.
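The bound is easy to evaluate for any 𝑘. A minimal sketch, with a made-up function name, that reproduces the two examples above:

```python
def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


print(chebyshev_lower_bound(2))  # 0.75
print(round(chebyshev_lower_bound(3), 3))  # 0.889
```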
Type I error
rejecting the null hypothesis when it is true
What is MAD (Mean Absolute Deviation)
a measure of the average absolute distance between each data value and the mean of a data set
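A small sketch of that calculation, using only the standard library. The function name and the data are hypothetical.

```python
from statistics import mean


def mean_absolute_deviation(data):
    """Average absolute distance between each value and the mean."""
    m = mean(data)
    return mean([abs(x - m) for x in data])


# Hypothetical data: the mean is 5.0 and the absolute
# deviations are 3, 1, 1 and 3, so the MAD is 2.0.
data = [2.0, 4.0, 6.0, 8.0]
mad = mean_absolute_deviation(data)
print(mad)
```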
A binomially distributed random variable 𝑋 counts
the number of successful trials in a series of 𝑛 trials, where (1) the trials are independent and (2) the probability of success is constant across trials
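Under those two conditions, the probability of exactly k successes in n trials follows the standard binomial formula C(n, k) · p^k · (1 − p)^(n − k). The card itself does not state the formula, so the sketch below is a supplement, and the function name is made up.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X counting successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: exactly 1 head in 2 fair coin tosses.
print(binomial_pmf(1, 2, 0.5))  # 0.5
```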
Conditional probability
refers to the probability of an event occurring given that another event has already occurred
Unconditional probability
refers to the probability of an event occurring without any conditions or prior knowledge
Multiplication rule
the probability that both 𝐴 and 𝐵 occur: P(𝐴 ∩ 𝐵) = P(𝐴) · P(𝐵 | 𝐴)
Interquartile range (IQR)
the difference between the third and the first quartile (the 75% and 25% points of the data, respectively)
The weighted mean is calculated as
x̄ = Σ𝑤𝑖𝑥𝑖, where the weights 𝑤𝑖 sum to 1
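A minimal sketch of the weighted mean. As an assumption beyond the card, the sum is divided by the total weight so that weights which do not sum to 1 also work; when they do sum to 1 this reduces to the formula above. The function name and numbers are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: the weights already sum to 1.
result = weighted_mean([10.0, 20.0], [0.25, 0.75])
print(result)  # 17.5
```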