📐 1. Descriptive Statistics

Descriptive statistics summarize and organize data so it can be easily understood.

Measures of Central Tendency

  • Mean (x̄): The arithmetic average. Sum of all values divided by the number of values.
  • Median: The middle value when data is ordered. Robust to outliers.
  • Mode: The most frequently occurring value. Can have multiple modes.
Mean: x̄ = Σxᵢ / n
Median: Middle value of sorted data (average of two middle values if n is even)
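These three measures can be computed directly with Python's standard `statistics` module; a quick sketch on a small made-up dataset:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical dataset

mean = statistics.mean(data)      # (2+3+3+5+7+10)/6 = 5
median = statistics.median(data)  # n is even, so average of 3 and 5 = 4
mode = statistics.mode(data)      # 3 occurs most often

print(mean, median, mode)  # 5 4.0 3
```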

Measures of Spread (Variability)

  • Range: Max − Min (sensitive to outliers)
  • Variance (σ²): Average of squared deviations from the mean
  • Standard Deviation (σ): Square root of variance — same units as data
  • Interquartile Range (IQR): Q3 − Q1 — the middle 50% of data
Population Variance: σ² = Σ(xᵢ - μ)² / N
Sample Variance: s² = Σ(xᵢ - x̄)² / (n - 1) ← Bessel's correction
Standard Deviation: σ = √(σ²)
💡 Key Insight: We divide by (n−1) for sample variance to get an unbiased estimate of the population variance. This is called Bessel's correction.
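The two variance formulas differ only in the divisor. A minimal sketch on a made-up sample, checking the manual calculation against the standard library's `variance` (sample) and `pvariance` (population):

```python
import math
import statistics

data = [4, 8, 6, 5, 3]  # hypothetical sample
n = len(data)
xbar = sum(data) / n                     # 26/5 = 5.2
ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations

pop_var = ss / n          # divide by N: population variance
samp_var = ss / (n - 1)   # divide by (n−1): Bessel's correction

# The standard library makes the same distinction:
print(samp_var, statistics.variance(data))   # both ≈ 3.7
print(pop_var, statistics.pvariance(data))   # both ≈ 2.96
print(math.sqrt(samp_var))                   # sample standard deviation
```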

Shape of Distributions

  • Skewness: Measures asymmetry. Right-skewed (positive) = tail to right, mean > median. Left-skewed (negative) = tail to left, mean < median.
  • Kurtosis: Measures "tailedness." High kurtosis = heavy tails, more outliers.
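As an illustration, sample skewness can be estimated as the mean cubed z-score (the simple moment formula; libraries such as SciPy apply additional bias corrections):

```python
def skewness(data):
    """Moment-based skewness: average of cubed standardized deviations.
    Positive for a long right tail, negative for a long left tail."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in data) / n

right_tailed = [1, 2, 2, 3, 3, 4, 10]  # made-up data with one large value
print(skewness(right_tailed))          # positive: mean pulled above median
```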

🎲 2. Probability

Probability is the mathematical framework for quantifying uncertainty.

Basic Rules

  • Addition Rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  • Multiplication Rule: P(A ∩ B) = P(A) × P(B|A)
  • Complement: P(A') = 1 − P(A)
  • Independent Events: P(A ∩ B) = P(A) × P(B)
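These rules can be verified by brute-force enumeration over a finite sample space; a sketch using two fair dice and two made-up events:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two dice
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on outcomes)."""
    return Fraction(sum(event(o) for o in space), len(space))

def A(o):
    return o[0] == 6            # first die shows a 6

def B(o):
    return o[0] + o[1] >= 10    # sum is at least 10

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
p_union = prob(lambda o: A(o) or B(o))
p_inter = prob(lambda o: A(o) and B(o))
print(p_union, prob(A) + prob(B) - p_inter)  # 1/4 1/4
```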

Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B) / P(B)

Bayes' Theorem: P(A|B) = [P(B|A) × P(A)] / P(B)
💡 Bayes' Theorem lets you update the probability of a hypothesis given new evidence. It "flips" conditional probabilities.
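A classic illustration (all rates here are made up) is screening for a rare condition: Bayes' Theorem shows that even a positive result from an accurate test can leave the condition unlikely when the prior is small:

```python
# Illustrative numbers only, not real disease or test rates
p_disease = 0.01          # prior: P(D)
p_pos_given_d = 0.95      # sensitivity: P(+|D)
p_pos_given_not_d = 0.05  # false-positive rate: P(+|¬D)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|¬D)P(¬D)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes: P(D|+) = P(+|D)·P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))  # 0.161 — still unlikely despite a positive test
```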

Common Distributions

| Distribution | Type | Use Case | Key Parameters |
|---|---|---|---|
| Normal | Continuous | Natural phenomena, CLT | μ (mean), σ (std dev) |
| Binomial | Discrete | Successes in n trials | n (trials), p (prob) |
| Poisson | Discrete | Events per interval | λ (rate) |
| t-distribution | Continuous | Small samples, unknown σ | df |
| Chi-Square | Continuous | Goodness of fit | df |

The Normal Distribution (68-95-99.7 Rule)

  • ~68% of data falls within 1 standard deviation of the mean
  • ~95% within 2 standard deviations
  • ~99.7% within 3 standard deviations
Z-score: z = (x - μ) / σ
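The rule is easy to check empirically by simulating normal draws (the μ and σ values here are arbitrary):

```python
import random

random.seed(0)
mu, sigma = 100, 15
sample = [random.gauss(mu, sigma) for _ in range(100_000)]

def z(x, mu, sigma):
    """Standard score: how many σ the value x lies from μ."""
    return (x - mu) / sigma

within_1sd = sum(abs(z(x, mu, sigma)) <= 1 for x in sample) / len(sample)
within_2sd = sum(abs(z(x, mu, sigma)) <= 2 for x in sample) / len(sample)
print(within_1sd, within_2sd)  # ≈ 0.68 and ≈ 0.95
print(z(130, 100, 15))         # 2.0: the value 130 is 2σ above the mean
```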

📊 3. Inferential Statistics

Inferential statistics draws conclusions about populations based on sample data.

Central Limit Theorem

💡 CLT: Regardless of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as sample size increases (typically n ≥ 30).
Standard Error: SE = σ / √n
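A quick simulation from a deliberately non-normal population (uniform on [0, 1], where σ = 1/√12) shows the spread of sample means matching σ/√n:

```python
import math
import random
import statistics

random.seed(1)
sigma = 1 / math.sqrt(12)  # population std dev of Uniform(0, 1)
n = 30

# Sampling distribution of the mean: many samples, one mean each
means = [statistics.mean(random.random() for _ in range(n))
         for _ in range(20_000)]

observed_se = statistics.stdev(means)
theoretical_se = sigma / math.sqrt(n)  # SE = σ/√n ≈ 0.0527
print(observed_se, theoretical_se)     # the two agree closely
```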

Confidence Intervals

CI = x̄ ± z* × (σ/√n) [known σ]
CI = x̄ ± t* × (s/√n) [unknown σ]

Common z*: 90%→1.645, 95%→1.96, 99%→2.576
⚠️ Common Misconception: A 95% CI does NOT mean there's a 95% probability the true parameter is in this interval. It means if we repeated the process many times, 95% of intervals would contain it.
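The repeated-sampling interpretation can be demonstrated directly: simulate many samples, build a 95% known-σ interval from each, and count how often the true μ is captured (the population values here are arbitrary):

```python
import math
import random
import statistics

random.seed(2)
mu, sigma, n = 50, 10, 40  # made-up population and sample size
z_star = 1.96              # 95% confidence
trials = 2_000
covered = 0

for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    half_width = z_star * sigma / math.sqrt(n)  # CI = x̄ ± z*·σ/√n
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

print(covered / trials)  # ≈ 0.95: about 95% of intervals capture μ
```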

Hypothesis Testing Steps

  1. State hypotheses: H₀ (null) and Hₐ (alternative)
  2. Choose significance level (α), typically 0.05
  3. Calculate test statistic
  4. Find p-value or compare to critical value
  5. Make decision: Reject or fail to reject H₀
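The five steps above can be sketched as a two-sided one-sample z-test (hypothetical numbers; σ assumed known):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H₀: μ = μ₀ against Hₐ: μ ≠ μ₀."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # step 3: test statistic
    p_value = 2 * (1 - phi(abs(z)))            # step 4: two-sided p-value
    return z, p_value, p_value <= alpha        # step 5: decision

# Steps 1–2: H₀: μ = 100, Hₐ: μ ≠ 100, α = 0.05; known σ = 15, n = 36, x̄ = 106
z_stat, p, reject = one_sample_z_test(106, 100, 15, 36)
print(round(z_stat, 2), round(p, 4), reject)  # 2.4 0.0164 True → reject H₀
```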

Types of Errors

| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | ✅ Correct (Power) |
| Fail to Reject | ✅ Correct | Type II Error (β) |
⚠️ Important: We never "accept" H₀ — we only "fail to reject" it. Absence of evidence is not evidence of absence.

📈 4. Regression & Correlation

Correlation

  • Pearson's r: Measures linear relationship (−1 to +1)
  • r = 0 means no linear relationship (a strong nonlinear relationship may still exist)
⚠️ Correlation ≠ Causation! A correlation does not mean X causes Y.

Simple Linear Regression

ŷ = b₀ + b₁x

R² = proportion of variance in Y explained by X
Residual: e = y − ŷ
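The least-squares estimates have closed forms (b₁ = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)², b₀ = ȳ − b₁x̄); a from-scratch sketch on made-up data:

```python
def fit_line(xs, ys):
    """Simple linear regression: returns intercept b0 and slope b1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """R² = 1 − SS_residual / SS_total: variance in Y explained by X."""
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # e = y − ŷ
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]             # hypothetical data, roughly y = 2x
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
print(b0, b1, r_squared(xs, ys, b0, b1))  # slope near 2, R² near 1
```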

Assumptions (LINE)

  • Linearity
  • Independence
  • Normality of residuals
  • Equal variance (Homoscedasticity)

🔬 5. Study Design

Types of Studies

  • Observational: No intervention. Association only.
  • Experimental: Assigns treatments. Can establish causation.

Key Concepts

  • Random sampling: Generalizable results
  • Random assignment: Causal inference
  • Confounding variable: Related to both X and Y
  • Blinding: Prevents bias

Sampling Methods

| Method | Description |
|---|---|
| Simple Random | Every member has an equal chance of selection |
| Stratified | Random sample from each subgroup |
| Cluster | Randomly select entire groups |
| Systematic | Every kth member |
| Convenience | Whoever is available (biased!) |
📐 Central Tendency

Mean: x̄ = Σxᵢ / n

Median: Middle value (sorted)

Mode: Most frequent value

Right-skewed: Mean > Median

Left-skewed: Mean < Median

📏 Spread

Variance: s² = Σ(xᵢ−x̄)²/(n−1)

Std Dev: s = √s²

IQR: Q3 − Q1

Range: Max − Min

SE: σ/√n

🎲 Probability Rules

Addition: P(A∪B) = P(A)+P(B)−P(A∩B)

Multiplication: P(A∩B) = P(A)·P(B|A)

Complement: P(A') = 1−P(A)

Bayes: P(A|B) = P(B|A)·P(A)/P(B)

📊 Normal Distribution

Z-score: z = (x−μ)/σ

68% within ±1σ

95% within ±2σ

99.7% within ±3σ

🧪 Hypothesis Testing

z-test: z = (x̄−μ₀)/(σ/√n)

t-test: t = (x̄−μ₀)/(s/√n)

p ≤ α: Reject H₀

Type I (α): False positive

Type II (β): False negative

Power: 1 − β

📈 Regression

ŷ = b₀ + b₁x

R²: % variance explained

r: Correlation (−1 to +1)

Assumptions: LINE

📋 Confidence Intervals

CI = x̄ ± z*(σ/√n)

90%: z* = 1.645

95%: z* = 1.96

99%: z* = 2.576

Wider CI = more confidence, less precision

🔬 Study Design

Observational: Association only

Experimental: Can show causation

Random sampling → Generalizability

Random assignment → Causation

⚠️ Common Pitfalls

• Correlation ≠ Causation

• Never "accept" H₀

• Statistical ≠ Practical significance

• p-value ≠ P(H₀ is true)

• CI is about the method, not one interval