Consolidated Question Patterns & Abstractions: Statistics I (Weeks 1-4)

This document synthesizes the core problem types and mental algorithms for the first four weeks of Statistics I. Use this to rapidly identify what a question is asking and which tools you need to solve it.

📚 Table of Contents

  1. Week 1: Introduction to Statistics & Data
  2. Week 2: Describing Categorical Data
  3. Week 3: Describing Numerical Data
  4. Week 4: Association Between Two Variables

Week 1: Introduction to Statistics & Data

  • Core Idea: Learning the fundamental vocabulary to classify and understand the nature of data.
| Pattern # | Pattern Name | Frequency | Difficulty | Core Skill & Abstraction |
| --- | --- | --- | --- | --- |
| 1.1 | Population vs. Sample Identification | High | Easy | Abstract: Is it the entire group of interest (Population) or the subset you have data for (Sample)? Keywords: “all”, “every” vs. “selected”, “a group of”. |
| 1.2 | Inferential vs. Descriptive Logic | High | Easy | Abstract: Is the statement just describing the sample, or is it inferring a conclusion about the whole population? Keywords: “the sample had …” vs. “we conclude that all …”. |
| 1.3 | Classifying Variable Types | High | Medium | Abstract: Apply the tests: 1. Math? (Can I average it? Yes = Numerical, No = Categorical). 2. Gaps? (Can it be a decimal? Yes = Continuous, No = Discrete). |
| 1.4 | Identifying the Scale of Measurement | High | Medium | Abstract: Use the NOIR framework: Nominal (names), Ordinal (order), Interval (equal intervals, no true zero), Ratio (true zero). |

🧠 Week 1 Mental Algorithm: The Classification Checklist

When asked to classify a variable (e.g., “Education Level”):

  1. Triage: It’s a classification problem.
  2. Abstract & Act:
    • Math Test: Can I average “High School” and “PhD”? No. It’s Categorical.
    • Order Test: Is there a natural ranking? Yes, PhD > High School. It’s Ordinal.
    • Final Answer: Categorical, Ordinal Scale.
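
The checklist above can be sketched as a small decision function. This is only an illustration: the function name `classify_variable` and its boolean parameters are hypothetical, and you still have to answer the Math, Gaps, and Order questions yourself for each variable.

```python
def classify_variable(can_average: bool,
                      can_be_decimal: bool = False,
                      has_natural_order: bool = False) -> str:
    """Apply the Math test first, then the Gaps test or the Order test."""
    if can_average:
        # Numerical: the Gaps test separates continuous from discrete.
        return "Numerical, " + ("Continuous" if can_be_decimal else "Discrete")
    # Categorical: the Order test separates ordinal from nominal.
    return "Categorical, " + ("Ordinal" if has_natural_order else "Nominal")


# "Education Level": cannot be averaged, but has a natural ranking.
print(classify_variable(can_average=False, has_natural_order=True))
# prints: Categorical, Ordinal
```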

Week 2: Describing Categorical Data

  • Core Idea: Summarizing and visualizing data that falls into non-numerical groups using counts, proportions, and charts.
| Pattern # | Pattern Name | Frequency | Difficulty | Core Skill & Abstraction |
| --- | --- | --- | --- | --- |
| 2.1 | Calculating Frequencies & Proportions | High | Easy | Abstract: Use the core relationship: Part = Whole × Percentage. Find the piece you’re missing. |
| 2.2 | Identifying Measures of Central Tendency | High | Easy | Abstract: For nominal data, only the Mode (most frequent) is defined. Mean and Median require order/numbers and are not applicable. |
| 2.3 | Choosing the Appropriate Graph | High | Medium | Abstract: Bar Chart to compare counts. Pie Chart to show percentages of a whole. Pareto Chart to identify the most important categories (a sorted bar chart). |
| 2.4 | Interpreting Graphical Representations | Medium | Easy | Abstract: Read the values directly from the chart. For pie charts, convert percentages to counts if a total is given. For bar charts, read the axis labels carefully. |

🧠 Week 2 Mental Algorithm: The Categorical Toolkit

When you see categorical data (e.g., a list of academies and the number of players in each):

  1. Triage: It’s a categorical description problem.
  2. Abstract & Act:
    • “What is the most common?” Find the Mode. Look for the highest bar in a bar chart or the biggest slice in a pie chart.
    • “What share/proportion …?” Calculate Relative Frequency: (Frequency of Category) / (Total).
    • “How to best visualize …?” If comparing counts, use a Bar Chart. If showing parts of a whole, a Pie Chart is an option. If prioritizing, a Pareto Chart.
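
A minimal sketch of the toolkit in Python, using only the standard library. The academy names and player counts are made-up illustration data, not taken from any course problem.

```python
from collections import Counter

# Hypothetical data: the academy each of 10 players belongs to.
academies = ["North", "South", "North", "East", "North",
             "South", "East", "North", "South", "North"]

counts = Counter(academies)                    # frequency of each category
total = sum(counts.values())                   # the "whole"
mode, mode_count = counts.most_common(1)[0]    # most frequent category

# Relative frequency = (frequency of category) / (total)
rel_freq = {cat: n / total for cat, n in counts.items()}

# Pareto ordering: categories sorted from most to least frequent.
pareto_order = [cat for cat, _ in counts.most_common()]

print(mode, mode_count)   # North 5
print(pareto_order)       # ['North', 'South', 'East']
```

Here `rel_freq["South"]` is 3/10, which is the same "Part = Whole × Percentage" relationship solved for the percentage.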

Week 3: Describing Numerical Data

  • Core Idea: Calculating statistics that measure the “center” (central tendency) and “spread” (dispersion) of numerical datasets.
| Pattern # | Pattern Name | Frequency | Difficulty | Core Skill & Abstraction |
| --- | --- | --- | --- | --- |
| 3.1 | Calculating Center & Spread | High | Medium | Abstract: Calculate Mean (average), Median (sorted middle), Mode (most frequent), Sample Variance ($s^2$), and Sample Standard Deviation ($s$). |
| 3.2 | Correcting Mean and Variance | High | Medium | Abstract: Work backward from the wrong statistic to find the wrong Sum or Sum of Squares. Correct the sum (Sum_correct = Sum_wrong - wrong_val + correct_val), then recalculate. |
| 3.3 | Calculating Percentiles and Quartiles | High | Medium | Abstract: 1. Sort the data. 2. Find the median (Q2). 3. Find the median of the lower half (Q1). 4. Find the median of the upper half (Q3). 5. Calculate IQR = Q3 - Q1. |
| 3.4 | Identifying Outliers | Medium | Medium | Abstract: Calculate the fences: Lower = Q1 - 1.5 × IQR and Upper = Q3 + 1.5 × IQR. Any data point outside this range is an outlier. |
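
Pattern 3.2 is the one that most benefits from a worked example. The sketch below assumes the sample variance uses the n - 1 denominator, so that the sum of squares can be recovered as (n - 1) × s² + n × mean². The function name and the numbers are hypothetical.

```python
def correct_mean_and_variance(n, mean_wrong, var_wrong, wrong_val, correct_val):
    """Recover the sums behind a wrong mean and sample variance, fix them, recompute."""
    # Work backward: Sum = n * mean
    sum_wrong = n * mean_wrong
    # Work backward: s^2 = (SumSq - n * mean^2) / (n - 1)
    sumsq_wrong = (n - 1) * var_wrong + n * mean_wrong ** 2

    # Swap the wrong value for the correct one in both sums.
    sum_correct = sum_wrong - wrong_val + correct_val
    sumsq_correct = sumsq_wrong - wrong_val ** 2 + correct_val ** 2

    # Recalculate from the corrected sums.
    mean_correct = sum_correct / n
    var_correct = (sumsq_correct - n * mean_correct ** 2) / (n - 1)
    return mean_correct, var_correct


# Hypothetical case: 5 values, reported mean 12 and sample variance 10,
# but one value was entered as 16 when it should have been 11.
print(correct_mean_and_variance(5, 12, 10, 16, 11))   # (11.0, 5.0)
```

You can check the result against the underlying data: [10, 12, 14, 8, 16] has mean 12 and sample variance 10, while [10, 12, 14, 8, 11] has mean 11 and sample variance 5.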

🧠 Week 3 Mental Algorithm: The Numerical Description Flow

When given a list of numbers:

  1. Triage: It’s a numerical description problem.
  2. Abstract & Act:
    • First step is always to SORT the data.
    • “Find the center”: Calculate Mean and Median. If they are very different, it hints at skewness or outliers.
    • “Find the spread”: Calculate IQR (resistant to outliers) and Standard Deviation (sensitive to outliers).
    • “Check for outliers”: Use the IQR and the 1.5*IQR fence rule.
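
The flow above can be sketched with Python's `statistics` module. One assumption to note: quartiles are computed with the median-of-halves method from pattern 3.3, and when the number of values is odd the middle value is left out of both halves. Textbooks and software use several quartile conventions, so results may differ slightly from other tools. The data set is made up.

```python
from statistics import mean, median, stdev


def describe(data):
    """Sort, then compute center, spread, and the 1.5 x IQR outlier fences."""
    xs = sorted(data)                     # first step: sort
    n = len(xs)
    half = n // 2
    lower = xs[:half]                     # lower half (median excluded if n is odd)
    upper = xs[half + 1:] if n % 2 else xs[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(xs),
        "median": median(xs),
        "stdev": stdev(xs),               # sample standard deviation (n - 1)
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


summary = describe([7, 3, 5, 9, 4, 6, 8, 40])
print(summary["mean"], summary["median"])   # 10.25 6.5
print(summary["outliers"])                  # [40]
```

The mean (10.25) sits well above the median (6.5), which is the hint of skewness or outliers mentioned above, and the fence rule confirms that 40 is the cause.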

Week 4: Association Between Two Variables

  • Core Idea: Moving from describing one variable at a time to describing the relationship between two variables.
| Pattern # | Pattern Name | Frequency | Difficulty | Core Skill & Abstraction |
| --- | --- | --- | --- | --- |
| 4.1 | Calculating Covariance and Correlation | High | Medium | Abstract: A procedural calculation, best done with a table. Find means, then deviations from the mean for both variables, then products of deviations. Sum these products and divide by n - 1 to get the sample covariance, then divide by the product of the two standard deviations to get the correlation r. |
| 4.2 | Interpreting the Correlation Coefficient r | High | Easy | Abstract: Look at the sign for direction (positive/negative) and the magnitude for strength (close to 1 or -1 is strong; close to 0 is weak). |
| 4.3 | Analyzing Contingency Tables | High | Medium | Abstract: Differentiate between Marginal (uses grand totals in the denominator) and Conditional (uses a row or column total in the denominator) proportions. Read the question carefully to find the correct “whole”. |
| 4.4 | Conceptual Understanding of Correlation | Medium | Easy | Abstract: Remember the key rules: Correlation ≠ Causation. Correlation only measures linear relationships. A perfect linear relationship means $r = 1$ or $r = -1$. |
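
For pattern 4.3, the only thing that changes between a marginal and a conditional proportion is the denominator. A small sketch with a made-up 2 × 2 table (study mode by exam result; all counts are hypothetical):

```python
# Hypothetical counts: rows = study mode, columns = exam result.
table = {
    "Online":    {"Pass": 30, "Fail": 20},
    "In-person": {"Pass": 45, "Fail": 5},
}

row_totals = {row: sum(cols.values()) for row, cols in table.items()}
col_totals = {}
for cols in table.values():
    for col, n in cols.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

# Marginal: a row or column total over the grand total.
p_pass = col_totals["Pass"] / grand_total                              # 75 / 100

# Conditional: a cell over its row total or its column total.
p_pass_given_online = table["Online"]["Pass"] / row_totals["Online"]   # 30 / 50
p_online_given_pass = table["Online"]["Pass"] / col_totals["Pass"]     # 30 / 75

print(p_pass, p_pass_given_online, p_online_given_pass)
```

The same cell (30 online students who passed) gives 0.6 or 0.4 depending on whether the "whole" is the Online row or the Pass column, which is why the wording of the question matters.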

🧠 Week 4 Mental Algorithm: The Relationship Analysis Flow

When given a dataset with two variables, X and Y:

  1. Triage: It’s a relationship/association problem.
  2. Abstract & Act:
    • Are X and Y both categorical? Build a Contingency Table. Analyze it by calculating conditional proportions.
    • Are X and Y both numerical? Your goal is to find the Correlation Coefficient r.
      1. Visualize with a Scatterplot in your mind. Does it seem to go up or down?
      2. Calculate Covariance. The sign will confirm your visual check.
      3. Calculate Standard Deviations for both X and Y.
      4. Calculate $r = \dfrac{\mathrm{Cov}(X, Y)}{s_X \, s_Y}$.
      5. Interpret r: State the direction (positive/negative) and strength (weak/moderate/strong).
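
The numerical branch of this flow can be sketched as follows. The code assumes the sample versions of covariance and standard deviation (n - 1 denominators), matching the sample variance used in Week 3. The function name and the data are hypothetical.

```python
from math import sqrt


def covariance_and_correlation(xs, ys):
    """Table method: means, deviations, products of deviations, then standardize."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]           # deviations of X from its mean
    dy = [y - mean_y for y in ys]           # deviations of Y from its mean
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    s_x = sqrt(sum(a * a for a in dx) / (n - 1))
    s_y = sqrt(sum(b * b for b in dy) / (n - 1))
    return cov, cov / (s_x * s_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov, r = covariance_and_correlation(xs, ys)
print(round(cov, 3), round(r, 3))   # 1.5 0.775
```

The positive covariance confirms an upward trend, and r of about 0.77 would usually be read as a fairly strong positive linear association. The exact cutoffs for weak, moderate, and strong vary between textbooks.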