The Ultimate Strategist’s Arsenal: Statistics I (Weeks 1-4) - The Complete Edition

This guide covers every identified question pattern from your assignments for Statistics I. Each pattern is broken down with the TAA Framework (Triage, Abstract, Act) to make the logic explicit and easy to follow.


Week 1: The Language of Data

  • Core Idea: Learning the fundamental vocabulary to classify and understand the nature of data.

Pattern 1.1: Population vs. Sample Distinction

  • Triage: “Does the problem distinguish between a large group of interest (e.g., ‘all students in India’) and a smaller, observed group (e.g., ‘four selected IITs’)?”
  • Abstract: Population is the whole group you want to know about. Sample is the smaller group you actually have data for.
  • Act (Execution):

    Problem: To study placements in India, data from 4 IITs is used. Identify the population and sample.

    1. Identify the Broad Goal: The study is about “placements in India”, so the Population is every institute in India.
    2. Identify the Data Source: The data comes from “4 IITs”. This is the Sample. Final Answer: Population = All institutes in India; Sample = The 4 selected IITs.

Pattern 1.2: Inferential vs. Descriptive Statements

  • Triage: “Is the statement a simple summary of the collected data, or is it a broader conclusion about a group larger than the one measured?”
  • Abstract: Descriptive statistics describe the sample. Inferential statistics use the sample to infer something about the population.
  • Act (Execution):

    Problem: Based on the 4 IITs, a report concludes, “Placement in India is 95%.” Is this inferential or descriptive?

    1. Analyze the Scope: The conclusion is about “India” (population), but the data is only from “4 IITs” (sample).
    2. Conclusion: Generalizing from a sample to a population is the definition of Inferential Statistics. Final Answer: Inferential.

Pattern 1.3: Data Classification (NOIR Framework)

  • Triage: “Does the problem ask to ‘classify the variable’ or identify its ‘scale of measurement’?”
  • Abstract: Apply a checklist. 1. Math? (No → Categorical, Yes → Numerical). 2. Order? (for categorical data: No → Nominal, Yes → Ordinal). 3. True Zero? (for numerical data: Yes → Ratio, No → Interval).
  • Act (Execution):

    Problem: Classify the variable “Stock price of a company”.

    1. Math Test: Can you average stock prices? Yes → Numerical.
    2. Gaps Test: Can a price take fractional values between whole numbers? Yes → Continuous.
    3. True Zero Test: Does a price of 0 mean a complete absence of value? Yes → Ratio Scale. Final Answer: Numerical, Continuous, Ratio Scale.
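
The scale-of-measurement checklist can be sketched as a small decision function. This is only an illustration: the function name and its yes/no parameters are hypothetical, and it assumes the usual NOIR convention that a numerical variable without a true zero is on the Interval scale.

```python
def classify_scale(supports_arithmetic, has_order=False, has_true_zero=False):
    """Return the NOIR scale from three yes/no answers (hypothetical helper)."""
    if not supports_arithmetic:
        # Categorical data: only the ordering question matters.
        return "Ordinal" if has_order else "Nominal"
    # Numerical data: only the true-zero question matters.
    return "Ratio" if has_true_zero else "Interval"


# Stock price: averaging makes sense, and a price of 0 means no value at all.
stock_price_scale = classify_scale(supports_arithmetic=True, has_true_zero=True)
print(stock_price_scale)
```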

Week 2: Describing Categorical Data

  • Core Idea: Using counts, proportions, and special charts to summarize data that consists of labels.

Pattern 2.1: Frequency and Proportion Calculation

  • Triage: “Does the problem give counts or percentages for different categories and ask for a total or a specific share?”
  • Abstract: The core tool is the relationship: Part = Whole × Percentage.
  • Act (Execution):

    Problem: A pie chart shows subject marks out of a total of 500. Physics=25%, Maths=20%, Biology=18%. Find the total marks for these three subjects.

    1. Find the Total Percentage: 25% + 20% + 18% = 63% or 0.63.
    2. Apply Formula: Total Marks = 500 × 0.63 = 315. Final Answer: 315.
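
As a quick check, the Part = Whole × Percentage relationship can be sketched in a few lines. The variable names are illustrative only; the figures are the ones from the problem.

```python
# Figures from the pie-chart problem.
total_marks = 500
shares_percent = {"Physics": 25, "Maths": 20, "Biology": 18}

# Step 1: add the percentages of the three subjects.
combined_percent = sum(shares_percent.values())

# Step 2: Part = Whole x Percentage (percentage written as a fraction of 100).
combined_marks = total_marks * combined_percent / 100

print(combined_percent, combined_marks)
```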

Pattern 2.2: Central Tendency for Categorical Data

  • Triage: “Does the problem ask for the ‘mean’, ‘median’, or ‘mode’ of non-numerical data (e.g., academy names)?”
  • Abstract: For nominal data, only the Mode (most frequent category) is defined. Mean and Median are mathematically meaningless.
  • Act (Execution):

    Problem: Player counts: Academy A(30), B(40), C(60), D(20), E(90). Find the mode.

    1. Find the Highest Frequency: The highest count is 90.
    2. Identify the Category: The category corresponding to the count of 90 is “Academy E”. Final Answer: The Mode is Academy E.
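
A minimal sketch of the same lookup, assuming the counts are stored in a dictionary keyed by category name (a representation chosen here for illustration):

```python
# Player counts per academy, taken from the problem.
counts = {
    "Academy A": 30,
    "Academy B": 40,
    "Academy C": 60,
    "Academy D": 20,
    "Academy E": 90,
}

# The mode is the category with the highest frequency, not the frequency itself.
mode_category = max(counts, key=counts.get)
highest_count = counts[mode_category]

print(mode_category, highest_count)
```

Note that the answer is the label ("Academy E"), while 90 is only the count that identifies it.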

Week 3: Describing Numerical Data

  • Core Idea: Calculating the “center” (mean, median) and “spread” (standard deviation, IQR) of numerical data.

Pattern 3.1: Correcting Statistics After a Data Error

  • Triage: “Does the problem state a statistic is wrong due to a data entry error?”
  • Abstract: You can reverse-engineer the statistic. An incorrect mean reveals the incorrect sum. You can then fix the sum and recalculate.
  • Act (Execution):

    Problem: The mean of 6 observations is 19. A value of 11 was wrongly entered as 7. Find the correct mean.

    1. Find Incorrect Sum: Sum_incorrect = Mean × n = 19 × 6 = 114.
    2. Correct the Sum: Sum_correct = Sum_incorrect - Wrong_Value + Correct_Value = 114 - 7 + 11 = 118.
    3. Calculate Correct Mean: Mean_correct = Sum_correct / n = 118 / 6 ≈ 19.67. Final Answer: 19.67
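
The three steps above can be sketched as one small function. The function and parameter names are hypothetical, chosen only to mirror the steps.

```python
def corrected_mean(reported_mean, n, wrong_value, correct_value):
    """Recover the sum from the reported mean, fix it, and recompute the mean."""
    incorrect_sum = reported_mean * n                         # Step 1
    correct_sum = incorrect_sum - wrong_value + correct_value  # Step 2
    return correct_sum / n                                     # Step 3


result = corrected_mean(19, 6, wrong_value=7, correct_value=11)
print(round(result, 2))  # 19.67
```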

Pattern 3.2: Quartiles, IQR, and Outliers

  • Triage: “Does the problem ask for ‘quartiles’, ‘IQR’, or to identify ‘outliers’?”
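  • Abstract: This is a fixed, multi-step procedure: 1. Sort the data. 2. Find the quartiles (Q1, Q3) as the medians of the lower and upper halves. 3. Calculate IQR = Q3 - Q1. 4. Build the outlier fences, commonly Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; values outside the fences are flagged as outliers.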
  • Act (Execution):

    Problem: For the data {30, 39, 44, 46, 73, 89, 91, 96, 112, 115}, find the IQR.

    1. Data is Sorted (n=10).
    2. Find Quartiles:
      • The lower half is {30, 39, 44, 46, 73}. Its median is the middle value. Q1 = 44.
      • The upper half is {89, 91, 96, 112, 115}. Its median is the middle value. Q3 = 96.
    3. Calculate IQR:
      • Formula: IQR = Q3 - Q1.
      • IQR = 96 - 44 = 52. Final Answer: 52.
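
The procedure can be sketched as follows, assuming the median-of-halves rule used in the worked example and the common 1.5 × IQR fences. Other quartile conventions (including the defaults of some library functions) can give slightly different values, so the helper below is written by hand; its name is hypothetical.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs >= 2 values)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]  # for odd n the middle value is left out
    return median(lower), median(upper)


data = [30, 39, 44, 46, 73, 89, 91, 96, 112, 115]
q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1

# Outlier fences, assuming the usual 1.5 x IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

With these fences, none of the ten values in the example is flagged as an outlier.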

Pattern 3.3: Effect of Data Transformation

  • Triage: “Does the problem ask what happens to the mean or variance if you add a constant to all data points?”
  • Abstract: Adding a constant k to every data point shifts the center but does not change the spread.
  • Act (Execution):

    Problem: The sample variance of prices {75, 25, 29, 75, 83, 24} is 812.17. What is the new sample variance if 4 rupees is added to all prices?

    1. Apply the Rule: Adding a constant to every data point shifts the entire dataset on the number line. The mean will increase by 4.
    2. Analyze Spread: However, the distance between each point and the new mean remains exactly the same as before. The “spread” is unchanged.
    3. Conclusion: Variance and standard deviation are measures of spread, so they do not change. Final Answer: The sample variance remains 812.17.
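
A short numerical check of this rule, using the sample variance (divisor n - 1) from Python's standard `statistics` module:

```python
from statistics import mean, variance

prices = [75, 25, 29, 75, 83, 24]
shifted = [p + 4 for p in prices]  # add 4 rupees to every price

original_variance = variance(prices)   # sample variance, divisor n - 1
shifted_variance = variance(shifted)
mean_shift = mean(shifted) - mean(prices)

print(round(original_variance, 2), round(shifted_variance, 2), round(mean_shift, 2))
```

The mean moves by the added constant while the variance stays the same, which is the rule stated above.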

Week 4: Association Between Two Variables

  • Core Idea: Measuring the relationship between two variables.

Pattern 4.1: Analyzing Contingency Tables

  • Triage: “Do I see a two-way table of counts? Does the question ask for a proportion ‘of those who…’ or ‘given that…’?”

  • Abstract: This is a conditional probability problem. You must correctly identify your denominator. The condition given in the question (e.g., “of students in good economic conditions”) defines your new, smaller “whole”.

  • Act (Execution):

    Problem: Using the table below, what proportion of students in good economic conditions are borderline?

    Econ. Condition    Borderline    Row Total
    Good               149           377
    Poor               104           348

    1. Identify the “Whole” (Denominator): The condition is “in good economic conditions”, so our universe is limited to that row. The total for this row is 377.
    2. Identify the “Part” (Numerator): Within that row, the number of “borderline” students is 149.
    3. Calculate Proportion: Proportion = Part / Whole = 149 / 377 ≈ 0.40. Final Answer: 0.40 or 40%.
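
A minimal sketch of the denominator choice, assuming the table is stored as a nested dictionary (a layout picked here for illustration; the helper name is hypothetical):

```python
# Counts from the table in the problem: borderline students and row totals.
table = {
    "Good": {"Borderline": 149, "Row Total": 377},
    "Poor": {"Borderline": 104, "Row Total": 348},
}


def conditional_proportion(table, row, column):
    """Proportion of `column` among the observations in `row` only."""
    # The condition picks the row, so the row total is the denominator.
    return table[row][column] / table[row]["Row Total"]


p_borderline_given_good = conditional_proportion(table, "Good", "Borderline")
print(round(p_borderline_given_good, 2))  # 0.4
```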

Pattern 4.2: Calculating & Interpreting Correlation

  • Triage: “Does it ask for the ‘correlation coefficient’, ‘covariance’, or the ‘strength and direction’ of a relationship between two numerical variables?”
  • Abstract: The correlation coefficient, r, measures the direction and strength of a linear relationship and always lies between -1 and 1. Its sign gives direction (positive/negative). Its magnitude (distance from 0) gives strength (weak/moderate/strong).
  • Act (Execution):

    Problem: Given OnePlus sales (X) and BBK sales (Y), with the sample covariance s_xy and the sample standard deviations s_x and s_y, calculate and interpret the correlation coefficient.

    1. Identify the Tool: The formula for the correlation coefficient.
    2. Apply the Formula:
      • Formula: r = s_xy / (s_x × s_y).
      • Substituting the given values: r ≈ 0.496.
    3. Interpret r:
      • Sign: The value is positive, so there is a positive linear relationship.
      • Magnitude: The value 0.496 is close to 0.5, which is generally considered moderate strength. Final Answer: The relationship is moderate and positive.
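
The calculation can be sketched as a small function that divides the covariance by the product of the two standard deviations. The function names are hypothetical, and the numbers passed in are made-up placeholders for illustration, not the values from the problem.

```python
def correlation(cov_xy, sd_x, sd_y):
    """Correlation coefficient r = covariance / (sd of X * sd of Y)."""
    return cov_xy / (sd_x * sd_y)


def direction(r):
    """Read the direction of the linear relationship from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Placeholder inputs (NOT from the problem): covariance 12, standard deviations 4 and 5.
r = correlation(cov_xy=12.0, sd_x=4.0, sd_y=5.0)
print(round(r, 3), direction(r))
```

Strength labels such as weak, moderate, and strong depend on the cutoffs a course adopts, so they are left out of the sketch.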