Stats 1: Complete Theory Notes

IIT Madras BS DS - Foundational Level | All 12 Weeks Covered


Week 1: Introduction to Data

1.1 Population vs Sample

| Term | Definition |
|---|---|
| Population | Entire group of interest |
| Sample | Subset of the population actually studied |
| Parameter | Numerical measure of a population |
| Statistic | Numerical measure of a sample |

1.2 Types of Statistics

| Type | Purpose | Example |
|---|---|---|
| Descriptive | Summarize data | "Average score is 75" |
| Inferential | Make predictions about the population | "95% of all students pass" |

1.3 Types of Variables

Categorical (Qualitative):

  • Nominal: Names only (Gender, Color)
  • Ordinal: Ordered categories (Grades: A > B > C)

Numerical (Quantitative):

  • Discrete: Countable values (Number of students)
  • Continuous: Any value in range (Height, Weight)

1.4 Scales of Measurement

| Scale | Properties | Examples | Operations |
|---|---|---|---|
| Nominal | Labels only | Gender, City | Mode |
| Ordinal | Order exists | Ratings, Grades | Mode, Median |
| Interval | Equal intervals, no true zero | Temperature (°C) | Mean, SD |
| Ratio | True zero exists | Height, Weight, Money | All operations |

Key Test: Can you say "$x$ is twice $y$"? Only valid for Ratio. For example, 20 kg is twice 10 kg, but 20 °C is not twice as hot as 10 °C.

1.5 Data Types

| Type | Description |
|---|---|
| Cross-sectional | Data at a single point in time |
| Time-series | Data over time |
| Structured | Organized in rows/columns |
| Unstructured | No predefined format |

Week 2: Describing Categorical Data

2.1 Frequency Tables

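| Term | Formula |
|---|---|
| Frequency | Count of occurrences |
| Relative Frequency | $\frac{\text{Frequency}}{\text{Total number of observations}}$ |
| Cumulative Frequency | Running total |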
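
A minimal sketch of building a frequency table in Python, assuming a small made-up list of grades (the data and variable names are illustrative only). It computes the frequency, the relative frequency (count divided by total), and the cumulative frequency for each category.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of a categorical variable (illustrative data only).
grades = ["A", "B", "A", "C", "B", "A", "B", "A"]

counts = Counter(grades)              # frequency of each category
total = sum(counts.values())          # total number of observations

categories = sorted(counts)           # fixed category order: A, B, C
freq = [counts[c] for c in categories]
rel_freq = [f / total for f in freq]  # relative frequency = count / total
cum_freq = list(accumulate(freq))     # running total of the frequencies

for c, f, r, cf in zip(categories, freq, rel_freq, cum_freq):
    print(f"{c}: frequency={f}, relative={r:.3f}, cumulative={cf}")
```

The relative frequencies always sum to 1, and the last cumulative frequency equals the total number of observations.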

2.2 Visualizations

| Chart | Best For |
|---|---|
| Bar Chart | Comparing categories |
| Pie Chart | Showing proportions (parts of a whole) |
| Pareto Chart | Bar chart with cumulative line |

2.3 Mode for Categorical Data

Mode = Most frequent category

In a pie chart: Largest slice = Mode

Note: Mean and Median are NOT defined for nominal data!

2.4 Association (Contingency Tables)

Two variables are associated if the distribution of one changes depending on the value of the other.
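
One way to check this idea is to compare the row-wise (conditional) distributions of a contingency table. The sketch below assumes a hypothetical 2x2 table of counts; the group names, counts, and the helper `row_proportions` are made up for illustration.

```python
# Hypothetical 2x2 contingency table: rows = group, columns = response.
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 15, "No": 35},
}

def row_proportions(counts):
    """Convert each row of counts into a conditional (row-wise) distribution."""
    result = {}
    for row, cells in counts.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result

props = row_proportions(table)
for row, dist in props.items():
    print(row, dist)

# If the row distributions differ noticeably, the variables look associated.
yes_gap = abs(props["Group A"]["Yes"] - props["Group B"]["Yes"])
print(f"Difference in 'Yes' proportion: {yes_gap:.2f}")
```

Here the share of "Yes" is 0.6 in one group and 0.3 in the other, so the response distribution changes with the group, which suggests an association in this made-up data.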


Week 3: Measures of Central Tendency & Dispersion

3.1 Central Tendency

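Mean (Average)

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Weighted Mean:

$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$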

Median

  • Middle value when sorted
  • If $n$ is even: Average of the two middle values
  • Robust to outliers

Mode

  • Most frequent value
  • Can have multiple modes (bimodal, multimodal)
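
A short sketch of these measures using Python's standard `statistics` module, with made-up scores and weights. It also replaces the largest value with an extreme one to show that the median stays put while the mean shifts.

```python
import statistics

scores = [60, 70, 70, 80, 90, 110]      # hypothetical data, n = 6 (even)

mean = statistics.mean(scores)          # sum of values divided by n
median = statistics.median(scores)      # average of the two middle values
modes = statistics.multimode(scores)    # list of all most frequent values

# Weighted mean: sum(w_i * x_i) / sum(w_i), with hypothetical weights.
values = [80, 70, 90]
weights = [2, 1, 1]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Replace the largest value with an extreme one to see which measure moves.
with_outlier = scores[:-1] + [1100]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, modes, weighted_mean)
print(mean_out, median_out)
```

`statistics.multimode` (Python 3.8+) returns every most frequent value, so bimodal or multimodal data gives a list with more than one entry.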

3.2 Dispersion

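Range

$\text{Range} = x_{\max} - x_{\min}$

Variance

Population: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

Sample: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Shortcut Formula: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \mu^2$ (population); $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$ (sample)

Standard Deviation

$\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample)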

Interquartile Range (IQR)

$\text{IQR} = Q_3 - Q_1$

Where:

  • $Q_1$ = 25th percentile
  • $Q_2$ = Median = 50th percentile
  • $Q_3$ = 75th percentile
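
The dispersion measures can be sketched with the standard `statistics` module on made-up data. Quartile conventions differ between textbooks and libraries; the "inclusive" method used here is an assumption, not necessarily the convention used in the course.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data

data_range = max(data) - min(data)       # largest value minus smallest value
pop_var = statistics.pvariance(data)     # population variance, divides by N
samp_var = statistics.variance(data)     # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)         # square root of population variance

# Shortcut form of the population variance: mean of squares minus squared mean.
mu = statistics.mean(data)
shortcut_var = sum(x * x for x in data) / len(data) - mu ** 2

# Quartiles; "inclusive" is one of several conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, pop_var, samp_var, pop_sd, shortcut_var)
print(q1, q2, q3, iqr)
```

The shortcut value matches `pvariance`, and the sample variance is slightly larger because it divides by $n - 1$ instead of $N$.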

3.3 Linear Transformation

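If $y = ax + b$:

  • Mean: $\bar{y} = a\bar{x} + b$
  • Variance: $s_y^2 = a^2 s_x^2$
  • Standard deviation: $s_y = |a| s_x$

Note: Adding a constant $b$ does NOT change the variance; only the scale factor $a$ does.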
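
A quick numerical check, assuming the transformation has the form $y = ax + b$ with made-up values $a = 3$ and $b = 10$. The sketch compares the variance of the original data, of a shifted copy, and of the scaled-and-shifted copy.

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical data
a, b = 3, 10                             # hypothetical scale and shift

shifted = [xi + b for xi in x]           # adding a constant only
y = [a * xi + b for xi in x]             # full linear transformation

var_x = statistics.pvariance(x)
var_shifted = statistics.pvariance(shifted)
var_y = statistics.pvariance(y)

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

print(mean_x, mean_y)                    # mean of y is a * mean_x + b
print(var_x, var_shifted, var_y)         # the shift leaves variance unchanged
```

The mean follows the full transformation, while the variance ignores $b$ and is multiplied by $a^2$.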

3.4 Outlier Detection

Lower bound: $Q_1 - 1.5 \times \text{IQR}$

Upper bound: $Q_3 + 1.5 \times \text{IQR}$

Values outside these bounds are outliers.
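
A minimal sketch of outlier detection, assuming the common $1.5 \times \text{IQR}$ rule for the bounds and the "inclusive" quartile convention from Python's `statistics` module. Both choices are assumptions, and the data is made up.

```python
import statistics

data = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical data with one large value

# Quartiles under the "inclusive" convention (an assumption; conventions vary).
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower = q1 - 1.5 * iqr                   # lower bound
upper = q3 + 1.5 * iqr                   # upper bound

outliers = [x for x in data if x < lower or x > upper]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"bounds=({lower}, {upper}), outliers={outliers}")
```

With these numbers the bounds are 7.125 and 22.125, so only the value 40 is flagged.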


Week 4: Correlation & Regression

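4.1 Covariance

Sample: $\text{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Population: $\text{Cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)$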

  • Positive: Both increase together
  • Negative: One increases, other decreases
  • Zero: No linear relationship

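4.2 Correlation Coefficient (Pearson’s r)

$r = \frac{\text{Cov}(x, y)}{s_x s_y}$

| Value of $r$ | Interpretation |
|---|---|
| $r = 1$ | Perfect positive linear |
| $r = -1$ | Perfect negative linear |
| $0.7 \leq \lvert r \rvert < 1$ | Strong |
| $0.3 \leq \lvert r \rvert < 0.7$ | Moderate |
| $\lvert r \rvert < 0.3$ | Weak |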

Properties:

  • Unitless
  • Symmetric: $r_{xy} = r_{yx}$

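4.3 Linear Regression

$\hat{y} = b_0 + b_1 x$

Slope: $b_1 = \frac{\text{Cov}(x, y)}{s_x^2} = r \frac{s_y}{s_x}$

Intercept: $b_0 = \bar{y} - b_1 \bar{x}$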
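
The covariance, Pearson's r, and the least-squares line can be computed by hand in a few lines. This is a sketch on made-up data, assuming the sample covariance (dividing by $n - 1$) and a fitted line of the form $\hat{y} = \text{intercept} + \text{slope} \cdot x$.

```python
import math

x = [1, 2, 3, 4, 5]                      # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

cov_xy = sxy / (n - 1)                   # sample covariance
r = sxy / math.sqrt(sxx * syy)           # Pearson correlation (unitless)
slope = sxy / sxx                        # same as Cov(x, y) / s_x^2
intercept = y_bar - slope * x_bar        # line passes through (x_bar, y_bar)

print(f"cov={cov_xy:.3f}, r={r:.3f}")
print(f"y_hat = {intercept:.2f} + {slope:.2f} * x")
```

Swapping `x` and `y` leaves `r` unchanged but changes the slope, which is one way to see that correlation is symmetric while regression is not.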


Week 5: Permutations & Combinations

5.1 Factorial

$n! = n \times (n-1) \times \cdots \times 2 \times 1$, with $0! = 1$

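5.2 Permutations (Order Matters)

$^nP_r = \frac{n!}{(n-r)!}$

With Repetition (Identical Objects):

$\frac{n!}{n_1! \, n_2! \cdots n_k!}$, where $n_1 + n_2 + \cdots + n_k = n$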

5.3 Combinations (Order Doesn’t Matter)

$^nC_r = \binom{n}{r} = \frac{n!}{r! \, (n-r)!}$
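
These counts are available in Python's `math` module (`math.perm` and `math.comb` need Python 3.8+). The helper `arrangements_with_identical` below is a hypothetical name for the identical-objects case, which divides $n!$ by the factorial of each repeat count.

```python
import math
from collections import Counter

def arrangements_with_identical(items):
    """Distinct orderings of items when some are identical: n! / (n1! n2! ...)."""
    result = math.factorial(len(items))
    for count in Counter(items).values():
        result //= math.factorial(count)   # exact integer division at each step
    return result

print(math.factorial(5))                 # 5! = 120
print(math.perm(5, 2))                   # ordered selections of 2 from 5 = 20
print(math.comb(5, 2))                   # unordered selections of 2 from 5 = 10
print(arrangements_with_identical("MISSISSIPPI"))  # 11! / (1! 4! 4! 2!) = 34650
```

Each combination of 2 items corresponds to $2! = 2$ permutations, which is why `math.perm(5, 2)` is twice `math.comb(5, 2)`.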


Week 6-8: Probability

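6.1 Basic Probability

$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$ (for equally likely outcomes)

Properties:

  • $0 \leq P(A) \leq 1$
  • $P(S) = 1$, where $S$ is the sample space
  • $P(A^c) = 1 - P(A)$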

6.2 Addition Rule

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

For mutually exclusive events: $P(A \cup B) = P(A) + P(B)$

6.3 Multiplication Rule

$P(A \cap B) = P(A) \, P(B \mid A)$

For independent events: $P(A \cap B) = P(A) \, P(B)$

6.4 Conditional Probability

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$
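
The addition rule, the multiplication rule, and conditional probability can be checked by brute-force enumeration. The sketch below assumes two fair dice and two made-up events: `A` (first die is even) and `B` (the sum is 7). The helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def A(o):
    return o[0] % 2 == 0                 # first die shows an even number

def B(o):
    return sum(o) == 7                   # the two dice sum to 7

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))
p_b_given_a = p_a_and_b / p_a            # conditional probability P(B | A)

print(p_a, p_b, p_a_and_b, p_a_or_b, p_b_given_a)
```

In this example $P(A \cap B) = P(A) P(B)$ and $P(B \mid A) = P(B)$, so the two events happen to be independent even though they can occur together.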

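6.5 Bayes’ Theorem

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$

Total Probability: $P(B) = \sum_{i} P(B \mid A_i) \, P(A_i)$, where the events $A_i$ partition the sample space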
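
A worked sketch of Bayes' theorem combined with the total probability rule, using made-up numbers for a screening test: 1% prevalence, 95% true positive rate, 5% false positive rate. The function name `posterior` and its parameters are illustrative.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive) from Bayes' theorem with a two-event partition."""
    # Total probability of a positive result over {condition, no condition}.
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / p_positive

# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.4f}")
```

Even with an accurate test, the posterior stays well below 50% here because the prior is so small. This is the usual reason for writing out the total probability in the denominator instead of guessing.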

6.6 Independence vs Mutual Exclusivity

| Independent | Mutually Exclusive |
|---|---|
| Both can occur | Cannot occur together |
| $P(A \cap B) = P(A) \, P(B)$ | $P(A \cap B) = 0$ |

Week 9-10: Discrete Random Variables

9.1 Probability Mass Function (PMF)

$p(x) = P(X = x)$ for each value $x$

Requirements:

  • $p(x) \geq 0$ for all $x$
  • $\sum_{x} p(x) = 1$

9.2 Expected Value

$E[X] = \sum_{x} x \, p(x)$

Properties:

  • $E[aX + b] = a E[X] + b$
  • $E[X + Y] = E[X] + E[Y]$

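9.3 Variance

$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

Properties:

  • $\text{Var}(aX + b) = a^2 \, \text{Var}(X)$
  • For independent $X, Y$: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$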
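
A small sketch that computes the expected value and variance directly from a PMF, assuming a fair six-sided die as the random variable. Exact fractions are used so the results can be compared without rounding; the helper names are illustrative.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each value has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over all values x."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2

assert sum(pmf.values()) == 1            # PMF requirement: probabilities sum to 1

mean = expectation(pmf)
var = variance(pmf)

# PMF of Y = 2X + 3, to check the linear-transformation properties.
a, b = 2, 3
pmf_y = {a * x + b: p for x, p in pmf.items()}

print(mean, var)                         # 7/2 and 35/12
print(expectation(pmf_y), variance(pmf_y))
```

The transformed variable has mean $a E[X] + b$ and variance $a^2 \, \text{Var}(X)$, so the constant 3 shifts the mean but does not affect the variance.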

Week 11-12: Special Distributions

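11.1 Binomial Distribution

$X \sim \text{Binomial}(n, p)$: $n$ trials, probability $p$ of success

$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, for $k = 0, 1, \ldots, n$

  • $E[X] = np$
  • $\text{Var}(X) = np(1 - p)$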

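11.2 Poisson Distribution

$X \sim \text{Poisson}(\lambda)$: Average rate $\lambda$

$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$, for $k = 0, 1, 2, \ldots$

  • $E[X] = \lambda$
  • $\text{Var}(X) = \lambda$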
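
A minimal sketch of the Binomial and Poisson PMFs using only the standard library. The function names and the parameter values ($n = 10$, $p = 0.5$, $\lambda = 2$) are made up for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the average rate."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 10, 0.5                           # hypothetical binomial parameters
binom_probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
binom_mean = sum(k * q for k, q in enumerate(binom_probs))

lam = 2.0                                # hypothetical Poisson rate
poisson_probs = [poisson_pmf(k, lam) for k in range(50)]  # truncated support

print(f"P(X=5) for Binomial(10, 0.5): {binom_probs[5]:.4f}")
print(f"Binomial mean: {binom_mean:.4f} (compare n*p = {n * p})")
print(f"P(X=0) for Poisson(2): {poisson_probs[0]:.4f}")
```

The Poisson support is infinite, so the list above is cut off at 50 terms; for a rate this small the omitted tail is negligible.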

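12.1 Uniform Distribution

$X \sim \text{Uniform}(a, b)$ (continuous): $f(x) = \frac{1}{b - a}$ for $a \leq x \leq b$

  • $E[X] = \frac{a + b}{2}$
  • $\text{Var}(X) = \frac{(b - a)^2}{12}$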

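12.2 Exponential Distribution

$X \sim \text{Exponential}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$

  • $P(X \leq x) = 1 - e^{-\lambda x}$
  • $E[X] = \frac{1}{\lambda}$
  • $\text{Var}(X) = \frac{1}{\lambda^2}$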
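
A short sketch of interval probabilities, assuming the continuous uniform distribution on $[a, b]$ and the exponential distribution with rate $\lambda$. The function names and the example numbers are illustrative assumptions.

```python
import math

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b], assuming a <= c <= d <= b."""
    return (d - c) / (b - a)

def exponential_cdf(x, lam):
    """P(X <= x) for X exponential with rate lam, assuming x >= 0."""
    return 1.0 - math.exp(-lam * x)

# Hypothetical examples.
p_uniform = uniform_prob(2, 5, a=0, b=10)       # length 3 out of length 10
p_exp = exponential_cdf(1.0, lam=2.0)           # 1 - e^(-2)
p_exp_tail = 1.0 - exponential_cdf(1.0, lam=2.0)  # P(X > 1) = e^(-2)

print(f"Uniform: P(2 <= X <= 5) = {p_uniform:.2f}")
print(f"Exponential: P(X <= 1) = {p_exp:.4f}, P(X > 1) = {p_exp_tail:.4f}")
```

For the uniform case the probability depends only on the length of the interval, not on where it sits inside $[a, b]$.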


End of Stats 1 Theory. Good luck!