Stats 1: Complete Theory Notes
IIT Madras BS DS - Foundational Level - All 12 Weeks Covered
Week 1: Introduction to Data
1.1 Population vs Sample
| Term | Definition |
|---|---|
| Population | Entire group of interest |
| Sample | Subset of population actually studied |
| Parameter | Numerical measure of population |
| Statistic | Numerical measure of sample |
1.2 Types of Statistics
| Type | Purpose | Example |
|---|---|---|
| Descriptive | Summarize data | "Average score is 75" |
| Inferential | Make predictions about population | "95% of all students pass" |
1.3 Types of Variables
Categorical (Qualitative):
- Nominal: Names only (Gender, Color)
- Ordinal: Ordered categories (Grades: A > B > C)
Numerical (Quantitative):
- Discrete: Countable values (Number of students)
- Continuous: Any value in range (Height, Weight)
1.4 Scales of Measurement
| Scale | Properties | Examples | Operations |
|---|---|---|---|
| Nominal | Labels only | Gender, City | Mode |
| Ordinal | Order exists | Ratings, Grades | Mode, Median |
| Interval | Equal intervals, no true zero | Temperature (°C) | Mean, SD |
| Ratio | True zero exists | Height, Weight, Money | All operations |
Key Test: Can you say "$X$ is twice $Y$"? That statement is only valid on the Ratio scale.
1.5 Data Types
| Type | Description |
|---|---|
| Cross-sectional | Data at single point in time |
| Time-series | Data over time |
| Structured | Organized in rows/columns |
| Unstructured | No predefined format |
Week 2: Describing Categorical Data
2.1 Frequency Tables
| Term | Formula |
|---|---|
| Frequency | Count of occurrences |
| Relative Frequency | $\dfrac{\text{Frequency}}{\text{Total number of observations}}$ |
| Cumulative Frequency | Running total |
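
The three columns of a frequency table can be built in a few lines. This is a minimal sketch using only the Python standard library; the `grades` list is made-up illustrative data, not something from the course.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: grades of 10 students (illustrative data only)
grades = ["A", "B", "A", "C", "B", "A", "B", "B", "C", "A"]

freq = Counter(grades)            # frequency: count of each category
total = sum(freq.values())        # total number of observations
categories = sorted(freq)         # fixed display order: A, B, C

# Relative frequency = frequency / total
rel_freq = {c: freq[c] / total for c in categories}

# Cumulative frequency = running total in the chosen order
cum_freq = dict(zip(categories, accumulate(freq[c] for c in categories)))

for c in categories:
    print(c, freq[c], rel_freq[c], cum_freq[c])
```

The relative frequencies always sum to 1, and the last cumulative frequency equals the total number of observations.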
2.2 Visualizations
| Chart | Best For |
|---|---|
| Bar Chart | Comparing categories |
| Pie Chart | Showing proportions (parts of whole) |
| Pareto Chart | Bar chart with cumulative line |
2.3 Mode for Categorical Data
Mode = Most frequent category
In pie chart: Widest slice = Mode
Note: Mean and Median are NOT defined for nominal data!
2.4 Association (Contingency Tables)
Two variables are associated if the distribution of one changes depending on the value of the other.
Week 3: Measures of Central Tendency & Dispersion
3.1 Central Tendency
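Mean (Average)
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Weighted Mean:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$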
Median
- Middle value when sorted
- If $n$ is even: Average of the two middle values
- Robust to outliers
Mode
- Most frequent value
- Can have multiple modes (bimodal, multimodal)
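As a rough illustration of the three measures, and of why the median is called robust, here is a short sketch with Python's standard `statistics` module. The scores, marks, and credit weights are hypothetical values chosen for the example.

```python
import statistics

# Hypothetical exam scores (illustrative data, not from the course)
scores = [70, 75, 75, 80, 100]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Weighted mean: hypothetical marks weighted by course credits
marks = [80, 90]
credits = [3, 1]
weighted_mean = sum(w * x for w, x in zip(credits, marks)) / sum(credits)

# Adding one extreme value moves the mean far more than the median
with_outlier = scores + [1000]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, mode, weighted_mean)
print(mean_out, median_out)
```

With the extreme value added, the list has an even number of entries, so the median is the average of the two middle values.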
3.2 Dispersion
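Range
$$\text{Range} = \text{Max} - \text{Min}$$
Variance
Population:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Sample:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Shortcut Formula:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
Standard Deviation
$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$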
Interquartile Range (IQR)
$$\text{IQR} = Q_3 - Q_1$$
Where:
- $Q_1$ = 25th percentile
- $Q_2$ = Median = 50th percentile
- $Q_3$ = 75th percentile
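The dispersion measures in this section can be checked numerically. The sketch below uses the standard `statistics` module on made-up data. Note that quartile conventions differ between textbooks and software, so the quartiles returned by `statistics.quantiles` (default "exclusive" method) may not match a hand calculation that uses a different rule.

```python
import statistics

# Hypothetical data set (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

pop_var = statistics.pvariance(data)   # population variance: divides by N
samp_var = statistics.variance(data)   # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of population variance

# Shortcut for the population variance: mean of squares minus squared mean
mu = statistics.mean(data)
shortcut_var = sum(x * x for x in data) / len(data) - mu ** 2

# Quartiles and interquartile range
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, pop_var, samp_var, pop_sd, shortcut_var, iqr)
```

The sample variance is always at least as large as the population variance computed on the same numbers, because it divides by $n - 1$ instead of $N$.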
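3.3 Linear Transformation
If $y = ax + b$:
$$\bar{y} = a\bar{x} + b, \qquad s_y^2 = a^2 s_x^2, \qquad s_y = |a|\, s_x$$
Note: Adding a constant $b$ does NOT change the variance; only the multiplier $a$ does.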
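A quick numeric check of how the mean, variance, and standard deviation respond to a transformation of the form y = a*x + b. The data and the constants `a` and `b` are arbitrary choices for illustration; a negative `a` is used to show that the standard deviation scales by the absolute value of `a`.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]          # hypothetical data
a, b = -2, 10                # arbitrary multiplier and shift
y = [a * v + b for v in x]   # transformed data

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
var_x, var_y = statistics.pvariance(x), statistics.pvariance(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)

# Shifting alone (a = 1) leaves the variance unchanged
shifted_var = statistics.pvariance([v + b for v in x])

print(mean_y == a * mean_x + b)              # mean follows the same line
print(var_y == a * a * var_x)                # variance scales by a squared
print(math.isclose(sd_y, abs(a) * sd_x))     # SD scales by |a|
print(shifted_var == var_x)                  # adding b changes nothing
```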
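3.4 Outlier Detection
$$\text{Lower bound} = Q_1 - 1.5 \times \text{IQR}, \qquad \text{Upper bound} = Q_3 + 1.5 \times \text{IQR}$$
Values outside these bounds are outliers.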
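A minimal sketch of outlier detection with the common 1.5 × IQR rule, using made-up data. The quartiles come from `statistics.quantiles`, whose default method may give slightly different quartile values than a hand calculation, although the flagged point is the same here.

```python
import statistics

# Hypothetical data with one suspiciously large value
data = [10, 12, 13, 14, 15, 16, 18, 50]

q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles Q1, Q2, Q3
iqr = q3 - q1

lower = q1 - 1.5 * iqr    # lower bound
upper = q3 + 1.5 * iqr    # upper bound

# Anything strictly outside [lower, upper] is flagged as an outlier
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```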
Week 4: Correlation & Regression
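4.1 Covariance
$$\text{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
(For a population, divide by $N$ instead of $n - 1$.)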
- Positive: Both increase together
- Negative: One increases, other decreases
- Zero: No linear relationship
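4.2 Correlation Coefficient (Pearson's r)
$$r = \frac{\text{Cov}(x, y)}{s_x\, s_y}$$
| Value of $r$ | Interpretation |
|---|---|
| $r = 1$ | Perfect positive linear |
| $r = -1$ | Perfect negative linear |
| $0.7 \leq \lvert r \rvert < 1$ | Strong linear relationship |
| $0.3 \leq \lvert r \rvert < 0.7$ | Moderate linear relationship |
| $\lvert r \rvert < 0.3$ | Weak linear relationship |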
Properties:
- Unitless
- Symmetric: $r_{xy} = r_{yx}$
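The sketch below computes the sample covariance and Pearson's r by hand and checks the two properties listed above. The helper names (`sample_cov`, `pearson_r`) and the data are assumptions made for this example, not part of the course material.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_cov(x, y)
r = pearson_r(x, y)

# Unitless: rescaling x (e.g. metres to centimetres) leaves r unchanged
r_scaled = pearson_r([100 * v for v in x], y)

# Symmetric: swapping the roles of x and y gives the same r
r_swapped = pearson_r(y, x)

print(cov_xy, r, r_scaled, r_swapped)
```

Covariance, unlike r, does change with the units: multiplying every x by 100 multiplies the covariance by 100.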
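4.3 Linear Regression
$$\hat{y} = b_0 + b_1 x$$
Slope:
$$b_1 = \frac{\text{Cov}(x, y)}{s_x^2} = r\,\frac{s_y}{s_x}$$
Intercept:
$$b_0 = \bar{y} - b_1 \bar{x}$$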
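A small least-squares sketch for a straight-line fit: the slope is the sum of cross-deviations divided by the sum of squared x-deviations (equivalent to covariance over variance of x), and the intercept makes the line pass through the point of means. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
predicted = intercept + slope * 6   # prediction at a new x value

print(slope, intercept, predicted)
```

Predicting at x = 6 extrapolates slightly beyond the observed range, which is only trustworthy if the linear pattern continues there.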
Week 5: Permutations & Combinations
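5.1 Factorial
$$n! = n \times (n-1) \times \cdots \times 2 \times 1, \qquad 0! = 1$$
5.2 Permutations (Order Matters)
$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$
With Repetition (Identical Objects):
$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$
where $n_1, n_2, \ldots, n_k$ are the counts of each group of identical objects and $n_1 + n_2 + \cdots + n_k = n$.
5.3 Combinations (Order Doesn't Matter)
$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$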
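These counts are available directly in Python's `math` module (`math.perm` and `math.comb` need Python 3.8 or later). The sketch below also counts the arrangements of a word with repeated letters; the word is just a familiar illustrative choice.

```python
import math
from collections import Counter

n_fact = math.factorial(5)   # 5! = 5 * 4 * 3 * 2 * 1
perms = math.perm(5, 2)      # ordered selections of 2 out of 5
combs = math.comb(5, 2)      # unordered selections of 2 out of 5

# Arrangements of a word with identical letters:
# n! divided by the factorial of each letter's count
word = "MISSISSIPPI"
arrangements = math.factorial(len(word))
for count in Counter(word).values():
    arrangements //= math.factorial(count)

print(n_fact, perms, combs, arrangements)
```

Each combination of 2 objects corresponds to 2! orderings, which is why the permutation count equals the combination count times 2!.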
Week 6-8: Probability
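6.1 Basic Probability
$$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of equally likely outcomes}}$$
Properties:
- $0 \leq P(A) \leq 1$
- $P(S) = 1$, where $S$ is the sample space
- $P(A^c) = 1 - P(A)$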
6.2 Addition Rule
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
For mutually exclusive events:
$$P(A \cup B) = P(A) + P(B)$$
6.3 Multiplication Rule
$$P(A \cap B) = P(A)\, P(B \mid A)$$
For independent events:
$$P(A \cap B) = P(A)\, P(B)$$
6.4 Conditional Probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
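6.5 Bayes' Theorem
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Total Probability:
$$P(B) = \sum_{i} P(B \mid A_i)\, P(A_i)$$
where the events $A_i$ are mutually exclusive and together cover the sample space.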
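A worked numeric sketch of Bayes' theorem combined with total probability, using the classic diagnostic-test setting. All three input probabilities are hypothetical numbers picked for illustration.

```python
# Hypothetical inputs (assumed for illustration only)
p_d = 0.01                 # P(D): prior probability of having the disease
p_pos_given_d = 0.95       # P(+ | D): test is positive when disease is present
p_pos_given_not_d = 0.05   # P(+ | not D): false positive rate

# Total probability of a positive test over the partition {D, not D}
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: probability of disease given a positive test
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_pos, 4), round(p_d_given_pos, 4))
```

Even with a fairly accurate test, the posterior stays well below one half here because the disease is rare, so most positive results come from the much larger healthy group.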
6.6 Independence vs Mutual Exclusivity
| Independent | Mutually Exclusive |
|---|---|
| Both can occur | Cannot occur together |
Week 9-10: Discrete Random Variables
9.1 Probability Mass Function (PMF)
$p(x) = P(X = x)$ for each value $x$
Requirements:
- $p(x) \geq 0$ for every $x$
- $\sum_{x} p(x) = 1$
9.2 Expected Value
$$E[X] = \sum_{x} x\, p(x)$$
Properties:
- $E[aX + b] = a\,E[X] + b$
- $E[X + Y] = E[X] + E[Y]$
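9.3 Variance
$$\text{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$
Properties:
- $\text{Var}(aX + b) = a^2\, \text{Var}(X)$
- For independent $X, Y$: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$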
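To make the expected value and variance of a discrete random variable concrete, here is a sketch for a fair six-sided die, using exact fractions to avoid rounding. The helper `expectation` is a hypothetical name introduced for this example.

```python
from fractions import Fraction

# PMF of a fair die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
total_prob = sum(pmf.values())   # a valid PMF must sum to 1

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over all values x."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                            # E[X]
var = expectation(pmf, lambda x: x * x) - mean**2  # E[X^2] - (E[X])^2

# Linear transformation Y = 2X + 3: same probabilities, new values
pmf_y = {2 * x + 3: p for x, p in pmf.items()}
mean_y = expectation(pmf_y)
var_y = expectation(pmf_y, lambda v: v * v) - mean_y**2

print(total_prob, mean, var, mean_y, var_y)
```

The results for Y agree with the properties above: the mean becomes 2·E[X] + 3 and the variance becomes 4·Var(X), with the added constant having no effect on the variance.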
Week 11-12: Special Distributions
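11.1 Binomial Distribution
$X \sim \text{Binomial}(n, p)$: $n$ independent trials, probability of success $p$ in each trial
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n$$
$$E[X] = np, \qquad \text{Var}(X) = np(1-p)$$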
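A minimal sketch of the binomial PMF written from the standard formula (number of ways to choose k successes, times the probability of each such sequence), checked against the usual mean n·p and variance n·p·(1 − p). The function name `binomial_pmf` and the choice of 10 fair coin flips are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical setting: 10 flips of a fair coin
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(probs[5], sum(probs), mean, var)
```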
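11.2 Poisson Distribution
$X \sim \text{Poisson}(\lambda)$, $\lambda$: Average rate (expected number of events per interval)
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots$$
$$E[X] = \lambda, \qquad \text{Var}(X) = \lambda$$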
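A short sketch of the Poisson PMF from its standard formula, with a numeric check that the mean and the variance both equal the average rate. The function name `poisson_pmf` and the rate of 3 events per interval are hypothetical; the infinite sum is truncated at 60 terms, which is more than enough for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) when events occur at average rate lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3   # hypothetical average rate
probs = [poisson_pmf(k, lam) for k in range(60)]   # truncated support

mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(probs[0], sum(probs), mean, var)
```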
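12.1 Uniform Distribution
$X \sim \text{Uniform}(a, b)$ (continuous):
$$f(x) = \frac{1}{b-a}, \qquad a \leq x \leq b$$
$$E[X] = \frac{a+b}{2}, \qquad \text{Var}(X) = \frac{(b-a)^2}{12}$$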
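12.2 Exponential Distribution
$X \sim \text{Exponential}(\lambda)$, $\lambda$: rate
$$f(x) = \lambda e^{-\lambda x}, \qquad x \geq 0$$
$$E[X] = \frac{1}{\lambda}, \qquad \text{Var}(X) = \frac{1}{\lambda^2}$$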
End of Stats 1 Theory. Good luck!