Topic 1.1 Introduction to Machine Learning

Chapter 01: Comprehensive Guide to Machine Learning

🎯 Overview

Machine learning combines computer science, statistics, and mathematics. It uses algorithms to fit models to data, and those models are then used to make predictions and discover patterns.


📖 Core Concepts

🤖 What is Machine Learning?

Machine learning uses algorithms and models to:

  • 🎯 Make predictions
  • 🔍 Discover patterns in data
  • 📊 Apply knowledge to new data

Key Components:

Component | Definition | Example
----------|------------|--------
Model 🧮 | A mathematical function describing relationships between inputs and outputs | Linear regression equation: y = mx + b
Algorithm ⚙️ | A procedure or set of decision rules to carry out ML tasks | Decision tree rules, gradient descent
Dataset 📁 | Collection of information containing features and instances | Rideshare trip data with prices, distances
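
To make the three components concrete, here is a minimal sketch in plain Python. The four data points are hypothetical (chosen to follow price = 2 × distance + 1), and the function names `model` and `fit_gradient_descent` are illustrative, not part of any library. The dataset is the pair of lists, the model is the function y = mx + b, and the algorithm is a basic gradient descent loop that searches for m and b.

```python
# Dataset: hypothetical (distance, price) pairs that follow price = 2 * distance + 1
distances = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]


def model(x, m, b):
    """Model: a linear function mapping an input x to a predicted output."""
    return m * x + b


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Algorithm: gradient descent on mean squared error to find m and b."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every instance under the current m and b
        errors = [model(x, m, b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


m, b = fit_gradient_descent(distances, prices)
print(f"m = {m:.3f}, b = {b:.3f}")
print(f"Predicted price for distance 5.0: {model(5.0, m, b):.2f}")
```

The learning rate and step count are assumptions that happen to suit this tiny dataset; other data would likely need different values.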

📊 Data Structure Fundamentals

🗂️ Dataframes, Features, and Instances

Dataframe Structure:

┌─────────────────────────────────────────────────┐
│                   DATAFRAME                      │
├──────────┬──────────┬──────────┬──────────┬─────┤
│ Feature1 │ Feature2 │ Feature3 │ Feature4 │ ... │ ← FEATURES (Columns)
├──────────┼──────────┼──────────┼──────────┼─────┤
│   1.30   │   Uber   │ 2018-... │ Theatre  │ ... │ ← INSTANCE (Row 1)
├──────────┼──────────┼──────────┼──────────┼─────┤
│   1.35   │   Lyft   │ 2018-... │  South   │ ... │ ← INSTANCE (Row 2)
├──────────┼──────────┼──────────┼──────────┼─────┤
│   1.10   │   Lyft   │ 2018-... │Financial │ ... │ ← INSTANCE (Row 3)
└──────────┴──────────┴──────────┴──────────┴─────┘

Key Terminology:

Term | Definition | Example
-----|------------|--------
Instance 📍 | Individual data point or observational unit (ROW) | Each rideshare trip
Feature 🏷️ | Characteristic measured on an instance (COLUMN) | Distance, price, cab_type
Dataset 📦 | Collection of instances and features | Complete rideshare data table
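
As a small illustration of these terms, the sketch below builds a three-row dataframe with pandas (covered in the next section) from values in this lesson's rideshare sample. Only three of the features are included to keep it short.

```python
import pandas as pd

# Three instances taken from the rideshare sample shown in this lesson
rides = pd.DataFrame({
    "distance": [1.30, 1.35, 1.10],
    "cab_type": ["Uber", "Lyft", "Lyft"],
    "destination": ["Theatre District", "South Station", "Financial District"],
})

# shape is (rows, columns), i.e. (instances, features)
n_instances, n_features = rides.shape
print("Instances (rows):", n_instances)
print("Features (columns):", list(rides.columns))

# One instance = one row: every feature measured for a single trip
first_trip = rides.iloc[0]
print(first_trip)

# One feature = one column: the same measurement across all trips
print(rides["distance"])
```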

💻 Working with Dataframes in Python

📥 Importing Data with Pandas

Common Import Functions:

Function | Purpose | File Type
---------|---------|----------
pd.read_csv() | Import CSV files | .csv
pd.read_excel() | Import Excel files | .xlsx, .xls
pd.read_json() | Import JSON files | .json
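
The sketch below shows two of these functions side by side. So that it runs without any files on disk, it reads from in-memory text via `io.StringIO`; the two sample rows come from this lesson's rideshare data, and the variable names are illustrative. `pd.read_excel()` is left out because it needs an actual Excel file and an installed engine such as openpyxl.

```python
import io

import pandas as pd

# The same two trips, written once as CSV text and once as JSON text
csv_text = "distance,cab_type,price\n1.30,Uber,17.5\n1.35,Lyft,7.0\n"
json_text = (
    '[{"distance": 1.30, "cab_type": "Uber", "price": 17.5},'
    ' {"distance": 1.35, "cab_type": "Lyft", "price": 7.0}]'
)

# Both readers accept a file path or a file-like object
rides_from_csv = pd.read_csv(io.StringIO(csv_text))
rides_from_json = pd.read_json(io.StringIO(json_text))

print(rides_from_csv)
print(rides_from_json)
```

With real files, the `io.StringIO(...)` argument would simply be replaced by a path such as `'rideshare_data.csv'`.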

🔧 Essential Pandas Operations

Function Reference Table:

Function/Method | Purpose | Syntax Example | Returns
----------------|---------|----------------|--------
pd.read_csv() | Load CSV file | pd.read_csv('file.csv') | DataFrame
dataframe[['col']] | Select column(s) | df[['distance']] | DataFrame
dataframe.iloc[x, y] | Select by position (row x, column y) | df.iloc[0, 1] | Single element
dataframe.head() | Show first rows | df.head() | DataFrame (first 5 rows by default)
: (slice notation) | Define range (end excluded) | df.iloc[:5, 1:3] | DataFrame subset
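
A short sketch of what these selections return, using a three-row dataframe built from this lesson's sample values (the name `df` and the choice of columns are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "distance": [1.30, 1.35, 1.10],
    "cab_type": ["Uber", "Lyft", "Lyft"],
    "price": [17.5, 7.0, 13.5],
})

# Double brackets keep the result two-dimensional (a DataFrame)
as_frame = df[["distance"]]
# Single brackets return a one-dimensional Series
as_series = df["distance"]
print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)

# iloc[row, column] selects one element by integer position
cell = df.iloc[0, 1]
print(cell)

# Slices exclude their end: rows 0-1 and the columns at positions 1 and 2
block = df.iloc[:2, 1:3]
print(block)
```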

📝 Code Example: Loading and Exploring Rideshare Data

# Import necessary library
import pandas as pd

# 📥 Load the rideshare dataset
rides = pd.read_csv('rideshare_data.csv')

# 👀 Display first 5 rows
print("First 5 rows of data:")
print(rides.head())

# 🎯 Select specific features (columns)
distance_data = rides[['distance']]  # Returns DataFrame
print("\nDistance column:")
print(distance_data.head())

# 🔍 Select multiple features
selected_features = rides[['distance', 'price', 'destination']]
print("\nSelected features:")
print(selected_features.head())

# 📍 Access a specific element (row 0, column 1)
element = rides.iloc[0, 1]
print(f"\nElement at position [0, 1]: {element}")

# 📊 Slice data (first 5 rows, columns at positions 1 and 2; the slice end is excluded)
subset = rides.iloc[:5, 1:3]
print("\nSubset of data:")
print(subset)

# 📈 Get basic information
print("\nDataset shape:", rides.shape)  # (rows, columns)
print("Column names:", rides.columns.tolist())

Output (abridged):

First 5 rows of data:
   distance cab_type           time_stamp         destination  price  surge_multiplier
0      1.30     Uber  2018-12-01 13:08:04    Theatre District   17.5               1.0
1      1.35     Lyft  2018-11-29 12:22:57       South Station    7.0               1.0
2      1.10     Lyft  2018-12-18 09:15:09  Financial District   13.5               1.0
3      1.51     Lyft  2018-11-28 10:11:07       South Station   27.5               1.5
4      0.63     Uber  2018-11-26 20:08:09  Financial District    4.5               1.0

Distance column:
   distance
0      1.30
1      1.35
2      1.10
3      1.51
4      0.63

Dataset shape: (1000, 6)

🎯 Input and Output Features

📊 Visual Representation

┌─────────────────────────────────────────────────────┐
│              MACHINE LEARNING MODEL                  │
│                                                      │
│  INPUT FEATURES          MODEL          OUTPUT       │
│  (Explanatory) ────────► [🤖] ────────► (Target)    │
│                                                      │
│  • Distance                              • Price    │
│  • Time                                             │
│  • Location                                         │
│  • Vehicle Type                                     │
└─────────────────────────────────────────────────────┘

🔑 Key Definitions:

Feature Type | Alternative Names | Role | Example
-------------|-------------------|------|--------
Input Features ⬅️ | Explanatory features, Predictors, X | Used to make predictions | Distance, time, location
Output Feature ➡️ | Target feature, Response, Y | What we want to predict | Price of rideshare
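
When a dataframe holds several input features, one common way to separate X from y is to name the target column and treat everything else as input. The sketch below assumes `price` is the target; the three rows come from this lesson's sample, and the variable `target` is illustrative.

```python
import pandas as pd

rides = pd.DataFrame({
    "distance": [1.30, 1.35, 1.10],
    "cab_type": ["Uber", "Lyft", "Lyft"],
    "price": [17.5, 7.0, 13.5],
})

target = "price"  # assumed output feature for this example

# Every column except the target is an input feature
X = rides.drop(columns=[target])
# The single output feature, as a Series
y = rides[target]

print("Input features:", list(X.columns))
print("Output feature:", y.name)
```

Here `drop` returns a new dataframe, so `rides` itself still contains the `price` column afterwards.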

💡 Example: Rideshare Price Prediction

Task: Predict rideshare price based on distance

import pandas as pd
import matplotlib.pyplot as plt

# Sample rideshare data
data = {
    'distance': [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0],
    'price': [8, 12, 15, 18, 22, 25, 28, 32, 35, 40]
}

rides = pd.DataFrame(data)

# 🎯 Define Input and Output
X = rides[['distance']]  # INPUT: Distance
y = rides[['price']]     # OUTPUT: Price

print("Input Features (X):")
print(X.head())
print("\nOutput Feature (y):")
print(y.head())

# 📊 Visualize relationship
plt.figure(figsize=(10, 6))
plt.scatter(rides['distance'], rides['price'], color='blue', s=100, alpha=0.6)
plt.xlabel('Distance (miles)', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.title('Rideshare Price vs Distance', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
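
The plot shows the relationship but does not yet predict anything. As one possible next step, the sketch below fits a straight line to the same sample data with NumPy's `np.polyfit` and uses it to estimate a price. Using a least-squares line here is an assumption made for illustration, and `predict_price` is a hypothetical helper name.

```python
import numpy as np

# Same sample data as the example above
distance = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
price = np.array([8, 12, 15, 18, 22, 25, 28, 32, 35, 40])

# Least-squares fit of price = slope * distance + intercept
# (polyfit returns coefficients from the highest power down)
slope, intercept = np.polyfit(distance, price, deg=1)


def predict_price(miles):
    """Hypothetical helper: apply the fitted line to a new distance."""
    return slope * miles + intercept


print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"Estimated price for 6.0 miles: {predict_price(6.0):.2f}")
```

Predicting for 6.0 miles extrapolates beyond the sampled distances, so the estimate should be read with caution.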
