Machine Learning Fundamentals and Applications

Machine Learning is transforming industries and creating new possibilities. In this post, we’ll explore the fundamental concepts and practical applications of ML.

What is Machine Learning?

Machine Learning is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. It focuses on developing algorithms that can learn patterns from data.

Types of Machine Learning

1. Supervised Learning

Classification Example:

PYTHON
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = pd.read_csv('iris.csv')
X = data.drop('species', axis=1)
y = data['species']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
Click to expand and view more

Regression Example:

PYTHON
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Simple linear regression
X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]

model = LinearRegression()
model.fit(X, y)

# Predict
prediction = model.predict([[6]])
print(f"Prediction for 6: {prediction[0]}")
Click to expand and view more

2. Unsupervised Learning

Clustering with K-Means:

PYTHON
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Apply K-means clustering
kmeans = KMeans(n_clusters=4, random_state=0)
clusters = kmeans.fit_predict(X)

# Visualize results
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=300, c='red', marker='x')
plt.show()
Click to expand and view more

3. Deep Learning with Neural Networks

Simple Neural Network with TensorFlow:

PYTHON
import tensorflow as tf
from tensorflow import keras

# Define model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(X_train, y_train, epochs=5)

# Evaluate model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
Click to expand and view more

Data Preprocessing

Feature Scaling

PYTHON
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization (mean=0, std=1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Normalization (0 to 1)
minmax_scaler = MinMaxScaler()
X_normalized = minmax_scaler.fit_transform(X)
Click to expand and view more

Handling Missing Values

PYTHON
# Remove rows with missing values
df_clean = df.dropna()

# Fill missing values
df_filled = df.fillna(df.mean())

# Impute with KNN
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=5)
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
Click to expand and view more

Model Evaluation Metrics

Classification Metrics

PYTHON
from sklearn.metrics import classification_report, confusion_matrix

# Confusion Matrix
cm = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(cm)

# Classification Report
report = classification_report(y_test, predictions)
print("Classification Report:")
print(report)
Click to expand and view more

Regression Metrics

PYTHON
from sklearn.metrics import mean_absolute_error, r2_score

mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print(f"Mean Absolute Error: {mae:.2f}")
print(f"R² Score: {r2:.2f}")
Click to expand and view more

Cross-Validation

PYTHON
from sklearn.model_selection import cross_val_score, KFold

# K-Fold Cross Validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.2f} (+/- {scores.std() * 2:.2f})")
Click to expand and view more

Hyperparameter Tuning

PYTHON
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    cv=5,
    scoring='accuracy'
)

grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_:.2f}")
Click to expand and view more

Real-World Applications

1. Recommendation Systems

PYTHON
# Collaborative filtering with Surprise library
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Load data
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df, reader)

# Train SVD model
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
Click to expand and view more

2. Natural Language Processing

PYTHON
from transformers import pipeline

# Sentiment analysis
classifier = pipeline('sentiment-analysis')
result = classifier("I love this product!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline('text-generation', model='gpt2')
generated = generator("The future of AI is", max_length=50)
print(generated[0]['generated_text'])
Click to expand and view more

3. Computer Vision

PYTHON
import cv2
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

# Load pre-trained model
model = ResNet50(weights='imagenet')

# Load and preprocess image
img_path = 'cat.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Make prediction
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
Click to expand and view more

Best Practices

  1. Start Simple: Begin with simple models before trying complex ones
  2. Data Quality: Clean, relevant data is more important than complex algorithms
  3. Feature Engineering: Spend time creating meaningful features
  4. Model Interpretability: Understand why your model makes predictions
  5. Continuous Learning: ML models need regular retraining with new data
  6. Ethical Considerations: Be aware of bias and fairness in your models

Future of Machine Learning

Machine Learning is a rapidly evolving field with endless possibilities. The key to success lies in understanding the fundamentals and continuously learning about new techniques and applications.

Copyright Notice

Author: Sanajit Jana

Link: https://sanajitjana.github.io/posts/machine-learning-fundamentals-and-applications/

License: CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please attribute the source, use non-commercially, and maintain the same license.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut