Unit 3 - Correlation and Regression

This unit records progress across the module through key artefacts and reflections:

Legal, Social, Ethical, and Professional Issues

Explored challenges in machine learning applications, including data privacy (GDPR compliance), algorithmic bias, and accountability. Discussions and wiki submissions emphasised fairness, transparency, and the impact of societal biases embedded in datasets.

Applicability and Dataset Challenges

Analysed dataset-specific challenges such as class imbalance, missing data, and representation bias. Highlighted the preprocessing steps needed to address them and the importance of choosing domain-appropriate datasets to improve algorithm performance.
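
The checks described above can be sketched with pandas. This is a minimal illustration only: the toy DataFrame, its column names (`age`, `income`, `label`), and the choice of median imputation are all assumptions made for the example, not the datasets or preprocessing used in the module.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: 'label' is an imbalanced class column,
# 'age' and 'income' contain missing values.
df_raw = pd.DataFrame({
    "age": [25, np.nan, 47, 51, 38, np.nan],
    "income": [30000, 42000, np.nan, 58000, 61000, 39000],
    "label": ["a", "a", "a", "a", "b", "a"],
})

# Share of missing values per column
missing_share = df_raw.isna().mean()

# Relative class frequencies; a heavily skewed split signals class imbalance
class_share = df_raw["label"].value_counts(normalize=True)

# One simple preprocessing option (an assumption, not the only choice):
# fill numeric gaps with the column median
numeric_cols = ["age", "income"]
df_clean = df_raw.copy()
df_clean[numeric_cols] = df_clean[numeric_cols].fillna(df_clean[numeric_cols].median())

print("Missing share per column:\n", missing_share)
print("Class share:\n", class_share)
print(df_clean)
```

Median imputation is used here because it is robust to outliers; depending on the domain, dropping rows or a model-based imputer may be more appropriate.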

Collaboration and Feedback

Team meeting notes reflect active participation in peer discussions, focusing on ethical considerations and practical ML challenges. Peer and tutor feedback informed iterative improvements in assignments and models.

Task-Specific Artefacts

  • Covariance and Correlation: Explored statistical relationships between variables to understand data trends.
  • Linear Regression: Implemented simple linear regression to identify and interpret data patterns.
  • Multiple Linear Regression: Demonstrated how adding predictors affects model accuracy and how fitted coefficients indicate feature importance.
  • Polynomial Regression: Addressed non-linear data trends and assessed model generalisation.
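
As a minimal sketch of the multiple linear regression point above, the following compares a one-predictor model with a two-predictor model on synthetic data. The data, the true coefficients (3.0, -2.0, intercept 1.0), and the variable names are assumptions chosen for illustration; they are not the module's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y depends on two predictors plus a little noise
rng = np.random.default_rng(0)
X_multi = rng.normal(size=(200, 2))
y_multi = (3.0 * X_multi[:, 0] - 2.0 * X_multi[:, 1] + 1.0
           + rng.normal(scale=0.1, size=200))

# Model using only the first predictor
single = LinearRegression().fit(X_multi[:, [0]], y_multi)
r2_single = single.score(X_multi[:, [0]], y_multi)

# Model using both predictors
multi = LinearRegression().fit(X_multi, y_multi)
r2_multi = multi.score(X_multi, y_multi)

print("R^2 with one predictor: ", r2_single)
print("R^2 with two predictors:", r2_multi)
print("Coefficients:", multi.coef_, "Intercept:", multi.intercept_)
```

Coefficient magnitudes are a fair proxy for feature importance here only because both predictors are on the same scale; with differently scaled features, standardising first would be needed. In-sample R^2 also never decreases when a predictor is added, so a held-out set is a better basis for judging accuracy.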

Code Showcase

# Calculating covariance and correlation
import numpy as np
import pandas as pd

# Example dataset
data = {'X': [1, 2, 3, 4, 5], 'Y': [2, 4, 5, 4, 5]}
df = pd.DataFrame(data)

# Sample covariance: np.cov returns a 2x2 matrix (dividing by n - 1 by default);
# element [0, 1] is Cov(X, Y)
covariance = np.cov(df['X'], df['Y'])[0, 1]
print("Covariance:", covariance)

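# Pearson correlation: element [0, 1] of the 2x2 correlation matrix
correlation = np.corrcoef(df['X'], df['Y'])[0, 1]
print("Correlation:", correlation)

# Simple linear regression example (reuses df from the covariance example above)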
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

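# Prepare data: scikit-learn expects a 2D feature array, hence the reshape
X = df['X'].values.reshape(-1, 1)
Y = df['Y'].values

# Train the model
model = LinearRegression()
model.fit(X, Y)

# Slope and intercept of the fitted line, for interpretation
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)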

# Predictions
Y_pred = model.predict(X)

# Visualize
plt.scatter(df['X'], df['Y'], color='blue')
plt.plot(df['X'], Y_pred, color='red')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
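
The polynomial regression artefact listed earlier is not shown in the showcase, so the following is a hedged sketch of how it might look. The quadratic synthetic data, the degree-2 choice, and the 75/25 train/test split are assumptions made for illustration rather than the settings used in the module.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: a quadratic trend plus noise
rng = np.random.default_rng(42)
x_poly = np.linspace(-3, 3, 120).reshape(-1, 1)
y_poly = (0.5 * x_poly[:, 0] ** 2 - x_poly[:, 0] + 2.0
          + rng.normal(scale=0.3, size=120))

# Hold out a test set so generalisation is judged on unseen points
x_train, x_test, y_train, y_test = train_test_split(
    x_poly, y_poly, test_size=0.25, random_state=0
)

# Baseline: a straight line
linear_fit = LinearRegression().fit(x_train, y_train)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model
poly_fit = make_pipeline(
    PolynomialFeatures(degree=2), LinearRegression()
).fit(x_train, y_train)

r2_linear = linear_fit.score(x_test, y_test)
r2_poly = poly_fit.score(x_test, y_test)
print("Test R^2, linear:  ", r2_linear)
print("Test R^2, degree 2:", r2_poly)
```

Comparing test-set scores rather than training scores is what guards against overfitting: raising the degree always improves the training fit, but the held-out score drops once the model starts fitting noise.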