
Unit 5 - Jaccard Coefficient Calculations

Formula:

The Jaccard coefficient is calculated using the formula:

J(A, B) = |A ∩ B| / |A ∪ B|

Where:

|A ∩ B|: The number of shared attributes (intersection).
|A ∪ B|: The number of distinct attributes present in A, in B, or in both (union).
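
As a minimal sketch of the formula above, assuming each person is represented as a Python set of the attributes they show, the coefficient could be computed as follows. The function name `jaccard` and the example sets are illustrative and are not taken from the dataset.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as having no similarity
    return len(a & b) / len(union)


# Hypothetical attribute sets, for illustration only.
x = {"fever", "cough", "test-1"}
y = {"cough", "test-1", "test-2"}
print(round(jaccard(x, y), 2))  # 2 shared / 4 distinct -> 0.5
```

Returning 0.0 for two empty sets is a design choice made for this sketch; the formula itself is undefined in that case.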

Calculations:

Jack and Mary: |A ∩ B| = 3, |A ∪ B| = 7, J(Jack, Mary) = 3 / 7 ≈ 0.43
Jack and Jim: |A ∩ B| = 3, |A ∪ B| = 7, J(Jack, Jim) = 3 / 7 ≈ 0.43
Jim and Mary: |A ∩ B| = 2, |A ∪ B| = 8, J(Jim, Mary) = 2 / 8 = 0.25
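
The arithmetic above can be re-checked with a short script. This sketch uses the intersection and union counts exactly as reported for each pair; the names `counts` and `scores` are only illustrative.

```python
# (intersection size, union size) for each pair, as reported above
counts = {
    ("Jack", "Mary"): (3, 7),
    ("Jack", "Jim"): (3, 7),
    ("Jim", "Mary"): (2, 8),
}

# Jaccard coefficient = intersection size / union size
scores = {pair: inter / union for pair, (inter, union) in counts.items()}

for (first, second), score in scores.items():
    print(f"J({first}, {second}) = {score:.2f}")
```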

Jaccard Coefficient Results

Jack and Mary: 0.43
Jack and Jim: 0.43
Jim and Mary: 0.25

Task Overview

Objective: Calculate Jaccard coefficients for pairs of individuals based on their pathological test results, in order to evaluate how similar they are.

Key Pairs:

Jack and Mary
Jack and Jim
Jim and Mary

Learning Outcomes

1. Legal, Social, Ethical, and Professional Issues

Demonstrated understanding of ethical concerns in analyzing personal health data, such as ensuring privacy and preventing misuse.
Highlighted the importance of anonymizing sensitive datasets to maintain compliance with data protection laws like GDPR.
Addressed the role of similarity measures in decision-making systems (e.g., in healthcare) and the need for the underlying algorithms to be unbiased and fair.

2. Dataset Applicability and Challenges

Discussed the challenges of working with incomplete or ambiguous data, such as single-letter entries like "N" (No) and "P" (Positive), whose meaning has to be defined before any analysis.
Emphasized the need for clear labeling and preprocessing to ensure data quality for machine learning applications.
Explored how similarity metrics like the Jaccard coefficient can aid in clustering or classification tasks.
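
One way the labeling and preprocessing step could look is sketched below. It assumes "P" marks an attribute as present and "N" as absent, as described above. The mapping table `LABEL_TO_PRESENT`, the helper name `positive_attributes`, and the sample record are hypothetical and are not values from the dataset. Unrecognised labels raise an error rather than being silently guessed.

```python
# Assumed mapping from raw labels to presence (True) or absence (False).
# Only "P" and "N" are covered here; other labels must be added explicitly.
LABEL_TO_PRESENT = {"P": True, "N": False}


def positive_attributes(record: dict) -> set:
    """Return the names of attributes whose label maps to 'present'."""
    present = set()
    for name, label in record.items():
        key = label.strip().upper()  # tolerate stray spaces and lower case
        if key not in LABEL_TO_PRESENT:
            # Fail loudly on missing or unclear labels instead of guessing.
            raise ValueError(f"Unrecognised label {label!r} for {name!r}")
        if LABEL_TO_PRESENT[key]:
            present.add(name)
    return present


# Hypothetical record, for illustration only.
sample = {"test-1": "P", "test-2": "N", "test-3": " p "}
print(sorted(positive_attributes(sample)))  # ['test-1', 'test-3']
```

The resulting sets of present attributes are the kind of input a set-based Jaccard calculation expects.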

3. Collaboration and Feedback

Participated in team discussions on the implications of similarity metrics in sensitive domains like healthcare.
Received feedback to improve clarity in handling missing or unclear data and reporting outcomes effectively.

Artefact: Jaccard Coefficient Calculations

Jack and Mary: Similarity score based on shared test results.
Jack and Jim: Highlighted differences in cough and test-1 results impacting the coefficient.
Jim and Mary: Discussed data points leading to minimal overlap and low similarity.