Heart_Disease_Prediction_ML

By Priyanka Roy

This project uses the Heart Disease Data Set from the UCI Machine Learning Repository. The complete collection consists of four databases gathered at four institutions: Cleveland, Hungary, Switzerland, and VA Long Beach. The data contains 76 attributes, but all published experiments use a subset of 14 of them. The Cleveland database in particular is the only one that ML researchers have used to date. The "goal" field indicates the presence of heart disease in the patient.

Features of this dataset:

age: person's age in years
sex: person's sex (1 = male, 0 = female)
cp: chest pain type (4 values): typical angina, atypical angina, non-anginal pain, asymptomatic
trestbps: resting blood pressure
chol: serum cholesterol in mg/dl
fbs: fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
restecg: resting electrocardiographic results, where 0 = normal, 1 = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), and 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach: maximum heart rate achieved
exang: exercise-induced angina
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping)
ca: number of major vessels (0-3) colored by fluoroscopy
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
hd: heart disease, which is our target variable (0 = no, 1 = yes).
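
As a minimal sketch of how these 14 columns might be loaded with pandas: it assumes the data file is a headerless CSV that marks missing values with `?` and stores the raw target as 0-4, which is then collapsed to the binary `hd`. The function name `load_heart_data`, the decision to drop incomplete rows, and the inline sample rows are illustrative assumptions, not taken from the notebook.

```python
import io

import pandas as pd

# Column names follow the feature list; "restecg" is the name used in the
# UCI documentation for the resting ECG column.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "hd"]


def load_heart_data(source):
    """Read a headerless CSV (path or file-like) into a cleaned DataFrame."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Assumption: rows with any missing value are dropped, not imputed.
    df = df.dropna()
    # Assumption: raw target 0 means no disease, 1-4 means disease present.
    return df.assign(hd=lambda d: (d["hd"] > 0).astype(int))


# Illustrative rows in the file's format (hypothetical, for demonstration only).
SAMPLE = """63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0
"""

df = load_heart_data(io.StringIO(SAMPLE))
print(df[["age", "cp", "thal", "hd"]])
```

With a real file, the same function could be called with a path, for example `load_heart_data("processed.cleveland.data")`, assuming that file sits next to the notebook.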

Main Objective(s):

Explaining the features and target variable
Dealing with the missing values
Performing one-hot encoding
Splitting the dataset into 80% for training and 20% for testing, with a fixed random state and stratification
Training a Decision Tree classifier
Drawing the tree diagram of the Decision Tree
Showing the confusion matrix, classification report, and ROC-AUC
Determining the accuracy of the ML model
Explaining the model
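
The steps above could be sketched with pandas and scikit-learn roughly as follows. This is not the notebook's code: the data is a small synthetic stand-in generated with NumPy (so the printed scores say nothing about the real dataset), only a few of the 14 columns are included, and the choice of which columns to one-hot encode is an assumption. The 80/20 split, stratification, and random state 63 come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (hypothetical values, NOT the Cleveland records).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(29, 78, n),
    "thalach": rng.integers(70, 203, n),
    "oldpeak": rng.uniform(0, 6, n).round(1),
    "cp": rng.choice([1, 2, 3, 4], n),
    "thal": rng.choice([3, 6, 7], n),
})
# Made-up rule so the synthetic target has some learnable structure.
df["hd"] = ((df["cp"] == 4) | (df["oldpeak"] > 3.5)).astype(int)

# One-hot encode the multi-valued categorical columns (assumed choice).
X = pd.get_dummies(df.drop(columns="hd"), columns=["cp", "thal"], dtype=int)
y = df["hd"]

# 80% training / 20% testing, stratified on the target, fixed random state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=63, stratify=y
)

clf = DecisionTreeClassifier(random_state=63).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# Text rendering of the tree; sklearn.tree.plot_tree draws it graphically.
print(export_text(clf, feature_names=list(X.columns)))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
auc = roc_auc_score(y_test, y_score)
print("ROC-AUC:", auc)
```

Stratifying on `y` keeps the share of heart disease cases about the same in the training and test sets, which matters when the test set is as small as 20% of a few hundred rows.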

Observation(s)/Outcome Analysis:

Recall is about 93%, which means that roughly 7% of the patients who have heart disease are still misdiagnosed as healthy. For now, that is good enough to continue.
An AUC of 0.5 corresponds to random guessing, so a score between 0.5 and 1 means the model has predictive value. The higher the score, the better the model, and an AUC of 1 means a perfect classifier. Our AUC is almost 0.9, not bad at all!
The model reaches an accuracy of 88.33% with random state = 63. Accuracy varied across random states, and this one gave the highest value. As a rule of thumb, we treat a model with accuracy above 85% as good enough, so this model is pretty handy for this dataset. Other ML algorithms may well achieve better accuracy and could be explored later.
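
To make the recall and accuracy figures concrete, here is a small sketch of how both are computed from confusion-matrix counts. The counts below are hypothetical: they were chosen only because they are consistent with the reported figures for a 60-patient test set, and they are not the notebook's actual confusion matrix.

```python
def recall(tp, fn):
    """Share of actual positives that the model finds."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (assumption, not taken from the notebook).
tp, fn, tn, fp = 26, 2, 27, 5

print(f"recall   = {recall(tp, fn):.2%}")            # 92.86%
print(f"missed   = {1 - recall(tp, fn):.2%}")        # 7.14%
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2%}")  # 88.33%
```

The "missed" line is the complement of recall: the fraction of patients with heart disease that the model labels as healthy.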

Dataset Reference: Click Here
