Skip to content

roypriyanka7/Heart_Disease_Prediction_ML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Heart_Disease_Prediction_ML


This is a Heart Disease Data Set, collected from the UCI Machine Learning Repository. The complete collection consists of four individual databases collected from four different institutions located in Cleveland, Hungary, Switzerland, and the VA Long Beach. This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient.

Features of this dataset:

  • age: person's age in years
  • sex: person's sex- 1 for male and 0 for female
  • cp: chest pain type (4 values)- typical angina, atypical angina, non-anginal pain, asymptomatic
  • trestbps: resting blood pressure
  • chol: serum cholestoral in mg/dl
  • fbs: fasting blood sugar > 120 mg/dl (1- T ; 0- F)
  • testecg: resting electrocardiographic results where, 0 = normal, 1 = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) and 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria
  • thalach: maximum heart rate achieved
  • exang: exercise induced angina
  • oldpeak: oldpeak = ST depression induced by exercise relative to rest
  • slope: the slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping)
  • ca: number of major vessels (0-3) colored by flourosopy
  • thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
  • hd: heart disease, which is our target variable (0 = no, 1 = yes).


Main Objective(s):

  • Explaining the features and target variable
  • Deal with the missing values
  • Performing One-hot Encoding
  • Splitting the dataset into 80% training and 20% for testing. Adding random state and stratification
  • Training with Decision Tree Classification
  • Tree diagram of the Decision Tree
  • Showing the Confusion Matrix, Classification report, and ROC-AUC &
  • Determining the accuracy of the ML model
  • Explaining the model


Observation(s)/Outcome Analysis:

  • The recall rate is about 93% , that is, still 7% of the patients who has heart disease will be diagnosed wrong. But as of now, it's good enough to continue.
  • If the AUC is in the range of 0.5~1, that means the model has value of prediction. The higher the score is, the better the model is. Moreover, the model will be a perfect classifier if the AUC equals to one. We got almost 0.9 of the AUC score, not bad at all!
  • This model has an accuracy of 88.33% (with random state= 63 as we found some variations with different state no. but this is the highest accuracy state) which explains that while dealing with the dataset this model is preety handy because we know, when the accuracy rate of a model > 85% we can call that model good enough. Although, many more ML algorithms might result more fruitful considering a better accuracy rate which could be explored later.


Dataset Reference: Click Here


Releases

No releases published

Packages

 
 
 

Contributors