Title: My Achey-Breaky Heart
Authors: Ricardo Barbosa, Max Halbert, Lindsey Reynolds, and Dan Sedano

Project Specifications

Research Question

What is the percent likelihood of someone dying from cardiovascular disease, based on specific high-impact risk factors?

Hypothesis

Based on initial raw data visualization (specifically a heatmap), we hypothesize that high levels of serum creatinine, age, ejection fraction, and serum sodium will increase the likelihood of cardiovascular disease.

Data Set

https://raw.githubusercontent.com/HelpingOtters/CST383_DS_Project/main/heart_failure_clinical_records_dataset.csv

Description of Data Set

https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records

Features used in our final model

Serum creatinine and ejection fraction

Machine Learning Model

Classification Decision Tree

Data Exploration

There is no missing data, and all values in the dataset are numeric, so the dataset is ready to explore and feed into the decision tree model.
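A check like the one described above could be sketched as follows. The helper name `check_ready` and the two-row demo frame are illustrative assumptions, not the notebook's original code.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_ready(df: pd.DataFrame) -> dict:
    """Report the total number of missing cells and any non-numeric columns."""
    missing = int(df.isna().sum().sum())
    non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
    return {"missing": missing, "non_numeric": non_numeric}


# Tiny made-up frame for illustration only (not the real records).
demo = pd.DataFrame({"age": [60.0, 75.0], "DEATH_EVENT": [0, 1]})
report = check_ready(demo)
```

To run it on the real records, the frame would presumably be loaded with `pd.read_csv` from the URL listed under Data Set and passed to the same helper.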


Since 'DEATH_EVENT' is the target variable, the correlation table above shows that 'age', 'high_blood_pressure', 'serum_creatinine', 'ejection_fraction', and 'serum_sodium' are the top five correlated features.
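One way to produce such a ranking is to take the target's column of the correlation matrix and sort it by absolute value. This is a minimal sketch: the function name and the made-up demo columns (`x_strong`, `x_weak`) are assumptions for illustration.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "DEATH_EVENT") -> pd.Series:
    """Absolute Pearson correlation of every feature with the target, strongest first."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target.abs().sort_values(ascending=False)


# Made-up data: x_strong tracks the target, x_weak is uncorrelated with it.
demo = pd.DataFrame({
    "x_strong": [1, 2, 3, 4, 5, 6],
    "x_weak": [3, 1, 2, 2, 1, 3],
    "DEATH_EVENT": [0, 0, 0, 1, 1, 1],
})
ranked = rank_by_target_correlation(demo)
```

Sorting by absolute value keeps strongly negative correlations near the top, which matters for features whose low values go with the target.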

Data Visualization

The following heatmap will show the correlation between features visually.

The heatmap gives the same results as the correlation table. Since time has the highest correlation, we next explore the relationship between time and DEATH_EVENT.

As we can see, the lower the ejection fraction, the higher the death count.
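A tabular counterpart to this kind of plot is to bin the feature and sum the deaths in each bin. The sketch below uses made-up values and arbitrary bin edges; both are assumptions rather than the notebook's actual settings.

```python
import pandas as pd


def deaths_by_bin(df: pd.DataFrame, feature: str, bins: list, target: str = "DEATH_EVENT") -> pd.Series:
    """Sum the target (1 = death) within each interval of the chosen feature."""
    binned = pd.cut(df[feature], bins=bins)
    return df.groupby(binned, observed=False)[target].sum()


# Made-up values for illustration only.
demo = pd.DataFrame({
    "ejection_fraction": [20, 25, 30, 45, 50, 60],
    "DEATH_EVENT": [1, 1, 0, 0, 1, 0],
})
counts = deaths_by_bin(demo, "ejection_fraction", bins=[0, 30, 45, 100])
```

The same helper would apply to serum_creatinine or any other numeric column by changing `feature` and `bins`.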

As we can see, the lower the serum creatinine, the higher the death count.


Machine Learning

Model Testing

Test Case 1:

As we can see, one of the leaf nodes holds only 0.4% of the data, which suggests possible overfitting. We will therefore add another hyperparameter, min_samples_leaf, to reduce overfitting as we tune the model.
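The effect of min_samples_leaf can be sketched with scikit-learn as follows. The synthetic data from `make_classification`, the value 0.05, and the helper `smallest_leaf_fraction` are all assumptions for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the heart failure records).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# A float min_samples_leaf is read as a fraction of the training samples,
# so 0.05 requires every leaf to hold at least 5% of them (illustrative value).
unconstrained = DecisionTreeClassifier(random_state=0).fit(X, y)
constrained = DecisionTreeClassifier(min_samples_leaf=0.05, random_state=0).fit(X, y)


def smallest_leaf_fraction(model, n_samples):
    """Fraction of the training samples that falls in the smallest leaf."""
    tree = model.tree_
    is_leaf = tree.children_left == -1  # leaves have no left child
    return tree.n_node_samples[is_leaf].min() / n_samples


small_before = smallest_leaf_fraction(unconstrained, len(y))
small_after = smallest_leaf_fraction(constrained, len(y))
```

Comparing `small_before` with `small_after` shows whether the constraint removed the very small leaves that tend to memorize individual training rows.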

Which features are most impactful?

Answer: age, serum_creatinine, ejection_fraction, and serum_sodium appear to have the most impact on the results.
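A fitted scikit-learn tree exposes `feature_importances_`, which is one way a ranking like this might be read off. The sketch below uses a made-up frame with hypothetical columns `signal` and `noise`; it does not reproduce the project's results.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def ranked_importances(model, feature_names) -> pd.Series:
    """Pair each feature name with its importance, most important first."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(ascending=False)


# Made-up data: "signal" determines the label, "noise" is constant.
X_demo = pd.DataFrame({
    "signal": [1, 2, 3, 4, 5, 6, 7, 8],
    "noise": [5, 5, 5, 5, 5, 5, 5, 5],
})
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]

model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)
ranked = ranked_importances(model, X_demo.columns)
```

Importances sum to 1 across features, so a feature the tree never splits on gets a score of 0.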

Get Prediction and Accuracy
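A step like this typically holds out part of the data, predicts on it, and compares the predictions with the true labels. The following is a minimal sketch on synthetic data; the split size, tree depth, and random seeds are assumptions, not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the heart failure records).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hold out 30% of the rows for testing (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Predict on the held-out rows and score against the true labels.
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
```

Scoring on held-out rows rather than the training rows is what lets the accuracy reveal overfitting.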

Test Case 2:

Which features were the best?

Ejection fraction and serum creatinine are the most impactful features.

Get Prediction and Accuracy

Test Case 3:

Result

Get Prediction and Accuracy

Final Model (Test Case 2)

Test Cases

Test Case 1:

Test Case 2:

Conclusion

Our Findings

Further Research