Stroke Data Exploration Using Excel


Introduction

This is an Excel dashboard project that I completed to practise my Excel skills. I used the stroke dataset from Kaggle to prepare, clean and analyze the data, and then built a visualization dashboard from several different charts.

Excel Stroke Data Exploration and Visualization

Kaggle Dataset

Data Preparation, Cleaning and Analysis Process

  1. I downloaded the CSV file of the stroke dataset from Kaggle and imported it into Excel.

  2. I copied the data onto a new sheet named "Working Sheet". This kept the original data intact, and I did all further cleaning and analysis on the copy.

  3. I used Excel's Sort and Filter features to get a first look at the data.

  4. I started cleaning the data by checking for duplicate rows. I found none.

  5. I made the column labels easier to read by formatting them in bold and changing their case.

  6. After applying a filter, I found an issue in the Work Type column: some rows had the string value "Children". Cross-checking with the Age column showed that these individuals were between 0 and 16 years old.

  7. I then replaced "Children" in the Work Type column with "Unemployed" as a category for clearer visualizations.

  8. I used Excel's Find and Replace feature to change the values 0 and 1 to "No" and "Yes" respectively in the Stroke, Hypertension and Heart Disease columns, for clarity in the visualizations.

  9. I created bracket columns for Age and BMI, named "Age Brackets" and "BMI Brackets", using the following nested IF functions. Because each inner IF is only evaluated when the previous test fails, the conditions run from the highest bracket down. These columns made it possible to create clear visualizations of those categories.

=IF(C2>=65,"Senior (65+)",IF(C2>=25,"Adult (25-64)",IF(C2>=15,"Youth (15-24)",IF(C2>=0,"Child (0-14)","Invalid"))))

=IF(K2="Unknown","Unknown",IF(K2>=30,"Obese (30+)",IF(K2>=25,"Overweight (25-29.9)",IF(K2>=18.5,"Healthy Weight (18.5-24.9)","Underweight (0-18.4)"))))

  10. I found and replaced the "N/A" values in the BMI column with "Unknown", for clarity in the visualizations.

  11. I made the Work Type values more consistent by replacing "Never_worked" and "Self-employed" with "Unemployed" and "Self_employed" respectively.
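The duplicate check and the Find and Replace steps above can be mirrored in a few lines of Python for anyone who wants to reproduce the cleaning outside Excel. This is a minimal sketch, not the Excel workflow itself. It assumes each row is a dict as produced by Python's `csv.DictReader`, keyed by the column names assumed for the Kaggle CSV (`work_type`, `bmi`, `stroke`, `hypertension`, `heart_disease`). The helper names `read_rows`, `count_duplicate_rows` and `clean_row` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

Row = Dict[str, str]

# Assumed raw Work Type values (compared case-insensitively) and their replacements.
WORK_TYPE_REPLACEMENTS = {
    "children": "Unemployed",
    "never_worked": "Unemployed",
    "self-employed": "Self_employed",
}

# 0/1 flag columns that are relabelled as "No"/"Yes".
FLAG_COLUMNS = ("stroke", "hypertension", "heart_disease")
FLAG_LABELS = {"0": "No", "1": "Yes"}


def read_rows(lines: Iterable[str]) -> List[Row]:
    """Read CSV lines (e.g. an open file) as dicts keyed by the header line."""
    return list(csv.DictReader(lines))


def count_duplicate_rows(rows: List[Row]) -> int:
    """Count rows that exactly repeat an earlier row in every column."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def clean_row(row: Row) -> Row:
    """Return a cleaned copy of one row, leaving the original untouched
    (the code equivalent of working on a duplicated "Working Sheet")."""
    cleaned = dict(row)
    work_type = cleaned["work_type"]
    cleaned["work_type"] = WORK_TYPE_REPLACEMENTS.get(work_type.lower(), work_type)
    for column in FLAG_COLUMNS:
        # Values other than "0"/"1" are kept as-is rather than guessed.
        cleaned[column] = FLAG_LABELS.get(cleaned[column], cleaned[column])
    if cleaned["bmi"] == "N/A":
        cleaned["bmi"] = "Unknown"
    return cleaned
```

For example, `[clean_row(r) for r in read_rows(f)]` on an opened copy of the CSV builds a cleaned "working" list while the rows returned by `read_rows` stay unchanged. Keeping the replacements in small lookup tables makes each Find and Replace step easy to read and to extend.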
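The two bracket formulas can also be checked outside Excel. The following is a minimal Python sketch that mirrors the nested IF logic, with the function names `age_bracket` and `bmi_bracket` being hypothetical and the label spellings illustrative. The Youth test is written as `age >= 15`, so every age from 15 up to 25 lands in Youth, and a fractional age such as 14.5 falls into Child rather than Invalid.

```python
from typing import Union


def age_bracket(age: float) -> str:
    """Python version of the Age Brackets nested IF."""
    # Tests run from the highest bracket down, like nested IFs:
    # a value only reaches a test if every earlier test was false.
    if age >= 65:
        return "Senior (65+)"
    if age >= 25:
        return "Adult (25-64)"
    if age >= 15:
        return "Youth (15-24)"
    if age >= 0:
        return "Child (0-14)"
    return "Invalid"  # only negative ages reach this line


def bmi_bracket(bmi: Union[str, float]) -> str:
    """Python version of the BMI Brackets nested IF."""
    if bmi == "Unknown":
        return "Unknown"
    value = float(bmi)  # raises ValueError for any other text
    if value >= 30:
        return "Obese (30+)"
    if value >= 25:
        return "Overweight (25-29.9)"
    if value >= 18.5:
        return "Healthy Weight (18.5-24.9)"
    return "Underweight (0-18.4)"
```

Running boundary values such as 14.5, 15, 24.9, 25 and 65 through a sketch like this is a quick way to catch a gap in nested IF logic before the brackets reach a chart.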

Key insights from the analysis

  1. A total of 249 patients had a stroke, and more of these cases were female than male:
    56.63% of stroke cases were in females vs 43.37% in males.

  2. Seniors had the highest percentage of strokes, followed by adults and then children.

  3. People working in the private sector had more stroke cases than people in other work types. Unemployed men had no strokes, while unemployed women had 2.

  4. Individuals who never smoked had the most stroke cases, followed by former smokers and then current smokers.
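The percentages in the first insight are shares of the 249 stroke cases. This is the kind of figure an Excel pivot table produces when it counts rows and shows each count as a percentage of the total. The Python sketch below reproduces that calculation. It assumes rows are dicts with a `stroke` column already relabelled to "Yes"/"No", and the function name `percent_of_total` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def percent_of_total(rows: Iterable[Mapping[str, str]], group_col: str,
                     filter_col: str = "stroke",
                     filter_value: str = "Yes") -> Dict[str, float]:
    """Share of matching rows per group, in percent, rounded to 2 decimals.

    Similar to a pivot table that counts only rows where
    filter_col == filter_value and shows each group's count as a
    percentage of the grand total of those rows.
    """
    counts = Counter(row[group_col] for row in rows
                     if row[filter_col] == filter_value)
    total = sum(counts.values())
    if total == 0:
        return {}  # no matching rows, so no percentages to report
    return {group: round(100 * n / total, 2) for group, n in counts.items()}
```

With cleaned rows, `percent_of_total(rows, "gender")` would give the female/male split of stroke cases, and the same call on `work_type` or `smoking_status` (assumed column names) would give the breakdowns behind the other insights. As a consistency check, 108 of 249 cases is 43.37%, and the remaining 141 cases are 56.63%, so the two shares sum to 100%. These counts are back-calculated from the stated percentage, not read from the dataset.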

Limitations in the dataset

  1. There could be a sampling bias in the data: it contains significantly more people who did not have a stroke than people who did.

  2. The time frame over which the data was collected is not given, which makes it difficult to identify patterns over time.

  3. There is a significant number of null or missing values, which I categorized as "Unknown".

Conclusion

Skills Used

  1. Data cleaning in Excel.

  2. Creating categories/brackets with nested IF functions for use in visualizations.

  3. Creating pivot tables and charts.

  4. Creating a visualization dashboard in Excel using customized charts.

  5. Inserting slicers to create an interactive dashboard.

My learnings

I enjoyed using Excel for this project as I found it a very useful tool to:

  1. Explore the data with the built-in sorting and filtering features to get a first view of it, identify the data format, and check for errors and duplicates before cleaning it further.

  2. Analyze the data using pivot tables to get useful insights from the data.

  3. Create simple and effective visualizations with customized charts.

  4. Make the visualizations interactive by inserting multiple slicers.

Thank you for taking the time to review my project blog. I look forward to any feedback or suggestions for improvement.

Resources:

Kaggle Stroke Prediction Dataset

Alex The Analyst YouTube Portfolio Tutorial