This is an Excel Dashboard project, I completed as a part of practising my Excel skills. I used the stroke dataset from Kaggle to prepare, clean and analyze the data and created a visualization dashboard using different graphs.
You can find the link to my completed Excel project and the stroke dataset here:
Data Preparation, Cleaning And Analysis Process
- I downloaded the CSV file of the stroke dataset from Kaggle and imported it to Excel.
I created another sheet duplicating the data to keep the original data intact and use the new sheet for further cleaning and analysis of the data. I further named it the " Working Sheet".
Filterfeatures in Excel to get the first look at the data.
I started cleaning the data by first checking for duplicate rows but none were found.
I increased the visibility of the column labels by using the formatting tools to make the labels bold and change the case of the labels.
I found an error in the
Work Typecolumn after using a
Filterfeature which had string value of "Children". I checked with the age column to find they were between 0-16 years of age.
I later replaced "Children" in
Work Typecolumn to “Unemployed” as a category for clear visualizations.
I used the
replacefeatures in Excel to change the values of 0 and 1 to "Yes" and "No" in the columns of Stroke, Hypertension and Heart Disease for clarity in the visualizations.
I created brackets columns for
BMIand named them "Age Brackets" and "BMI Brackets" using the following nested "IF" functions. This helped me create clear visualizations of those categories.
=IF(C2>=65,"Senior (65+)",IF(C2>=25,"Adult (25-64)",IF(C2>=24,"Youth ( 15-24)",IF(C2<=14,"Child (0-14)","Invalid"))))
=IF(K2="Unknown","Unknown",IF(K2>=30,"Obese (30+)",IF(K2>=25,"Overweight (25-29.9)",IF(K2>=18.5,"HealthyWeight (18.5-24.9)",IF(K2<18.5,"Underweight (0 -18.4)","Unknown")))))
I found and replaced "N/A" values in
BMIcolumn with "Unknown" for clarity in the visualizations.
I increased the consistency of the strings throughout the columns of
Work Typeby finding and replacing "Never_worked" and "Self-employed" with "Unemployed" and "Self_employed" respectively.
Key insights from the analysis
There are a total of 249 patients who had a stroke with females having higher stroke incidents than males.
56.7% in females Vs 43.37% in males.
Seniors had the highest percentage of strokes followed by adults and then children.
People working privately had more stroke incidents than others. Unemployed male individuals had no stroke incidents while unemployed females had 2 stroke incidents.
Individuals who did not smoke had the highest stroke incidents followed by people who formerly smoked and then the ones who smoked.
Limitations in the dataset
There could be a sampling bias in the data with a significantly higher number of people who did not have a stroke than the ones who had a stroke.
The time frame for which the data was collected is missing which makes it difficult to draw enough patterns in the data.
There are a significant number of nulls or missing values which I categorized as unknown.
Data cleaning in Excel.
Creating categories/ brackets using nested IF functions to use them for visualizations.
Creating pivot tables and charts.
Creating a visualization dashboard in Excel using customized charts.
Inserting sliders to create an interactive dashboard.
I enjoyed using Excel for this project as I found it a very useful tool to:
Explore the data with some built-in functionalities of sorting and filtering to get the first view of the data, identify the data format and check for errors and duplicates in the data to clean it further.
Analyze the data using pivot tables to get useful insights from the data.
Create simple and effective visualizations with customized charts.
Make the visualizations interactive by using some of its functionalities like inserting multiple slicers.
I appreciate your time to review my project blog and look forward to any feedback or suggestions for improvement.