University | Temasek Polytechnic (TP) |
Subject | ESE1008 Data Visualisation and Analytics |
Background:
This dataset covers a wide range of health-related information on physiological and lifestyle factors. It includes demographic details such as sex and age, as well as anthropometric measurements such as height, weight, and waistline. The dataset also contains blood pressure readings (systolic and diastolic), blood components (blood sugar, cholesterol levels, triglycerides, and hemoglobin), a kidney function marker (serum creatinine), liver enzymes (SGOT_AST, SGOT_ALT, and gamma_GTP), and indicators of lifestyle habits (smoking and drinking). It can be used to explore relationships between these variables, conduct health assessments, and investigate the impact of lifestyle choices on various health parameters.
By considering factors such as blood pressure (SBP, DBP), liver enzymes (SGOT_AST, SGOT_ALT, gamma_GTP), and cholesterol levels (tot_chole, HDL_chole, LDL_chole), we can gain insights into the impact of drinking (DRK_YN) on overall health. This analysis allows us to identify trends and potential health risks associated with drinking habits, such as increased liver stress, cardiovascular issues, and metabolic irregularities. By examining additional variables like smoking status (SMK_stat_type_cd), age, and gender, we can explore how these factors interact with drinking to influence health outcomes, providing a deeper understanding of the challenges and risks related to lifestyle choices.
Examples of investigative questions include:
- Which age group shows the highest levels of total cholesterol (tot_chole) among drinkers (DRK_YN)?
- Does drinking status (DRK_YN) correlate with liver enzyme levels (SGOT_AST, SGOT_ALT)?
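As a rough illustration of how these two questions could be approached outside Knime and Tableau, the sketch below uses only the Python standard library. The rows are made up for illustration and are not taken from any assigned file, the `Y`/`N` coding of DRK_YN is an assumption, and the helper names (`mean_chole_by_age_among_drinkers`, `pearson`) are hypothetical. The first question is treated as a group-wise mean; the second as a Pearson correlation with DRK_YN coded 1/0 (a point-biserial correlation).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up rows for illustration only; real values come from the assigned file.
rows = [
    {"age": 30, "DRK_YN": "Y", "tot_chole": 190.0, "SGOT_AST": 30.0},
    {"age": 30, "DRK_YN": "N", "tot_chole": 180.0, "SGOT_AST": 20.0},
    {"age": 45, "DRK_YN": "Y", "tot_chole": 210.0, "SGOT_AST": 40.0},
    {"age": 45, "DRK_YN": "Y", "tot_chole": 220.0, "SGOT_AST": 35.0},
    {"age": 60, "DRK_YN": "N", "tot_chole": 200.0, "SGOT_AST": 22.0},
    {"age": 60, "DRK_YN": "Y", "tot_chole": 205.0, "SGOT_AST": 28.0},
]


def mean_chole_by_age_among_drinkers(rows):
    """Mean tot_chole per age band, restricted to rows where DRK_YN == 'Y'."""
    groups = defaultdict(list)
    for r in rows:
        if r["DRK_YN"] == "Y":  # assumed coding: 'Y' = drinker
            groups[r["age"]].append(r["tot_chole"])
    return {age: mean(values) for age, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Question 1: which age band has the highest mean tot_chole among drinkers?
chole_by_age = mean_chole_by_age_among_drinkers(rows)
top_age = max(chole_by_age, key=chole_by_age.get)

# Question 2: correlation between drinking (coded 1/0) and SGOT_AST.
drinks = [1.0 if r["DRK_YN"] == "Y" else 0.0 for r in rows]
ast = [r["SGOT_AST"] for r in rows]
r_drink_ast = pearson(drinks, ast)

print(top_age, round(r_drink_ast, 3))
```

A correlation of this kind only indicates association; age, sex, and smoking status may confound it, which is why the brief asks for three-variable follow-up questions.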
Sources:
Objectives:
- Data Cleaning: To perform data cleaning to prepare the dataset for further analysis.
- Exploratory Data Analysis (EDA): To conduct exploratory data analysis to gain statistical insights into the dataset. Key activities include gathering statistical summaries, plotting box plots and histograms for numerical variables, and creating visual charts for categorical data types. A correlation matrix for all numerical variables should also be included.
- Formulating Investigative Questions or Hypotheses: To propose preliminary investigative questions or hypotheses based on the dataset. Use data visualization techniques to explore and answer these questions or hypotheses. Go beyond the initial findings to explore specific scenarios in more depth, uncovering additional insights.
- Data Transformation: To perform data transformations based on insights gained from the EDA. This may include outlier removal and aggregation to improve data quality.
- Model Selection and Evaluation: To select appropriate target variables and apply Linear and Logistic Regression models. Assess and discuss the accuracy of each model, using SGOT_AST and DRK_YN as target variables.
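The outlier removal and correlation matrix mentioned in the objectives can be sketched in plain Python as below. This is one possible approach, not the method prescribed by the module: the 1.5 × IQR rule is a common convention chosen here as an assumption, the sample values are made up, and the function names are hypothetical. In Knime or Tableau the equivalent steps would be done with the tools' own nodes and calculations.

```python
from math import sqrt
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies within the IQR fences."""
    lo, hi = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if lo <= r[column] <= hi]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(rows, columns):
    """Nested dict: matrix[a][b] is the Pearson correlation of columns a and b."""
    data = {c: [r[c] for r in rows] for c in columns}
    return {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}


# Made-up liver enzyme readings; the last row is an obvious outlier in SGOT_AST.
rows = [
    {"SGOT_AST": 20.0, "SGOT_ALT": 18.0},
    {"SGOT_AST": 22.0, "SGOT_ALT": 20.0},
    {"SGOT_AST": 25.0, "SGOT_ALT": 24.0},
    {"SGOT_AST": 27.0, "SGOT_ALT": 25.0},
    {"SGOT_AST": 30.0, "SGOT_ALT": 31.0},
    {"SGOT_AST": 300.0, "SGOT_ALT": 40.0},
]

cleaned = drop_outliers(rows, "SGOT_AST")
matrix = correlation_matrix(cleaned, ["SGOT_AST", "SGOT_ALT"])
print(len(rows), "->", len(cleaned), "rows after outlier removal")
```

Whether to remove, cap, or keep extreme values is a judgment call that should be justified in the report, since very high liver enzyme readings may be clinically meaningful rather than errors.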
Additional Notes:
- Complete Objectives 1, 2, and 3, and compile your findings into Report 1, which should be no more than 20 pages.
- For Objective 2, analyze all variables in the dataset and provide evidence of your work in both Knime and Tableau. In the report, show the statistics table, the linear correlation matrix, and the two box plots, two histograms, and two pie charts that are most worth mentioning.
- It is not necessary to answer the preliminary investigative questions in Report 1; answer them in Report 2. You may use AI tools to help generate relevant questions if needed. For Objective 3, propose at least two two-variable questions and three three-variable (further insights) questions, ensuring they differ from those in the Background section.
- For Report 2, perform the necessary data transformations following your EDA and use the data to address the investigative questions. Copy both the investigative questions from the Background section and proposed questions from Report 1 and provide answers for each one. Additionally, include Linear and Logistic Regression model analysis and conclude with a reflection.
- Reflection: In your reflection, evaluate the dataset’s usefulness, model accuracy, and any feature enhancements (such as additional features) that could improve the model’s predictive accuracy. Keep Report 2 to a maximum of 20 pages.
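When discussing model accuracy, two commonly used measures are the coefficient of determination (R²) for the linear model on SGOT_AST and classification accuracy for the logistic model on DRK_YN. The sketch below shows how each is computed from actual and predicted values. The numbers are made up, the `Y`/`N` labels are an assumed coding, and the brief does not specify which metrics to report, so treat this as one reasonable choice rather than the required one.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot for a regression model's predictions."""
    m = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def confusion_counts(actual, predicted, positive="Y"):
    """Counts of true/false positives/negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of predictions that match the actual label."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Made-up values standing in for model output on SGOT_AST (linear regression).
actual_ast = [20.0, 25.0, 30.0, 35.0]
predicted_ast = [22.0, 24.0, 31.0, 33.0]
r2 = r_squared(actual_ast, predicted_ast)

# Made-up labels standing in for model output on DRK_YN (logistic regression).
actual_drk = ["Y", "N", "Y", "N", "Y"]
predicted_drk = ["Y", "N", "N", "N", "Y"]
counts = confusion_counts(actual_drk, predicted_drk)
acc = accuracy(counts)

print(round(r2, 2), counts, acc)
```

Accuracy alone can mislead when drinkers and non-drinkers are unevenly represented, so the confusion counts are worth reporting alongside it.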
Data Dictionary (variable descriptions)
Variable | Description |
---|---|
sex | Gender of the individual (e.g., Male or Female). |
age | Age of the individual, categorized into 5-year intervals. |
height | Height of the individual, usually in centimeters. |
weight | Weight of the individual, typically in kilograms. |
waistline | Measurement of the individual’s waistline, in centimeters, indicating abdominal fat. |
SBP | Systolic Blood Pressure, measuring the pressure in arteries when the heart beats (mmHg). |
DBP | Diastolic Blood Pressure, measuring the pressure in arteries between heartbeats (mmHg). |
BLDS | Blood Sugar level, typically measured in mg/dL indicating blood glucose concentration. |
tot_chole | Total Cholesterol level, measuring the overall cholesterol in blood (mg/dL). |
HDL_chole | High-Density Lipoprotein (HDL) Cholesterol, often referred to as “good” cholesterol (mg/dL). |
LDL_chole | Low-Density Lipoprotein (LDL) Cholesterol, often called “bad” cholesterol (mg/dL). |
triglyceride | Level of triglycerides, a type of fat in the blood, usually in mg/dL. |
hemoglobin | Hemoglobin concentration, an indicator of oxygen-carrying capacity in the blood (g/dL). |
urine_protein | Presence of protein in urine, indicating possible kidney issues; usually coded as a categorical value. |
serum_creatinine | Serum creatinine level, indicating kidney function (mg/dL). |
SGOT_AST | Aspartate Aminotransferase (AST), a liver enzyme used to assess liver health (U/L). |
SGOT_ALT | Alanine Aminotransferase (ALT), another liver enzyme indicating liver health (U/L). |
gamma_GTP | Gamma-Glutamyl Transferase (GGT), an enzyme indicating liver and bile duct function (U/L). |
SMK_stat_type_cd | Smoking Status: 1 never smoked, 2 used to smoke but quit, 3 still smoking. |
DRK_YN | Drinking Status (Yes/No), indicating whether the individual consumes alcohol. |
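As a first data-cleaning step, a file can be checked against the data dictionary above. The following is a minimal sketch using the standard `csv` module on an in-memory sample rather than a real file. The column names come from the dictionary; the assumptions are that SMK_stat_type_cd is stored as the text `1`, `2`, or `3` and that DRK_YN is stored as `Y` or `N`, neither of which the brief confirms. The function name `check_file` is hypothetical.

```python
import csv
import io

# Column names as listed in the data dictionary.
EXPECTED_COLUMNS = [
    "sex", "age", "height", "weight", "waistline", "SBP", "DBP", "BLDS",
    "tot_chole", "HDL_chole", "LDL_chole", "triglyceride", "hemoglobin",
    "urine_protein", "serum_creatinine", "SGOT_AST", "SGOT_ALT", "gamma_GTP",
    "SMK_stat_type_cd", "DRK_YN",
]

# Assumed codings; adjust after inspecting the assigned file.
ALLOWED_CODES = {
    "SMK_stat_type_cd": {"1", "2", "3"},
    "DRK_YN": {"Y", "N"},
}


def check_file(text):
    """Return (missing column names, list of (line, column, value) with bad codes)."""
    reader = csv.DictReader(io.StringIO(text))
    present = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    bad = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col, allowed in ALLOWED_CODES.items():
            value = row.get(col)
            if value is not None and value.strip() not in allowed:
                bad.append((line_no, col, value))
    return missing, bad


# Made-up sample with one invalid smoking code and one blank drinking status.
sample = (
    "sex,age,SMK_stat_type_cd,DRK_YN\n"
    "Male,35,1,Y\n"
    "Female,40,4,N\n"
    "Male,50,2,\n"
)

missing, bad = check_file(sample)
print("missing columns:", len(missing))
print("bad codes:", bad)
```

Blank values are reported as bad codes here so that missing data surfaces early; how to handle them (drop or impute) is a separate cleaning decision.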
Data Assigned:
S/N | Data File Assigned (Tick) |
---|---|
1 | Health-1.csv |
2 | Health-2.csv |
3 | Health-3.csv |
4 | Health-4.csv |
5 | Health-5.csv |