Simple Linear Regression
Definition:
Simple linear regression is a statistical technique that models the relationship between one independent variable (predictor) and one continuous dependent variable (outcome) as a straight line. In the medical field, it is often used to quantify the association between a specific intervention or exposure and a health outcome.
Formula:
y = β0 + β1x + ε
- y: Dependent variable (health outcome)
- x: Independent variable (intervention or exposure)
- β0: Intercept (expected value of y when x = 0)
- β1: Slope (expected change in y for each one-unit increase in x)
- ε: Error term (variation in y that x does not explain, including unmeasured factors)
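As a minimal sketch of how β0 and β1 can be estimated from data, the ordinary least-squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the two means. The function name and the five data points below are made up for illustration and do not come from any study.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (b0, b1): least-squares estimates of the intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Made-up illustrative values (assumption), not real patient data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

The fitted line always passes through the point (x̄, ȳ), which is why the intercept can be computed from the means once the slope is known.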
Applications in the Medical Field:
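Predicting disease incidence: Estimating how a disease incidence rate changes with the level of a specific risk factor. (The probability that an individual develops a disease is a yes/no outcome, which is usually modeled with logistic regression rather than linear regression.)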
Assessing treatment effectiveness: Determining the impact of a particular intervention on a clinical outcome.
Identifying environmental health risks: Examining the relationship between exposure to pollutants and health conditions.
Diagnostic testing: Developing prediction models for diagnosing diseases based on laboratory or imaging findings.
Prognostication: Predicting the course and outcome of a disease for individual patients.
Advantages:
Simple and easy to interpret.
Provides a measure of the size and direction of the relationship between variables (slope coefficient).
Can be used with limited data sets.
Limitations:
Assumes a linear relationship between variables.
May not account for confounding variables that influence the outcome.
Requires caution when extrapolating beyond the range of observed data.
Example:
A study investigates the relationship between body mass index (BMI) and systolic blood pressure (SBP). The regression analysis reveals a positive association, with a slope coefficient of 0.25. This indicates that for each 1 kg/m² increase in BMI, SBP is expected to increase by 0.25 mmHg on average.
Conclusion:
Simple linear regression is a valuable tool in the medical field for investigating the correlation between variables and predicting health outcomes. However, it is important to consider its limitations and interpret the results cautiously.
Pearson Correlation Coefficient (r)
Measures the strength and direction of the linear relationship between two variables.
Ranges from -1 to 1:
-1: Perfect negative correlation
0: No correlation
1: Perfect positive correlation
Formula:
r = (Σ(x - x̄)(y - ȳ)) / √(Σ(x - x̄)² Σ(y - ȳ)²)
x and y are the paired observed values, and each sum (Σ) runs over all pairs
x̄ and ȳ are the means of x and y
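The formula above can be sketched directly in code. This is an illustrative implementation using only the standard library; the function name and the small data sets are assumptions chosen so that the two extreme cases (r = 1 and r = -1) are easy to see.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of paired observations x and y."""
    x_bar, y_bar = mean(x), mean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sqrt(
        sum((xi - x_bar) ** 2 for xi in x) * sum((yi - y_bar) ** 2 for yi in y)
    )
    return numerator / denominator

# Made-up values (assumption): y rises or falls exactly on a straight line.
x = [1, 2, 3, 4, 5]
r_increasing = pearson_r(x, [2, 4, 6, 8, 10])
r_decreasing = pearson_r(x, [10, 8, 6, 4, 2])
print(r_increasing, r_decreasing)
```

Note that the function divides by zero if either variable is constant, because r is undefined when one of the variables has no variation.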
Coefficient of Determination (R-squared)
Also known as the explained variance.
Measures the proportion of variance in the dependent variable (y) that is explained by the independent variable (x).
Ranges from 0 to 1:
0: The model explains none of the variance in y.
1: The model explains all of the variance in y.
Formula:
R-squared = r² (this equality holds for simple linear regression with an intercept)
Relationship between r and R-squared:
R-squared is the square of the correlation coefficient (r).
This means that:
If |r| is close to 1, R-squared will be close to 1, indicating a strong linear relationship.
If |r| is close to 0, R-squared will be close to 0, indicating a weak or no linear relationship.
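One way to check this relationship numerically is to compute R-squared from the fitted line, using the standard definition 1 - SS_res / SS_tot (residual sum of squares over total sum of squares), and compare it with r². The sketch below does this on made-up data; the variable names and values are assumptions for illustration only.

```python
from statistics import mean

# Made-up illustrative values (assumption), not real patient data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares of y

# Least-squares line.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# R-squared from the fit: 1 - (residual sum of squares / total sum of squares).
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared_from_fit = 1 - ss_res / s_yy

# Pearson r, then squared.
r = s_xy / (s_xx * s_yy) ** 0.5

print(r_squared_from_fit, r ** 2)
```

The two printed values agree up to floating-point rounding, which is the identity R-squared = r² for a one-predictor model with an intercept.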
Interpretation:
R-squared indicates the goodness of fit of the regression line.
A higher R-squared value means:
The line better fits the data points.
The independent variable explains more of the variation in the dependent variable.
However, R-squared does not provide information about the direction of the relationship (positive or negative), which is indicated by the correlation coefficient (r).
Example 1: Simple Linear Regression in the Medical Field
Objective: To determine the relationship between blood pressure (systolic) and age in a population of patients.
Data Collection:
Blood pressure (systolic) and age data were collected from 100 patients.
Model:
The simple linear regression model is:
Systolic Blood Pressure (mmHg) = β0 + β1 Age + ε
where:
β0 is the intercept (the predicted systolic blood pressure when age is 0)
β1 is the slope (the change in systolic blood pressure for each unit increase in age)
ε is the error term
Parameter Estimation:
Using the collected data, the model parameters can be estimated using least squares regression.
Results:
β0 = 120 mmHg
β1 = 0.5 mmHg/year
Interpretation:
The estimated model indicates that:
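The predicted average systolic blood pressure at age 0 is 120 mmHg. This is the intercept of the fitted line; unless the sample includes patients near age 0, it is an extrapolation and should not be read as a real blood pressure for newborns.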
For every year increase in age, the average systolic blood pressure increases by 0.5 mmHg.
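To make the interpretation concrete, the estimated coefficients can be plugged into the model to get predicted averages. This is a small sketch using the two values from the Results; the function name and the ages 30, 50 and 70 are arbitrary choices for illustration, and the example does not say which ages were actually observed.

```python
B0 = 120.0  # estimated intercept in mmHg (from the Results)
B1 = 0.5    # estimated slope in mmHg per year of age (from the Results)

def predicted_sbp(age_years):
    """Predicted average systolic blood pressure (mmHg) at a given age."""
    return B0 + B1 * age_years

# Hypothetical ages chosen for illustration (assumption).
for age in (30, 50, 70):
    print(f"age {age}: {predicted_sbp(age):.1f} mmHg")

# A 20-year age difference corresponds to 20 * 0.5 = 10 mmHg on average.
difference_20_years = predicted_sbp(50) - predicted_sbp(30)
```

These are predicted averages for a group of patients of a given age, not exact values for any individual patient.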
Conclusion:
The simple linear regression model demonstrates a positive relationship between blood pressure (systolic) and age in the studied population. This suggests that as patients get older, their blood pressure tends to increase. This information can be valuable for healthcare professionals in assessing and managing blood pressure in different age groups.
Example 2: Simple Linear Regression in a Clinical Trial
Purpose: To investigate the relationship between the independent variable (drug dosage) and the dependent variable (treatment outcome) in a clinical trial.
Data: A clinical trial is conducted with 50 participants randomly assigned to receive one of two drug dosages: 100 mg or 200 mg. The treatment outcome (measured in percentage improvement) is recorded for each participant.
Model: The simple linear regression model can be represented as:
Y = α + β X + ε
where:
Y is the treatment outcome
X is the drug dosage
α is the intercept
β is the slope
ε is the error term
Analysis:
1. Scatterplot: A scatterplot of the data shows higher treatment outcomes in the 200 mg group than in the 100 mg group. With only two dosage levels, the plot cannot show whether the relationship between them is linear.
2. Regression Line: The regression line is fitted to the scatterplot using the least squares method. The equation of the line is:
Y = 20 + 0.5 X
3. Interpretation: For every 1 mg increase in drug dosage, the expected treatment outcome increases by 0.5 percentage points.
Conclusions:
The simple linear regression model provides evidence of a positive linear relationship between drug dosage and treatment outcome.
The slope of the regression line (0.5) indicates the magnitude of the effect: a 1 mg increase in drug dosage is associated with a 0.5 percentage point improvement in treatment outcome.
The intercept of the regression line (20) represents the predicted treatment outcome when the drug dosage is 0 mg. Because no participant received 0 mg, this 20% figure is an extrapolation beyond the observed range of 100 to 200 mg, not a measured baseline.
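Because this trial has only two dosage levels, the least-squares line simply passes through the two group means, so the slope equals the difference in mean outcome divided by the difference in dose. The sketch below illustrates this with invented outcome values (an assumption, not trial data) chosen so that the group means are 70 and 120, the values the line Y = 20 + 0.5 X gives at 100 mg and 200 mg.

```python
from statistics import mean

# Hypothetical outcomes (percentage improvement), invented for illustration.
outcomes_100mg = [65.0, 72.0, 70.0, 73.0]
outcomes_200mg = [118.0, 121.0, 119.0, 122.0]

# Build the paired (dose, outcome) data.
x = [100.0] * len(outcomes_100mg) + [200.0] * len(outcomes_200mg)
y = outcomes_100mg + outcomes_200mg

# Ordinary least-squares fit.
x_bar, y_bar = mean(x), mean(y)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar

# Same slope obtained directly from the two group means.
slope_from_means = (mean(outcomes_200mg) - mean(outcomes_100mg)) / (200.0 - 100.0)

print(f"fitted line: Y = {intercept:.1f} + {slope:.2f} X")
print(f"slope from group means: {slope_from_means:.2f}")
```

With two dose groups, the regression therefore carries the same information as a comparison of the two group means; a third dosage level would be needed to examine whether the dose-response relationship is actually linear.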