eISSN: 1896-9151
ISSN: 1734-1922
Archives of Medical Science

vol. 16
Letter to the Editor

A review of robust regression in biomedical science research

Sacha Varin
Demosthenes B. Panagiotakos

Collège Villamont, Lausanne, Switzerland
School of Health Science and Education, Harokopio University, Athens, Greece
Arch Med Sci 2020; 16 (5): 1267–1269
Online publish date: 2019/08/06
Most real-world datasets in biomedical research contain outliers and leverage points. To define outliers and leverage points, let us assume a regression model of Y on X, where Y is the outcome variable and X the independent covariate(s). Outliers are observations whose outcome value Y is distant from that of the majority of the other observations (in terms of the y-axis). Outliers can sometimes be influential, meaning that they can substantially affect the results of a regression analysis, i.e., the estimated b-coefficients and, consequently, the predicted values of the outcome Y. However, at this point we have to distinguish between (a) "non-influential" outliers, i.e., those that have a minimal impact on the estimated regression model but still lead to an overestimation of the standard error, and (b) "influential" outliers, which seriously affect the estimated model because they "pull" the regression line towards themselves [1]. Influential points can be removed from the modelling process, but only when substantive reasons are present, e.g., if these observations have been mis-recorded. In any other case they should be retained in the model, as they are true observations, and the results should be interpreted with caution. In contrast, an inlier is an "unusual" observation that lies in the interior of a dataset, which makes it difficult to distinguish from the other values. Leverage points are observations whose X values (i.e., independent covariates) are distant from those of the majority of other observations (in terms of the x-axis), regardless of their effect on the Y outcome. For example, let us assume that we want to estimate a regression model of systolic blood pressure (SBP, y, outcome) levels based on the age, body mass, salt consumption and physical activity status of n individuals.
An outlier is an observation (individual) whose SBP level is quite distant from that of the majority of the other individuals, although their age, body mass, salt consumption and physical activity levels are within the range of the other cases. On the other hand, a leverage point is an observation (individual) whose age, body mass, salt consumption and/or physical activity (x, covariates) levels are quite distant from those of the majority of the other cases, regardless of the SBP level. Leverage points are characterised as "good" when they do not influence the regression line and "bad" when they influence the regression line (as influential outliers do). In Figure 1 differences...
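
The distinctions drawn above can be illustrated numerically. The following is a minimal sketch, not a method taken from this letter: it uses three standard ordinary least squares diagnostics (hat values for leverage, internally studentized residuals for outlyingness in y, and Cook's distance for influence) on synthetic data. The function name `influence_diagnostics`, the single covariate (age), the planted points and all numeric values are assumptions made purely for illustration.

```python
import numpy as np


def influence_diagnostics(x, y):
    """Leverage, internally studentized residuals and Cook's distance for OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    design = np.column_stack([np.ones(n), x])  # intercept + covariate(s)
    p = design.shape[1]

    # Diagonal of the hat matrix H = X (X'X)^-1 X', obtained from a thin QR.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q ** 2, axis=1)

    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate

    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks = student ** 2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks


# Synthetic (made-up) data: 20 individuals whose SBP rises linearly with age,
# plus a small deterministic perturbation so the example is reproducible.
age = np.linspace(30.0, 60.0, 20)
sbp = 100.0 + 0.5 * age + 2.0 * np.sin(np.arange(20))

# Index 20: outlier (typical age, SBP far from the others).
# Index 21: leverage point (age far from the others).
age_all = np.append(age, [45.0, 95.0])

# "Good" leverage point: its SBP follows the trend of the other cases.
sbp_good = np.append(sbp, [180.0, 100.0 + 0.5 * 95.0])
# "Bad" leverage point: its SBP departs from the trend and pulls the line.
sbp_bad = np.append(sbp, [180.0, 100.0])

lev_good, stud_good, cook_good = influence_diagnostics(age_all, sbp_good)
lev_bad, stud_bad, cook_bad = influence_diagnostics(age_all, sbp_bad)

n_obs, n_par = age_all.size, 2
print("common rule-of-thumb leverage cut-off 2p/n:", 2 * n_par / n_obs)
print("leverage of point 21:", lev_good[21])
print("studentized residual of outlier (index 20):", stud_good[20])
print("Cook's distance, good leverage point:", cook_good[21])
print("Cook's distance, bad leverage point:", cook_bad[21])
```

In this sketch the leverage of the last point is the same in both datasets, because leverage depends only on the covariate values; what changes is its Cook's distance, which is small when the point follows the trend ("good") and large when it does not ("bad"). The cut-offs used in practice (such as 2p/n for leverage) are conventions rather than strict rules.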

© 2020 Termedia Sp. z o.o. All rights reserved.