INTRODUCTION
The modified Rankin scale (mRS) has been the gold standard for grading stroke outcomes in clinical trials for many years. It has, therefore, become the most commonly used tool for measuring stroke-related disability both in observational studies and in the everyday clinical practice of stroke units (SU) and rehabilitation wards (RW) [1-3]. This simple-to-use scale assesses disability from no symptoms (0 points) to death (6 points), so the whole spectrum of functional states can be described [3-5]. The most important differentiation is between mRS 0-2 (independent) and mRS 3-5 (dependent) patients (see Supplementary material). Despite these advantages, the problem of mRS inter-rater variability may lead to end-point misclassification, which can significantly affect clinical practice and observational studies [6-9].
Recognizing the need for improved reliability, numerous initiatives have been undertaken to enhance mRS grading. Notable among these are the development of a simplified mRS questionnaire (smRSq) and the Rankin Focus Assessment (RFA) and the implementation of a video-based training program by Glasgow University [4, 10-14]. These efforts aim to standardize the interpretation and application of the mRS, thereby reducing inter-rater variability and enhancing the scale’s reliability.
Additionally, most observational studies and stroke registries collect data from everyday clinical practice. These assessments are performed by physicians from different medical backgrounds who are not always as aware of all the scale’s nuances as investigators participating in clinical trials. It is also important to mention that although the mRS is the most widely used assessment of stroke outcome in various studies, most randomized trials – beginning with the NINDS reperfusion therapy trial in 1995 [15] and finishing with mechanical thrombectomy efficacy trials [16] – have used the mRS to assess post-stroke disability at discharge from a hospital (most commonly a SU or neurology department, rather than a RW) and as a primary outcome three months after stroke onset [17]. Recently, attempts have begun to compare single-measure analysis with repeated-measure analysis [18].
In 2022, 73,900 acute ischemic stroke cases were reported in Poland – 21.4% of those patients started formal neurorehabilitation within 14 days of stroke onset, and many of them were transferred from a SU to a RW to continue rehabilitation [19].
This study aimed to measure the real-life consistency between (i) stroke unit physicians (SUPs) and (ii) physical and rehabilitation medicine physicians (PRMPs) using the mRS to measure the post-stroke disability of patients transferred directly from a SU to a RW.
METHODS
This prospective observational study enrolled adult acute stroke patients treated in a single tertiary SU from October 2020 to May 2022. These patients were transferred directly to a RW located in the same hospital in Warsaw, Poland. The inclusion criteria were: i) acute ischemic or hemorrhagic stroke according to the AHA/ASA tissue-based definition [2], ii) residual post-stroke deficit at discharge from SU, iii) slight-to-moderate disability (mRS 1-4) at discharge from SU and iv) capacity to give informed consent. This cohort was recruited for a series of studies addressing the reliability of post-stroke functional assessment and quality of life. For the purpose of the current analysis, we excluded patients discharged to a place other than the RW located in the same hospital (Figure I).
A reference mRS score (REF) was made prospectively and directly in-person by one single-blinded experienced stroke physician, certified in the mRS, who used the Polish version of the RFA form to guide the interview (see Supplementary material). Post-stroke disability was measured using the mRS and documented in the patient’s medical record as a standard of care (i) by a SUP at discharge from the SU and (ii) by a PRMP on admission to the RW. Certification in the mRS was not mandatory for SUPs and PRMPs, but some of them had been trained in previous years for the purpose of randomized clinical trials. These data were obtained retrospectively after the end of enrollment to ensure that the results reflected routine everyday practice. It is important to emphasize that all three mRS assessments were done on the day of transfer, which minimizes the likelihood of significant changes in patients’ functional states.
The main cohort included only those patients with mRS scores stated directly in their last observation or discharge note from the SU and stated directly in the initial observation or discharge note from the RW. As recording of the mRS score was mandatory for PRMPs but only expected of SUPs, we had planned a sensitivity analysis for all patients rated by a PRMP without excluding cases not rated by a SUP.
All patients signed the consent form by themselves or were accompanied by two independent witnesses before the REF was obtained. This study was conducted in accordance with the Declaration of Helsinki, Polish law and the hospital regulations with approval from the local ethics committee.
Statistical analysis
Categorical variables are presented as the number of valid observations with the corresponding proportions; unknown values were excluded from the denominator. Continuous variables are presented as medians with interquartile ranges (1st quartile to 3rd quartile, Q1-Q3) because the Shapiro-Wilk test indicated a non-normal distribution.
We calculated the proportion of patients who, on admission to the RW, received a mRS score that was identical to, higher than, or lower than the grade given by the SUP, and compared both grades to the REF. The degree of consistency between the assessments of SUPs, PRMPs and the REF was expressed using Cohen’s κ. Additional analysis was conducted in a subpopulation of patients with a double mRS assessment – made by a SUP and a PRMP (‘the main cohort’).
Calculations were carried out using the STATISTICA 13.3 software package (TIBCO Software Inc., USA). The χ² test, two-tailed Fisher’s exact test, and Mann-Whitney U test were used for comparisons, as appropriate. P-values of < 0.05 were considered statistically significant.
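The agreement measures described above can be illustrated with a minimal Python sketch. It is not the authors' STATISTICA workflow: the function names are hypothetical, the ratings are invented for illustration and are not study data, and the unweighted form of Cohen's κ is an assumption, since the weighting scheme is not specified in the text.

```python
from collections import Counter


def agreement_breakdown(first, second):
    """Share of patients whose second rating is identical to, higher than,
    or lower than the first rating (e.g. PRMP grade versus SUP grade)."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be paired and non-empty")
    n = len(first)
    identical = sum(a == b for a, b in zip(first, second))
    higher = sum(b > a for a, b in zip(first, second))
    lower = n - identical - higher
    return {"identical": identical / n, "higher": higher / n, "lower": lower / n}


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two raters scoring the same patients
    (assumption: the source does not state whether kappa was weighted)."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be paired and non-empty")
    n = len(first)
    # Observed agreement: proportion of identical grades.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the raters' marginal proportions per grade.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[g] * counts_second[g] for g in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical mRS grades for ten patients (illustration only, not study data).
sup_grades = [2, 2, 3, 3, 4, 4, 2, 3, 4, 3]
prmp_grades = [2, 3, 3, 3, 4, 4, 3, 3, 4, 4]

breakdown = agreement_breakdown(sup_grades, prmp_grades)
kappa = cohen_kappa(sup_grades, prmp_grades)
print(breakdown, round(kappa, 2))
```

In this invented example the two raters agree in 7 of 10 patients, yet κ is noticeably lower than the raw agreement because part of that agreement is expected by chance alone.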
RESULTS
Of 117 screened acute stroke patients hospitalized in the SU, 48 patients (median age 69 years, Q1-Q3, 63-75; 50% females) were transferred directly between departments and enrolled in the sensitivity cohort, while 33 patients with mRS score reported both at discharge from the SU and on admission to the RW constituted the ‘main cohort’ (Figure I).
Patients from the main cohort had a median age of 68 years (Q1-Q3, 62-75), were 90.9% pre-stroke independent (mRS 0-2) and their most common comorbidity was hypertension (87.9%). The remaining 15 patients had no written record of a mRS score at discharge from the SU and therefore were excluded from the main analysis. Their characteristics were similar to the main cohort (Table 1).
Table 1
Basic characteristics of cohorts
In the main cohort, SUPs and PRMPs reported identical mRS scores in 75.8% of cases (Cohen’s κ = 0.58), while agreement between the REF and SUPs was 72.7% (Cohen’s κ = 0.55) and between the REF and PRMPs 70.0% (Cohen’s κ = 0.49). A similar level of agreement was observed for PRMPs and the REF in the sensitivity cohort (N = 48, 66.7%, κ = 0.46) (Table 2).
Table 2
Inter-rater consistency in mRS scoring between all physicians
Cross-tabulations between cohorts and their subpopulations showed that patients assessed by the REF as independent (mRS 2) were often described as dependent (mRS 3) both by SUPs and PRMPs (Tables 3-5). A similar tendency was seen when comparing SUP and PRMP grades – patients assessed by the SUP as independent (mRS 2) were often scored as dependent (mRS 3 and 4) by the PRMP. On the other hand, in a subpopulation of patients scored by the REF as dependent (mRS 3 and 4), there was no clear tendency towards overrating disability levels by SUPs and PRMPs (Tables 3-5).
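The cross-tabulation logic described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the grades are invented and are not study data, and the default grade range of 1-4 simply mirrors the inclusion criterion stated in the Methods.

```python
from collections import Counter


def cross_tabulate(reference, rater, grades=(1, 2, 3, 4)):
    """Rows: reference (REF) grade; columns: grade given by the other rater."""
    counts = Counter(zip(reference, rater))
    return [[counts[(ref, other)] for other in grades] for ref in grades]


def is_dependent(mrs):
    """mRS 3-5 is treated as dependent, mRS 0-2 as independent."""
    return 3 <= mrs <= 5


def boundary_shifts(reference, rater):
    """Count patients moved across the independent/dependent boundary
    relative to the reference grade."""
    pairs = list(zip(reference, rater))
    overrated = sum(not is_dependent(ref) and is_dependent(other) for ref, other in pairs)
    underrated = sum(is_dependent(ref) and not is_dependent(other) for ref, other in pairs)
    return {"overrated": overrated, "underrated": underrated}


# Hypothetical mRS grades for eight patients (illustration only, not study data).
ref_grades = [2, 2, 2, 3, 3, 4, 4, 2]
rater_grades = [3, 2, 3, 3, 3, 4, 3, 3]

table = cross_tabulate(ref_grades, rater_grades)
shifts = boundary_shifts(ref_grades, rater_grades)
for grade, row in zip((1, 2, 3, 4), table):
    print(f"REF mRS {grade}: {row}")
print(shifts)
```

Reading the table row by row shows where each reference grade ends up in the other rater's scoring; the boundary counts isolate the clinically important reclassification between mRS 2 and mRS 3 or higher.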
Table 3
Cross-tabulations of mRS scores between physicians for the main cohort (n = 33)
Table 4
Cumulative cross-tabulation of mRS scores for the REF, SUPs and PRMPs for the main cohort (n = 33) with REF as the gold standard
[i] Explanation of colors: green – complete agreement between all three physicians; yellow – partial agreement between the REF and one of the other physicians (SUPs or PRMPs); red – no agreement between the REF and both other physicians (SUPs and PRMPs).
mRS – modified Rankin scale, REF – reference mRS score, SUPs – stroke unit physicians, PRMPs – physical and rehabilitation medicine physicians
DISCUSSION
The first version of the Rankin scale, created by John Rankin almost 70 years ago, comprised briefly described states (1-5), with no distinct criteria to differentiate between disability levels [20]. The scale was later modified by adding new grades (0 – no symptoms and 6 – death). Therefore, it became possible to capture the whole spectrum of functional states. However, the disadvantage of significant inter-observer variability remained [3, 6, 7].
The first means used to improve the consistency of standard mRS assessment included developing a simple structured interview and, subsequently, the creation of the smRSq, which consists of short yes-or-no questions facilitating the interviewer’s ability to distinguish between states of disability [10, 11, 13]. The smRSq demonstrates adequate agreement with the classic mRS and proves useful in face-to-face, phone and online interviews. Its reliability proved to be superior to standard mRS but it was still not optimal [12, 21].
The most recent attempt to increase the consistency of mRS grading was the development of the RFA, which is an independent tool consisting of a checklist with key discriminating features included in the detailed structured interview for the mRS [4]. That structure urges physicians to ask all relevant questions and increases the degree of the scale’s reliability up to 93% [4, 22]. Other proposals for optimizing mRS assessment include the miFUNCTION scale and the utility-weighted mRS (UW-mRS) [17, 23-25].
Apart from modifying the standard mRS, another approach to improving its inter-rater consistency is the online training provided by Glasgow University. It includes short lectures and video interviews recorded in English with subtitles in many different languages. After completing it, attending physicians receive a certificate that should be periodically renewed.
Unlike most previous validation studies and analyses of the reliability of the mRS, our study provides prospective real-world data from a tertiary stroke center [14]. All raters were experienced in managing post-stroke patients, yet they represented two different clinical perspectives. In contrast to the setting of clinical trials, assessments were done without special emphasis on precision and without a predetermined mRS scoring tool. However, knowing the clinical routine of both departments, it can be assumed that SUPs and PRMPs used either the smRSq or the original mRS, but not the RFA.
It is important to note the incomplete data reporting by SUPs in patients’ medical records, which significantly decreased the size of the main cohort (from 48 to 33 patients). Most observational studies and stroke registries use the data from patients’ documentation, which makes the lack of mRS scoring even more troublesome. In Poland, the mRS score is included in a set of variables that must be reported directly to the National Health Fund for each discharged stroke patient. However, our study shows that such regulation does not translate into improved reporting in the original patients’ records. Therefore, it seems crucial to implement the reporting of mRS assessment in SU discharge notes as a standard operating procedure (SOP).
The current analysis confirms that overall consistency between SUPs and PRMPs is modest (75.8%, Cohen’s κ = 0.58), as previously observed in our fully retrospective study [14]. The consistency of SUPs and PRMPs with the REF is also modest, matching values from smRSq validation studies [10-13].
The most marked inconsistency between the REF and the PRMPs or SUPs was observed between mRS scores of 2 and 3. Only 17% of these patients obtained identical grades. Interestingly, the REF scored mRS 2 significantly more often than SUPs or PRMPs. One possible explanation is that patients transferred to the RW are perceived as being unable to live on their own by definition. Inter-rater variability between mRS scores 2 and 3 may also be associated with reducing the mRS 3 score to motor function alone – if the patient moves unassisted by another person but has a temporary or permanent need for an orthopedic aid, a grade of mRS 3 may be given automatically.
Another interesting finding is that a large proportion of patients assessed by both the REF and a SUP as independent (mRS 2) were scored as dependent (mRS 3 or 4) by a PRMP. Our previous retrospective study also suggests that SUPs may tend to underrate disability and PRMPs to overrate it [14]. This potential bias may have two causes. Firstly, SUPs are more likely to concentrate on disability attributed only to post-stroke neurological deficits, while PRMPs perceive a patient’s functional status in a more complex way. That is why some researchers indicate that mRS scoring is insufficient to describe post-stroke disability in terms of the neurorehabilitation required in particular cases [26, 27]. Secondly, SUPs are willing to see the positive effect of SU hospitalization, whilst PRMPs prefer to achieve a measurable improvement in the RW. However, we did not observe a tendency to overrate mRS scores in actually dependent patients (REF mRS 3 or more). This may indicate that distinguishing between more severe states of disability (mRS 3-5) is less challenging.
Strengths and limitations
This study reflects the everyday clinical practice of SUPs and PRMPs using a prospectively obtained reference mRS assessment. The retrospective collection of mRS scores given by SUPs and PRMPs ensures the data are unbiased by an awareness that their performance will be externally evaluated or used as the outcome measure in a clinical trial. A similar approach was used in our previous study [14]. However, that design did not allow us to state what mRS tools were used by SUPs and PRMPs in individual cases and, more importantly, to verify the correctness of scoring. To overcome this limitation, the current study includes prospectively enrolled patients who were additionally assessed by the REF. Such a methodology allows for the determination of which specialist was correct (i.e. the SUP, the PRMP, both or neither).
The first limitation of this study is the sample size of the main cohort (n = 33), which is caused primarily by the incompleteness of mRS recording in the SU medical records. However, the sensitivity cohort is larger (N = 48) and similar in size to the cohorts of validation studies (a median of 47 patients in a systematic review of the reliability of the mRS [7], 58 in a study comparing the standard mRS to a structured interview [10], and 50 patients in the first RFA study [4]). Another limitation is the exclusion of severely disabled patients who were not able to give informed consent but underwent intensive inpatient rehabilitation in the RW. One may speculate that in this subset of patients, the level of agreement between mRS raters would be higher. Additionally, transferring post-stroke patients from the SU to rehabilitation is a common practice [28-30], but our findings refer directly to patients requiring intensive neurological hospital rehabilitation in the RW. Therefore, in both the sensitivity and main cohorts, there were no patients with no or mild disability (mRS 0 or 1) and no patients with an unfavorable prognosis for functional improvement. That is why the study does not allow us to draw conclusions about the whole possible spectrum of post-stroke disability. We also did not attempt a separate analysis according to the mRS certification status of the attending physicians, as the sample size would not have been large enough to provide robust results.
CONCLUSIONS
Our study provides further reassurance that the accuracy of the mRS assessments made by SUPs and PRMPs in real life is modest, as in validation studies. This refers both to the internal consistency between different specialists and to correctness compared to the REF. Therefore, the everyday use of the mRS in a comprehensive stroke center may be considered sufficient for the purpose of retrospective studies or stroke registries. However, the high proportion of missing assessments indicates the need to put additional effort into making mRS reporting an SOP in the SU discharge report.
Our results suggest that most inter-rater disagreement occurs between the mRS score of 2 and the mRS score of 3, a boundary that is vital for determining whether the treatment outcome is positive (mRS 0-2, independence) or negative (mRS 3-5, dependence). Stating more definite conclusions requires further multicenter studies involving primary stroke centers. Nonetheless, it seems reasonable to promote repeated training in the use of the mRS or to implement specially designed aids such as the RFA.