Length of Time-Series Gait Data on Lyapunov Exponent for Fall Risk Detection

Falls are the leading cause of disability in older adults, with a third of adults over the age of 65 falling every year. Quantitative fall risk assessments using inertial measurement units and local dynamic stability (LDS) have shown that it is possible to identify at-risk persons. However, the literature is inconsistent on how to calculate LDS and how much data is required for a reliable result. This study investigates the reliability and minimum number of strides required for 6 algorithm-normalization method combinations when computing LDS in young healthy and community-dwelling elderly individuals. Participants wore an accelerometer at the lower lumbar while they walked for three minutes up and down a long hallway. The Rosenstein et al. algorithm reliably differentiated between the two populations using only 50 strides. With the Rosenstein et al. algorithm, normalization performed better when the gait time series was either truncated to a fixed number of strides, or truncated to a fixed number of strides and then resampled as a whole to a fixed number of data points.


INTRODUCTION
Falls are among the most common causes of decreased mobility and independence in older adults and rank as one of the most serious public health problems in the U.S., with costs exceeding $50 billion in 2015 (Ambrose, Paul, & Hausdorff, 2013; Bergen, Stevens, & Burns, 2016; Burns, Stevens, & Lee, 2016; Weisenfluh, Morrison, Fan, & Sen, 2012). Accompanying this reduction in independence is the inherent decline in gait stability that impairs balance and predisposes older adults to falls and fall-related injuries.
Dynamic stability, defined as the ability to maintain equilibrium despite the presence of small disturbances or control errors, is a fundamental motor task that must be rapidly adapted in the face of a dynamically varying environment (Dingwell & Cusumano, 2000; Dingwell, Cusumano, Cavanagh, & Sternad, 2001; Wurdeman, 2016). Evidence suggests that older adults experience a gradual deterioration in these balance mechanisms and may require more task-dependent rehabilitative and training interventions. Quantitative assessment of gait has been shown to identify age-related decrements, fall risk, and pathology (Bruijn, Meijer, Beek, & Van Dieën, 2013; Daniel Hamacher, Singh, Van Dieën, Heller, & Taylor, 2011; Toebes, Hoozemans, Furrer, Dekker, & Van Dieën, 2012). In particular, gait measures derived from trunk acceleration signals can characterize the trunk movement dynamics that regulate gait-related oscillations. However, aging may induce subtle gait impairments without obvious detectable unsteadiness; therefore, nonlinear measures that can detect the subtle, hidden detrimental effects of aging on locomotor control are used. In particular, calculating local dynamic stability (LDS), or the Lyapunov Exponent (LyE), during continuous walking has become a popular approach for quantifying gait stability (Mehdizadeh, 2018).
Modern motion capture laboratories collect precise data during walking and postural stability tasks; however, they are prohibitively expensive, immobile, and require well-trained technicians to collect and process experimental results. Inertial measurement units (IMUs) or accelerometers have become widely used in assessing and monitoring gait and other daily living activities as an alternative to traditional motion capture. These sensors are more flexible, mobile, and inexpensive. They also offer an unlimited measurement volume and make it easy to record gait in various environments, e.g. clinical offices, community centers, or outdoor tracks (Tao, Liu, Zheng, & Feng, 2012). Accelerometers and LDS have been used together as biomarkers for differentiating between healthy controls and various ailments, e.g. patients with dementia (IJmker & Lamoth, 2012), multiple sclerosis (Huisinga, Mancini, St. George, & Horak, 2013), and concussions (Fino, 2016). However, not all of these studies are comparable. Some studies use different data collection equipment, algorithms, and/or normalization methods. Even when publications investigate similar paradigms, some studies find significant differences while others do not. This could be due to the sample and effect sizes within particular studies, but the inconsistency across publications could also be due to the lack of a universal methodology for calculating the LyE during gait.
To date, there have been several pivotal publications about the issues in calculating the LyE from gait data and how various factors can impact the value of the LyE (Dingwell & Marin, 2006; Mehdizadeh, 2018; Raffalt, Kent, Wurdeman, & Stergiou, 2019; Stenum, Bruijn, & Jensen, 2014). In this study we focus on the choice of algorithm and normalization method, examine their reliability, and determine the minimum number of strides required for reliable computation in both young healthy and elderly adults. The most common algorithms for calculating LDS in gait are the Rosenstein et al. (R-algorithm) and Wolf et al. (W-algorithm) algorithms; refer to Figure 1 for a comparison flowchart.
Both the R- and W-algorithms track the rate of exponential divergence of neighboring points on the attractor. Each method starts by reconstructing the phase space using the method of delays (Broomhead & King, 1986; Takens, 1981). For an $N$-point time series $x(t)$, the phase space can be reconstructed using the following equation, where $\tau$ is the time delay and $m$ is the embedding dimension.

$$X(t) = [x(t),\ x(t+\tau),\ \ldots,\ x(t+(m-1)\tau)] \tag{1}$$

This creates an $m$-dimensional phase space as an $M \times m$ matrix, where $M = N - (m-1)\tau$. After creating the phase space these two algorithms diverge. For the R-algorithm, the nearest neighbor of every point on the reference trajectory is found. In this method, nearest neighbors are located by using the Euclidean norm and requiring that each point must be on a separate trajectory. The average divergence distance of all possible nearest neighbor pairs is tracked through time, creating a mean divergence curve. The LyE is then calculated using a least-squares fit to the linear slope of the divergence curve,

$$y(i) = \frac{1}{\Delta t}\,\langle \ln d_j(i) \rangle$$

where $d_j(i)$ is the distance between the $j$-th pair of nearest neighbors after $i$ time steps, $\Delta t$ is the sampling period, and $\langle \cdot \rangle$ denotes the average over all pairs $j$ (nearest neighbor pairs, $j = 1, 2, \ldots, M$).
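The reconstruction and divergence-curve steps described above can be sketched in a few lines. This is a minimal illustration in standard-library Python, not the custom MATLAB code used in the study. The function names, the `min_sep` temporal-separation rule (standing in for "on a separate trajectory"), and the choice of fitting the whole tracked curve are all assumptions made for the example, and the input is a synthetic chaotic series rather than gait data.

```python
import math

def delay_embed(x, m, tau):
    """Method of delays: row i is [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    n_rows = len(x) - (m - 1) * tau          # M = N - (m - 1) * tau
    if n_rows <= 0:
        raise ValueError("time series too short for this m and tau")
    return [[x[i + k * tau] for k in range(m)] for i in range(n_rows)]

def mean_log_divergence(points, min_sep, n_steps):
    """Average ln(distance) of nearest-neighbor pairs followed for n_steps samples."""
    usable = len(points) - n_steps           # points that can be followed n_steps ahead
    sums = [0.0] * (n_steps + 1)
    counts = [0] * (n_steps + 1)
    for j in range(usable):
        # Nearest neighbor by Euclidean norm, more than min_sep samples away in time
        # (assumption: a simple stand-in for the separate-trajectory requirement).
        best, best_d = None, math.inf
        for k in range(usable):
            if abs(j - k) <= min_sep:
                continue
            d = math.dist(points[j], points[k])
            if 0.0 < d < best_d:
                best, best_d = k, d
        if best is None:
            continue
        # Track the separation of this pair through time.
        for i in range(n_steps + 1):
            d = math.dist(points[j + i], points[best + i])
            if d > 0.0:
                sums[i] += math.log(d)
                counts[i] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]

def least_squares_slope(y, dt):
    """Slope of y against time i*dt; applied to the divergence curve it estimates the LyE."""
    n = len(y)
    t = [i * dt for i in range(n)]
    t_mean, y_mean = sum(t) / n, sum(y) / n
    num = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    den = sum((a - t_mean) ** 2 for a in t)
    return num / den

# Toy usage on a synthetic chaotic series (logistic map), not gait data.
x = [0.3]
for _ in range(399):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))
curve = mean_log_divergence(delay_embed(x, m=2, tau=1), min_sep=5, n_steps=4)
lye = least_squares_slope(curve, dt=1.0)
```

In practice the fit is restricted to the initial linear region of the divergence curve, so the number of tracked steps matters as much as the embedding parameters.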
The W-algorithm, after the phase space is reconstructed, uses the first point as a reference trajectory and follows a single nearest neighbor until the separation between the reference and neighbor is greater than a specific limit. The exponential growth in separation is then calculated and a new nearest neighbor is found. This procedure is repeated until the reference trajectory has gone through all of the data samples, and the LyE is estimated using:

$$\lambda = \frac{1}{t_M - t_0} \sum_{k=1}^{M} \ln \frac{L'(t_k)}{L(t_{k-1})}$$

where $L(t_{k-1})$ and $L'(t_k)$ are the distances between the vectors at the beginning and end of a replacement step, $t_M - t_0$ is the total evolution time, and $M$ is the total number of replacements (Wolf, Swift, Swinney, & Vastano, 1985). Please note that this equation uses the natural logarithm instead of the binary logarithm that Wolf et al. originally presented. This was done to make the LyE more comparable between the two algorithms (Cignetti, Decker, & Stergiou, 2012). For more details on either the R- or W-algorithm calculation methods, please refer to the following publications (Rosenstein, Collins, & De Luca, 1993; Smith, 2019; Wolf et al., 1985). We hypothesize that each algorithm will require a significantly different number of strides for the calculation of LDS. Additionally, different time series normalization methods have been shown to affect the LyE, and different normalization methods work better for different LyE algorithms (Raffalt et al., 2019; Stenum et al., 2014). Therefore, we will investigate three of the most common normalization methods with both the R- and W-algorithms. We hypothesize that normalization methods will affect the reliability of the calculated LDS. These findings augment wearable sensors' potential as an ambulatory fall risk identification tool in community-dwelling settings. Furthermore, they highlight the importance of gait features that rely less on step-detection methods, and more on time series analysis techniques in the community-dwelling elderly population.
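As a worked illustration of the Wolf et al. estimate, the short Python sketch below applies only the final summation to separations that are already known. The function name and the numbers are hypothetical; the neighbor search and replacement logic of the full algorithm are deliberately left out.

```python
import math

def wolf_lye(replacement_steps, total_time):
    """Sum ln(L_end / L_start) over replacement steps and divide by the total evolution time.

    replacement_steps: (L_start, L_end) separations, one pair per replacement step.
    """
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    log_growth = sum(math.log(l_end / l_start) for l_start, l_end in replacement_steps)
    return log_growth / total_time

# Hypothetical values: the separation doubles in each of three steps of one time unit.
steps = [(0.01, 0.02), (0.015, 0.03), (0.02, 0.04)]
lam = wolf_lye(steps, total_time=3.0)      # natural-log units, as used in this paper
lam_bits = lam / math.log(2.0)             # same estimate in the binary-log units of Wolf et al.
```

Dividing by ln 2 is the only difference between the two conventions, which is why switching to the natural logarithm makes the W-algorithm output directly comparable with the R-algorithm slope.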

METHODS
Seventeen young healthy adults participated in this study, and data from eleven community-dwelling older adults enrolled in an ongoing fall risk assessment study were used. All subjects reported no cardiovascular issues, neurological diseases, or lower extremity surgeries in the last 3 months. Additionally, the elderly participants were required to be able to perform a 2-3-minute walk without the aid of a cane or walker and had no history of falls. Hz. All participants were asked to walk for 3 minutes on a makeshift walking track at their preferred walking speed. This track was secluded so that no outside factors could interfere with or interrupt the data collection. Ten seconds were removed from the beginning and end of the acceleration measurements to avoid non-stationary periods. The trials from young healthy participants were downsampled to 100 Hz to match the elderly community-dwelling data collection.

Data Analysis
The following three preprocessing normalization methods were applied before calculating the LyE:
1. Raw Gait Cycle data (gc): The time series is truncated to keep a fixed number of strides regardless of the total number of data points. This maintains the original distance between points in the phase space but means that individuals with a faster pace have fewer data points available overall for the calculation.
2. Gait Cycle Normalized (gcNorm): As in the first method, the time series is segmented to include a fixed number of strides. Then each stride is resampled to have a fixed number of data points, usually 100. Therefore, all strides in this method contain the same number of data points regardless of an individual's stride time.
3. Data Point Normalized (dpNorm): The time series is first truncated to include a fixed number of strides. Then the data is resampled to a specific number of total samples for the time series. This allows for fluctuations in data length for individual strides.
For method (3), the total number of data points in the series was set to 100 samples for every stride used. A time delay of 10 samples was used for all directions and all preprocessing methods. An embedding dimension of 5 was used when the LyE was calculated using the Rosenstein et al. algorithm and a dimension of 7 was used for the Wolf et al. algorithm (Bruijn, van Dieën, Meijer, & Beek, 2009; Huisinga, Mancini, George, & Horak, 2013; Smith, 2019). The LyE was calculated for all 6 algorithm-normalization method combinations, since neither the Rosenstein et al. algorithm nor the Wolf et al. algorithm has been proven to outperform the other and both are widely used with gait data (Mehdizadeh, 2018; Rosenstein et al., 1993; Wolf et al., 1985). The LyE was taken from 0 to 0.5 strides using the Rosenstein algorithm. Additionally, a time evolution of 7 was found to be appropriate for calculating the LyE with the W-algorithm.
All calculations were performed using custom MATLAB programs (version 2018b, MathWorks Inc., Natick, MA).
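The three normalization methods can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in the study. It assumes that stride boundaries are already available as sample indices (the hypothetical `stride_starts` argument) and uses plain linear interpolation for resampling, which the paper does not specify.

```python
def resample_linear(x, n_out):
    """Linearly interpolate x onto n_out evenly spaced points covering the same span."""
    n_in = len(x)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # fractional index into x
        lo = min(int(pos), n_in - 2)
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[lo + 1] * frac)
    return out

def normalize(x, stride_starts, n_strides, method, pts_per_stride=100):
    """Apply gc, gcNorm, or dpNorm to signal x.

    stride_starts: sample indices where successive strides begin (assumed given);
    n_strides + 1 boundaries are needed to delimit n_strides strides.
    """
    bounds = stride_starts[:n_strides + 1]
    if len(bounds) < n_strides + 1:
        raise ValueError("not enough stride boundaries")
    segment = list(x[bounds[0]:bounds[-1]])  # truncate to a fixed number of strides
    if method == "gc":                       # raw samples, length varies with pace
        return segment
    if method == "gcNorm":                   # every stride resampled to the same length
        out = []
        for start, end in zip(bounds[:-1], bounds[1:]):
            out.extend(resample_linear(x[start:end], pts_per_stride))
        return out
    if method == "dpNorm":                   # whole series resampled, strides may differ
        return resample_linear(segment, n_strides * pts_per_stride)
    raise ValueError("unknown method: " + method)

# Toy usage: three strides of unequal duration in a ramp signal.
signal = [float(i) for i in range(50)]
starts = [0, 10, 25, 40]
gc = normalize(signal, starts, 3, "gc")
gc_norm = normalize(signal, starts, 3, "gcNorm", pts_per_stride=5)
dp_norm = normalize(signal, starts, 3, "dpNorm", pts_per_stride=5)
```

Note that gcNorm and dpNorm return series of the same total length, but only gcNorm forces every stride to occupy the same number of samples, so a fixed time delay of 10 samples spans a different fraction of a stride under each method.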

Statistical Analysis
To determine the minimum number of strides, we used the same procedure as Riva et al. (2014b), based on the interquartile range/median ratio (imr). Briefly, the LyE was calculated using decreasing windows of strides, from 120 to 10 strides with a resolution of 1 stride. The imr is calculated starting from the largest window (which gives the smallest ratio) and proceeds to the smallest window. The minimum number of strides was calculated per index and per subject at an imr threshold of 10%. Then the largest number of strides required across all subjects was chosen. Percent imr is an indication of the variation around the median. When variations of the
Additionally, statistical differences between population groups were compared to test the effectiveness of the algorithm and normalization method combinations. The groups were compared using the sufficient number of strides found with the imr. A one-way ANCOVA was used for each directional signal, anteroposterior (AP), vertical (VT), and mediolateral (ML), with respect to both algorithms, while population and normalization methods were used as model effects. A post-hoc Tukey test was then used to determine differences between each of the model effects. All statistical analyses were performed using JMP version 13 (SAS Institute Inc., Cary, NC) and a p-value of 0.05 or lower was considered significant.
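The imr-based stride criterion described in this section can be illustrated with a short Python sketch. It reflects one plausible reading of the procedure and is not the authors' or Riva et al.'s implementation: the imr is computed over the LyE values from the largest window down to the current one, and the minimum is the smallest window reached before the ratio first exceeds the threshold. The function names, the quartile convention, and the example values are assumptions.

```python
import statistics

def imr_percent(values):
    """Interquartile range divided by the median, as a percentage."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 100.0 * (q3 - q1) / abs(median)

def minimum_strides(lye_by_strides, threshold=10.0):
    """Smallest stride window whose cumulative imr stays at or below the threshold.

    lye_by_strides maps a window length in strides to the LyE computed from it.
    """
    sizes = sorted(lye_by_strides, reverse=True)     # start from the largest window
    minimum = sizes[0]
    for n in range(2, len(sizes) + 1):
        values = [lye_by_strides[s] for s in sizes[:n]]
        if imr_percent(values) > threshold:
            break
        minimum = sizes[n - 1]
    return minimum

def group_minimum(subjects, threshold=10.0):
    """Largest per-subject minimum, so the chosen length is sufficient for everyone."""
    return max(minimum_strides(s, threshold) for s in subjects)

# Hypothetical LyE values for two subjects (coarse 10-stride resolution for brevity).
subject_a = {120: 1.00, 110: 1.01, 100: 0.99, 90: 1.50, 80: 2.00, 70: 3.00}
subject_b = {120: 0.80, 110: 0.80, 100: 0.81, 90: 0.80, 80: 0.79, 70: 0.80}
needed_a = minimum_strides(subject_a)
needed_b = minimum_strides(subject_b)
needed_group = group_minimum([subject_a, subject_b])
```

Taking the maximum across subjects is conservative: one unstable subject sets the stride requirement for the whole group.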

RESULTS
Algorithm and preprocessing method choice affected the number of strides required to reach a steady state using the 10% threshold. The minimum required strides for calculating the LyE are summarized in Table 2. The reliability results are shown in Table 4. The maximum inter-subject imr was less than 20% for both young healthy and elderly adults when using the Rosenstein et al. algorithm.
The Wolf et al. algorithm ranged from 29% to 51% for young healthy subjects and 20% to 43% for elderly adults. The median inter-subject value of the LyE is also provided as a reference for both young and community-dwelling elderly adults. Values used a 10% imr threshold for both young healthy (YH) and elderly adults (EA).
Lastly, the two populations were compared when 50 and 75 strides were used with the R-algorithm and when 110 strides were used with the W-algorithm, shown in Table 3. Significant differences between the two population groups were found using the AP signal when both data lengths were used with the gc and dpNorm normalization methods (p = 0.001). The normalization methods also found significant differences in the VT signal when 75 strides were used in the calculation. No significant differences between young healthy and community-dwelling elderly adults were found when using the Wolf algorithm and any of the normalization methods.

Table 4. Reliability of LyE calculated for young healthy (YH) and community-dwelling elderly adults (EA). Reliability is based on the maximum inter-subject imr. The median values of inter-subjects' medians have been included for reference values.

DISCUSSION
Gait stability is directly quantified through local dynamic stability, specifically, the LyE value. However, the implementation parameters are ill-defined and lack standardized procedures. Therefore, the aim of the present study was to investigate the reliability of the LyE and determine the minimum number of strides for its calculation using 6 algorithm-normalization method combinations. The Rosenstein et al. and the Wolf et al. algorithms were used along with three preprocessing methods: gc, gcNorm, and dpNorm. The R-algorithm required a significantly smaller number of strides with good reliability compared to the W-algorithm, which only achieved average to poor reliability. In addition, only the R-algorithm was able to differentiate the young healthy and elderly community-dwelling adults.
The minimum number of strides required for the R-algorithm was found to be much smaller than previously reported (F. Riva et al., 2014a); this may be due to differences in methodology. The present study calculated the LyE using a single step, while Riva et al. (2014a) calculated it from a stride. Even though our method requires fewer strides, it was deemed more reliable based on the maximum inter-subject imr values. The imr values rank reliability as follows: excellent (imr < 10%), good (imr = 10-20%), average (imr = 20-30%), poor (imr = 30-40%), and very poor (imr > 40%). The R-algorithm had good reliability in this study for both young healthy and community-dwelling older adults, while Riva et al. (2014b) reported only average reliability for its young healthy subjects. This is the first paper, to the authors' knowledge, that has investigated the required minimum number of strides and reliability using imr with the W-algorithm. The W-algorithm required between 100 and 110 strides for all normalization methods and population groups, which is almost double the number of strides required for the R-algorithm. Additionally, the W-algorithm had average to poor reliability across both populations, with the gc normalization method performing better for young healthy adults and dpNorm performing better for elderly adults.
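The reliability ranking quoted above maps directly onto a small lookup. The Python sketch below is illustrative only; the function name is hypothetical, and because the published ranges share their end points, assigning a boundary value such as exactly 20% to the better category is an assumption.

```python
def reliability_label(imr_percent):
    """Map a maximum inter-subject imr (in percent) to the reliability ranking."""
    if imr_percent < 10:
        return "excellent"
    if imr_percent <= 20:       # assumption: shared end points go to the better category
        return "good"
    if imr_percent <= 30:
        return "average"
    if imr_percent <= 40:
        return "poor"
    return "very poor"

labels = [reliability_label(v) for v in (5, 15, 25, 35, 45)]
```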
The results of the present study also show that the R-algorithm was able to differentiate between both populations while the W-algorithm was not. Significant differences between elderly and young healthy adults were found in the AP direction (p = 0.0001, shown in Table 4) when using the R-algorithm, which is consistent with the literature (Liu, Zhang, & Lockhart, 2012; Lockhart & Liu, 2008). Interestingly, no significant differences were found in the ML direction, which is more commonly reported as significant (Dennis Hamacher, Hamacher, Singh, Taylor, & Schega, 2015; Terrier & Reynard, 2015). This could be due to the different data lengths and normalization methods used in those publications, or even to differences between over-ground and treadmill walking studies. It is also important to note that not all studies find significant differences between these populations. For example, Bizovska et al. (2018) found no differences between their young and elderly populations in either over-ground or treadmill walking trials.
Recent research has reported that raw gait (gc) data, i.e. just signal truncation, is ideal for the W-algorithm, while both gcNorm and dpNorm normalization methods should be used for the R-algorithm (Raffalt et al., 2019). When the R-algorithm is used, the dpNorm and gc methods had the lowest number of required strides and had good measurement reliability, as interpreted from percent imr. Both young healthy and elderly community-dwelling participants required fewer than 60 strides to calculate the LyE. We recommend either the dpNorm or gc method of normalization over the gcNorm method for young healthy subject studies. The Wolf algorithm was more reliable for young healthy adults when raw gait (gc) was used than with the gcNorm or dpNorm methods. The gc method also required fewer strides for this group. For the community-dwelling elderly adults, the gc method was slightly less reliable than the dpNorm method. Additionally, dpNorm required the least amount of data except in the ML direction. However, the difference between gc and dpNorm is not large enough to definitively state that one normalization method is more advantageous than the other when using the W-algorithm.

The present study has a few key limitations. First, we only calculated the LyE starting from 120 gait cycles. This has been deemed a sufficient data length, with limited gains in precision if more strides were included (Bruijn et al., 2009; Raffalt, Vallabhajosula, Renz, Mukherjee, & Stergiou, 2018; F. Riva et al., 2014). However, not all of these studies used accelerometers for data collection, and there are a limited number of studies on the required number of strides for the W-algorithm. Secondly, there was a much larger proportion of females among the community-dwelling elderly participants. This is largely because these participants were drawn from ongoing fall risk assessments that meet the criteria of this paper. In theory, the minimum number of strides does not depend on gender, but testing this was out of the scope of this paper. It should also be noted that the findings of this study were derived from a fairly small sample size, although similar studies have used as many or fewer subjects (Dennis Hamacher et al., 2015; Federico Riva, Grimpampi, Mazzà, & Stagni, 2014b) than the present study. Finally, two different sensor systems were used in this study; however, all of the data was taken from the lumbar position and all data was downsampled to 100 Hz to ensure data length would be equal across both groups. Therefore, the use of two IMU systems should not have an effect on the results presented in this paper.

CONCLUSION
The present study investigated the reliability and minimum number of strides required to calculate LDS in young healthy and elderly community-dwelling adults. As there is no universally accepted standard methodology for this calculation, 6 algorithm-normalization method combinations were used in order to work towards a standardized process for accelerometers. We found that the Rosenstein et al. algorithm requires fewer strides for reliably calculating the LyE compared to the Wolf et al. algorithm. The R-algorithm was also able to differentiate between young healthy and elderly community-dwelling adults in the AP and VT directions using only 75 strides, while the W-algorithm was unable to differentiate these groups when using 110 strides. Our results show that either truncating the gait signal to a fixed number of strides, or normalizing the signal to a fixed number of strides with a fixed number of total data points, yields a more reliable LyE when using the R-algorithm.