## Abstract

To identify the underlying mechanisms of human motor control, parametric models are utilized. One approach of employing these models is the inferring the control intent (estimating motor control strategy). A well-accepted assumption is that human motor control is optimal; thus, the intent is inferred by solving an inverse optimal control (IOC) problem. Linear quadratic regulator (LQR) is a well-established optimal controller, and its inverse LQR (ILQR) problem has been used in the literature to infer the control intent of one subject. This implementation used a cost function with gain penalty, minimizing the error between LQR gain and a preliminary estimated gain. We hypothesize that relying on an estimated gain may limit ILQR optimization capability. In this study, we derive an ILQR optimization with output penalty, minimizing the error between the model output and the measured output. We conducted the test on 30 healthy subjects who sat on a robotic seat capable of rotation. The task utilized a physical human–robot interaction with a perturbation torque as input and lower and upper body angles as output. Our method significantly improved the goodness of fit compared to the gain-penalty ILQR. Moreover, the dominant inferred intent was not statistically different between the two methods. To our knowledge, this work is the first that infers motor control intent for a sample of healthy subjects. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain.

## 1 Introduction

Applying system identification to the study of human motor control has been investigated over the last few decades. System identification methods enabled researchers to determine physiological parameters, control gains, and control bandwidths for various motor control tasks [1–4]. One subfield of system identification is inverse optimal control (IOC) theory. In IOC, a stabilization feedback control law is first constructed for a given plant. Then, meaningful cost functions are retrieved based on state variables and control inputs [5–8]. These cost functions determine how much weight the controller assigns to various states compared to the control effort. Several previous studies have estimated optimal control cost functions from human motion data in an effort to explain human control intent [8–11]. Unlike the general potential cost functions used in these studies, we propose the use of a control theoretical method that uses a linear quadratic regulator (LQR) framework [12].

The inverse LQR (ILQR) problem received some attention with some results for potentially unstable controllers [13]. Since controller stabilization is important when examining engineering or biological systems, we focus on the stable LQR problem. In addition, when the cross term *S* of the LQR cost function is included, any controller *K* is optimal for some cost function [14]. However, for various reasons, we chose to exclude *S* from our LQR cost function. It is rarely used in the design of LQR controllers for real system. The inverse results, including cross terms, provide less meaningful information about the control intent than the results from the direct separation of control and state costs (e.g., the principle of parsimony). As a result, in the remainder of this paper, we will focus on the stable LQR problem with *S *=* *0.

There have been some studies on ILQR [15–17]. These studies (based on Ref. [15], gain-penalty ILQR) use two optimization steps where the estimated control gain *K* from the first step is used in constructing cost function for the second step. This is a relatively complex process, and errors between the two steps can accumulate. Moreover, the optimization capability may be limited because the experimental data is not used in the second step. Therefore, this study aims at overcoming these limitations. Other studies presented alternative approaches as well [18,19]. They used particle swarm optimization and an off-the-shelf genetic algorithm, respectively, which do not provide an exact solution.

This study proposes an output-penalty ILQR method that uses the experimental data in the ILQR cost function. We applied normalization and auto-annealing techniques [20,21] to improve performance and speed of operation. We hypothesize that the output-penalty ILQR will yield a better fit since it uses the experimental data. A better fit is highly desirable to increase trust in the model and hence the inferred intent. We are the first who build a framework intended to investigate the feedback control aspects of low back pain. Our long-term clinical question is what differences (in terms of feedback control) exist between healthy subjects and subjects with low back pain. We are approaching this question from an optimality perspective, i.e., each subject tries to optimize a quantity while doing the task. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain patients.

## 2 Materials and Methods

### 2.1 Subjects.

Thirty healthy subjects participated in this study, with an average age of 33.2±12.0 yr (age range = 18–59 yr), height of 168.2±9.2 cm, and weight of 76.3±14.2 kg. The group consisted of 11 males and 19 females. All participants had no neurological or musculoskeletal disorders affecting motor control. Ethical approval for the study was obtained from the MSU internal review board, and subjects provided informed consent prior to enrollment.

### 2.2 Data Collection.

The seated balance test was performed by each subject during three laboratory visits, and each test was identical. For this test, the subject sat on a back-drivable robotic seat capable of rotation about an axis perpendicular to the coronal plane (Figs. 1 and 2). The robotic seat was driven by a motor (C062C-13-3305, Kollmorgen, Radford, VA). Subjects were securely strapped to the seat with their knees and hips positioned at 90 deg, and their arms crossed on their chests. They were instructed to balance on the seat, while the robot provided both spring–damper action and torque perturbations. Additional details regarding the test setup can be found in Ref. [22].

Data acquisition during the test was conducted using a data acquisition board (PCI-DAS6036, Measurement Computing, Norton, MA) and a quadrature encoder board (PCI-QUAD04, Measurement Computing, Norton, MA) at a rate of 1000 samples/s. The recorded data included seat angle ($\alpha 1$), upper body angle ($\alpha 2$), robot stiffness ($kr$), robot damping ($cr$), perturbation torque (*u*), total robot torque ($ur$), and the subject's weight and height. Robot stiffness ($kr$), damping ($cr)$, and the amplitude of the perturbation torque were tuned for each subject to normalize test difficulty between subjects before testing [22,23]. For instance, each subject was asked to balance on the seat using a high stiffness value, and then the operator would decrease it gradually until the subject would not be able to balance anymore. After fine-tuning the estimate of this critical stiffness, the experiment stiffness was set to twice this value. More details are in Ref. [22]. In less demanding settings, achieving balance provides flexibility in control, a factor that might not be contingent on health status but rather could denote individual inclinations. To reduce ambiguity in evaluation, it is advisable to subject trunk neuromuscular control to maximal challenges [24]. Perturbation torque ($u$) was designed as a pseudorandom ternary sequence with a power spectral density up to approximately 1.6 Hz. The measurement of $\alpha 1$ was obtained from an encoder in the motor. Previous studies have claimed that encoders accurately reflect lower body angles [22]. $\alpha 2$ was measured using two string potentiometers (SP2-50, Celesco, Chatsworth, CA) attached to the fourth thoracic (*T*4) and *L*4 level. Prior studies have demonstrated the reliability of the measurements obtained from the seated balance test (intraclass correlation coefficients: 0.98 for $\alpha 1$ and 0.89 for $\alpha 2$ in time domain) [22].

### 2.3 Model and Preliminary Estimation.

*L*4 vertebrae by the subject, as well as intrinsic lumbar stiffness ($kh$) and damping coefficients ($ch$). The subject exerted a torque $uh$ about the

*L*4 vertebrae and had intrinsic lumbar stiffness and damping coefficients $kh$ and $ch$, respectively. The dynamic model of physical human–robot interaction system has the following equations of motion. The details of the dynamic model can be found in Ref. [25]. The model parameters are summarized in Table 1

where $g$ is gravity. The lower body and robot seat below the fourth lumbar (*L*4) vertebra are lumped into a single rigid element with mass $M1$ and moments of inertia $J1$ with respect to the center of mass (COM). COM is at distance $l1$ relative to the pivot *O* of the seat. Similarly, the upper body above the *L*4 vertebra is lumped into a rigid element with a mass $M2$ and a moment of inertia $J2$ relative to COM. The distance between the COM and *L*4 vertebrae of the upper body is $l2$. The *L*4 vertebra itself is at a distance $l12$ relative to the seat pivot *O*. The subjects can apply a control torque $uh$ for the *L*4 vertebrae and possess an intrinsic lumbar stiffness $kh$ and damping coefficients $ch$. In addition to the torque disturbance *u*, a virtual stiffness $kr$ and a virtual damping $cr$ for the pivot point *O* are applied via feedback. The sum of these torques produces the total robot torque $ur$ for the pivot point *O*, where $ur=u\u2212kr\alpha 1\u2212cr\alpha \u02d91$.

Notation | Description |
---|---|

$\alpha 1$ | Angle of the lower body from vertical |

$\alpha 2$ | Angle of the upper body from vertical |

$M1$ | Mass of the subject's lower body below the fourth lumbar vertebrae (L4) and the seat |

$J1$ | Moment of inertia of the subject's lower body and the seat about its center of mass |

$l1$ | Distance between the pivot point and $M1$center of mass |

$M2$ | Mass of the subject's upper body above L4 |

$J2$ | Moment of inertia of the subject's upper body about its center of mass |

$l2$ | Distance between L4 and $M2$ center of mass |

$l12$ | Distance between the pivot and L4 |

$uh$ | Human control torque about L4 |

$kh$ | Intrinsic rotational stiffness about L4 |

$ch$ | Intrinsic rotational damping about L4 |

$u$ | Perturbation torque about the pivot |

$kr$ | Robot stiffness about the pivot |

$cr$ | Robot damping about the pivot |

$ur$ | Total robot torque about the pivot point, i.e., $ur=u\u2212kr\alpha 1\u2212cr\alpha \u02d91\u2009$ |

Notation | Description |
---|---|

$\alpha 1$ | Angle of the lower body from vertical |

$\alpha 2$ | Angle of the upper body from vertical |

$M1$ | Mass of the subject's lower body below the fourth lumbar vertebrae (L4) and the seat |

$J1$ | Moment of inertia of the subject's lower body and the seat about its center of mass |

$l1$ | Distance between the pivot point and $M1$center of mass |

$M2$ | Mass of the subject's upper body above L4 |

$J2$ | Moment of inertia of the subject's upper body about its center of mass |

$l2$ | Distance between L4 and $M2$ center of mass |

$l12$ | Distance between the pivot and L4 |

$uh$ | Human control torque about L4 |

$kh$ | Intrinsic rotational stiffness about L4 |

$ch$ | Intrinsic rotational damping about L4 |

$u$ | Perturbation torque about the pivot |

$kr$ | Robot stiffness about the pivot |

$cr$ | Robot damping about the pivot |

$ur$ | Total robot torque about the pivot point, i.e., $ur=u\u2212kr\alpha 1\u2212cr\alpha \u02d91\u2009$ |

The dynamical model of physical human–robot interaction system was formulated as a closed-loop feedback control system. The block diagram of the closed-loop system is shown in Fig. 3. The plant P represents the linearized mechanical dynamics with respect to the upright equilibrium point of the system in Fig. 2 [25]. The kinematic vector $[\alpha 1\u2009\alpha \u02d91\u2009\alpha 2\u2009\alpha \u02d92]T$ represents the states $x$ of the plant P. Active human motor control was modeled as a gain feedback control law, $uh=\u2212K1\alpha 1\u2212K2\alpha \u02d91\u2212K3\alpha 2\u2212K4\alpha \u02d92=\u2212Kx$ (Fig. 3), where the kinematic vector $[\alpha 1\u2009\alpha 2]T$ represents the output measurement in the seated balance test.

where $y\u0302$ is the output of the model using $\rho $, $ye$ is the experimental measurement ($\alpha 1,\u2009\alpha 2$), and $Ymax$ is a diagonal matrix with the maximum absolute value of $ye$ (used for normalization). Twenty initial points selected by a Latin hypercube method were used for estimating $\rho $.

### 2.4 Inverse Linear Quadratic Regulator.

Our ILQR problem is to estimate the weighting matrices of LQR using input–output measurements of the seated balance test. Here, the estimated weight matrices are optimal where the information of the experimental data is used as much as possible. We will solve the general case of this ILQR problem when both the weighting matrices$\u2009Q$ and $R$ are unknown.

The objective can be stated as follows.

*Problem 1*. Given the experimental data, fit a model to get model parameters including an initial value of the control gain $K$. Then, given the experimental posture dataset, ${ye}\u2208RN\xd7yn$, apply the gradient-based, least-squares minimization (ILQR) with the cost function $\Vert y(\theta )\u2212ye\Vert F2$ to estimate the weighting matrices $Q\u0302$ and $R\u0302$, where $\Vert \u22c5\Vert F2$ is the Frobenius norm, and $\theta $ is the upper triangular elements of $Q$ and $R$.

The previous formulation of the ILQR [15] used a gain penalty where the objective was minimizing $\Vert K(\theta )\u2212Ke\Vert F2$. In this study, we opt to use an output penalty by minimizing $\Vert y(\theta )\u2212ye\Vert F2$ in order to achieve better fitting between the model with LQR optimal feedback and the experimental measurements.

*N*observations $(y,ye\u2208R2\xd7N)$. We can rewrite the minimization as

### 2.5 Gradient-Descent-Based Solution.

with $xk\u2208R4,uk\u2208R,yk\u2208R2$. $wk\u223cN(0,\Sigma )\u2208R2$ is white and noise uncorrelated in time. $\rho 0$ is the true model parameter vector in Eq. (4).

where $yn=[\alpha 1\u2009\alpha 2]T$. We now have a nonrecursive solution $yv$ for all time steps $k\u22651$ given $U$. Note that the first element in $U$ is $x1$.

By solving this Sylvester equation, we get $P\u2032$. Then, we determine $e\u2032$ for all elements $i$ of $\theta i(t)$. This is a directional derivative that minimizes the cost in Eq. (11).

which is iterated a preset number of times. We introduced a projection rule for $\theta $ to maintain $Q\u0302\u2ab00$ and $R\u0302\u227b0$. If the update of $\theta $ compromise these conditions, the rule adjusts $\theta $ to the nearest matrix complying with them.

### 2.6 Primary Outcomes and Comparisons.

The first primary outcome of interest is the human control intent. However, clear information about the intent may not be readily apparent from the estimated weighting matrix $Q\u0302$ itself. To address this, we applied a similarity transformation to $Q\u0302$ and the state $x$ to obtain a diagonal matrix $Q\u02dc$ which provides a clear representation of the intent. The transformed state $x\u02dc$ is a linear combination of the elements of $x$. Let $V$ be an orthogonal matrix whose columns are the eigenvectors of $Q$, and let $\Lambda $ be a square diagonal matrix whose diagonal elements are the eigenvalues corresponding to each eigenvector. Then, to satisfy $xTQ\u0302x=x\u02dcTQ\u02dcx\u02dc$, the transformation is given by$\u2009Q\u02dc=\Lambda $ and $x\u02dc=VTx$. The eigenvector corresponding to the largest eigenvalue represents the most dominant linear combination of body angles and velocities during the task. In other words, it represents the most dominant human motor control intent.

A higher VAF value indicates a better fit, with 100% indicating a perfect match between the estimated model output $\alpha \u0302(k,\theta )$ and the measured output $\alpha e(k)$.

We set a VAF threshold of 50% to remove any case or subject from further analysis. This threshold was selected to balance the tradeoff between how well the model describes the data and how many cases/subjects would be excluded from the analysis. Although this is not the topic of this article, we like to elaborate for future endeavors. Most balance biomechanics studies model the body as a single degree-of-freedom (DoF) with VAF value close to our $VAF\alpha 1$ [27,28]. In our group, we started with a two-degree-of-freedom model without reporting the VAF of a fit measure [3]. In general, it is expected that $VAF\alpha 1>\u2009VAF\alpha 2$ since we demonstrated that the reliability measures of $\alpha 1$ were better than $\alpha 2$ [22]. One reason for the low $VAF\alpha 2$ is that this DoF was not perturbed; therefore, its response was more confounded by other physiological or nonlinear control factors, such as muscle co-activation or gain scheduling.

To investigate the differences between the two ILQR methods (output penalty versus gain penalty), we employed repeated-measures multivariate analysis of variance (MANOVA) [29]. Details about MANOVA's procedures and statistical calculations can be found in Ref. [30].This allowed us to compare the multivariable weights in dominant $V\u2009(w\alpha 1,w\alpha \u02d91,w\alpha 2,w\alpha \u02d92)$ as well as the goodness of fit $(VAF\alpha 1,\u2009VAF\alpha 2)$ between the two methods treating them as the repeated measure. In cases where MANOVA showed a significant difference, a univariate repeated-measures ANOVA was performed. In our study, we opted for a multivariate repeated-measures ANOVA instead of a paired *t*-test due to the nonscalar nature of the variables under consideration. Repeated-measures ANOVA is typically used when multiple measurements are taken on the same subjects across various conditions or time points. While most literature studies apply repeated-measures ANOVA when running the same method at different time points on the same subject, here, the repeated measures are the different methods applied to the same subject/trial. Statistical significance was set at *p* ≤ 0.05. We conducted this statistical analysis using spss version 26 (IBM, Armonk, NY).

For ILQR, computations of $G$ and $G\u2032$ in Eqs. (15) and (17) were to be done several times per iteration, which causes high computational loads. Therefore, we decided to downsample the data from 30,000 to 300 samples per trial. The VAF difference before and after downsampling was up to 5%, indicating no significant difference in the fitting.

## 3 Comparative Results and Discussion

We analyzed a total of 90 cases (30 subjects × 3 visits/subject) for this study. Of these, two subjects (six cases) were excluded due to invalid height or weight records. After conducting preliminary estimation of the model parameters $\rho $ and applying ILQR algorithm, 33 cases (19 subjects) remained for further analysis. We opted to draw meaningful insights from subjects that align well with the model's descriptions. Essentially, including subjects with poor fit could potentially obscure valuable observations obtained from subjects with good fit. We recognize that the inability to achieve reasonable goodness of fit in certain subjects represents a limitation of the current model. We can speculate on reasons why the model fails to explain data from all subjects. One plausible explanation is being trapped in a local minimum, as depicted in Fig. 4 where the weights in the initial point show minimal change from the final solution. Thus, we acknowledge that the current inverse LQR model exhibits limited generalizability across subjects. The analysis results for all subjects excluding those with invalid records can be found in the Supplemental Material on the ASME Digital Collection.

The output-penalty ILQR yielded an average VAF of 88.31% for $\alpha 1$ and 66.92% for $\alpha 2$, while the gain-penalty ILQR resulted in lower values of 87.40% for $\alpha 1$ and 61.54% for $\alpha 2$. (For reference, the VAF for $Qs$ was 85.89 and 56.47, respectively.) Our model does not account for physiological nonlinearities and other unmodeled dynamics. Given that the VAF of $\alpha 1$ is approximately 88%, the model may have reached its maximum capacity to describe the data, indicating limited potential for further improvement. Van Drunen et al. [27] implemented various biomechanical models for a 1DoF trunk task, reporting a maximum VAF of 89.5% in their relax task. Our VAF for $\alpha 2$ showed an improvement four times greater than that of $\alpha 1$, suggesting that there is still considerable room for enhancing the fit of $\alpha 2$.

Regarding the first primary outcome, which involved comparing the weights in dominant $V$, the MANOVA analysis showed no effect for the method (*p* = 0.38) (Fig. 4). This suggests that there is no significant difference in the inferred (dominant) intent between the two ILQR methods. To investigate this further, we included the dominant $V$ from the starting point of the ILQR algorithm ($Qs$). By comparing among three methods (two ILQR methods and $Qs$), repeated-measures MANOVA showed no significant effect for the method (*p* = 0.26) (Fig. 4). This indicates that the gradient-descent formulation of ILQR renders the inferred intent dependent on the starting point of the algorithm regardless of the cost function. A detailed example of the first primary outcome can be found in the Supplemental Material on the ASME Digital Collection. We observed in Fig. 4 that the weights ($w\alpha 1$, $w\alpha \u02d91$, $w\alpha \u02d92$) exhibited much less variation compared to $w\alpha 2$ which can also take on negative values. Consequently, we hypothesized that $w\alpha 2$ is associated with different balance strategies within healthy subjects, and further analysis of this will be addressed in future studies.

In contrast, the second primary outcome, which involved the VAF, showed a significant effect for the method (*p* = 0.004) in the MANOVA analysis. A posthoc univariate ANOVA on $VAF\alpha 1$ and$\u2009VAF\alpha 2$ was also significant (*p* = 0.007 and 0.001, respectively) (Fig. 5). This confirms that the goodness of fit is better in the output-penalty ILQR, supporting our initial hypothesis. This outcome is expected since the cost function in the output-penalty ILQR directly minimizes the output fit error.

Notably, the fitting of $\alpha 1$ was significantly better than that of $\alpha 2$ within each method (Fig. 5). This result is consistent with a reliability study of this experiment [22], which showed higher reliability measures for $\alpha 1$ than for $\alpha 2$. A potential explanation for this difference lies in the fact that $\alpha 1$ is an actuated DoF with a known perturbation torque. In contrast, $\alpha 2$ is not actuated, making its response more susceptible to unknown neural system noise. Moreover, $\alpha 2$ exhibits smaller response amplitudes than $\alpha 1$, resulting in a lower signal-to-noise ratio.

We acknowledge the capability of subjects to learn environmental dynamics. However, we decided on modeling through pure feedback pathways, cautious of the potential for overfitting with the inclusion of feedforward paths [31]. Furthermore, the learning of environmental dynamics by subjects does not represent clinically significant differences [32].

The gain-penalty ILQR uses the output $y$ to estimate $\rho $ (including $K$) initially, then optimizes $Q$ and $R$ through the estimated $K$, i.e., $K\u0302$ [15]. In the processes of estimating *K* first and obtaining $Q$ and $R$ later from $K\u0302$, the parameter estimation error in $K\u0302$ can propagate. On the other hand, the output-penalty ILQR, we mainly use $y$ to obtain $Q$ and $R$ minimizing the error propagation, which results in a better goodness of fit in terms of VAF.

Despite these significant findings, there are some limitations to this study. First, unfortunately, we did not collect electromyography data during the test, which could have provided valuable insights into muscle activations. Novice strategies might have relied more on muscle co-contractions, as previous research has shown that performance declines with increasing muscle co-contraction [33]. Future studies should investigate if our observed kinematic patterns correspond to distinct electromyography patterns to test this hypothesis. Additionally, some cases were excluded due to poor fitting, particularly with regards to the upper body angle. Although we improved the fitting by introducing the weight $Ymax$ into the preliminary estimation and adopting the output-penalty ILQR, there is still room for improvement in model fitting. Future research should explore methods to enhance the accuracy of the model fitting process. Furthermore, our current model includes the physical human–robot interaction (2DoF pendulum) and linear feedback of positions and velocities, reflecting the proprioceptive feedback pathway. Muscle dynamics and sensory delay were not explicitly modeled. However, in our previous research [34], where we incorporated muscle dynamics and sensory delay, we found that the inferred intent was predominantly influenced by kinematic states rather than delay or muscle dynamics. Therefore, our current model is considered a good approximation.

## 4 Conclusion

An output-penalty ILQR was derived in this paper. We used the experimental data directly in the cost function, unlike a previous study with gain-penalty ILQR [15]. Our method can be applied not only to the estimation of human control intent but also to the reverse engineering of black box controllers. By employing the output-penalty ILQR method on the seated balance experimental data, we observed a meaningful enhancement in goodness of fit. Specifically, the average VAF improvement for $\alpha 1$ and $\alpha 2$ was 1.04% and 8.74%, respectively, representing an improvement ratio over the gain-penalty ILQR approach. There was no significant difference in the inferred (dominant) intent between the two methods. This was due to the starting point $Qs\u2009$of the algorithm. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain patients.

## Acknowledgment

The contents are solely the responsibility of the authors and do not necessarily represent the official views of NCCIH.

## Funding Data

National Center for Complementary and Integrative Health (NCCIH) at the National Institutes of Health (Grant No. U19AT006057; Funder ID: 10.13039/100008460).

National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00221762; Funder ID: 10.13039/501100003725).

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## References

*Kinetics of Human Motion*

*ANOVA: Repeated Measures*