Control Theory on 四方喫茶舘

Feedback Linearization, Part 3

Sun, 14 Jun 2026 14:51:26 +0800

Feedback Linearization Theorem

We talked about the feedback linearization theorem last time. As a recap:

Feedback Linearization Theorem: Nonlinear system $\Sigma: \dot{x} = f(x) + g(x)u$ is feedback linearizable if:

$[g(x), ad_fg(x), \ldots, ad_f^{n-1}g(x)]$ has rank $n$ $\forall x$.
$\Delta = \text{span}\{g(x), ad_fg(x), \ldots, ad_f^{n-2}g(x)\}$ is involutive.

The first condition guarantees controllability, while the second condition guarantees, by the Frobenius theorem, that we can always find an output $y = h(x)$ whose relative degree equals the system order $n$. Loosely speaking, the second condition plays a role similar to observability: with such an output, the full state can be recovered locally from $y$ and its derivatives.

We now look at some examples.

Consider the system

$$ \dot{x} = \begin{pmatrix} a \sin x_2 \\ -x_1^2 \end{pmatrix} + \begin{pmatrix}0 \\ 1 \end{pmatrix}u $$

We would like to ask 2 questions:

Is the system feedback linearizable?
If so, how shall we find the output $y = h(x)$?

To answer the first question, we first check whether the first condition of the feedback linearization theorem is met.

$$ g(x) = \begin{pmatrix}0 \\ 1 \end{pmatrix} $$

$$ad_fg = [f, g] = \begin{pmatrix} -a \cos x_2 \\ 0 \end{pmatrix}$$

Therefore,

$$ [g(x), ad_fg(x)] = \begin{pmatrix}0 & -a \cos x_2 \\ 1 & 0 \end{pmatrix} $$

This matrix has rank 2 wherever $\cos x_2 \neq 0$ (assuming $a \neq 0$); it loses rank only where $\cos x_2 = 0$. The distribution $\Delta = \text{span}\{g(x)\}$ is generated by a single vector field, so it’s trivially involutive. Therefore we conclude the system is feedback linearizable on any region where $\cos x_2 \neq 0$.

Now, how shall we find the output $y$? We would like to find an output $y = h(x)$ such that it has relative degree of $2$, i.e.:

$$\begin{cases} \begin{align} \frac{\partial h}{\partial x} g(x) &= 0 \\ \frac{\partial L_f h}{\partial x} g(x) &\neq 0 \end{align} \end{cases} $$

The first PDE will give us

$$ \frac{\partial h}{\partial x_2} = 0 $$

meaning $h(x)$ shall be independent of $x_2$. We sub this fact into the second PDE:

$$ \frac{\partial L_f h}{\partial x} g(x) = \frac{\partial L_fh}{\partial x_2} = \frac{\partial h}{\partial x_1}a \cos x_2 \neq 0$$

Therefore, we can pick a few candidates for $h(x)$, for example $x_1$ or $x_1^5$ (the latter only away from $x_1 = 0$, where its derivative vanishes). If we pick $h(x) = x_1$, then we can linearize the system as

$$ \ddot{y} = v$$

where the state and control transform is given by

$$ \begin{cases} \begin{align} y &= x_1 \\ \dot{y} &= a\sin x_2 \\ u &= x_1^2 + \frac{v}{a\cos x_2} \end{align} \end{cases} $$

The reader is encouraged to verify the linearization by substituting the transforms back into the original system.
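The substitution can also be checked symbolically. The following is a minimal sketch, assuming SymPy is available; it uses the control transform $u = x_1^2 + v / (a \cos x_2)$ and the output $h(x) = x_1$, and the variable names are only illustrative.

```python
import sympy as sp

x1, x2, a, v = sp.symbols("x1 x2 a v", real=True)
state = sp.Matrix([x1, x2])

# Drift and input vector fields of the example system.
f = sp.Matrix([a * sp.sin(x2), -x1**2])
g = sp.Matrix([0, 1])

# Chosen output and control transform (valid where cos(x2) != 0).
h = x1
u = x1**2 + v / (a * sp.cos(x2))

# Closed-loop state derivative after applying the control transform.
x_dot = f + g * u

# Differentiate the output along the closed-loop dynamics, twice.
y_dot = sp.simplify((sp.Matrix([h]).jacobian(state) * x_dot)[0])
y_ddot = sp.simplify((sp.Matrix([y_dot]).jacobian(state) * x_dot)[0])

print("y_dot  =", y_dot)
print("y_ddot =", y_ddot)
```

If the transform is right, the second derivative of the output collapses to the new input $v$, which is exactly the double integrator $\ddot{y} = v$.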

As a result, we are able to design a linear control between $y$ and $v$ by LQR or pole placement, and we utilize the state and control transform to convert the system back into the original nonlinear system.

MIMO Feedback Linearization

We now move forward to a more complex and generalized system: the multi-input multi-output nonlinear system. For the sake of simplicity, we limit the MIMO to be the square case (meaning we have the same number of inputs and outputs).

Consider a square MIMO system with $m$ inputs and $m$ outputs:

$$ \begin{align} \Sigma: \dot{x} &= f(x) + \sum_{i=1}^m g_i(x) u_i, \quad x \in \mathbb{R}^n \\ &= f(x) + g(x)u \\ y &= \begin{pmatrix} h_1(x) \\ \vdots \\ h_m(x) \end{pmatrix} \end{align} $$

where

$$ \begin{align} g(x) &= \begin{pmatrix} g_1(x) & \cdots & g_m(x) \end{pmatrix} \\ u &= \begin{pmatrix} u_1 \\ \vdots \\ u_m\end{pmatrix} \end{align} $$

The question now is: how shall we define the relative degree of the MIMO system?

Vector Relative Degree

We introduce the concept of vector relative degree in this case. (Definition) Vector Relative Degree: Nonlinear system $\Sigma$ has vector relative degree $(r_1, r_2, \ldots, r_m)$ at $x_0$ if:

For all $1 \le j \le m$, $1 \le i \le m$, $0 \le k \le r_i - 2$, $$ L_{g_j}L_f^kh_i(x) = 0, \quad \forall x \text{ in a neighborhood of } x_0 $$
The $m \times m$ matrix, also known as the Decoupling Matrix, $$ A(x) = \begin{pmatrix} L_{g_1}L_f^{r_1-1}h_1 & \cdots & L_{g_m}L_f^{r_1-1}h_1 \\ \vdots & \ddots & \vdots \\ L_{g_1}L_f^{r_m-1}h_m & \cdots & L_{g_m}L_f^{r_m-1}h_m \end{pmatrix} $$ is nonsingular at $x = x_0$.

Then, for the $i$-th output, we can always write $$ \begin{align} y_i^{(r_i)} &= L_f^{r_i} h_i(x)+ L_{g_1}L_f^{r_i-1}h_i(x)u_1 + \cdots + L_{g_m}L_f^{r_i-1}h_i(x)u_m \\ &= L_f^{r_i} h_i(x) + \sum_{j=1}^m L_{g_j}L_f^{r_i-1}h_i(x)u_j \end{align} $$

If at least one $L_{g_j}L_f^{r_i-1}h_i(x)$ is non-zero, then at least one input appears explicitly in $y_i^{(r_i)}$. Stacking all $m$ outputs, we can perform IO linearization:

$$ \begin{align} \begin{pmatrix} y_1^{(r_1)} \\ \vdots \\ y_m^{(r_m)} \end{pmatrix}&=\begin{pmatrix} L_f^{r_1}h_1(x) \\ \vdots \\ L_f^{r_m}h_m(x) \end{pmatrix} + \begin{pmatrix} L_{g_1}L_f^{r_1-1}h_1(x) & \cdots & L_{g_m}L_f^{r_1-1}h_1(x) \\ \vdots & \ddots & \vdots \\ L_{g_1}L_f^{r_m-1}h_m(x) & \cdots & L_{g_m}L_f^{r_m-1}h_m(x) \end{pmatrix} \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} \\ &= L_f^r h(x) + A(x) u \end{align} $$

where $L_f^r h(x)$ denotes the stacked vector of $L_f^{r_i}h_i(x)$; notice that we implicitly extended the definition of Lie derivative to its vector form. Since $A(x)$ is nonsingular, the control can be transformed as:

$$ u(x) = A^{-1}(x)(-L_f^r h(x)+ v) \Rightarrow \begin{pmatrix} y_1^{(r_1)} \\ \vdots \\ y_m^{(r_m)} \end{pmatrix} = v $$

MIMO Feedback Linearization Theorem

Now we state the feedback linearization theorem in MIMO form: Theorem (MIMO Feedback Linearization): A MIMO nonlinear system $\Sigma$ with a well-defined vector relative degree $r = (r_1, r_2, \ldots, r_m)$ is:

feedback linearizable, if its vector relative degree satisfies $$ r_1 + \ldots + r_m = \sum_{i=1}^m r_i = n$$
only IO linearizable, if the sum satisfies $$ r_1 + \ldots + r_m = \sum_{i=1}^m r_i < n$$ In this case we have to rely on the internal zero dynamics also being stable in order for the full system to be stable.

Examples

Consider the motion of a wheeled vehicle moving in a horizontal plane. The kinematics of the vehicle are given by the differential equations:

$$ \begin{align} \dot{x} &= V \cos \theta \\ \dot{y} &= V \sin \theta \\ \dot{\theta} &= \omega \end{align} $$

Here $(x, y)$ is the location in the horizontal 2D plane, $V$ is the vehicle speed, and $\theta$ denotes the vehicle heading angle, and $\omega$ denotes the vehicle turning rate.

Ill-defined Vector Relative Degree

If we consider the vehicle speed $V$ and the vehicle turning rate $\omega$ as two control inputs, and the vehicle locations in the plane as two outputs, the vector relative degree is not well-defined. We notice the system now looks like:

$$ \begin{align} \frac{d}{dt}\begin{pmatrix} x \\ y \\ \theta \end{pmatrix} &= \begin{pmatrix}u_1 \cos \theta \\ u_1 \sin \theta \\ u_2 \end{pmatrix} \\ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \begin{pmatrix} x \\ y \end{pmatrix} \end{align} $$

If we take the first time derivative of the outputs

$$ \begin{align} \frac{d}{dt}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u_1 \cos \theta \\ u_1 \sin \theta \end{pmatrix} \end{align} $$

Only the first control input shows up, which is a red flag. Indeed, the decoupling matrix

$$ A(x) = \begin{pmatrix} \cos \theta & 0 \\ \sin \theta & 0 \end{pmatrix} $$

is singular for every $\theta$. Therefore the vector relative degree in this case is not well defined.

Well-defined Vector Relative Degree

If we now consider the vehicle acceleration and the vehicle turning rate as the two control inputs, and we still use the vehicle position as the two outputs, this time the vector relative degree is actually well-defined for this 4th-order nonlinear system, so long as $V \neq 0$. We have the original system expressed as:

$$ \begin{align} \frac{d}{dt} \begin{pmatrix} x \\ y \\ \theta \\ V \end{pmatrix} &= \begin{pmatrix} V \cos \theta \\ V \sin \theta \\ u_2 \\ u_1 \end{pmatrix} \\ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \begin{pmatrix} x \\ y \end{pmatrix} \end{align} $$

If we take the first time derivative of the output vector:

$$ \frac{d}{dt}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} V \cos \theta \\ V \sin \theta \end{pmatrix} $$

We realize that neither input shows up explicitly, therefore we take another round of differentiation:

$$ \frac{d^2}{dt^2}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} u_1 \cos \theta - u_2 V \sin \theta \\ u_1 \sin \theta + u_2 V \cos \theta \end{pmatrix} $$

Now, both inputs show up which is a good sign. We verify this by considering the decoupling matrix:

$$ A(x) = \begin{pmatrix} \cos \theta & -V \sin \theta \\ \sin \theta & V \cos \theta \end{pmatrix} $$

And the determinant is given by

$$ \text{det}(A(x)) = V $$

We now realize that the decoupling matrix is non-singular, as long as the speed is non-zero. Therefore the vector relative degree is well-defined.
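The decoupling-matrix computation can be reproduced symbolically. This is a sketch under the assumption that SymPy is available; the helper `lie` and the variable names are illustrative and not part of the original derivation.

```python
import sympy as sp

x, y, theta, V = sp.symbols("x y theta V", real=True)
state = sp.Matrix([x, y, theta, V])

# Drift and input vector fields of the 4th-order vehicle model.
f = sp.Matrix([V * sp.cos(theta), V * sp.sin(theta), 0, 0])
g1 = sp.Matrix([0, 0, 0, 1])  # u1: acceleration
g2 = sp.Matrix([0, 0, 1, 0])  # u2: turning rate
outputs = [x, y]


def lie(vector_field, scalar):
    """Lie derivative of a scalar function along a vector field."""
    return (sp.Matrix([scalar]).jacobian(state) * vector_field)[0]


# L_{g_j} h_i: all zero, so no input appears in the first derivative.
first_order = sp.Matrix([[lie(g, h) for g in (g1, g2)] for h in outputs])

# L_{g_j} L_f h_i: the decoupling matrix for r_1 = r_2 = 2.
A = sp.Matrix([[lie(g, lie(f, h)) for g in (g1, g2)] for h in outputs])
det_A = sp.simplify(A.det())

print(first_order)
print(A)
print(det_A)
```

The first matrix being zero confirms that one differentiation is not enough, and the determinant shows where the vector relative degree $(2, 2)$ breaks down.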

Control Law

Given that $r_1 = r_2 = 2$ in this case, and we satisfy $r_1 + r_2 = 4 = n$, we can find a state transformation and a control transformation so that the original system can be feedback linearized. We take the state transformation as

$$ \begin{pmatrix} \xi_1 = y_1 = x\\ \xi_2 = y_2 = y\\ \xi_3 = \dot{y}_1 = V \cos \theta \\ \xi_4 = \dot{y}_2 = V \sin \theta \end{pmatrix} $$

And we can derive the control transformation as

$$ \begin{pmatrix} v_1 = u_1 \cos \theta - u_2 V \sin \theta \\ v_2 = u_1 \sin \theta + u_2 V \cos \theta \end{pmatrix} $$

and the resulting system now looks like

$$ \begin{align} \begin{cases} \dot{\xi_1} &= \xi_3 \\ \dot{\xi_2} &= \xi_4 \\ \dot{\xi_3} &= v_1 \\ \dot{\xi_4} &= v_2 \end{cases} \end{align} $$

This is happily a pair of decoupled double integrators, which can be stabilized by pole placement or LQR.
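To make the control law concrete, here is a minimal simulation sketch in plain Python. It inverts the decoupling matrix $A(x)$ in closed form and closes an outer loop $v = -k_1 \xi_{pos} - k_2 \xi_{vel}$ on each double integrator. The gains, initial state, step size, and function names are illustrative assumptions, and forward Euler is used only for brevity.

```python
import math


def linearizing_control(theta, V, v1, v2, eps=1e-9):
    """Solve A(x) u = v for u = (u1, u2), where
    A = [[cos(theta), -V sin(theta)], [sin(theta), V cos(theta)]]."""
    if abs(V) < eps:
        raise ValueError("decoupling matrix is singular at V = 0")
    c, s = math.cos(theta), math.sin(theta)
    u1 = c * v1 + s * v2           # acceleration command
    u2 = (-s * v1 + c * v2) / V    # turning-rate command
    return u1, u2


def simulate(k1=1.0, k2=2.0, dt=1e-3, steps=10_000):
    """Forward-Euler simulation of the vehicle under feedback linearization."""
    x, y, theta, V = 1.0, -1.0, 0.0, 1.0
    for _ in range(steps):
        x_dot, y_dot = V * math.cos(theta), V * math.sin(theta)
        # Outer linear loop on the two double integrators.
        v1 = -k1 * x - k2 * x_dot
        v2 = -k1 * y - k2 * y_dot
        u1, u2 = linearizing_control(theta, V, v1, v2)
        x, y = x + dt * x_dot, y + dt * y_dot
        theta, V = theta + dt * u2, V + dt * u1
    return x, y


final_x, final_y = simulate()
print(final_x, final_y)
```

Note that regulating the position to a fixed point drives the speed toward zero, which is exactly where the decoupling matrix becomes singular; the sketch stops after a finite horizon for that reason, and a tracking task with non-zero speed avoids the issue.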

Feedback Linearization, Part 2

Fri, 12 Jun 2026 19:33:35 +0800

More Facts about IO Linearization

We are now aware of how to perform input-output linearization. To summarize:

For an output with relative degree $r$, we are able to construct a feedback linearization mapping such that the input-output linearized system is of order $r$.
The remaining state will construct a “zero plane” $Z$ where the zero dynamics on the plane will determine the stability of the overall system.

Now, we can draw an obvious conclusion if the zero dynamic is indeed stable:

Theorem: If $z = 0$ is locally exponentially stable for the zero dynamics $\dot{z} = q(0, z)$, then the IO linearizing control $u_{IO}$ with $v = -K\xi$ locally exponentially stabilizes $x = 0$.

The proof is as follows:

Proof: The closed loop system is given by

$$ \begin{align} \dot{\xi} &= A_{CL} \xi, A_{CL} = A - BK \\ \dot{z} &= q(\xi, z) \end{align} $$

where

$$ A_{CL} = \begin{pmatrix} 0 & 1 & 0 &\ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -k_1 & -k_2 & -k_3 & \ldots & -k_r \end{pmatrix} $$

where $K$ is chosen such that the eigenvalues $\lambda_i$ of $A_{CL}$ satisfy $\Re\{\lambda_i\} < 0$ for all $i = 1, \ldots, r$. If we linearize the system at $\xi = z = 0$, we get the following:

$$ \frac{d}{dt} \begin{pmatrix} \delta \xi \\ \delta z \end{pmatrix} = \begin{pmatrix} A_{CL} & 0 \\ \frac{\partial q}{\partial \xi}(0, 0) & \frac{\partial q}{\partial z}(0, 0) \end{pmatrix} \begin{pmatrix} \delta \xi \\ \delta z \end{pmatrix} $$

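The matrix is block lower-triangular, so its eigenvalues are those of $A_{CL}$ together with those of $\frac{\partial q}{\partial z}(0, 0)$. The former lie in the open left half-plane by design, and the latter do as well because the zero dynamics are locally exponentially stable. The matrix is therefore Hurwitz, which completes the proof.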

We notice that the relationship between the new states $\xi$ and original states $x$ is such that

$$ \xi = T(x)$$

To verify if $T$ is a diffeomorphism, we introduce the following theorem:

Inverse Function Theorem: Let $T: \mathbb{R}^n \to \mathbb{R}^n$ with $T \in C^1$. If the Jacobian

$$ \frac{\partial T}{\partial x}(x_0) $$

is full rank, then $T^{-1}$ exists in a neighborhood of $T(x_0)$ and is continuously differentiable.

Now let’s connect IO linearization back to feedback linearization. It’s not hard to see that if we can guarantee the zero dynamic to be stable, then the original system will be stable; the best way to make sure the zero dynamic is always stable is that the zero plane shrinks to just a point, and this is done if $r = n$. Therefore we have the following theorem:

Isidori, chapter 4: If a nonlinear system $\Sigma$ has a relative degree $r$ at $x_0$, then on the neighborhood of $x_0$, the functions

$$ \{ h(x), L_fh(x), \ldots, L_f^{r-1}h(x) \} $$

are independent. Then, we can conclude that, $\Sigma$ is feedback linearizable, if and only if $\exists y = h(x)$ such that the output has relative degree $r = n$.

In short, if the output of the system has a relative degree equal to the system order, then the system is always linearizable. However, if the output has a relative degree smaller than the system order, it’s possible that we simply didn’t pick a good output. How do we know whether a system can in fact be feedback linearized? We’ll have to introduce some more new concepts to answer the question.

Introduction to Differential Geometry

Our audience may find themselves familiar with these concepts, if they have taken classes in general relativity.

Manifold

Let $M$ be a non-empty subset of $\mathbb{R}^n$ and let $1 \le m < n$. Then $M$ is an $m$-dimensional smooth Manifold in $\mathbb{R}^n$ if, $\forall p \in M$, $\exists r > 0$ and $F:B_r(p) \to \mathbb{R}^{n-m}$ such that:

$M \cap B_r(p) = \{ x \in B_r(p) \mid F(x) = 0 \}$
$ F \in C^\infty $
$ \forall \bar{x} \in M \cap B_r(p)$, $\text{rank} \frac{\partial F}{\partial x}(\bar{x}) = n - m $

Intuitively, a manifold is a shape that “embeds” into a Euclidean space. We can always find a local mapping (a “chart”; a collection of charts covering the manifold is called an “atlas”) that maps a neighborhood on the manifold to a region of a Euclidean space of the same dimension.

Some well-known manifolds are:

Circle, a 1D manifold in $\mathbb{R}^2$
Mobius strip, a 2D manifold in $\mathbb{R}^3$
Sphere, a 2D manifold in $\mathbb{R}^3$
Klein bottle, a 2D manifold in $\mathbb{R}^4$

Tangent Space

Let $M$ be an $m$-dimensional smooth manifold in $\mathbb{R}^n$ and let $p \in M$. Suppose $F: B_r(p) \to \mathbb{R}^{n-m}$ satisfies the conditions in the definition of $M$. Then the Tangent Space at $p$, denoted $T_pM$, is the null space of the Jacobian of $F$ at $p$:

$$ T_pM = \left\{ v \in \mathbb{R}^n \,\middle|\, \frac{\partial F}{\partial x}(p) v = 0 \right\} = \mathcal{N}\left(\frac{\partial F}{\partial x}(p)\right) $$

Note that, $\text{dim}(T_pM) = m$.

Tangent Vector

The Tangent Vector is a vector in tangent space.

[Figure: the relationship between a manifold, its tangent space, and a tangent vector.]

Vector Field

Vector Field $f$ on manifold $M$ is an assignment to each $p \in M$ a vector $f(p) \in T_pM$. Note that, the vector field is $C^k$ if $f \in C^k$.

Lie Bracket

Given two vector fields $f, g$, the Lie Bracket is defined as

$$ \begin{align} [f, g](x) &= \frac{\partial g}{\partial x}(x) f(x) - \frac{\partial f}{\partial x}(x) g(x) \\ &= L_fg - L_gf \\ \end{align} $$

The Lie bracket can also be expressed in terms of “adjoint” operator, i.e.:

$$ ad_f g(x) = [f, g](x) $$

We can use adjoint operator to express nested Lie brackets:

$$ \begin{align} ad_f^2g(x) &= [f, ad_f g(x)] \\ &= [f, [f, g]](x) \end{align} $$

In general, we have

$$ ad_f^kg(x) = [f, ad_f^{k-1}g(x)] $$

An example of Lie bracket calculation is as follows:

$$ \begin{align} f &= \begin{pmatrix} x_2 \\ -\sin x_1 - x_2 \end{pmatrix} \\ g &= \begin{pmatrix} 0 \\ x_1 \end{pmatrix} \\ [f, g](x) &= L_fg - L_gf \\ &= \begin{pmatrix} 0 \\ x_2 \end{pmatrix} - \begin{pmatrix} x_1 \\ -x_1 \end{pmatrix} \\ &= \begin{pmatrix} -x_1 \\ x_2 + x_1 \end{pmatrix} \end{align} $$
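The bracket computation above can be sketched in a few lines. This assumes SymPy is available; `lie_bracket` is an illustrative helper name that implements the definition $[f, g] = \frac{\partial g}{\partial x} f - \frac{\partial f}{\partial x} g$.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
state = sp.Matrix([x1, x2])


def lie_bracket(f, g, state):
    """[f, g] = (dg/dx) f - (df/dx) g for column-vector fields f and g."""
    bracket = g.jacobian(state) * f - f.jacobian(state) * g
    return bracket.applyfunc(sp.simplify)


f = sp.Matrix([x2, -sp.sin(x1) - x2])
g = sp.Matrix([0, x1])

bracket = lie_bracket(f, g, state)
print(bracket)
```

The same helper can be used to spot-check the properties listed next, for example that swapping the arguments flips the sign.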

Some useful properties of Lie bracket:

$[f, f ] = 0$
$[f, g] = -[g, f]$
If $f$ and $g$ are constant vectors, then $[f, g] = 0$.

Now let’s consider a linear system $\dot{x} = Ax + Bu$, if we express this in terms of control-affine system form, we have

$$\begin{align} \dot{x} &= f(x) + g(x)u \\ f(x) &= Ax \\ g(x) &= B \end{align} $$

and the iterated Lie brackets are

$$ \begin{align} ad_fg &= -AB \\ ad_f^2g &= A^2B \\ ad_f^3g &= -A^3B \\ &\vdots \\ ad_f^kg &= (-1)^k A^k B \end{align} $$

Tangent Bundle

The Tangent Bundle of a manifold $M$ is defined as

$$ TM = \bigcup_{p \in M} T_pM $$

That is, it’s the “bundle” of all tangent spaces at each point in the manifold.

Distribution

Suppose $f_1, f_2, \ldots, f_n$ are vector fields, the Distribution is defined as

$$ \Delta (x) = \text{span}\{f_1(x), f_2(x), \ldots, f_n(x)\} $$

where at each specific point $x$, $\Delta(x)$ is a subspace of the tangent space $T_xM$.

$\Delta$ is non-singular distribution if $\text{dim}(\Delta(x))$ is a constant $\forall x$.
$\Delta$ is involutive if $$ \forall f, g \in \Delta \Rightarrow [f, g] \in \Delta $$

Let’s consider the following example:

$$ \begin{align} f_1 &= \begin{pmatrix} 2x_2 \\ 1 \\ 0 \end{pmatrix} \\ f_2 &= \begin{pmatrix} 1 \\ 0 \\ x_2 \end{pmatrix} \\ \Delta &= \text{span}\{f_1, f_2\} \end{align} $$

Because $\text{dim}\Delta(x) = 2$ for all $x$, the distribution $\Delta$ is non-singular.

$\Delta$ is involutive if and only if

$$ [f_1, f_2] = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \in \Delta $$

that is, if and only if $\text{rank}\begin{pmatrix} f_1 & f_2 & [f_1, f_2] \end{pmatrix} = 2 \quad \forall x$. Unfortunately the rank is 3 (the determinant of this matrix is $-1$ for all $x$), therefore $\Delta$ is not involutive.
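The involutivity test for this example can be sketched as a rank comparison. This assumes SymPy is available; note that SymPy's symbolic `rank()` returns the generic rank, so isolated singular points would need a separate check.

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3", real=True)
state = sp.Matrix([x1, x2, x3])

f1 = sp.Matrix([2 * x2, 1, 0])
f2 = sp.Matrix([1, 0, x2])

# [f1, f2] = (df2/dx) f1 - (df1/dx) f2
bracket = f2.jacobian(state) * f1 - f1.jacobian(state) * f2

# Delta is involutive iff adding the bracket does not raise the rank.
rank_delta = sp.Matrix.hstack(f1, f2).rank()
extended = sp.Matrix.hstack(f1, f2, bracket)
rank_extended = extended.rank()

print(bracket.T, rank_delta, rank_extended, extended.det())
```

Here the rank jumps from 2 to 3, so the bracket leaves the distribution and $\Delta$ fails the involutivity test.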

Feedback Linearizability

With all these mathematical definitions, we are finally able to determine whether a nonlinear system can actually be feedback linearized, using the following theorem:

A nonlinear system $\Sigma$ is feedback linearizable if and only if:

$[g(x), ad_fg(x), \ldots, ad_f^{n-1}g(x) ]$ has rank $n$, $\forall x$. This condition guarantees controllability.

$\Delta = \text{span}\{g, ad_fg, \ldots, ad_f^{n-2}g\}$ is involutive.

If we are able to determine whether the system is feedback linearizable, the next step will be to look for the specific output with relative degree $n$. From our earlier discussion, we are looking for a function $y = h(x)$ such that it meets the following conditions:

$$ \begin{cases} L_gh = L_gL_fh = \ldots = L_gL_f^{n-2}h = 0, \quad \forall x \\ L_gL_f^{n-1}h \neq 0 \end{cases} $$

In fact, these two conditions are equivalent to the following two conditions:

$$ \begin{cases} L_gh = L_{ad_fg}h = \ldots = L_{ad_f^{n-2}g}h = 0, \quad \forall x \\ L_{ad_f^{n-1}g}h \neq 0 \end{cases} $$

The advantage of the latter formulation is that we can write the first condition as:

$$ \frac{\partial h}{\partial x} \begin{pmatrix} g(x) & ad_fg(x) & \cdots & ad_f^{n-2}g(x) \end{pmatrix} = 0 $$

The important fact here is that a solution of this partial differential equation exists only if $\Delta = \text{span}\{g, ad_fg, \ldots, ad_f^{n-2}g\}$ is involutive, according to the [Frobenius theorem](https://en.wikipedia.org/wiki/Frobenius_theorem_(differential_topology)).

To prove that the two conditions are indeed equivalent, we use the following lemma:

Lemma: Given $L_gh = L_gL_f h = \ldots = L_gL_f^{n-2}h = 0$ for all $x \in B_\delta (x_0)$, we have
$$L_gL_f^kh = (-1)^k L_{ad_f^kg}h, \quad \forall k = 0,1,\ldots, n-1$$

This lemma can be proven using induction, and we skip the full proof here.

In this chapter, we discussed the condition for a nonlinear system to be fully feedback linearizable. In the final chapter, we’ll give some examples and extend to multi-input multi-output case.

Feedback Linearization, Part 1

Thu, 11 Jun 2026 21:38:58 +0800

Feedback linearization is a seemingly obvious but powerful technique in control theory that transforms a nonlinear system into a linear one through state and input feedback.

I learned this technique when I was taking MEC237 from Berkeley, and I later realized it’s actually pretty useful and one of the most universal techniques in nonlinear control.

Motivation

Let’s consider a first-order nonlinear system:

$$ \dot{x} = x^3 + u $$

Where $x$ is our internal state, and $u$ is our control input. If the system has no control, the equilibrium $x = 0$ is unstable. Intuitively, if $x$ is greater than 0, $\dot{x}$ is also greater than 0, pushing the state away from the equilibrium point, and vice versa if $x < 0$.

One caveat here is that we can’t use Lyapunov’s indirect method to conclude instability, because the Jacobian at $x = 0$ is 0, and nothing can be concluded from a zero eigenvalue.

How shall we use the control input to stabilize the system? Let’s consider the input $u = -x^3 - x$ and substitute it into the original system:

$$ \dot{x} = x^3 + (-x^3 - x) = -x $$

This is now a negative feedback system with eigenvalue strictly negative, and we can therefore conclude stability.

What did we make the control input do? We use a nonlinear term $-x^3$ in the control input to cancel the original system’s unstable term, and introduce another stabilizing linear term $-x$ to ensure stability. The mechanism where we use feedback to achieve a stable linear system is called Feedback Linearization.
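A quick numerical experiment illustrates the difference. The sketch below uses forward Euler in plain Python; the initial condition, step size, horizon, and the escape threshold used to detect blow-up are illustrative assumptions.

```python
def simulate(x0, controller, dt=1e-3, steps=5000, escape=1e6):
    """Integrate x_dot = x^3 + u(x) with forward Euler.

    Returns the final state, or infinity if |x| exceeds `escape`
    (the uncontrolled system blows up in finite time).
    """
    x = x0
    for _ in range(steps):
        x += dt * (x**3 + controller(x))
        if abs(x) > escape:
            return float("inf")
    return x


def no_control(x):
    return 0.0


def feedback_linearizing(x):
    # Cancel the cubic term, then add a stabilizing linear term.
    return -x**3 - x


x_open = simulate(0.5, no_control)
x_closed = simulate(0.5, feedback_linearizing)
print(x_open, x_closed)
```

With the cancelling control the state decays like $e^{-t}$, while the uncontrolled state escapes before the end of the horizon.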

However, is this technique universal? The answer is no. Look at the following system:

$$ \begin{align} \begin{cases} \dot{x}_1 = a \sin x_2 \\ \dot{x}_2 = -x_1^2 + u \end{cases} \end{align} $$

In fact, no input $u$ alone can linearize both states $x_1$ and $x_2$. However, if we do a state transformation like below:

$$ \begin{align} z_1 &= x_1 \\ z_2 &= a \sin x_2 \end{align} $$

Then it’s not too hard to verify that the transformed system can actually be linearized. Thus by combining state transform and control transform, we are able to achieve feedback linearization.

Some Definitions

We now give some useful definitions to help understand the feedback linearization technique.

Control-Affine System

A system that has the following form is called a Control-Affine System:

$$ \dot{x} = f(x) + g(x)u $$

where $f(x)$ and $g(x)$ are smooth vector fields.

An example of a system that’s not control-affine is:

$$ \dot{x} = f(x) + g(x)u^2 $$

Diffeomorphism

In differential geometry, a diffeomorphism is a smooth invertible map between differentiable manifolds, whose inverse is also smooth. To express in mathematical language, such mapping $T$ satisfies $T \in C^1$ and $T^{-1} \in C^1$.

Feedback Linearizable

A nonlinear control-affine system $\Sigma: \dot{x} = f(x) + g(x)u$ is said to be feedback linearizable if there exists a control law $u = \alpha (x) + \beta (x)v$ and state transform $z = T(x)$, where $T$ is a diffeomorphism, such that the transformed system $\dot{z} = Az + Bv$ satisfies $(A, B)$ is controllable.

Lie Derivative

The Lie Derivative of a scalar function $h(x)$ along a vector field $f(x)$ is defined as:

$$ L_f h = \frac{\partial h}{\partial x} f(x) $$

We will see how Lie derivative helps us simplify some notations later.

Input-Output Linearization

There are cases where we can’t perform full feedback linearization, but we can still achieve input-output linearization.

Consider the same system that we defined earlier:

$$ \begin{cases} \dot{x}_1 = a \sin x_2 \\ \dot{x}_2 = -x_1^2 + u \\ y = x_2 \end{cases} $$

Note that now we assign the output $y$ to be only a function of $x_2$. Now, we can perform Input-Output Linearization with $u = x_1^2 + v$ such that:

$$ \dot{y} = \dot{x}_2 = -x_1^2 + x_1^2 + v = v $$

from there, we manage to achieve a linear relationship between the new control law $v$ and output $y$. However, since $x_1$ is an unobservable state from $y$, we can’t tell just from $y$ whether the inner system is stable – therefore it’s possible for the inner state to explode while the output shows nothing, causing system failure.

Now with all these definitions, we would like to answer the following questions:

When is a system feedback linearizable?
If not feedback linearizable, when is the system IO linearizable?
Is there a connection between IO linearization and feedback linearization?

Relative Degree of Output $y$

Let’s consider a general SISO control-affine system with an output:

$$ \begin{align} \dot{x} &= f(x) + g(x)u \\ y &= h(x) \end{align} $$

Where $f, g, h$ are sufficiently smooth.

We notice that $y$ is not directly a function of $u$, because there is no direct control term in $h(x)$. Let’s take the derivative of $y$:

$$ \begin{align} \dot{y} &= \frac{\partial h(x)}{\partial x} \dot{x} \\ &= \frac{\partial h(x)}{\partial x} (f(x) + g(x)u) \\ &= L_f h(x) + L_g h(x) u \end{align} $$

Note that we used the Lie derivative notation.

Now assume that $L_g h(x) \neq 0$, then we have a direct term $u$ in $\dot{y}$. We can therefore make $u$ such that:

$$ u = L_g h(x) ^ {-1} (-L_f h(x) + v) $$

and so that:

$$ \dot{y} = v $$

What if $L_g h(x) = 0$? In that case, $u$ doesn’t appear in the first derivative of $y$:

$$ \dot{y} = L_f h(x) $$

But no worries, we can take another derivative operation:

$$ \ddot{y} = L_f^2 h(x) + L_gL_fh(x)u $$

Note that, $L_aL_bc(x) = L_a(L_b c(x))$. Suppose we have $L_gL_fh(x) \neq 0$, then we can again IO linearize the system as:

$$ u = L_gL_f h(x)^{-1}[-L_f^2 h(x) + v] $$

and thus:

$$ \ddot{y} = v $$

If we continue doing this, we will arrive at step $r$:

$$ y^{(r)} = L_f^rh(x) +L_gL_f^{r-1}h(x)u $$

And if we have $L_gL_f^{r-1}h(x) \neq 0$, we can make it such that:

$$ u = L_gL_f^{r-1}h(x)^{-1}[-L_f^r h(x) + v] $$

and therefore

$$ y^{(r)} = v $$

In this case, the IO linearized system is an $r^{th}$ order linear system.

We now give the definition of $r$:

A SISO system $\dot{x} = f(x) + g(x)u, y = h(x)$ has relative degree $r$ with respect to the output $y = h(x)$ around $x_0$ if:

$\forall 0 \le k < r-1$, $L_gL_f^kh(x) = 0$, $\forall x$ in a neighborhood of $x_0$.
$L_gL_f^{r-1}h(x) \neq 0$, $\forall x$ in a neighborhood of $x_0$.

Let’s look at some examples.

$$ \begin{align} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= -x_1^3 + u \\ y &= x_1 \end{align} $$

It’s obvious that the relative degree is not 0 because $u$ doesn’t show up directly in $y$. We take the first derivative of $y$:

$$ \dot{y} = x_2$$

Still no $u$. Differentiate again:

$$ \ddot{y}= -x_1^3 + u$$

Now $u$ shows up, therefore the relative degree of $y$ is 2. Note that the coefficient of $u$ is always a well-defined 1, therefore output $y$ has relative degree 2 everywhere in $\mathbb{R}^2$.

$$ \begin{align} \dot{x}_1 &= x_2 + x_3^3 \\ \dot{x}_2 &= x_3 \\ \dot{x}_3 &= u \\ y &= x_1 \end{align} $$

We differentiate $y$ twice:

$$ \dot{y} = x_2 + x_3^3, \quad \ddot{y} = x_3 + 3x_3^2 u $$

Now we realize that $y$ doesn’t have a well-defined relative degree around $x_3 = 0$, and has a relative degree of 2 anywhere else.
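The repeated differentiation can be automated. The sketch below, assuming SymPy is available, walks through $L_g L_f^{k} h$ until the input coefficient is not identically zero; `relative_degree` and the example names are illustrative. It only detects a generic relative degree, so points where the returned coefficient vanishes still have to be excluded by hand.

```python
import sympy as sp


def lie_derivative(scalar, vector_field, state):
    """L_f h = (dh/dx) f for a scalar h and a column-vector field f."""
    return (sp.Matrix([scalar]).jacobian(state) * vector_field)[0]


def relative_degree(f, g, h, state):
    """Return (r, L_g L_f^{r-1} h) for the first non-zero input coefficient,
    or (None, 0) if the input never appears within n differentiations."""
    lf_h = h
    for r in range(1, len(state) + 1):
        coeff = sp.simplify(lie_derivative(lf_h, g, state))
        if coeff != 0:
            return r, coeff
        lf_h = lie_derivative(lf_h, f, state)
    return None, sp.Integer(0)


x1, x2, x3 = sp.symbols("x1 x2 x3", real=True)

# First example: x1' = x2, x2' = -x1^3 + u, y = x1.
state_a = sp.Matrix([x1, x2])
r_a, coeff_a = relative_degree(
    sp.Matrix([x2, -x1**3]), sp.Matrix([0, 1]), x1, state_a)

# Second example: x1' = x2 + x3^3, x2' = x3, x3' = u, y = x1.
state_b = sp.Matrix([x1, x2, x3])
r_b, coeff_b = relative_degree(
    sp.Matrix([x2 + x3**3, x3, 0]), sp.Matrix([0, 0, 1]), x1, state_b)

print(r_a, coeff_a)
print(r_b, coeff_b)
```

For the second system the coefficient depends on $x_3$, which is how the sketch exposes the loss of relative degree at $x_3 = 0$.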

Let’s try to apply the concept to our familiar linear system:

$$ \begin{align} \dot{x} &= Ax + Bu \\ y &= Cx \end{align} $$

If we differentiate $y$:

$$ \dot{y} = CAx + CBu $$

Now, if $CB = 0$, we’ll have to differentiate again:

$$ \ddot{y} = CA^2x + CAB u $$

continue doing this, we have relative degree $r$ if

$ CB = CAB = \ldots = CA^{r-2}B = 0$
$ CA^{r-1}B \neq 0$

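Isn’t the quantity $CA^{r-1}B$ familiar? The products $CA^kB$ are the Markov parameters of the linear system, and they are exactly the entries of the product of the observability matrix and the controllability matrix. This ties the relative degree to the same controllability and observability structure that underlies the Kalman decomposition.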
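For a linear SISO system the test reduces to finding the first non-zero Markov parameter. A minimal sketch, assuming NumPy is available; `linear_relative_degree` and the tolerance are illustrative choices.

```python
import numpy as np


def linear_relative_degree(A, B, C, tol=1e-12):
    """Smallest r with C A^(r-1) B != 0, or None if none exists for r <= n."""
    n = A.shape[0]
    CAk = C.copy()
    for r in range(1, n + 1):
        markov = CAk @ B          # C A^(r-1) B
        if np.abs(markov).max() > tol:
            return r
        CAk = CAk @ A
    return None


# Double integrator: x1' = x2, x2' = u.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

r_position = linear_relative_degree(A, B, np.array([[1.0, 0.0]]))  # y = x1
r_velocity = linear_relative_degree(A, B, np.array([[0.0, 1.0]]))  # y = x2
print(r_position, r_velocity)
```

Measuring position requires two differentiations before the input appears, while measuring velocity requires only one.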

Zero Dynamics

A fact regarding the relative degree $r$:

$r$ is always less than or equal to the order of the system $n$. If we differentiate $y$ $n$ times without $u$ showing up, the relative degree is usually undefined.

Now, for the IO linearized system $y^{(r)} = v$, we can choose the state vector:

$$ z = \begin{pmatrix} y \\ \dot{y} \\ \vdots \\ y^{(r-1)} \end{pmatrix} \in \mathbb{R}^r $$

Thus, we will arrive at:

$$ \dot{z} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} z + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} v $$

If you have read another article of mine, Mason’s Gain Formula and Control Canonical Forms, you’ll recognize that this system is in controllable canonical form, so the pair $(A, B)$ is always controllable; here $A$ is a single Jordan block with eigenvalue 0. Therefore, we can always define a feedback control law

$$ v = -Kz$$

such that

$$ \dot{z} = (A - BK)z $$

is always stable, or

$$\Re(\lambda(A - BK)) < 0$$

If we convert $v$ back in terms of $x$, we will get

$$ v = -k_1 h(x) - k_2 L_f h(x) - \ldots - k_r L_f^{r-1} h(x) $$

Now, if $z(t) \to 0$ as $ t \to \infty$, then $y \to 0$, $\dot{y} \to 0$, etc., so we can guarantee that the output $y$ converges to zero. But how about $x$? Is the original system stable? This leads to the discussion of zero dynamics.

If we define the set $Z = \{x \in \mathbb{R}^n : h(x) = \dot{h}(x) = \ldots = h^{(r-1)}(x) = 0\}$, then $Z$ is called the zero-dynamics surface of the system. It captures the part of the state that does not show up explicitly in the output $y$, i.e. the part that is unobservable from $y$.

Note that the dimension of the zero dynamics set is $n - r$.

What we did for IO linearization is the following:

We construct the surface $Z$ with dimension $n - r$.
We make $Z$ attractive, i.e. we let $x$ approach the surface asymptotically.
We also make $Z$ invariant, i.e. $x$ never leaves the surface once it’s on the surface.

However, whether the dynamics on the surface is stable dictates whether the original system $x$ is stable. The dynamic on the surface is also known as the zero dynamics. Let’s take an example to illustrate the zero dynamics. Consider the following system:

$$ \begin{align} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= \alpha x_3 + u \\ \dot{x}_3 &= \beta x_3 - u \\ y &= x_1 \end{align} $$

It’s easy to get relative degree of 2 for output $y$ because

$$ \ddot{y} = \alpha x_3 + u $$

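Now, suppose when $t \to \infty$, both $y$ and $\dot{y}$ approach zero. What happens to the state $x$? If $y = 0, \dot{y} = 0$, then $x_1 = x_2 = 0$. Staying on this surface also requires $\ddot{y} = \alpha x_3 + u = 0$, i.e. $u = -\alpha x_3$, and substituting this into the third equation we have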

$$ \dot{x}_3 = (\beta + \alpha) x_3 $$

Therefore, the zero dynamic on $x_3$ is stable if and only if $\beta + \alpha < 0$.
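The effect of the zero dynamics shows up clearly in simulation. The sketch below applies the IO linearizing control $u = -\alpha x_3 + v$ with $v = -k_1 y - k_2 \dot{y}$ and integrates with forward Euler in plain Python; the gains, initial state, and parameter values are illustrative assumptions.

```python
def simulate(alpha, beta, k1=1.0, k2=2.0, dt=1e-3, steps=10_000):
    """Closed-loop simulation; returns the final output y = x1 and hidden state x3."""
    x1, x2, x3 = 1.0, 0.0, 0.1
    for _ in range(steps):
        v = -k1 * x1 - k2 * x2      # outer linear loop on (y, y_dot)
        u = -alpha * x3 + v         # IO linearizing control, so y_ddot = v
        dx1, dx2, dx3 = x2, alpha * x3 + u, beta * x3 - u
        x1, x2, x3 = x1 + dt * dx1, x2 + dt * dx2, x3 + dt * dx3
    return x1, x3


# alpha + beta < 0: stable zero dynamics.
y_stable, x3_stable = simulate(alpha=1.0, beta=-3.0)
# alpha + beta > 0: unstable zero dynamics.
y_unstable, x3_unstable = simulate(alpha=1.0, beta=1.0)

print(y_stable, x3_stable)
print(y_unstable, x3_unstable)
```

In both runs the output converges, but in the second run $x_3$ grows without bound while $y$ shows nothing, which is the failure mode the zero dynamics analysis warns about.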

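In fact, in this case $x_3$ is the state that is hidden from the output $y$ once the linearizing feedback is applied. Just like in linear system theory, where a system can be stabilized when its uncontrollable modes are stable, here the entire system can be stabilized when this hidden state is stable.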

We discussed IO linearization in this article. In part 2, we are going to talk about actual feedback linearization.

Steady-state Error, part 1

Thu, 30 Apr 2026 22:09:36 +0800

Before we resume talking about why adding a capacitor solves the step input problem without solving the ramp input problem, let’s review some basic knowledge of linear systems.

Linear System Basics

Initial Value Theorem

For a function $f(t)$ and its Laplace transform $F(s)$, the initial value theorem states that

$$ \lim_{t \to 0^+} f(t) = \lim_{s \to \infty} sF(s) $$

Final Value Theorem

For a function $f(t)$ and its Laplace transform $F(s)$, the final value theorem states that, provided all poles of $sF(s)$ lie in the open left half-plane,

$$ \lim_{t \to \infty} f(t) = \lim_{s \to 0} sF(s) $$

Types of Inputs

There are a few basic types of inputs that we use to kickstart the system. Here is a summary table:

Input Type	Time Domain Response $f(t)$	Laplace Transform $F(s)$	Order
Impulse	$\delta(t)$	1	undefined
Step	$u(t)$	$\displaystyle\frac{1}{s}$	0
Ramp	$t \cdot u(t)$	$\displaystyle\frac{1}{s^2}$	1
Parabolic	$\displaystyle \frac{t^2}{2} \cdot u(t)$	$\displaystyle\frac{1}{s^3}$	2
…	…	…	…

Unity Feedback System

We call a system a Unity-Feedback System if the feedback path has a gain of 1.

System Types

For a unity feedback system, we assume the controller has a transfer function that looks like:

$$ \displaystyle H(s) = \frac{K(s-z_1)(s-z_2)\cdots(s-z_m)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n)} $$

That is, the controller has $m$ zeros and $n+k$ poles, of which $k$ are at the origin. We require $m < n+k$ so that the transfer function is strictly proper.

We define the type of the system as the number of pure integrators in $H(s)$. In our definition of $H(s)$, the type is $k$.

Now, if we sub $H(s)$ into the unity feedback system, using Mason’s Gain Formula, we have the closed-loop transfer function of the overall system:

$$ \begin{align} \displaystyle H_{cl}(s) &= \frac{H(s)}{1 + H(s)} \\ &= \frac{K(s-z_1)(s-z_2)\cdots(s-z_m)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

Therefore the transfer function from the input to the tracking error is given by:

$$ \begin{align} E(s) &= 1 - H_{cl}(s) \\ &= \frac{1}{1 + H(s)} \\ &= \frac{s^k(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

Now, suppose our input is type-$N$:

$$ \begin{align} F(s) &= \frac{1}{s^{N+1}} \end{align} $$

Then the resulting steady-state error is going to be:

$$ \begin{align} e(s) &= E(s) \cdot F(s) \\ &= \frac{s^{k-N-1}(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

If we apply final value theorem, we get

$$ \begin{align} \lim_{t \to \infty} e(t) &= \lim_{s \to 0} s \cdot e(s) \\ &= \lim_{s \to 0} \frac{s^{k-N}(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

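It’s not too hard to see that the term of interest is $s^{k-N}$. Assuming the closed loop is stable (so that the final value theorem applies), we conclude:

$$ \begin{cases} \lim_{t \to \infty} e(t) = 0 & \text{if } k > N \\ \lim_{t \to \infty} e(t) = \dfrac{1}{1 + H(0)} & \text{if } k = N = 0 \\ \lim_{t \to \infty} e(t) = \dfrac{1}{\lim_{s \to 0} s^k H(s)} & \text{if } k = N \geq 1 \\ \lim_{t \to \infty} e(t) = \infty & \text{if } k < N \end{cases} $$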

This is a pretty elegant result, in other words, the convergence of steady state error follows such conditions:

If the system type (k) is greater than the input type (N), the steady-state error is zero.
If the system type (k) is smaller than the input type (N), the steady-state error will diverge to infinity.
If the system type and input type are the same, the steady state error will converge to a non-zero number, which will be a finite fraction of the input, depending on the DC gain of the system.
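The three cases can be sketched as a small function that evaluates the limit directly. This is only a sketch under stated assumptions: the function name and argument layout are hypothetical, the poles and zeros are assumed to be real and nonzero, and the closed loop is assumed stable so that the final value theorem applies. In the equal-type case the constant depends on the poles and zeros as well as on $K$.

```python
from math import inf, prod

def steady_state_error(K, zeros, poles, k, N):
    """Steady-state error of a unity-feedback loop driven by 1/s^(N+1).

    The open-loop transfer function is assumed to be
        H(s) = K * prod(s - z_i) / (s^k * prod(s - p_i)),
    with real, nonzero z_i and p_i and a stable closed loop.
    """
    p0 = prod(-p for p in poles)       # prod(s - p_i) evaluated at s = 0
    z0 = K * prod(-z for z in zeros)   # K * prod(s - z_i) evaluated at s = 0
    if k > N:
        return 0.0                     # system type exceeds input type
    if k < N:
        return inf                     # system keeps falling behind
    if k == 0:
        return p0 / (p0 + z0)          # step into a type-0 loop: 1 / (1 + H(0))
    return p0 / z0                     # k = N >= 1: the s^k term vanishes at s = 0

# Hypothetical loop: H(s) = 10 / (s^k (s + 1))
for k, N in [(0, 0), (1, 0), (1, 1), (0, 1)]:
    print(f"k = {k}, N = {N}: e_ss = {steady_state_error(10.0, [], [-1.0], k, N)}")
```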

Another intuitive way to look at this: if the input is of a higher type than the system, the system cannot respond fast enough and keeps falling further behind the input; if the system type is higher, it catches up completely.

The problems with higher order system

Now if we take the original system we discussed last time when we added a capacitor, we realize that adding that capacitor helped us to increase the system type, and thus we are able to track the input better.

Here is the further question: what if the input is of type 2? Based off our discussion just now, we might just want to add another integrator so as to further increase the system type, like below:

This sounds plausible; unfortunately, this system fails. Why? Let’s simplify the system by assuming $g_m = 1$ and $C=1$. If we excite the system with a step input, then according to our discussion above it should be able to track it without any problem. First, the open-loop gain is given by:

$$ H_{ol} = \frac{1}{s^2} $$

Now, if we close the loop and calculate the closed-loop gain:

$$ H_{cl} = \frac{H_{ol}(s)}{1 + H_{ol}(s)} = \frac{1}{1+s^2} $$

With a step input, the output has a frequency-domain representation of:

$$ V_{out}(s) = H_{cl} \cdot \frac{1}{s} = \frac{1}{s(1+s^2)} $$

Now if we perform the inverse Laplace transform of the s-domain representation, we get:

$$ V_{out}(t) = 1 - \cos t $$

That is to say, we increased the system type hoping to improve the steady-state error, yet the system cannot even track a type-0 (step) input: the output oscillates forever instead of settling. What’s the problem here?

The reason is that by introducing another pole at the origin, we added another 90 degrees of phase lag, so the loop phase is $-180^\circ$ at all frequencies and the phase margin is 0. From another angle, the loop satisfies the Barkhausen criterion at $\omega = 1$ (unity loop-gain magnitude with 180 degrees of phase shift), which tells us immediately that the system oscillates.

The fix is to add damping to either of the integrators, which produces a zero in the forward gain and makes the phase margin positive.

The derivation is left to our audience if you would like to give it a try.
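As a starting point for that derivation, here is a small sketch comparing the closed-loop poles with and without damping. It assumes the damping is a resistor $R$ in series with one of the capacitors (our assumption; the text does not fix the topology), so that with $g_m = 1$ and $C = 1$ the open-loop gain becomes $H_{ol} = (1 + sR)/s^2$ and the closed-loop characteristic polynomial is $s^2 + Rs + 1$.

```python
import cmath

def closed_loop_poles(R):
    """Roots of s^2 + R*s + 1 = 0, the assumed closed-loop characteristic equation."""
    disc = cmath.sqrt(R * R - 4.0)   # complex square root handles R < 2
    return (-R + disc) / 2.0, (-R - disc) / 2.0

# R = 0 is the undamped double integrator; the other values are illustrative
for R in (0.0, 0.5, 2.0):
    p1, p2 = closed_loop_poles(R)
    print(f"R = {R}: poles = {p1:.3f}, {p2:.3f}")
```

With $R = 0$ the poles sit on the imaginary axis, which matches the sustained oscillation; any $R > 0$ moves them into the left half-plane.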

For the next part, we will further expand the damping concept and introduce a few compensation techniques we can use: lead, lag, and lead-lag compensation to improve the system response.

Mason's Gain Formula and Control Canonical Forms

Thu, 23 Apr 2026 00:28:37 +0800

Introduction

Pop quiz: what’s the transfer function $H(s) = V_{out} / V_{in}$ in the following circuit?

Assumptions:

$C_c$ is a big coupling capacitor.
No channel-length modulation.
You don’t have to solve for DC, all small signal parameters are given. Don’t assume unspecified parameters, for example $r_o$, $C_g$, etc.
The circuit is linear.

OK, this circuit does look a bit intimidating. Students in an entry-level analog circuits class might take out a pencil and grind through the analysis, but that is tedious, time-consuming and error-prone.

Look at the circuit, what is the main reason that makes analysis difficult? Feedback. Not just one feedback path, there are two feedbacks from each of the two stages rendering the overall analysis not so straightforward. However, we are going to introduce a very elegant mathematical tool to deal with all these kinds of closed-loop structures.

Mason’s Gain Formula

Samuel Jefferson Mason was born in 1921. As a distinguished electronics engineer, his most famous scientific contributions are Mason’s invariant and Mason’s rule, or Mason’s gain formula, both named after him.

Mason’s gain formula is used to find the transfer function of a closed-loop system. A closed-loop system doesn’t need to contain only one loop; it could contain multiple loops, and they can even interact with each other. The conventional algebraic approach usually requires solving a system of simultaneous equations, but Mason’s gain formula provides an easier way.

Mason’s gain formula is particularly suitable for a system that can be described using a Signal Flow Graph.

Signal Flow Graph (SFG)

A Signal Flow Graph (SFG) is a graphical representation of a system. As the name suggests, an SFG is a directed graph, meaning it has the following components:

Node: a node is a vertex that represents a variable in a system.
Branch: a branch is a directed edge that represents a transfer function between two nodes. It has a linear gain. If the gain is 1, we don’t annotate it on the SFG.
Input/Output: input and output nodes are special nodes that mark where the transfer function starts and ends.
Addition: Two signals could be added together, given SFG is targeting linear systems.

Now, with these simple definitions, we are able to construct more complex notations + structures:

Path: a path is a sequence of branches that connect nodes in the graph, such that no node is visited more than once.
- Forward Path: a forward path is a path from the input node to the output node.
- Path Gain: the product of the gains of all branches in a path.
Loop: a loop is a closed path that starts and ends at the same node, visiting no other node more than once.
- Loop Gain: the product of the gains of all branches in a loop.

Example: Type 2 PLL

Shown below is a type-2 PLL:

We are able to see 5 paths here and a simple loop. We defined 4 nodes, with 1 input node and 1 output node.

The Formula

Mason’s Gain Formula states the following:

Mason’s Gain Formula:
$$H(s) = \frac{\sum_{k=1}^{N} P_k \Delta_k}{\Delta}$$
where:

$N$ is the number of forward paths from input to output

$P_k$ is the path gain of the $k$-th forward path

$\Delta$ is the determinant of the system: $\Delta = 1 - \sum L_i + \sum L_i L_j - \sum L_i L_j L_k + \ldots$

$\sum L_i$ is the sum of all individual loop gains

$\sum L_i L_j$ is the sum of products of all pairs of non-touching loops

$\sum L_i L_j L_k$ is the sum of products of all triplets of non-touching loops

and so on…

$\Delta_k$ is the cofactor of the $k$-th forward path, obtained by removing all loops that touch the $k$-th path from $\Delta$
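The formula can be sketched numerically as follows. This is a minimal sketch, not a full SFG solver: it assumes the forward paths and loops have already been identified by hand, and that each one is given as a hypothetical `(gain, nodes)` pair so that "touching" can be tested as sharing a node.

```python
from itertools import combinations

def graph_determinant(loops):
    """Delta = 1 - sum(L_i) + sum(L_i L_j) - ... over mutually non-touching loops.

    Each loop is a (gain, nodes) pair, where nodes is the set of nodes it visits.
    """
    delta = 1.0
    for r in range(1, len(loops) + 1):
        for combo in combinations(loops, r):
            # Keep only groups in which no two loops share a node
            non_touching = all(
                a[1].isdisjoint(b[1]) for a, b in combinations(combo, 2)
            )
            if non_touching:
                term = 1.0
                for gain, _ in combo:
                    term *= gain
                delta += (-1) ** r * term
    return delta

def mason_gain(forward_paths, loops):
    """Sum of P_k * Delta_k over the forward paths, divided by Delta."""
    numerator = 0.0
    for path_gain, path_nodes in forward_paths:
        # Delta_k: drop every loop that touches this forward path
        remaining = [lp for lp in loops if lp[1].isdisjoint(path_nodes)]
        numerator += path_gain * graph_determinant(remaining)
    return numerator / graph_determinant(loops)

# Hypothetical single-loop example: forward gain G, feedback branch gain -H
G, H = 10.0, 0.5
paths = [(G, {"in", "e", "out"})]
loops = [(-G * H, {"e", "out"})]
print(mason_gain(paths, loops))   # should equal G / (1 + G*H)
```

Gains are plain numbers here; to evaluate a frequency response, the same functions accept complex gains evaluated at $s = j\omega$.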

Examples

A Type 2 Charge Pump PLL

Let’s take the type-2 PLL system shown above for example. In this example, there is 1 single loop, and only 1 forward path from input to output. Therefore, $N=1$, and:

$$\begin{align} \Delta &= 1 - L_1 = 1 + K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s / N \\ \Delta_1 &= 1 \\ \displaystyle \sum_{k=1}^{N}P_k \Delta_k &= K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s \end{align}$$

Bear in mind that $\Delta_1 = 1$ because there is only one loop, and it does touch the forward path, therefore we remove the only contributing loop gain from $\Delta$.

Thus, combining the terms together, we have the expression for the closed loop gain:

$$ H(s) = \frac{K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s}{1 + K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s / N} $$

When the frequency is low, the loop gain is large and dominates, so $H(s) \approx N$; this means that at low frequencies the reference phase noise appears at the PLL output multiplied by $N^2$. At high frequency the loop gain dies out and $H(s) \to 0$. Therefore the reference-to-output phase transfer function is a low-pass filter.
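To see the low-pass behaviour numerically, the sketch below evaluates the closed-loop expression at a few frequencies. All component values are hypothetical and chosen only for illustration; they are not taken from any particular design.

```python
import math

# Hypothetical loop parameters, for illustration only
K_PFD = 1.0 / (2.0 * math.pi)   # phase detector gain
I_CP = 100e-6                   # charge pump current, A
R, C = 1e3, 1e-9                # loop filter: R in series with C
K_VCO = 2.0 * math.pi * 1e9     # VCO gain, rad/s/V
N = 64                          # feedback divider ratio

def closed_loop(s):
    """H(s) = G / (1 + G/N), where G is the forward path gain."""
    G = K_PFD * I_CP * (R + 1.0 / (s * C)) * K_VCO / s
    return G / (1.0 + G / N)

for w in (1e3, 1e6, 1e9, 1e12):
    print(f"w = {w:.0e} rad/s: |H| = {abs(closed_loop(1j * w)):.4g}")
```

With these values the magnitude should sit close to $N$ at the low end and fall toward zero at the high end.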

A Triple Integrator System

Now, let’s compute the transfer function of the SISO system below:

We notice that there are 3 loops and 3 forward paths. Luckily, all the loops touch each other and every forward path, which makes our calculation very simple.

$$ \begin{cases} \Delta = 1 + a_1/s + a_2/s^2 + a_3/s^3 \\ P_1 = b_1/s \\ \Delta_1 = 1 \\ P_2 = b_2/s^2 \\ \Delta_2 = 1 \\ P_3 = b_3/s^3 \\ \Delta_3 = 1 \end{cases} $$

Combining all the terms, we have

$$ \begin{align} H(s) &= \frac{b_1/s + b_2/s^2 + b_3/s^3}{1 + a_1/s + a_2/s^2 + a_3/s^3} \\ &= \frac{b_1 s^2 + b_2 s + b_3}{s^3 + a_1 s^2 + a_2 s + a_3} \end{align} $$

Canonical Forms

Doesn’t the last example have a very regular transfer function? This is actually intended.

In a control system modeled in time domain, we have our system defined using state-space model:

$$ \begin{align} \dot{x}(t) &= Ax(t) + Bu(t) \\ y(t) &= Cx(t) \end{align} $$

and we know that, if we perform Laplace transform, while assuming a 0 initial condition, we have

$$ sX(s) = AX(s) + BU(s) $$

Therefore, by algebraic manipulation, we have

$$ Y(s)/U(s) = C(sI - A)^{-1}B $$

This is the transformation from a state space model to a transfer function.

Now, there is only one transfer function for a given state space model, but there are infinitely many state space models for a single transfer function. As a rule of thumb, the number of poles in a transfer function sets the number of states in the corresponding state space model, because we need that many integrators. However, we could create more states (but those either come with constraints, or are redundant, meaning linearly dependent on the pre-existing states).

Some state space models derived from a transfer function have a particularly regular structure. Here are some of them:

Controllable Canonical Form

We already encountered the controllable canonical form in the previous example.

The controllable canonical form is special because the generated state space model is always controllable. For a transfer function $H(s) = \frac{b_{n-1}s^{n-1} + \ldots + b_1 s + b_0}{s^n + a_{n-1}s^{n-1} + \ldots + a_1 s + a_0}$, the state space model is given by:

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ -a_0 & -a_1 & -a_2 & \ldots & -a_{n-1} \end{pmatrix}X + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}U \\ Y &= \begin{pmatrix} b_0 & b_1 & b_2 & \ldots & b_{n-1} \end{pmatrix}X \end{align} $$

By Kalman’s rank condition, a system is controllable when its controllability matrix has full rank, and for the controllable canonical form this matrix is always full rank regardless of the coefficients. That’s why we call it the controllable canonical form.
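To check that this realization really reproduces the transfer function, the sketch below builds $(A, B, C)$ from the coefficient lists and evaluates $C(sI - A)^{-1}B$ at a test point. It assumes the indexing $a = [a_0, \ldots, a_{n-1}]$, $b = [b_0, \ldots, b_{n-1}]$ with a monic denominator; the helper names and the small Gaussian-elimination solver are ours, used only to keep the example free of external libraries.

```python
def controllable_canonical(a, b):
    """(A, B, C) for H(s) = (b[n-1] s^(n-1) + ... + b[0]) / (s^n + a[n-1] s^(n-1) + ... + a[0])."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0             # superdiagonal of ones (chain of integrators)
    A[n - 1] = [-ai for ai in a]      # last row carries the denominator coefficients
    B = [0.0] * (n - 1) + [1.0]
    C = list(b)
    return A, B, C

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):      # back substitution
        acc = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / aug[i][i]
    return x

def transfer_function(A, B, C, s):
    """Evaluate C (sI - A)^(-1) B at a single complex frequency s."""
    n = len(B)
    M = [[(s if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(M, B)
    return sum(c * xi for c, xi in zip(C, x))

# Hypothetical third-order example
a = [6.0, 11.0, 6.0]
b = [1.0, 2.0, 3.0]
A, B, C = controllable_canonical(a, b)
s = 1.0 + 2.0j
from_state_space = transfer_function(A, B, C, s)
from_polynomials = (b[2] * s**2 + b[1] * s + b[0]) / (s**3 + a[2] * s**2 + a[1] * s + a[0])
print(abs(from_state_space - from_polynomials))   # difference should be at rounding level
```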

Observable Canonical Form

Observability, the dual of controllability, also has its canonical form. Its state space model representation is given by:

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} 0 & 0 & 0 & \ldots & 0 & -a_0 \\ 1 & 0 & 0 & \ldots & 0 & -a_1 \\ 0 & 1 & 0 & \ldots & 0 & -a_2 \\ 0 & 0 & 1 & \ldots & 0 & -a_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & 1 & -a_{n-1} \end{pmatrix}X + \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{n-1} \end{pmatrix}U \\ Y &= \begin{pmatrix} 0 & 0 & 0 & \ldots & 0 & 1 \end{pmatrix}X \end{align} $$

If you take a closer look, you’ll notice that the observable canonical form is precisely the transpose of the controllable canonical form: $A_o = A_c^T$, $B_o = C_c^T$, and $C_o = B_c^T$. This is not a coincidence — it is a direct manifestation of the duality between controllability and observability. Taking the transpose of a state space realization preserves the transfer function, since

$$ H(s) = C(sI - A)^{-1}B = \left[ B^T (sI - A^T)^{-1} C^T \right]^T $$

and the transfer function is a scalar for SISO systems, so the transpose is itself.

By the dual of Kalman’s argument, the observability matrix of the observable canonical form is always full rank, which is why we call it the observable canonical form. Notice as well that, unlike the controllable form where the coefficients $b_i$ are placed in the output matrix $C$, here they show up directly in the input matrix $B$. Each state receives a weighted contribution from $u(t)$ through its $b_i$ and a feedback term from the last state through its $-a_i$, and only the last state is read out at the output. Reading the SFG above from right to left makes the structure obvious: it is the controllable canonical form with all arrows reversed.

Diagonal Form and Jordan Form

The controllable and observable canonical forms are built around the coefficients of the polynomials $a_i$ and $b_i$. The Jordan form takes a different approach: instead of starting from the polynomial coefficients, we start from the poles of the transfer function. Performing partial fraction decomposition,

$$ H(s) = \frac{b_{n-1}s^{n-1} + \ldots + b_1 s + b_0}{(s - p_1)(s - p_2) \ldots (s - p_n)} = \sum_{i=1}^{n} \frac{r_i}{s - p_i} $$

where $p_i$ are the poles and $r_i$ are the residues. Each first-order term $r_i/(s - p_i)$ corresponds to a single integrator with self-feedback $p_i$ and a gain $r_i$ at the output. The SFG above is exactly that — $n$ parallel branches, each with its own pole, all summed at the output.

Stacking these parallel branches into a state space model gives the diagonal Jordan form (assuming distinct poles):

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} p_1 & 0 & 0 & \ldots & 0 \\ 0 & p_2 & 0 & \ldots & 0 \\ 0 & 0 & p_3 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & p_n \end{pmatrix}X + \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}U \\ Y &= \begin{pmatrix} r_1 & r_2 & r_3 & \ldots & r_n \end{pmatrix}X \end{align} $$

Because $A$ is diagonal, the states are completely decoupled — each $x_i$ evolves independently as $\dot{x}_i = p_i x_i + u$, and the output is just a weighted sum of these modes. This makes the Jordan form particularly useful for analysis: the eigenvalues of $A$ are read off the diagonal, so stability is immediate (all $\text{Re}(p_i) < 0$), and each mode’s contribution to the output is exactly $r_i$.

When poles are repeated, $A$ is no longer fully diagonalizable. For a pole $\lambda$ with multiplicity $k$, the corresponding diagonal block becomes a Jordan block:

$$ J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \ldots & 0 \\ 0 & \lambda & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{pmatrix} $$

The superdiagonal of 1’s couples adjacent states within the block, which corresponds to terms of the form $r_{i,j}/(s - \lambda)^j$ in the partial fraction expansion. The overall $A$ matrix is still block-diagonal, with one Jordan block per distinct pole.

Unlike the controllable and observable canonical forms — which are guaranteed to be controllable / observable by construction — the Jordan form is only controllable and observable when all residues $r_i$ are nonzero and all poles are distinct. A zero residue corresponds to a pole-zero cancellation in $H(s)$, which means a mode that is either uncontrollable or unobservable (or both). In that sense, the Jordan form is the most honest of the three: it makes hidden modes visible rather than burying them in the structure.
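For the distinct-pole case, the residues and the resulting diagonal realization can be sketched as below. The function names are hypothetical, and the residue formula $r_i = N(p_i) / \prod_{j \neq i}(p_i - p_j)$ assumes a monic denominator with distinct poles.

```python
from math import prod

def residues(num_coeffs, poles):
    """Residues r_i of N(s) / prod(s - p_i), assuming distinct poles.

    num_coeffs = [b_0, b_1, ...] are numerator coefficients in ascending powers of s.
    """
    def numerator(s):
        return sum(b * s**i for i, b in enumerate(num_coeffs))

    return [
        numerator(p) / prod(p - q for j, q in enumerate(poles) if j != i)
        for i, p in enumerate(poles)
    ]

def diagonal_form(num_coeffs, poles):
    """(A, B, C) with A diagonal, B all ones, and C holding the residues."""
    n = len(poles)
    A = [[poles[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = [1.0] * n
    C = residues(num_coeffs, poles)
    return A, B, C

# H(s) = (s + 3) / ((s + 1)(s + 2))
print(residues([3.0, 1.0], [-1.0, -2.0]))   # [2.0, -1.0]

# H(s) = (s + 1) / ((s + 1)(s + 2)): the pole at -1 is cancelled by a zero
print(residues([1.0, 1.0], [-1.0, -2.0]))   # [0.0, 1.0]
```

The second case shows the hidden mode directly: the residue of the cancelled pole is zero, so that state never reaches the output.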

Modified Form

It becomes even trickier if the original system’s poles are not all on the real axis, but contain complex conjugate pairs. In this case, we can either use the diagonal form with complex entries, or use what’s called the “modified form”.

The modified Jordan form (also known as the real Jordan form) keeps the state space model entirely real-valued by replacing each pair of complex conjugate poles $p_i = \sigma \pm j\omega$ with a single $2 \times 2$ real block on the diagonal of $A$:

$$ \begin{pmatrix} \sigma + j\omega & 0 \\ 0 & \sigma - j\omega \end{pmatrix} \quad \longrightarrow \quad \begin{pmatrix} \sigma & \omega \\ -\omega & \sigma \end{pmatrix} $$

This block is similar to the diagonal complex form via the change of basis

$$ T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ j & -j \end{pmatrix}, $$

so the transfer function is preserved. Concretely, for a system whose poles consist of $m$ real poles $\lambda_1, \ldots, \lambda_m$ and $\ell$ complex conjugate pairs $\sigma_k \pm j\omega_k$, the modified form is

$$ A = \begin{pmatrix} \lambda_1 & & & & & & \\ & \ddots & & & & & \\ & & \lambda_m & & & & \\ & & & \sigma_1 & \omega_1 & & \\ & & & -\omega_1 & \sigma_1 & & \\ & & & & & \ddots & \\ & & & & & & \begin{smallmatrix} \sigma_\ell & \omega_\ell \\ -\omega_\ell & \sigma_\ell \end{smallmatrix} \end{pmatrix} $$

The eigenvalues of each $2 \times 2$ block are exactly $\sigma_k \pm j\omega_k$, so spectral information is unchanged; we have only traded a complex diagonal representation for a real block-diagonal one. Numerical tools often prefer this form, since real arithmetic is cheaper and avoids the bookkeeping of conjugate pairs. Repeated complex poles generalize to real Jordan blocks by replacing each scalar entry of the complex Jordan block with the corresponding $2 \times 2$ real block, and each superdiagonal $1$ with a $2 \times 2$ identity.
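The similarity between the real $2 \times 2$ block and the complex diagonal block can be checked numerically by verifying $MT = TD$, where $M$ is the real block, $D$ the complex diagonal one, and $T$ the change of basis given above. The values of $\sigma$ and $\omega$ below are arbitrary illustrative choices.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

sigma, omega = -0.5, 3.0   # illustrative pole pair sigma +/- j*omega

M = [[sigma, omega], [-omega, sigma]]                            # real block
D = [[complex(sigma, omega), 0], [0, complex(sigma, -omega)]]    # complex diagonal block
c = 1.0 / math.sqrt(2.0)
T = [[c, c], [1j * c, -1j * c]]                                  # change of basis

lhs = matmul2(M, T)
rhs = matmul2(T, D)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)   # should be at rounding level, confirming M T = T D
```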

Where were we?

We talked a lot about the famous Mason’s gain formula, and several canonical forms that we can use to represent linear time-invariant systems. Now going back to the original question: what’s the closed loop gain of the original system?

Well, we do realize that the system had two loops, but they do touch; and the system has only one forward path. That makes our computation significantly easier.

Let’s use some notations here. Let’s denote the forward gain as $F_1, F_2$ and the loop gains as $L_1, L_2$. From basic analog circuit theory, we have:

$$ \begin{align} F_1 &= -\frac{g_{m1}R_{d_1}}{1 + g_{m1}R_{d_1}||1/sC_1} \\ F_2 &= \frac{g_{m2}R_{s2}}{1 + g_{m2}R_{s2}} \\ L_1 &= (G_m R_{gate} || R_{g1} || R_{g2}) F_1 \\ L_2 &= (-g_{mp} R_{gate} || R_{g1} || R_{g2} ) F_1 F_2 \end{align} $$

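And finally, with one forward path and two touching loops, Mason’s formula gives $\Delta = 1 - L_1 - L_2$ and $\Delta_1 = 1$, so our closed loop gain is:

$$ \frac{V_{out}}{V_{in}}(s) = \frac{F_1F_2}{1 - L_1 - L_2} $$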

This looks a lot faster compared to manually breaking down all expressions.

I would like to point out at the end of this article that the original circuit is unstable, because both loop gains are positive. Unless the loop determinant is strictly positive in real part, the circuit will be unstable.

Steady-state Error, part 0

Thu, 16 Apr 2026 22:08:25 +0800

Introduction

I started to have the very first question regarding “steady-state error” when I was a sophomore. I still recall the first class in EE2002: Analog Electronics when the professor introduced the very first Operational Amplifier, and here is what he said:

“An Operational Amplifier, or an OpAmp, is a circuit that has infinite gain, infinite input impedance, and 0 output impedance.”

I was very confused back then. Of the three properties, the most counterintuitive one was “infinite gain”. I could accept infinite input impedance as an engineering approximation given the CMOS gate input, and zero output impedance if you treat the output as an ideal voltage source, but infinite gain didn’t make any sense. Over the next few classes, I learned that the infinite gain of the OpAmp is what allows the feedback structure to kick in, so that the closed-loop gain is set by the feedback impedance network.

However, I felt that something was off but couldn’t quite put my finger on it. I didn’t learn the right terminology until my junior year, when I took EE3019: Integrated Electronics and we were introduced to feedback in a more well-defined way. I became aware that there is a term, “steady-state error”, for the difference between the desired settling value and the actual settled value. I felt like, yes, this is the word I had been looking for.

I came across this once again when I started my PhD and began designing “Phase-Locked Loops (PLLs)”, a specific feedback structure used to multiply the frequency of a clean reference clock. I came across two different terms now: Type-1 and Type-2 PLLs. (I doubt whether a lot of PLL designers can actually tell the difference between the two.) Interestingly enough, neither the textbook nor the slides said anything about why they are named type-1 or type-2, as if it were just a naming convention.

I wasn’t 100% clear on this matter until I took MEC237: linear control, where I was introduced to the book Control Systems Engineering by Norman Nise; reading it finally helped me understand the entire steady-state error theory.

The Problem Setup

Let’s go back and build some intuition for the problem. Suppose we have an OpAmp, and we would like to use it as a voltage follower, so we configure it as a unity-gain buffer (unit buffer):

An elementary analog circuits professor will tell you that because an ideal OpAmp has infinite gain, it will always make sure both inputs are equal to each other. WRONG. There are at least two problems with this hand-wavy explanation:

The assumption of infinite gain is an idealization that doesn’t hold in practice.
The infinite gain assumption also doesn’t explain why it will make two inputs equal to each other.

In reality, the OpAmp’s gain depends on the transconductance gain $g_m$ and the loading impedance $R$, and we define our open loop gain to be

$$ A = g_m R$$

For the OpAmp connected in the abovementioned way, the small signal relationship between the input and output is given by:

$$ V_{out} = A(V_+ - V_-) $$

where $V_+$ and $V_-$ are the voltages at the non-inverting and inverting inputs, respectively.

Now we try to use that relationship to analyze the behavior of the unit buffer. We realize that here we have $V_- = V_{out}$, so solving the equation we have:

$$ \begin{align} V_{out} &= A(V_{in} - V_{out}) \\ V_{out} &= \frac{A}{1+A} V_{in} \end{align} $$

Now, if we find the transfer function from input to output, we identify

$$ H(s) = \frac{V_{out}(s)}{V_{in}(s)} = \frac{A}{1+A} \neq 1 $$

which basically means that the output is going to be just slightly smaller compared to the input.

The reason we would like our amplifier to have infinite gain is that, if $A \to \infty$, we have

$$ H(s) = \lim_{A \to \infty} \frac{A}{1+A} = 1 $$

which means that the output will approach the input as the gain approaches infinity; otherwise there will be an error term between the real output versus the desired output, which is given by

$$ \begin{align} \Delta V &= V_{out,desired} - V_{out,real} \\ &= V_{in} - V_{out,real} \\ &= V_{in} - \frac{A}{1+A} V_{in} \\ &= \frac{1}{1+A} V_{in} \end{align} $$

We can conclude two things from this expression:

The error is inversely proportional to $(1+A)$. The larger the gain is, the smaller the error is. However, if the gain is finite, no matter what non-zero input we see, the output can never achieve the desired value.
The larger the input voltage is, the larger the error is.
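To put numbers on these two observations, here is a small sketch that tabulates the buffer gain and the error for a few open-loop gains. The gain values are arbitrary examples, and the function names are ours.

```python
def buffer_gain(A):
    """Closed-loop gain of the unit buffer, A / (1 + A)."""
    return A / (1.0 + A)

def buffer_error(A, v_in=1.0):
    """Steady-state error V_in / (1 + A) for a given input voltage."""
    return v_in / (1.0 + A)

# Illustrative open-loop gains (20 dB to 100 dB)
for A in (10.0, 100.0, 1e3, 1e5):
    print(f"A = {A:>8.0f}: gain = {buffer_gain(A):.6f}, error for 1 V input = {buffer_error(A):.2e} V")
```

The error shrinks roughly as $1/A$ but never reaches zero for finite $A$, and it scales linearly with `v_in`.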

We call the difference between the desired and actual output the steady-state error. This phenomenon appears in closed-loop systems where we want to drive a plant toward a target value. In this example, the OpAmp is at once the controller, the plant and the detector.

Now, the question becomes if we are able to reduce this error at all.

The Temporary Elixir: A Capacitor

The fix is surprisingly simple. We replace the resistor with a pure capacitor:

Let’s do a time-domain calculation here first. Let’s assume the output current of the $g_m$ cell is defined as $i$, then we have the relationship between $V_{out}$ and $i$:

$$ \begin{align} i &= g_m (V_{in} - V_{out}) \\ i &= C\frac{dV_{out}}{dt} \end{align} $$

This ODE is not too hard to solve by hand. Assuming a constant $V_{in}$ and a zero initial condition on $V_{out}$, we have

$$ V_{out} = V_{in} (1 - e^{-\frac{g_m t}{C}}) $$

Assuming a zero initial condition is equivalent to applying a step at the input: the output follows an exponential settling curve that gradually approaches the input voltage. If we let $t \to \infty$, then we can easily see that

$$ \lim_{t \to \infty} V_{out} = V_{in} $$

Nothing too fancy here. However, let’s go one step further: what if the input is not a step function, but a ramp function?

Ramp Response

Same circuit, but now my $V_{in} = t$. What happens to $V_{out}$? The ODE now becomes:

$$ \begin{align} V_{in}(t) &= t \\ i &= g_m (V_{in}(t) - V_{out}) \\ i &= C\frac{dV_{out}}{dt} \end{align} $$

Again, we can solve the ODE system by direct integration. This gives us:

$$ V_{out}(t) = t - \tau + \tau e^{-\frac{t}{\tau}} $$

where $\tau = \frac{C}{g_m}$.

Interestingly, if we now consider the tracking error as a function of time, we have

$$ \begin{align} e(t) &= V_{in}(t) - V_{out}(t) \\ &= t - (t - \tau + \tau e^{-\frac{t}{\tau}}) \\ &= \tau - \tau e^{-\frac{t}{\tau}} \\ &= \tau(1 - e^{-\frac{t}{\tau}}) \end{align} $$

As $t \to \infty$, unfortunately the error doesn’t die out as it did in the step response case; it converges to the constant $\tau = C/g_m$. We show the plot here as well:

In fact, we will see in later parts of this series, step input is called a “type-0” input, and ramp input is called a “type-1” input. The original R-loaded OpAmp is called a “type-0” system, and the C-loaded improved OpAmp is called a “type-1” system.
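The step and ramp behaviour derived above can be reproduced with a simple numerical integration of $C\,\dot{V}_{out} = g_m (V_{in}(t) - V_{out})$. This is a minimal sketch using forward Euler; the default values $g_m = C = 1$ (so $\tau = 1$), the step size and the end time are all assumptions made for illustration.

```python
def simulate(v_in, g_m=1.0, C=1.0, t_end=10.0, dt=1e-3):
    """Forward-Euler integration of C dV_out/dt = g_m (V_in(t) - V_out), with V_out(0) = 0."""
    steps = int(round(t_end / dt))
    v_out, t = 0.0, 0.0
    for _ in range(steps):
        v_out += dt * g_m * (v_in(t) - v_out) / C
        t += dt
    return t, v_out

tau = 1.0   # C / g_m for the default values above

t_final, v_step = simulate(lambda t: 1.0)   # unit step input
_, v_ramp = simulate(lambda t: t)           # unit ramp input

step_error = 1.0 - v_step        # expected to decay toward zero
ramp_error = t_final - v_ramp    # expected to settle near tau

print(f"step error at t = {t_final:.1f}: {step_error:.2e}")
print(f"ramp error at t = {t_final:.1f}: {ramp_error:.4f} (tau = {tau})")
```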

In the next section, we will introduce some very useful mathematical tools to help us analyze the system without solving the ODE every single time.