Mathematics on 四方喫茶舘

Feedback Linearization, Part 3

Sun, 14 Jun 2026 14:51:26 +0800

Feedback Linearization Theorem

We talked about the feedback linearization theorem last time. As a recap:

Feedback Linearization Theorem: Nonlinear system $\Sigma: \dot{x} = f(x) + g(x)u$ is feedback linearizable if:

$[g(x), ad_fg(x), \ldots, ad_f^{n-1}g(x)]$ has rank $n$ $\forall x$.
$\Delta = \text{span}\{g(x), ad_fg(x), \ldots, ad_f^{n-2}g(x)\}$ is involutive.

The first condition guarantees controllability, while the second guarantees, by the Frobenius theorem, that we can always find an output $y = h(x)$ whose relative degree equals the system order $n$. Loosely speaking, the second condition plays a role analogous to observability: with such an output, the whole state can be recovered from $y$ and its derivatives.

We now look at some examples.

Consider the system

$$ \dot{x} = \begin{pmatrix} a \sin x_2 \\ -x_1^2 \end{pmatrix} + \begin{pmatrix}0 \\ 1 \end{pmatrix}u $$

We would like to ask 2 questions:

Is the system feedback linearizable?
If so, how shall we find the output $y = h(x)$?

To answer the first question, we first check whether the first condition of the feedback linearization theorem is met.

$$ g(x) = \begin{pmatrix}0 \\ 1 \end{pmatrix} $$

$$ad_fg = [f, g] = \begin{pmatrix} -a \cos x_2 \\ 0 \end{pmatrix}$$

Therefore,

$$ [g(x), ad_fg(x)] = \begin{pmatrix}0 & -a \cos x_2 \\ 1 & 0 \end{pmatrix} $$

This matrix has rank 2 wherever $a \cos x_2 \neq 0$; it loses rank only where $\cos x_2 = 0$. The distribution $\Delta = \text{span}\{g(x)\}$ is spanned by a single vector field, so it’s trivially involutive. Therefore we conclude the system is feedback linearizable on any region where $\cos x_2 \neq 0$, for example $|x_2| < \pi/2$.

Now, how shall we find the output $y$? We would like to find an output $y = h(x)$ such that it has relative degree of $2$, i.e.:

$$\begin{cases} \begin{align} \frac{\partial h}{\partial x} g(x) &= 0 \\ \frac{\partial L_f h}{\partial x} g(x) &\neq 0 \end{align} \end{cases} $$

The first PDE will give us

$$ \frac{\partial h}{\partial x_2} = 0 $$

meaning $h(x)$ shall be independent of $x_2$. We sub this fact into the second PDE:

$$ \frac{\partial L_f h}{\partial x} g(x) = \frac{\partial L_fh}{\partial x_2} = \frac{\partial h}{\partial x_1}a \cos x_2 \neq 0$$

Therefore, we can pick from several candidates for $h(x)$, for example $x_1$ or $x_1^5$ (the latter only away from $x_1 = 0$, where its derivative vanishes). If we pick $h(x) = x_1$, then we can linearize the system as

$$ \ddot{y} = v$$

where the state and control transform is given by

$$ \begin{cases} \begin{align} y &= x_1 \\ \dot{y} &= a\sin x_2 \\ u &= x_1^2 + \frac{v}{a\cos x_2} \end{align} \end{cases} $$

The reader is encouraged to verify the linearization by substituting the transforms back into the original system: differentiating $\dot{y} = a \sin x_2$ once more gives $\ddot{y} = a \cos x_2 \, (-x_1^2 + u)$, which must equal $v$.

As a result, we can design a linear controller from $v$ to $y$ by LQR or pole placement, and then use the state and control transforms to map that controller back onto the original nonlinear system.
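
As a quick numerical sanity check, here is a minimal sketch of the closed loop for this example. It assumes $a = 1$, the gains $k_1 = 1$, $k_2 = 2$ and the initial state $(0.5, 0.3)$, none of which come from the derivation; the control law in the code is obtained by solving $\ddot{y} = a \cos x_2 \, (-x_1^2 + u) = v$ for $u$, and the integration is plain forward Euler.

```python
import math

A_PARAM = 1.0  # value of the constant a; assumed here for illustration


def control(x1, x2, k1=1.0, k2=2.0):
    """Return (u, v) for the linearizing control with v = -k1*y - k2*y_dot."""
    y = x1
    y_dot = A_PARAM * math.sin(x2)
    v = -k1 * y - k2 * y_dot
    # From y_ddot = a*cos(x2) * (-x1**2 + u), solve y_ddot = v for u.
    u = x1**2 + v / (A_PARAM * math.cos(x2))
    return u, v


def y_ddot(x1, x2, u):
    """Second derivative of y = x1 along the original nonlinear dynamics."""
    return A_PARAM * math.cos(x2) * (-x1**2 + u)


def simulate(x1=0.5, x2=0.3, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of the closed loop; returns the final state."""
    for _ in range(round(t_end / dt)):
        u, _v = control(x1, x2)
        x1, x2 = x1 + dt * A_PARAM * math.sin(x2), x2 + dt * (-x1**2 + u)
    return x1, x2


u0, v0 = control(0.5, 0.3)
print(y_ddot(0.5, 0.3, u0) - v0)  # zero up to rounding
print(simulate())                 # both states end up near the origin
```

The sketch only makes sense while $\cos x_2 \neq 0$; the chosen initial condition keeps $x_2$ well inside $(-\pi/2, \pi/2)$.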

MIMO Feedback Linearization

We now move forward to a more complex and generalized system: the multi-input multi-output nonlinear system. For the sake of simplicity, we limit the MIMO to be the square case (meaning we have the same number of inputs and outputs).

If we have a square MIMO system with $m$ inputs and $m$ outputs that looks like:

$$ \begin{align} \displaystyle \Sigma: \dot{x} &= f(x) + \Sigma_{i=1}^m g_i(x) u_i \quad x \in \mathbb{R}^n \\ &= f(x) + g(x)u \\ y &= \begin{pmatrix} h_1(x) \\ \vdots \\ h_m(x) \end{pmatrix} \end{align} $$

where

$$ \begin{align} g(x) &= \begin{pmatrix} g_1(x) & \cdots & g_m(x) \end{pmatrix} \\ u &= \begin{pmatrix} u_1 \\ \vdots \\ u_m\end{pmatrix} \end{align} $$

Note that the number of inputs and outputs $m$ is in general different from the state dimension $n$. The question is now, how shall we define the relative degree of the MIMO system?

Vector Relative Degree

We introduce the concept of vector relative degree in this case. (Definition) Vector Relative Degree: Nonlinear system $\Sigma$ has vector relative degree $(r_1, r_2, \ldots, r_m)$ at $x_0$ if:

For all $1 \le j \le m, 1 \le i \le m, 0 \le k \le r_i - 2$, $$ L_{g_j}L_f^kh_i = 0, \quad \forall x \text{ in a neighborhood of } x_0 $$
The $m \times m$ matrix, also known as the Decoupling Matrix, $$ A(x) = \begin{pmatrix} L_{g_1}L_f^{r_1-1}h_1 & \cdots & L_{g_m}L_f^{r_1-1}h_1 \\ \vdots & \ddots & \vdots \\ L_{g_1}L_f^{r_m-1}h_m & \cdots & L_{g_m}L_f^{r_m-1}h_m \end{pmatrix} $$ is non-singular at $x_0$.

Then, for the i-th output, we can always express it in terms of $$ \begin{align} y_i^{(r_i)} &= L_f^{r_i} h_i(x)+ L_{g_1}L_f^{r_i-1}h_i(x)u_1 + \cdots + L_{g_m}L_f^{r_i-1}h_i(x)u_m \\ &= L_f^{r_i} h_i(x) + \displaystyle \Sigma_j L_{g_j}L_f^{r_i-1}h_i(x)u_j \end{align} $$

Because the decoupling matrix $A(x)$ is non-singular, we can solve these equations for $u$. Therefore, we can also do IO linearization:

$$ \begin{align} \begin{pmatrix} y_1^{(r_1)} \\ \vdots \\ y_m^{(r_m)} \end{pmatrix}&=\begin{pmatrix} L_f^{r_1}h_1(x) \\ \vdots \\ L_f^{r_m}h_m(x) \end{pmatrix} + \begin{pmatrix} L_{g_1}L_f^{r_1-1}h_1(x) & \cdots & L_{g_m}L_f^{r_1-1}h_1(x) \\ \vdots & \ddots & \vdots \\ L_{g_1}L_f^{r_m-1}h_m(x) & \cdots & L_{g_m}L_f^{r_m-1}h_m(x) \end{pmatrix} \begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} \\ &= L_f^rh(x) + A(x) u \end{align} $$

where $L_f^rh(x)$ denotes the stacked vector of the $L_f^{r_i}h_i(x)$, i.e. we implicitly extended the definition of Lie derivative to its vector form. And the control can be transformed as:

$$ u(x) = A^{-1}(x)(-L_f^rh(x)+ v) \rightarrow \begin{pmatrix} y_1^{(r_1)} \\ \vdots \\ y_m^{(r_m)} \end{pmatrix} = v $$

MIMO Feedback Linearization Theorem

Now we state the feedback linearization theorem in MIMO form: Theorem(MIMO Feedback Linearization): A MIMO nonlinear system $\Sigma$ with a well-defined vector relative degree $r = (r_1, r_2, \ldots, r_m)$ is:

feedback linearizable, if its vector relative degree satisfies $$ r_1 + \ldots + r_m = \displaystyle \Sigma_{i=1}^m r_i = n$$ (the sum can never exceed the state dimension $n$).
If the sum $$ r_1 + \ldots + r_m = \displaystyle \Sigma_{i=1}^m r_i < n$$ then the system is only IO linearizable, and we have to rely on the internal zero dynamics also being stable in order for the full system to be stable.

Examples

Consider the motion of a wheeled vehicle moving in a horizontal plane. The kinematics of the vehicle are given by the differential equations:

$$ \begin{align} \dot{x} &= V \cos \theta \\ \dot{y} &= V \sin \theta \\ \dot{\theta} &= \omega \end{align} $$

Here $(x, y)$ is the location in the horizontal 2D plane, $V$ is the vehicle speed, $\theta$ is the vehicle heading angle, and $\omega$ is the vehicle turning rate.

Ill-defined Vector Relative Degree

If we consider the vehicle speed $V$ and the vehicle turning rate $\omega$ as two control inputs, and the vehicle location in the plane as two outputs, the vector relative degree is not well-defined. The system now looks like:

$$ \begin{align} \frac{d}{dt}\begin{pmatrix} x \\ y \\ \theta \end{pmatrix} &= \begin{pmatrix}u_1 \cos \theta \\ u_1 \sin \theta \\ u_2 \end{pmatrix} \\ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \begin{pmatrix} x \\ y \end{pmatrix} \end{align} $$

If we take the first time derivative of the outputs

$$ \begin{align} \frac{d}{dt}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u_1 \cos \theta \\ u_1 \sin \theta \end{pmatrix} \end{align} $$

Only the first control input shows up, which is a red flag. Indeed, the decoupling matrix

$$ A(x) = \begin{pmatrix} \cos \theta & 0 \\ \sin \theta & 0 \end{pmatrix} $$

is singular for every $\theta$. Therefore the vector relative degree in this case is not well defined.

Well-defined Vector Relative Degree

If we now consider the vehicle acceleration and the vehicle turning rate as the two control inputs, and we still use the vehicle position as the two outputs, this time the vector relative degree is actually well-defined for this 4th-order nonlinear system, so long as $V \neq 0$. We have the original system expressed as:

$$ \begin{align} \frac{d}{dt} \begin{pmatrix} x \\ y \\ \theta \\ V \end{pmatrix} &= \begin{pmatrix} V \cos \theta \\ V \sin \theta \\ u_2 \\ u_1 \end{pmatrix} \\ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} &= \begin{pmatrix} x \\ y \end{pmatrix} \end{align} $$

If we take the first time derivative of the output vector:

$$ \frac{d}{dt}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} V \cos \theta \\ V \sin \theta \end{pmatrix} $$

We realize that neither input shows up explicitly, therefore we take another round of differentiation:

$$ \frac{d^2}{dt^2}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} u_1 \cos \theta - u_2 V \sin \theta \\ u_1 \sin \theta + u_2 V \cos \theta \end{pmatrix} $$

Now, both inputs show up which is a good sign. We verify this by considering the decoupling matrix:

$$ A(x) = \begin{pmatrix} \cos \theta & -V \sin \theta \\ \sin \theta & V \cos \theta \end{pmatrix} $$

And the determinant is given by

$$ \text{det}(A(x)) = V $$

We now realize that the decoupling matrix is non-singular, as long as the speed is non-zero. Therefore the vector relative degree is well-defined.

Control Law

Given that $r_1 = r_2 = 2$ in this case, and we satisfy $r_1 + r_2 = 4 = n$, we can find a state transformation and a control transformation so that the original system can be feedback linearized. We consider the state transformation as

$$ \begin{pmatrix} \xi_1 = y_1 = x\\ \xi_2 = y_2 = y\\ \xi_3 = \dot{y_1} = V \cos \theta \\ \xi_4 = \dot{y_2} = V \sin \theta \end{pmatrix} $$

And we can derive the control transformation as

$$ \begin{pmatrix} v_1 = u_1 \cos \theta - u_2 V \sin \theta \\ v_2 = u_1 \sin \theta + u_2 V \cos \theta \end{pmatrix} $$

and the resulting system now looks like

$$ \begin{align} \begin{cases} \dot{\xi_1} &= \xi_3 \\ \dot{\xi_2} &= \xi_4 \\ \dot{\xi_3} &= v_1 \\ \dot{\xi_4} &= v_2 \end{cases} \end{align} $$

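This is happily a pair of decoupled double integrators, which can be stabilized by pole placement or LQR. Note that recovering $u$ from $v$ requires inverting $A(x)$, so the control law is only valid while $V \neq 0$.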
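
The control law above can be sketched numerically as follows. This is only an illustration under assumptions that are not in the text: the vehicle tracks a hypothetical reference $(t, 0)$ moving at unit speed (so that $V$ stays away from zero), each double integrator gets a PD law with gains $k_1 = 1$, $k_2 = 2$, and the integration is forward Euler. The function `linearizing_control` inverts the decoupling matrix $A(x)$ in closed form.

```python
import math


def linearizing_control(theta, V, v1, v2):
    """Solve A(x) u = v for u = (acceleration, turning rate); needs V != 0."""
    u1 = v1 * math.cos(theta) + v2 * math.sin(theta)
    u2 = (-v1 * math.sin(theta) + v2 * math.cos(theta)) / V
    return u1, u2


def simulate(dt=1e-3, t_end=10.0, k1=1.0, k2=2.0):
    """Track the reference (t, 0) moving at unit speed; return the final state."""
    x, y, theta, V = 0.0, 1.0, 0.0, 1.0  # assumed initial condition
    for step in range(round(t_end / dt)):
        t = step * dt
        # Linearized coordinates: positions and their first derivatives.
        xi1, xi2 = x, y
        xi3, xi4 = V * math.cos(theta), V * math.sin(theta)
        # PD law on each double integrator, relative to the moving reference.
        v1 = -k1 * (xi1 - t) - k2 * (xi3 - 1.0)
        v2 = -k1 * (xi2 - 0.0) - k2 * (xi4 - 0.0)
        u1, u2 = linearizing_control(theta, V, v1, v2)
        # Original nonlinear dynamics: state (x, y, theta, V).
        x += dt * xi3
        y += dt * xi4
        theta += dt * u2
        V += dt * u1
    return x, y, theta, V


print(simulate())  # x close to t_end, y close to 0, V close to 1
```

A reference that comes to rest would drive $V$ to zero and make $A(x)$ singular, which is why the sketch tracks a moving target instead of a fixed point.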

Feedback Linearization, Part 2

Fri, 12 Jun 2026 19:33:35 +0800

More Facts about IO Linearization

We are now aware of how to perform input-output linearization. To summarize:

For an output with relative degree $r$, we are able to construct a feedback linearization mapping such that the input-output linearized system is of order $r$.
The remaining $n - r$ states span a “zero plane” $Z$, and the zero dynamics on that plane determine the stability of the overall system.

Now, we can draw an obvious conclusion if the zero dynamic is indeed stable:

Theorem: If $z = 0$ is locally exponentially stable for the zero dynamics $\dot{z} = q(0, z)$, then the IO linearizing control $u_{IO}$ with the linear feedback $v = -K\xi$ locally exponentially stabilizes $x = 0$.

The proof is as follows:

Proof: The closed loop system is given by

$$ \begin{align} \dot{\xi} &= A_{CL} \xi, A_{CL} = A - BK \\ \dot{z} &= q(\xi, z) \end{align} $$

where

$$ A_{CL} = \begin{pmatrix} 0 & 1 & 0 &\ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -k_1 & -k_2 & -k_3 & \ldots & -k_r \end{pmatrix} $$

where $K$ is chosen such that the eigenvalues $\lambda_i$ of $A_{CL}$ satisfy $\Re(\lambda_i) < 0$ for all $i = 1, \ldots, r$. If we linearize the system at $\xi = z = 0$, we get the following:

$$ \frac{d}{dt} \begin{pmatrix} \delta \xi \\ \delta z \end{pmatrix} = \begin{pmatrix} A_{CL} & 0 \\ \frac{\partial q}{\partial \xi}(0, 0) & \frac{\partial q}{\partial z}(0, 0) \end{pmatrix} \begin{pmatrix} \delta \xi \\ \delta z \end{pmatrix} $$

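The matrix is block lower-triangular, so its eigenvalues are those of $A_{CL}$ together with those of $\frac{\partial q}{\partial z}(0, 0)$. The former lie in the open left half-plane by design, and the latter by the exponential stability of the zero dynamics. The matrix is therefore Hurwitz, which completes the proof.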

We notice that the relationship between the new states $\xi$ and original states $x$ is such that

$$ \xi = T(x)$$

To verify if $T$ is a diffeomorphism, we introduce the following theorem:

Inverse Function Theorem: If a function $T: \mathbb{R}^n \to \mathbb{R}^n$, $T \in C^1$, has a Jacobian

$$ \frac{\partial T}{\partial x}(x_0) $$

that is full rank, then $T^{-1}$ exists in a neighborhood of $T(x_0)$ and is continuously differentiable there.

Now let’s connect IO linearization back to feedback linearization. It’s not hard to see that if we can guarantee the zero dynamic to be stable, then the original system will be stable; the best way to make sure the zero dynamic is always stable is that the zero plane shrinks to just a point, and this is done if $r = n$. Therefore we have the following theorem:

Isidori, chapter 4: If a nonlinear system $\Sigma$ has a relative degree $r$ at $x_0$, then on the neighborhood of $x_0$, the functions

$$ \{ h(x), L_fh(x), \ldots, L_f^{r-1}h(x) \} $$

are independent. Then, we can conclude that, $\Sigma$ is feedback linearizable, if and only if $\exists y = h(x)$ such that the output has relative degree $r = n$.

In short, if the output of the system has relative degree equal to the system order, then the system is always linearizable. However, if the output has a relative degree smaller than the system order, it’s possible that we simply didn’t pick a good output. How do we know if a system can in fact be feedback linearized? We’ll have to introduce some more new concepts to answer the question.

Introduction to Differential Geometry

Our audience may find themselves familiar with these concepts, if they have taken classes in general relativity.

Manifold

Let $M$ be a non-empty subset of $\mathbb{R}^n$ and let $1 \le m < n$, then $M$ is an $m$-dimensional smooth Manifold in $\mathbb{R}^n$ if, $\forall p \in M$, $\exists r > 0$, $F:B_r(p) \to \mathbb{R}^{n-m}$ such that:

$M \cap B_r(p) = \{ x \in B_r(p) | F(x) = 0 \}$
$ F \in C^\infty $
$ \forall \bar{x} \in M \cap B_r(p)$, $\text{rank} \frac{\partial F}{\partial x}(\bar{x}) = n - m $

Intuitively, a manifold is a shape that “embeds” into a Euclidean space. We can always find a local mapping (a “chart”; a collection of charts covering the manifold is an “atlas”) that maps a piece of the manifold onto a region of a Euclidean space of the same dimension.

Some well-known manifolds are:

Circle, a 1D manifold in $\mathbb{R}^2$
Mobius strip, a 2D manifold in $\mathbb{R}^3$
Sphere, a 2D manifold in $\mathbb{R}^3$
Klein bottle, a 2D manifold in $\mathbb{R}^4$

Tangent Space

Let $M$ be an $m$-dimensional smooth manifold in $\mathbb{R}^n$ and let $p \in M$, suppose $F: B_r(p) \to \mathbb{R}^{n-m}$ satisfies the conditions from the definition of $M$. Then the Tangent Space at $p$, denoted as $T_pM$, is such that

$$ T_pM = \{ v \in \mathbb{R}^n | \frac{\partial F}{\partial x}(p) v = 0 \} = \mathcal{N}(\frac{\partial F}{\partial x}(p)) $$

Note that, $\text{dim}(T_pM) = m$.

Tangent Vector

The Tangent Vector is a vector in tangent space.

[Figure: the relationship between a manifold, its tangent space, and a tangent vector.]

Vector Field

A Vector Field $f$ on a manifold $M$ assigns to each $p \in M$ a vector $f(p) \in T_pM$. Note that the vector field is $C^k$ if $f \in C^k$.

Lie Bracket

Given $f, g$ as two different vector fields, the Lie Bracket is defined as

$$ \begin{align} [f, g](x) &= \frac{\partial g}{\partial x}(x) f(x) - \frac{\partial f}{\partial x}(x) g(x) \\ &= L_fg - L_gf \\ \end{align} $$

The Lie bracket can also be expressed in terms of “adjoint” operator, i.e.:

$$ ad_f g(x) = [f, g](x) $$

We can use adjoint operator to express nested Lie brackets:

$$ \begin{align} ad_f^2g(x) &= [f, ad_f g(x)] \\ &= [f, [f, g]](x) \end{align} $$

In general, we have

$$ ad_f^kg(x) = [f, ad_f^{k-1}g(x)] $$

An example of Lie bracket calculation is as follows:

$$ \begin{align} f &= \begin{pmatrix} x_2 \\ -\sin x_1 - x_2 \end{pmatrix} \\ g &= \begin{pmatrix} 0 \\ x_1 \end{pmatrix} \\ [f, g](x) &= L_fg - L_gf \\ &= \begin{pmatrix} 0 \\ x_2 \end{pmatrix} - \begin{pmatrix} x_1 \\ -x_1 \end{pmatrix} \\ &= \begin{pmatrix} -x_1 \\ x_2 + x_1 \end{pmatrix} \end{align} $$
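
The calculation above can be cross-checked numerically. The following is a minimal sketch that approximates the Jacobians by central finite differences (the helper names `jacobian`, `mat_vec` and `lie_bracket` are made up for this illustration) and evaluates $[f, g]$ at the arbitrarily chosen point $x = (1, 2)$, where the analytic result $(-x_1, x_2 + x_1)$ equals $(-1, 3)$.

```python
import math


def jacobian(field, x, eps=1e-6):
    """Central-difference Jacobian: jac[i][j] = d field_i / d x_j."""
    n = len(x)
    m = len(field(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = field(xp), field(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return jac


def mat_vec(mat, vec):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def lie_bracket(f, g, x):
    """[f, g](x) = (dg/dx) f(x) - (df/dx) g(x)."""
    dg_f = mat_vec(jacobian(g, x), f(x))
    df_g = mat_vec(jacobian(f, x), g(x))
    return [a - b for a, b in zip(dg_f, df_g)]


def f(x):
    return [x[1], -math.sin(x[0]) - x[1]]


def g(x):
    return [0.0, x[0]]


print(lie_bracket(f, g, [1.0, 2.0]))  # analytic value: (-1, 3)
```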

Some useful properties of Lie bracket:

$[f, f ] = 0$
$[f, g] = -[g, f]$
If $f$ and $g$ are constant vectors, then $[f, g] = 0$.

Now let’s consider a linear system $\dot{x} = Ax + Bu$. If we express this in control-affine form, we have

$$\begin{align} \dot{x} &= f(x) + g(x)u \\ f(x) &= Ax \\ g(x) &= B \end{align} $$

and the iterated Lie brackets are

$$ \begin{align} ad_fg &= -AB \\ ad_f^2g &= A^2B \\ ad_f^3g &= -A^3B \\ \vdots \\ ad_f^kg &= (-1)^k A^k B \end{align} $$

Up to sign, these are the columns of the controllability matrix $[B, AB, \ldots, A^{n-1}B]$.

Tangent Bundle

The Tangent Bundle of a manifold $M$ is defined as

$$ TM = \bigcup_{p \in M} T_pM $$

That is, it’s the “bundle” of all tangent spaces at each point in the manifold.

Distribution

Suppose $f_1, f_2, \ldots, f_n$ are vector fields, the Distribution is defined as

$$ \Delta (x) = \text{span}\{f_1(x), f_2(x), \ldots, f_n(x)\} $$

where at each specific point $x$, $\Delta(x)$ is a subspace of the tangent space $T_xM$.

$\Delta$ is non-singular distribution if $\text{dim}(\Delta(x))$ is a constant $\forall x$.
$\Delta$ is involutive if $$ \forall f, g \in \Delta \Rightarrow [f, g] \in \Delta $$

Let’s consider the following example:

$$ \begin{align} f_1 &= \begin{pmatrix} 2x_2 \\ 1 \\ 0 \end{pmatrix} \\ f_2 &= \begin{pmatrix} 1 \\ 0 \\ x_2 \end{pmatrix} \\ \Delta &= \text{span}\{f_1, f_2\} \end{align} $$

Because $\text{dim}\Delta(x) = 2$ for all $x$, the distribution $\Delta$ is non-singular.

Since $\Delta$ is spanned by $f_1$ and $f_2$, it is involutive if and only if

$$ [f_1, f_2] = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \in \Delta $$

that is, if and only if $\text{rank}(f_1, f_2, [f_1, f_2]) = 2 \quad \forall x$. Unfortunately the rank is 3, therefore $\Delta$ is not involutive.
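
The rank test in this example can be checked numerically. The sketch below (helper names are hypothetical, Jacobians are approximated by central differences) evaluates the determinant of the matrix with columns $f_1$, $f_2$, $[f_1, f_2]$ at a few arbitrary points. A non-zero determinant means rank 3, so the bracket leaves $\Delta$ and the distribution is not involutive.

```python
def jacobian(field, x, eps=1e-6):
    """Central-difference Jacobian: jac[i][j] = d field_i / d x_j."""
    n = len(x)
    m = len(field(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = field(xp), field(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return jac


def lie_bracket(f, g, x):
    """[f, g](x) = (dg/dx) f(x) - (df/dx) g(x)."""
    fx, gx = f(x), g(x)
    dg, df = jacobian(g, x), jacobian(f, x)
    return [sum(dg[i][j] * fx[j] - df[i][j] * gx[j] for j in range(len(x)))
            for i in range(len(fx))]


def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix built from the vectors c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c1[1] * (c2[0] * c3[2] - c2[2] * c3[0])
            + c1[2] * (c2[0] * c3[1] - c2[1] * c3[0]))


def f1(x):
    return [2.0 * x[1], 1.0, 0.0]


def f2(x):
    return [1.0, 0.0, x[1]]


for point in ([0.0, 0.0, 0.0], [1.0, -2.0, 0.5], [3.0, 4.0, -1.0]):
    bracket = lie_bracket(f1, f2, point)
    print(point, bracket, det3(f1(point), f2(point), bracket))
```

At every sampled point the bracket comes out close to $(0, 0, 1)$ and the determinant close to $-1$, in agreement with the hand calculation.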

Feedback Linearizability

With all these mathematical definitions, we are finally able to determine whether a nonlinear system can actually be feedback linearized, using the following theorem:

A nonlinear system $\Sigma$ is feedback linearizable, if and only if:

$[g(x), ad_fg(x), \ldots, ad_f^{n-1}g(x) ]$ has rank $n$, $\forall x$. This condition guarantees controllability.

$\Delta = \text{span}\{g, ad_fg, \ldots, ad_f^{n-2}g\}$ is involutive.

If we are able to determine whether the system is feedback linearizable, the next step will be to look for the specific output with relative degree $n$. From our earlier discussion, we are looking for a function $y = h(x)$ such that it meets the following conditions:

$$\begin{align} \begin{cases} L_gh = L_gL_fh = &\ldots = L_gL_f^{n-2}h = 0 \quad \forall x \\ L_gL_f^{n-1}h &\neq 0 \end{cases} \end{align} $$

In fact, these two conditions are equivalent to the following two conditions:

$$\begin{align} \begin{cases} L_gh = L_{ad_fg}h = &\ldots = L_{ad_f^{n-2}g}h = 0 \forall x \\ L_{ad_f^{n-1}g}h &\neq 0 \end{cases} \end{align} $$

The advantage of the latter formulation is that we can write the first condition as:

$$ \frac{\partial h}{\partial x} \begin{pmatrix} g(x) & ad_fg(x) & \cdots & ad_f^{n-2}g(x) \end{pmatrix} = 0 $$

The important fact here is that a solution $h$ of this partial differential equation with $\frac{\partial h}{\partial x} \neq 0$ exists only if $\Delta = \text{span}\{g, ad_fg, \ldots, ad_f^{n-2}g\}$ is involutive, according to the [Frobenius theorem](https://en.wikipedia.org/wiki/Frobenius_theorem_(differential_topology)).

To prove that the two conditions are indeed equivalent, we use the following lemma:

Lemma: Given $L_gh = L_gL_f h = \ldots = L_gL_f^{n-2}h = 0$ for all $x \in B_\delta (x_0)$, then we have
$$L_gL_f^kh = (-1)^k L_{ad_f^kg}h, \quad \forall k = 0,1,\ldots, n-1$$

This lemma can be proven using induction, and we skip the full proof here.

In this chapter, we discussed the conditions for a nonlinear system to be fully feedback linearizable. In the final chapter, we’ll give some examples and extend to the multi-input multi-output case.

Feedback Linearization, Part 1

Thu, 11 Jun 2026 21:38:58 +0800

Feedback linearization is a seemingly obvious but powerful technique in control theory that transforms a nonlinear system into a linear one through state and input feedback.

I learned this technique when I was taking MEC237 from Berkeley, and I later realized it’s actually pretty useful and one of the most universal techniques in nonlinear control.

Motivation

Let’s consider a first-order nonlinear system:

$$ \dot{x} = x^3 + u $$

Where $x$ is our internal state, and $u$ is our control input. If the system has no control, the equilibrium $x = 0$ is unstable. Intuitively, if $x$ is greater than 0, $\dot{x}$ is also greater than 0, pushing the state away from the equilibrium point, and vice versa if $x < 0$.

One caveat here is that we can’t use Lyapunov’s indirect method to conclude instability, because the Jacobian at the origin is 0, and nothing can be concluded from a zero eigenvalue.

How shall we use the control input to stabilize the system? Let’s consider the input $u = -x^3 - x$, and substitute it into the original system:

$$ \dot{x} = x^3 + (-x^3 - x) = -x $$

This is now a negative feedback system with eigenvalue strictly negative, and we can therefore conclude stability.

What did we make the control input do? We use a nonlinear term $-x^3$ in the control input to cancel the original system’s unstable term, and introduce another stabilizing linear term $-x$ to ensure stability. The mechanism where we use feedback to achieve a stable linear system is called Feedback Linearization.
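
To see the cancellation at work, here is a minimal simulation sketch. The initial state $x_0 = 0.5$, the step size and the forward-Euler integrator are assumptions made for illustration. With the feedback in place, the closed loop should behave like $\dot{x} = -x$, whose exact solution is $x_0 e^{-t}$.

```python
import math


def simulate(x0, controller, dt=1e-3, t_end=5.0):
    """Integrate x' = x**3 + u with forward Euler and return the final state."""
    x = x0
    for _ in range(round(t_end / dt)):
        u = controller(x)
        x += dt * (x**3 + u)
    return x


def feedback_linearizing(x):
    """Cancel the cubic term, then add the stabilizing linear term."""
    return -x**3 - x


x_final = simulate(0.5, feedback_linearizing)
print(x_final, 0.5 * math.exp(-5.0))  # simulated vs. exact solution of x' = -x
```

The two printed numbers agree up to the discretization error of the Euler scheme.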

However, is this technique universal? The answer is no. Look at the following system:

$$ \begin{align} \begin{cases} \dot{x}_1 = a \sin x_2 \\ \dot{x}_2 = -x_1^2 + u \end{cases} \end{align} $$

In fact, no choice of input $u$ alone can linearize both states $x_1$ and $x_2$. However, if we do a state transformation like below:

$$ \begin{align} z_1 &= x_1 \\ z_2 &= a \sin x_2 \end{align} $$

Then it’s not too hard to verify that the transformed system can actually be linearized. Thus by combining state transform and control transform, we are able to achieve feedback linearization.

Some Definitions

We now give some useful definitions to help understand the feedback linearization technique.

Control-Affine System

A system that has the following form is called a Control-Affine System:

$$ \dot{x} = f(x) + g(x)u $$

where $f(x)$ and $g(x)$ are smooth vector fields.

An example of a system that’s not control-affine is:

$$ \dot{x} = f(x) + g(x)u^2 $$

Diffeomorphism

In differential geometry, a diffeomorphism is a smooth invertible map between differentiable manifolds, whose inverse is also smooth. To express in mathematical language, such mapping $T$ satisfies $T \in C^1$ and $T^{-1} \in C^1$.

Feedback Linearizable

A nonlinear control-affine system $\Sigma: \dot{x} = f(x) + g(x)u$ is said to be feedback linearizable if there exists a control law $u = \alpha (x) + \beta (x)v$ and state transform $z = T(x)$, where $T$ is a diffeomorphism, such that the transformed system $\dot{z} = Az + Bv$ satisfies $(A, B)$ is controllable.

Lie Derivative

The Lie Derivative of a scalar function $h(x)$ along a vector field $f$ is defined as:

$$ L_f h = \frac{\partial h}{\partial x} f(x) $$

We will see how Lie derivative helps us simplify some notations later.

Input-Output Linearization

There are cases in which we can’t perform full feedback linearization, but we can still achieve input-output linearization.

Consider the same system that we defined earlier:

$$ \begin{cases} \dot{x}_1 = a \sin x_2 \\ \dot{x}_2 = -x_1^2 + u \\ y = x_2 \end{cases} $$

Note that now we assign the output $y$ to be only a function of $x_2$. Now, we can perform Input-Output Linearization $ u = x_1^2 + v$ such that:

$$ \dot{y} = \dot{x}_2 = -x_1^2 + x_1^2 +v = v$$

From there, we manage to achieve a linear relationship between the new control input $v$ and the output $y$. However, since $x_1$ cannot be observed from $y$, we can’t tell just from $y$ whether the inner system is stable. It’s therefore possible for the inner state to explode while the output shows nothing, causing system failure.

Now with all these definitions, we would like to answer the following questions:

When is a system feedback linearizable?
If not feedback linearizable, when is the system IO linearizable?
Is there a connection between IO linearization and feedback linearization?

Relative Degree of Output $y$

Let’s consider the following system that is a generalization of a SISO control-affine system:

$$ \begin{align} \dot{x} &= f(x) + g(x)u \\ y &= h(x) \end{align} $$

Where $f, g, h$ are sufficiently smooth.

We notice that $y$ is not directly a function of $u$, because there is no direct control term in $y = h(x)$. Let’s try to take the derivative of $y$:

$$ \begin{align} \dot{y} &= \frac{\partial h(x)}{\partial x} \dot{x} \\ &= \frac{\partial h(x)}{\partial x} (f(x) + g(x)u) \\ &= L_f h(x) + L_g h(x) u \end{align} $$

Note that we used the Lie derivative notation.

Now assume that $L_g h(x) \neq 0$, then we have a direct term $u$ in $\dot{y}$. We can therefore make $u$ such that:

$$ u = L_g h(x) ^ {-1} (-L_f h(x) + v) $$

and so that:

$$ \dot{y} = v $$

What if $L_g h(x) = 0$? In that case, $u$ doesn’t appear in the first derivative of $y$:

$$ \dot{y} = L_f h(x) $$

But no worries, we can take another derivative operation:

$$ \ddot{y} = L_f^2 h(x) + L_gL_fh(x)u $$

Note that, $L_aL_bc(x) = L_a(L_b c(x))$. Suppose we have $L_gL_fh(x) \neq 0$, then we can again IO linearize the system as:

$$ u = L_gL_f h(x)^{-1}[-L_f^2 h(x) + v] $$

and thus:

$$ \ddot{y} = v $$

If we continue doing this, we will arrive at step $r$:

$$ y^{(r)} = L_f^rh(x) +L_gL_f^{r-1}h(x)u $$

And if we have $L_gL_f^{r-1}h(x) \neq 0$, we can make it such that:

$$ u = L_gL_f^{r-1}h(x)^{-1}[-L_f^r h(x) + v] $$

and therefore

$$ y^{(r)} = v $$

In this case, the IO linearized system is an $r^{th}$ order linear system.

We now give the definition of $r$:

A SISO system $\dot{x} = f(x) + g(x)u, y = h(x)$ has relative degree $r$ with respect to the output $y = h(x)$ around $x_0$ if:

$\forall 0 \le k < r-1$, $L_gL_f^kh(x) = 0$, $\forall x \in $ neighborhood of $x_0$.
$L_gL_f^{r-1}h(x) \neq 0$, $\forall x \in $ neighborhood of $x_0$.

Let’s look at some examples.

$$ \begin{align} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= -x_1^3 + u \\ y &= x_1 \end{align} $$

It’s obvious that the relative degree is not 0 because $u$ doesn’t show up directly in $y$. We take the first derivative of $y$:

$$ \dot{y} = x_2$$

Still no $u$. Differentiate again:

$$ \ddot{y}= -x_1^3 + u$$

Now $u$ shows up, therefore the relative degree of $y$ is 2. Note that the coefficient of $u$ is the constant 1, which never vanishes, therefore output $y$ has relative degree 2 everywhere in $\mathbb{R}^2$.

$$ \begin{align} \dot{x}_1 &= x_2 + x_3^3 \\ \dot{x}_2 &= x_3 \\ \dot{x}_3 &= u \\ y &= x_1 \end{align} $$

We differentiate $y$ twice:

$$ \begin{align} \dot{y} &= x_2 + x_3^3 \\ \ddot{y} &= x_3 + 3x_3^2 u \end{align} $$

Now we realize that $y$ doesn’t have a well-defined degree around $x_3 = 0$, and has a relative degree of 2 anywhere else.
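
The relative degree of the last example can also be probed numerically. The following sketch builds Lie derivatives from central finite differences (the helpers `gradient` and `lie_derivative` are illustrative names, not a library API) and evaluates $L_gh$ and $L_gL_fh$ at two arbitrarily chosen points, one with $x_3 \neq 0$ and one with $x_3 = 0$.

```python
def gradient(h, x, eps=1e-4):
    """Central-difference gradient of a scalar function h at the point x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((h(xp) - h(xm)) / (2.0 * eps))
    return grad


def lie_derivative(h, field):
    """Return the function x -> L_field h(x) = (dh/dx) field(x)."""
    def lh(x):
        return sum(gi * fi for gi, fi in zip(gradient(h, x), field(x)))
    return lh


def f(x):
    return [x[1] + x[2] ** 3, x[2], 0.0]


def g(x):
    return [0.0, 0.0, 1.0]


def h(x):
    return x[0]


Lg_h = lie_derivative(h, g)                      # coefficient of u in y_dot
LgLf_h = lie_derivative(lie_derivative(h, f), g)  # coefficient of u in y_ddot

print(Lg_h([0.3, -0.2, 0.5]), LgLf_h([0.3, -0.2, 0.5]))  # 0 and 3 * 0.5**2
print(LgLf_h([0.3, -0.2, 0.0]))                          # vanishes at x3 = 0
```

The coefficient $L_gL_fh = 3x_3^2$ is non-zero away from $x_3 = 0$ and vanishes on it, which is exactly why the relative degree is not well-defined around $x_3 = 0$.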

Let’s try to apply the concept to our familiar linear system:

$$ \begin{align} \dot{x} &= Ax + Bu \\ y &= Cx \end{align} $$

If we differentiate $y$:

$$ \dot{y} = CAx + CBu $$

Now, if $CB = 0$, we’ll have to differentiate again:

$$ \ddot{y} = CA^2x + CAB u $$

continue doing this, we have relative degree $r$ if

$ CB = CAB = \ldots = CA^{r-2}B = 0$
$ CA^{r-1}B \neq 0$
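
For a SISO linear system, the two conditions above translate directly into a small search over the products $CA^{k}B$. The sketch below uses plain Python lists, and the double-integrator matrices are only an assumed test case.

```python
def row_times_matrix(row, mat):
    """Product of a row vector with a matrix, both given as plain lists."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]


def relative_degree(A, B, C, tol=1e-12):
    """Smallest r with C A^(r-1) B != 0, or None if there is none for r <= n."""
    row = list(C)  # holds C A^(r-1) at the start of each iteration
    for r in range(1, len(A) + 1):
        markov = sum(ci * bi for ci, bi in zip(row, B))
        if abs(markov) > tol:
            return r
        row = row_times_matrix(row, A)
    return None


# Double integrator: x1' = x2, x2' = u.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [0.0, 1.0]

print(relative_degree(A, B, [1.0, 0.0]))  # output y = x1
print(relative_degree(A, B, [0.0, 1.0]))  # output y = x2
```

With $y = x_1$ the input appears after two differentiations, and with $y = x_2$ after one, so the two calls return 2 and 1 respectively.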

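Isn’t the quantity $CA^{r-1}B$ familiar? The products $CA^kB$ are the Markov parameters of the linear system, and they are exactly the entries of the product of the observability matrix and the controllability matrix. This hints at the connection between relative degree and the controllability and observability structure studied in the Kalman decomposition.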

Zero Dynamics

A fact regarding the relative degree $r$:

$r$ is always less than or equal to the order of the system $n$. If we keep differentiating and $u$ never shows up in the derivatives of $y$, the relative degree is usually undefined.

Now, for the IO linearized system $y^{(r)} = v$, we can choose the state vector:

$$ z = \begin{pmatrix} y \\ \dot{y} \\ \vdots \\ y^{(r-1)} \end{pmatrix} \in \mathbb{R}^r $$

Thus, we will arrive at:

$$ \dot{z} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} z + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} v $$

If you have read another article of mine, Mason’s Gain Formula and Control Canonical Forms, you’ll realize that this system is in the controllability canonical form, thus $z$ is always controllable, with matrix $A$ being a single Jordan block. Therefore, we can always define a feedback control mechanism

$$ v = -Kz$$

such that

$$ \dot{z} = (A - BK)z $$

is always stable, or

$$\Re(\lambda(A - BK)) < 0$$

If we convert $v$ back in terms of $x$, we will get

$$ v = -k_1 h(x) - k_2 L_f h(x) - \ldots - k_r L_f^{r-1} h(x) $$

Now, if we have $z(t) \to 0$ as $ t \to \infty$, then $y \to 0$, $\dot{y} \to 0$, etc., so we can guarantee the output $y$ is stable. But how about $x$? Is the original system stable? This leads to the discussion of zero dynamics.

If we define the set $Z = \{x \in \mathbb{R}^n : h(x) = \dot{h}(x) = \ldots = h^{(r-1)}(x) = 0\}$, then $Z$ is called the zero dynamics surface (or manifold) of the system. The motion restricted to $Z$ is the part of the system that is not shown explicitly on the output $y$, i.e. it’s unobservable from $y$.

Note that the dimension of the zero dynamics set is $n - r$.

What we did for IO linearization is the following:

We construct the surface $Z$ with dimension $n - r$.
We make $Z$ attractive, i.e. we let $x$ approach the surface asymptotically.
We also make $Z$ invariant, i.e. $x$ never leaves the surface once it’s on the surface.

However, whether the dynamics on the surface is stable dictates whether the original system $x$ is stable. The dynamic on the surface is also known as the zero dynamics. Let’s take an example to illustrate the zero dynamics. Consider the following system:

$$ \begin{align} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= \alpha x_3 + u \\ \dot{x}_3 &= \beta x_3 - u \\ y &= x_1 \end{align} $$

It’s easy to get relative degree of 2 for output $y$ because

$$ \ddot{y} = \alpha x_3 + u $$

Now, suppose when $t \to \infty$, both $y$ and $\dot{y}$ approach zero. What happens to the state $x$? If $y = 0, \dot{y} = 0$, then $x_1 = x_2 = 0$, and we have

$$ \dot{x}_3 = (\beta + \alpha) x_3 $$

Therefore, the zero dynamic on $x_3$ is stable if and only if $\beta + \alpha < 0$.
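
The effect of the zero dynamics in this example can be illustrated with a short simulation. The sketch below assumes the IO linearizing control $u = -\alpha x_3 + v$ with $v = -k_1 y - k_2 \dot{y}$, gains $k_1 = 1$, $k_2 = 2$, an initial state $(1, 0, 0.1)$ and forward-Euler integration; these values are chosen only for illustration.

```python
def simulate(alpha, beta, dt=1e-3, t_end=10.0, k1=1.0, k2=2.0):
    """Closed loop under IO linearization; returns the final (x1, x2, x3)."""
    x1, x2, x3 = 1.0, 0.0, 0.1  # assumed initial condition
    for _ in range(round(t_end / dt)):
        v = -k1 * x1 - k2 * x2   # linear feedback on y = x1 and y_dot = x2
        u = -alpha * x3 + v      # makes y_ddot = alpha * x3 + u equal to v
        x1, x2, x3 = (x1 + dt * x2,
                      x2 + dt * (alpha * x3 + u),
                      x3 + dt * (beta * x3 - u))
    return x1, x2, x3


print(simulate(alpha=1.0, beta=-2.0))  # alpha + beta < 0: x3 decays as well
print(simulate(alpha=1.0, beta=0.5))   # alpha + beta > 0: x3 blows up
```

In both runs the output $y = x_1$ converges to zero, but only in the first run does the internal state $x_3$ stay bounded, which is the behavior predicted by the sign of $\beta + \alpha$.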

In fact, in this case $x_3$ is the internal state that the output $y$ does not see once the IO linearizing control is applied. Just like unobservable modes in linear system theory, if this internal state is stable, the entire system can be stabilized.

We discussed IO linearization in this article. In part 2, we are going to talk about actual feedback linearization.

Steady-state Error, part 1

Thu, 30 Apr 2026 22:09:36 +0800

Before we resume talking about why adding a capacitor solves the step input problem without solving the ramp input problem, let’s review some basic knowledge from linear system theory.

Linear System Basics

Initial Value Theorem

For a function $f(t)$ and its Laplace transform $F(s)$, the initial value theorem states that

$$ \lim_{t \to 0^+} f(t) = \lim_{s \to \infty} sF(s) $$

Final Value Theorem

For a function $f(t)$ and its Laplace transform $F(s)$, the final value theorem states that, provided all poles of $sF(s)$ lie in the open left half-plane,

$$ \lim_{t \to \infty} f(t) = \lim_{s \to 0} sF(s) $$

Types of Inputs

There are a few basic types of inputs that we use to kickstart the system. Here is a summary table:

| Input Type | Time-Domain Signal $f(t)$ | Laplace Transform $F(s)$ | Order |
| --- | --- | --- | --- |
| Impulse | $\delta(t)$ | 1 | undefined |
| Step | $u(t)$ | $\displaystyle\frac{1}{s}$ | 0 |
| Ramp | $t \cdot u(t)$ | $\displaystyle\frac{1}{s^2}$ | 1 |
| Parabolic | $\displaystyle \frac{t^2}{2} \cdot u(t)$ | $\displaystyle\frac{1}{s^3}$ | 2 |
| … | … | … | … |

Unity Feedback System

We call the following system a Unity-Feedback System, if the feedback path has a gain of 1.

System Types

For a unity feedback system, we assume the controller has a transfer function that looks like:

$$ \displaystyle H(s) = \frac{K(s-z_1)(s-z_2)\cdots(s-z_m)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n)} $$

That is, the controller has $m$ zeros and $n+k$ poles, of which $k$ poles are at the origin. We require $m \le n+k$, otherwise the transfer function will not be a proper transfer function.

We define the type of the system as the number of pure integrators in $H(s)$. In our definition of $H(s)$, the type is $k$.

Now, if we sub $H(s)$ into the unity feedback system, using Mason’s Gain Formula, we have the closed-loop transfer function of the overall system:

$$ \begin{align} \displaystyle H_{cl}(s) &= \frac{H(s)}{1 + H(s)} \\ &= \frac{K(s-z_1)(s-z_2)\cdots(s-z_m)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

Therefore the transfer function from the input to the tracking error is given by:

$$ \begin{align} E(s) &= 1 - H_{cl}(s) \\ &= \frac{1}{1 + H(s)} \\ &= \frac{s^k(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

Now, let’s say our input is type-N:

$$ \begin{align} F(s) &= \frac{1}{s^{N+1}} \end{align} $$

Then the resulting steady-state error is going to be:

$$ \begin{align} e(s) &= E(s) \cdot F(s) \\ &= \frac{s^{k-N-1}(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

If we apply the final value theorem (which assumes the closed loop is stable), we get

$$ \begin{align} \lim_{t \to \infty} e(t) &= \lim_{s \to 0} s \cdot e(s) \\ &= \lim_{s \to 0} \frac{s^{k-N}(s-p_1)(s-p_2)\cdots(s-p_n)}{s^k(s-p_1)(s-p_2)\cdots(s-p_n) + K(s-z_1)(s-z_2)\cdots(s-z_m)} \end{align} $$

It’s not too hard to see that the term of interest is $s^{k-N}$. We conclude therefore:

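$$ \lim_{t \to \infty} e(t) = \begin{cases} 0 & \text{if } k > N \\ \displaystyle\frac{1}{1 + K_0} & \text{if } k = N = 0 \\ \displaystyle\frac{1}{K_k} & \text{if } k = N \geq 1 \\ \infty & \text{if } k < N \end{cases} \qquad \text{where } K_k = \lim_{s \to 0} s^k H(s) = K\frac{(-z_1)(-z_2)\cdots(-z_m)}{(-p_1)(-p_2)\cdots(-p_n)} $$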

This is a pretty elegant result. In other words, the steady-state error behaves as follows:

If the system type (k) is greater than the input type (N), the steady-state error is zero.
If the system type (k) is smaller than the input type (N), the steady-state error will diverge to infinity.
If the system type and input type are the same, the steady state error will converge to a non-zero number, which will be a finite fraction of the input, depending on the DC gain of the system.
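As a quick numerical sanity check of the three cases, the limit $\lim_{s \to 0} s \cdot E(s) \cdot F(s)$ can be approximated by plugging in a very small $s$. This is only a sketch: the function name `steady_state_error`, the gain $K = 10$ and the single pole at $s = -1$ are made-up values for illustration, and the number is meaningful only if the closed loop is stable.

```python
def steady_state_error(K, zeros, poles, k, N, s=1e-6):
    """Approximate lim_{s->0} s * E(s) * F(s) by evaluating at a small real s.

    H(s) = K * prod(s - z_i) / (s^k * prod(s - p_i)), E(s) = 1 / (1 + H(s)),
    F(s) = 1 / s^(N+1). Assumes the closed loop is stable.
    """
    num = K
    for z in zeros:
        num *= (s - z)
    den = s ** k
    for p in poles:
        den *= (s - p)
    H = num / den
    return s * (1.0 / (1.0 + H)) / s ** (N + 1)


# Hypothetical loop: K = 10, no zeros, one pole at s = -1.
for k, N in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    err = steady_state_error(10.0, [], [-1.0], k, N)
    print(f"system type {k}, input type {N}: error ~ {err:.4g}")
```

The type-1 loop drives the step error to (numerically) zero and leaves a finite ramp error, while the type-0 loop leaves a finite step error and a ramp error that blows up as $s$ shrinks.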

Another intuitive way to look at this: if the input is of a higher type than the system, the system can’t respond fast enough and will always fall behind the input, and vice versa.

The problems with higher order system

Now if we take the original system we discussed last time when we added a capacitor, we realize that adding that capacitor helped us to increase the system type, and thus we are able to track the input better.

Here is the further question: what if the input is of type 2? Based off our discussion just now, we might just want to add another integrator so as to further increase the system type, like below:

This makes plausible sense; however this system will unfortunately fail. Why? Let’s try to simplify the system by assuming $g_m = 1$ and $C=1$. If we perturb our system by a step input, according to our discussion above, the system should be able to track it without problem. First, the open loop gain is given by:

$$ H_{ol} = \frac{1}{s^2} $$

Now, if we close the loop, the closed-loop gain and the error transfer function are:

$$ H_{cl} = \frac{H_{ol}(s)}{1 + H_{ol}(s)} = \frac{1}{1+s^2}, \qquad E(s) = \frac{1}{1 + H_{ol}(s)} = \frac{s^2}{1+s^2} $$

With a step input, the tracking error has a frequency domain representation of:

$$ e(s) = E(s) \cdot \frac{1}{s} = \frac{s}{1+s^2} $$

Now if we perform the inverse Laplace transform of the s-domain representation, we will get:

$$ e(t) = \cos t, \qquad V_{out}(t) = 1 - \cos t $$

That is to say, we increased the system type hoping for better tracking, yet the system is not even able to track a type-0 (step) input: it oscillates forever. What’s the problem here?

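The reason is that the second pole at the origin adds another 90 degrees of phase lag, so the loop phase is $-180°$ at every frequency and the phase margin of the system is 0. Equivalently, the closed-loop poles sit at $s = \pm j$, right on the imaginary axis. From another angle, the loop satisfies the Barkhausen criterion at $\omega = 1$ (unity loop gain with 180 degrees of phase shift around a negative-feedback loop), so it sustains oscillation. The steady-state error rule above silently assumed a stable closed loop, which is exactly what the final value theorem requires.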

The fix is to introduce damping to one of the integrators to produce a zero in the forward gain, thus making the phase margin positive.

The derivation is left to our audience if you would like to give it a try.
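For readers who would rather see the effect numerically first, here is a minimal time-domain sketch. It assumes the damped loop has open-loop gain $(1 + as)/s^2$, where the zero time constant $a$ is a hypothetical parameter ($a = 0$ recovers the undamped double integrator with $g_m = C = 1$); the function name `simulate_error` and the RK4 integrator are illustration choices, not part of the original circuit.

```python
def simulate_error(a, t_end=20.0, dt=1e-3):
    """Tracking error e(t) for a unit step into a unity-feedback loop
    around (1 + a*s)/s^2, integrated with classic RK4.

    States: x1' = e, x2' = x1 + a*e, output y = x2, error e = 1 - x2.
    """
    def deriv(x1, x2):
        e = 1.0 - x2
        return e, x1 + a * e

    x1 = x2 = 0.0
    errors = []
    for _ in range(int(round(t_end / dt))):
        errors.append(1.0 - x2)
        k1 = deriv(x1, x2)
        k2 = deriv(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1])
        k3 = deriv(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1])
        k4 = deriv(x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return errors


undamped = simulate_error(a=0.0)
damped = simulate_error(a=1.0)
print("undamped: peak |e| over the last 7 s =", max(abs(e) for e in undamped[-7000:]))
print("damped:   |e| at the end of the run  =", abs(damped[-1]))
```

Without the zero the error keeps swinging with unit amplitude; with $a = 1$ it decays towards zero.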

For the next part, we will further expand the damping concept and introduce a few compensation techniques we can use: lead, lag, and lead-lag compensation to improve the system response.

Mason's Gain Formula and Control Canonical Forms

Thu, 23 Apr 2026 00:28:37 +0800

Introduction

Pop quiz: what’s the transfer function $H(s) = V_{out} / V_{in}$ in the following circuit?

Assumptions:

$C_c$ is a big coupling capacitor.
No channel-length modulation.
You don’t have to solve for DC, all small signal parameters are given. Don’t assume unspecified parameters, for example $r_o$, $C_g$, etc.
The circuit is linear.

OK, this circuit does look a bit intimidating. Students in an entry-level analog circuits class might take out a pencil and work through the analysis by hand, but that’s super tedious, time-consuming and error-prone.

Look at the circuit, what is the main reason that makes analysis difficult? Feedback. Not just one feedback path, there are two feedbacks from each of the two stages rendering the overall analysis not so straightforward. However, we are going to introduce a very elegant mathematical tool to deal with all these kinds of closed-loop structures.

Mason’s Gain Formula

Samuel Jefferson Mason was born in 1921. As a distinguished electronics engineer, his most famous scientific contributions are Mason’s invariant and Mason’s rule, or Mason’s gain formula, both named after him.

Mason’s gain formula is used to find the transfer function of a closed-loop system. A closed-loop system doesn’t need to contain only one loop; it could contain multiple loops, and they can even interact with each other. The conventional algebraic approach usually requires solving a system of simultaneous equations, but Mason’s gain formula provides an easier way to find the transfer function.

Mason’s gain formula is particularly suitable for a system that can be described using a Signal Flow Graph.

Signal Flow Graph (SFG)

A Signal Flow Graph (SFG) is a graphical representation of a system. As the name suggests, an SFG is a directed graph, meaning it has the following components:

Node: a node is a vertex that represents a variable in a system.
Branch: a branch is a directed edge that represents a transfer function between two nodes. It has a linear gain. If the gain is 1, we don’t annotate it on the SFG.
Input/Output: Input / Output nodes are special nodes that we use to denote the transfer function’s departure and arrival points.
Addition: Two signals could be added together, given SFG is targeting linear systems.

Now, with these simple definitions, we are able to construct more complex notations + structures:

Path: a path is a sequence of branches that connect nodes in the graph, such that no node is visited more than once.
- Forward Path: a forward path is a path from the input node to the output node.
- Path Gain: the product of the gains of all branches in a path.
Loop: a loop is a path that starts and ends at the same node. A loop is a specific type of path.
- Loop Gain: the product of the gains of all branches in a loop.

Example: Type 2 PLL

Shown below is a type-2 PLL:

We are able to see 5 paths here and a simple loop. We defined 4 nodes, with 1 input node and 1 output node.

The Formula

Mason’s Gain Formula states the following:

Mason’s Gain Formula:
$$H(s) = \frac{\sum_{k=1}^{N} P_k \Delta_k}{\Delta}$$
where:

$N$ is the number of forward paths from input to output

$P_k$ is the path gain of the $k$-th forward path

$\Delta$ is the determinant of the system: $\Delta = 1 - \sum L_i + \sum L_i L_j - \sum L_i L_j L_k + \ldots$

$\sum L_i$ is the sum of all individual loop gains

$\sum L_i L_j$ is the sum of products of all pairs of non-touching loops

$\sum L_i L_j L_k$ is the sum of products of all triplets of non-touching loops

and so on…

$\Delta_k$ is the cofactor of the $k$-th forward path, obtained by removing all loops that touch the $k$-th path from $\Delta$

Examples

A Type 2 Charge Pump PLL

Let’s take the type-2 PLL system shown above for example. In this example, there is a single loop and only one forward path from input to output. The number of forward paths is therefore 1 (not to be confused with the divider ratio $N$ that appears in the loop gain below), and:

$$\begin{align} \Delta &= 1 - L_1 = 1 + K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s / N \\ \Delta_1 &= 1 \\ \sum_{k} P_k \Delta_k &= P_1 \Delta_1 = K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s \end{align}$$

Bear in mind that $\Delta_1 = 1$ because there is only one loop, and it does touch the forward path, therefore we remove the only contributing loop gain from $\Delta$.

Thus, combining the terms together, we have the expression for the closed loop gain:

$$ H(s) = \frac{K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s}{1 + K_{PFD}I_{CP}(R + 1/sC)K_{VCO}/s / N} $$

At low frequency the loop gain is large and dominates, so $H(s) \approx N$: the reference phase is multiplied by $N$, which means the reference phase noise power shows up at the output multiplied by $N^2$. At high frequency, the loop gain dies out and $H(s) \to 0$. Therefore the reference-to-output phase transfer function is a low-pass filter.
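To see the low-pass behaviour with numbers, the closed-loop expression can be evaluated along $s = j\omega$. This is a rough sketch: every component value below is hypothetical, and $K_{PFD} = 1/2\pi$ is simply an assumed detector gain, not something taken from the diagram.

```python
import math


def pll_closed_loop(omega, K_pfd, I_cp, R, C, K_vco, N):
    """Reference-to-output phase transfer H(j*omega) = G / (1 + G / N)."""
    s = 1j * omega
    G = K_pfd * I_cp * (R + 1.0 / (s * C)) * K_vco / s  # forward path gain
    return G / (1.0 + G / N)


# Hypothetical values, chosen only for illustration.
params = dict(K_pfd=1 / (2 * math.pi), I_cp=100e-6, R=10e3, C=100e-12,
              K_vco=2 * math.pi * 100e6, N=32)

for omega in (1e3, 1e6, 1e8, 1e10):
    mag = abs(pll_closed_loop(omega, **params))
    print(f"omega = {omega:.0e} rad/s -> |H| = {mag:.4g}")
```

Well below the loop bandwidth the magnitude sits at the divider ratio (32 here), and well above it the magnitude rolls off towards zero.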

A Triple Integrator System

Now, let’s compute the transfer function of the SISO system below:

We notice that there are 3 loops and 3 forward paths. Luckily, they all touch each other, which makes our calculation very simple.

$$ \begin{cases} \Delta = 1 + a_1/s + a_2/s^2 + a_3/s^3 \\ P_1 = b_1/s \\ \Delta_1 = 1 \\ P_2 = b_2/s^2 \\ \Delta_2 = 1 \\ P_3 = b_3/s^3 \\ \Delta_3 = 1 \end{cases} $$

Combining all the terms, we have

$$ \begin{align} H(s) &= \frac{b_1/s + b_2/s^2 + b_3/s^3}{1 + a_1/s + a_2/s^2 + a_3/s^3} \\ &= \frac{b_1 s^2 + b_2 s + b_3}{s^3 + a_1 s^2 + a_2 s + a_3} \end{align} $$
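The bookkeeping above is easy to check numerically. The sketch below assumes, as in this example, that every loop touches every other loop and every forward path, so that $\Delta = 1 - \sum L_i$ and every $\Delta_k = 1$; the helper name `mason_touching` and the coefficient values are made up for illustration.

```python
def mason_touching(path_gains, loop_gains):
    """Mason's gain formula for the special case where all loops touch
    each other and every forward path: H = sum(P_k) / (1 - sum(L_i))."""
    return sum(path_gains) / (1 - sum(loop_gains))


a = (6.0, 11.0, 6.0)   # hypothetical a_1, a_2, a_3
b = (1.0, 2.0, 3.0)    # hypothetical b_1, b_2, b_3
s = 0.5 + 2.0j         # arbitrary test point in the s-plane

paths = [b[i] / s ** (i + 1) for i in range(3)]    # P_i = b_i / s^i
loops = [-a[i] / s ** (i + 1) for i in range(3)]   # L_i = -a_i / s^i

H_mason = mason_touching(paths, loops)
H_poly = (b[0] * s**2 + b[1] * s + b[2]) / (s**3 + a[0] * s**2 + a[1] * s + a[2])
print(abs(H_mason - H_poly))  # difference between the two expressions
```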

Canonical Forms

Doesn’t the last example have a very regular transfer function? This is actually intended.

In a control system modeled in time domain, we have our system defined using state-space model:

$$ \begin{align} \dot{x}(t) &= Ax(t) + Bu(t) \\ y(t) &= Cx(t) \end{align} $$

and we know that, if we perform Laplace transform, while assuming a 0 initial condition, we have

$$ sX(s) = AX(s) + BU(s) $$

Therefore, by algebraic manipulation, we have

$$ Y(s)/U(s) = C(sI - A)^{-1}B $$

This is the transformation from a state space model to a transfer function.

Now, a state space model has exactly one transfer function, but a given transfer function has infinitely many state space models. As a rule of thumb, the number of poles in the transfer function sets the minimum number of states in the corresponding state space model, because we need that many integrators. We could create more states, but those either come with constraints or are redundant, meaning linearly dependent on the pre-existing states.

Some state space realizations of a transfer function have a particularly regular structure and are called canonical forms. Here are some of them:

Controllable Canonical Form

We already encountered the controllable canonical form in the previous example.

The controllable canonical form gets its name from the fact that the resulting state space model is always controllable. For a transfer function $H(s) = \frac{b_{n-1}s^{n-1} + \ldots + b_1 s + b_0}{s^n + a_{n-1}s^{n-1} + \ldots + a_1 s + a_0}$, the state space model is given by:

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ -a_0 & -a_1 & -a_2 & \ldots & -a_{n-1} \end{pmatrix}X + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}U \\ Y &= \begin{pmatrix} b_0 & b_1 & b_2 & \ldots & b_{n-1} \end{pmatrix}X \end{align} $$

Note that the triple integrator example indexes its coefficients from the highest power down, so its $a_1, a_2, a_3$ play the role of $a_2, a_1, a_0$ here.

By Kalman’s rank condition, a system is controllable when its controllability matrix $\begin{pmatrix} B & AB & \ldots & A^{n-1}B \end{pmatrix}$ has full rank, and for the controllable canonical form this is always the case. That’s why we call it the controllable canonical form.
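As a small illustration of the rank claim, the sketch below builds an $n \times n$ companion matrix for a hypothetical third-order denominator $s^3 + 6s^2 + 11s + 6$ (coefficients listed as $a_0, a_1, a_2$, an indexing assumption made here) and forms the controllability matrix $\begin{pmatrix} B & AB & A^2B \end{pmatrix}$. A nonzero determinant means full rank. All helper names are made up for this example.

```python
def companion(a):
    """Controllable canonical A for s^n + a[n-1]*s^(n-1) + ... + a[0]."""
    n = len(a)
    rows = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    rows.append([-ai for ai in a])
    return rows


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def controllability_matrix(A, B):
    """Stack the columns B, AB, ..., A^(n-1)B into an n x n matrix."""
    cols = [list(B)]
    for _ in range(len(A) - 1):
        cols.append(matvec(A, cols[-1]))
    return [[col[i] for col in cols] for i in range(len(A))]


def det(M):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


A = companion([6.0, 11.0, 6.0])   # hypothetical a_0, a_1, a_2
B = [0.0, 0.0, 1.0]
ctrb = controllability_matrix(A, B)
print(ctrb)
print("determinant:", det(ctrb))
```

The matrix comes out anti-triangular with ones on the anti-diagonal, so its determinant has magnitude 1 regardless of the $a_i$, which is the structural reason the form is always controllable.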

Observable Canonical Form

Observability, the dual of controllability, also has its canonical form. Its state space model representation is given by:

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} 0 & 0 & 0 & \ldots & 0 & -a_0 \\ 1 & 0 & 0 & \ldots & 0 & -a_1 \\ 0 & 1 & 0 & \ldots & 0 & -a_2 \\ 0 & 0 & 1 & \ldots & 0 & -a_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & 1 & -a_{n-1} \end{pmatrix}X + \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{n-1} \end{pmatrix}U \\ Y &= \begin{pmatrix} 0 & 0 & 0 & \ldots & 0 & 1 \end{pmatrix}X \end{align} $$

If you take a closer look, you’ll notice that the observable canonical form is precisely the transpose of the controllable canonical form: $A_o = A_c^T$, $B_o = C_c^T$, and $C_o = B_c^T$. This is not a coincidence — it is a direct manifestation of the duality between controllability and observability. Taking the transpose of a state space realization preserves the transfer function, since

$$ H(s) = C(sI - A)^{-1}B = \left[ B^T (sI - A^T)^{-1} C^T \right]^T $$

and the transfer function is a scalar for SISO systems, so the transpose is itself.

By the dual of Kalman’s argument, the observability matrix of the observable canonical form is always full rank, which is why we call it the observable canonical form. Notice as well that, unlike the controllable form where the numerator coefficients $b_i$ are placed in the output matrix $C$, here they show up directly in the input matrix $B$. Each state receives a weighted contribution $b_i u(t)$ from the input and a feedback term $-a_i x_n$ from the last state, and only the last state $x_n$ is read out at the output. Reading the SFG above from right to left makes the structure obvious: it is the controllable canonical form with all arrows reversed.
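The duality can also be checked numerically: transposing a realization should leave $C(sI - A)^{-1}B$ unchanged. The sketch below does this for a hypothetical third-order example, with coefficients listed as $a_0, a_1, a_2$ and $b_0, b_1, b_2$ (an indexing assumption made here). The tiny Gauss-Jordan solver is only there to keep the example dependency-free.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def transfer(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B at a single complex frequency s."""
    n = len(A)
    M = [[(s if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(M, B)
    return sum(c * xi for c, xi in zip(C, x))


a = [6.0, 11.0, 6.0]   # hypothetical a_0, a_1, a_2
b = [3.0, 2.0, 1.0]    # hypothetical b_0, b_1, b_2

A_c = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-a[0], -a[1], -a[2]]]
B_c, C_c = [0.0, 0.0, 1.0], b

A_o = [list(col) for col in zip(*A_c)]   # A_o = A_c^T
B_o, C_o = C_c, B_c                      # B_o = C_c^T, C_o = B_c^T

s = 0.5 + 2.0j                           # arbitrary test point
H_c = transfer(A_c, B_c, C_c, s)
H_o = transfer(A_o, B_o, C_o, s)
H_poly = (b[2] * s**2 + b[1] * s + b[0]) / (s**3 + a[2] * s**2 + a[1] * s + a[0])
print(abs(H_c - H_o), abs(H_c - H_poly))
```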

Diagonal Form and Jordan Form

The controllable and observable canonical forms are built around the coefficients of the polynomials $a_i$ and $b_i$. The Jordan form takes a different approach: instead of starting from the polynomial coefficients, we start from the poles of the transfer function. Performing partial fraction decomposition,

$$ H(s) = \frac{b_{n-1}s^{n-1} + \ldots + b_1 s + b_0}{(s - p_1)(s - p_2) \ldots (s - p_n)} = \sum_{i=1}^{n} \frac{r_i}{s - p_i} $$

where $p_i$ are the poles and $r_i$ are the residues. Each first-order term $r_i/(s - p_i)$ corresponds to a single integrator with self-feedback $p_i$ and a gain $r_i$ at the output. The SFG above is exactly that — $n$ parallel branches, each with its own pole, all summed at the output.

Stacking these parallel branches into a state space model gives the diagonal Jordan form (assuming distinct poles):

$$ \begin{align} \frac{d}{dt}X &= \begin{pmatrix} p_1 & 0 & 0 & \ldots & 0 \\ 0 & p_2 & 0 & \ldots & 0 \\ 0 & 0 & p_3 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & p_n \end{pmatrix}X + \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}U \\ Y &= \begin{pmatrix} r_1 & r_2 & r_3 & \ldots & r_n \end{pmatrix}X \end{align} $$

Because $A$ is diagonal, the states are completely decoupled — each $x_i$ evolves independently as $\dot{x}_i = p_i x_i + u$, and the output is just a weighted sum of these modes. This makes the Jordan form particularly useful for analysis: the eigenvalues of $A$ are read off the diagonal, so stability is immediate (all $\text{Re}(p_i) < 0$), and each mode’s contribution to the output is exactly $r_i$.
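For distinct poles, the residues can be computed directly as $r_i = N(p_i) / \prod_{j \neq i}(p_i - p_j)$, where $N(s)$ is the numerator polynomial. A minimal sketch with made-up poles and coefficients (the function name `residues` is an illustration choice):

```python
def residues(num_coeffs, poles):
    """Residues of N(s) / prod(s - p_j) for distinct poles.

    num_coeffs lists b_0, b_1, ... (lowest power first).
    """
    def numerator(s):
        return sum(bk * s ** k for k, bk in enumerate(num_coeffs))

    out = []
    for i, p in enumerate(poles):
        d = 1.0
        for j, q in enumerate(poles):
            if j != i:
                d *= (p - q)
        out.append(numerator(p) / d)
    return out


poles = [-1.0, -2.0, -3.0]   # hypothetical distinct poles
b = [3.0, 2.0, 1.0]          # N(s) = 3 + 2s + s^2
r = residues(b, poles)
print(r)

# Cross-check the parallel (diagonal) realization at an arbitrary point.
s = 0.5 + 2.0j
H_diag = sum(ri / (s - pi) for ri, pi in zip(r, poles))
H_direct = (b[0] + b[1] * s + b[2] * s**2) / ((s + 1) * (s + 2) * (s + 3))
print(abs(H_diag - H_direct))
```

These $r_i$ are exactly the entries of the output row $C$ in the diagonal form, with the poles on the diagonal of $A$.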

When poles are repeated, $A$ is no longer fully diagonalizable. For a pole $\lambda$ with multiplicity $k$, the corresponding diagonal block becomes a Jordan block:

$$ J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \ldots & 0 \\ 0 & \lambda & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{pmatrix} $$

The superdiagonal of 1’s couples adjacent states within the block, which corresponds to terms of the form $r_{i,j}/(s - \lambda)^j$ in the partial fraction expansion. The overall $A$ matrix is still block-diagonal, with one Jordan block per distinct pole.

Unlike the controllable and observable canonical forms — which are guaranteed to be controllable / observable by construction — the Jordan form is only controllable and observable when all residues $r_i$ are nonzero and all poles are distinct. A zero residue corresponds to a pole-zero cancellation in $H(s)$, which means a mode that is either uncontrollable or unobservable (or both). In that sense, the Jordan form is the most honest of the three: it makes hidden modes visible rather than burying them in the structure.

Modified Form

It becomes even trickier if the original system’s poles are not all on the real axis but include complex conjugate pairs. In this case, we can either use the diagonal form with complex entries, or use what’s called the “modified form”.

The modified Jordan form (also known as the real Jordan form) keeps the state space model entirely real-valued by replacing each pair of complex conjugate poles $p_i = \sigma \pm j\omega$ with a single $2 \times 2$ real block on the diagonal of $A$:

$$ \begin{pmatrix} \sigma + j\omega & 0 \\ 0 & \sigma - j\omega \end{pmatrix} \quad \longrightarrow \quad \begin{pmatrix} \sigma & \omega \\ -\omega & \sigma \end{pmatrix} $$

This block is similar to the diagonal complex form via the change of basis

$$ T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ j & -j \end{pmatrix}, $$

so the transfer function is preserved. Concretely, for a system whose poles consist of $m$ real poles $\lambda_1, \ldots, \lambda_m$ and $\ell$ complex conjugate pairs $\sigma_k \pm j\omega_k$, the modified form is

$$ A = \begin{pmatrix} \lambda_1 & & & & & & \\ & \ddots & & & & & \\ & & \lambda_m & & & & \\ & & & \sigma_1 & \omega_1 & & \\ & & & -\omega_1 & \sigma_1 & & \\ & & & & & \ddots & \\ & & & & & & \begin{smallmatrix} \sigma_\ell & \omega_\ell \\ -\omega_\ell & \sigma_\ell \end{smallmatrix} \end{pmatrix} $$

The eigenvalues of each $2 \times 2$ block are exactly $\sigma_k \pm j\omega_k$, so spectral information is unchanged: we have only traded a complex diagonal representation for a real block-diagonal one. Numerical software often prefers this real form, since it avoids complex arithmetic and the bookkeeping of conjugate pairs. Repeated complex poles generalize to real Jordan blocks by replacing each scalar entry of the complex Jordan block with the corresponding $2 \times 2$ real block, and each superdiagonal $1$ with a $2 \times 2$ identity.
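The eigenvalue claim for the $2 \times 2$ real block is easy to verify from its trace and determinant. A short sketch with a hypothetical pole pair $-0.5 \pm 2j$ (the helper name `eig_2x2` is made up here):

```python
import cmath


def eig_2x2(M):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root


sigma, omega = -0.5, 2.0                      # hypothetical pole pair
block = [[sigma, omega], [-omega, sigma]]     # real 2x2 block
lam_plus, lam_minus = eig_2x2(block)
print(lam_plus, lam_minus)
```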

Where were we?

We talked a lot about the famous Mason’s gain formula, and several canonical forms that we can use to represent linear time-invariant systems. Now going back to the original question: what’s the closed loop gain of the original system?

Well, we do realize that the system had two loops, but they do touch; and the system has only one forward path. That makes our computation significantly easier.

Let’s use some notations here. Let’s denote the forward gain as $F_1, F_2$ and the loop gains as $L_1, L_2$. From basic analog circuit theory, we have:

$$ \begin{align} F_1 &= -\frac{g_{m1}R_{d_1}}{1 + g_{m1}R_{d_1}||1/sC_1} \\ F_2 &= \frac{g_{m2}R_{s2}}{1 + g_{m2}R_{s2}} \\ L_1 &= (G_m R_{gate} || R_{g1} || R_{g2} F_1) \\ L_2 &= (-g_{mp} R_{gate} || R_{g1} || R_{g2} ) F_1 F_2 \end{align} $$

And finally, our closed loop gain:

$$ \frac{V_{out}}{V_{in}}(s) = \frac{F_1F_2}{1 - L_1 - L_2} $$

This looks a lot faster compared to manually breaking down all expressions.

I would like to point out at the end of this article that the original circuit is unstable, because both loop gains are positive. Unless the loop determinant is strictly positive in real part, the circuit will be unstable.

Steady-state Error, part 0

Thu, 16 Apr 2026 22:08:25 +0800

Introduction

I started to have the very first question regarding “steady-state error” when I was a sophomore. I still recall the first class in EE2002: Analog Electronics when the professor introduced the very first Operational Amplifier, and here is what he said:

“An Operational Amplifier, or an OpAmp, is a circuit that has infinite gain, infinite input impedance, and 0 output impedance.”

I was very confused back then. Out of all 3 properties, the most counterintuitive one was the “infinite gain” property. I could still understand the engineering approximation of infinite input impedance, given the CMOS gate input, and of 0 output impedance if you treat the output as an ideal voltage source, but infinite gain didn’t make any sense. Over the next few classes, I learned that the infinite gain of the OpAmp is what lets the feedback structure kick in, so that the closed-loop gain is set by the feedback impedance network.

However, I felt that something was off but I couldn’t quite put my finger on it. I didn’t have the right terminology until my junior year, when I took EE3019: Integrated Electronics. There we were introduced to feedback in a more well-defined way, and I became aware of the term “steady-state error”, which describes the difference between the desired settling value and the actual settled value. I felt like, yes, this was the word I had been looking for.

I came across this once again when I started my PhD and started to design “Phase-Locked Loops (PLLs)”, a specific feedback structure that’s used to amplify a clean clock. I came across two different terms now: Type-1 and Type-2 PLLs. (I doubt whether a lot of PLL designers can actually tell the difference between the two.) Interestingly enough, neither the textbook nor the slides said anything about why they are named type-1 or type-2, as if it were just a naming convention.

I wasn’t 100% clear on this matter until I took MEC237: Linear Control, where I was introduced to the book Control Systems Engineering by Norman Nise. Reading it actually helped me understand the entire steady-state error theory.

The Problem Setup

Let’s go back and give the problem intuition. Suppose we have an OpAmp, and we would like to use it as a voltage follower, so we configure it like it’s a unit buffer:

An elementary analog circuits professor will tell you that because an ideal OpAmp has infinite gain, it will always make sure both inputs are equal to each other. WRONG. There are at least two problems with this hand-wavy explanation:

The assumption of infinite gain is an idealization that doesn’t hold in practice.
The infinite gain assumption also doesn’t explain why it will make two inputs equal to each other.

In reality, the OpAmp’s gain depends on the transconductance gain $g_m$ and the loading impedance $R$, and we define our open loop gain to be

$$ A = g_m R$$

For the OpAmp connected in the abovementioned way, the small signal relationship between the input and output is given by:

$$ V_{out} = A(V_+ - V_-) $$

where $V_+$ and $V_-$ are the voltages at the non-inverting and inverting inputs, respectively.

Now we try to use that relationship to analyze the behavior of the unit buffer. We realize that here we have $V_- = V_{out}$, so solving the equation we have:

$$ \begin{align} V_{out} &= A(V_{in} - V_{out}) \\ V_{out} &= \frac{A}{1+A} V_{in} \end{align} $$

Now, if we find the transfer function from input to output, we identify

$$ H(s) = \frac{V_{out}(s)}{V_{in}(s)} = \frac{A}{1+A} \neq 1 $$

which basically means that the output is going to be just slightly smaller compared to the input.

The reason we would like to make our amplifier to be infinite gain is that, if $A \to \infty$, we easily have

$$ H(s) = \lim_{A \to \infty} \frac{A}{1+A} = 1 $$

which means that the output will approach the input as the gain approaches infinity; otherwise there will be an error term between the real output versus the desired output, which is given by

$$ \begin{align} \Delta V &= V_{out,desired} - V_{out,real} \\ &= V_{in} - V_{out,real} \\ &= V_{in} - \frac{A}{1+A} V_{in} \\ &= \frac{1}{1+A} V_{in} \end{align} $$

We can conclude two things from this expression:

The error is inversely proportional to $(1+A)$. The larger the gain is, the smaller the error is. However, if the gain is finite, no matter what non-zero input we see, the output can never achieve the desired value.
The larger the input voltage is, the larger the error is.
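To get a feel for the numbers, the error expression $\Delta V = V_{in}/(1+A)$ can be tabulated for a few open-loop gains. The gain values and the 1 V input below are arbitrary illustration choices.

```python
def buffer_error(A, v_in=1.0):
    """Static error of the unity-gain buffer: V_in - A/(1+A) * V_in."""
    v_out = A / (1.0 + A) * v_in
    return v_in - v_out


for A in (10, 100, 1e3, 1e5):
    print(f"A = {A:>8g}: error = {buffer_error(A) * 1e3:.4f} mV for a 1 V input")
```

Every 10x increase in gain buys roughly a 10x smaller error, but the error never reaches zero for finite $A$.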

We call the difference between the desired and actual output the steady-state error. This phenomenon happens in closed-loop systems where we would like to drive a plant toward a value that we want. In this example, the OpAmp is the controller, the plant and the detector all at once.

Now, the question becomes if we are able to reduce this error at all.

The Temporary Elixir: A Capacitor

The fix is surprisingly simple. We replace the resistor with a pure capacitor:

Let’s do a time-domain calculation here first. Let’s assume the output current of the $g_m$ cell is defined as $i$, then we have the relationship between $V_{out}$ and $i$:

$$ \begin{align} i &= g_m (V_{in} - V_{out}) \\ i &= C\frac{dV_{out}}{dt} \end{align} $$

This ODE is not too hard to solve by hand. Assume a 0 initial condition on $V_{out}$, we have

$$ V_{out} = V_{in} (1 - e^{-\frac{g_m t}{C}}) $$

Assuming a zero initial condition is equivalent to applying a step of height $V_{in}$ at $t = 0$: the output follows an exponential settling curve that gradually approaches the input voltage. If we let $t \to \infty$, then we can easily see that

$$ \lim_{t \to \infty} V_{out} = V_{in} $$

Nothing too fancy here; however, let’s move one step further; how about it if my input is not a step function, but a ramp function?

Ramp Response

Same circuit, but now my $V_{in} = t$. What happens to $V_{out}$? The ODE now becomes:

$$ \begin{align} V_{in}(t) &= t \\ i &= g_m (V_{in}(t) - V_{out}) \\ i &= C\frac{dV_{out}}{dt} \end{align} $$

Again, we can solve the ODE system by direct integration. This gives us:

$$ V_{out}(t) = t - \tau + \tau e^{-\frac{t}{\tau}} $$

where $\tau = \frac{C}{g_m}$.

Interestingly, if we now consider the tracking error as a function of time, we have

$$ \begin{align} e(t) &= V_{in}(t) - V_{out}(t) \\ &= t - (t - \tau + \tau e^{-\frac{t}{\tau}}) \\ &= \tau - \tau e^{-\frac{t}{\tau}} \\ &= \tau(1 - e^{-\frac{t}{\tau}}) \end{align} $$

As $t \to \infty$, unfortunately the error doesn’t die out the way it did in the step response case; it settles to the constant $\tau = C/g_m$. We show the plot here as well:

In fact, we will see in later parts of this series, step input is called a “type-0” input, and ramp input is called a “type-1” input. The original R-loaded OpAmp is called a “type-0” system, and the C-loaded improved OpAmp is called a “type-1” system.
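Both responses can be reproduced with a few lines of numerical integration. The sketch below integrates $\tau \, dV_{out}/dt = V_{in}(t) - V_{out}$ with forward Euler; $\tau = 1$ s, the step size and the 10 s horizon are arbitrary choices made for illustration.

```python
def simulate(v_in, tau, t_end, dt=1e-4):
    """Forward-Euler integration of tau * dV_out/dt = v_in(t) - V_out,
    starting from V_out(0) = 0. Returns V_out at t_end."""
    v_out, t = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v_out += dt * (v_in(t) - v_out) / tau
        t += dt
    return v_out


tau = 1.0      # hypothetical C / g_m, in seconds
t_end = 10.0   # ten time constants

step_err = 1.0 - simulate(lambda t: 1.0, tau, t_end)   # unit step input
ramp_err = t_end - simulate(lambda t: t, tau, t_end)   # unit ramp input
print("step error after 10 tau:", step_err)
print("ramp error after 10 tau:", ramp_err)
```

The step error has essentially vanished, while the ramp error has settled at about $\tau$.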

In the next section, we will introduce some very useful mathematical tools to help us analyze the system without solving the ODE every single time.

Poisson Point Process (PPP) and Bit Error Rate (BER)

Mon, 13 Apr 2026 21:42:59 +0800

I realized that the colleagues I share my knowledge with are largely not Chinese readers. Therefore, starting from today, I’ll try to write tech-related posts on my blog in English.

Introduction

I was taking STAT150 last semester from UC Berkeley. Although the teaching wasn’t as engaging as I wished for, I was able to grasp most of the useful key concepts. One of the very useful mathematical models was the Poisson point process.

I encountered this process once again when I was doing my link measurement, when we were supposed to benchmark the chip’s bit error rate.

Here are two questions that arise from this:

If a link has a bit error rate of $10^{-15}$, what does this mean?

Does this mean that if I send $10^{15}$ bits, I’m likely to see 1 error, or I’m likely to see some error?

If I were to benchmark a link’s performance, how many bits should I send in order to confidently say that the link has a BER less than $10^{-15}$?

Does sending $10^{15}$ bits and observing no error suffice?

Without giving direct answers to both questions, let’s review some fundamentals.

Poisson Distribution

We give the formal definition of a 1D Poisson distribution here.

Definition (Poisson Distribution): A random variable $X$ follows a Poisson distribution with parameter $\lambda > 0$, denoted $X \sim \text{Poisson}(\lambda)$, if its probability mass function (PMF) is given by:
$$ \begin{align} P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots \end{align} $$
where $k!$ denotes the factorial of $k$.

Key Properties:

The mean and variance of a Poisson distribution are both equal to the parameter $\lambda$:

$$ \begin{align} E[X] &= \lambda \\ \text{Var}(X) &= \lambda \end{align} $$
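These properties are easy to confirm numerically by summing the PMF over enough terms. The value $\lambda = 3$ and the cut-off at $k = 59$ are arbitrary choices for this sketch.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 3.0
ks = range(60)  # the tail beyond k = 59 is negligible for lam = 3
total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(total, mean, var)
```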

Intuitive Interpretation:

The Poisson distribution models the number of events occurring in a fixed interval of time or space, given that events occur independently at a constant average rate. The parameter $\lambda$ represents the expected number of events in that interval.

Common Applications:

Number of arrivals in a queue during a time period
Number of photons detected by a sensor in a fixed duration
Number of errors in a data transmission over a fixed number of bits
Number of radioactive decays in a given time window

Poisson Point Process

Definition (Poisson Point Process):

A Poisson point process (PPP) with rate (or intensity) $\lambda > 0$ is a stochastic process $\{N(t) : t \geq 0\}$ that counts the number of events occurring in the time interval $[0, t]$. It satisfies the following properties:

Independent Increments: For any non-overlapping intervals $[t_1, t_2)$ and $[t_3, t_4)$ with $t_2 \leq t_3$, the number of events in these intervals are independent random variables.

Stationary Increments: The distribution of the number of events in any interval depends only on the length of that interval, not on its starting time. Specifically, for any $t > 0$ and $s \geq 0$:
$$ \begin{align} N(s + t) - N(s) \sim \text{Poisson}(\lambda t) \end{align} $$

No Multiple Events: The probability of more than one event occurring in an infinitesimal time interval $dt$ is negligible, i.e., $o(dt)$.

Initial Condition: $N(0) = 0$.

Counting Process Characterization:

For a Poisson point process with rate $\lambda$, the number of events $N(t)$ in a time interval $[0, t]$ follows a Poisson distribution:

$$ \begin{align} P(N(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \ldots \end{align} $$

The expected number of events in time $t$ is:

$$ \begin{align} E[N(t)] = \lambda t \end{align} $$

Inter-arrival Times:

An important consequence of the Poisson point process is that the time intervals between consecutive events (inter-arrival times) are independent and exponentially distributed with rate $\lambda$. If $T_i$ denotes the time between the $(i-1)$-th and the $i$-th event, then:

$$ \begin{align} T_i \sim \text{Exponential}(\lambda), \quad f(t) = \lambda e^{-\lambda t}, \quad t \geq 0 \end{align} $$

Condition on Event Count:

If we know $N(t) = k$, the (unordered) positions of the $k$ events in the interval $[0, t]$ are distributed as $k$ independent uniform random variables on $[0, t]$.
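The link between exponential inter-arrival times and Poisson counts can be seen in a small Monte Carlo experiment: generate events by accumulating exponential gaps, count how many land in $[0, t]$, and compare the sample mean and variance with $\lambda t$. The rate, horizon, trial count and seed below are arbitrary illustration values.

```python
import random


def count_events(rate, t_end, rng):
    """Count events in [0, t_end] by summing exponential inter-arrival times."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > t_end:
            return n
        n += 1


rng = random.Random(0)   # fixed seed so the run is repeatable
rate, t_end, trials = 2.0, 5.0, 20000
counts = [count_events(rate, t_end, rng) for _ in range(trials)]

mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print("sample mean:", mean, " sample variance:", var, " lambda * t:", rate * t_end)
```

Both the sample mean and the sample variance should land close to $\lambda t = 10$, as expected for a Poisson count.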

How does this relate to bit error rate?

If we operate a link, whether a given bit is received in error depends on whether the random jitter exceeds the eye width, in which case we sample the wrong data. Random jitter follows a Gaussian distribution. If we assume the clock is centered at the quadrature point, and the eye width happens to be $6\sigma$ (that is, $\pm 3\sigma$ around the sampling point), then we immediately arrive at the conclusion that the probability of success is about $99.7\%$. Given the clock is usually from a PLL whose jitter is a stationary process (when observed for longer than the loop time constant), we can reasonably treat the error events of different symbols as independent. Of course this is a very crude assumption because factors such as inter-symbol interference from a low-pass channel are not taken into account, but for simplicity let’s move forward with this assumption.

A quick note is that an open-loop oscillator’s jitter sequence is not a stationary process; it’s a random walk. This means that if we observe long enough, the oscillator’s phase deviation will grow unbounded. In the time domain, the jitter is just the instantaneous standard deviation, which grows over time.

Now, if we observe 10 such samples, each with an independent success probability of $99.7\%$, it shouldn’t be hard to see that the probability of all 10 samples being successful is $(0.997)^{10}$. Exactly 1 error happens when one of them is in error and all the others are successful, which has probability $10 \times 0.003 \times (0.997)^{9}$. Extending this result, the error count follows a binomial distribution:

$$ \begin{align} \text{Error} \sim \text{Bin}(N, p) \end{align} $$

where $N$ is the number of bits sent, and $p$ is the probability of error for each bit.

This whole story now sounds like we are flipping an uneven coin every single time, and the total error count follows a binomial distribution. How does this relate to Poisson process?

Law of Rare Events

Theorem (Law of Rare Events, Poisson Limit Theorem):

Let $X_n \sim \text{Bin}(n, p_n)$ be a sequence of binomial random variables where $n \to \infty$ and $p_n \to 0$ such that $n \cdot p_n \to \lambda$ for some constant $\lambda > 0$. Then:
$$ \begin{align} \lim_{n \to \infty} P(X_n = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots \end{align} $$
In other words, $X_n \xrightarrow{d} X$ where $X \sim \text{Poisson}(\lambda)$.

Intuitive Explanation:

The Law of Rare Events states that when we have a large number of independent trials, each with a very small probability of success, the number of successes approximately follows a Poisson distribution. The key condition is that the product $n \cdot p$ (the expected number of events) remains constant as $n$ increases and $p$ decreases.

Practical Implications for Bit Error Rate:

In our BER context:

$N$ is very large (number of bits transmitted)
$p$ is very small (bit error probability, e.g., $10^{-15}$)
The product $\lambda = N \cdot p$ is the expected number of bit errors

Why This Matters:

Computing probabilities with a binomial distribution requires calculating factorials and large powers, which is computationally expensive. The Poisson approximation provides:

$$ \begin{align} P(\text{Error count} = k) \approx \frac{(Np)^k e^{-Np}}{k!} \end{align} $$

This is much simpler to work with, especially for answering our original questions about BER testing.
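To see how close the approximation is, here is a minimal Python sketch that compares the exact binomial probability with the Poisson one. The link parameters ($10^6$ bits at an error probability of $10^{-6}$) and the function names are our own illustrative assumptions, chosen so that $\lambda = Np = 1$ while the exact binomial is still cheap to evaluate.

```python
import math


def binomial_pmf(n: int, p: float, k: int) -> float:
    """Exact binomial probability of k errors in n bits."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(lam: float, k: int) -> float:
    """Poisson probability of k events with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Hypothetical link: 1e6 bits at error probability 1e-6, so lambda = n * p = 1.
n, p = 10**6, 1e-6
for k in range(4):
    print(k, binomial_pmf(n, p, k), poisson_pmf(n * p, k))
```

For a real BER of $10^{-15}$ the exact binomial becomes impractical to evaluate this way, which is exactly why the Poisson form is convenient.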

Now, let’s answer the two questions we had from the beginning.

Question 1: If a link has BER = $10^{-15}$, what does this mean?

This means that, on average, we expect 1 error for every $10^{15}$ bits transmitted. In other words, the probability of any single bit being in error is $10^{-15}$. However, it is entirely possible to receive 0, 2, or 3 errors. In the extreme case every received bit could be wrong; the probability of that is astronomically small, but it is not zero.

We use the following table to illustrate the probabilities of different error counts when we send exactly $10^{15}$ bits with BER = $10^{-15}$:

| Error Count $k$ | $P(\text{Error} = k)$ | Cumulative Probability | Notes |
| --- | --- | --- | --- |
| 0 | $e^{-1} \approx 0.3679$ | 36.79% | No errors observed |
| 1 | $e^{-1} \approx 0.3679$ | 73.58% | Exactly 1 error |
| 2 | $\frac{1}{2}e^{-1} \approx 0.1839$ | 91.97% | Exactly 2 errors |
| 3 | $\frac{1}{6}e^{-1} \approx 0.0613$ | 98.10% | Exactly 3 errors |
| 4 | $\frac{1}{24}e^{-1} \approx 0.0153$ | 99.63% | Exactly 4 errors |
| 5 | $\frac{1}{120}e^{-1} \approx 0.0031$ | 99.94% | Exactly 5 errors |
| $\geq 6$ | $\approx 0.0006$ | 100% | 6 or more errors |

Interpretation:

With $\lambda = 10^{15} \times 10^{-15} = 1$, the error count follows a Poisson distribution with parameter $\lambda = 1$. The table reveals several surprising facts:

Zero errors are most likely (tied with one error): There's a 36.79% chance of observing no errors at all!
One error is equally likely: Exactly one error also occurs with 36.79% probability.
Multiple errors are possible: There's a 26.4% chance of observing 2 or more errors.

This directly answers the first question: sending $10^{15}$ bits with BER $10^{-15}$ does NOT guarantee you'll see exactly 1 error. You are equally likely to see 0 errors or 1 error, and more than a quarter of the time you will see 2 or more.
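The probabilities for $\lambda = 1$ can be reproduced with a few lines of Python. This is only a sketch; the helper name `poisson_table` is ours and not part of any library.

```python
import math


def poisson_table(lam: float, k_max: int):
    """Return (k, P(X = k), P(X <= k)) rows for a Poisson(lam) error count."""
    rows, cumulative = [], 0.0
    for k in range(k_max + 1):
        pmf = lam**k * math.exp(-lam) / math.factorial(k)
        cumulative += pmf
        rows.append((k, pmf, cumulative))
    return rows


for k, pmf, cdf in poisson_table(1.0, 5):
    print(f"{k}  {pmf:.4f}  {100 * cdf:.2f}%")
```

The probability of seeing 2 or more errors is one minus the cumulative value in the $k = 1$ row.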

Question 2: How many bits should I send to confidently establish BER < $10^{-15}$?

This is more complex and requires statistical hypothesis testing. However, we can provide some intuition using the Poisson model.

If we observe 0 errors after sending $N$ bits, what can we claim about the BER? Using the Poisson approximation with $\lambda = N \cdot p$:

$$ \begin{align} P(\text{0 errors observed} \mid \text{true BER} = p) = e^{-Np} \end{align} $$

Now we need a concept called the confidence level: the probability that our claim is right. For example, if we want to confirm that the bit error rate is below 1e-15 but we only send 10 bits and see 0 errors, our confidence in that claim is very low. If instead we send 1e27 bits and see 0 errors, we can say with very high confidence that the link has BER < 1e-15.

Then, how do we set our confidence level? What does this mean intuitively? Let’s take it the contrapositive way:

If I know my bit error rate = 1e-15, that means if I send 3e15 bits, it’s 95% probable that I’ll see at least 1 error.
The contrapositive of the above statement is that, if I observe 0 errors after sending 3e15 bits, I can be 95% confident that the true BER is less than 1e-15.

The math is shown below:

$$ \begin{align} e^{-Np} &= 0.05 \\ -Np &= \ln(0.05) \\ Np &\approx 2.996 \approx 3 \end{align} $$

This means to claim BER < $10^{-15}$ with 95% confidence after observing zero errors, we need:

$$ \begin{align} N \cdot 10^{-15} &= 3 \\ N &= 3 \times 10^{15} \end{align} $$

So the answer to Question 2 is: No, sending $10^{15}$ bits and observing no error does NOT suffice. You would need to send approximately $3 \times 10^{15}$ bits to claim with 95% confidence that the BER is less than $10^{-15}$.

Beyond Raw BER testing

SiTime has this useful webpage to calculate the experiment time based on the required confidence level and desired accuracy.

In practice, sending 1e27 bits is usually not physically possible. Take a 256 Gbps parallel link, for example: testing 1e27 bits would take about 3.9e15 seconds, or roughly 1e12 hours, or about 124 million years. I am not sure human civilization will still exist by then. Instead, people assume a jitter profile (DJ + RJ), for example dual-Dirac plus Gaussian, and use only the RJ component to estimate the true bit error rate. This is also known as the bathtub method.

Active Filters (9): Frequency Transformation (4)

Mon, 25 Aug 2025 22:17:53 +0800

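In Frequency Transformation (3), we proved that the intuitive derivation from Frequency Transformation (1) is in fact both necessary and sufficient. Based on that proof, we presented the basic ways to derive the other three filter types (high-pass, band-pass, and band-stop) from a low-pass filter.

Beyond these three simple transformations, this section discusses several special frequency transformations.

The Richards Transformation

Suppose we want to turn a low-pass filter into a band-pass filter, but one with a periodic response, as in the figure below:

In the figure, a low-pass filter with a bandwidth of 1 rad/s is transformed into a band-pass filter whose passbands are centered at π, 2π, and every other integer multiple of π. How do we realize such a filter?

Recall the two basic principles from Frequency Transformation (1):

Zero mapping: $\omega = 0$ must map to $f(\omega) = 0$; that is, the zeros of $\omega$ must map to the zeros of $f(\omega)$
Pole mapping: $\omega = \infty$ must map to $f(\omega) = \infty$; that is, the poles of $\omega$ must map to the poles of $f(\omega)$

From these we know that the zeros of the transformation must sit at $0, \pm \pi, \pm 2\pi, \ldots, k\pi$ and its poles at $\pm \frac{\pi}{2}, \pm \frac{3\pi}{2}, \ldots, (2k+1)\frac{\pi}{2}$, with $k \in \mathbb{Z}$.

In other words, the transformation should take the form: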

$$ \begin{aligned} f(\omega) &= l\,\frac{\omega (\omega^2 - \pi^2)(\omega^2 - (2\pi)^2)\cdots}{(\omega^2 - (\frac{\pi}{2})^2)(\omega^2 - (\frac{3\pi}{2})^2)\cdots} \\ &= l_1\frac{\omega\left(1-\frac{\omega^2}{\pi^2}\right)\left(1-\frac{\omega^2}{(2\pi)^2}\right)\cdots\left(1-\frac{\omega^2}{(k\pi)^2}\right)\cdots}{\left(1-\frac{\omega^2}{(\frac{\pi}{2})^2}\right)\left(1-\frac{\omega^2}{(\frac{3\pi}{2})^2}\right)\cdots\left(1 - \frac{\omega^2}{(k\pi + \frac{\pi}{2})^2}\right)\cdots} \\ &= l_1 \frac{\omega\displaystyle\prod_{k=1}^{\infty}\left(1 - \frac{\omega^2}{(k\pi)^2}\right)}{\displaystyle\prod_{k=0}^{\infty}\left(1 - \frac{\omega^2}{(k\pi + \frac{\pi}{2})^2}\right)} \end{aligned} $$

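If we plot the infinite product in the numerator, it looks like this:

In fact, Euler showed that both infinite products are trigonometric functions:

$$ \begin{aligned} \sin(\omega) &= \omega\prod_{k=1}^{\infty}\left(1 - \frac{\omega^2}{(k\pi)^2}\right) \\ \cos(\omega) &= \prod_{k=0}^{\infty}\left(1 - \frac{\omega^2}{(k\pi + \frac{\pi}{2})^2}\right) \end{aligned} $$

Therefore: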

$$ f(\omega) = l_1 \frac{\sin(\omega)}{\cos(\omega)} = l_1 \tan(\omega) $$

Converting the real frequency back to the complex frequency $s = j\omega$:

$$\begin{aligned} f(s) = f(j\omega) &= jl_1 \tan(\omega) \\ &= jl_1 \tan(\frac{s}{j}) = l_1 \tanh (s) \end{aligned} $$

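Since we map $\omega \rightarrow l_1 \tan \omega$, a cutoff frequency of 1 maps to a new cutoff satisfying $1 = l_1 \tan(\omega_{\text{bw}})$. So for a specified new cutoff frequency, $ s \rightarrow \frac{\tanh s}{\tan \omega_{\text{bw}}}$. If we do not want the passband centers at multiples of $\pi$, we can additionally scale the frequency. The final transformation is therefore: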

$$ s \rightarrow \frac{\tanh \frac{s\pi}{\omega_{0}}}{\tan \omega_{\text{bw}}} $$
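As a quick numerical sanity check of the product form behind $\tan\omega$, the sketch below truncates both infinite products and compares their ratio with `math.tan`. The function names and the truncation length are our own choices; the truncated products converge slowly, so the match is only approximate.

```python
import math


def sin_product(w: float, terms: int = 20000) -> float:
    """Truncated Euler product: w * prod_{k>=1} (1 - w^2 / (k*pi)^2)."""
    prod = w
    for k in range(1, terms + 1):
        prod *= 1.0 - (w / (k * math.pi)) ** 2
    return prod


def cos_product(w: float, terms: int = 20000) -> float:
    """Truncated product: prod_{k>=0} (1 - w^2 / (k*pi + pi/2)^2)."""
    prod = 1.0
    for k in range(terms):
        prod *= 1.0 - (w / (k * math.pi + math.pi / 2)) ** 2
    return prod


w = 1.0
print(sin_product(w) / cos_product(w), math.tan(w))
```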

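Circuit Realization

For simplicity, we keep the cutoff frequency unchanged and only change the center frequency, so $ s \rightarrow l_1 \tanh \frac{s\pi}{\omega_{0}} $. Under this transformation, an inductor $sL$ becomes $Ll_1\tanh(\frac{s\pi}{\omega_{0}})$. So the question is: is there really an electronic component with a $\tanh$ frequency response?

Transmission Line Theory

That component is the familiar transmission line, if the reader has some background in RF circuits. A transmission line consists of two parallel conductors separated by a dielectric. As a signal propagates along the line, electric and magnetic fields form between the conductors and carry the signal. The characteristic impedance of the line depends on its geometry and on the dielectric material. The wave equation tells us that a transmission line must satisfy the telegrapher's equations, and to satisfy them it suffices for the forward- and backward-traveling voltages and currents to obey: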

$$ \begin{cases} V(x) = V^+(x) + V^-(x) \\ I(x) = \frac{V^+(x)}{Z_0} - \frac{V^-(x)}{Z_0} \end{cases} $$

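where $Z_0$ is the characteristic impedance of the transmission line.

To describe a section of transmission line we need, besides its characteristic impedance, its time delay: the time an electromagnetic wave takes to travel from one end of the line to the other, usually denoted $\tau$. Now suppose the transmission-line equations hold at some point, and we move the observation point to the left by a delay $\tau$. The forward wave is then advanced by $\tau$ and the backward wave is delayed by $\tau$, while the transmission-line equations must still hold:

$$ \begin{aligned} V^+ &\rightarrow V^+e^{s\tau} \\ V^- &\rightarrow V^-e^{-s\tau} \end{aligned} $$

If we short one end of the line, the total voltage at that end must be zero:

$$ V^+ = -V^-, \quad V^- + V^+ = 0 $$

Then at the other end of the line,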

$$ \begin{aligned} V_{in} &= V^+ (e^{s\tau} - e^{-s\tau}) \\ &= V^+ (2\sinh(s\tau)) \end{aligned} $$

$$ \begin{aligned} I_{in} &= \frac{V^+e^{s\tau}}{Z_0} - \frac{V^-e^{-s\tau}}{Z_0} \\ &= \frac{V^+}{Z_0} (e^{s\tau} + e^{-s\tau}) \\ &= \frac{2V^+}{Z_0}\cosh(s\tau) \end{aligned} $$
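As a hedged sketch of where this derivation appears to be heading (the text breaks off here, so the step below is our own completion under the short-circuit condition $V^- = -V^+$), dividing the input voltage $2V^+\sinh(s\tau)$ by the input current $\frac{2V^+}{Z_0}\cosh(s\tau)$ gives the input impedance of a shorted line:

$$ Z_{in}(s) = \frac{V_{in}}{I_{in}} = \frac{2V^+\sinh(s\tau)}{\frac{2V^+}{Z_0}\cosh(s\tau)} = Z_0\tanh(s\tau) $$

Under this reading, a shorted line with $Z_0 = Ll_1$ and $\tau = \pi/\omega_0$ would have the $\tanh$ impedance that the transformed inductor calls for.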

Active Filters (8): Frequency Transformation (3)

Thu, 21 Aug 2025 20:58:21 +0800

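In the previous section we reached two important conclusions about the input impedance of a pure LC circuit:

The zeros and poles of the impedance must lie on the imaginary axis.
The residues of the impedance (and of the admittance) must be positive real.

This section analyzes the behavior of a pure LC network's impedance on the imaginary axis, and examines the uniqueness and realization of the frequency transformations.

Cauchy-Riemann Equations

For an analytic function $f(x, y) = u(x, y) + iv(x, y)$ on the complex plane, where $u(x, y)$ and $v(x, y)$ are the real and imaginary parts, the Cauchy-Riemann equations give a necessary condition for analyticity: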

$$ \begin{aligned} \frac{\partial u}{\partial x} &= \frac{\partial v}{\partial y} \\ \frac{\partial u}{\partial y} &= -\frac{\partial v}{\partial x} \end{aligned} $$

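The impedance of an LC network is a rational function and therefore analytic away from its poles, so it must satisfy the Cauchy-Riemann equations. For $Z(s)$ with $s = \sigma + j\omega$: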

$$ \frac{\partial}{\partial \sigma}\Re [Z(\sigma + j\omega)] = \frac{\partial}{\partial \omega}\Im [Z(\sigma + j\omega)] $$

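That is, the rate of change of the real part of the impedance along $\sigma$ equals the rate of change of the imaginary part along $\omega$.

Combining this with the positive-real result of the previous section (across the imaginary axis, the real part goes from negative to positive as $\sigma$ increases), we further have: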

$$ \frac{\partial}{\partial \omega}\Im [Z(\sigma + j\omega)] = \frac{\partial}{\partial \sigma}\Re [Z(\sigma + j\omega)] > 0 $$

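Moreover, if the frequency $s$ is real the impedance is real, and if $s$ is purely imaginary the impedance is also purely imaginary. Therefore: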

$$ \left.Z(s)\right|_{s=j\omega} = jX(\omega) \quad \therefore \frac{dX(\omega)}{d\omega} > 0 $$

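A quick check:

For an inductor: $Z(j\omega) = j\omega L \implies \frac{dX(\omega)}{d\omega} = L > 0$
For a capacitor: $Z(j\omega) = \frac{1}{j\omega C} = \frac{j}{-\omega C} \implies \frac{dX(\omega)}{d\omega} = \frac{1}{\omega^2 C} > 0$

So the derivative of the reactance of an LC network along the imaginary axis is always positive. This means two poles, or two zeros, can never appear consecutively on the imaginary axis, since that would contradict monotonicity, as the figure below shows:

Hence poles and zeros must alternate along the imaginary axis:

In addition, the number of zeros and the number of poles differ by at most 1. Taken together, these conclusions make the derivation of the frequency transformations unique.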
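The monotonic reactance can be checked numerically on a small example. The sketch below uses a hypothetical one-port (an inductor $L_1$ in series with a parallel $L_2$-$C_2$ tank, with element values picked only for illustration) and a central finite difference for the slope.

```python
import math


def reactance(w, L1=1.0, L2=0.5, C2=2.0):
    """X(w) of L1 in series with a parallel L2-C2 tank (hypothetical values)."""
    return w * L1 + w * L2 / (1.0 - w * w * L2 * C2)


def slope_is_positive(w, h=1e-6):
    """Central-difference estimate of dX/dw, checked for positivity."""
    return (reactance(w + h) - reactance(w - h)) / (2 * h) > 0


# The tank resonates at w = 1 (a pole of X); avoid sampling exactly there.
for w in (0.1, 0.5, 0.9, 1.1, 2.0, 5.0):
    print(w, reactance(w), slope_is_positive(w))
```

With these values the reactance has zeros at $\omega = 0$ and $\omega = \sqrt{1.5}$ and poles at $\omega = 1$ and $\omega = \infty$, so zeros and poles interleave along the axis.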

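Necessary and Sufficient Frequency Transformations

We now have all the theory needed to prove sufficiency, so it is time to check the uniqueness of the frequency transformations we derived intuitively.

Low-Pass to Band-Pass Transformation

The mapping for the low-pass to band-pass transformation is shown below:

From the first section on frequency transformation, the transformation function must satisfy: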

$$ f(\omega) \propto \frac{(\omega + 1)(\omega - 1)}{\omega} = \frac{\omega^2 - 1}{\omega} $$

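We cannot introduce a new zero, because the zero and pole counts already differ by 1. Nor can we introduce a new pole. If the new pole were at 0, the pole at 0 would no longer have multiplicity 1. If its magnitude were less than 1, poles and zeros would no longer alternate. If its magnitude were exactly 1, it would cancel a zero. If its magnitude were greater than 1, the response at infinite frequency would no longer behave as required.

Since no new pole or zero can be introduced, the only adjustable quantity is the proportionality constant $K$, which must be positive real: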

$$ f(\omega) = K\frac{(\omega - 1)(\omega + 1)}{\omega} $$

Let the bandwidth of the original low-pass filter be $\omega_{LP}$. The two cutoff frequencies of the band-pass filter then satisfy:

$$ \omega_{LP} = K\frac{\omega^2 - 1}{\omega} $$

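Solving at both band edges, $f(\omega) = \pm\omega_{LP}$, and discarding the negative-frequency roots gives:

$$ \begin{cases} \omega_a = \sqrt{1 + \frac{\omega_{LP}^2}{4K^2}} + \frac{\omega_{LP}}{2K} \\ \omega_b = \sqrt{1 + \frac{\omega_{LP}^2}{4K^2}} - \frac{\omega_{LP}}{2K} \end{cases} $$

so that:

$$ \omega_a \omega_b = 1 $$

$$ \omega_a - \omega_b = \frac{\omega_{LP}}{K} $$

That is, the geometric mean of the two cutoff frequencies is the (normalized) center frequency. The quality factor (Q) of a band-pass filter is defined as:

$$ Q = \frac{\text{Center Frequency}}{\text{Bandwidth}} = \frac{1}{\omega_a - \omega_b} = \frac{K}{\omega_{LP}} $$

With the prototype bandwidth normalized to $\omega_{LP} = 1$, this gives $Q = K$.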

The final mapping, denormalized to a center frequency $\omega_0$, is:

$$ s \rightarrow Q\left(\frac{s}{\omega_0} + \frac{\omega_0}{s}\right) $$

Transformation of an inductor:

$$ sL \rightarrow Q\left(\frac{s}{\omega_0} + \frac{\omega_0}{s}\right)L $$

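That is, an inductor $L$ becomes an inductor $\frac{LQ}{\omega_0}$ in series with a capacitor $\frac{1}{QL\omega_0}$:

This matches engineering intuition: the inductor is a short at DC, and the new branch is a short at $\omega_0$. The inductor is an open at infinite frequency, and the new branch is an open at both DC and infinite frequency.

Transformation of a capacitor: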

$$ C \rightarrow \frac{QC}{\omega_0} \parallel \frac{1}{QC\omega_0} $$

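That is, a capacitor $C$ becomes a capacitor $\frac{QC}{\omega_0}$ in parallel with an inductor $\frac{1}{QC\omega_0}$:

The capacitor is an open at DC, and the new branch is an open at $\omega_0$. The capacitor is a short at infinite frequency, and the new branch is a short at both DC and infinite frequency.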
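The element substitutions above are mechanical, so they are easy to script. The sketch below is a minimal illustration; the function names and the numbers in the usage line ($Q = 5$, $\omega_0 = 1000$ rad/s, a 1 H prototype inductor) are assumptions made for the example.

```python
def lp_to_bp_inductor(L, Q, w0):
    """Prototype inductor L -> series branch (L_s, C_s)."""
    return Q * L / w0, 1.0 / (Q * L * w0)


def lp_to_bp_capacitor(C, Q, w0):
    """Prototype capacitor C -> parallel branch (C_p, L_p)."""
    return Q * C / w0, 1.0 / (Q * C * w0)


# Hypothetical example: 1 H prototype inductor, Q = 5, centre at 1000 rad/s.
L_s, C_s = lp_to_bp_inductor(1.0, 5.0, 1000.0)
print(L_s, C_s, 1.0 / (L_s * C_s) ** 0.5)  # the last value is the branch resonance
```

Each transformed branch resonates at $\omega_0$ regardless of $Q$, which is the short (or open) at the center frequency described above.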

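Therefore, after the low-pass to band-pass transformation the order of the LC filter doubles:

Low-Pass to High-Pass Transformation

The low-pass to high-pass transformation is shown below:

The transformation takes the form: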

$$ f(\omega) \propto \frac{1}{\omega} $$

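We cannot introduce a new pole, or the zero and pole counts would differ by more than 1. Nor can we introduce a new zero, or the response at infinity would no longer behave as required.

So the only thing we can change is the proportionality constant: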

$$ f(\omega) = \frac{-K}{\omega} $$

The negative sign is mandatory; otherwise the new reactance would not be an increasing function. The final mapping is:

$$ j\omega \rightarrow -j\frac{K}{\omega} \rightarrow \frac{K}{j\omega} $$

For an arbitrary high-pass cutoff frequency:

$$ s \rightarrow \frac{\omega_0}{s} $$

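After the low-pass to high-pass transformation, capacitors become inductors and inductors become capacitors:

Low-Pass to Band-Stop Transformation

The low-pass to band-stop transformation is shown below:

The transformation takes the form: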

$$ f(\omega) \propto \frac{\omega}{(\omega + 1)(\omega - 1)} = \frac{\omega}{\omega^2 - 1} $$

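As with the band-pass transformation, no new pole or zero can be added. Again the only unknown is the proportionality constant, and the overall coefficient must be negative real: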

$$ f(\omega) = \frac{-K\omega}{\omega^2 - 1} $$

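The geometric mean of the two band-stop cutoff frequencies is the center frequency, and the quality factor is defined as before. The final mapping is: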

$$ s \rightarrow \frac{1}{Q\left(\frac{s}{\omega_0} + \frac{\omega_0}{s}\right)} $$

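An inductor becomes a capacitor in parallel with an inductor, and a capacitor becomes a capacitor in series with an inductor:

In summary, the form and parameters of each frequency transformation are uniquely determined by the physical properties of the network; poles and zeros cannot be added at will. Every transformation strictly obeys positive-realness and the alternation of poles and zeros.

Summary of Frequency Transformation Types

The table below summarizes the frequency transformations that start from a low-pass prototype and their properties.

| Transformation | Formula | Element transformation | Order |
| --- | --- | --- | --- |
| Low-pass → Band-pass | $s \rightarrow Q\left(\frac{s}{\omega_0} + \frac{\omega_0}{s}\right)$ | Inductor $\rightarrow$ series inductor + capacitor; capacitor $\rightarrow$ parallel inductor + capacitor | Doubles |
| Low-pass → High-pass | $s \rightarrow \frac{\omega_0}{s}$ | Inductor $\rightarrow$ capacitor; capacitor $\rightarrow$ inductor | Unchanged |
| Low-pass → Band-stop | $s \rightarrow \frac{1}{Q\left(\frac{s}{\omega_0} + \frac{\omega_0}{s}\right)}$ | Inductor $\rightarrow$ parallel inductor + capacitor; capacitor $\rightarrow$ series inductor + capacitor | Doubles |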

Active Filters (7): Frequency Transformation (2)

Wed, 20 Aug 2025 19:57:30 +0800

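In the previous section we discussed the engineering intuition behind frequency transformation. In short, there is a single core rule:

The zero-frequency point must map to a zero of the new function, and the infinite-frequency point must map to a pole.

Based on this rule we intuitively derived the mappings from a low-pass filter to the other filter types. Those derivations, however, only give proportionality relations: they are necessary but not sufficient conditions.

This section examines the realizability properties of pure LC circuits and works toward the sufficiency proof.

Tellegen's Theorem

Before analyzing any network in depth, we introduce Tellegen's theorem, which gives us a new mathematical tool for what follows.

Consider the network in the figure below. The bottom node is grounded, and the node voltages and branch currents are labeled. Each branch may contain any passive or active element, linear or nonlinear.

We adopt the following conventions:

Current direction: current flowing into a node is positive, current flowing out is negative.
Voltage polarity: the higher-potential terminal is positive, the lower-potential terminal is negative.

First, we write KCL at nodes 1, 2, and 3, which in matrix form reads: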

$$\begin{bmatrix} -1 & 0 & 0 & 1 & -1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} i_1 \\ i_2 \\ i_3 \\ i_4 \\ i_5 \\ i_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} $$

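We write this as $\textbf{A}\textbf{I} = \textbf{0}$.

This relation always holds; otherwise current would be created or destroyed for no reason, violating physical law. Each row of $\textbf{A}$ is the KCL equation of one node, and each column gives the direction of one branch current.

Define the branch voltage as the voltage across the $n$-th branch; for example, the voltage of branch 1 is $V_1 - V_2$. The vector of branch voltages satisfies: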

$$ \textbf{V}_B = - \textbf{A}^T \textbf{V} $$

Checking with this example:

$$\textbf{V}_B=\begin{bmatrix} V_1 - V_2 \\ -V_2 + V_3 \\ -V_2 \\ -V_1 \\ V_1 - V_3 \\ -V_3 \end{bmatrix} = -\begin{bmatrix} -1 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} = -\textbf{A}^T \textbf{V} $$

By conservation of energy:

$$ \textbf{V}_B^T \textbf{I} = 0 $$

or, written out:

$$ \sum_k v_{bk}i_k = 0 $$

The proof is:

$$ \begin{aligned} \textbf{V}_B^T \cdot \textbf{I} &= (-\textbf{A}^T \textbf{V})^T \cdot \textbf{I} \\ &= -\textbf{V}^T \textbf{A} \cdot \textbf{I} \\ &= -\textbf{V}^T \cdot \textbf{0} \\ &= 0 \end{aligned} $$

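The derivation above relies only on KCL and KVL; it assumes nothing about the element types or their linearity. Tellegen's theorem goes further: this generalized energy conservation still holds even when the branch currents and the branch voltages come from the elements of different networks.

Consider the two networks below. A and B share the same topology but are implemented completely differently: A might consist of capacitors and inductors, while B might contain active sources or other elements.

We again have: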

$$ \begin{aligned} \sum_k{v_{b1k}i_{2k}} &= - [A^T \textbf{V}_1]^T \textbf{I}_2 \\ &= - \textbf{V}_1^T A \cdot \textbf{I}_2 \\ &= - \textbf{V}_1^T \cdot \textbf{0} \\ &= 0 \end{aligned} $$

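This conclusion is very general: as long as the network topology is the same, generalized energy conservation holds no matter how the elements are distributed. The theorem applies to any linear or nonlinear circuit.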
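Tellegen's theorem is easy to check numerically on the example topology. The sketch below uses the incidence matrix from the KCL equations above, picks node voltages at random, and independently picks branch currents that satisfy KCL; the voltages and currents are unrelated to each other, which is the whole point. The helper names are ours.

```python
import random

# Incidence matrix A from the KCL equations (3 nodes, 6 branches).
A = [
    [-1, 0, 0, 1, -1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, -1, 0, 0, 1, 1],
]


def branch_voltages(v):
    """V_B = -A^T V for node voltages v = [V1, V2, V3]."""
    return [-sum(A[n][b] * v[n] for n in range(3)) for b in range(6)]


def kcl_currents(i1, i2, i5):
    """Choose three branch currents freely; KCL then fixes i3, i4 and i6."""
    return [i1, i2, -i1 - i2, i1 + i5, i5, i2 - i5]


random.seed(0)
v = [random.uniform(-5, 5) for _ in range(3)]                  # any node voltages
i = kcl_currents(*[random.uniform(-1, 1) for _ in range(3)])   # any KCL-valid currents
power_sum = sum(vb * ib for vb, ib in zip(branch_voltages(v), i))
print(abs(power_sum) < 1e-12)
```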

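In sinusoidal steady-state (phasor) analysis, which is what we need for reactive elements, complex power is defined as $P = VI^*$. Since the conjugated currents also satisfy KCL, we obtain: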

$$\textbf{V}^T_B \textbf{I}^* = \sum_k v_{bk} i_k^* = 0 $$

Poles and Zeros of an LC Network

Consider a lossless network made only of L and C elements:

By conservation of energy:

$$ \begin{aligned} |I_{1}(s)|^2 Z(s) &= \sum_{\text{All L and C}} v_k(s) i_k^*(s) \\ &= \sum_{\text{All L}} s L_k i_k(s) i_k^*(s) + \sum_{\text{All C}} \frac{1}{sC_k} i_k(s) i_k^*(s) \\ &= \sum_{\text{All L}} s L_k |I_k(s)|^2 + \sum_{\text{All C}} \frac{1}{s C_k} |I_k(s)|^2 \end{aligned} $$

Setting $I_1(s) = 1$ gives

$$ \begin{aligned} Z(s) &= \sum_{\text{All L}} s L_k |I_k(s)|^2 + \sum_{\text{All C}} \frac{1}{sC_k} |I_k(s)|^2 \\ &= \sum_{\text{All L}} s P_1 + \sum_{\text{All C}} \frac{1}{s} P_2 \end{aligned} $$

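Since $C$, $L$, and $|I|^2$ are all positive real, $P_1, P_2 > 0$.

Two conclusions follow:

If the frequency variable $s$ is real, then $Z(s)$ is also real.
If $s$ has a positive real part, then the real part of $Z(s)$ is also positive.

A function with these properties is called a positive real function. For a network containing only L and C, we have: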

$$ \Re{[Z(s)]} \begin{cases} > 0 \text{ if } \Re[s] > 0 \\ = 0 \text{ if } \Re[s] = 0 \\ < 0 \text{ if } \Re[s] < 0 \end{cases} $$

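Behavior of a Complex Function Near a Pole

Let us look at how a complex function behaves near a pole. Take $\frac{1}{s - p_1}$ as an example: to the left of the pole $p_1$ the real part of the function is negative, and to the right it is positive.

If the residue carries a phase, the dividing line rotates with that phase. For example, here is the real part of $\frac{1\angle 45^{\circ}}{s}$:

For a repeated pole such as $\frac{1}{s^2}$, the real part looks like this:

The real-part behavior we want, however, is the one shown in the figure below:

So for the real part to behave as required, the following conditions must hold:

No poles in the left half-plane; otherwise the dividing line would not be the imaginary axis.
No poles in the right half-plane, for the same reason.
Poles on the imaginary axis must be simple (multiplicity 1); otherwise the dividing line would not split the plane into two symmetric halves.
Residues must be positive real; otherwise the dividing line would rotate.

Furthermore, the poles of a function with real coefficients come in conjugate pairs. The same argument applies to the admittance, so the zeros obey the same constraints.

The next section looks further at the distribution of zeros and poles of LC networks along the imaginary axis.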

Active Filters (6): Frequency Transformation (1)

Wed, 23 Jul 2025 21:35:22 +0800

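So far all of our discussion has been about low-pass filters. We covered:

How to approximate the ideal low-pass filter
Basic network synthesis: using the doubly-terminated LC canonical form to design all-pole filters, as well as inverse Chebyshev (Chebyshev type II) filters

Starting with this section, we discuss how to convert a low-pass filter into other filter types such as high-pass, band-pass, and band-stop. In engineering practice this means we do not have to design each kind of filter from scratch; we can convert a low-pass prototype into the other types through frequency transformation.

1. The Basis of Frequency Transformation: Scaling

1.1 Low-Pass Scaling

The most basic frequency transformation is scaling. Every filter we have designed so far has a cutoff frequency of 1 rad/s, which is clearly unrealistic in practice. A guitar effects pedal, for example, might need a cutoff of 1000 rad/s. Scaling gets us there.

If the original transfer function is $H(s)$, replacing $s$ with $s/\omega_0$ gives a new transfer function $H(s/\omega_0)$, where $\omega_0$ is the desired cutoff frequency: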

$$s \rightarrow \frac{s}{\omega_0} \quad \therefore \quad H(s) \rightarrow H\left(\frac{s}{\omega_0}\right)$$

With scaling, we have the following relations:

$$\omega \rightarrow \frac{\omega}{\omega_0}, \quad j\omega \rightarrow j\frac{\omega}{\omega_0}, \quad s \rightarrow \frac{s}{\omega_0}$$

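1.2 Transformation of Element Impedances

With scaling, all we need to do is transform the angular frequency in the original network. Which element values involve the angular frequency? Only the reactive elements do, so the table below shows how impedances transform under scaling:

| Element | Original impedance | Scaled impedance | Comment |
| --- | --- | --- | --- |
| Resistor (R) | $R$ | $R$ | Unchanged |
| Inductor (L) | $sL$ | $\frac{sL}{\omega_0}$ | Divided by $\omega_0$ |
| Capacitor (C) | $\frac{1}{sC}$ | $\frac{\omega_0}{sC}$ | Multiplied by $\omega_0$ |
| Active element (G) | $G$ | $G$ | Unchanged |

Key conclusion: only the impedances of reactive elements change under scaling.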
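In terms of element values, the scaled impedances $\frac{sL}{\omega_0}$ and $\frac{\omega_0}{sC}$ correspond to an inductance $L/\omega_0$ and a capacitance $C/\omega_0$. A minimal sketch, with a function name of our own choosing and an assumed prototype of two 1 H inductors and one 2 F capacitor:

```python
def scale_lowpass(inductors, capacitors, w0):
    """Scale a 1 rad/s prototype to cutoff w0: L -> L / w0 and C -> C / w0.

    Resistors and active gains are left untouched.
    """
    return [L / w0 for L in inductors], [C / w0 for C in capacitors]


# Hypothetical prototype (L = 1 H, C = 2 F, L = 1 H) moved to 1000 rad/s.
print(scale_lowpass([1.0, 1.0], [2.0], 1000.0))
```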

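This low-pass to low-pass transformation is very simple, and it is a linear operation. Unfortunately, converting a low-pass filter into a high-pass filter will not be a linear function.

It is worth noting that a reactive element remains a reactive element after the transformation. Reactive elements are lossless, and we want to preserve that property so the transformed filter is still lossless.

2. Basic Principles of Frequency Transformation

A frequency transformation cannot be chosen arbitrarily. Suppose we have the mapping: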

$$\omega \rightarrow f(\omega)$$

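Then $f(\omega)$ must satisfy the following conditions:

Zero mapping: $\omega = 0$ must map to $f(\omega) = 0$; that is, the zeros of $\omega$ must map to the zeros of $f(\omega)$
Pole mapping: $\omega = \infty$ must map to $f(\omega) = \infty$; that is, the poles of $\omega$ must map to the poles of $f(\omega)$
Cutoff mapping: the cutoff frequency of $\omega$ must also map to the cutoff frequency of $f(\omega)$

2.1 Band-Pass Transformation

From these principles, converting a low-pass filter into a band-pass filter requires:

Zero mapping: the zero of the low-pass filter must move to the zeros of the band-pass filter, i.e. $\omega = 0 \rightarrow f(\pm 1) = 0$
- Again under the normalization assumption
Pole mapping: the pole of the low-pass filter must move to the poles of the band-pass filter, i.e. $\omega = \infty \rightarrow f(0, \pm \infty) = \infty$

So the transformation function must at least have the form: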

$$f(\omega) \propto \frac{(\omega + 1)(\omega - 1)}{\omega} = \frac{\omega^2 - 1}{\omega}$$

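Note that this is only a proportionality, because we do not yet know whether the transformation function has other factors.

2.2 High-Pass Transformation

Likewise, for a high-pass filter we need:

DC mapping: the low-pass DC point must move to infinite frequency in the high-pass filter, i.e. $f(\infty) = 0$
Infinity mapping: the low-pass point at infinity must move to DC in the high-pass filter, i.e. $f(0) = \infty$
Cutoff mapping: the high-pass cutoff satisfies $|f(1)| = 1$

So the transformation function must at least have the form: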

$$f(\omega) \propto \frac{1}{\omega}$$

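Again, this is only a proportionality, because we do not yet know whether the transformation function has other factors.

2.3 Band-Stop Transformation

Likewise, for a band-stop filter we need:

DC mapping: the low-pass DC point must move to DC and to infinite frequency in the band-stop filter, i.e. $f(0, \pm \infty) = 0$
Infinity mapping: the low-pass point at infinity must move to the band-stop center, i.e. $f(\pm 1) = \infty$

So the transformation function must at least have the form: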

$$f(\omega) \propto \frac{\omega}{(\omega + 1)(\omega - 1)} = \frac{\omega}{\omega^2 - 1}$$

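One last reminder: this is only a proportionality, because we do not yet know whether the transformation function has other factors.

2.4 Realizability Conditions

Beyond the mapping requirements above, the basic stability requirement of the filter must also be met: its poles must lie in the left half-plane, or the filter is unstable. Since the reciprocal of the function is also a valid function (an impedance and its admittance, for instance), the same requirement applies to the zeros.

Stability requirements:

All poles must lie in the left half-plane
All zeros must also lie in the left half-plane or on the imaginary axis
The transformed filter must remain causal and stable

3. Summary of Transformation Function Properties

From the analysis above, the basic forms of the filter transformations are:

| Filter type | Transformation form | Zero-pole mapping |
| --- | --- | --- |
| Low-pass → Low-pass | $s/\omega_0$ | Linear scaling |
| Low-pass → High-pass | $1/\omega$ | Zeros and poles swap |
| Low-pass → Band-pass | $(\omega^2-1)/\omega$ | One pole/zero splits into two |
| Low-pass → Band-stop | $\omega/(\omega^2-1)$ | One pole/zero splits into two |

A final reminder: all the transformation functions above are proportionalities, and in practice other factors might be needed. In the next section, however, we will prove that the only possible extra factor is a constant.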

Active Filters (5): Doubly-Terminated LC Filters (2)

Wed, 23 Jul 2025 19:28:55 +0800

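In the previous section we explained how to design an all-pole filter using the canonical form of the doubly-terminated LC filter. The design flow is:

Design the transfer function $H(j\omega)$
Apply the normalization assumptions
Derive the reflection coefficient $\Gamma(s)$ from $H(j\omega)$
Derive the input impedance $Z_{in}(s)$ from $\Gamma(s)$
Derive the circuit structure of the whole filter from $Z_{in}(s)$

With the canonical form of the doubly-terminated LC filter, the input impedance can be written as: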

$$Z_{in}(s) = sL_1 + \cfrac{1}{sC_1 + \cfrac{1}{sL_3 + \cfrac{1}{\ddots}}}$$

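This is a finite continued fraction. Expanding $Z_{in}(s)$ into this form yields the circuit structure of the filter.

This section summarizes the LC canonical form and covers how to handle some special cases.

1. Properties of All-Pole LC Filters

The doubly-terminated LC filter has the following important properties:

Reflection coefficient: in the passband the reflection coefficient is close to 0; in the stopband its magnitude is close to 1
Zeros: at a zero of the transfer function, the magnitude of the reflection coefficient is 1
Order limit: with the LC canonical form, we can only realize filters whose numerator and denominator orders differ by at most 1
- This is because as frequency approaches infinity, the input impedance can only behave like one of L, R, or C
Realizable filter types:
- Odd- or even-order Butterworth filters
- Odd-order Chebyshev type I filters
- Even-order Chebyshev type I filters cannot be realized, because their DC gain is not 1/2

So can we realize an inverse Chebyshev (Chebyshev type II) filter? Yes, but the doubly-terminated LC canonical form needs a suitable modification.

2. Circuit Realization of the Inverse Chebyshev Filter

2.1 Finite Zeros and Resonant Tanks

The inverse Chebyshev filter has finite zeros, while the standard doubly-terminated LC form can only realize all-pole filters. The standard form alone therefore cannot insert finite zeros and cannot realize an inverse Chebyshev filter.

The solution is to introduce a resonant tank.

Parallel Resonant Tank

The structure of the parallel resonant tank is shown below:

For the parallel resonant tank, the input impedance is: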

$$Z_{in}(j\omega) = \frac{1}{j\omega C} \parallel j\omega L = \frac{j\omega L}{1 - \omega^2 LC}$$

Series Resonant Tank

For the series resonant tank, the input impedance is:

$$Z_{in}(j\omega) = j\omega L + \frac{1}{j\omega C} = -j\frac{1 - \omega^2 LC}{\omega C}$$

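Resonance

At $\omega = \frac{1}{\sqrt{LC}}$:

The impedance of the series resonant tank is 0
The impedance of the parallel resonant tank is infinite

This special angular frequency is called the resonant frequency. Using this property of resonant tanks, we can insert finite zeros into the filter.

2.2 Design Example: Third-Order Inverse Chebyshev Filter

Let us realize a third-order inverse Chebyshev filter using the following circuit structure:

2.2.1 Input Impedance Function

The input impedance corresponding to the third-order inverse Chebyshev filter is: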

$$Z_{in}(s) = \frac{2s^3 + 0.6746s^2 + 0.2271s + 0.0400}{0.6746s^2 + 0.2271s + 0.0400}$$

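Design tip (Shanthi Pavan): when designing a filter with irrational coefficients, a rule of thumb for rounding is to keep as many significant digits as the order of the filter. A third-order filter, for example, should keep 3 significant digits. Otherwise, rounding errors will produce negative impedances when solving for the rest of the circuit.

2.2.2 High-Frequency Behavior

At infinite frequency the capacitors are shorts and the inductors dominate the impedance, so: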

$$L_1 + L_2 \parallel L_3 = \frac{2}{0.6746}$$

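2.2.3 Zero Condition

Since $Z_{in}(s) = sL_1 + Z_{in1}(s)$, we need the impedance of the series resonant tank to be 0 at the zero of the transfer function.

There are two reasons for this design:

At a zero of the transfer function, the magnitude of the reflection coefficient is 1
When the resonant tank shorts out the rest of the network, all the input power is reflected and the load voltage is 0

Therefore, at the zero frequency $\omega_z$: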

$$Z_{in}(j\omega_z) = j\omega_z L_1$$

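which gives $L_1 = 2.8384\text{ H}$.

2.2.4 Analysis of the Remaining Circuit

After removing $L_1$, the input admittance of the remainder is:

$$Y_{in1}(s) = \frac{0.6746s^2 + 0.2271s + 0.0400}{0.0852s^3 + 0.0299s^2 + 0.1135s + 0.0400}$$

The admittance of the series resonant tank is: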

$$Y_{\text{resonant}}(s) = \frac{sC_2}{1 + s^2L_2C_2}$$

At $\omega_z$ this admittance goes to infinity, with $\omega_z = \frac{1}{\sqrt{L_2C_2}}$.

Key observation: near $\omega_z$, $Y_{\text{resonant}}(s)$ and $Y_{in1}(s)$ behave almost identically:

$$\lim_{s \to j\omega_z} Y_{in1}(s) = \lim_{s \to j\omega_z} Y_{\text{resonant}}(s)$$

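With only one unknown left, we find $C_2 = 5.6745\text{ F}$.

The rest of the solution is straightforward. Using the same method as in the previous section, we find $L_3 = 2.838\text{ H}$.

After substituting all the values, the remaining impedance is 1 Ω, exactly the load we expect.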
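A quick way to gain confidence in the element values is to rebuild the input impedance from the ladder and compare it with the rational function. The sketch below assumes the topology described above (series $L_1$, a shunt series $L_2$-$C_2$ tank, then $L_3$ in series with a 1 Ω load) and takes $\omega_z^2 = 4/3$ from the zero calculation in the inverse Chebyshev post; $L_2$ is not quoted in the text, so it is derived here from $\omega_z = 1/\sqrt{L_2 C_2}$.

```python
L1, L3 = 2.8384, 2.838        # series inductors quoted in the text (H)
C2 = 5.6745                   # tank capacitor quoted in the text (F)
WZ2 = 4.0 / 3.0               # assumed omega_z^2 = (2 / sqrt(3))^2
L2 = 1.0 / (WZ2 * C2)         # derived from omega_z = 1 / sqrt(L2 * C2)


def z_in_rational(s):
    """Target input impedance as given in the design example."""
    num = 2 * s**3 + 0.6746 * s**2 + 0.2271 * s + 0.0400
    den = 0.6746 * s**2 + 0.2271 * s + 0.0400
    return num / den


def z_in_ladder(s):
    """Input impedance rebuilt from the element values."""
    y_tank = s * C2 / (1 + s * s * L2 * C2)   # shunt series L2-C2 branch
    y_rest = 1 / (s * L3 + 1.0)               # L3 in series with the 1-ohm load
    return s * L1 + 1 / (y_tank + y_rest)


for s in (1e-6, 1j):
    print(s, z_in_rational(s), z_in_ladder(s))
```

The two expressions agree only to within the rounding of the quoted coefficients, which is in line with the design tip about significant digits.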

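2.3 Limitation for Even-Order Inverse Chebyshev Filters

Unfortunately, we cannot realize an even-order inverse Chebyshev filter with a doubly-terminated LC filter: as frequency approaches infinity the required transfer function is nonzero, which contradicts the physical behavior of the LC ladder.

3. Summary

This section completed the summary of the doubly-terminated LC canonical form and showed how to use resonant tanks to realize odd-order inverse Chebyshev filters.

So far our discussion has focused on low-pass design. The reader may wonder how to design the other types: high-pass, band-pass, and band-stop filters.

In the following sections we present an elegant mathematical method, frequency transformation, which converts a low-pass prototype into an equivalent filter of another type. The method matters in theory and is widely used in engineering practice.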

Active Filters (4): Doubly-Terminated LC Filters (1)

Sun, 13 Jul 2025 13:38:51 +0800

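In the first three sections we discussed the properties and calculation of the Butterworth filter and the two Chebyshev filters, but we do not yet know how to build them from circuit elements. Starting with this section, we move gradually from theory to circuit realization.

1. Doubly-Terminated LC Filter

1.1 Impedance Matching

Early telephone systems needed the highest possible power transfer efficiency. By the maximum power transfer theorem, impedance matching maximizes the power delivered to the load. Readers with a background in microwave circuits will know that making the real parts of the source and load impedances equal is not enough, because reactive elements at the input and output change the phase.

If the source and load impedances are real, making them equal (impedance matching) achieves maximum power transfer: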

$$P = \frac{V^2}{4R}$$

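If the source and load impedances are complex, they need to be complex conjugates of each other. If a conjugate match cannot be achieved, we settle for making their magnitudes equal.

To make the two impedances conjugate, we insert a matching network at both the input and the output. Its job is to transform the source and load impedances into a conjugate pair. Matching networks generally prefer reactive elements (capacitors, inductors) over resistive ones (resistors), because reactive elements handle only reactive power and dissipate no real power, so they consume no actual energy.

Note that matching magnitudes alone does not guarantee maximum power transfer. Maximum power transfer requires a conjugate match (which reduces to equal impedances when both are real); equal magnitudes are only a fallback.

1.2 Basic Structure of the Doubly-Terminated LC Filter

Building a filter is very similar to building a matching network. We call the circuit below a doubly-terminated LC filter, also known as the doubly-terminated LC canonical form or the doubly-terminated LC ladder. Its basic structure is:

With the canonical form of the doubly-terminated LC filter we can realize any all-pole filter. Note that the number of reactive elements equals the number of poles of the filter, which for an all-pole filter is the order of the system.

The next question is: given the transfer function of a filter, how do we design the reactive elements of the doubly-terminated LC filter? Intuitively, we could write down the impedance matrix of this two-port network: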

$$Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix}$$

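and then use voltage division to compute the input and output impedances and hence the transfer function. This is quite complicated, however, so we need a simpler method.

1.3 Lossless Networks, Power Transfer, and the Reflection Coefficient

A network made only of reactive elements is called a lossless network, because no power is dissipated inside it.

Let $P_a$ be the power delivered to the load resistor. Because the network is lossless, we can write: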

$$\begin{aligned} P_a &= \frac{V_o^2}{Z_0} = \frac{|H(j\omega)|^2 V_i^2}{Z_0} \\ &= \frac{V_i^2}{|Z_0 + Z_{in}(s)|^2}\Re(Z_{in}(s)) \end{aligned}$$

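Our goal is to find the $Z_{in}(s)$ that satisfies this equation. Maximum power is delivered when the input impedance equals the source impedance, in which case the power at the output is: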

$$P_{max} = \frac{V_i^2}{4Z_0}$$

Taking the difference between the maximum available power and the power actually delivered:

$$\begin{aligned} P_{max} - P_a &= \frac{V_i^2}{4Z_0} - \frac{V_i^2}{|Z_0 + Z_{in}(s)|^2}\Re(Z_{in}(s)) \\ &= \frac{V_i^2}{4Z_0} - \frac{V_i^2(Z_{in}+ Z_{in}^*)}{2(Z_0 + Z_{in})(Z_0 + Z_{in}^*)} \\ &= \frac{V_i^2}{4Z_0} \frac{Z_0^2 - Z_0Z_{in}^* - Z_{in}Z_0 + Z_{in}Z_{in}^* }{(Z_0 + Z_{in})(Z_0 + Z_{in}^*)} \\ &= \frac{V_i^2}{4Z_0} \frac{(Z_{in} - Z_0)(Z_{in}^* - Z_0)}{(Z_0 + Z_{in})(Z_0 + Z_{in}^*)} \end{aligned}$$

If we define:

$$\Gamma = \frac{Z_{in} - Z_0}{Z_{in} + Z_0}$$

then we obtain:

$$P_{max} - P_a = \frac{V_i^2}{4Z_0} \Gamma \Gamma^* = \frac{V_i^2}{4Z_0} |\Gamma|^2$$

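We call $\Gamma$ the reflection coefficient. It measures how badly the input impedance is mismatched to the source impedance: the larger its magnitude, the worse the mismatch. When the impedances are perfectly matched, the reflection coefficient is 0 and the power transfer is maximal.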

$$P_a = (1 - |\Gamma|^2) P_{max}$$

Going back to the power balance,

$$\frac{V_i^2}{4Z_0} (1 - |\Gamma|^2) = \frac{V_i^2 |H(j\omega)|^2}{Z_0}$$

we obtain:

$$|H(j\omega)|^2 = \frac{1 - |\Gamma|^2}{4}$$

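Since $|\Gamma| \geq 0$, the maximum DC gain of this kind of low-pass filter is $\frac{1}{2}$.

1.4 Passband, Stopband, and the Reflection Coefficient

Comparing the reflection coefficient with the magnitude response, we conclude:

In the passband, the magnitude $|\Gamma|$ of the reflection coefficient is close to 0, so the magnitude response $|H(j\omega)|$ is close to its maximum of $\frac{1}{2}$.
- Here, all the available power is delivered to the load resistor.
In the stopband, the magnitude $|\Gamma|$ of the reflection coefficient is close to 1, so the magnitude response $|H(j\omega)|$ is close to 0.
- Here, all the power is reflected back to the input and none reaches the load resistor.

This matches intuition: in the passband the signal passes through the filter to the load, while in the stopband the signal is blocked by the filter and never reaches the load.

1.5 Example: Third-Order Butterworth Filter

Let us look at a concrete example. We design a doubly-terminated LC filter whose transfer function is a third-order Butterworth response, with the cutoff frequency and impedance normalized.

First we can write: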

$$|H(j\omega)|^2 = \frac{1}{1 + \omega^6} \cdot \frac{1}{4} = \frac{1 - |\Gamma|^2}{4}$$

Then,

$$|\Gamma|^2 = \frac{\omega^6}{1 + \omega^6} = \Gamma(j\omega)\Gamma(-j\omega)$$

from which, taking the left-half-plane poles, we deduce

$$\Gamma(s) = \frac{s^3}{(s+1)(s^2 + s + 1)}$$

Using the normalization $Z_0 = 1$, the input impedance is:

$$Z_{in}(s) = \frac{2s^3 + 2s^2 + 2s + 1}{2s^2 + 2s + 1}$$
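As a sanity check, we can go backwards from this $Z_{in}(s)$ to the reflection coefficient and then to $|H(j\omega)|^2 = \frac{1 - |\Gamma|^2}{4}$, and compare with the Butterworth target $\frac{1}{4(1 + \omega^6)}$. The function names below are ours; this is a sketch of the check, not part of the design procedure itself.

```python
def z_in(s):
    """Input impedance of the third-order Butterworth example (Z0 = 1)."""
    return (2 * s**3 + 2 * s**2 + 2 * s + 1) / (2 * s**2 + 2 * s + 1)


def gain_squared(w, z0=1.0):
    """|H(jw)|^2 recovered from the reflection coefficient."""
    z = z_in(1j * w)
    gamma = (z - z0) / (z + z0)
    return (1 - abs(gamma) ** 2) / 4


for w in (0.0, 0.5, 1.0, 2.0):
    print(w, gain_squared(w), 0.25 / (1 + w**6))
```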

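Now consider a question. Both of the networks below are third-order canonical forms. Which one should we use to realize our filter?

The answer is the one on the left: as frequency approaches infinity we want the impedance to approach infinity, and only the left network does that. The impedance of the right network approaches 0 at high frequency, which would short out the signal.

As $s \to \infty$, $Z_{in}(s) \approx s$, so the first inductor is 1 H. Removing it and repeating the same analysis on the remainder gives a capacitor of 2 F, a second inductor of 1 H, and a final load resistor of 1 Ω, just as expected.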
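The "remove one element and repeat" procedure is a continued-fraction expansion about $s = \infty$, and it can be sketched as repeated polynomial division. The helper names below are ours, and the sketch assumes a well-formed all-pole ladder impedance whose numerator degree exceeds the denominator degree by exactly one at every step; it does not handle finite zeros or degenerate remainders.

```python
from fractions import Fraction


def strip(poly):
    """Drop leading zero coefficients (coefficients are highest power first)."""
    while len(poly) > 1 and poly[0] == 0:
        poly = poly[1:]
    return poly


def ladder_elements(num, den):
    """Continued-fraction expansion of num/den about s = infinity.

    Returns the alternating series-L / shunt-C values and the
    termination resistance left at the end.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    elements = []
    while len(num) == len(den) + 1:
        q = num[0] / den[0]                              # quotient is q * s
        shifted = [q * c for c in den] + [Fraction(0)]   # q * s * den
        rem = strip([a - b for a, b in zip(num, shifted)])
        elements.append(q)
        num, den = den, rem                              # invert and continue
    return elements, num[0] / den[0]


print(ladder_elements([2, 2, 2, 1], [2, 2, 1]))
```

Exact rational arithmetic is used so that rounding cannot leave a spurious leading term in the remainder, which is the failure mode the significant-digits rule of thumb guards against.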

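1.6 Summary

In this example we showed how to realize a third-order Butterworth filter with the canonical form of the doubly-terminated LC filter. We designed the reactive elements by computing the reflection coefficient and the input impedance, making sure the impedance approaches infinity at high frequency. The same method works for Chebyshev type I filters, but for the inverse Chebyshev filter, which has finite zeros, we need to introduce a resonant tank to realize them. We discuss that example in the next section.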

Active Filters (3): The Inverse Chebyshev Filter (Chebyshev Type II)

Sun, 13 Jul 2025 11:15:41 +0800

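Introduction

In the previous section we discussed the design principles and derivation of the Chebyshev filter in detail. In short, the Chebyshev filter allows ripple in the passband to achieve a steeper transition band, so it meets a given stopband attenuation with fewer elements. We can regard it as a generalization of the Butterworth filter.

However, neither the Chebyshev nor the Butterworth filter places any special requirement on the stopband: both focus on shaping the passband, and the stopband behavior simply follows. The inverse Chebyshev filter (Chebyshev type II) is a variant that instead allows ripple in the stopband and shapes the stopband directly, while the passband behavior is what follows.

1. Basic Properties of the Inverse Chebyshev Filter

The Chebyshev type II filter, also called the inverse Chebyshev filter, is a variant of the Chebyshev filter. Its magnitude response is shown below:

As you can see, the inverse Chebyshev filter has ripple in the stopband. This leads to an essential difference from the Chebyshev filter: because of the stopband ripple, we need zeros in the stopband. In other words, the transfer function of the inverse Chebyshev filter is no longer all-pole. As later sections show, this makes its circuit realization significantly different.

1.1 Key Features

Passband: maximally flat (like the Butterworth filter)
Stopband: equiripple response
Transfer function: has both poles and zeros
Roll-off: faster than the Butterworth filter, thanks to the stopband zeros

2. Mathematical Derivation of the Inverse Chebyshev Filter

2.1 Constructing the Magnitude Response

Again we construct the denominator of the magnitude response. As with the Chebyshev type I filter, we define: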

$$|H(j\omega)|^2 = \frac{1}{1 + \frac{1}{\epsilon^2} F(\omega^2)}$$

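The function $F$ should look like this:

Notice that after taking reciprocals along both the x-axis and the y-axis, $F$ can be expressed directly with the Chebyshev polynomial: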

$$F(\omega^2) = \frac{1}{T_n^2\left(\frac{1}{\omega}\right)}$$

2.2 Complete Magnitude Response

The magnitude response of the inverse Chebyshev filter is therefore:

$$|H(j\omega)|^2 = \frac{\epsilon^2 T_n^2\left(\frac{1}{\omega}\right)}{1 + \epsilon^2 T_n^2\left(\frac{1}{\omega}\right)}$$

or, equivalently:

$$|H(j\omega)|^2 = \frac{1}{1 + \frac{1}{\epsilon^2 T_n^2\left(\frac{1}{\omega}\right)}}$$

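2.3 Finding the Poles

To find the poles of the inverse Chebyshev filter, we substitute $\omega = s/j$ (so that $1/\omega = j/s$) and solve:

$$1 + \frac{1}{\epsilon^2 T_n^2\left(\frac{j}{s}\right)} = 0$$

that is:

$$T_n\left(\frac{j}{s}\right) = \pm \frac{j}{\epsilon}$$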

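Using the pole results for the Chebyshev type I filter together with the substitution $s \to \frac{1}{s}$, we obtain the poles of the inverse Chebyshev filter.

If the poles of the Chebyshev type I filter are $p_k$, the poles of the inverse Chebyshev filter are: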

$$p_{k,inv} = \frac{1}{p_k}$$

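3. Design Example: Third-Order Inverse Chebyshev Filter

3.1 Computing the Poles

Let us design a third-order inverse Chebyshev filter.

First, from the previous section, the three poles of the third-order Chebyshev filter are: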

$$\begin{aligned} p_1, p_2 &= -\sin\frac{\pi}{6} \sinh\left(\frac{1}{3}\sinh^{-1}\frac{1}{\epsilon}\right) \\ &\quad \pm j \cos\frac{\pi}{6} \cosh\left(\frac{1}{3}\sinh^{-1}\frac{1}{\epsilon}\right) \\ p_3 &= -\sin\frac{\pi}{2} \sinh\left(\frac{1}{3}\sinh^{-1}\frac{1}{\epsilon}\right) \end{aligned}$$

The denominator of the Chebyshev filter can therefore be written as:

$$\begin{aligned} D(s) &= \left(1 - \frac{s}{p_1}\right)\left(1 - \frac{s}{p_2}\right)\left(1 - \frac{s}{p_3}\right) \\ &= 1 + a_1 s + a_2 s^2 + a_3 s^3 \end{aligned}$$

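For the inverse Chebyshev filter, each pole is replaced by its reciprocal, so the coefficient order is reversed and the denominator becomes: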

$$D_{inv}(s) = s^3 + a_1 s^2 + a_2 s + a_3$$

3.2 Computing the Zeros

Besides the poles, we also need the zeros of the inverse Chebyshev filter. The zeros satisfy:

$$\epsilon^2 T_n^2\left(\frac{1}{\omega_z}\right) = 0$$

In this example:

$$T_3\left(\frac{1}{\omega_z}\right) = \cos\left(3 \cos^{-1}\left(\frac{1}{\omega_z}\right)\right) = 0$$

So the zeros satisfy:

$$3 \cos^{-1}\left(\frac{1}{\omega_z}\right) = \frac{(2k+1)\pi}{2}, \quad k = 0, 1, 2$$

that is:

$$\frac{1}{\omega_z} = \cos\left(\frac{(2k+1)\pi}{6}\right), \quad k = 0, 1, 2$$

which gives the zeros:

$$\begin{aligned} \omega_{z1}, \omega_{z2} &= \pm \frac{1}{\cos\left(\frac{\pi}{6}\right)} = \pm \frac{2}{\sqrt{3}} \\ \omega_{z3} &= \frac{1}{\cos \left(\frac{\pi}{2}\right)} = \infty \end{aligned}$$

So the third-order inverse Chebyshev filter has two finite zeros and one zero at infinity.
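The same calculation works for any order $n$ by solving $T_n(1/\omega_z) = 0$. The sketch below uses a function name of our own and treats a vanishing cosine as a zero at infinity, which it simply omits from the returned list.

```python
import math


def inverse_chebyshev_zeros(n):
    """Finite transmission zeros w_z = 1 / cos((2k + 1) * pi / (2n))."""
    zeros = []
    for k in range(n):
        c = math.cos((2 * k + 1) * math.pi / (2 * n))
        if abs(c) > 1e-12:          # cos = 0 corresponds to a zero at infinity
            zeros.append(1.0 / c)
    return zeros


print(inverse_chebyshev_zeros(3))
```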

3.3 Complete Transfer Function

The transfer function of the third-order inverse Chebyshev filter can be written as:

$$H(s) = K \frac{s^2 + \omega_{z1}^2}{s^3 + a_1 s^2 + a_2 s + a_3}$$

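where $K$ is a gain constant determined by the normalization condition.

4. A Special Property of Even-Order Inverse Chebyshev Filters

For an even-order inverse Chebyshev filter, as $\omega \to \infty$ the magnitude response does not approach 0 but instead approaches: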

$$\lim_{\omega \to \infty} |H(j\omega)| = \frac{\epsilon}{\sqrt{1 + \epsilon^2}}$$

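This has an important engineering consequence: we cannot realize an even-order inverse Chebyshev filter with the passive LC ladder. The gain of the passive LC ladder must go to zero at high frequency, while an even-order inverse Chebyshev filter has nonzero gain there.

5. Passband Analysis of the Inverse Chebyshev Filter

How does the inverse Chebyshev filter behave in the passband? We analyze it by letting $\omega \to 0$.

As $\omega \to 0$, $\frac{1}{\omega} \gg 1$ and the following approximations hold:

$$\cosh^{-1}\frac{1}{\omega} \approx \ln\frac{2}{\omega}$$

$$T_n\left(\frac{1}{\omega}\right) = \cosh\left(n \cosh^{-1}\frac{1}{\omega}\right) \approx \frac{1}{2} e^{n \ln\frac{2}{\omega}} = \frac{1}{2}\left(\frac{2}{\omega}\right)^n$$

Therefore:

$$\begin{aligned} |H(j\omega)|^2 &\approx \frac{1}{1 + \frac{4}{\epsilon^2 \left(\frac{2}{\omega}\right)^{2n}}} \\ &= \frac{1}{1 + \frac{\omega^{2n}}{4^{n-1} \epsilon^2}} \end{aligned}$$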

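This shows that in the passband the inverse Chebyshev filter approaches a Butterworth-type response, while the stopband zeros make the roll-off faster.

6. A Brief Introduction to Elliptic Filters

We have seen that introducing ripple in the passband or in the stopband improves the roll-off. Can we introduce ripple in both the passband and the stopband and improve the roll-off even further? Yes: that is the elliptic filter.

Key features of the elliptic filter:

Passband: equiripple response
Stopband: equiripple response
Roll-off: the steepest of all the filter types
Complexity: the most involved derivation, requiring the theory of elliptic integrals

The elliptic filter allows ripple in both the passband and the stopband and thereby achieves the steepest roll-off. Its derivation involves elliptic integrals and Jacobi elliptic functions and is fairly complicated, so we do not go into detail here.

7. Summary and Comparison

7.1 Comparison of the Four Classical Filters

| Filter type | Passband | Stopband | Roll-off | Implementation complexity |
| --- | --- | --- | --- | --- |
| Butterworth | Maximally flat | Monotonic | Slowest | Simplest |
| Chebyshev I | Equiripple | Monotonic | Faster | Moderate |
| Chebyshev II | Maximally flat | Equiripple | Faster | Moderate |
| Elliptic | Equiripple | Equiripple | Fastest | Most complex |

7.2 Design Selection Guide

Butterworth filter: for applications that demand an extremely flat passband
Chebyshev type I filter: for applications that demand high stopband attenuation and can tolerate passband ripple
Chebyshev type II filter: for applications that demand a flat passband and can tolerate stopband ripple
Elliptic filter: for applications that demand an extremely steep roll-off and can tolerate ripple in both the passband and the stopband

The figure below shows typical magnitude responses of the four filter types: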

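8. Table of Common Chebyshev Polynomials

| Order $n$ | Chebyshev polynomial $T_n(x)$ |
| --- | --- |
| 1 | $x$ |
| 2 | $2x^2 - 1$ |
| 3 | $4x^3 - 3x$ |
| 4 | $8x^4 - 8x^2 + 1$ |
| 5 | $16x^5 - 20x^3 + 5x$ |
| 6 | $32x^6 - 48x^4 + 18x^2 - 1$ |
| 7 | $64x^7 - 112x^5 + 56x^3 - 7x$ |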

8.1 Recurrence Relation

The Chebyshev polynomials satisfy the recurrence:

$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$$

with:

$T_0(x) = 1$
$T_1(x) = x$

This recurrence makes it easy to compute high-order Chebyshev polynomials.
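The recurrence translates directly into a few lines of code operating on coefficient lists. The sketch below (the function name is ours) stores coefficients lowest power first and can be used to regenerate the polynomial table.

```python
def chebyshev_coefficients(n):
    """Coefficients of T_n(x), lowest power first, built with the recurrence."""
    t_prev, t_curr = [1], [0, 1]          # T_0 = 1, T_1 = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in t_curr]                 # 2x * T_n
        padded = t_prev + [0] * (len(shifted) - len(t_prev))    # T_{n-1}
        t_prev, t_curr = t_curr, [a - b for a, b in zip(shifted, padded)]
    return t_curr


for n in range(1, 8):
    print(n, chebyshev_coefficients(n))
```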

In the next post we discuss the actual circuit realization of these filters.

Active Filters (2): The Chebyshev Approximation

Sat, 12 Jul 2025 20:02:28 +0800

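In the previous post we discussed the design method and properties of the Butterworth approximation. Before exploring a new design method, we first need to analyze a key question: where does the Butterworth filter fall short?

1. Limitations of the Butterworth Filter

The Butterworth filter is widely used in filter engineering because it is simple to design. In some applications, however, it is not the best choice.

For an N-th order filter, a physical realization needs at least N reactive passive elements (inductors or capacitors). In practice, a design is usually specified by a minimum passband gain and a maximum stopband gain. The Butterworth filter is maximally flat in the passband, but that property can be stricter than the specification requires.

As this post will show, the Chebyshev filter introduces controlled ripple in the passband and thereby achieves the same stopband attenuation with a lower order. This mattered greatly in early analog filter design, since fewer elements meant lower cost and smaller size. Even in modern integrated-circuit design, using fewer capacitors and inductors still means less chip area and lower power.

2. Chebyshev Type I Filter

2.1 Design Principle and Magnitude Response

A typical magnitude response of the Chebyshev filter is shown below:

We introduce the parameter $\epsilon$ to control the ripple amplitude in the passband. Define the squared magnitude response of the filter as: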

$$|H(j\omega)|^2 = \frac{1}{D(\omega^2)} = \frac{1}{1 + \epsilon^2 F_1(\omega^2)}$$

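where the ideal characteristic function $F_1(\omega^2)$ should have the following shape:

2.2 Derivation of the Chebyshev Polynomial

2.2.1 Constructing the Polynomial by Interpolation

From the zeros of $F_1$ at $0$, $\omega_2$, and $\omega_4$, we can construct the polynomial: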

$$F_1(\omega^2) = k^2 \omega^2 (\omega^2 - \omega_2^2)^2(\omega^2 - \omega_4^2)^2$$

For $F_1(\omega^2) - 1$, which has double zeros at $\omega_1$ and $\omega_3$ (where $F_1$ touches 1) and a simple zero at $1$, we get:

$$F_1(\omega^2) - 1 = k^2 (\omega^2 - \omega_1^2)^2(\omega^2 - \omega_3^2)^2(\omega^2 - 1)$$

Let $C_5(\omega)^2 = F_1(\omega^2)$; then:

$$C_5(\omega) = k \omega (\omega^2 - \omega_2^2)(\omega^2 - \omega_4^2)$$

Since $C_5$ has extrema at $\omega_1$ and $\omega_3$, its derivative is:

$$\frac{dC_5(\omega)}{d\omega} = k_{unknown}(\omega^2 - \omega_1^2)(\omega^2 - \omega_3^2)$$

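2.2.2 Setting Up and Solving the Differential Equation

As $\omega \to \infty$, $C_5(\omega) \approx k \omega^5$, so the leading coefficient of its derivative is $5k$ and hence $k_{unknown} = 5k$.

Comparing the two expressions for $F_1 - 1$ gives the differential equation: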

$$(\frac{1}{5}\frac{dC_5}{d\omega})^2 = \frac{C_5^2 - 1}{\omega^2 - 1}$$

Rearranging:

$$\frac{dC_5}{\sqrt{C_5^2 - 1}} = \frac{5 d\omega}{\sqrt{\omega^2 - 1}}$$

Integrating both sides:

$$ \begin{aligned} \int \frac{dC_5}{\sqrt{C_5^2 - 1}} &= \int \frac{5 d\omega}{\sqrt{\omega^2 - 1}} \\ \cosh^{-1}(C_5) &= 5 \cosh^{-1}(\omega) + C_{constant} \end{aligned} $$

应用边界条件$C_5(1) = 1$，得$C_{constant} = -5 \cosh^{-1}(1) = 0$。

因此：

$$C_5(\omega) = \cosh(5\cosh^{-1}(\omega))$$

2.2.3 一般化结果

将上述结果推广至n阶情况：

$$C_n(\omega) = \cosh(n \cosh^{-1}(\omega))$$

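2.3 Full Definition of the Chebyshev Polynomial

2.3.1 Relationship Between Trigonometric and Hyperbolic Functions

From Euler's formula and the definitions of the hyperbolic functions, we get the key relations:

$$\cosh(jx) = \cos(x), \quad \cos(jx) = \cosh(x)$$

For the inverse functions (up to the choice of branch):

For $x > 1$: $\cos^{-1}(x) = j \cosh^{-1}(x)$
For $0 < x < 1$: $\cosh^{-1}(x) = j \cos^{-1}(x)$

2.3.2 Definition of the Chebyshev Polynomial

Based on these relations, the full definition of the Chebyshev polynomial is:

$$T_n(\omega) = \begin{cases} \cosh(n \cosh^{-1}(\omega)), & \text{for }\omega \geq 1 \\ \cos(n \cos^{-1}(\omega)), & \text{for }0 \leq \omega \leq 1 \end{cases}$$

2.3.3 Verifying the Polynomial Property

For $|\omega| \leq 1$, we can check:

$$ \begin{aligned} T_1(\omega) &= \cos(\cos^{-1}(\omega)) = \omega \\ T_2(\omega) &= \cos(2\cos^{-1}(\omega)) = 2\omega^2 - 1 \end{aligned} $$

Using the multiple-angle identities, one can show that $T_n(\omega)$ is indeed a polynomial of degree n in $\omega$.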
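As a quick numerical sanity check, the sketch below compares the piecewise closed form with the three-term recurrence $T_{n+1}(\omega) = 2\omega T_n(\omega) - T_{n-1}(\omega)$, which generates the same polynomials. The function names and the sample frequencies are arbitrary choices made for this illustration.

```python
import math


def cheb_recurrence(n, w):
    """Evaluate T_n(w) with T_{n+1} = 2 w T_n - T_{n-1}."""
    t_prev, t_curr = 1.0, w
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * w * t_curr - t_prev
    return t_curr


def cheb_closed_form(n, w):
    """Evaluate T_n(w) from the piecewise definition (w >= 0 assumed)."""
    if w <= 1.0:
        return math.cos(n * math.acos(w))
    return math.cosh(n * math.acosh(w))


for w in (0.0, 0.3, 0.9, 1.0, 1.5, 3.0):
    for n in range(1, 8):
        a, b = cheb_recurrence(n, w), cheb_closed_form(n, w)
        assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
print("closed form and recurrence agree")
```

Both branches of the definition land on the same polynomial values, inside and outside the passband.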

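2.4 Engineering Terminology

The polynomial is named after the Russian mathematician Pafnuty Chebyshev and is called the Chebyshev polynomial. Filters based on the Chebyshev approximation are called Chebyshev filters, and the literature also uses the following terms:

Equal Ripple Filter
- because every ripple peak in the passband has the same amplitude
Minimax Filter
- from Chebyshev's theorem: among all monic polynomials of degree n, the scaled Chebyshev polynomial has the smallest maximum deviation from zero on the given interval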

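3. Important Properties of Chebyshev Polynomials

3.1 Even Order versus Odd Order

Even-order and odd-order Chebyshev polynomials behave differently at $\omega = 0$.

Example: the value of the fourth-order Chebyshev polynomial at zero frequency

$$T_4(0) = \cos(4\cos^{-1}(0)) = \cos(4 \times \frac{\pi}{2}) = \cos(2\pi) = 1$$

Key results:

Even order: $T_n(0) = \pm 1$, so $T_n^2$ is at its passband maximum at $\omega = 0$
Odd order: $T_n(0) = 0$, so $T_n^2$ is at its minimum at $\omega = 0$

As a result, the magnitude response of an even-order Chebyshev filter at DC is not 1 but $\frac{1}{\sqrt{1 + \epsilon^2}}$, as shown in the figure below: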

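3.2 Location of the Passband Extrema

3.2.1 Passband Maxima

The passband gain is maximal (equal to 1) where $T_n^2$ is minimal, that is, at the zeros of the Chebyshev polynomial:

$$\cos(n \cos^{-1}(\omega)) = 0$$

which gives:

$$\omega = \cos\left(\frac{(2m + 1)\pi}{2n}\right), \quad m = 0, 1, \ldots, n - 1$$

3.2.2 Passband Minima

The passband gain is minimal (equal to $\frac{1}{\sqrt{1 + \epsilon^2}}$) where $T_n^2$ reaches its maximum value of 1:

$$\cos(n \cos^{-1}(\omega)) = \pm 1$$

which gives:

$$\omega = \cos\left(\frac{m\pi}{n}\right), \quad m = 0, 1, \ldots, n$$

3.3 Ripple versus Stopband Attenuation

By allowing some ripple in the passband, a Chebyshev filter achieves a steeper roll-off in the transition band. The larger the ripple, the larger the stopband attenuation.

Limiting case: as $\epsilon \to 0$ the ripple vanishes, and once the frequency axis is rescaled to the -3 dB point the response approaches that of a Butterworth filter.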
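To put numbers on this trade-off, here is a small sketch that evaluates the attenuation of both responses at one stopband frequency. It assumes both filters are normalized to a passband edge of $\omega = 1$ and uses the Butterworth form $|H|^2 = 1/(1 + \omega^{2N})$ from the first post of this series. The order, the stopband frequency, and the $\epsilon$ values are arbitrary illustrative choices.

```python
import math


def cheb_poly(n, w):
    """T_n(w) for w >= 1 using the hyperbolic form."""
    return math.cosh(n * math.acosh(w))


def butterworth_atten_db(n, w):
    """Attenuation of |H|^2 = 1 / (1 + w^(2n)) in dB."""
    return 10.0 * math.log10(1.0 + w ** (2 * n))


def chebyshev_atten_db(n, w, eps):
    """Attenuation of |H|^2 = 1 / (1 + eps^2 T_n(w)^2) in dB, for w >= 1."""
    return 10.0 * math.log10(1.0 + (eps * cheb_poly(n, w)) ** 2)


n, w_stop = 5, 2.0  # illustrative order and stopband frequency
print(f"Butterworth: {butterworth_atten_db(n, w_stop):.1f} dB")
for eps in (0.1, 0.5, 1.0):
    print(f"Chebyshev eps={eps}: {chebyshev_atten_db(n, w_stop, eps):.1f} dB")
```

For a fixed order, increasing `eps` raises the stopband attenuation, at the price of more passband ripple.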

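4. Pole Analysis of the Chebyshev Filter

4.1 Setting Up the Pole Equation

The poles of the Chebyshev filter satisfy:

$$1 + \epsilon^2 T_n^2(\omega) = 0$$

Substituting $\omega = \frac{s}{j}$, this becomes:

$$\cos(n \cos^{-1}(\frac{s}{j})) = \pm \frac{j}{\epsilon}$$

4.2 Solving for the Poles

Let $n \cos^{-1}(\frac{s}{j}) = u + jv$, with $u, v \in \mathbb{R}$.

Using the angle-sum identity:

$$\cos(u + jv) = \cos u \cosh v - j \sin u \sinh v = \pm \frac{j}{\epsilon}$$

Separating real and imaginary parts:

$$\begin{cases} \cos u \cosh v = 0 \\ -\sin u \sinh v = \pm \frac{1}{\epsilon} \end{cases}$$

Since $u$ and $v$ are real and $\cosh v \geq 1$, the real part can only vanish if $\cos u = 0$:

$$u = (2m + 1) \frac{\pi}{2}, \quad m = 0, 1, 2, \ldots$$

This gives $\sin u = (-1)^m$. Solving the second equation:

$$v = \pm \sinh^{-1}\frac{1}{\epsilon}$$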

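4.3 Final Expression for the Poles

Substituting back into $s = j\cos\left(\frac{u + jv}{n}\right)$ gives the poles:

$$s = \pm \sin\left(\frac{(2m + 1) \pi}{2n}\right) \sinh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right) + j \cos\left(\frac{(2m + 1) \pi}{2n}\right) \cosh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right)$$

4.4 Geometry of the Pole Locations

The real and imaginary parts of the poles are:

$$\begin{cases} \sigma_0 = \pm \sin\left(\frac{(2m + 1) \pi}{2n}\right) \sinh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right) \\ \omega_0 = \cos\left(\frac{(2m + 1) \pi}{2n}\right) \cosh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right) \end{cases}$$

Eliminating $m$ gives the equation of an ellipse:

$$\frac{\sigma_0^2}{\sinh^2\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right)} + \frac{\omega_0^2}{\cosh^2\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right)} = 1$$

So the poles of a Chebyshev filter lie on an ellipse whose semi-axes are:

Semi-major axis (along the imaginary axis): $\cosh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right)$
Semi-minor axis (along the real axis): $\sinh\left(\frac{1}{n}\sinh^{-1}\frac{1}{\epsilon}\right)$

Limiting case: as $\epsilon \to 0$, the ratio of the two semi-axes tends to 1, so the ellipse approaches a circle, which is the Butterworth pole pattern. Its radius grows as $\epsilon$ shrinks, so it becomes the unit circle only after the frequency axis is rescaled.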
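The closed-form pole expression can be checked numerically. The sketch below takes the left-half-plane poles (the minus sign in the real part), confirms that each one satisfies $1 + \epsilon^2 T_n^2(s/j) = 0$, and confirms that it lies on the ellipse. The helper names are made up for this example, and $T_n$ is evaluated for complex arguments with the standard three-term recurrence.

```python
import math


def chebyshev_poles(n, eps):
    """Left-half-plane poles from the closed-form expression."""
    a = math.asinh(1.0 / eps) / n
    poles = []
    for m in range(n):
        theta = (2 * m + 1) * math.pi / (2 * n)
        poles.append(complex(-math.sin(theta) * math.sinh(a),
                             math.cos(theta) * math.cosh(a)))
    return poles


def cheb_poly(n, z):
    """T_n(z) for complex z via T_{n+1} = 2 z T_n - T_{n-1}."""
    t_prev, t_curr = 1.0 + 0j, z
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * z * t_curr - t_prev
    return t_curr


n, eps = 5, 0.1
a = math.asinh(1.0 / eps) / n
for s in chebyshev_poles(n, eps):
    residual = 1 + eps**2 * cheb_poly(n, s / 1j) ** 2   # should vanish at a pole
    ellipse = (s.real / math.sinh(a)) ** 2 + (s.imag / math.cosh(a)) ** 2
    print(f"s = {s:.4f}, |residual| = {abs(residual):.1e}, ellipse = {ellipse:.6f}")
```

The residuals come out at rounding-error level and the ellipse expression evaluates to 1 for every pole.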

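The pole locations of a fifth-order Chebyshev filter are sketched below ($\epsilon = 0.1$):

In the next post, we will discuss a variant of the Chebyshev filter: the Type II Chebyshev filter, also known as the inverse Chebyshev filter.

Active Filters (1): Butterworth Approximation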

Sat, 12 Jul 2025 19:42:37 +0800

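Preface

I have recently been following EE534 from IIT Madras, a course on active filter design taught by Prof. Shanthi Pavan. The course is rich in content, covering practical design methods, rigorous theoretical derivations, and valuable engineering intuition. To consolidate what I learn, I am writing a series of notes on the key material. As the Feynman technique emphasizes, teaching others is one of the most effective ways to test and deepen one's own understanding.

Basic Concepts

Before going into active filter design, we need a few key concepts:

Active Element: an element that can provide power gain, such as transistors, operational amplifiers, current sources, and voltage sources.
Passive Element: an element that cannot provide power gain, mainly resistors, capacitors, and inductors.
Filter: a circuit that selectively passes or attenuates signals according to their frequency.
Linear Filter: a filter that obeys superposition, so that the output is a linear function of the input.
Quality Factor (Q): a measure of a filter's frequency selectivity, usually defined as the ratio of center frequency to bandwidth.
Zero: a complex frequency at which the numerator of the transfer function vanishes, so the gain drops to zero there.
Pole: a complex frequency at which the denominator of the transfer function vanishes. The poles determine the stability of the filter and shape its frequency response.

Why do we need RLC circuits rather than simple RC or RL circuits?

With passive RC-only or RL-only networks, complex-conjugate pole pairs cannot be realized, so the sharp transition of a high-Q filter is out of reach. Such networks, whatever their order, only produce poles on the negative real axis, whereas complex-conjugate pole pairs are the key to high-order filter responses.

Overview of Active Filters

Active filter theory has its roots in the early filter research of the 1920s to 1950s, whose central goal was to design circuits whose transfer functions approximate the ideal filter response with polynomials. An active filter combines operational amplifiers with passive elements (resistors and capacitors) to realize the filtering function, avoiding the inductors used in traditional LC filters.

This line of work gave rise to network synthesis theory, which provides a solid foundation for the systematic analysis and design of passive networks.

By frequency response, filters are classified as low-pass, high-pass, band-pass, band-stop, and so on. Notably, every type can be derived from a low-pass prototype through a frequency transformation, so at the system level we can concentrate on low-pass design.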

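Basic Properties of the Transfer Function

Ideal Low-Pass Filter

The magnitude response of an ideal low-pass filter $H(s) = \frac{N(s)}{D(s)}$ should satisfy:

$$|H(j\omega)| = \begin{cases} 1 & \text{for } 0 < \omega < 1 \\ 0 & \text{for } \omega > 1 \end{cases}$$

Replacing the complex variable $s$ with $j\omega$ turns the transfer function into the frequency response.

Conjugate Property of the Transfer Function

For a transfer function with real coefficients, the following important property holds:

$$|H(j\omega)|^2 = H(j\omega)H^*(j\omega) = H(j\omega)H(-j\omega)$$

So given the transfer function, we can compute its magnitude function directly.

Recovering the Transfer Function from the Magnitude Function

Conversely, if the magnitude function is known and we want the transfer function, we can substitute $\omega = s/j$ to obtain $H(s)H(-s)$. This requires factoring the result and deciding which roots belong to $H(s)$.

Fortunately, a physically realizable transfer function must satisfy the following constraints:

Zeros: may lie anywhere in the complex plane
Poles: must lie in the left half-plane ($\text{Re}(s) < 0$), otherwise the system is unstable

For a transfer function with real coefficients, the magnitude is an even function of $\omega$ and the phase is an odd function.

As later notes will show, the poles and zeros of a pure LC filter obey even stricter geometric constraints.

All-Pole Filter

A transfer function with no finite zeros is called an All-Pole Filter. Such filters are comparatively easy to design and analyze, and they form the foundation of filter design.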

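Butterworth Filter

Design Idea and Magnitude Function

For an all-pole filter, we write the squared magnitude as:

$$|H(j\omega)|^2 = \frac{1}{D(j\omega)D(-j\omega)} = \frac{1}{D_1(\omega^2)}$$

Note that if the transfer function is of order $N$, then $D_1$ is a polynomial of degree $N$ in $\omega^2$.

Ideally, $D_1(\omega^2)$ would satisfy:

$$ D_1(\omega^2) = \begin{cases} 1 & \text{for } 0 < \omega < 1 \\ \infty & \text{for } 1 < \omega < \infty \end{cases} $$

Polynomial Approximation

Let:

$$D_1(\omega^2) = 1 + F(\omega^2)$$

Then we need:

$$\begin{equation} F(\omega^2) = \begin{cases} 0 & \text{for } 0 < \omega < 1 \\ \infty & \text{for } 1 < \omega < \infty \end{cases} \end{equation}$$

Because the system is linear and built from lumped elements, the transfer function must be rational, so $F$ is a polynomial in $\omega^2$:

$$F(\omega^2) = f_0 + f_1\omega^2 + f_2\omega^4 + \cdots + f_N\omega^{2N}$$

Maximally Flat Approximation

First, $F(0) = 0$ requires $f_0 = 0$. As $\omega \to 0$ the higher-order terms become negligible, so:

$$F(\omega^2) \approx f_1\omega^2$$

For $F(\omega^2)$ to stay as close to 0 as possible, we need $f_1 = 0$.

Repeating the argument gives $f_2 = f_3 = \cdots = f_{N-1} = 0$.

$f_N$, however, cannot be zero, or the filter would not filter at all. The best approximation we can achieve is therefore:

$$F(\omega^2) = f_N\omega^{2N}$$

Butterworth Response

Taking $f_N = 1$, we get:

$$|H(j\omega)|^2 = \frac{1}{1 + \omega^{2N}}$$

This is the well-known squared magnitude function of the Butterworth filter.

At $\omega = 1$, $|H(j\omega)|^2 = \frac{1}{2}$, which is the -3 dB point and defines the cutoff frequency of the Butterworth filter.

This design is maximally flat at $\omega = 0$: the first $2N - 1$ derivatives of $|H(j\omega)|^2$ vanish there, which gives the flattest possible passband response near DC.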

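Pole Locations

We now analyze the pole locations of the Butterworth filter. Since $D(s)$ is a polynomial of degree $N$, substituting $\omega = s/j$ gives:

$$D(s)D(-s) = 1 + (-s^2)^N = 1 + (-1)^N s^{2N}$$

Symmetry of the Roots

If $\sigma_0 + j\omega_0$ is a root of $D(s)D(-s) = 0$, then all four of the following points are roots:

$\sigma_0 + j\omega_0$ (original root)
$\sigma_0 - j\omega_0$ (conjugate)
$-\sigma_0 + j\omega_0$ (negated conjugate)
$-\sigma_0 - j\omega_0$ (negated root)

This symmetry holds for any product $D(s)D(-s)$ with real coefficients. Physical realizability, however, requires the poles to lie in the left half-plane for stability, so we assign only the left-half-plane roots to $D(s)$.

Second-Order Butterworth Example

Example: consider $N = 2$.

The equation $1 + s^4 = 0$ gives:

$$s^4 = -1 = e^{j(\pi + 2k\pi)}, \quad k = 0, 1, 2, 3$$

The four roots are therefore:

$$s = e^{j\pi/4}, \quad e^{j3\pi/4}, \quad e^{j5\pi/4}, \quad e^{j7\pi/4}$$

Converting to rectangular form:

$$s = e^{j\pi/4} = \frac{1}{\sqrt{2}} + j\frac{1}{\sqrt{2}}$$

$$s = e^{j3\pi/4} = -\frac{1}{\sqrt{2}} + j\frac{1}{\sqrt{2}}$$

$$s = e^{j5\pi/4} = -\frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}$$

$$s = e^{j7\pi/4} = \frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}$$

Selecting the left-half-plane poles:

$$s_1 = -\frac{1}{\sqrt{2}} + j\frac{1}{\sqrt{2}}, \quad s_2 = -\frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}$$

Constructing the Transfer Function

From the pole locations, we can build the denominator polynomial of the second-order Butterworth filter:

$$D(s) = (s - s_1)(s - s_2) = s^2 + \sqrt{2}s + 1$$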
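A short numerical check closes the loop: evaluating $D(s)D(-s)$ on the imaginary axis for this second-order denominator should return $1 + \omega^4$. The sample frequencies below are arbitrary.

```python
import math


def D(s):
    """Second-order Butterworth denominator s^2 + sqrt(2) s + 1."""
    return s * s + math.sqrt(2.0) * s + 1.0


for w in (0.0, 0.5, 1.0, 2.0, 10.0):
    product = D(1j * w) * D(-1j * w)       # D(s) D(-s) evaluated at s = jw
    assert abs(product - (1.0 + w**4)) < 1e-9 * (1.0 + w**4)
    print(f"w = {w}: |H|^2 = {1.0 / product.real:.6f}")
```

At `w = 1.0` the squared magnitude is one half, the -3 dB point.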

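Filter Parameters

Comparing this result with the standard second-order form $s^2 + \frac{\omega_n}{Q}s + \omega_n^2$:

Natural frequency: $\omega_n = 1$
Quality factor: $Q = \frac{1}{\sqrt{2}} \approx 0.707$

Unlike the first-order Butterworth filter, whose single pole is real, the second-order filter has a complex-conjugate pole pair and rolls off at 40 dB/decade instead of 20 dB/decade.

Geometry of the Pole Locations

The poles of a Butterworth filter are evenly spaced on the unit circle, and only those in the left half-plane are kept to ensure stability. This even spacing is the geometric signature of the maximally flat response.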

Common Butterworth Polynomials

Order $N$	Denominator polynomial $D(s)$	Pole locations
1	$s + 1$	$-1$
2	$s^2 + \sqrt{2}s + 1$	$-\frac{1}{\sqrt{2}} \pm j\frac{1}{\sqrt{2}}$
3	$s^3 + 2s^2 + 2s + 1$	$-1, \ -\frac{1}{2} \pm j\frac{\sqrt{3}}{2}$
4	$s^4 + 2.6131s^3 + 3.4142s^2 + 2.6131s + 1$	$-0.3827 \pm j0.9239, \ -0.9239 \pm j0.3827$
5	$s^5 + 3.2361s^4 + 5.2361s^3 + 5.2361s^2 + 3.2361s + 1$	$-1, \ -0.3090 \pm j0.9511, \ -0.8090 \pm j0.5878$

Notice that when the order is odd, $-1$ is always one of the poles, whereas for even orders it is not. Regardless of the order, no pole ever lies on the $j\omega$ axis.
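The table entries can be regenerated from the pole equation $1 + (-1)^N s^{2N} = 0$. The sketch below places the left-half-plane roots on the unit circle and multiplies out $\prod (s - s_k)$. The helper names are illustrative only.

```python
import cmath
import math


def butterworth_poles(n):
    """Left-half-plane roots of 1 + (-1)^n s^(2n) = 0, all on the unit circle."""
    return [cmath.exp(1j * math.pi * (2 * k + n + 1) / (2 * n)) for k in range(n)]


def poly_from_roots(roots):
    """Coefficients of prod (s - r), highest power first."""
    coeffs = [1.0 + 0j]
    for r in roots:
        # multiply the current polynomial by (s - r)
        shifted = coeffs + [0j]
        scaled = [0j] + [-r * c for c in coeffs]
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return [c.real for c in coeffs]  # imaginary parts cancel for conjugate pairs


for n in range(1, 6):
    coeffs = poly_from_roots(butterworth_poles(n))
    print(n, [round(c, 4) for c in coeffs])
```

The coefficient lists are palindromic, a consequence of the poles lying on the unit circle in conjugate pairs.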

Bode Plot of the Butterworth Filter

The Relationship Between Phase Noise and Jitter

Mon, 07 Jul 2025 20:42:16 +0800

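In modern digital communication systems and clock generation circuits, phase noise and jitter are two key performance metrics. Starting from the theoretical foundations, this post analyzes the mathematical relationship between phase noise and jitter and discusses what it means in engineering practice.

1. Theoretical Background

1.1 Random Processes and Stationarity

Before analyzing the relationship between phase noise and jitter, we need some background on random processes.

1.1.1 Definition of a Stationary Process

A Stationary Process is a random process whose statistical properties do not change with time. In particular, its mean and variance are constant over time.

A Wide-Sense Stationary (WSS) Process satisfies a weaker condition that is usually sufficient in practice. It requires only:

Constant mean: $E[X(t)] = \mu_X$ (a constant)
Autocorrelation that depends only on the time difference: $R_X(t_1, t_2) = R_X(t_2 - t_1)$

White noise is a typical example of a wide-sense stationary process.

1.1.2 Autocorrelation Function

The Autocorrelation Function describes how strongly a random process is correlated with itself at two different times:

$$R_x(\tau) = E[X(t)X(t+\tau)]$$

For white noise, the autocorrelation function is:

$$R_{\text{white}}(\tau) = \sigma^2 \delta(\tau)$$

where $\sigma^2$ sets the noise intensity (the height of its flat power spectral density) and $\delta(\tau)$ is the Dirac delta function.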

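1.2 Frequency-Domain Analysis

1.2.1 Power Spectral Density

The Power Spectral Density (PSD) describes how the power of a random process is distributed over frequency:

$$S_x(f) = \lim_{T \to \infty} \frac{1}{T} E[|X_T(f)|^2]$$

where $X_T(f)$ is the Fourier transform of $X(t)$ truncated to a window of length $T$.

1.2.2 Wiener-Khinchin Theorem

The Wiener-Khinchin Theorem links the time-domain autocorrelation function to the frequency-domain power spectral density:

$$S_x(f) = \int_{-\infty}^{\infty} R_x(\tau) e^{-j2\pi f\tau} d\tau$$

$$R_x(\tau) = \int_{-\infty}^{\infty} S_x(f) e^{j2\pi f\tau} df$$

That is, the power spectral density is the Fourier transform of the autocorrelation function.

1.2.3 Parseval's Theorem

Parseval's Theorem states that the energy of a signal is the same whether it is computed in the time domain or in the frequency domain:

$$\int_{-\infty}^{\infty} |x(t)|^2 dt = \int_{-\infty}^{\infty} |X(f)|^2 df$$

where $X(f)$ is the Fourier transform of $x(t)$. A periodic signal has finite power rather than finite energy, and the corresponding statement is in terms of average power:

$$\int_{-\infty}^{\infty} S_x(f) df = \frac{1}{T} \int_{-\frac{T}{2}}^{\frac{T}{2}} |x(t)|^2 dt$$

where $T$ is the period of the signal.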

1.3 Relationship Between Time and Phase

Phase and time are related by:

$$\Delta \phi = \omega \Delta t$$

where:

$\Delta \phi$: change in phase
$\omega$: angular frequency
$\Delta t$: time interval

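2. Mathematical Relationship Between Jitter Standard Deviation and Phase Noise

2.1 Definition of Absolute Jitter

For a square-wave signal, Absolute Jitter is the displacement of a signal edge from its ideal time instant.

Consider the $k$-th period. If the phase noise in the time domain is $\phi_n(t)$, the actual edge time $t_k$ satisfies:

$$\omega_0 t_k + \phi_n(t_k) = 2\pi k$$

The absolute jitter is therefore:

$$ \begin{aligned} a_k &= t_k - \frac{2\pi k}{\omega_0} \\ &= -\frac{\phi_n(t_k)}{\omega_0} \end{aligned} $$

2.2 Small-Signal Analysis

We make the following engineering assumptions:

The absolute jitter is small
The phase noise varies slowly around $kT_0$

With $T_0 = \frac{2\pi}{\omega_0}$ the nominal period, we have $t_k = kT_0 + a_k$, and a first-order Taylor expansion of the phase noise gives:

$$ \begin{aligned} \phi_n(t_k) &= \phi_n(kT_0 + a_k) \\ &\approx \phi_n(kT_0) + \frac{d\phi_n(t)}{dt}\bigg|_{t=kT_0} a_k \end{aligned} $$

Substituting this into the expression for the absolute jitter and solving for $a_k$:

$$ a_k = -\frac{\phi_n(kT_0)}{\omega_0 + \frac{d\phi_n(t)}{dt}\bigg|_{t=kT_0}} $$

Under the small-signal approximation, $\left|\frac{d\phi_n(t)}{dt}\right| \ll \omega_0$, so:

$$a_k \approx -\frac{\phi_n(kT_0)}{\omega_0}$$

The minus sign only says that a phase advance makes the edge arrive early. It drops out of the second-order statistics that follow.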
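To see how good the small-signal approximation is, the sketch below solves the edge condition $\omega_0 t_k + \phi_n(t_k) = 2\pi k$ numerically and compares the size of the resulting edge displacement with the first-order estimate $|\phi_n(kT_0)|/\omega_0$. Everything numeric here is a made-up illustration: a 1 GHz carrier and a slow sinusoidal phase disturbance stand in for real phase noise.

```python
import math

f0 = 1.0e9                      # assumed carrier frequency, Hz
w0 = 2.0 * math.pi * f0
T0 = 1.0 / f0
phi_amp = 0.01                  # assumed phase disturbance amplitude, rad
fm = 1.0e6                      # assumed disturbance frequency, Hz


def phi_n(t):
    """Hypothetical phase disturbance: a slow sinusoid."""
    return phi_amp * math.sin(2.0 * math.pi * fm * t)


def edge_time(k, iterations=50):
    """Solve w0*t + phi_n(t) = 2*pi*k by fixed-point iteration."""
    t = k * T0
    for _ in range(iterations):
        t = (2.0 * math.pi * k - phi_n(t)) / w0
    return t


for k in (100, 250, 400):
    exact = abs(edge_time(k) - k * T0)
    approx = abs(phi_n(k * T0)) / w0
    print(f"k={k}: |a_k| exact = {exact:.6e} s, first-order = {approx:.6e} s")
```

The fixed-point iteration converges quickly here because $|d\phi_n/dt|$ is about five orders of magnitude below $\omega_0$, which is exactly the regime the approximation assumes.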

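2.3 Statistical Properties

2.3.1 Autocorrelation Relationship

Treating the phase noise as a wide-sense stationary process, its autocorrelation function is:

$$R_{\phi}(\tau) = E[\phi_n(t)\phi_n(t+\tau)]$$

Correspondingly, the autocorrelation of the jitter sequence is:

$$R_{a}(m) = E[a_k a_{k+m}]$$

Since the jitter is a scaled, sampled version of the phase noise:

$$R_{a}(m) = \frac{1}{\omega_0^2} R_{\phi}(mT_0)$$

2.3.2 Power Spectral Density Relationship

Applying the Wiener-Khinchin theorem (and neglecting the aliasing introduced by sampling at intervals of $T_0$), the jitter PSD and the phase noise PSD are related by:

$$S_a(f) = \frac{1}{\omega_0^2} S_{\phi}(f)$$

2.4 Computing the Jitter Variance

Assuming the jitter is a zero-mean process, its variance (the square of the RMS jitter) is:

$$ \begin{aligned} \sigma_a^2 &= R_a(0) \\ &= \frac{1}{\omega_0^2} R_{\phi}(0) \\ &= \frac{1}{\omega_0^2} \int_{-\infty}^{\infty} S_{\phi}(f) df \end{aligned} $$

This is the central result relating phase noise and jitter:

The variance of the random jitter equals the integral of the phase noise power spectral density divided by the square of the angular frequency.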
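As a worked example of this result, the sketch below integrates a phase noise PSD numerically and converts it to RMS jitter. All numbers are hypothetical: a 1 GHz carrier and a flat two-sided PSD of $10^{-14}$ rad²/Hz within ±10 MHz. A measured PSD could be swapped in for `S_phi`.

```python
import math

f0 = 1.0e9            # assumed carrier frequency, Hz
w0 = 2.0 * math.pi * f0
S0 = 1.0e-14          # assumed flat two-sided phase-noise PSD, rad^2/Hz
B = 10.0e6            # assumed integration bandwidth on each side, Hz


def S_phi(f):
    """Hypothetical two-sided phase-noise PSD: flat inside |f| <= B."""
    return S0 if abs(f) <= B else 0.0


def integrate(func, lo, hi, steps=100_000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


phase_var = integrate(S_phi, -B, B)          # R_phi(0), rad^2
sigma_a = math.sqrt(phase_var) / w0          # RMS jitter, seconds
print(f"RMS phase = {math.sqrt(phase_var):.3e} rad, RMS jitter = {sigma_a:.3e} s")
```

For this flat PSD the integral is simply $2 S_0 B$, so the numerical result is easy to cross-check by hand.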
