This is the second post in my Green Book problem-solving series.

Today's material covers calculus and linear algebra, which I haven't reviewed in a long time, so it was a bit more of a struggle. Let me first organize all the mathematical knowledge points the book touches on.

Math Review

Limits and Derivatives

  • Derivative: Let $ y=f(x) $, then
    $$
    f^{\prime}(x)=\frac{d y}{d x}=\lim _{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x}=\lim _{\Delta x \rightarrow 0} \frac{f(x+\Delta x)-f(x)}{\Delta x}
    $$

  • The product rule: If $ u=u(x) $ and $ v=v(x) $ and their respective derivatives exist,
    $$
    \frac{d(u v)}{d x}=u \frac{d v}{d x}+v \frac{d u}{d x}, \quad(u v)^{\prime}=u^{\prime} v+u v^{\prime}
    $$

  • The quotient rule:
    $$
    \frac{d}{d x}\left(\frac{u}{v}\right)=\frac{v \frac{d u}{d x}-u \frac{d v}{d x}}{v^{2}}, \quad\left(\frac{u}{v}\right)^{\prime}=\frac{u^{\prime} v-u v^{\prime}}{v^{2}}
    $$

  • The chain rule: If $ y=f(u) $ and $ u=u(x) $, then $ \frac{d y}{d x}=\frac{d y}{d u} \frac{d u}{d x} $

  • The generalized power rule: $ \frac{d y^{n}}{d x}=n y^{n-1} \frac{d y}{d x} $ for all $ n \neq 0 $

  • Some useful equations:

    • $$
      a^{x}=e^{x \ln a} \quad \ln (a b)=\ln a+\ln b \quad e^{x}=\lim _{n \rightarrow \infty}\left(1+\frac{x}{n}\right)^{n}
      $$

    • $ \lim _{x \rightarrow 0} \frac{\sin x}{x}=1 $; for any $ k $, $ (1+x)^{k} \approx 1+k x $ as $ x \rightarrow 0 $ (the first-order approximation)

    • $ \lim _{x \rightarrow \infty}\left(\ln x / x^{r}\right)=0 $ for any $ r>0 $

    • $ \lim _{x \rightarrow \infty} x^{r} e^{-x}=0 $ for any $ r $

    • $$
      \frac{d}{d x} e^{u}=e^{u} \frac{d u}{d x} \quad \frac{d a^{u}}{d x}=\left(a^{u} \ln a\right) \frac{d u}{d x} \quad \frac{d}{d x} \ln u=\frac{1}{u} \frac{d u}{d x}=\frac{u^{\prime}}{u}
      $$

    • $$
      \frac{d}{d x} \sin x=\cos x, \frac{d}{d x} \cos x=-\sin x, \frac{d}{d x} \tan x=\sec ^{2} x
      $$

  • Local maximum or minimum: suppose that $ f(x) $ is differentiable at $ c $ and is defined on an open interval containing $ c $. If $ f(c) $ is either a local maximum value or a local minimum value of $ f(x) $, then $ f^{\prime}(c)=0 $.

  • Second derivative test: Suppose the second derivative of $ f(x) $, $ f^{\prime \prime}(x) $, is continuous near $ c $. If $ f^{\prime}(c)=0 $ and $ f^{\prime \prime}(c)>0 $, then $ f(x) $ has a local minimum at $ c $; if $ f^{\prime}(c)=0 $ and $ f^{\prime \prime}(c)<0 $, then $ f(x) $ has a local maximum at $ c $.

  • L’Hospital’s rule: Suppose that functions $ f(x) $ and $ g(x) $ are differentiable near $ a $ and that $ g^{\prime}(x) \neq 0 $ near $ a $ (except possibly at $ a $). Further suppose that $ \lim _{x \rightarrow a} f(x)=0 $ and $ \lim _{x \rightarrow a} g(x)=0 $, or that $ \lim _{x \rightarrow a} f(x) \rightarrow \pm \infty $ and $ \lim _{x \rightarrow a} g(x) \rightarrow \pm \infty $; then $ \lim _{x \rightarrow a} \frac{f(x)}{g(x)}=\lim _{x \rightarrow a} \frac{f^{\prime}(x)}{g^{\prime}(x)} $. L’Hospital’s rule converts the limit from an indeterminate form to a determinate form (a quick numerical check follows this list).
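
As a quick numerical sanity check of L’Hospital’s rule (my own illustration, not from the book), the Python sketch below evaluates the indeterminate form $ \lim_{x \rightarrow 0} \frac{1-\cos x}{x^{2}} $ directly and after one application of the rule; both columns approach the true limit $ 1/2 $.

```python
import math

# Indeterminate form f(x)/g(x) -> 0/0 as x -> 0.
f = lambda x: 1 - math.cos(x)
g = lambda x: x * x

# After one application of L'Hospital's rule: f'(x)/g'(x) = sin(x)/(2x).
fp = lambda x: math.sin(x)
gp = lambda x: 2 * x

for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"x={x:.0e}  f/g={f(x) / g(x):.6f}  f'/g'={fp(x) / gp(x):.6f}")
# Both ratios approach 0.5, as the rule predicts.
```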

Integrals

  • If $ f(x)=F^{\prime}(x)$,
    $$
    \int_{a}^{b} f(x) d x=\int_{a}^{b} F^{\prime}(x) d x=[F(x)]_{a}^{b}=F(b)-F(a)
    $$

    $$
    \frac{d F(x)}{d x}=f(x)
    $$

    $$
    F(a)=y_{a} \Rightarrow F(x)=y_{a}+\int_{a}^{x} f(t) d t
    $$

  • The generalized power rule in reverse:
    $$
    \int u^{k} d u=\frac{u^{k+1}}{k+1}+c \quad(k \neq -1)
    $$

    where $ c $ is any constant.

  • Integration by substitution:
    $$
    \int f(g(x)) \cdot g^{\prime}(x) d x=\int f(u) d u \quad \text { with } u=g(x), \quad d u=g^{\prime}(x) d x
    $$

  • Substitution in definite integrals:
    $$
    \int_{a}^{b} f(g(x)) \cdot g^{\prime}(x) d x=\int_{g(a)}^{g(b)} f(u) d u
    $$

  • Integration by parts: $ \int u d v=u v-\int v d u $ (both this rule and substitution are numerically checked in the sketch below)
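
The substitution and by-parts rules are easy to verify numerically. A minimal sketch (my own example): $ u=x^{2} $ turns $ \int_{0}^{1} 2 x \cos \left(x^{2}\right) d x $ into $ \int_{0}^{1} \cos u\, d u=\sin 1 $, and integration by parts gives $ \int_{0}^{1} x e^{x} d x=\left[x e^{x}\right]_{0}^{1}-\int_{0}^{1} e^{x} d x=1 $.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the definite integral of f on [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(n))

# Substitution u = x^2: int_0^1 2x cos(x^2) dx = int_0^1 cos(u) du = sin(1).
print(midpoint_integral(lambda x: 2 * x * math.cos(x * x), 0.0, 1.0))
print(math.sin(1.0))  # both ~0.841471

# Integration by parts: int_0^1 x e^x dx = [x e^x]_0^1 - int_0^1 e^x dx = 1.
print(midpoint_integral(lambda x: x * math.exp(x), 0.0, 1.0))  # ~1.0
```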

Partial Derivatives and Multiple Integrals

  • Partial derivative:
    $$
    w=f(x, y) \Rightarrow \frac{\partial f}{\partial x}\left(x_{0}, y_{0}\right)=\lim_{\Delta x \rightarrow 0} \frac{f\left(x_{0}+\Delta x, y_{0}\right)-f\left(x_{0}, y_{0}\right)}{\Delta x}=f_{x}
    $$
  • Second-order partial derivatives (the two mixed partials are equal whenever they are continuous):
    $$
    \frac{\partial^{2} f}{\partial x^{2}}=\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial^{2} f}{\partial x \partial y}=\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)=\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)
    $$
  • The general chain rule: Suppose that $ w=f\left(x_{1}, x_{2}, \cdots, x_{m}\right) $ and that each of variables $ x_{1}, x_{2}, \cdots, x_{m} $ is a function of the variables $ t_{1}, t_{2}, \cdots, t_{n} $. If all these functions have continuous first-order partial derivatives, then
    $$
    \frac{\partial w}{\partial t_{i}}=\frac{\partial w}{\partial x_{1}} \frac{\partial x_{1}}{\partial t_{i}}+\frac{\partial w}{\partial x_{2}} \frac{\partial x_{2}}{\partial t_{i}}+\cdots+\frac{\partial w}{\partial x_{m}} \frac{\partial x_{m}}{\partial t_{i}}
    $$
    for each $ i, 1 \leq i \leq n $.
  • Changing Cartesian integrals into polar integrals (see the worked Gaussian-integral example after this list): variables in the two-dimensional plane can be mapped into polar coordinates $ x=r \cos \theta $, $ y=r \sin \theta $; the Jacobian of the transformation contributes the extra factor $ r $. The integral over a continuous polar region $ R $ is converted to
    $$
    \iint_{R} f(x, y) d x d y=\iint_{R} f(r \cos \theta, r \sin \theta) r d r d \theta
    $$
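
A classic application of the polar substitution (the standard derivation, included here as a worked example) is the Gaussian integral: with $ x^{2}+y^{2}=r^{2} $ and $ d x\, d y=r\, d r\, d \theta $,

$$
I^{2}=\left(\int_{-\infty}^{\infty} e^{-x^{2} / 2} d x\right)^{2}=\iint_{\mathbb{R}^{2}} e^{-\left(x^{2}+y^{2}\right) / 2} d x\, d y=\int_{0}^{2 \pi} \int_{0}^{\infty} e^{-r^{2} / 2} r\, d r\, d \theta=2 \pi \Rightarrow I=\sqrt{2 \pi}
$$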

Important Calculus Methods

  • Taylor’s series: One-dimensional Taylor’s series expands function $ f(x) $ as the sum of a series using the derivatives at a point $ x=x_{0} $ :
    $$
    f(x)=f\left(x_{0}\right)+f^{\prime}\left(x_{0}\right)\left(x-x_{0}\right)+\frac{f^{\prime \prime}\left(x_{0}\right)}{2 !}\left(x-x_{0}\right)^{2}+\cdots+\frac{f^{(n)}\left(x_{0}\right)}{n !}\left(x-x_{0}\right)^{n}+\cdots
    $$
    If $ x_{0}=0$,
    $$
    f(x)=f(0)+f^{\prime}(0) x+\frac{f^{\prime \prime}(0)}{2 !} x^{2}+\cdots+\frac{f^{(n)}(0)}{n !} x^{n}+\cdots
    $$
    Taylor’s series are often used to represent functions in power series terms. For example, Taylor’s series for three common transcendental functions, $ e^{x}, \sin x $ and $ \cos x $, at $ x_{0}=0 $ are
    $$
    \begin{array}{l}
    e^{x}=\sum_{n=0}^{\infty} \frac{x^{n}}{n !}=1+\frac{x}{1 !}+\frac{x^{2}}{2 !}+\frac{x^{3}}{3 !}+\cdots, \\
    \sin x=\sum_{n=0}^{\infty} \frac{(-1)^{n} x^{2 n+1}}{(2 n+1) !}=x-\frac{x^{3}}{3 !}+\frac{x^{5}}{5 !}-\frac{x^{7}}{7 !}+\cdots, \\
    \cos x=\sum_{n=0}^{\infty} \frac{(-1)^{n} x^{2 n}}{(2 n) !}=1-\frac{x^{2}}{2 !}+\frac{x^{4}}{4 !}-\frac{x^{6}}{6 !}+\cdots
    \end{array}
    $$
    A Taylor series can also be expressed as the sum of the $ n $th-degree Taylor polynomial
    $$
    T_{n}(x)=f\left(x_{0}\right)+f^{\prime}\left(x_{0}\right)\left(x-x_{0}\right)+\frac{f^{\prime \prime}\left(x_{0}\right)}{2 !}\left(x-x_{0}\right)^{2}+\cdots+\frac{f^{(n)}\left(x_{0}\right)}{n !}\left(x-x_{0}\right)^{n}
    $$
    and a remainder $ R_{n}(x) $:
    $$
    f(x)=T_{n}(x)+R_{n}(x)
    $$
    For some $ \tilde{x} $ between $ x_{0} $ and $ x $, $ R_{n}(x)=\frac{f^{(n+1)}(\tilde{x})}{(n+1) !}\left(x-x_{0}\right)^{n+1} $. Letting $ M $ be the maximum of $ \left|f^{(n+1)}(\tilde{x})\right| $ for all $ \tilde{x} $ between $ x_{0} $ and $ x $, we get the bound $ \left|R_{n}(x)\right| \leq \frac{M\left|x-x_{0}\right|^{n+1}}{(n+1) !} $ (made concrete in the sketch after this list).

  • Newton’s method: Newton’s method, also known as the Newton-Raphson method or the Newton-Fourier method, is an iterative process for solving the equation $ f(x)=0 $. It begins with an initial value $ x_{0} $ and applies the iterative step $ x_{n+1}=x_{n}-\frac{f\left(x_{n}\right)}{f^{\prime}\left(x_{n}\right)} $; if the sequence $ x_{1}, x_{2}, \cdots $ converges, it converges to a root of $ f(x)=0 $.

    Convergence of Newton’s method is not guaranteed, especially when the starting point is far away from the correct solution. For Newton’s method to converge, it is often necessary that the initial point is sufficiently close to the root and that $ f(x) $ is differentiable around the root. When it does converge, the convergence is quadratic, which means $ \left|x_{n+1}-x_{f}\right| \leq \delta\left|x_{n}-x_{f}\right|^{2} $ for some constant $ \delta $, where $ x_{f} $ is the solution to $ f(x)=0 $.

  • Bisection method: an intuitive root-finding algorithm. It starts with two initial values $ a_{0} $ and $ b_{0} $ such that $ f\left(a_{0}\right)<0 $ and $ f\left(b_{0}\right)>0 $. Since $ f(x) $ is continuous, there must be an $ x $ between $ a_{0} $ and $ b_{0} $ such that $ f(x)=0 $. At each step, we check the sign of $ f\left(\left(a_{n}+b_{n}\right) / 2\right) $. If $ f\left(\left(a_{n}+b_{n}\right) / 2\right)<0 $, we set $ a_{n+1}=\left(a_{n}+b_{n}\right) / 2 $ and $ b_{n+1}=b_{n} $; if $ f\left(\left(a_{n}+b_{n}\right) / 2\right)>0 $, we set $ a_{n+1}=a_{n} $ and $ b_{n+1}=\left(a_{n}+b_{n}\right) / 2 $; if $ f\left(\left(a_{n}+b_{n}\right) / 2\right)=0 $, or its absolute value is within the allowable error, the iteration stops and $ x=\left(a_{n}+b_{n}\right) / 2 $. The bisection method converges linearly, $ \frac{\left|x_{n+1}-x_{f}\right|}{\left|x_{n}-x_{f}\right|} \leq \delta<1 $, which means it is slower than Newton’s method. But once a valid $ a_{0} / b_{0} $ pair is found, convergence is guaranteed.

  • Secant method: It starts with two initial values $ x_{0}, x_{1} $ and applies the iterative step
    $$
    x_{n+1}=x_{n}-\frac{x_{n}-x_{n-1}}{f\left(x_{n}\right)-f\left(x_{n-1}\right)} f\left(x_{n}\right)
    $$
    It replaces $ f^{\prime}\left(x_{n}\right) $ in Newton’s method with the linear approximation $ \frac{f\left(x_{n}\right)-f\left(x_{n-1}\right)}{x_{n}-x_{n-1}} $. Compared with Newton’s method, it does not require the derivative $ f^{\prime}\left(x_{n}\right) $, which makes it valuable when $ f^{\prime}(x) $ is difficult to calculate. Its convergence order is $ (1+\sqrt{5}) / 2 \approx 1.618 $ (the golden ratio), which makes it faster than the bisection method but slower than Newton’s method. As with Newton’s method, convergence is not guaranteed if the initial values are not close to the root. (All three root-finding methods are compared in a sketch after this list.)

  • Lagrange multipliers: The method of Lagrange multipliers is a common technique used to find local maxima/minima of a multivariate function subject to one or more constraints (a one-constraint worked example follows this list).

    Let $ f\left(x_{1}, x_{2}, \cdots, x_{n}\right) $ be a function of $ n $ variables $ x=\left(x_{1}, x_{2}, \cdots, x_{n}\right) $ with gradient vector $ \nabla f(x)=\left\langle\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}, \cdots, \frac{\partial f}{\partial x_{n}}\right\rangle $. The necessary condition for maximizing or minimizing $ f(x) $ subject to a set of $ k $ constraints
    $$
    g_{1}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0, \quad g_{2}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0, \cdots, \quad g_{k}\left(x_{1}, x_{2}, \cdots, x_{n}\right)=0
    $$
    is that $ \nabla f(x)+\lambda_{1} \nabla g_{1}(x)+\lambda_{2} \nabla g_{2}(x)+\cdots+\lambda_{k} \nabla g_{k}(x)=0 $,where $ \lambda_{1}, \cdots, \lambda_{k} $ are called the Lagrange multipliers.
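
To make the Taylor remainder bound concrete, here is a small Python sketch (my own illustration): it sums the degree-$ n $ Taylor polynomial of $ e^{x} $ at $ x_{0}=0 $ and compares the actual error with the bound $ M\left|x-x_{0}\right|^{n+1} /(n+1) ! $, taking $ M=e^{x} $ since every derivative of $ e^{x} $ is $ e^{x} $.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x around x0 = 0."""
    return math.fsum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
M = math.exp(x)  # max of |f^(n+1)| on [0, x], since (e^x)' = e^x
for n in range(1, 8):
    actual = abs(math.exp(x) - taylor_exp(x, n))
    bound = M * abs(x) ** (n + 1) / math.factorial(n + 1)
    print(f"n={n}  error={actual:.2e}  bound={bound:.2e}")
# The actual error always sits below the Lagrange remainder bound.
```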
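
The three root-finding methods are easy to compare side by side. A minimal sketch (my own code, solving $ x^{2}-2=0 $ for $ \sqrt{2} $):

```python
def f(x):  return x * x - 2.0   # root: sqrt(2) = 1.41421356...
def fp(x): return 2.0 * x       # analytic derivative, used only by Newton

def newton(x, steps=6):
    for _ in range(steps):      # x_{n+1} = x_n - f(x_n) / f'(x_n)
        x = x - f(x) / fp(x)
    return x

def bisection(a, b, steps=50):
    for _ in range(steps):      # keep the sign change inside [a, b]
        m = (a + b) / 2.0
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2.0

def secant(x0, x1, steps=8):
    for _ in range(steps):      # replace f' with a finite-difference slope
        if f(x1) == f(x0):      # already converged to machine precision
            break
        x0, x1 = x1, x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
    return x1

print(newton(2.0))          # quadratic convergence: ~6 steps suffice
print(bisection(1.0, 2.0))  # linear convergence, but guaranteed
print(secant(1.0, 2.0))     # convergence order (1 + sqrt(5)) / 2
```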
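
As a one-constraint worked example of Lagrange multipliers (a standard textbook case, not from the book): maximize $ f(x, y)=x y $ subject to $ g(x, y)=x+y-1=0 $. The condition $ \nabla f+\lambda \nabla g=0 $ gives

$$
\langle y, x\rangle+\lambda\langle 1,1\rangle=0 \Rightarrow x=y=-\lambda
$$

Combined with the constraint $ x+y=1 $, this yields $ x=y=1 / 2 $ and the maximum $ f=1 / 4 $.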

Ordinary Differential Equations

  • Separable differential equations: A separable differential equation has the form $ \frac{d y}{d x}=g(x) h(y) $. Since it is separable, we can express the original equation as $ \frac{d y}{h(y)}=g(x) d x $. Integrating both sides, we have the solution $ \int \frac{d y}{h(y)}=\int g(x) d x $.

  • First-order linear differential equations: A first-order linear differential equation has the form $ \frac{d y}{d x}+P(x) y=Q(x) $. The standard approach to solving it is to identify a suitable function $ I(x) $, called an integrating factor, such that
    $$
    I(x)\left(y^{\prime}+P(x) y\right)=I(x) y^{\prime}+I(x) P(x) y=\left(I(x) y\right)^{\prime}
    $$
    Matching terms requires $ I^{\prime}(x)=I(x) P(x) $, i.e. $ I(x)=e^{\int P(x) d x} $; multiplying both sides of the equation by $ I(x) $ and integrating then gives $ y=\frac{1}{I(x)}\left(\int I(x) Q(x) d x+C\right) $. A numerical cross-check is sketched below.
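
To check the integrating-factor recipe on a concrete equation (my own example): for $ y^{\prime}+y=x $ with $ y(0)=1 $, we have $ I(x)=e^{x} $ and the closed-form solution $ y=x-1+2 e^{-x} $. The sketch below cross-checks it against a fixed-step Runge-Kutta solve.

```python
import math

# y' + y = x with y(0) = 1.  Integrating factor I(x) = e^x gives
# (e^x y)' = x e^x  =>  y(x) = x - 1 + 2 e^{-x}.
exact = lambda x: x - 1.0 + 2.0 * math.exp(-x)
dydx = lambda x, y: x - y  # rewrite as y' = x - y

# Crude fixed-step RK4 integration from x = 0 to x = 2.
x, y, h = 0.0, 1.0, 0.01
while x < 2.0 - 1e-12:
    k1 = dydx(x, y)
    k2 = dydx(x + h / 2, y + h * k1 / 2)
    k3 = dydx(x + h / 2, y + h * k2 / 2)
    k4 = dydx(x + h, y + h * k3)
    y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    x += h
print(y, exact(2.0))  # both ~1.2706706
```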