机器学习——多项式拟合

大家好，欢迎来到IT知识分享网。

文章目录

开发环境jupyter notebook

一、多项式拟合

引入：通过这个示例，从最简单的机器学习的算法入手实现复现流程

假设我们有两个实值变量 $x, t$ ，满足关系： $t = s i n (2 π x) + ϵ$ ，其中 ϵ 是一个服从高斯分布的随机值。
假设我们有 $N$ 组 $(x, t)$ 的观测值 $x≡(x_1,…,x_N)^T,t≡(t_1,…,t_N)^T$ ：

import numpy as np import scipy as sp import matplotlib.pyplot as plt %matplotlib inline N = 10 # 设置 N x_tr = np.linspace(0, 1, N) # 生成 0，1 之间等距的 N 个 数 t_tr = np.sin(2 * np.pi * x_tr) + 0.25 * np.random.randn(N) # 计算 t,引入随机噪声 # 绘图 xx = np.linspace(0, 1, 500) fig, ax = plt.subplots() ax.plot(x_tr, t_tr, 'co') ax.plot(xx, np.sin(2 * np.pi * xx), 'g') ax.set_xlim(-0.1, 1.1) ax.set_ylim(-1.5, 1.5) ax.set_xticks([0, 1]) ax.set_yticks([-1, 0, 1]) ax.set_xlabel("$x$", fontsize="x-large") ax.set_ylabel("$t$", fontsize="x-large") plt.show()

使用这 $N$ 个数据点作为训练集，我们希望得到这个一个模型：给定一个新的输入 $\hat{x}$ ，预测他对应的输出 $\hat{t}$ 。

1.1 多项式函数拟合

我们使用曲线拟合的方法来拟合这样一个多项式函数：
$y(\mathbf{x},\mathbf{w})=w_0+w_1x+w_0+w_2x^2 +\cdots +w_Mx^M = \sum_{j=0}^{M}w_jx^j \tag1$

其中 $M$ 是多项式的阶数， $x^j$ 表示 $x$ 的 $j$ 次方， $w≡(w_0,w_1,…,w_M)$ 表示多项式的系数。

这些多项式的系数可以通过我们的数据拟合得到，即在训练集上最小化一个关于 $y (x, w)$ 和 $t$ 的损失函数。常见的一个损失函数是平方误差和，定义为：
$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left \{ y(x,\mathbf{w})-t_n \right \}^2 \tag2$

因子 $\frac{1}{2}$ 是为了之后的计算方便加上的。这个损失函数是非负的，当且仅当函数 $y(x,\mathbf{w})$ 通过所有的数据点时才会为零。
对于这个损失函数，因为它是一个关于 $\mathbf{w}$ 的二次函数，其关于 $\mathbf{w}$ 的梯度是一个关于 $w$ 的线性函数，因此我们可以找到一个唯一解 $\mathbf{w}^⋆$ 。
$\frac{\partial E(\mathbf{w})}{w_{j}} = \sum_{n=1}^{N}\left ( \sum_{j=0}^{M}w_jx_n^j – t_n \right )x_n^j = 0 \tag3$

import numpy as np import scipy as sp import matplotlib.pyplot as plt %matplotlib inline N = 10 # 设置 N x_tr = np.linspace(0, 1, N) # 生成 0，1 之间等距的 N 个 数 t_tr = np.sin(2 * np.pi * x_tr) + 0.25 * np.random.randn(N) # 计算 t,引入随机噪声 fig, axes = plt.subplots(2, 2, figsize=(12, 8)) axes = axes.flatten() Ms = [0, 1, 3, 9] # 拟合参数是多项式的阶数 M，我们可以看看当 M=0,1,3,9 时的效果 for ax, M in zip(axes, Ms): coeff = np.polyfit(x_tr, t_tr, M) # 计算参数 f = np.poly1d(coeff) # 生成函数 y(x, w) # 绘图 xx = np.linspace(0, 1, 500) ax.plot(x_tr, t_tr, 'co') ax.plot(xx, np.sin(2 * np.pi * xx), 'g') ax.plot(xx, f(xx), 'r') ax.set_xlim(-0.1, 1.1) ax.set_ylim(-1.5, 1.5) ax.set_xticks([0, 1]) ax.set_yticks([-1, 0, 1]) ax.set_xlabel("$x$",fontsize="x-large") ax.set_ylabel("$t$",fontsize="x-large") ax.text(0.6, 1, '$M={}$'.format(M), fontsize="x-large") plt.show()

可以看到，M=3 似乎是其中较好的一个选择，M=9 虽然拟合的效果最好（通过了所有的训练数据点），但很明显过拟合了。

检测模型的效果，用一组与训练集相同分布的数据进行测试，然后计算不同模型选择下，训练集和测试集上的 $E(\mathbf{w}^⋆)$ 值。

注意到随着测试点数目的变化， $E(\mathbf{w}^⋆)$ 的尺度也在不断变化，因此一个更好的选择是使用 root-mean-square (RMS) 误差： $E_{RMS}=\sqrt{2E(\mathbf{w}^sstar)/N}$
RMS 误差的衡量尺度和单位与目标值 t 一致。

我们用相同的方法产生100组数据作为测试集，计算不同 M 下的 RMS 误差：

import numpy as np import scipy as sp import matplotlib.pyplot as plt %matplotlib inline N = 10 # 设置 N x_tr = np.linspace(0, 1, N) # 生成 0，1 之间等距的 N 个 数 t_tr = np.sin(2 * np.pi * x_tr) + 0.25 * np.random.randn(N) # 计算 t,引入随机噪声 x_te = np.random.rand(100) t_te = np.sin(2 * np.pi * x_te) + 0.25 * np.random.randn(100) rms_tr, rms_te = [], [] for M in range(10): coeff = np.polyfit(x_tr, t_tr, M) # 计算参数 f = np.poly1d(coeff) # 生成函数 y(x, w) # RMS rms_tr.append(np.sqrt(((f(x_tr) - t_tr)  2).sum() / x_tr.shape[0])) rms_te.append(np.sqrt(((f(x_te) - t_te)  2).sum() / x_te.shape[0])) # 画图 fig, ax = plt.subplots() ax.plot(range(10), rms_tr, 'bo-', range(10), rms_te, 'ro-') ax.set_xlim(-1, 10) ax.set_ylim(0, 1) ax.set_xticks(range(0, 10, 3)) ax.set_yticks([0, 0.5, 1]) ax.set_xlabel("$M$",fontsize="x-large") ax.set_ylabel("$E_{RMS}$",fontsize="x-large") ax.legend(['Traning', 'Test'], loc="best") plt.show()

可以看到 M=9 时，虽然训练集上的误差已经降到 0，但是测试集上的误差却很大，为了更好的拟合这些点，系数都会变得很大：

for i, w in enumerate(np.polyfit(x_tr, t_tr, 9)): print "w_{}, {:.2f}".format(9 - i, w) ''' w_9, -69435.47 w_8, .34 w_7, -.51 w_6, .39 w_5, -.96 w_4, .72 w_3, -34984.60 w_2, 3903.04 w_1, -158.55 w_0, -0.15 '''

1.2 过拟合与正则化

过拟合问题：当训练数据量 N 变多时，模型拟合的过拟合现象在减少。
当模型的复杂度固定时，随着数据的增加，过拟合的现象也在逐渐减少。
如下：M=9 的模型的表现：

import numpy as np import scipy as sp import matplotlib.pyplot as plt %matplotlib inline fig, axes = plt.subplots(1, 2, figsize=(12, 4)) axes = axes.flatten() M = 9 for ax, N in zip(axes, (15, 150)): x_tr_more = np.linspace(0, 1, N) # 生成 0，1 之间等距的N个数 t_tr_more = np.sin(2 * np.pi * x_tr_more) + 0.25 * np.random.randn(N) # 计算 t coeff = np.polyfit(x_tr_more, t_tr_more, M) # 计算参数 f = np.poly1d(coeff) # 生成函数 y(x, w) # 绘图 xx = np.linspace(0, 1, 500) ax.plot(x_tr_more, t_tr_more, 'co') ax.plot(xx, np.sin(2 * np.pi * xx), 'g') ax.plot(xx, f(xx), 'r') ax.set_xlim(-0.1, 1.1) ax.set_ylim(-1.5, 1.5) ax.set_xticks([0, 1]) ax.set_yticks([-1, 0, 1]) ax.set_xlabel("$x$", fontsize="x-large") ax.set_ylabel("$t$", fontsize="x-large") ax.text(0.6, 1, '$N={}$'.format(N), fontsize="x-large") plt.show()

正则化：如果我们一定要在 N=10 的数据上使用 M=9 的模型，那么一个通常的做法是给参数加一个正则项的约束防止过拟合，一个最常用的正则项是平方正则项，即控制所有参数的平方和大小：
$\hat{E(\mathbf{w})} = \frac{1}{2}\sum_{i=1}^{N}\left \{ y(x_n,\mathbf{w})-t_n \right \}^2+\frac{\lambda }{2}||\mathbf{w}||^2 \tag4$

其中 $∥\mathbf{w}∥^2≡w\mathbf{w}^⊤\mathbf{w}=w_0^2+…|w_M^2，λ$ 是控制正则项和误差项的相对重要性。
若设向量 $ϕ (x)$ 满足 $ϕ_i(x)=x^i,i=0,1,…,M$ ，则对 $\mathbf{w}$ 最小化，解应当满足：
$\left [ \sum_{n=1}^{N}\phi (x_n)\phi (x_n)……T+\lambda I \right ]\mathbf{w} = \sum_{n=1}^{N}t_n\phi (x_n) \tag5$

import numpy as np import scipy as sp import matplotlib.pyplot as plt %matplotlib inline def phi(x, M): return x[:,None]  np.arange(M + 1) N = 10 # 设置 N x_tr = np.linspace(0, 1, N) # 生成 0，1 之间等距的 N 个 数 t_tr = np.sin(2 * np.pi * x_tr) + 0.25 * np.random.randn(N) # 计算 t,引入随机噪声 # 加正则项的解 M = 9 lam = 0.0001 phi_x_tr = phi(x_tr, M) S_0 = phi_x_tr.T.dot(phi_x_tr) + lam * np.eye(M+1) y_0 = t_tr.dot(phi_x_tr) coeff = np.linalg.solve(S_0, y_0)[::-1] f = np.poly1d(coeff) # 生成函数 y(x, w) xx = np.linspace(0, 1, 500) fig, ax = plt.subplots() ax.plot(x_tr, t_tr, 'co') ax.plot(xx, np.sin(2 * np.pi * xx), 'g') ax.plot(xx, f(xx), 'r') ax.set_xlim(-0.1, 1.1) ax.set_ylim(-1.5, 1.5) ax.set_xticks([0, 1]) ax.set_yticks([-1, 0, 1]) ax.set_xlabel("$x$", fontsize="x-large") ax.set_ylabel("$t$", fontsize="x-large") plt.show()

1.3 维数灾难

在上述多项式拟合问题中只考虑了 $x$ 是一维变量的结果，但事实上 $x$ 的维度可能远远不止一维，考虑一个 $D$ 维的输入 $\bf x$ 并且选择阶数 $\bf M=3$ ，那么我们有
$y(\mathbf{x},\mathbf{w})=w_0+\sum_{i=1}^{D}w_ix_i+\sum_{i=1}^{D}\sum_{j=1}^{D}w_{ij}x_ix_j+\sum_{i=1}^{D}\sum_{j=1}^{D}\sum_{k=1}^{D}w_{ij}x_ix_jx_k \tag6$

可以看到，随着 $D$ 的增大，独立参数的个数也增大到 $D^3$ 的量级。对于 $M$ 阶的多项式，系数的量级将变成 $D^M$ 。
也就是说，随着 $D$ 的增大，独立参数的数目呈幂数增长。

二、基于概率的曲线拟合

2.1 高斯分布

高斯分布（Gaussian distribution），又叫正态分布（normal distribution）。

对于实值变量 $x$ ，高斯分布定义为：
$\mathcal{N}\left(x\left|~\mu,\sigma^2\right.\right) = \frac{1}{\sqrt{(2\pi\sigma^2)}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} \tag7$

参数为均值 $μ$ 和方差 $σ^2$ 。方差的平方根 $σ$ 叫做标准差，方差的倒数 $β=\frac{1}{σ^2}$ 叫做精度。

import numpy as np import scipy as sp import matplotlib.pyplot as plt from scipy.stats import norm %matplotlib inline xx = np.linspace(-3, 3, 200) norm_xx = norm.pdf(xx) fig, ax = plt.subplots() ax.plot(xx, norm_xx, "r") ax.set_ylim(0, 0.5) ax.set_ylabel(r"$\mathcal{N}\left(x|\mu,\sigma^2\right)$", fontsize="xx-large") ax.set_yticks([]) ax.set_yticklabels([]) ax.set_xticks([0]) ax.set_xticklabels([r"$\mu$"], fontsize="xx-large") ax.text(-.1, 0.25, "$2\sigma$", fontsize="xx-large") ax.annotate("", xy=(-1, 0.24), xycoords='data', xytext=(1, 0.24), textcoords='data', arrowprops=dict(arrowstyle="<->", connectionstyle="arc3"), ) plt.show()

多维高斯分布
对于 D 维的向量 x，高斯分布定义为：
$\mathcal{N}\left(\mathbf x\left|~\mathbf{\mu, \Sigma}\right.\right) = \frac{1}{\sqrt[D]{(2\pi)}} \frac{1}{\sqrt{|\mathbf\Sigma|}} \exp \left\{-\frac{1}{2}(\mathbf x – \mathbf \mu)^\top\mathbf\Sigma^{-1}(\mathbf x – \mathbf \mu)\right\} \tag8$

其中， $D$ 维向量 $μ$ 是均值， $D \times D$ 矩阵 $\mathbf{Σ}$ 是方差， $|\mathbf{Σ}|$ 是其行列式。

最大似然估计

假设我们现在有 $N$ 组对 $x$ 的观测数据 $\mathsf x = (x_1,\dots,x_N)^{\text T}$ ，这些数据是独立同分布（independent and identically distributed, i.i.d.）的，都服从一个均值 $μ$ ，方差 $σ^2$ 的高斯分布。那么在给定这些参数的情况下，出现这些观测数据的概率，或者从参数的角度来说，似然函数为：详细求解流程
$p(\mathsf x~|~\mu, \sigma^2)=\prod_{n=1}^N \mathcal N\left(x_n \left|~\mu, \sigma^2\right.\right) \tag9$

符合高斯分布最大似然解：

对 $μ$ 最大化（即样本均值） $\mu = \frac 1 N \sum_{n=1}^N x_n$
对 $σ^2$ 最大化（即样本方差）： $\sigma^2 = \frac 1 N \sum_{n=1}^N (x_n-\mu)^2$

2.2 重新理解曲线拟合

对于曲线拟合的问题，设训练集输入为 $\mathsf x=(x_1, \dots, x_N)^\top$ ，对应的目标值为 $\mathsf t=(t_1, \dots, t_N)^\top$ 。

我们将我们的不确定性用高斯分布来表示，假设给定 $x$ ，对应的目标值 $t$ 服从一个均值为 $y(x,\mathbf w)$ 的高斯分布：
$p(t\left|~x,\mathbf w,\beta\right.)=\mathcal N\left(t\left|~y(x,\mathbf w), \beta^{-1}\right.\right) \tag{11}$

import numpy as np import scipy as sp import matplotlib.pyplot as plt from scipy.stats import norm %matplotlib inline xx = np.linspace(-0.9, 0.9, 100) yy = 4 * xx - np.sin(xx * np.pi) fig, ax = plt.subplots() ax.plot(xx, yy, color="red") ax.set_xlim(-1, 1) ax.set_ylim(-4, 4) ax.set_xticks([0]) ax.set_xticklabels([r'$x_0$'], fontsize="xx-large") ax.set_yticks([0]) ax.set_yticklabels([r'$y(x_0, \mathbf{w})$'], fontsize="xx-large") xx = np.linspace(-4, 4, 100) yy = norm.pdf(xx, scale=0.5) / 5 ax.plot([-1, 0], [0, 0], "g--") ax.plot([0, 0], [-4, 4], "k") ax.plot(yy, xx) ax.annotate("", xy=(0.75, -0.5), xycoords='data', xytext=(0.75, 0.5), textcoords='data', arrowprops=dict(arrowstyle="<->", connectionstyle="arc3"), ) ax.text(0.77, -0.2, r'$2\sigma$', fontsize="xx-large") ax.text(0.15, -1, r'$p(t|x_0,\mathbf{w}, \beta)$', fontsize="xx-large") ax.text(0.5, 3, r'$y(x, \mathbf{w})$', fontsize="xx-large") plt.show()

最大似然

设系数的最大似然解为 $\mathbf{w}$ ，从最大化对数似然的角度来看，求它的问题相当于最小化：
$\frac{1}{2} \sum_{n=1}^N \{y(x,\mathbf w) – t_n\}^2 \tag{14}$

我们有了最大似然的结果之后，对于一个新的输入 $x$ ，其输出 $t$ 应当满足：
$p\left(t\left|~x,\mathbf w, \beta\right.\right) = \mathcal N\left(t\left|~y(x,\mathbf w), \beta^{-1}\right.\right) \tag{16}$

最大化后验概率（maximum posterior, MAP）

假设我们对系数 $\mathbf w$ 有一个先验的知识（ $M$ 是多项式阶数，加上常数项一共 $M + 1$ 维）：
$p(\mathbf w~|~\alpha) = \mathcal{N}(\mathbf w~|~\mathbf 0, \alpha^{-1} I) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left\{-\frac{\alpha}{2}\mathbf{w}^\top\mathbf{w}\right\} \tag{17}$
$α$ 控制这个模型的先验分布，这一类的参数通常被叫做超参（hyperparameters），Bayes 公式告诉我们，后验概率正比与先验概率和似然函数的乘积：
$p(\mathbf w~|~\mathsf{x, t}, \alpha, \beta) \propto p(\mathsf t~|~\mathsf x, \mathbf w, \beta)~p(\mathbf w~|~\alpha) \tag{18}$

我们可以通过最大化后验概率（maximum posterior, MAP）来决定参数 $\mathbf w$ 的值，对上式求对数，并去掉跟 $\mathbf w$ 无关的项，我们相当于要最大化：
$-\frac{\beta}{2} \sum_{n=1}^N \{y(x,\mathbf w) – t_n\}^2 -\frac{\alpha}{2}\mathbf{w}^\top\mathbf{w} \tag{19}$
即最小化
$\frac{\beta}{2} \sum_{n=1}^N \{y(x,\mathbf w) – t_n\}^2 +\frac{\alpha}{2}\mathbf{w}^\top\mathbf{w} \tag{20}$

因此，MAP 的结果相当于给多项式拟合加二范数正则化的结果，其中正则参数 $λ = α / β$ 。

Bayes 曲线拟合

虽然在 MAP 中，我们引入了先验分布，但是本质上它还是一个点估计，本质上并不是一个完全的 Bayes 估计。

一个完全的 Bayes 估计要求我们对 $\mathbf w$ 的所有值进行积分。

对于之前的曲线拟合问题，给定训练集 $x$ 和 $t$ ，对于一个新的测试样例 $x$ ，其目标值为 $t$ ，我们考虑预测的分布 $p (t ∣ x, x, t)$ （这里我们假定 $β, α$ 两个参数已经给定了）

其中 $p(t~|~x,\mathbf w)$ 是之前给定的高斯分布， $p(w~|~\mathsf{x,t})$ 是训练集上的后验概率（也是一个高斯分布）。
由于高斯分布的性质，上面的式子本质上也是一个高斯分布，因此可以写成:
$p(t~|~x,\mathsf{x, t})=\mathcal{N}\left(t~|~m(x), s^2(x)\right) \tag{22}$

其中， $\phi_i(x) = x^i, i = 0, \dots, M$ ，矩阵 S：
$\mathbf S^{-1}=\alpha I +\beta \sum_{n=1}^N \phi(x_n)^\top\phi(x_n) \tag{24}$

import numpy as np import scipy as sp import matplotlib.pyplot as plt from scipy.stats import norm %matplotlib inline def phi(x, M): return x[:,None]  np.arange(M + 1) N = 10 x_tr = np.linspace(0, 1, N) # 生成 0，1 之间等距的 N 个 数 t_tr = np.sin(2 * np.pi * x_tr) + 0.25 * np.random.randn(N) # 计算 t # 加正则项的解 M = 9 alpha = 5e-3 beta = 11.1 lam = alpha / beta phi_x_tr = phi(x_tr, M) A_0 = phi_x_tr.T.dot(phi_x_tr) + lam * np.eye(M+1) y_0 = t_tr.dot(phi_x_tr) # 求解 Aw=y coeff = np.linalg.solve(A_0, y_0)[::-1] f = np.poly1d(coeff) # 绘图 xx = np.linspace(0, 1, 500) # Bayes估计的均值和标准差 S = np.linalg.inv(A_0 * beta) m_xx = beta * phi(xx, M).dot(S).dot(y_0) s_xx = np.sqrt(1 / beta + phi(xx, M).dot(S).dot(phi(xx, M).T).diagonal()) fig, ax = plt.subplots() ax.plot(x_tr, t_tr, 'co') ax.plot(xx, np.sin(2 * np.pi * xx), 'g') ax.plot(xx, f(xx), 'r') ax.fill_between(xx, m_xx-s_xx, m_xx+s_xx, color="pink") ax.set_xlim(-0.1, 1.1) ax.set_ylim(-1.5, 1.5) ax.set_xticks([0, 1]) ax.set_yticks([-1, 0, 1]) ax.set_xlabel("$x$", fontsize="x-large") ax.set_ylabel("$t$", fontsize="x-large") plt.show()

免责声明：本站所有文章内容,图片，视频等均是来源于用户投稿和互联网及文摘转载整编而成，不代表本站观点，不承担相关法律责任。其著作权各归其原作者或其出版社所有。如发现本站有涉嫌抄袭侵权/违法违规的内容,侵犯到您的权益，请在线联系站长,一经查实,本站将立刻删除。本文来自网络,若有侵权，请联系删除，如若转载，请注明出处：https://haidsoft.com/131910.html