int-ml
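Download: https://github.com/daiwk/int-collections/blob/master/int-ml.pdf
These notes draw on Li Hang's 《统计学习方法》 (Statistical Learning Methods), Zhou Zhihua's 《机器学习》 (Machine Learning), Hulu's 《百面机器学习》, and other sources.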
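Overview
The three elements of statistical learning
Model
In supervised learning, the model is the conditional probability distribution or decision function to be learned.
Hypothesis space of the model
The hypothesis space is the set of all possible conditional probability distributions or decision functions.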
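Definition 1
The hypothesis space can be defined as a set of decision functions:
\mathcal{F}=\{f \mid Y=f(X)\}
X and Y are variables defined on the input space \mathcal{X} and the output space \mathcal{Y}.
\mathcal{F} is a family of functions determined by a parameter vector:
\mathcal{F}=\{f \mid Y=f_{\theta}(X),\ \theta\in\mathbf{R}^n\}
The parameter vector \theta takes values in the n-dimensional Euclidean space \mathbf{R}^n, which is called the parameter space.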
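Definition 2
The hypothesis space can also be defined as a set of conditional probability distributions:
\mathcal{F}=\{P \mid P(Y|X)\}
X and Y are random variables defined on the input space \mathcal{X} and the output space \mathcal{Y}.
\mathcal{F} is a family of conditional probability distributions determined by a parameter vector:
\mathcal{F}=\{P \mid P_{\theta}(Y|X),\ \theta\in\mathbf{R}^n\}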
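Strategy
Loss function and risk function
The loss function (or cost function) measures how far the prediction f(X) is from the true value Y. It is written L(Y,f(X)) and is a non-negative real-valued function. The smaller the loss, the better the model.
0-1 loss function:
L(Y,f(X))=\begin{cases} 1, & Y\ne f(X) \\ 0, & Y=f(X) \end{cases}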
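As a small illustration of the 0-1 loss, the sketch below scores a handful of predictions and averages the result. The helper names (`zero_one_loss`, `average_loss`) and the toy labels are assumptions made for this example, not part of the original notes.

```python
def zero_one_loss(y_true, y_pred):
    """Return 1 when the prediction is wrong, 0 when it is right."""
    return 0 if y_true == y_pred else 1


def average_loss(loss, ys, preds):
    """Mean of loss(y, f(x)) over a finite sample."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)


# Toy data (hypothetical): two of the four predictions are wrong.
labels = [1, 0, 1, 1]
predictions = [1, 1, 1, 0]
error_rate = average_loss(zero_one_loss, labels, predictions)
```

Averaged over a sample, the 0-1 loss is simply the misclassification rate.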
R_{exp}(f)=E_P[L(Y,f(X))]=\int _{\mathcal{X}\times \mathcal{Y}}L(y,f(x))P(x,y)dxdy
\min_{f\in\mathcal{F}}\frac{1}{N}\sum ^N_{i=1}L(y_i,f(x_i))+\lambda J(f)
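The regularized objective above can be sketched for a one-dimensional linear model. This is only an illustration under stated assumptions: the squared loss, the penalty J(f) = w^2, and the function name `structural_risk` are choices made for the example, not something the notes prescribe.

```python
def structural_risk(w, b, xs, ys, lam):
    """Mean squared loss of f(x) = w*x + b plus lam * J(f), with J(f) = w**2."""
    n = len(xs)
    empirical = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return empirical + lam * w ** 2


# Toy data (hypothetical) lying exactly on the line y = x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
perfect_fit = structural_risk(1.0, 0.0, xs, ys, lam=0.1)  # zero loss, non-zero penalty
flat_model = structural_risk(0.0, 1.0, xs, ys, lam=0.1)   # zero penalty, non-zero loss
```

The parameter `lam` plays the role of \lambda: it trades the fit on the sample against the complexity term.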
\begin{aligned} LL(\theta)&=\log p(X|\theta) \\ &= \log \prod ^n_{i=1}p(x_i|\theta) \\ &= \sum ^n_{i=1}\log p(x_i|\theta) \end{aligned}
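A minimal sketch of the log-likelihood as a sum of per-sample log-probabilities, assuming Bernoulli data (x_i in {0, 1}) and a simple grid search over \theta. Both the data model and the grid search are assumptions chosen to keep the example short.

```python
import math


def log_likelihood(theta, xs):
    """LL(theta) = sum_i log p(x_i | theta) for Bernoulli samples x_i in {0, 1}."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs)


samples = [1, 1, 0, 1, 0, 1, 1, 0]          # hypothetical data: 5 ones out of 8
grid = [i / 1000 for i in range(1, 1000)]   # candidate theta values in (0, 1)
theta_mle = max(grid, key=lambda t: log_likelihood(t, samples))
```

For this data model the maximizer coincides with the sample mean, 5/8.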
\begin{aligned} f(x)&= \arg\max _{\theta}p(\theta|X) \\ &= \arg\max _{\theta}\frac{p(X|\theta)p(\theta)}{p(X)} \\ &= \arg\max _{\theta} p(X|\theta)p(\theta) \end{aligned}
\begin{aligned} f(x)&= \arg\max _{\theta} \log [p(X|\theta)p(\theta)] \\ &= \arg\max _{\theta} \left\{\sum ^n_{i=1}\log p(x_i|\theta)+\log p(\theta)\right\} \end{aligned}
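The maximum a posteriori objective, log-likelihood plus log-prior, can be sketched the same way. The example assumes Bernoulli data and a Beta(a, b) prior; these choices and the name `log_posterior` are illustrative assumptions only.

```python
import math


def log_posterior(theta, xs, a=2.0, b=2.0):
    """Unnormalized log posterior: sum_i log p(x_i|theta) + log p(theta).

    Assumes Bernoulli data and a Beta(a, b) prior. The prior's normalizing
    constant is dropped because it does not depend on theta.
    """
    log_lik = sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs)
    log_prior = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    return log_lik + log_prior


samples = [1, 1, 0, 1, 0, 1, 1, 0]          # hypothetical data: 5 ones out of 8
grid = [i / 1000 for i in range(1, 1000)]   # candidate theta values in (0, 1)
theta_map = max(grid, key=lambda t: log_posterior(t, samples))
```

With a Beta(2, 2) prior the estimate is pulled from the sample mean 0.625 toward 0.5, landing at (5 + 1) / (8 + 2) = 0.6.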
r_{ti} = -\bigg[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\bigg]_{f(x) = f_{t-1}(x)}
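The negative gradient r_{ti} can be evaluated numerically for any differentiable loss. The sketch below uses a central difference and, as an assumed example, the loss L(y, f) = (y - f)^2 / 2, for which the negative gradient reduces to the ordinary residual y - f. The prediction values are hypothetical.

```python
def negative_gradient(loss, y, f_value, eps=1e-6):
    """Central-difference estimate of -dL(y, f)/df evaluated at f = f_value."""
    return -(loss(y, f_value + eps) - loss(y, f_value - eps)) / (2 * eps)


def half_squared_loss(y, f_value):
    """L(y, f) = (y - f)^2 / 2, chosen so that -dL/df = y - f."""
    return 0.5 * (y - f_value) ** 2


ys = [3.0, -1.0, 2.0]
previous_predictions = [2.5, 0.0, 2.0]   # f_{t-1}(x_i), hypothetical values
residuals = [negative_gradient(half_squared_loss, y, f)
             for y, f in zip(ys, previous_predictions)]
```

In a boosting round these values would serve as the regression targets for the next base learner.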
\begin{matrix} df(x,y)=dc \\ \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy =0 \end{matrix}
\frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy = \left\{\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right\}\cdot \{dx,dy\}=0
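The statement that the gradient is orthogonal to any displacement (dx, dy) along a contour can be checked on a concrete function. The choice f(x, y) = x^2 + y^2, whose contours are circles, is an assumption made for this example.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x**2 + y**2."""
    return (2.0 * x, 2.0 * y)


def contour_tangent(x, y):
    """A direction (dx, dy) along the circle x**2 + y**2 = c through (x, y)."""
    return (-y, x)


gx, gy = gradient(1.0, 2.0)
dx, dy = contour_tangent(1.0, 2.0)
dot = gx * dx + gy * dy   # inner product of the gradient and the tangent
```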
\min_{x}f(x)
\begin{matrix} \text{s.t.} & h_i(x)=0,\ i=1,...,m \\ & g_j(x)\le 0,\ j=1,...,n \end{matrix}
\left\{\begin{matrix} g_j(x)\le 0\\ \mu_j\ge 0\\ \mu_jg_j(x)=0 \end{matrix}\right.
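To make these conditions concrete, the sketch below checks them on a toy problem: minimize x^2 subject to g(x) = 1 - x <= 0. The problem and the checker name `kkt_satisfied` are assumptions for this example. The stationarity test, that the gradient of f + \mu g vanishes, is the usual companion of the three conditions listed above.

```python
def kkt_satisfied(x, mu, tol=1e-9):
    """Check the KKT conditions for: minimize x**2 subject to g(x) = 1 - x <= 0."""
    g = 1.0 - x
    stationarity = abs(2.0 * x + mu * (-1.0)) <= tol   # d/dx [f + mu * g] = 0
    primal_feasible = g <= tol                         # g(x) <= 0
    dual_feasible = mu >= -tol                         # mu >= 0
    complementary = abs(mu * g) <= tol                 # mu * g(x) = 0
    return stationarity and primal_feasible and dual_feasible and complementary


at_optimum = kkt_satisfied(1.0, 2.0)        # constraint active, mu = 2
infeasible_point = kkt_satisfied(0.0, 0.0)  # unconstrained minimum violates g <= 0
slack_violation = kkt_satisfied(2.0, 4.0)   # mu > 0 while g < 0
```

Only the first point passes: the constraint is active at x = 1, so a strictly positive multiplier is allowed there.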
\begin{aligned} f(x)&=f(x^{(k)})+\nabla f(x^{(k)})^T(x-x^{(k)}) \\ f(x+\Delta x) &= f(x^{(k)})+\nabla f(x^{(k)})^T(x+\Delta x-x^{(k)}) \\ & = f(x^{(k)})+\nabla f(x^{(k)})^T(x-x^{(k)}) + \nabla f(x^{(k)})^T\Delta x \\ & = f(x) + \nabla f(x^{(k)})^T\Delta x \end{aligned}
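One common use of this first-order expansion is gradient descent: choosing \Delta x = -\eta \nabla f(x^{(k)}) with a small \eta > 0 makes the term \nabla f(x^{(k)})^T \Delta x non-positive, so the approximated value does not increase. The sketch below assumes a scalar objective f(x) = (x - 3)^2 and a fixed step size. Both are illustrative choices.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeat x <- x - lr * grad(x), i.e. take delta_x = -lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```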
\begin{aligned} f(x) &= f(x^{(k)})+g^T_k(x-x^{(k)})+\frac{1}{2}(x-x^{(k)})^TH(x^{(k)})(x-x^{(k)}) \\ &=f(x^{(k)})+[g^T_k+\frac{1}{2}(x-x^{(k)})^TH(x^{(k)})](x-x^{(k)}) \\ &=f(x^{(k)})+[g_k+\frac{1}{2}H(x^{(k)})(x-x^{(k)})]^T(x-x^{(k)}) \end{aligned}
\begin{matrix} \nabla f(x)=g_k+H_k(x-x^{(k)}),\quad H_k=H(x^{(k)}) \\ g_k+H_k(x^{(k+1)}-x^{(k)})=0 \\ H_k(x^{(k+1)}-x^{(k)})=-g_k \\ x^{(k+1)} = x^{(k)}-H^{-1}_kg_k \end{matrix}
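The standard Newton update, obtained by setting the gradient of the quadratic model to zero, is x^{(k+1)} = x^{(k)} - H_k^{-1} g_k. A scalar sketch follows. The two test functions, (x - 3)^2 and e^x - 2x, are assumptions chosen because their minimizers are known in closed form.

```python
import math


def newton_minimize(grad, hess, x0, steps=20):
    """Iterate x_{k+1} = x_k - g_k / H_k for a scalar function."""
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x


# Quadratic f(x) = (x - 3)^2: a single Newton step reaches the minimum.
quadratic_min = newton_minimize(lambda x: 2.0 * (x - 3.0), lambda x: 2.0,
                                x0=0.0, steps=1)

# Convex f(x) = exp(x) - 2x: gradient exp(x) - 2, Hessian exp(x), minimum at ln 2.
convex_min = newton_minimize(lambda x: math.exp(x) - 2.0, math.exp, x0=0.0)
```

On an exactly quadratic objective the second-order model is the function itself, which is why one step suffices.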
D_{KL}(P||Q)=\sum_{i}P(i)\ln\frac{P(i)}{Q(i)}=-\sum _{i}P(i)\ln\frac{Q(i)}{P(i)}
D_{KL}(P||Q)=\int ^{\infty }_{-\infty}p(x)\ln\frac{p(x)}{q(x)}dx
\begin{aligned} D_{KL}(P||Q)&=\int ^{\infty }_{-\infty}p(x)\ln\frac{p(x)}{q(x)}dx \\ &=\int ^{\infty }_{-\infty}p(x)\left(-\ln\frac{q(x)}{p(x)}\right)dx \\ &=\int ^{\infty }_{-\infty}\left(-\ln\frac{q(x)}{p(x)}\right)p(x)dx \\ &\ge -\ln\left(\int^{\infty }_{-\infty}\frac{q(x)}{p(x)}p(x)dx\right) \\ &= -\ln\left(\int^{\infty }_{-\infty}q(x)dx\right) \\ &=-\ln 1=0 \end{aligned}
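The discrete form of the KL divergence is easy to compute directly. The sketch below assumes Q(i) > 0 wherever P(i) > 0. The two distributions are hypothetical and serve only to illustrate non-negativity and asymmetry.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)); terms with P(i) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)   # D_KL(P || Q)
kl_qp = kl_divergence(q, p)   # D_KL(Q || P), generally a different value
```

Both values are positive, they differ from each other, and the divergence of a distribution from itself is zero.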
\begin{aligned} E_{x\sim p(x)}[f(x)]&=\int _x p(x)f(x)dx \\ &=\int _x \tilde{p}(x) \frac{p(x)}{\tilde {p}(x)}f(x)dx \\ &=E_{x\sim \tilde{p}(x)} \left[\frac{p(x)}{\tilde {p}(x)}f(x)\right] \\ & \simeq \frac{1}{N} \sum ^N_{i=1,\,x_i\sim \tilde{p}(x)}\frac{p(x_i)}{\tilde{p}(x_i)}f(x_i) \end{aligned}
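The importance-sampling estimator can be sketched with the standard library alone. The example assumes a target p = N(0, 1), a wider proposal \tilde{p} = N(0, 2^2), and f(x) = x^2, so the true expectation is 1. All three are choices made for illustration.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, n, seed=0):
    """Estimate E_{x~p}[f(x)] for p = N(0, 1) from samples of p~ = N(0, 2^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)                           # x_i ~ p~(x)
        weight = normal_pdf(x, 1.0) / normal_pdf(x, 2.0)  # p(x_i) / p~(x_i)
        total += weight * f(x)
    return total / n


estimate = importance_estimate(lambda x: x * x, n=100_000)
```

A proposal with heavier tails than the target keeps the weights p(x_i) / \tilde{p}(x_i) bounded, which is why the wider normal is used here.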