Mathematical Formulas in Machine Learning
These formulas come mainly from the Machine Learning course on Coursera; they are collected here for easy reference.
#Linear Regression
###Cost function
\[J(\theta) = {1 \over 2m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y ^{(i)}) ^ 2 \]
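As a minimal NumPy sketch (names are illustrative, not from the course), assuming `X` is the m×(n+1) design matrix with a leading column of ones and `theta` is the parameter vector:

```python
import numpy as np

def linear_cost(theta, X, y):
    """Unregularized linear regression cost J(theta)."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every example
    return (residuals @ residuals) / (2 * m)
```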
###Regularized cost function
\[J(\theta) = {1 \over 2m}\left(\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2 \right) + {\lambda \over 2m}\sum_{j=1}^n\theta_j^2\]
###Gradient with regularization
\[\begin{aligned} {\partial J(\theta) \over \partial \theta_0} = {1 \over m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \quad\quad \text{for $j = 0$}.\\ {\partial J(\theta) \over \partial \theta_j} = \left( {1 \over m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + {\lambda \over m} \theta_j \quad\text{for $j \geq 1$}. \end{aligned}\]
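A sketch of the regularized cost and its gradient in the same NumPy style; note that the bias term `theta[0]` is left out of the penalty, matching the sums that start at j = 1:

```python
import numpy as np

def linear_cost_reg(theta, X, y, lam):
    """Regularized linear regression cost; theta_0 is not penalized."""
    m = len(y)
    residuals = X @ theta - y
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return (residuals @ residuals) / (2 * m) + penalty

def linear_gradient_reg(theta, X, y, lam):
    """Gradient of the regularized cost, vectorized over all j."""
    m = len(y)
    grad = (X.T @ (X @ theta - y)) / m
    grad[1:] += (lam / m) * theta[1:]   # no regularization term for j = 0
    return grad
```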
###Normal equation
\[\theta = (X^TX)^{-1}X^Ty\]
The normal equation is a convenient way to compute the regression parameters θ. Unlike gradient descent, it requires no iterative training: plugging into the formula above yields θ directly, so it is easy to implement. However, when there are many features (i.e. when n is large) it performs very poorly, since inverting the n×n matrix X^TX costs roughly O(n^3), and gradient descent becomes the better choice.
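A one-line sketch with NumPy; using `pinv` rather than `inv` is an extra assumption on my part, so a non-invertible X^TX (e.g. redundant features) does not break the computation:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least-squares solution theta = (X^T X)^{-1} X^T y."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```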
#Logistic Regression
###Cost function
\[J(\theta) = -{1 \over m}\left[\sum_{i=1}^my^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] \]
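A sketch of this cost, assuming h_theta(x) = g(theta^T x) with the sigmoid g; the small `eps` guard against log(0) is an implementation detail I added, not part of the formula:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, eps=1e-12):
    """Unregularized logistic regression cost J(theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m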
###Regularized cost function
\[J(\theta) = -{1 \over m}\left[\sum_{i=1}^my^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + {\lambda \over 2m} \sum_{j=1}^n\theta_j^2\]
###Gradient
\[ {\partial J(\theta) \over \partial \theta_j} = {1 \over m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}\]
###Gradient with regularization
\[\begin{aligned} {\partial J(\theta) \over \partial \theta_0} = {1 \over m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \quad\quad \text{for $j = 0$}. \\ {\partial J(\theta) \over \partial \theta_j} = \left( {1 \over m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + {\lambda \over m} \theta_j \quad\text{for $j \geq 1$}. \end{aligned} \]
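The regularized gradient has the same shape as in linear regression, only with the sigmoid hypothesis; a sketch under the same conventions as above:

```python
import numpy as np

def logistic_gradient_reg(theta, X, y, lam):
    """Gradient of the regularized logistic cost; theta_0 is not penalized."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^(i)) for every example
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]        # no regularization term for j = 0
    return grad
```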
#Neural network
###Cost function
\[J(\theta) = {1 \over m}\sum_{i=1}^m\sum_{k=1}^K\left[-y_k^{(i)}\log((h_\theta(x^{(i)}))_k) - (1 - y_k^{(i)})\log(1 - (h_\theta(x^{(i)}))_k)\right] \]
###Regularized cost function
\[J(\theta) = {1 \over m}\sum_{i=1}^m\sum_{k=1}^K\left[-y_k^{(i)}\log((h_\theta(x^{(i)}))_k) - (1 - y_k^{(i)})\log(1 - (h_\theta(x^{(i)}))_k)\right] + {\lambda \over 2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\theta_{ji}^{(l)} \right)^ 2 \]
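A sketch for a network with a single hidden layer (L = 3), assuming sigmoid activations, `Theta1`/`Theta2` include the bias column, and `Y` is an m×K one-hot label matrix; this mirrors the formula but is only one way to organize the computation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_reg(Theta1, Theta2, X, Y, lam):
    """Regularized cost for a 3-layer network; bias columns are not penalized."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])                        # add a_0^(1)
    A2 = np.hstack([np.ones((m, 1)), sigmoid(A1 @ Theta1.T)])   # add a_0^(2)
    H = sigmoid(A2 @ Theta2.T)                                  # h_theta(x), shape (m, K)
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    penalty = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return cost + penalty
```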
###Gradient with regularization
\[\begin{aligned} {\partial \over \partial \Theta_{ij}^{(l)} } J(\Theta) = {1 \over m}\Delta_{ij}^{(l)} \quad\quad \text{for $j = 0$}. \\ {\partial \over \partial \Theta_{ij}^{(l)} } J(\Theta) = {1 \over m}\Delta_{ij}^{(l)} + {\lambda \over m} \Theta_{ij}^{(l)} \quad\quad \text{for $j \geq 1$}. \end{aligned}\]
\[\begin{aligned} \Delta_{ij}^{(l)} = \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}\\ \delta^{(L)} = a^{(L)} - y^{(i)}\\ \delta^{(l)} = \left(\Theta^{(l)}\right)^T\delta^{(l+1)} \mathbin{.*} g'(z^{(l)}) \quad \text{for hidden layers}\\ a^{(1)} = x\\ z^{(l)} = \Theta^{(l-1)}a^{(l-1)} \\ a^{(l)} = g\left(z^{(l)}\right) \quad \left(\text{add $a_0^{(l)}$}\right) \end{aligned}\]
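One full forward/backward pass for the same 3-layer network, sketched under the same assumptions as the cost above (sigmoid activations, one-hot `Y`); the hidden-layer delta uses g'(z) = g(z)(1 − g(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_gradients_reg(Theta1, Theta2, X, Y, lam):
    """Regularized gradients via backpropagation for a 3-layer network."""
    m = X.shape[0]
    # Forward pass
    A1 = np.hstack([np.ones((m, 1)), X])
    Z2 = A1 @ Theta1.T
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])
    A3 = sigmoid(A2 @ Theta2.T)                                   # a^(L) = h_theta(x)
    # Backward pass
    D3 = A3 - Y                                                   # delta^(3)
    D2 = (D3 @ Theta2[:, 1:]) * sigmoid(Z2) * (1 - sigmoid(Z2))   # delta^(2), bias column dropped
    Delta2 = D3.T @ A2                                            # accumulated Delta^(2)
    Delta1 = D2.T @ A1                                            # accumulated Delta^(1)
    Theta1_grad = Delta1 / m
    Theta2_grad = Delta2 / m
    Theta1_grad[:, 1:] += (lam / m) * Theta1[:, 1:]               # no penalty on bias columns
    Theta2_grad[:, 1:] += (lam / m) * Theta2[:, 1:]
    return Theta1_grad, Theta2_grad
```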
#Softmax Regression
###Hypothesis function
\[\begin{aligned} h_\theta(x) = \begin{bmatrix} P(y = 1 | x; \theta) \\ P(y = 2 | x; \theta) \\ \vdots \\ P(y = K | x; \theta) \end{bmatrix} = \frac{1}{ \sum_{j=1}^{K}{\exp(\theta^{(j)\top} x) }} \begin{bmatrix} \exp(\theta^{(1)\top} x ) \\ \exp(\theta^{(2)\top} x ) \\ \vdots \\ \exp(\theta^{(K)\top} x ) \end{bmatrix} \end{aligned}\]
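A sketch of the hypothesis, assuming `Theta` stacks the K parameter vectors θ^(k) as rows; subtracting the max score before exponentiating is a numerical-stability trick, not part of the formula itself:

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Returns the K class probabilities P(y = k | x; theta)."""
    scores = Theta @ x                  # theta^(k)^T x for each k
    scores = scores - np.max(scores)    # stability shift; cancels in the ratio
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores)
```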
###Cost function
\[\begin{aligned} J(\theta) = - \left[ \sum_{i=1}^{m} \sum_{k=1}^{K} 1\left\{y^{(i)} = k\right\}\log\frac{\exp(\theta^{(k)\top} x^{(i)})}{\sum_{j=1}^K \exp(\theta^{(j)\top} x^{(i)})} \right] \end{aligned}\]
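A sketch of the cost over the whole training set, assuming `X` is m×n, `Theta` is K×n, and `y` is an integer NumPy array of labels 1…K (hence the `y - 1` when indexing):

```python
import numpy as np

def softmax_cost(Theta, X, y):
    """Cross-entropy cost with the indicator 1{y^(i) = k}."""
    m = X.shape[0]
    scores = X @ Theta.T                                          # (m, K)
    scores = scores - scores.max(axis=1, keepdims=True)           # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.sum(log_probs[np.arange(m), y - 1])                # only the true class contributes
```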
###Gradient
\[\begin{aligned} \nabla_{\theta^{(k)}} J(\theta) = - \sum_{i=1}^{m}{ \left[ x^{(i)} \left( 1\{ y^{(i)} = k\} - P(y^{(i)} = k | x^{(i)}; \theta) \right) \right] } \end{aligned} \]
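A matching sketch of the gradient for all K parameter vectors at once, under the same conventions as the cost above; row k of the result is ∇_{θ^(k)} J(θ):

```python
import numpy as np

def softmax_gradient(Theta, X, y):
    """Gradient of the softmax cost with respect to every theta^(k), stacked as rows."""
    m = X.shape[0]
    scores = X @ Theta.T
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)          # P(y^(i) = k | x^(i); theta)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y - 1] = 1.0               # 1{y^(i) = k}
    return -(indicator - probs).T @ X                  # shape (K, n)
```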