1442 lines
54 KiB
Plaintext
{
|
||
"nbformat": 4,
|
||
"nbformat_minor": 0,
|
||
"metadata": {
|
||
"colab": {
|
||
"name": "book_equations.ipynb",
|
||
"version": "0.3.2",
|
||
"provenance": []
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 2",
|
||
"language": "python",
|
||
"name": "python2"
|
||
}
|
||
},
|
||
"cells": [
|
||
{
|
||
"metadata": {
|
||
"id": "ZICa1cn5n1Yv",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"**Equations**\n",
"\n",
"*This notebook collects all the equations in the book.*\n",
"\n",
"**Warning**: GitHub's notebook viewer does not render equations properly. Run Jupyter locally to view this notebook, or use [nbviewer](http://nbviewer.jupyter.org/github/rickiepark/handson-ml/blob/master/book_equations.ipynb)."
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "A_lqPMDMn1Yx",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"# Chapter 1\n",
"**Equation 1-1: A simple linear model**\n",
"\n",
"$\n",
"\\text{life_satisfaction} = \\theta_0 + \\theta_1 \\times \\text{GDP_per_capita}\n",
|
||
"$"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "-ae0t1eRn1Yx",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"# Chapter 2\n",
"**Equation 2-1: Root Mean Square Error (RMSE)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\text{RMSE}(\\mathbf{X}, h) = \\sqrt{\\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left(h(\\mathbf{x}^{(i)}) - y^{(i)}\\right)^2}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Notations (page 72):**\n",
|
||
"\n",
|
||
"$\n",
|
||
" \\mathbf{x}^{(1)} = \\begin{pmatrix}\n",
|
||
" -118.29 \\\\\n",
|
||
" 33.91 \\\\\n",
|
||
" 1,416 \\\\\n",
|
||
" 38,372\n",
|
||
" \\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$\n",
|
||
" y^{(1)}=156,400\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$\n",
|
||
" \\mathbf{X} = \\begin{pmatrix}\n",
|
||
" (\\mathbf{x}^{(1)})^T \\\\\n",
|
||
" (\\mathbf{x}^{(2)})^T\\\\\n",
|
||
" \\vdots \\\\\n",
|
||
" (\\mathbf{x}^{(1999)})^T \\\\\n",
|
||
" (\\mathbf{x}^{(2000)})^T\n",
|
||
" \\end{pmatrix} = \\begin{pmatrix}\n",
|
||
" -118.29 & 33.91 & 1,416 & 38,372 \\\\\n",
|
||
" \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
|
||
" \\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 2-2: Mean Absolute Error (MAE)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\text{MAE}(\\mathbf{X}, h) = \\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left| h(\\mathbf{x}^{(i)}) - y^{(i)} \\right|\n",
|
||
"$\n",
|
||
"\n",
"**$\\ell_k$ norm (page 74):**\n",
|
||
"\n",
|
||
"$ \\left\\| \\mathbf{v} \\right\\| _k = (\\left| v_0 \\right|^k + \\left| v_1 \\right|^k + \\dots + \\left| v_n \\right|^k)^{\\frac{1}{k}} $\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "8fBBfiofn1Yy",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"# Chapter 3\n",
"**Equation 3-1: Precision**\n",
"\n",
"$\n",
"\\text{precision} = \\cfrac{TP}{TP + FP}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 3-2: Recall**\n",
"\n",
"$\n",
"\\text{recall} = \\cfrac{TP}{TP + FN}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 3-3: $F_1$ score**\n",
"\n",
"$\n",
"F_1 = \\cfrac{2}{\\cfrac{1}{\\text{precision}} + \\cfrac{1}{\\text{recall}}} = 2 \\times \\cfrac{\\text{precision}\\, \\times \\, \\text{recall}}{\\text{precision}\\, + \\, \\text{recall}} = \\cfrac{TP}{TP + \\cfrac{FN + FP}{2}}\n",
|
||
"$\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "YTYQZMTjn1Yy",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"# Chapter 4\n",
"**Equation 4-1: Linear Regression model prediction**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y} = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\dots + \\theta_n x_n\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-2: Linear Regression model prediction (vectorized form)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\mathbf{\\theta}^T \\cdot \\mathbf{x}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-3: MSE cost function for a Linear Regression model**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\text{MSE}(\\mathbf{X}, h_{\\mathbf{\\theta}}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})^2}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-4: Normal Equation**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Partial derivative notation (page 165):**\n",
|
||
"\n",
|
||
"$\\frac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta})$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-5: Partial derivatives of the cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\dfrac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta}) = \\dfrac{2}{m}\\sum\\limits_{i=1}^{m}(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})\\, x_j^{(i)}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-6: Gradient vector of the cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) =\n",
|
||
"\\begin{pmatrix}\n",
|
||
" \\frac{\\partial}{\\partial \\theta_0} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
|
||
" \\frac{\\partial}{\\partial \\theta_1} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
|
||
" \\vdots \\\\\n",
|
||
" \\frac{\\partial}{\\partial \\theta_n} \\text{MSE}(\\mathbf{\\theta})\n",
|
||
"\\end{pmatrix}\n",
|
||
" = \\dfrac{2}{m} \\mathbf{X}^T \\cdot (\\mathbf{X} \\cdot \\mathbf{\\theta} - \\mathbf{y})\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-7: Gradient Descent step**\n",
"\n",
"$\n",
"\\mathbf{\\theta}^{(\\text{next step})} = \\mathbf{\\theta} - \\eta \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta})\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$ O(\\frac{1}{\\epsilon}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{y} = 0.56 x_1^2 + 0.93 x_1 + 1.78 $\n",
|
||
"\n",
|
||
"\n",
"$ y = 0.5 x_1^2 + 1.0 x_1 + 2.0 + \\text{Gaussian noise} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\dfrac{(n+d)!}{d!\\,n!} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\alpha \\sum_{i=1}^{n}{\\theta_i^2}$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-8: Ridge Regression cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\dfrac{1}{2}\\sum\\limits_{i=1}^{n}\\theta_i^2\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-9: Ridge Regression closed-form solution**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X} + \\alpha \\mathbf{A})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-10: Lasso Regression cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right|\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-11: Lasso Regression subgradient vector**\n",
|
||
"\n",
|
||
"$\n",
|
||
"g(\\mathbf{\\theta}, J) = \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) + \\alpha\n",
|
||
"\\begin{pmatrix}\n",
|
||
" \\operatorname{sign}(\\theta_1) \\\\\n",
|
||
" \\operatorname{sign}(\\theta_2) \\\\\n",
|
||
" \\vdots \\\\\n",
|
||
" \\operatorname{sign}(\\theta_n) \\\\\n",
"\\end{pmatrix} \\quad \\text{where } \\operatorname{sign}(\\theta_i) =\n",
"\\begin{cases}\n",
"-1 & \\text{if } \\theta_i < 0 \\\\\n",
"0 & \\text{if } \\theta_i = 0 \\\\\n",
"+1 & \\text{if } \\theta_i > 0\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-12: Elastic Net cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + r \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right| + \\dfrac{1 - r}{2} \\alpha \\sum\\limits_{i=1}^{n}{\\theta_i^2}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-13: Logistic Regression model estimated probability (vectorized form)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{p} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x})\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-14: Logistic function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\sigma(t) = \\dfrac{1}{1 + \\exp(-t)}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-15: Logistic Regression model prediction**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y} =\n",
|
||
"\\begin{cases}\n",
" 0 & \\text{if } \\hat{p} < 0.5 \\\\\n",
" 1 & \\text{if } \\hat{p} \\geq 0.5\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-16: Cost function of a single training instance**\n",
|
||
"\n",
|
||
"$\n",
|
||
"c(\\mathbf{\\theta}) =\n",
|
||
"\\begin{cases}\n",
" -\\log(\\hat{p}) & \\text{if } y = 1 \\\\\n",
" -\\log(1 - \\hat{p}) & \\text{if } y = 0\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-17: Logistic Regression cost function (log loss)**\n",
"\n",
"$\n",
"J(\\mathbf{\\theta}) = -\\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{\\left[ y^{(i)} \\log\\left(\\hat{p}^{(i)}\\right) + (1 - y^{(i)}) \\log\\left(1 - \\hat{p}^{(i)}\\right)\\right]}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-18: Logistic cost function partial derivatives**\n",
"\n",
"$\n",
"\\dfrac{\\partial}{\\partial \\theta_j} J(\\mathbf{\\theta}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\left(\\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)}) - y^{(i)}\\right)\\, x_j^{(i)}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-19: Softmax score for class k**\n",
|
||
"\n",
|
||
"$\n",
|
||
"s_k(\\mathbf{x}) = ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-20: Softmax function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{p}_k = \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-21: Softmax Regression classifier prediction**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y} = \\underset{k}{\\operatorname{argmax}} \\, \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\underset{k}{\\operatorname{argmax}} \\, s_k(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}} \\, \\left( ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x} \\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-22: Cross entropy cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"J(\\mathbf{\\Theta}) = - \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}\n",
|
||
"$\n",
|
||
"\n",
"**Cross entropy between two probability distributions $p$ and $q$ (page 196):**\n",
"\n",
|
||
"$ H(p, q) = -\\sum\\limits_{x}p(x) \\log q(x) $\n",
|
||
"\n",
|
||
"\n",
"**Equation 4-23: Cross entropy gradient vector for class k**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}\n",
|
||
"$\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "sFBoMnuzn1Yz",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
"# Chapter 5\n",
"**Equation 5-1: Gaussian RBF**\n",
|
||
"\n",
|
||
"$\n",
|
||
"{\\displaystyle \\phi_{\\gamma}(\\mathbf{x}, \\mathbf{\\ell})} = {\\displaystyle \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{x} - \\mathbf{\\ell} \\right\\|^2})}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-2: Linear SVM classifier prediction**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y} = \\begin{cases}\n",
" 0 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b < 0 \\\\\n",
" 1 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b \\geq 0\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-3: Hard margin linear SVM classifier objective**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&\\underset{\\mathbf{w}, b}{\\operatorname{minimize}}\\,{\\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w}} \\\\\n",
"&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-4: Soft margin linear SVM classifier objective**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&\\underset{\\mathbf{w}, b, \\mathbf{\\zeta}}{\\operatorname{minimize}}\\,{\\dfrac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} + C \\sum\\limits_{i=1}^m{\\zeta^{(i)}}}\\\\\n",
"&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 - \\zeta^{(i)} \\quad \\text{and} \\quad \\zeta^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-5: Quadratic Programming (QP) problem**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\underset{\\mathbf{p}}{\\text{minimize}} \\, & \\dfrac{1}{2} \\mathbf{p}^T \\cdot \\mathbf{H} \\cdot \\mathbf{p} \\, + \\, \\mathbf{f}^T \\cdot \\mathbf{p} \\\\\n",
"\\text{subject to} \\, & \\mathbf{A} \\cdot \\mathbf{p} \\le \\mathbf{b} \\\\\n",
"\\text{where } &\n",
"\\begin{cases}\n",
" \\mathbf{p} \\, \\text{ is an }n_p\\text{-dimensional vector (} n_p = \\text{number of model parameters),}\\\\\n",
" \\mathbf{H} \\, \\text{ is an }n_p \\times n_p \\text{ matrix,}\\\\\n",
" \\mathbf{f} \\, \\text{ is an }n_p\\text{-dimensional vector,}\\\\\n",
" \\mathbf{A} \\, \\text{ is an } n_c \\times n_p \\text{ matrix (}n_c = \\text{number of constraints),}\\\\\n",
" \\mathbf{b} \\, \\text{ is an }n_c\\text{-dimensional vector.}\n",
|
||
"\\end{cases}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-6: Dual form of the linear SVM objective**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&\\underset{\\mathbf{\\alpha}}{\\operatorname{minimize}} \\,\n",
|
||
"\\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
|
||
" \\sum\\limits_{j=1}^{m}{\n",
|
||
" \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
|
||
" }\n",
|
||
"} \\, - \\, \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
"&\\text{subject to} \\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-7: From the dual solution to the primal solution**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
"&\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)}\\right)}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-8: Second-degree polynomial mapping**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\phi\\left(\\mathbf{x}\\right) = \\phi\\left( \\begin{pmatrix}\n",
|
||
" x_1 \\\\\n",
|
||
" x_2\n",
|
||
"\\end{pmatrix} \\right) = \\begin{pmatrix}\n",
|
||
" {x_1}^2 \\\\\n",
|
||
" \\sqrt{2} \\, x_1 x_2 \\\\\n",
|
||
" {x_2}^2\n",
|
||
"\\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-9: Kernel trick for a second-degree polynomial mapping**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\phi(\\mathbf{a})^T \\cdot \\phi(\\mathbf{b}) & \\quad = \\begin{pmatrix}\n",
|
||
" {a_1}^2 \\\\\n",
|
||
" \\sqrt{2} \\, a_1 a_2 \\\\\n",
|
||
" {a_2}^2\n",
|
||
" \\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
|
||
" {b_1}^2 \\\\\n",
|
||
" \\sqrt{2} \\, b_1 b_2 \\\\\n",
|
||
" {b_2}^2\n",
|
||
"\\end{pmatrix} = {a_1}^2 {b_1}^2 + 2 a_1 b_1 a_2 b_2 + {a_2}^2 {b_2}^2 \\\\\n",
|
||
" & \\quad = \\left( a_1 b_1 + a_2 b_2 \\right)^2 = \\left( \\begin{pmatrix}\n",
|
||
" a_1 \\\\\n",
|
||
" a_2\n",
|
||
"\\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
|
||
" b_1 \\\\\n",
|
||
" b_2\n",
|
||
" \\end{pmatrix} \\right)^2 = (\\mathbf{a}^T \\cdot \\mathbf{b})^2\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
"**In the text about the kernel trick (page 220):**\n",
"[...] the dot product of the transformed vectors can simply be replaced by $ ({\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)})^2 $.\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-10: Common kernels**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
"\\text{Linear:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\mathbf{a}^T \\cdot \\mathbf{b} \\\\\n",
"\\text{Polynomial:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r \\right)^d \\\\\n",
"\\text{Gaussian RBF:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{a} - \\mathbf{b} \\right\\|^2}) \\\\\n",
"\\text{Sigmoid:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\tanh\\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r\\right)\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
"**Equation 5-11: Making predictions with a kernelized SVM**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"h_{\\hat{\\mathbf{w}}, \\hat{b}}\\left(\\phi(\\mathbf{x}^{(n)})\\right) & = \\,\\hat{\\mathbf{w}}^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b} = \\left(\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\phi(\\mathbf{x}^{(i)})\\right)^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b}\\\\\n",
|
||
" & = \\, \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\left(\\phi(\\mathbf{x}^{(i)})^T \\cdot \\phi(\\mathbf{x}^{(n)})\\right) + \\hat{b}\\\\\n",
|
||
" & = \\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} K(\\mathbf{x}^{(i)}, \\mathbf{x}^{(n)}) + \\hat{b}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-12: Computing the bias term using the kernel trick**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
"\\hat{b} & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\\hat{\\mathbf{w}}}^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\n",
" \\left(\\sum_{j=1}^{m}{\\hat{\\alpha}}^{(j)}t^{(j)}\\phi(\\mathbf{x}^{(j)})\\right)\n",
" }^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)}\\\\\n",
" & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} -\n",
"\\sum\\limits_{\\scriptstyle j=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(j)} > 0}}^{m}{\n",
" {\\hat{\\alpha}}^{(j)} t^{(j)} K(\\mathbf{x}^{(i)},\\mathbf{x}^{(j)})\n",
"}\n",
"\\right)}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
"**Equation 5-13: Linear SVM classifier cost function**\n",
|
||
"\n",
|
||
"$\n",
"J(\\mathbf{w}, b) = \\dfrac{1}{2} \\mathbf{w}^T \\cdot \\mathbf{w} \\, + \\, C {\\displaystyle \\sum\\limits_{i=1}^{m}\\max\\left(0, 1 - t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\right)}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "JyogtW6Jn1Y0",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 6\n",
|
||
"**Equation 6-1: Gini impurity**\n",
|
||
"\n",
|
||
"$\n",
|
||
"G_i = 1 - \\sum\\limits_{k=1}^{n}{{p_{i,k}}^2}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 6-2: CART cost function for classification**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}G_\\text{left} + \\dfrac{m_{\\text{right}}}{m}G_{\\text{right}}\\\\\n",
|
||
"&\\text{where }\\begin{cases}\n",
|
||
"G_\\text{left/right} \\text{ measures the impurity of the left/right subset,}\\\\\n",
|
||
"m_\\text{left/right} \\text{ is the number of instances in the left/right subset.}\n",
|
||
"\\end{cases}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**Entropy computation example (page 173):**\n",
|
||
"\n",
|
||
"$ -\\frac{49}{54}\\log(\\frac{49}{54}) - \\frac{5}{54}\\log(\\frac{5}{54}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 6-3: Entropy**\n",
|
||
"\n",
|
||
"$\n",
|
||
"H_i = -\\sum\\limits_{k=1 \\atop p_{i,k} \\ne 0}^{n}{{p_{i,k}}\\log(p_{i,k})}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 6-4: CART cost function for regression**\n",
|
||
"\n",
|
||
"$\n",
|
||
"J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}\\text{MSE}_\\text{left} + \\dfrac{m_{\\text{right}}}{m}\\text{MSE}_{\\text{right}} \\quad\n",
|
||
"\\text{where }\n",
|
||
"\\begin{cases}\n",
"\\text{MSE}_{\\text{node}} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}(\\hat{y}_{\\text{node}} - y^{(i)})^2\\\\\n",
|
||
"\\hat{y}_\\text{node} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}y^{(i)}\n",
|
||
"\\end{cases}\n",
|
||
"$\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "mCEpcobOn1Y0",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 7\n",
|
||
"\n",
"**Equation 7-1: Weighted error rate of the $j^{\\text{th}}$ predictor**\n",
|
||
"\n",
|
||
"$\n",
|
||
"r_j = \\dfrac{\\displaystyle \\sum\\limits_{\\textstyle {i=1 \\atop \\hat{y}_j^{(i)} \\ne y^{(i)}}}^{m}{w^{(i)}}}{\\displaystyle \\sum\\limits_{i=1}^{m}{w^{(i)}}} \\quad\n",
|
||
"\\text{where }\\hat{y}_j^{(i)}\\text{ is the }j^{\\text{th}}\\text{ predictor's prediction for the }i^{\\text{th}}\\text{ instance.}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**Equation 7-2: Predictor weight**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\alpha_j = \\eta \\log{\\dfrac{1 - r_j}{r_j}}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 7-3: Weight update rule**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"& \\text{ for } i = 1, 2, \\dots, m \\\\\n",
|
||
"& w^{(i)} \\leftarrow\n",
|
||
"\\begin{cases}\n",
"w^{(i)} & \\text{if }\\hat{y}_j^{(i)} = y^{(i)}\\\\\n",
"w^{(i)} \\exp(\\alpha_j) & \\text{if }\\hat{y}_j^{(i)} \\ne y^{(i)}\n",
|
||
"\\end{cases}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 194:**\n",
|
||
"\n",
|
||
"Then all the instance weights are normalized (i.e., divided by $ \\sum_{i=1}^{m}{w^{(i)}} $).\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 7-4: AdaBoost predictions**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{y}(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}}{\\sum\\limits_{\\scriptstyle j=1 \\atop \\scriptstyle \\hat{y}_j(\\mathbf{x}) = k}^{N}{\\alpha_j}} \\quad \\text{where }N\\text{ is the number of predictors.}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "SFoGMOCsn1Y1",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 8\n",
|
||
"\n",
|
||
"**Equation 8-1: Principal components matrix**\n",
|
||
"\n",
|
||
"$\n",
"\\mathbf{V} =\n",
|
||
"\\begin{pmatrix}\n",
|
||
" \\mid & \\mid & & \\mid \\\\\n",
|
||
" \\mathbf{c_1} & \\mathbf{c_2} & \\cdots & \\mathbf{c_n} \\\\\n",
|
||
" \\mid & \\mid & & \\mid\n",
|
||
"\\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 8-2: Projecting the training set down to _d_ dimensions**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\mathbf{X}_{d\\text{-proj}} = \\mathbf{X} \\cdot \\mathbf{W}_d\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 8-3: PCA inverse transformation, back to the original number of dimensions**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\mathbf{X}_{\\text{recovered}} = \\mathbf{X}_{d\\text{-proj}} \\cdot {\\mathbf{W}_d}^T\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\sum_{j=1}^{m}{w_{i,j}\\mathbf{x}^{(j)}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 8-4: LLE step 1: linearly modeling local relationships**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"& \\hat{\\mathbf{W}} = \\underset{\\mathbf{W}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{x}^{(i)} - \\sum\\limits_{j=1}^{m}{w_{i,j}}\\mathbf{x}^{(j)}\\right\\|^2\\\\\n",
|
||
"& \\text{subject to }\n",
|
||
"\\begin{cases}\n",
" w_{i,j}=0 & \\text{if }\\mathbf{x}^{(j)} \\text{ is not one of the }k\\text{ closest neighbors of }\\mathbf{x}^{(i)}\\\\\n",
|
||
" \\sum\\limits_{j=1}^{m}w_{i,j} = 1 & \\text{for }i=1, 2, \\dots, m\n",
|
||
"\\end{cases}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 223:**\n",
|
||
"\n",
|
||
"[...] then we want the squared distance between $\\mathbf{z}^{(i)}$ and $ \\sum_{j=1}^{m}{\\hat{w}_{i,j}\\mathbf{z}^{(j)}} $ to be as small as possible.\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 8-5: LLE step 2: reducing dimensionality while preserving relationships**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{\\mathbf{Z}} = \\underset{\\mathbf{Z}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{z}^{(i)} - \\sum\\limits_{j=1}^{m}{\\hat{w}_{i,j}}\\mathbf{z}^{(j)}\\right\\|^2\n",
|
||
"$\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "tzBsxT-tn1Y2",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 9\n",
|
||
"\n",
|
||
"**Equation 9-1: Rectified linear unit**\n",
|
||
"\n",
|
||
"$\n",
|
||
"h_{\\mathbf{w}, b}(\\mathbf{X}) = \\max(\\mathbf{X} \\cdot \\mathbf{w} + b, 0)\n",
|
||
"$"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "PCBRHj6wn1Y3",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 10\n",
|
||
"\n",
|
||
"**Equation 10-1: Common step functions used in Perceptrons**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\operatorname{heaviside}(z) =\n",
|
||
"\\begin{cases}\n",
|
||
"0 & \\text{if }z < 0\\\\\n",
|
||
"1 & \\text{if }z \\ge 0\n",
|
||
"\\end{cases} & \\quad\\quad\n",
|
||
"\\operatorname{sgn}(z) =\n",
|
||
"\\begin{cases}\n",
|
||
"-1 & \\text{if }z < 0\\\\\n",
|
||
"0 & \\text{if }z = 0\\\\\n",
|
||
"+1 & \\text{if }z > 0\n",
|
||
"\\end{cases}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 10-2: Perceptron learning rule (weight update)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"{w_{i,j}}^{(\\text{next step})} = w_{i,j} + \\eta (y_j - \\hat{y}_j) x_i\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**In the text page 266:**\n",
|
||
"\n",
"It will be initialized randomly, using a truncated normal (Gaussian) distribution with a standard deviation of $ 2 / \\sqrt{n_\\text{inputs}} $.\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "lZkG8wkrn1Y3",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 11\n",
|
||
"**Equation 11-1: Xavier initialization (when using the logistic activation function)**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"& \\text{Normal distribution with mean 0 and standard deviation }\n",
|
||
"\\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}}\\\\\n",
|
||
"& \\text{Or a uniform distribution between -r and +r, with }\n",
|
||
"r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 278:**\n",
|
||
"\n",
|
||
"When the number of input connections is roughly equal to the number of output\n",
|
||
"connections, you get simpler equations (e.g., $ \\sigma = 1 / \\sqrt{n_\\text{inputs}} $ or $ r = \\sqrt{3} / \\sqrt{n_\\text{inputs}} $).\n",
|
||
"\n",
|
||
"**Table 11-1: Initialization parameters for each type of activation function**\n",
|
||
"\n",
|
||
"* Logistic uniform: $ r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"* Logistic normal: $ \\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"* Hyperbolic tangent uniform: $ r = 4 \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"* Hyperbolic tangent normal: $ \\sigma = 4 \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"* ReLU (and its variants) uniform: $ r = \\sqrt{2} \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"* ReLU (and its variants) normal: $ \\sigma = \\sqrt{2} \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
|
||
"\n",
|
||
"**Equation 11-2: ELU activation function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\operatorname{ELU}_\\alpha(z) =\n",
|
||
"\\begin{cases}\n",
|
||
"\\alpha(\\exp(z) - 1) & \\text{if } z < 0\\\\\n",
"z & \\text{if } z \\ge 0\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 11-3: Batch Normalization algorithm**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"1.\\quad & \\mathbf{\\mu}_B = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{\\mathbf{x}^{(i)}}\\\\\n",
|
||
"2.\\quad & {\\mathbf{\\sigma}_B}^2 = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{(\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B)^2}\\\\\n",
|
||
"3.\\quad & \\hat{\\mathbf{x}}^{(i)} = \\dfrac{\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B}{\\sqrt{{\\mathbf{\\sigma}_B}^2 + \\epsilon}}\\\\\n",
|
||
"4.\\quad & \\mathbf{z}^{(i)} = \\gamma \\hat{\\mathbf{x}}^{(i)} + \\beta\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 285:**\n",
|
||
"\n",
"[...] given a new value $v$, the running average $\\hat{v}$ is updated through the equation:\n",
|
||
"\n",
|
||
"$ \\hat{v} \\gets \\hat{v} \\times \\text{momentum} + v \\times (1 - \\text{momentum}) $\n",
|
||
"\n",
|
||
"**Equation 11-4: Momentum algorithm**\n",
|
||
"\n",
|
||
"1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
|
||
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
|
||
"\n",
|
||
"**In the text page 296:**\n",
|
||
"\n",
"You can easily verify that if the gradient remains constant, the terminal velocity (i.e., the maximum size of the weight updates) is equal to that gradient multiplied by the learning rate $\\eta$ multiplied by $ \\frac{1}{1 - \\beta} $.\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 11-5: Nesterov Accelerated Gradient algorithm**\n",
|
||
"\n",
|
||
"1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta} + \\beta \\mathbf{m})$\n",
|
||
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
|
||
"\n",
|
||
"**Equation 11-6: AdaGrad algorithm**\n",
|
||
"\n",
|
||
"1. $\\mathbf{s} \\gets \\mathbf{s} + \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
|
||
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
|
||
"\n",
|
||
"**In the text page 298-299:**\n",
|
||
"\n",
|
||
"This vectorized form is equivalent to computing $s_i \\gets s_i + \\left( \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\right)^2$ for each element $s_i$ of the vector $\\mathbf{s}$.\n",
|
||
"\n",
|
||
"**In the text page 299:**\n",
|
||
"\n",
|
||
"This vectorized form is equivalent to computing $ \\theta_i \\gets \\theta_i - \\eta \\, \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\dfrac{1}{\\sqrt{s_i + \\epsilon}} $ for all parameters $\\theta_i$ (simultaneously).\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 11-7: RMSProp algorithm**\n",
|
||
"\n",
|
||
"1. $\\mathbf{s} \\gets \\beta \\mathbf{s} + (1 - \\beta ) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
|
||
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 11-8: Adam algorithm**\n",
|
||
"\n",
|
||
"1. $\\mathbf{m} \\gets \\beta_1 \\mathbf{m} - (1 - \\beta_1) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
|
||
"2. $\\mathbf{s} \\gets \\beta_2 \\mathbf{s} + (1 - \\beta_2) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"3. $\\mathbf{m} \\gets \\left(\\dfrac{\\mathbf{m}}{1 - {\\beta_1}^t}\\right)$\n",
"4. $\\mathbf{s} \\gets \\left(\\dfrac{\\mathbf{s}}{1 - {\\beta_2}^t}\\right)$\n",
|
||
"5. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\eta \\, \\mathbf{m} \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
|
||
"\n",
|
||
"**In the text page 309:**\n",
|
||
"\n",
|
||
"We typically implement this constraint by computing $\\left\\| \\mathbf{w} \\right\\|_2$ after each training step\n",
|
||
"and clipping $\\mathbf{w}$ if needed $ \\left( \\mathbf{w} \\gets \\mathbf{w} \\dfrac{r}{\\left\\| \\mathbf{w} \\right\\|_2} \\right) $.\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "cFSDXAOzn1Y4",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 13\n",
|
||
"\n",
|
||
"**Equation 13-1: Computing the output of a neuron in a convolutional layer**\n",
|
||
"\n",
|
||
"$\n",
"z_{i,j,k} = b_k + \\sum\\limits_{u = 0}^{f_h - 1} \\, \\, \\sum\\limits_{v = 0}^{f_w - 1} \\, \\, \\sum\\limits_{k' = 0}^{f_{n'} - 1} \\, \\, x_{i', j', k'} \\times w_{u, v, k', k}\n",
|
||
"\\quad \\text{with }\n",
|
||
"\\begin{cases}\n",
|
||
"i' = i \\times s_h + u \\\\\n",
|
||
"j' = j \\times s_w + v\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**Equation 13-2: Local response normalization**\n",
|
||
"\n",
|
||
"$\n",
|
||
"b_i = a_i \\left(k + \\alpha \\sum\\limits_{j=j_\\text{low}}^{j_\\text{high}}{{a_j}^2} \\right)^{-\\beta} \\quad \\text{with }\n",
|
||
"\\begin{cases}\n",
|
||
" j_\\text{high} = \\min\\left(i + \\dfrac{r}{2}, f_n-1\\right) \\\\\n",
|
||
" j_\\text{low} = \\max\\left(0, i - \\dfrac{r}{2}\\right)\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "1SiAf4FTn1Y4",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 14\n",
|
||
"\n",
|
||
"**Equation 14-1: Output of a single recurrent neuron for a single instance**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\mathbf{y}_{(t)} = \\phi\\left({{\\mathbf{x}_{(t)}}^T \\cdot \\mathbf{w}_x} + {\\mathbf{y}_{(t-1)}}^T \\cdot {\\mathbf{w}_y} + b \\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 14-2: Outputs of a layer of recurrent neurons for all instances in a mini-batch**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\mathbf{Y}_{(t)} & = \\phi\\left(\\mathbf{X}_{(t)} \\cdot \\mathbf{W}_{x} + \\mathbf{Y}_{(t-1)}\\cdot \\mathbf{W}_{y} + \\mathbf{b} \\right) \\\\\n",
|
||
"& = \\phi\\left(\n",
|
||
"\\left[\\mathbf{X}_{(t)} \\quad \\mathbf{Y}_{(t-1)} \\right]\n",
|
||
" \\cdot \\mathbf{W} + \\mathbf{b} \\right) \\text{ with } \\mathbf{W}=\n",
|
||
"\\left[ \\begin{matrix}\n",
|
||
" \\mathbf{W}_x\\\\\n",
|
||
" \\mathbf{W}_y\n",
|
||
"\\end{matrix} \\right]\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 391:**\n",
|
||
"\n",
|
||
"Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated using a cost function $ C(\\mathbf{Y}_{(t_\\text{min})}, \\mathbf{Y}_{(t_\\text{min}+1)}, \\dots, \\mathbf{Y}_{(t_\\text{max})}) $ (where $t_\\text{min}$ and $t_\\text{max}$ are the first and last output time steps, not counting the ignored outputs)[...]\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 14-3: LSTM computations**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\mathbf{i}_{(t)}&=\\sigma({\\mathbf{W}_{xi}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hi}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_i)\\\\\n",
|
||
"\\mathbf{f}_{(t)}&=\\sigma({\\mathbf{W}_{xf}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hf}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_f)\\\\\n",
|
||
"\\mathbf{o}_{(t)}&=\\sigma({\\mathbf{W}_{xo}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{ho}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_o)\\\\\n",
|
||
"\\mathbf{g}_{(t)}&=\\operatorname{tanh}({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_g)\\\\\n",
|
||
"\\mathbf{c}_{(t)}&=\\mathbf{f}_{(t)} \\otimes \\mathbf{c}_{(t-1)} \\, + \\, \\mathbf{i}_{(t)} \\otimes \\mathbf{g}_{(t)}\\\\\n",
|
||
"\\mathbf{y}_{(t)}&=\\mathbf{h}_{(t)} = \\mathbf{o}_{(t)} \\otimes \\operatorname{tanh}(\\mathbf{c}_{(t)})\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 14-4: GRU computations**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\mathbf{z}_{(t)}&=\\sigma({\\mathbf{W}_{xz}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hz}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
|
||
"\\mathbf{r}_{(t)}&=\\sigma({\\mathbf{W}_{xr}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hr}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
|
||
"\\mathbf{g}_{(t)}&=\\operatorname{tanh}\\left({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot (\\mathbf{r}_{(t)} \\otimes \\mathbf{h}_{(t-1)})\\right) \\\\\n",
|
||
"\\mathbf{h}_{(t)}&=(1-\\mathbf{z}_{(t)}) \\otimes \\mathbf{h}_{(t-1)} + \\mathbf{z}_{(t)} \\otimes \\mathbf{g}_{(t)}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "5IiIkIG_n1Y5",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 15\n",
|
||
"\n",
|
||
"**Equation 15-1: Kullback–Leibler divergence**\n",
|
||
"\n",
|
||
"$\n",
|
||
"D_{\\mathrm{KL}}(P\\|Q) = \\sum\\limits_{i} P(i) \\log \\dfrac{P(i)}{Q(i)}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation: KL divergence between the target sparsity _p_ and the actual sparsity _q_**\n",
|
||
"\n",
|
||
"$\n",
|
||
"D_{\\mathrm{KL}}(p\\|q) = p \\, \\log \\dfrac{p}{q} + (1-p) \\log \\dfrac{1-p}{1-q}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 433:**\n",
|
||
"\n",
|
||
"One common variant is to train the encoder to output $\\gamma = \\log\\left(\\sigma^2\\right)$ rather than $\\sigma$.\n",
|
||
"Wherever we need $\\sigma$ we can just compute $ \\sigma = \\exp\\left(\\dfrac{\\gamma}{2}\\right) $.\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "wVr9eBb1n1Y6",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Chapter 16\n",
|
||
"\n",
|
||
"**Equation 16-1: Bellman Optimality Equation**\n",
|
||
"\n",
|
||
"$\n",
|
||
"V^*(s) = \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V^*(s')]} \\quad \\text{for all }s\n",
|
||
"$\n",
|
||
"\n",
|
||
"**Equation 16-2: Value Iteration algorithm**\n",
|
||
"\n",
|
||
"$\n",
|
||
" V_{k+1}(s) \\gets \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V_k(s')]} \\quad \\text{for all }s\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 16-3: Q-Value Iteration algorithm**\n",
|
||
"\n",
|
||
"$\n",
|
||
" Q_{k+1}(s, a) \\gets \\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . \\underset{a'}{\\max}\\,{Q_k(s',a')}]} \\quad \\text{for all } (s,a)\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text page 458:**\n",
|
||
"\n",
|
||
"Once you have the optimal Q-Values, defining the optimal policy, noted $\\pi^{*}(s)$, is trivial: when the agent is in state $s$, it should choose the action with the highest Q-Value for that state: $ \\pi^{*}(s) = \\underset{a}{\\operatorname{argmax}} \\, Q^*(s, a) $.\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 16-4: TD Learning algorithm**\n",
|
||
"\n",
|
||
"$\n",
|
||
"V_{k+1}(s) \\gets (1-\\alpha)V_k(s) + \\alpha\\left(r + \\gamma . V_k(s')\\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 16-5: Q-Learning algorithm**\n",
|
||
"\n",
|
||
"$\n",
|
||
"Q_{k+1}(s, a) \\gets (1-\\alpha)Q_k(s,a) + \\alpha\\left(r + \\gamma . \\underset{a'}{\\max} \\, Q_k(s', a')\\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 16-6: Q-Learning using an exploration function**\n",
|
||
"\n",
|
||
"$\n",
|
||
" Q(s, a) \\gets (1-\\alpha)Q(s,a) + \\alpha\\left(r + \\gamma . \\underset{\\alpha'}{\\max}f(Q(s', a'), N(s', a'))\\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation 16-7: Deep Q-Learning cost function**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"& J(\\mathbf{\\theta}_\\text{critic}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^m\\left(y^{(i)} - Q(s^{(i)},a^{(i)},\\mathbf{\\theta}_\\text{critic})\\right)^2 \\\\\n",
|
||
"& \\text{with } y^{(i)} = r^{(i)} + \\gamma . \\underset{a'}{\\max}Q(s'^{(i)},a',\\mathbf{\\theta}_\\text{actor})\n",
|
||
"\\end{split}\n",
|
||
"$\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "1U0nBdBvn1Y7",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Appendix A\n",
|
||
"\n",
|
||
"Equations that appear in the text:\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\mathbf{H} =\n",
|
||
"\\begin{pmatrix}\n",
|
||
"\\mathbf{H'} & 0 & \\cdots\\\\\n",
|
||
"0 & 0 & \\\\\n",
|
||
"\\vdots & & \\ddots\n",
|
||
"\\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\mathbf{A} =\n",
|
||
"\\begin{pmatrix}\n",
|
||
"\\mathbf{A'} & \\mathbf{I}_m \\\\\n",
|
||
"\\mathbf{0} & -\\mathbf{I}_m\n",
|
||
"\\end{pmatrix}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$ 1 - \\frac{1}{5}^2 - \\frac{4}{5}^2 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ 1 - \\frac{1}{2}^2 - \\frac{1}{2}^2 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{2}{5} \\times $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{3}{5} \\times 0 $"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "dphwGCobn1Y7",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Appendix C"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "vH7f8-Min1Y7",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"Equations that appear in the text:\n",
|
||
"\n",
|
||
"$ (\\hat{x}, \\hat{y}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{\\alpha} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ (\\hat{x}, \\hat{y}, \\hat{\\alpha}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{cases}\n",
|
||
"\\frac{\\partial}{\\partial x}g(x, y, \\alpha) = 2x - 3\\alpha\\\\\n",
|
||
"\\frac{\\partial}{\\partial y}g(x, y, \\alpha) = 2 - 2\\alpha\\\\\n",
|
||
"\\frac{\\partial}{\\partial \\alpha}g(x, y, \\alpha) = -3x - 2y - 1\\\\\n",
|
||
"\\end{cases}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"$ 2\\hat{x} - 3\\hat{\\alpha} = 2 - 2\\hat{\\alpha} = -3\\hat{x} - 2\\hat{y} - 1 = 0 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{x} = \\frac{3}{2} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{y} = -\\frac{11}{4} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{\\alpha} = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation C-1: Generalized Lagrangian for the hard margin problem**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} - \\sum\\limits_{i=1}^{m}{\\alpha^{(i)} \\left(t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) - 1\\right)} \\\\\n",
|
||
"\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**More equations in the text:**\n",
|
||
"\n",
|
||
"$ (\\hat{\\mathbf{w}}, \\hat{b}, \\hat{\\mathbf{\\alpha}}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ {\\hat{\\alpha}}^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation C-2: Partial derivatives of the generalized Lagrangian**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\nabla_{\\mathbf{w}}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\mathbf{w} - \\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
|
||
"\\dfrac{\\partial}{\\partial b}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = -\\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation C-3: Properties of the stationary points**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
|
||
"\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} = 0\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation C-4: Dual form of the SVM problem**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\mathcal{L}(\\hat{\\mathbf{w}}, \\hat{b}, \\mathbf{\\alpha}) = \\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
|
||
" \\sum\\limits_{j=1}^{m}{\n",
|
||
" \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
|
||
" }\n",
|
||
"} \\quad - \\quad \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
|
||
"\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**Some more equations in the text:**\n",
|
||
"\n",
|
||
"$ \\hat{\\mathbf{\\alpha}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ {\\hat{\\alpha}}^{(i)} \\ge 0 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{\\mathbf{\\alpha}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{\\mathbf{w}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{b} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\hat{b} = 1 - t^{(k)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(k)}) $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation C-5: Bias term estimation using the dual form**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left[1 - t^{(i)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)})\\right]}\n",
|
||
"$"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "r57A4yyKn1Y8",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Appendix D"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "KwAsMn0Rn1Y9",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"**Equation D-1: Partial derivatives of $f(x,y)$**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"\\dfrac{\\partial f}{\\partial x} & = \\dfrac{\\partial(x^2y)}{\\partial x} + \\dfrac{\\partial y}{\\partial x} + \\dfrac{\\partial 2}{\\partial x} = y \\dfrac{\\partial(x^2)}{\\partial x} + 0 + 0 = 2xy \\\\\n",
|
||
"\\dfrac{\\partial f}{\\partial y} & = \\dfrac{\\partial(x^2y)}{\\partial y} + \\dfrac{\\partial y}{\\partial y} + \\dfrac{\\partial 2}{\\partial y} = x^2 + 1 + 0 = x^2 + 1 \\\\\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text:**\n",
|
||
"\n",
|
||
"$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) = y $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial x}{\\partial x} = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial y}{\\partial x} = 0 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial (u \\times v)}{\\partial x} = \\frac{\\partial v}{\\partial x} \\times u + \\frac{\\partial u}{\\partial x} \\times u $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial g}{\\partial x} = y $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation D-2: Derivative of a function _h_(_x_) at point _x_~0~**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"h'(x) & = \\underset{\\textstyle x \\to x_0}{\\lim}\\dfrac{h(x) - h(x_0)}{x - x_0}\\\\\n",
|
||
" & = \\underset{\\textstyle \\epsilon \\to 0}{\\lim}\\dfrac{h(x_0 + \\epsilon) - h(x_0)}{\\epsilon}\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation D-3: A few operations with dual numbers**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\begin{split}\n",
|
||
"&\\lambda(a + b\\epsilon) = \\lambda a + \\lambda b \\epsilon\\\\\n",
|
||
"&(a + b\\epsilon) + (c + d\\epsilon) = (a + c) + (b + d)\\epsilon \\\\\n",
|
||
"&(a + b\\epsilon) \\times (c + d\\epsilon) = ac + (ad + bc)\\epsilon + (bd)\\epsilon^2 = ac + (ad + bc)\\epsilon\\\\\n",
|
||
"\\end{split}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text:**\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial x}(3, 4) $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial y}(3, 4) $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation D-4: Chain rule**\n",
|
||
"\n",
|
||
"$\n",
|
||
"\\dfrac{\\partial f}{\\partial x} = \\dfrac{\\partial f}{\\partial n_i} \\times \\dfrac{\\partial n_i}{\\partial x}\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text:**\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_5} = \\frac{\\partial f}{\\partial n_7} \\times \\frac{\\partial n_7}{\\partial n_5} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial n_7}{\\partial n_5} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial n_7}{\\partial n_5} = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_5} = 1 \\times 1 = 1 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_4} = \\frac{\\partial f}{\\partial n_5} \\times \\frac{\\partial n_5}{\\partial n_4} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial n_5}{\\partial n_4} = n_2 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial n_4} = 1 \\times n_2 = 4 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial x} = 24 $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\frac{\\partial f}{\\partial y} = 10 $"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "Nn7LHmpCn1Y-",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Appendix E"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "XOtBQA4-n1Y-",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"**Equation E-1: Probability that the i^th^ neuron will output 1**\n",
|
||
"\n",
|
||
"$\n",
|
||
"p\\left(s_i^{(\\text{next step})} = 1\\right) \\, = \\, \\sigma\\left(\\frac{\\textstyle \\sum\\limits_{j = 1}^N{w_{i,j}s_j + b_i}}{\\textstyle T}\\right)\n",
|
||
"$\n",
|
||
"\n",
|
||
"**In the text:**\n",
|
||
"\n",
|
||
"$ \\dot{\\mathbf{x}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\dot{\\mathbf{h}} $\n",
|
||
"\n",
|
||
"\n",
|
||
"**Equation E-2: Contrastive divergence weight update**\n",
|
||
"\n",
|
||
"$\n",
|
||
"w_{i,j}^{(\\text{next step})} = w_{i,j} + \\eta(\\mathbf{x}\\mathbf{h}^T - \\dot{\\mathbf{x}} \\dot {\\mathbf{h}}^T)\n",
|
||
"$"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "el71w1NKn1Y-",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"# Glossary\n",
|
||
"\n",
|
||
"In the text:\n",
|
||
"\n",
|
||
"$\\ell _1$\n",
|
||
"\n",
|
||
"\n",
|
||
"$\\ell _2$\n",
|
||
"\n",
|
||
"\n",
|
||
"$\\ell _k$\n",
|
||
"\n",
|
||
"\n",
|
||
"$ \\chi^2 $\n"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "2Bo7NhHLn1ZA",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"Just in case your eyes hurt after all these equations, let's finish with the single most beautiful equation in the world. No, it's not $E = mc²$, it's obviously Euler's identity:"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "s31L5v1mn1ZB",
|
||
"colab_type": "text"
|
||
},
|
||
"cell_type": "markdown",
|
||
"source": [
|
||
"$e^{i\\pi}+1=0$"
|
||
]
|
||
},
|
||
{
|
||
"metadata": {
|
||
"id": "i3tXVx1zn1ZB",
|
||
"colab_type": "code",
|
||
"colab": {}
|
||
},
|
||
"cell_type": "code",
|
||
"source": [
|
||
""
|
||
],
|
||
"execution_count": 0,
|
||
"outputs": []
|
||
}
|
||
]
|
||
} |