diff --git a/04_training_linear_models.ipynb b/04_training_linear_models.ipynb index f75e0d6..a86a8a3 100644 --- a/04_training_linear_models.ipynb +++ b/04_training_linear_models.ipynb @@ -29,7 +29,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**4장 – 선형 모델**" + "**4장 – 모델 훈련**" ] }, { @@ -265,14 +265,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `LinearRegression` class is based on the scipy.linalg.lstsq() function (the name stands for \"least squares\"), which you could call directly:" + "`LinearRegression` 클래스는 scipy.linalg.lstsq() 함수(\"least squares\"의 약자)를 사용하므로 직접 호출할 수 있습니다:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "array([[4.21509616],\n", + " [2.77011339]])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "theta_best_svd, residuals, rank, s = np.linalg.lstsq(X_b, y, rcond=1e-6)\n", "theta_best_svd" @@ -282,46 +294,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This function computes $\\mathbf{X}^+\\mathbf{y}$, where $\\mathbf{X}^{+}$ is the _pseudoinverse_ of $\\mathbf{X}$ (specifically the Moore-Penrose inverse). You can use `np.linalg.pinv()` to compute the pseudoinverse directly:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "np.linalg.pinv(X_b).dot(y)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note**: the first releases of the book implied that the `LinearRegression` class was based on the Normal Equation. This was an error, my apologies: as explained above, it is based on the pseudoinverse, which ultimately relies on the SVD matrix decomposition of $\\mathbf{X}$ (see chapter 8 for details about the SVD decomposition). 
Its time complexity is $O(n^2)$ and it works even when $m < n$ or when some features are linear combinations of other features (in these cases, $\\mathbf{X}^T \\mathbf{X}$ is not invertible so the Normal Equation fails), see [issue #184](https://github.com/ageron/handson-ml/issues/184) for more details. However, this does not change the rest of the description of the `LinearRegression` class, in particular, it is based on an analytical solution, it does not scale well with the number of features, it scales linearly with the number of instances, all the data must fit in memory, it does not require feature scaling and the order of the instances in the training set does not matter." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 경사 하강법을 사용한 선형 회귀" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "eta = 0.1\n", - "n_iterations = 1000\n", - "m = 100\n", - "theta = np.random.randn(2,1)\n", - "\n", - "for iteration in range(n_iterations):\n", - " gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)\n", - " theta = theta - eta * gradients" + "이 함수는 $\\mathbf{X}^+\\mathbf{y}$을 계산합니다. $\\mathbf{X}^{+}$는 $\\mathbf{X}$의 _유사역행렬_(pseudoinverse)입니다(Moore–Penrose 유사역행렬입니다). 
`np.linalg.pinv()`을 사용해서 유사역행렬을 직접 계산할 수 있습니다:" ] }, { @@ -342,13 +315,57 @@ } ], "source": [ - "theta" + "np.linalg.pinv(X_b).dot(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 경사 하강법을 사용한 선형 회귀" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, + "outputs": [], + "source": [ + "eta = 0.1\n", + "n_iterations = 1000\n", + "m = 100\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for iteration in range(n_iterations):\n", + " gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)\n", + " theta = theta - eta * gradients" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[4.21509616],\n", + " [2.77011339]])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "theta" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, "outputs": [ { "data": { @@ -357,7 +374,7 @@ " [9.75532293]])" ] }, - "execution_count": 13, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -368,7 +385,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -394,7 +411,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -431,7 +448,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -442,7 +459,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 19, "metadata": {}, "outputs": [ { @@ -489,7 +506,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 20, "metadata": {}, "outputs": [ { @@ -499,7 +516,7 @@ " [2.74856079]])" ] }, - "execution_count": 18, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -510,7 +527,7 @@ }, { "cell_type": "code", - "execution_count": 19, + 
"execution_count": 21, "metadata": {}, "outputs": [ { @@ -523,7 +540,7 @@ " warm_start=False)" ] }, - "execution_count": 19, + "execution_count": 21, "metadata": {}, "output_type": "execute_result" } @@ -536,7 +553,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 22, "metadata": {}, "outputs": [ { @@ -545,7 +562,7 @@ "(array([4.16782089]), array([2.72603052]))" ] }, - "execution_count": 20, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -563,7 +580,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "metadata": {}, "outputs": [], "source": [ @@ -596,7 +613,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 24, "metadata": {}, "outputs": [ { @@ -606,7 +623,7 @@ " [2.7896408 ]])" ] }, - "execution_count": 22, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } @@ -617,7 +634,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 25, "metadata": {}, "outputs": [], "source": [ @@ -628,7 +645,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 26, "metadata": {}, "outputs": [ { @@ -664,7 +681,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 27, "metadata": {}, "outputs": [], "source": [ @@ -676,7 +693,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 28, "metadata": {}, "outputs": [], "source": [ @@ -687,7 +704,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 29, "metadata": {}, "outputs": [ { @@ -712,7 +729,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 30, "metadata": {}, "outputs": [ { @@ -721,7 +738,7 @@ "array([-0.75275929])" ] }, - "execution_count": 28, + "execution_count": 30, "metadata": {}, "output_type": "execute_result" } @@ -735,7 +752,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 31, "metadata": {}, "outputs": [ { @@ -744,7 +761,7 @@ 
"array([-0.75275929, 0.56664654])" ] }, - "execution_count": 29, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -755,7 +772,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -764,7 +781,7 @@ "(array([1.78134581]), array([[0.93366893, 0.56456263]]))" ] }, - "execution_count": 30, + "execution_count": 32, "metadata": {}, "output_type": "execute_result" } @@ -777,7 +794,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 33, "metadata": {}, "outputs": [ { @@ -807,7 +824,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 34, "metadata": {}, "outputs": [ { @@ -849,7 +866,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 35, "metadata": {}, "outputs": [], "source": [ @@ -875,7 +892,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 36, "metadata": {}, "outputs": [ { @@ -899,7 +916,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 37, "metadata": {}, "outputs": [ { @@ -936,7 +953,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 38, "metadata": {}, "outputs": [ { @@ -990,7 +1007,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 39, "metadata": {}, "outputs": [ { @@ -999,7 +1016,7 @@ "array([[1.55071465]])" ] }, - "execution_count": 37, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -1013,7 +1030,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 40, "metadata": {}, "outputs": [ { @@ -1022,7 +1039,7 @@ "array([1.13500145])" ] }, - "execution_count": 38, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -1035,7 +1052,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 41, "metadata": {}, "outputs": [ { @@ -1044,7 +1061,7 @@ "array([[1.5507201]])" ] }, - "execution_count": 39, + 
"execution_count": 41, "metadata": {}, "output_type": "execute_result" } @@ -1057,7 +1074,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 42, "metadata": {}, "outputs": [ { @@ -1087,7 +1104,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 43, "metadata": {}, "outputs": [ { @@ -1096,7 +1113,7 @@ "array([1.53788174])" ] }, - "execution_count": 41, + "execution_count": 43, "metadata": {}, "output_type": "execute_result" } @@ -1110,7 +1127,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 44, "metadata": {}, "outputs": [ { @@ -1119,7 +1136,7 @@ "array([1.54333232])" ] }, - "execution_count": 42, + "execution_count": 44, "metadata": {}, "output_type": "execute_result" } @@ -1133,9 +1150,9 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 45, "metadata": { - "scrolled": true + "scrolled": false }, "outputs": [ { @@ -1205,7 +1222,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 46, "metadata": {}, "outputs": [], "source": [ @@ -1217,9 +1234,9 @@ "best_epoch = None\n", "best_model = None\n", "for epoch in range(1000):\n", - " sgd_reg.fit(X_train_poly_scaled, y_train) # continues where it left off\n", + " sgd_reg.fit(X_train_poly_scaled, y_train) # 이어서 학습합니다\n", " y_val_predict = sgd_reg.predict(X_val_poly_scaled)\n", - " val_error = mean_squared_error(y_val_predict, y_val)\n", + " val_error = mean_squared_error(y_val, y_val_predict)\n", " if val_error < minimum_val_error:\n", " minimum_val_error = val_error\n", " best_epoch = epoch\n", @@ -1228,7 +1245,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 47, "metadata": {}, "outputs": [ { @@ -1241,7 +1258,7 @@ " warm_start=True))" ] }, - "execution_count": 45, + "execution_count": 47, "metadata": {}, "output_type": "execute_result" } @@ -1252,7 +1269,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 48, "metadata": {}, "outputs": [], 
"source": [ @@ -1263,7 +1280,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 49, "metadata": {}, "outputs": [], "source": [ @@ -1290,7 +1307,7 @@ }, { "cell_type": "code", - "execution_count": 113, + "execution_count": 50, "metadata": {}, "outputs": [ { @@ -1367,7 +1384,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 51, "metadata": {}, "outputs": [ { @@ -1399,16 +1416,16 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "['data', 'target', 'DESCR', 'target_names', 'feature_names']" + "['data', 'feature_names', 'target', 'DESCR', 'target_names']" ] }, - "execution_count": 50, + "execution_count": 52, "metadata": {}, "output_type": "execute_result" } @@ -1421,7 +1438,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 53, "metadata": {}, "outputs": [ { @@ -1500,7 +1517,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 54, "metadata": {}, "outputs": [], "source": [ @@ -1510,7 +1527,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 55, "metadata": {}, "outputs": [ { @@ -1522,7 +1539,7 @@ " verbose=0, warm_start=False)" ] }, - "execution_count": 53, + "execution_count": 55, "metadata": {}, "output_type": "execute_result" } @@ -1535,7 +1552,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 56, "metadata": {}, "outputs": [ { @@ -1573,7 +1590,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 57, "metadata": {}, "outputs": [ { @@ -1582,7 +1599,7 @@ "array([1.61561562])" ] }, - "execution_count": 55, + "execution_count": 57, "metadata": {}, "output_type": "execute_result" } @@ -1593,7 +1610,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 58, "metadata": {}, "outputs": [ { @@ -1602,7 +1619,7 @@ "array([1, 0])" ] }, - "execution_count": 56, + "execution_count": 58, 
"metadata": {}, "output_type": "execute_result" } @@ -1613,7 +1630,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 59, "metadata": {}, "outputs": [ { @@ -1668,7 +1685,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 60, "metadata": {}, "outputs": [ { @@ -1680,7 +1697,7 @@ " tol=0.0001, verbose=0, warm_start=False)" ] }, - "execution_count": 58, + "execution_count": 60, "metadata": {}, "output_type": "execute_result" } @@ -1695,7 +1712,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 61, "metadata": {}, "outputs": [ { @@ -1744,7 +1761,7 @@ }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 62, "metadata": {}, "outputs": [ { @@ -1753,7 +1770,7 @@ "array([2])" ] }, - "execution_count": 60, + "execution_count": 62, "metadata": {}, "output_type": "execute_result" } @@ -1764,7 +1781,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 63, "metadata": {}, "outputs": [ { @@ -1773,7 +1790,7 @@ "array([[6.33134078e-07, 5.75276067e-02, 9.42471760e-01]])" ] }, - "execution_count": 61, + "execution_count": 63, "metadata": {}, "output_type": "execute_result" } @@ -1815,12 +1832,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's start by loading the data. We will just reuse the Iris dataset we loaded earlier." + "먼저 데이터를 로드합니다. 앞서 사용했던 Iris 데이터셋을 재사용하겠습니다." 
] }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 64, "metadata": {}, "outputs": [], "source": [ @@ -1832,12 +1849,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We need to add the bias term for every instance ($x_0 = 1$):" + "모든 샘플에 편향을 추가합니다 ($x_0 = 1$):" ] }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 65, "metadata": {}, "outputs": [], "source": [ @@ -1848,12 +1865,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And let's set the random seed so the output of this exercise solution is reproducible:" + "결과를 일정하게 유지하기 위해 랜덤 시드를 지정합니다:" ] }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 66, "metadata": {}, "outputs": [], "source": [ @@ -1864,12 +1881,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The easiest option to split the dataset into a training set, a validation set and a test set would be to use Scikit-Learn's `train_test_split()` function, but the point of this exercise is to try understand the algorithms by implementing them manually. So here is one possible implementation:" + "데이터셋을 훈련 세트, 검증 세트, 테스트 세트로 나누는 가장 쉬운 방법은 사이킷런의 `train_test_split()` 함수를 사용하는 것입니다. 하지만 이 연습문제의 목적은 알고리즘을 직접 만들어 보면서 이해하는 것이므로, 가능한 한 가지 방법은 다음과 같습니다:" ] }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 67, "metadata": {}, "outputs": [], "source": [ @@ -1895,12 +1912,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The targets are currently class indices (0, 1 or 2), but we need target class probabilities to train the Softmax Regression model. Each instance will have target class probabilities equal to 0.0 for all classes except for the target class which will have a probability of 1.0 (in other words, the vector of class probabilities for ay given instance is a one-hot vector). 
Let's write a small function to convert the vector of class indices into a matrix containing a one-hot vector for each instance:" + "타깃은 클래스 인덱스(0, 1 그리고 2)이지만 소프트맥스 회귀 모델을 훈련시키기 위해 필요한 것은 타깃 클래스의 확률입니다. 각 샘플에서 확률이 1인 타깃 클래스를 제외한 다른 클래스의 확률은 0입니다(다른 말로하면 주어진 샘플에 대한 클래스 확률이 원-핫 벡터입니다). 클래스 인덱스를 원-핫 벡터로 바꾸는 간단한 함수를 작성하겠습니다:" ] }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 68, "metadata": {}, "outputs": [], "source": [ @@ -1916,12 +1933,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's test this function on the first 10 instances:" + "10개 샘플만 넣어 이 함수를 테스트해 보죠:" ] }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 69, "metadata": {}, "outputs": [ { @@ -1930,7 +1947,7 @@ "array([0, 1, 2, 1, 1, 0, 1, 1, 1, 0])" ] }, - "execution_count": 67, + "execution_count": 69, "metadata": {}, "output_type": "execute_result" } @@ -1941,7 +1958,7 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 70, "metadata": {}, "outputs": [ { @@ -1959,7 +1976,7 @@ " [1., 0., 0.]])" ] }, - "execution_count": 68, + "execution_count": 70, "metadata": {}, "output_type": "execute_result" } @@ -1972,12 +1989,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Looks good, so let's create the target class probabilities matrix for the training set and the test set:" + "잘 되네요, 이제 훈련 세트와 테스트 세트의 타깃 클래스 확률을 담은 행렬을 만들겠습니다:" ] }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 71, "metadata": {}, "outputs": [], "source": [ @@ -1990,14 +2007,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's implement the Softmax function. Recall that it is defined by the following equation:\n", + "이제 소프트맥스 함수를 만듭니다. 
다음 공식을 참고하세요:\n", "\n", "$\\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}$" ] }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 72, "metadata": {}, "outputs": [], "source": [ @@ -2011,12 +2028,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We are almost ready to start training. Let's define the number of inputs and outputs:" + "훈련을 위한 준비를 거의 마쳤습니다. 입력과 출력의 개수를 정의합니다:" ] }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 73, "metadata": {}, "outputs": [], "source": [ @@ -2028,23 +2045,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now here comes the hardest part: training! Theoretically, it's simple: it's just a matter of translating the math equations into Python code. But in practice, it can be quite tricky: in particular, it's easy to mix up the order of the terms, or the indices. You can even end up with code that looks like it's working but is actually not computing exactly the right thing. When unsure, you should write down the shape of each term in the equation and make sure the corresponding terms in your code match closely. It can also help to evaluate each term independently and print them out. The good news it that you won't have to do this everyday, since all this is well implemented by Scikit-Learn, but it will help you understand what's going on under the hood.\n", + "이제 좀 복잡한 훈련 파트입니다! 이론적으로는 간단합니다. 그냥 수학 공식을 파이썬 코드로 바꾸기만 하면 됩니다. 하지만 실제로는 꽤 까다로운 면이 있습니다. 특히, 항과 인덱스가 뒤섞이기 쉽습니다. 겉으로는 제대로 작동하는 것처럼 보이지만 실제로는 정확한 값을 계산하지 못하는 코드가 될 수도 있습니다. 확실하지 않을 때는 각 항의 크기를 기록하고 이에 상응하는 코드가 같은 크기를 만드는지 확인합니다. 각 항을 독립적으로 평가해서 출력해 보는 것도 좋습니다. 사실 사이킷런에 이미 잘 구현되어 있기 때문에 이렇게 할 필요는 없습니다. 
하지만 직접 만들어 보면 어떻게 작동하는지 이해하는 데 도움이 됩니다.\n", "\n", - "So the equations we will need are the cost function:\n", + "구현할 공식은 비용 함수입니다:\n", "\n", "$J(\\mathbf{\\Theta}) =\n", "- \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}$\n", "\n", - "And the equation for the gradients:\n", + "그리고 그래디언트 공식입니다:\n", "\n", "$\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}$\n", "\n", - "Note that $\\log\\left(\\hat{p}_k^{(i)}\\right)$ may not be computable if $\\hat{p}_k^{(i)} = 0$. So we will add a tiny value $\\epsilon$ to $\\log\\left(\\hat{p}_k^{(i)}\\right)$ to avoid getting `nan` values." + "$\\hat{p}_k^{(i)} = 0$이면 $\\log\\left(\\hat{p}_k^{(i)}\\right)$를 계산할 수 없습니다. `nan` 값을 피하기 위해 $\\log\\left(\\hat{p}_k^{(i)}\\right)$에 아주 작은 값 $\\epsilon$을 추가하겠습니다." ] }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 74, "metadata": {}, "outputs": [ { @@ -2088,12 +2105,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And that's it! The Softmax model is trained. Let's look at the model parameters:" + "바로 이겁니다! 소프트맥스 모델을 훈련시켰습니다. 
모델 파라미터를 확인해 보겠습니다:" ] }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 75, "metadata": {}, "outputs": [ { @@ -2104,7 +2121,7 @@ " [-0.72087779, -0.083875 , 1.48587045]])" ] }, - "execution_count": 73, + "execution_count": 75, "metadata": {}, "output_type": "execute_result" } @@ -2117,12 +2134,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's make predictions for the validation set and check the accuracy score:" + "검증 세트에 대한 예측과 정확도를 확인해 보겠습니다:" ] }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 76, "metadata": {}, "outputs": [ { @@ -2131,7 +2148,7 @@ "0.9666666666666667" ] }, - "execution_count": 74, + "execution_count": 76, "metadata": {}, "output_type": "execute_result" } @@ -2149,12 +2166,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Well, this model looks pretty good. For the sake of the exercise, let's add a bit of $\\ell_2$ regularization. The following training code is similar to the one above, but the loss now has an additional $\\ell_2$ penalty, and the gradients have the proper additional term (note that we don't regularize the first element of `Theta` since this corresponds to the bias term). Also, let's try increasing the learning rate `eta`." + "와우, 이 모델이 매우 잘 작동하는 것 같습니다. 연습을 위해서 $\\ell_2$ 규제를 조금 추가해 보겠습니다. 다음 코드는 위와 거의 동일하지만 손실에 $\\ell_2$ 페널티가 추가되었고 그래디언트에도 항이 추가되었습니다(`Theta`의 첫 번째 원소는 편향이므로 규제하지 않습니다). 학습률 `eta`도 증가시켜 보겠습니다." ] }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 77, "metadata": {}, "outputs": [ { @@ -2201,12 +2218,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Because of the additional $\\ell_2$ penalty, the loss seems greater than earlier, but perhaps this model will perform better? Let's find out:" + "추가된 $\\ell_2$ 페널티 때문에 이전보다 손실이 조금 커보이지만 더 잘 작동하는 모델이 되었을까요? 
확인해 보죠:" ] }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 78, "metadata": {}, "outputs": [ { @@ -2215,7 +2232,7 @@ "1.0" ] }, - "execution_count": 76, + "execution_count": 78, "metadata": {}, "output_type": "execute_result" } @@ -2233,19 +2250,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Cool, perfect accuracy! We probably just got lucky with this validation set, but still, it's pleasant." + "와우, 완벽한 정확도네요! 운이 좋은 검증 세트일지 모르지만 잘 된 것은 맞습니다." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's add early stopping. For this we just need to measure the loss on the validation set at every iteration and stop when the error starts growing." + "이제 조기 종료를 추가해 보죠. 이렇게 하려면 매 반복에서 검증 세트에 대한 손실을 계산해서 오차가 증가하기 시작할 때 멈춰야 합니다." ] }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 79, "metadata": {}, "outputs": [ { @@ -2300,7 +2317,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 80, "metadata": {}, "outputs": [ { @@ -2309,7 +2326,7 @@ "1.0" ] }, - "execution_count": 78, + "execution_count": 80, "metadata": {}, "output_type": "execute_result" } @@ -2327,20 +2344,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Still perfect, but faster." + "그래도 완벽하고 더 빠릅니다." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's plot the model's predictions on the whole dataset:" + "이제 전체 데이터셋에 대한 모델의 예측을 그래프로 나타내 보겠습니다:" ] }, { "cell_type": "code", - "execution_count": 79, - "metadata": {}, + "execution_count": 81, + "metadata": { + "scrolled": true + }, "outputs": [ { "data": { @@ -2390,12 +2409,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "And now let's measure the final model's accuracy on the test set:" + "이제 테스트 세트에 대한 모델의 최종 정확도를 측정해 보겠습니다:" ] }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 82, "metadata": {}, "outputs": [ { @@ -2404,7 +2423,7 @@ "0.9333333333333333" ] }, - "execution_count": 80, + "execution_count": 82, "metadata": {}, "output_type": "execute_result" } @@ -2422,7 +2441,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Our perfect model turns out to have slight imperfections. This variability is likely due to the very small size of the dataset: depending on how you sample the training set, validation set and the test set, you can get quite different results. Try changing the random seed and running the code again a few times, you will see that the results will vary." + "완벽했던 최종 모델의 성능이 조금 떨어졌습니다. 이런 차이는 데이터셋이 작기 때문일 것입니다. 훈련 세트와 검증 세트, 테스트 세트를 어떻게 샘플링했는지에 따라 매우 다른 결과를 얻을 수 있습니다. 몇 번 랜덤 시드를 바꾸고 이 코드를 다시 실행해 보면 결과가 달라지는 것을 확인할 수 있습니다." 
] } ], diff --git a/images/training_linear_models/early_stopping_plot.png b/images/training_linear_models/early_stopping_plot.png index b4eca59..8c87ef7 100644 Binary files a/images/training_linear_models/early_stopping_plot.png and b/images/training_linear_models/early_stopping_plot.png differ diff --git a/images/training_linear_models/generated_data_plot.png b/images/training_linear_models/generated_data_plot.png index 4abf6ee..aba57ef 100644 Binary files a/images/training_linear_models/generated_data_plot.png and b/images/training_linear_models/generated_data_plot.png differ diff --git a/images/training_linear_models/gradient_descent_paths_plot.png b/images/training_linear_models/gradient_descent_paths_plot.png index 9bb303f..f3f7742 100644 Binary files a/images/training_linear_models/gradient_descent_paths_plot.png and b/images/training_linear_models/gradient_descent_paths_plot.png differ diff --git a/images/training_linear_models/gradient_descent_plot.png b/images/training_linear_models/gradient_descent_plot.png index 8065ed0..6286f2c 100644 Binary files a/images/training_linear_models/gradient_descent_plot.png and b/images/training_linear_models/gradient_descent_plot.png differ diff --git a/images/training_linear_models/high_degree_polynomials_plot.png b/images/training_linear_models/high_degree_polynomials_plot.png index 095e990..6192fe6 100644 Binary files a/images/training_linear_models/high_degree_polynomials_plot.png and b/images/training_linear_models/high_degree_polynomials_plot.png differ diff --git a/images/training_linear_models/lasso_regression_plot.png b/images/training_linear_models/lasso_regression_plot.png index d87efef..79cc0e0 100644 Binary files a/images/training_linear_models/lasso_regression_plot.png and b/images/training_linear_models/lasso_regression_plot.png differ diff --git a/images/training_linear_models/lasso_vs_ridge_plot.png b/images/training_linear_models/lasso_vs_ridge_plot.png index ccab481..76cb7e3 100644 Binary files 
a/images/training_linear_models/lasso_vs_ridge_plot.png and b/images/training_linear_models/lasso_vs_ridge_plot.png differ diff --git a/images/training_linear_models/learning_curves_plot.png b/images/training_linear_models/learning_curves_plot.png index c0a4143..023a5c0 100644 Binary files a/images/training_linear_models/learning_curves_plot.png and b/images/training_linear_models/learning_curves_plot.png differ diff --git a/images/training_linear_models/linear_model_predictions.png b/images/training_linear_models/linear_model_predictions.png index 99a9e05..e9fe435 100644 Binary files a/images/training_linear_models/linear_model_predictions.png and b/images/training_linear_models/linear_model_predictions.png differ diff --git a/images/training_linear_models/logistic_function_plot.png b/images/training_linear_models/logistic_function_plot.png index 6ab9250..fe11dda 100644 Binary files a/images/training_linear_models/logistic_function_plot.png and b/images/training_linear_models/logistic_function_plot.png differ diff --git a/images/training_linear_models/logistic_regression_contour_plot.png b/images/training_linear_models/logistic_regression_contour_plot.png index 7de80f9..b42c78f 100644 Binary files a/images/training_linear_models/logistic_regression_contour_plot.png and b/images/training_linear_models/logistic_regression_contour_plot.png differ diff --git a/images/training_linear_models/logistic_regression_plot.png b/images/training_linear_models/logistic_regression_plot.png index 5c72f74..9cce0b3 100644 Binary files a/images/training_linear_models/logistic_regression_plot.png and b/images/training_linear_models/logistic_regression_plot.png differ diff --git a/images/training_linear_models/quadratic_data_plot.png b/images/training_linear_models/quadratic_data_plot.png index 2cb073e..9f07416 100644 Binary files a/images/training_linear_models/quadratic_data_plot.png and b/images/training_linear_models/quadratic_data_plot.png differ diff --git 
a/images/training_linear_models/quadratic_predictions_plot.png b/images/training_linear_models/quadratic_predictions_plot.png index 81bcc64..5b6cacc 100644 Binary files a/images/training_linear_models/quadratic_predictions_plot.png and b/images/training_linear_models/quadratic_predictions_plot.png differ diff --git a/images/training_linear_models/ridge_regression_plot.png b/images/training_linear_models/ridge_regression_plot.png index 763fb1e..cf16c56 100644 Binary files a/images/training_linear_models/ridge_regression_plot.png and b/images/training_linear_models/ridge_regression_plot.png differ diff --git a/images/training_linear_models/sgd_plot.png b/images/training_linear_models/sgd_plot.png index 33720a9..5c42302 100644 Binary files a/images/training_linear_models/sgd_plot.png and b/images/training_linear_models/sgd_plot.png differ diff --git a/images/training_linear_models/softmax_regression_contour_plot.png b/images/training_linear_models/softmax_regression_contour_plot.png index 395dd15..178c978 100644 Binary files a/images/training_linear_models/softmax_regression_contour_plot.png and b/images/training_linear_models/softmax_regression_contour_plot.png differ diff --git a/images/training_linear_models/underfitting_learning_curves_plot.png b/images/training_linear_models/underfitting_learning_curves_plot.png index c5f9c1f..8afe48f 100644 Binary files a/images/training_linear_models/underfitting_learning_curves_plot.png and b/images/training_linear_models/underfitting_learning_curves_plot.png differ
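The renumbered exercise cells in this patch build softmax regression from scratch (one-hot targets, a softmax function, and batch gradient descent on the cross-entropy loss). As a standalone sanity check of those pieces, here is a minimal sketch outside the notebook; the function names (`to_one_hot`, `train_softmax`) and hyperparameter defaults are illustrative, not taken from the notebook, and `X` is assumed to already carry the bias column ($x_0 = 1$):

```python
import numpy as np

def to_one_hot(y, n_classes):
    """Turn a vector of class indices into a matrix of one-hot rows."""
    Y = np.zeros((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0
    return Y

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, eta=0.1, n_iterations=5000, seed=42):
    """Batch gradient descent on the cross-entropy loss.

    Per-class gradient: (1/m) * sum_i (p_k^(i) - y_k^(i)) x^(i),
    the same equation the notebook states in the exercise solution.
    """
    m, n = X.shape
    Y = to_one_hot(y, n_classes)
    Theta = np.random.default_rng(seed).standard_normal((n, n_classes))
    for _ in range(n_iterations):
        P = softmax(X @ Theta)                 # predicted class probabilities
        Theta -= eta * (1 / m) * X.T @ (P - Y)  # gradient step
    return Theta
```

Predictions are then `softmax(X @ Theta).argmax(axis=1)`; the early-stopping variant in the patched cells would additionally evaluate the validation loss every iteration and keep the `Theta` with the lowest value.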