Learning Curves in Machine Learning


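Part 1: Basic Concepts

A learning curve is an important tool for evaluating model performance in machine learning. It gives an intuitive view of how a model learns and helps diagnose its state, for example whether it is underfitting or overfitting. A learning curve usually shows how the training error and the validation (test) error change as the size of the training set grows.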
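As a rough illustration of the definition above, scikit-learn's `learning_curve` helper can compute training and validation scores at several training-set sizes. This is a minimal sketch rather than the approach used later in this article: the toy data, the seed, the degree-2 pipeline, and the choice of five sizes with 5-fold cross-validation are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data: a noisy quadratic, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-4, 2, size=(100, 1))
y = X[:, 0] ** 2 + 4 * X[:, 0] + 3 + 2 * rng.standard_normal(100)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# learning_curve refits the model on growing subsets of each CV training fold.
sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign to get errors, then average over folds.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_mse, val_mse):
    print(f"n={n:3d}  train MSE={tr:6.2f}  validation MSE={va:6.2f}")
```

Each row of `train_scores` and `val_scores` corresponds to one training-set size and each column to one cross-validation fold, which is why the sketch averages along `axis=1`.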

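When the model fits well, the training and test losses typically behave as follows as the number of training samples increases: the training loss rises from near zero, the test loss falls, and the two level off close to each other.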
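The behaviour described above can be checked numerically with NumPy alone. The sketch below is an illustration under stated assumptions: it uses a noisy quadratic similar to the dataset built later in this article, a simple 70/30 hold-out split by position, and `np.polyfit` with degree 2 in place of scikit-learn. Because of noise, the printed curves are usually not perfectly monotonic.

```python
import numpy as np

# Assumed toy data: noisy quadratic, seed chosen arbitrarily.
rng = np.random.default_rng(233)
x = rng.uniform(-4, 2, size=100)
y = x ** 2 + 4 * x + 3 + 2 * rng.standard_normal(100)

# Simple 70/30 hold-out split (the samples are already in random order).
x_train, y_train = x[:70], y[:70]
x_val, y_val = x[70:], y[70:]

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial `coeffs` on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

sizes = list(range(5, 71, 5))
train_mse, val_mse = [], []
for n in sizes:
    # Fit a degree-2 polynomial on the first n training samples only.
    coeffs = np.polyfit(x_train[:n], y_train[:n], deg=2)
    train_mse.append(mse(coeffs, x_train[:n], y_train[:n]))
    val_mse.append(mse(coeffs, x_val, y_val))

for n, tr, va in zip(sizes, train_mse, val_mse):
    print(f"n={n:2d}  train MSE={tr:6.2f}  validation MSE={va:6.2f}")
```

With a noise standard deviation of 2, both errors would be expected to settle somewhere around the noise variance of 4 once enough samples are used.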

Part 2: Code Implementation

(1) Imports

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

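(2) Create the dataset

# Step 2: generate the dataset
np.random.seed(233)
x = np.random.uniform(-4, 2, size=100)
y = x ** 2 + 4 * x + 3 + 2 * np.random.randn(100)  # quadratic plus Gaussian noise
x = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
# plt.scatter(x, y)
# plt.show()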

(3) Split the dataset

# Step 3: split the dataset (70% train, 30% test)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=233)

(4) Plot the learning curves

# Step 4: plot the learning curves
plt.rcParams["figure.figsize"] = (12, 8)
degrees = [1, 2, 5, 20]
for i, degree in enumerate(degrees):
    polynomial_features = PolynomialFeatures(degree=degree)
    # Fit the feature expansion on the training data, then reuse it on the test data
    X_poly_train = polynomial_features.fit_transform(x_train)
    X_poly_test = polynomial_features.transform(x_test)
    # score() returns R^2 (higher is better), so these lists hold scores, not errors
    train_score, test_score = [], []
    sizes = list(range(2, len(x_train) + 1))  # train on the first n samples
    for n in sizes:
        linear_regression = LinearRegression()
        linear_regression.fit(X_poly_train[:n], y_train[:n])
        train_score.append(linear_regression.score(X_poly_train[:n], y_train[:n]))
        test_score.append(linear_regression.score(X_poly_test, y_test))
    plt.subplot(2, 2, i + 1)
    plt.title("Degree: {0}".format(degree))
    plt.ylim(-1, 1)
    plt.plot(sizes, train_score, color="red", label="train")
    plt.plot(sizes, test_score, color="blue", label="test")
    plt.legend()  # show which line is train and which is test
plt.tight_layout()  # adjust subplot spacing automatically
plt.show()
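One detail worth noting about the plotting loop above: the value it records comes from `score`, which for `LinearRegression` is the coefficient of determination R² (higher is better, at most 1), not an error. The sketch below, on assumed toy data with an arbitrary seed, compares `score` with `r2_score` and with the mean squared error to make the distinction concrete.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assumed toy data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-4, 2, size=(50, 1))
y = X[:, 0] ** 2 + 4 * X[:, 0] + 3 + 2 * rng.standard_normal(50)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

r2_from_score = model.score(X, y)   # what a call to score() records
r2_explicit = r2_score(y, pred)     # the same quantity, computed explicitly
mse = mean_squared_error(y, pred)   # an actual error: lower is better

print(f"score() = {r2_from_score:.4f}, r2_score = {r2_explicit:.4f}, MSE = {mse:.4f}")
```

On a single dataset the two are linked by R² = 1 - MSE / Var(y), so a curve of R² against training-set size is a mirrored, rescaled view of the corresponding error curve.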

(5) Complete code (run in PyCharm)

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Step 2: generate the dataset
np.random.seed(233)
x = np.random.uniform(-4, 2, size=100)
y = x ** 2 + 4 * x + 3 + 2 * np.random.randn(100)  # quadratic plus Gaussian noise
x = x.reshape(-1, 1)
plt.scatter(x, y)
plt.show()  # show the scatter plot in its own figure before drawing the subplots

# Step 3: split the dataset (70% train, 30% test)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=233)

# Step 4: plot the learning curves
plt.rcParams["figure.figsize"] = (12, 8)
degrees = [1, 2, 5, 20]
for i, degree in enumerate(degrees):
    polynomial_features = PolynomialFeatures(degree=degree)
    # Fit the feature expansion on the training data, then reuse it on the test data
    X_poly_train = polynomial_features.fit_transform(x_train)
    X_poly_test = polynomial_features.transform(x_test)
    # score() returns R^2 (higher is better), so these lists hold scores, not errors
    train_score, test_score = [], []
    sizes = list(range(2, len(x_train) + 1))  # train on the first n samples
    for n in sizes:
        linear_regression = LinearRegression()
        linear_regression.fit(X_poly_train[:n], y_train[:n])
        train_score.append(linear_regression.score(X_poly_train[:n], y_train[:n]))
        test_score.append(linear_regression.score(X_poly_test, y_test))
    plt.subplot(2, 2, i + 1)
    plt.title("Degree: {0}".format(degree))
    plt.ylim(-1, 1)
    plt.plot(sizes, train_score, color="red", label="train")
    plt.plot(sizes, test_score, color="blue", label="test")
    plt.legend()  # show which line is train and which is test
plt.tight_layout()  # adjust subplot spacing automatically
plt.show()

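Looking at the plots produced by the code above:

When degree=1, the training-set curve (drawn in red by the code above, with the test set in blue) does not rise steadily as samples are added. When degree=5 and degree=20, the curves fluctuate heavily. Only degree=2 gives a good result, which matches the quadratic nature of the dataset itself (y = x ** 2 + 4 * x + 3 + 2 * np.random.randn(100)).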
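The visual comparison of degrees can be complemented with a single number per degree. The following is a sketch, not part of the original experiment: it rebuilds the same dataset and uses 5-fold cross-validation (`cross_val_score`, whose default score for a regressor is R²) as an assumed way of summarising how well each degree generalises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data-generating recipe as in step (2).
np.random.seed(233)
x = np.random.uniform(-4, 2, size=100)
y = x ** 2 + 4 * x + 3 + 2 * np.random.randn(100)
X = x.reshape(-1, 1)

cv_r2 = {}
for degree in [1, 2, 5, 20]:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # cv=5 yields five held-out R^2 scores; their mean summarises the degree.
    scores = cross_val_score(model, X, y, cv=5)
    cv_r2[degree] = scores.mean()
    print(f"degree={degree:2d}  mean CV R^2 = {cv_r2[degree]:.3f}")
```

A degree whose mean held-out R² is clearly below the others is a candidate for underfitting (too low a degree) or overfitting (too high a degree), which is the same diagnosis the learning curves give graphically.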
