Regression: Predicting Fuel Efficiency

Published: 2024/10/8

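In a regression problem, the goal is to predict a continuous output value, such as a price or a probability. This contrasts with a classification problem, where the goal is to select one class from a list of classes (for example, given a picture that contains an apple or an orange, recognizing which fruit is in the picture).

This notebook uses the classic [auto-mpg](https://archive.ics.uci.edu/ml/datasets/auto+mpg) dataset and builds a model to predict the fuel efficiency of cars from the late 1970s and early 1980s. To do this, we provide the model with descriptions of many cars from that period. Each description includes attributes such as cylinders, displacement, horsepower, and weight.

This example uses the tf.keras API; see [this guide](https://www.tensorflow.org/guide/keras) for details.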

    import pathlib

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    import keras
    from keras import layers

    %matplotlib inline

The Auto MPG dataset

The dataset is available from the UCI Machine Learning Repository.

Get the data

First download the dataset.

    dataset_path = keras.utils.get_file(
        "auto-mpg.data",
        "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
    dataset_path

Output:

    'C:\\Users\\YIUYE\\.keras\\datasets\\auto-mpg.data'

Import it using pandas

    column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                    'Acceleration', 'Model Year', 'Origin']
    # "?" marks a missing value in the raw file; a tab starts a trailing comment.
    raw_dataset = pd.read_csv(dataset_path, names=column_names,
                              na_values="?", comment='\t',
                              sep=" ", skipinitialspace=True)

    dataset = raw_dataset.copy()
    dataset.tail()

Output:

          MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin
    393  27.0          4         140.0        86.0  2790.0          15.6          82       1
    394  44.0          4          97.0        52.0  2130.0          24.6          82       2
    395  32.0          4         135.0        84.0  2295.0          11.6          82       1
    396  28.0          4         120.0        79.0  2625.0          18.6          82       1
    397  31.0          4         119.0        82.0  2720.0          19.4          82       1

Clean the data

The dataset contains a few unknown values.

    dataset.isnull().sum()

Output:

    MPG             0
    Cylinders       0
    Displacement    0
    Horsepower      6
    Weight          0
    Acceleration    0
    Model Year      0
    Origin          0
    dtype: int64

To keep this initial tutorial simple, drop those rows.

dataset = dataset.dropna()

The "Origin" column is really categorical, not numeric. So convert that to a one-hot:

    origin = dataset.pop('Origin')

    # One indicator column per region: 1 = USA, 2 = Europe, 3 = Japan.
    dataset['USA'] = (origin == 1)*1.0
    dataset['Europe'] = (origin == 2)*1.0
    dataset['Japan'] = (origin == 3)*1.0
    dataset.tail()

Output:

          MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  USA  Europe  Japan
    393  27.0          4         140.0        86.0  2790.0          15.6          82  1.0     0.0    0.0
    394  44.0          4          97.0        52.0  2130.0          24.6          82  0.0     1.0    0.0
    395  32.0          4         135.0        84.0  2295.0          11.6          82  1.0     0.0    0.0
    396  28.0          4         120.0        79.0  2625.0          18.6          82  1.0     0.0    0.0
    397  31.0          4         119.0        82.0  2720.0          19.4          82  1.0     0.0    0.0
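
As an alternative sketch, pandas' `get_dummies` can produce the same kind of one-hot columns. The toy frame and the `origin_names` dictionary below are assumptions made for illustration; only the 1/2/3 to USA/Europe/Japan mapping is taken from the code above.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only).
toy = pd.DataFrame({"MPG": [27.0, 44.0, 32.0], "Origin": [1, 2, 3]})

# Hypothetical helper mapping, following the column names used above.
origin_names = {1: "USA", 2: "Europe", 3: "Japan"}

# Map the integer codes to names, then expand into one column per name.
origin = toy.pop("Origin").map(origin_names)
one_hot = pd.get_dummies(origin, dtype=float)

# Reorder to match the tutorial's column order and attach to the frame.
toy = toy.join(one_hot[["USA", "Europe", "Japan"]])
print(toy)
```

Each row ends up with exactly one 1.0 among the three region columns, which is what makes the encoding "one-hot".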

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of the model.

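    # Randomly take 80% of the rows for training; the rest form the test set.
    train_dataset = dataset.sample(frac=0.8, random_state=0)
    test_dataset = dataset.drop(train_dataset.index)

    # Joint distribution of a few pairs of columns from the training set.
    sns.pairplot(train_dataset[["Cylinders", "Displacement", "Weight"]], diag_kind="kde")
    sns.set()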
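
The `sample` plus `drop` pattern above can be checked on a small frame. This is a minimal sketch with made-up data (the `toy` frame is an assumption), showing that the two parts are disjoint and together cover all rows.

```python
import pandas as pd

# Ten illustrative rows with the default integer index 0..9.
toy = pd.DataFrame({"x": range(10)})

# Sample 80% of the rows for training; random_state makes it reproducible.
train = toy.sample(frac=0.8, random_state=0)
# Every row whose index was not sampled goes to the test set.
test = toy.drop(train.index)

print(len(train), len(test))
# The intersection of the two index sets should be empty.
print(set(train.index) & set(test.index))
```

Dropping by `train.index` relies on the index being unique, which holds for the default integer index used here.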

Also look at the overall statistics:

    train_stats = train_dataset.describe()
    train_stats.pop("MPG")
    train_stats = train_stats.transpose()
    train_stats

Output:

                  count         mean         std     min      25%     50%      75%     max
    Cylinders     314.0     5.477707    1.699788     3.0     4.00     4.0     8.00     8.0
    Displacement  314.0   195.318471  104.331589    68.0   105.50   151.0   265.75   455.0
    Horsepower    314.0   104.869427   38.096214    46.0    76.25    94.5   128.00   225.0
    Weight        314.0  2990.251592  843.898596  1649.0  2256.50  2822.5  3608.00  5140.0
    Acceleration  314.0    15.559236    2.789230     8.0    13.80    15.5    17.20    24.8
    Model Year    314.0    75.898089    3.675642    70.0    73.00    76.0    79.00    82.0
    USA           314.0     0.624204    0.485101     0.0     0.00     1.0     1.00     1.0
    Europe        314.0     0.178344    0.383413     0.0     0.00     0.0     0.00     1.0
    Japan         314.0     0.197452    0.398712     0.0     0.00     0.0     0.00     1.0

Split features from labels

Separate the target value, or “label”, from the features. This label is the value that you will train the model to predict.

    train_labels = train_dataset.pop('MPG')
    test_labels = test_dataset.pop('MPG')

Normalize the data

Look again at the train_stats block above and note how different the ranges of each feature are.

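It is good practice to normalize features that use different scales and ranges. Although the model might converge without feature normalization, the lack of it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

Note: Although we intentionally generate these statistics from the training dataset only, they are also used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model was trained on.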
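
The point of the note can be seen on toy numbers. In this minimal sketch the `Weight` values are invented for illustration; the mean and standard deviation are computed from the training rows and then reused, unchanged, for the test row.

```python
import pandas as pd

# Illustrative values only, not taken from the real dataset.
train = pd.DataFrame({"Weight": [2000.0, 3000.0, 4000.0]})
test = pd.DataFrame({"Weight": [3500.0]})

# Statistics come from the training data only.
mean = train.mean()
std = train.std()  # pandas computes the sample standard deviation (ddof=1)

normed_train = (train - mean) / std
normed_test = (test - mean) / std  # reuse the training statistics

print(normed_train["Weight"].tolist())
print(normed_test["Weight"].tolist())
```

Computing separate statistics on the test set would place its values on a different scale from the one the model saw during training.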

    def norm(x):
        # Standardize each column with the training-set mean and std.
        return (x - train_stats['mean']) / train_stats['std']

    normed_train_data = norm(train_dataset)
    normed_test_data = norm(test_dataset)

    def build_model():
        # Two hidden layers of 64 units and a single linear output unit.
        # The activation is given by name because only keras is imported.
        model = keras.Sequential([
            layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
            layers.Dense(64, activation='relu'),
            layers.Dense(1)
        ])

        optimizer = keras.optimizers.RMSprop(0.001)

        model.compile(loss='mean_squared_error',
                      optimizer=optimizer,
                      metrics=['mean_absolute_error', 'mean_squared_error'])
        return model

    model = build_model()
    model.summary()

Output:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    dense_10 (Dense)             (None, 64)                640
    _________________________________________________________________
    dense_11 (Dense)             (None, 64)                4160
    _________________________________________________________________
    dense_12 (Dense)             (None, 1)                 65
    =================================================================
    Total params: 4,865
    Trainable params: 4,865
    Non-trainable params: 0
    _________________________________________________________________
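
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` below is hypothetical, introduced only for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 9            # the nine feature columns left after removing MPG
layer_units = [64, 64, 1]  # layer widths from build_model

counts = []
n_in = n_features
for units in layer_units:
    counts.append(dense_params(n_in, units))
    n_in = units  # this layer's output feeds the next layer

print(counts, sum(counts))
```

The three values match the Param # column (640, 4160, 65) and the total of 4,865.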

Now try out the model. Take a batch of 10 examples from the training data and call model.predict on it.

    example_batch = normed_train_data[:10]
    example_result = model.predict(example_batch)
    example_result

Output:

    array([[-0.03468257],
           [-0.01342154],
           [-0.15384783],
           [-0.18010283],
           [ 0.03922582],
           [-0.12172151],
           [ 0.10603201],
           [ 0.2442987 ],
           [ 0.00099315],
           [ 0.18530795]], dtype=float32)

It seems to be working, and it produces a result of the expected shape and type.

Train the model

Train the model for 1000 epochs, and record the training and validation loss and error metrics in the history object.

    # Display training progress by printing a single dot for each completed epoch
    class PrintDot(keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs):
            if epoch % 100 == 0:
                print('')
            print('.', end='')

    EPOCHS = 1000

    history = model.fit(
        normed_train_data, train_labels,
        epochs=EPOCHS, validation_split=0.2, verbose=0,
        callbacks=[PrintDot()])

Output: ten rows of 100 dots, one dot per completed epoch.

    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch
    hist.tail()

Output:

             loss  mean_absolute_error  mean_squared_error  val_loss  val_mean_absolute_error  val_mean_squared_error  epoch
    995  2.075518             0.940943            2.075518  8.913726                 2.351839                8.913726    995
    996  2.130111             0.953561            2.130111  9.769884                 2.438282                9.769884    996
    997  2.221040             0.951258            2.221040  9.664708                 2.382888                9.664708    997
    998  2.301870             0.980407            2.301870  9.934311                 2.425505                9.934311    998
    999  2.002580             0.887644            2.002580  9.484982                 2.414742                9.484982    999

    def plot_history(history):
        hist = pd.DataFrame(history.history)
        hist['epoch'] = history.epoch

        # Training vs. validation mean absolute error
        plt.figure()
        plt.xlabel('Epoch')
        plt.ylabel('Mean Abs Error [MPG]')
        plt.plot(hist['epoch'], hist['mean_absolute_error'],
                 label='Train Error')
        plt.plot(hist['epoch'], hist['val_mean_absolute_error'],
                 label='Val Error')
        plt.ylim([0, 5])
        plt.legend()

        # Training vs. validation mean squared error
        plt.figure()
        plt.xlabel('Epoch')
        plt.ylabel('Mean Square Error [$MPG^2$]')
        plt.plot(hist['epoch'], hist['mean_squared_error'],
                 label='Train Error')
        plt.plot(hist['epoch'], hist['val_mean_squared_error'],
                 label='Val Error')
        plt.ylim([0, 20])
        plt.legend()
        plt.show()

    plot_history(history)

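This graph shows that after about 100 epochs the validation error barely improves and even degrades. Let's update the model.fit call to stop training automatically when the validation score stops improving. We will use an EarlyStopping callback that tests a training condition after every epoch. If a set number of epochs elapses without improvement, training stops automatically.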

    model = build_model()

    # The patience parameter is the amount of epochs to check for improvement
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

    history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                        validation_split=0.2, verbose=0,
                        callbacks=[early_stop, PrintDot()])

    plot_history(history)

Output:

    .................................................
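
The patience rule can be sketched in plain Python. This is a simplified illustration, not Keras's actual implementation: it ignores options such as a minimum improvement threshold, and the function name `early_stop_epoch` and the loss values are assumptions made for the example.

```python
def early_stop_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop,
    or None if the patience is never exhausted."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value (7.5) occurs at epoch 2.
losses = [10.0, 8.0, 7.5, 7.6, 7.7, 7.55, 7.9]
print(early_stop_epoch(losses, patience=3))
```

With these values the three epochs after the best one all fail to beat 7.5, so the sketch stops at epoch index 5. A larger patience tolerates longer plateaus at the cost of extra training time.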

    loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

    print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

Output:

    Testing set Mean Abs Error:  1.79 MPG
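
For reference, the two metrics reported by the model can be written out directly. This is a minimal sketch on invented MPG values; the function names mirror the metric names but are defined here and are not library calls.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted|, in the same units as the label."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2, in squared label units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only.
true_mpg = [20.0, 30.0, 25.0]
pred_mpg = [22.0, 29.0, 25.0]

mae = mean_absolute_error(true_mpg, pred_mpg)
mse = mean_squared_error(true_mpg, pred_mpg)
print(mae, mse)
```

Because the mean absolute error is expressed in MPG, a value such as 1.79 reads directly as the typical size of a prediction error, while the squared error penalizes large misses more heavily.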

Make predictions

Finally, predict MPG values using data in the testing set:

    test_predictions = model.predict(normed_test_data).flatten()

    plt.scatter(test_labels, test_predictions)
    plt.xlabel('True Values [MPG]')
    plt.ylabel('Predictions [MPG]')
    plt.axis('equal')
    plt.axis('square')
    plt.xlim([0, plt.xlim()[1]])
    plt.ylim([0, plt.ylim()[1]])
    # Diagonal reference line: perfect predictions would fall on it.
    _ = plt.plot([-100, 100], [-100, 100])

    error = test_predictions - test_labels
    plt.hist(error, bins=25)
    plt.xlabel("Prediction Error [MPG]")
    _ = plt.ylabel("Count")

The error distribution is not quite Gaussian, but that is to be expected because the number of samples is very small.
