
Regression: Predict fuel efficiency

Published: 2024/10/8 48 豆豆

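This article, collected and compiled by 生活随笔, introduces "Regression: Predict fuel efficiency". The editor found it quite good and is sharing it here for your reference.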


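In a regression problem, we aim to predict the output of a continuous value, like a price or a probability. Contrast this with a classification problem, where we aim to select a class from a list of classes (for example, recognizing whether a picture contains an apple or an orange).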

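This notebook uses the classic Auto MPG dataset (https://archive.ics.uci.edu/ml/datasets/auto+mpg) and builds a model to predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, we provide the model with a description of many automobiles from that time period. This description includes attributes such as cylinders, displacement, horsepower, and weight.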

This example uses the tf.keras API; see this guide (https://www.tensorflow.org/guide/keras) for details.

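import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Use the Keras bundled with TensorFlow (tf.keras), as the text above describes
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Jupyter magic: render plots inline in the notebook
%matplotlib inline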

The Auto MPG dataset

The dataset is available from the UCI Machine Learning Repository.

Get the data

First download the dataset.

dataset_path = keras.utils.get_file(
    "auto-mpg.data",
    "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path

'C:\\Users\\YIUYE\\.keras\\datasets\\auto-mpg.data'

Import it using pandas

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail()

      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin
393  27.0          4         140.0        86.0  2790.0          15.6          82       1
394  44.0          4          97.0        52.0  2130.0          24.6          82       2
395  32.0          4         135.0        84.0  2295.0          11.6          82       1
396  28.0          4         120.0        79.0  2625.0          18.6          82       1
397  31.0          4         119.0        82.0  2720.0          19.4          82       1
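The `read_csv` options above do several things at once. As a rough, dependency-free illustration (not how pandas is implemented), the hypothetical `parse_line` helper below applies the same rules to a single line. It assumes, as the `comment='\t'` option implies, that each raw line ends with a tab followed by the car name; the sample lines are illustrative and the car names are placeholders.

```python
# Hypothetical helper: a stdlib-only sketch of the read_csv options used above.
COLUMN_NAMES = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']


def parse_line(line):
    """Parse one raw auto-mpg line into a dict; '?' becomes None."""
    # comment='\t': drop everything from the first tab (the car name) onward
    data = line.split('\t', 1)[0]
    # sep=" " plus skipinitialspace=True: runs of spaces act as one separator
    fields = data.split()
    # na_values="?": treat '?' as a missing value
    values = [None if field == '?' else float(field) for field in fields]
    return dict(zip(COLUMN_NAMES, values))


# Illustrative lines in the raw layout (placeholder car names)
complete = parse_line('27.0   4   140.0      86.00      2790.      15.6   82  1\t"car a"')
incomplete = parse_line('25.0   4   98.00      ?          2046.      19.0   71  1\t"car b"')
print(complete['Horsepower'], incomplete['Horsepower'])  # 86.0 None
```

The missing horsepower shows up as `None` here and as `NaN` in pandas, which is what the next step counts.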

Clean the data

The dataset contains a few unknown values.

dataset.isnull().sum()

MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64

To keep this initial tutorial simple, drop those rows.

dataset = dataset.dropna()
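For readers who want to see what `isnull().sum()` and `dropna()` compute, here is a small stdlib-only sketch over a list of row dicts. The rows are hypothetical, and `None` stands in for `NaN`. It counts missing values per column, then keeps only complete rows, mirroring `dropna()` with its default `how='any'`.

```python
# Hypothetical rows; None plays the role of NaN in the pandas version.
rows = [
    {'MPG': 27.0, 'Horsepower': 86.0},
    {'MPG': 25.0, 'Horsepower': None},
    {'MPG': 44.0, 'Horsepower': 52.0},
]

# Like dataset.isnull().sum(): number of missing values in each column
missing_counts = {col: sum(row[col] is None for row in rows) for col in rows[0]}
print(missing_counts)  # {'MPG': 0, 'Horsepower': 1}

# Like dataset.dropna(): keep only rows with no missing values
complete_rows = [row for row in rows if all(v is not None for v in row.values())]
print(len(complete_rows))  # 2
```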

The "Origin" column is really categorical, not numeric, so convert it to a one-hot encoding:

origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()

      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  USA  Europe  Japan
393  27.0          4         140.0        86.0  2790.0          15.6          82  1.0     0.0    0.0
394  44.0          4          97.0        52.0  2130.0          24.6          82  0.0     1.0    0.0
395  32.0          4         135.0        84.0  2295.0          11.6          82  1.0     0.0    0.0
396  28.0          4         120.0        79.0  2625.0          18.6          82  1.0     0.0    0.0
397  31.0          4         119.0        82.0  2720.0          19.4          82  1.0     0.0    0.0
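The three `(origin == k)*1.0` lines implement one-hot encoding by hand: each origin code becomes a vector with a single 1.0. A minimal stdlib sketch of the same mapping, assuming (as the code above does) that codes 1, 2 and 3 mean USA, Europe and Japan; `one_hot_origin` is a hypothetical name.

```python
# Origin codes as used above: 1 = USA, 2 = Europe, 3 = Japan
ORIGIN_NAMES = {1: 'USA', 2: 'Europe', 3: 'Japan'}


def one_hot_origin(code):
    """Return {'USA': x, 'Europe': y, 'Japan': z} with 1.0 for the matching code."""
    return {name: 1.0 if code == k else 0.0 for k, name in ORIGIN_NAMES.items()}


# Origin values of rows 393-397 in the earlier tail() output
for code in [1, 2, 1, 1, 1]:
    print(one_hot_origin(code))
```

An unexpected code (say 4) yields all zeros here, just as all three pandas comparisons would be false.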

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our model.

train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

sns.pairplot(train_dataset[["Cylinders", "Displacement", "Weight"]], diag_kind="kde")
sns.set()
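`dataset.sample(frac=0.8, random_state=0)` draws 80% of the rows at random, and `dataset.drop(train_dataset.index)` keeps the remainder, so the two sets never share a row. The sketch below reproduces that idea with Python's `random` module. It will not pick the same rows as pandas (which uses NumPy's generator), and rounding `0.8 * n` to the nearest integer is an assumption; for the 392 rows left after dropping 6 it gives 314, which agrees with the `count` column in the statistics table.

```python
import random


def train_test_split(indices, frac=0.8, seed=0):
    """Randomly pick round(frac * n) indices for training; the rest are for testing."""
    rng = random.Random(seed)  # seeded for reproducibility (not pandas' RNG)
    n_train = round(frac * len(indices))
    train = rng.sample(indices, n_train)
    train_set = set(train)
    test = [i for i in indices if i not in train_set]
    return train, test


# Stand-in row labels: 398 rows minus the 6 with missing Horsepower
indices = list(range(392))
train_idx, test_idx = train_test_split(indices)
print(len(train_idx), len(test_idx))  # 314 78
```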

Also look at the overall statistics:

train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

              count         mean         std     min      25%     50%      75%     max
Cylinders     314.0     5.477707    1.699788     3.0     4.00     4.0     8.00     8.0
Displacement  314.0   195.318471  104.331589    68.0   105.50   151.0   265.75   455.0
Horsepower    314.0   104.869427   38.096214    46.0    76.25    94.5   128.00   225.0
Weight        314.0  2990.251592  843.898596  1649.0  2256.50  2822.5  3608.00  5140.0
Acceleration  314.0    15.559236    2.789230     8.0    13.80    15.5    17.20    24.8
Model Year    314.0    75.898089    3.675642    70.0    73.00    76.0    79.00    82.0
USA           314.0     0.624204    0.485101     0.0     0.00     1.0     1.00     1.0
Europe        314.0     0.178344    0.383413     0.0     0.00     0.0     0.00     1.0
Japan         314.0     0.197452    0.398712     0.0     0.00     0.0     0.00     1.0
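Each row of this table summarizes one feature. As a sketch of what the columns mean, the hypothetical `describe_column` helper below computes the same quantities with the `statistics` module: `std` is the sample standard deviation (divisor n - 1), which is what pandas reports, and the quartiles use linear interpolation between sorted values, which `statistics.quantiles(..., method='inclusive')` also uses.

```python
import statistics


def describe_column(values):
    """Return a describe()-style summary of a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'count': float(len(values)),
        'mean': statistics.mean(values),
        'std': statistics.stdev(values),  # sample std, divisor n - 1
        'min': min(values),
        '25%': q1,
        '50%': q2,
        '75%': q3,
        'max': max(values),
    }


# MPG of rows 393-397 from the earlier tail() output
summary = describe_column([27.0, 44.0, 32.0, 28.0, 31.0])
print(summary['25%'], summary['50%'], summary['75%'])  # 28.0 31.0 32.0
```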

Split features from labels

Separate the target value, or “label”, from the features. This label is the value that you will train the model to predict.

train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

Normalize the data

Look again at the train_stats block above and note how different the ranges of each feature are.

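It is good practice to normalize features that use different scales and ranges. Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.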

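Note: Although we intentionally generate these statistics from only the training dataset, these statistics will also be used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model has been trained on.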
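In formula terms, `norm` replaces each feature value x with (x - mean_train) / std_train, where both statistics are that feature's entries in `train_stats`. Below is a stdlib-only sketch for a single feature; the numbers are illustrative. The point to notice is that the test values are shifted and scaled with the training statistics, not their own.

```python
import statistics

# Illustrative values for one feature (e.g. Weight) in each split
train_weight = [2790.0, 2130.0, 2295.0, 2625.0, 2720.0]
test_weight = [3504.0, 1850.0]

# Statistics come from the training split only, as in train_stats
mean = statistics.mean(train_weight)
std = statistics.stdev(train_weight)  # sample std, matching describe()


def norm(values):
    """Standardize values with the training mean and standard deviation."""
    return [(v - mean) / std for v in values]


normed_train = norm(train_weight)
normed_test = norm(test_weight)  # reuses the training mean/std
print([round(v, 2) for v in normed_train])
```

After this transformation the training values have mean 0 and standard deviation 1; the test values generally will not, which is expected.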

def norm(x):
    # Standardize each feature with the training-set mean and standard deviation
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

def build_model():
    # Two hidden layers of 64 ReLU units and a single linear output (predicted MPG)
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])

    optimizer = keras.optimizers.RMSprop(0.001)

    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model

model = build_model()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_10 (Dense)             (None, 64)                640
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 65
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
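The `Param #` column can be checked by hand: a `Dense` layer with `n_in` inputs and `units` outputs has one weight per input/output pair plus one bias per output, that is `n_in * units + units` parameters. With the 9 input features listed in `train_stats`, a short calculation reproduces the summary (`dense_params` is a hypothetical helper):

```python
def dense_params(n_in, units):
    """Weights (n_in * units) plus biases (units) of a fully connected layer."""
    return n_in * units + units


# 9 features -> 64 -> 64 -> 1, as in build_model()
layer_sizes = [9, 64, 64, 1]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [640, 4160, 65] 4865
```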

Now try out the model. Take a batch of 10 examples from the training data and call model.predict on it.

example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result

array([[-0.03468257],
       [-0.01342154],
       [-0.15384783],
       [-0.18010283],
       [ 0.03922582],
       [-0.12172151],
       [ 0.10603201],
       [ 0.2442987 ],
       [ 0.00099315],
       [ 0.18530795]], dtype=float32)

It seems to be working, and it produces a result of the expected shape and type.

Train the model

Train the model for 1000 epochs, and record the training and validation loss and metrics (mean absolute error and mean squared error) in the history object.

# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 100 == 0:
            print('')
        print('.', end='')

EPOCHS = 1000

history = model.fit(
    normed_train_data, train_labels,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])

....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................

hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

         loss  mean_absolute_error  mean_squared_error  val_loss  val_mean_absolute_error  val_mean_squared_error  epoch
995  2.075518             0.940943            2.075518  8.913726                 2.351839                8.913726    995
996  2.130111             0.953561            2.130111  9.769884                 2.438282                9.769884    996
997  2.221040             0.951258            2.221040  9.664708                 2.382888                9.664708    997
998  2.301870             0.980407            2.301870  9.934311                 2.425505                9.934311    998
999  2.002580             0.887644            2.002580  9.484982                 2.414742                9.484982    999
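In this table `loss` and `mean_squared_error` are identical because the model is compiled with `loss='mean_squared_error'`; the `val_` columns are the same quantities computed on the 20% of the training data held out by `validation_split=0.2`. For reference, a plain-Python sketch of the two metrics (the labels and predictions are made up):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the label's units (MPG)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, in squared units (MPG^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions
y_true = [27.0, 44.0, 32.0]
y_pred = [26.0, 41.0, 32.0]
print(mean_absolute_error(y_true, y_pred))  # 1.3333333333333333
print(mean_squared_error(y_true, y_pred))  # 3.3333333333333335
```

Because MSE squares each error, the single 3 MPG miss dominates it, which is one reason the plots track MAE (in MPG) alongside MSE.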

def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [MPG]')
    plt.plot(hist['epoch'], hist['mean_absolute_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_absolute_error'], label='Val Error')
    plt.ylim([0, 5])
    plt.legend()

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$MPG^2$]')
    plt.plot(hist['epoch'], hist['mean_squared_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_squared_error'], label='Val Error')
    plt.ylim([0, 20])
    plt.legend()
    plt.show()

plot_history(history)

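This graph shows little improvement, or even degradation, in the validation error after about 100 epochs. Let's update the model.fit call to automatically stop training when the validation score doesn't improve. We'll use an EarlyStopping callback that tests a training condition at every epoch. If a set number of epochs passes without showing improvement, training stops automatically.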
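The patience rule can be sketched in a few lines. The hypothetical `early_stopping_epoch` below is a simplified model of `keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)`: it tracks the best validation loss so far and stops once `patience` consecutive epochs pass without a new best. The real callback has further options (for example `min_delta` and `restore_best_weights`) that this sketch ignores.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training stops, or None if it never does."""
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: improvement stalls after epoch 2
val_losses = [9.0, 8.5, 8.2, 8.4, 8.3, 8.6]
print(early_stopping_epoch(val_losses, patience=3))  # 5
```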

model = build_model()

# The patience parameter is the number of epochs to check for improvement
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, PrintDot()])

plot_history(history)

.................................................

loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

Testing set Mean Abs Error: 1.79 MPG

Make predictions

Finally, predict MPG values using data in the testing set:

test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0, plt.xlim()[1]])
plt.ylim([0, plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])

error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")

It's not quite Gaussian, but we might expect that because the number of samples is very small.
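The histogram behind that judgment comes from `plt.hist(error, bins=25)`, which (via NumPy's histogram) splits the range from the smallest to the largest error into 25 equal-width bins, with the last bin also including its right edge. A stdlib-only sketch of that counting, using made-up errors and only 3 bins so the result is easy to read; `histogram_counts` is a hypothetical name, and it assumes the values are not all equal.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value lands in the last bin (its right edge is inclusive)
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return counts


# Made-up prediction errors in MPG
errors = [-3.0, -0.9, -0.5, 0.0, 0.5, 0.9, 3.0]
print(histogram_counts(errors, bins=3))  # [1, 5, 1]
```

With only about 78 test errors spread over 25 bins, most bins hold a handful of samples, so a ragged, not-quite-Gaussian shape is unsurprising.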
