US Gasoline Price vs Oil Price
C. Liner, U Arkansas, 12 Mar 2022
How do US gasoline prices vary with the cost of oil? To investigate this question, we need weekly gas and oil prices spanning several years. Both are available from the St. Louis Federal Reserve (FRED), which publishes the gas price history and the oil price history. We choose the period from August 1990 to the present (March 2022), and the FRED web pages allow us to download the data as comma-separated value (CSV) files.
To be specific, the gas price is the US weekly average regular gasoline price in dollars/gallon as reported on Monday of each week, while the oil price is the West Texas Intermediate (WTI) spot oil price in dollars/barrel as reported on Friday of each week. We will ignore the Monday-Friday difference and treat each data set as a weekly snapshot.
The goal here is to build a basic neural network to predict gasoline price from oil price. Here we go.
import pandas as pd
import matplotlib.pyplot as plt
# load regular gas price and WTI oil spot price downloaded from St. Louis FRED
# gas: https://fred.stlouisfed.org/series/GASREGW
# oil: https://fred.stlouisfed.org/series/WCOILWTICO
# CSV files combined in Excel.
# note dates do not line up since the gas reporting week ends on Monday
# and the oil week ends on Friday. The last value in gasreg has been dropped
# to make the two series equal in length.
df = pd.read_csv('GASREG-OILWTI-w.csv')
df
| | OIL_wti | GAS_reg |
|---|---|---|
| 0 | 30.08 | 1.191 |
| 1 | 27.13 | 1.245 |
| 2 | 29.67 | 1.242 |
| 3 | 30.99 | 1.252 |
| 4 | 34.21 | 1.266 |
| ... | ... | ... |
| 1641 | 89.60 | 3.368 |
| 1642 | 90.61 | 3.444 |
| 1643 | 92.89 | 3.487 |
| 1644 | 92.18 | 3.530 |
| 1645 | 106.80 | 3.608 |

1646 rows × 2 columns
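The loading cell notes that the two CSV files were combined in Excel. The same step could be done in pandas. The sketch below is one possible pairing, not necessarily the alignment used for `GASREG-OILWTI-w.csv`: it assumes each download has a `DATE` column and matches a Monday gas price with the Friday oil price of the same Monday-to-Sunday week. The function name `combine_weekly` and the toy prices are hypothetical.

```python
import pandas as pd

def combine_weekly(gas: pd.DataFrame, oil: pd.DataFrame) -> pd.DataFrame:
    """Pair Monday gas prices with Friday oil prices of the same Mon-Sun week."""
    # 'W' periods end on Sunday, so a Monday and the following Friday share one period
    gas = gas.assign(week=gas["DATE"].dt.to_period("W"))
    oil = oil.assign(week=oil["DATE"].dt.to_period("W"))
    merged = oil.merge(gas, on="week", how="inner", suffixes=("_oil", "_gas"))
    return merged[["OIL_wti", "GAS_reg"]].reset_index(drop=True)

# Toy frames with made-up prices; real data would come from the two FRED CSV downloads
gas = pd.DataFrame({
    "DATE": pd.date_range("1990-08-20", periods=3, freq="W-MON"),
    "GAS_reg": [1.20, 1.25, 1.24],
})
oil = pd.DataFrame({
    "DATE": pd.date_range("1990-08-20", periods=3, freq="W-FRI"),
    "OIL_wti": [30.0, 28.0, 29.0],
})
combined = combine_weekly(gas, oil)
print(combined)
```

An inner merge on the week keeps only weeks present in both series, which removes the need to trim one series by hand.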
# plot each column in the dataframe with a grid
df.plot(subplots=True,grid=True)
plt.xlabel('Weeks since Aug1990')
# save the figure as a pdf file
plt.savefig('01-oil-gas.pdf')
# show the plot
plt.show()
df.plot.scatter(x='OIL_wti',y='GAS_reg',c='black',alpha=.6)
plt.grid()
plt.title('Oil price vs Gasoline price\n(1990-Mar2022)')
plt.xlabel('WTI Oil $/bbl')
plt.ylabel('US Reg Gas $/gal')
plt.savefig('02-oil-gas-scatter.pdf')
plt.show()
NN to predict gasoline price from oil price
import numpy as np
from sklearn.model_selection import train_test_split
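# use the Keras bundled with TensorFlow throughout; mixing the standalone
# keras package with tensorflow.keras can cause version conflicts
from tensorflow import keras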
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
# split into input (oil price) and output (gasoline price)
# X,Y = df.values[:,1], df.values[:,2]
X, Y = df.values[:, :-1], df.values[:, -1]
print(X,Y)
[[ 30.08]
 [ 27.13]
 [ 29.67]
 ...
 [ 92.89]
 [ 92.18]
 [106.8 ]] [1.191 1.245 1.242 ... 3.487 3.53 3.608]
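The printed output shows `X` as a column of one-element rows and `Y` as a flat vector. That difference comes from the two indexing styles used in the split. A minimal illustration on a toy array (made-up values laid out like `df.values`):

```python
import numpy as np

# Toy array laid out like df.values: column 0 is oil price, column 1 is gas price
values = np.array([[30.0, 1.20],
                   [28.0, 1.25],
                   [29.0, 1.24]])

X = values[:, :-1]   # a slice keeps the column axis -> 2D, shape (3, 1)
Y = values[:, -1]    # an integer index drops the axis -> 1D, shape (3,)
print(X.shape, Y.shape)
```

The 2D shape for `X` matters because the network's input layer expects arrays of shape `(n_samples, n_features)`.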
# split into train (80%) and test (20%) datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(1316, 1) (330, 1) (1316,) (330,)
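Because the split above is random, the reported errors will change slightly from run to run. `train_test_split` accepts a `random_state` argument to fix the shuffle. The numpy-only sketch below shows the same idea under stated assumptions: the helper name `seeded_split` is hypothetical, the arrays are placeholders, and the row selection will not match sklearn's for any given seed.

```python
import math
import numpy as np

def seeded_split(X, Y, test_size=0.20, seed=0):
    """Shuffle rows with a fixed seed, then split into train and test parts."""
    n = len(X)
    n_test = math.ceil(test_size * n)          # round the test count up
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

X = np.arange(1646, dtype=float).reshape(-1, 1)   # placeholder inputs
Y = np.arange(1646, dtype=float)                  # placeholder targets
X_train, X_test, Y_train, Y_test = seeded_split(X, Y)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

With 1646 rows and a 20% test fraction, rounding the test count up gives 330 test rows and 1316 training rows, matching the shapes printed above.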
# determine the number of input features
n_features = X_train.shape[1]
print('n_features =',n_features)
n_features = 1
# clear NN model
keras.backend.clear_session()
# define model
model = Sequential()
model.add(Dense(16, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(16, activation='relu', kernel_initializer='he_normal'))
# seems not useful for regression; tested in various positions
# model.add(BatchNormalization())
model.add(Dense(1))
# summarize the model
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 16)                32
_________________________________________________________________
dense_1 (Dense)              (None, 16)                272
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
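The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit, so its count is (inputs + 1) × units. A short check, with `dense_param_counts` as a hypothetical helper name:

```python
def dense_param_counts(layer_sizes):
    """Parameters per Dense layer: (inputs + 1 bias) * units."""
    return [(n_in + 1) * n_out
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# 1 input feature -> 16 units -> 16 units -> 1 output
counts = dense_param_counts([1, 16, 16, 1])
print(counts, sum(counts))   # [32, 272, 17] 321
```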
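# compile the model; MSE is the loss and MAE is tracked as a metric
# ('accuracy' is a classification metric and is not meaningful for regression)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# configure early stopping callback on validation loss
# (defined for reference; not passed to fit() below, so all 50 epochs run)
es = EarlyStopping(monitor='val_loss', patience=150)
# fit the model
history = model.fit(X_train, Y_train, epochs=50, batch_size=1,
                    verbose=0, validation_data=(X_test, Y_test), callbacks=[])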
history_dict = history.history
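Note that the `es` callback is never passed to `fit` (the call uses `callbacks=[]`), and with `patience=150` and `epochs=50` it could not trigger even if it were. To make the role of `patience` concrete, here is a simplified plain-Python sketch of patience-based early stopping. It is an illustration of the idea, not the Keras implementation, and the loss values are made up.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

losses = [0.50, 0.30, 0.20, 0.21, 0.22, 0.19, 0.20]   # made-up validation losses
print(early_stop_epoch(losses, patience=2))     # 4
print(early_stop_epoch(losses, patience=150))   # None
```

A patience well below the epoch count is needed for the callback to have any effect.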
# evaluate the model
error = model.evaluate(X_test, Y_test, verbose=0)
print('MSE: %.3f, RMSE: %.3f' % (error[0], np.sqrt(error[0])))
MSE: 0.041, RMSE: 0.201
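To judge whether an RMSE of about $0.20/gal is good, it helps to compare against a simple baseline. The sketch below fits a straight line by least squares and reports its test RMSE. It runs on synthetic stand-in data, not the FRED series, and `linear_baseline_rmse` is a hypothetical helper. On the real data one would pass `X_train[:, 0]`, `Y_train`, `X_test[:, 0]`, and `Y_test`.

```python
import numpy as np

def linear_baseline_rmse(x_train, y_train, x_test, y_test):
    """Fit y = a*x + b by least squares on the training data; return (a, b, test RMSE)."""
    a, b = np.polyfit(x_train, y_train, deg=1)
    residuals = y_test - (a * x_test + b)
    return a, b, float(np.sqrt(np.mean(residuals ** 2)))

# Synthetic stand-in data (not the FRED series): a noisy straight line
rng = np.random.default_rng(0)
oil = rng.uniform(10.0, 140.0, size=500)
gas = 0.025 * oil + 1.0 + rng.normal(0.0, 0.2, size=500)

a, b, rmse = linear_baseline_rmse(oil[:400], gas[:400], oil[400:], gas[400:])
print(f"slope={a:.4f} $/gal per $/bbl, intercept={b:.2f} $/gal, RMSE={rmse:.3f} $/gal")
```

If the network's RMSE is not clearly below the straight-line RMSE on the same split, the extra model complexity is not buying much.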
# make a prediction
# oil price $/bbl
row = [37]
# gasoline price $/gal; predict expects a 2D array of shape (n_samples, n_features)
yhat = model.predict(np.array([row]))
print('Gasoline price predicted: %.2f' % yhat[0, 0],
      '\nGasoline price actual ~ $1.8')
Gasoline price predicted: 1.74
Gasoline price actual ~ $1.8
# plot figure
# loss curves
f = plt.figure(figsize=(12,4))
plt.subplot(121)
plt.title('Loss')
plt.xlabel('Epoch')
plt.ylabel('mean_square_error')
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.grid()
plt.legend()
# mean absolute error curves
plt.subplot(122)
plt.title('Mean absolute error')
plt.xlabel('Epoch')
plt.ylabel('mean_absolute_error ($/gal)')
plt.plot(history.history['mae'], label='train')
plt.plot(history.history['val_mae'], label='val')
plt.grid()
f.subplots_adjust(wspace=.3)
plt.legend()
f.savefig("03-oil-gas-NNloss.pdf")
plt.show()
# make predictions for several oil prices ($/bbl)
for oil in [50, 100, 150, 200]:
    # gasoline price $/gal; predict expects shape (n_samples, n_features)
    Yhat = model.predict(np.array([[oil]]))
    print('NN prediction: For oil at', oil,
          '$/bbl the US avg regular gas price will be %.2f' % Yhat[0, 0], '$/gal')
NN prediction: For oil at 50 $/bbl the US avg regular gas price will be 2.24 $/gal
NN prediction: For oil at 100 $/bbl the US avg regular gas price will be 3.42 $/gal
NN prediction: For oil at 150 $/bbl the US avg regular gas price will be 4.59 $/gal
NN prediction: For oil at 200 $/bbl the US avg regular gas price will be 5.76 $/gal
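The four predictions are almost evenly spaced, which suggests the fitted network is close to a straight line over this range. As a quick arithmetic check (an observation on the printed numbers, not a result from the model itself), we can compute the implied change in gas price per $1/bbl and compare it with the crude cost per gallon, using the fact that one barrel is 42 US gallons.

```python
# Predictions copied from the printed output above ($/bbl -> $/gal)
oil = [50, 100, 150, 200]
gas = [2.24, 3.42, 4.59, 5.76]

# Change in predicted gas price per $1/bbl change in oil, between neighbouring points
points = list(zip(oil, gas))
slopes = [(g2 - g1) / (o2 - o1)
          for (o1, g1), (o2, g2) in zip(points[:-1], points[1:])]
print([round(s, 4) for s in slopes])

# Crude cost per gallon implied by a $1/bbl change (1 barrel = 42 US gallons)
per_gallon = 1 / 42
print(round(per_gallon, 4))
```

Both numbers come out near $0.024/gal per $/bbl, so the network's slope is roughly a one-for-one pass-through of crude cost per gallon.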
# make predictions for oil prices from $10 to $199/bbl in $1 steps
# (the upper bound of np.arange is exclusive)
row = np.arange(10.,200.,1.)
# gasoline price $/gal; reshape to (n_samples, n_features) for predict
Yhat = model.predict(row.reshape(-1, 1))
df.plot.scatter(x='OIL_wti',y='GAS_reg',c='black',alpha=.6)
plt.plot(row,Yhat,'r--')
# plt.yscale('log')
plt.title('US Gasoline Price vs WTI Oil Price\n(1990-Mar 2022, NN in red)')
plt.xlabel('WTI Oil Price $/BBL')
plt.ylabel('Reg Gasoline Price $/GAL')
plt.ylim(0.5,6.5)
plt.grid()
plt.savefig("04-oil-gas-NNfit.pdf")
plt.show()