We recently looked into using deep learning to predict stock price trends. After reading many examples shared online and testing them in practice, our impression is that LSTM is one of the more suitable algorithms. A long short-term memory network (LSTM, Long Short-Term Memory) is a type of recurrent neural network (RNN), designed specifically to address the long-term dependency problem that ordinary RNNs struggle with.
The diagram above makes LSTM look complicated, but put simply, it tries to learn the relationship between a past time series and the value to be predicted. An LSTM combines information from long ago (long-term) with recent data (short-term) to make an overall judgment, discover underlying patterns, and form a prediction model.
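To make the "long" and "short-term" parts a little more concrete, here is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. It is only an illustration, not the code Keras runs internally; the function name `lstm_step`, the weight names `W`, `U`, `b`, and the toy prices are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper for illustration).

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate: how much new information to store
    f = sigmoid(z[1 * units:2 * units])   # forget gate: how much old memory to keep
    g = np.tanh(z[2 * units:3 * units])   # candidate values for the cell state
    o = sigmoid(z[3 * units:4 * units])   # output gate: how much memory to expose
    c_t = f * c_prev + i * g              # cell state carries the long-term memory
    h_t = o * np.tanh(c_t)                # hidden state reflects the recent output
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
prices = np.array([0.10, 0.12, 0.11, 0.15, 0.14])  # toy, already-scaled closing prices
for p in prices:
    h, c = lstm_step(np.array([p]), h, c, W, U, b)

print(h.shape, c.shape)
```

The cell state `c` is updated additively at every step, which is what lets information from far back in the sequence survive, while the gates decide how much of it to keep or overwrite.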
Take stock price prediction as an example. We can use today's closing price of a stock as the prediction target, and the closing prices of the 60 trading days before it, counting back from yesterday, as the input. In other words, the previous 60 closing prices form the input X, and today's closing price is the output y. Following the same rule, we step backwards through the data: yesterday's closing price becomes another y, and the 60 closing prices before yesterday become its X. Depending on how much price history you can collect, this yields many X and y pairs for training the LSTM.
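The windowing rule described above can be sketched on a tiny made-up series. The helper name `make_windows` and the six sample prices are assumptions for illustration, and the window is shortened to 3 instead of 60 so the result is easy to read.

```python
import numpy as np

def make_windows(prices, window):
    """Build (X, y) pairs: each y is one price, each X is the `window` prices before it."""
    X, y = [], []
    for i in range(window, len(prices)):
        X.append(prices[i - window:i])  # the `window` closes before day i
        y.append(prices[i])             # the close on day i is the target
    return np.array(X), np.array(y)

closes = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9])  # toy data, oldest first
X, y = make_windows(closes, window=3)

print(X.shape, y.shape)
print(X[0], "->", y[0])
```

A series of N prices with a window of w gives N - w training pairs, so roughly five years of daily closes with a 60-day window still leaves well over a thousand samples.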
The following sections explain the details together with the code.
1. Import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tushare as ts  # tushare is used to download the stock data
2. Download stock data
# Register an account on the tushare website to obtain a token;
# data can only be fetched through the data interface
ts.set_token('xxx')
pro = ts.pro_api()
# Using 000001 Ping An Bank as an example, download price data from 2015-01-01 up to a recent date
df = pro.daily(ts_code='000001.SZ', start_date='2015-01-01', end_date='2020-02-25')
df.head()
# df.head() shows the downloaded price data, which looks like this:
3. Prepare data
# Reverse the data into chronological order so the latest rows come last.
# tushare returns the newest rows first; reversing makes it easier to build X and y.
df = df.iloc[::-1]
df.reset_index(inplace=True)
# Use only the closing price field; you can also experiment with more price fields as input
training_set = df.loc[:, ['close']]
# Keep only the price values, without the header
training_set = training_set.values
# Normalize the data to the range 0 to 1, so that very large or very small raw values do not distort the model
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
# Prepare X and y as explained earlier: each closing price is one y,
# and the closing prices of the 60 trading days before it form the matching X.
# Stepping through the whole series this way yields a large number of X and y pairs for training.
X_train = []
y_train = []
for i in range(60, len(training_set_scaled)):
    X_train.append(training_set_scaled[i-60:i])
    y_train.append(training_set_scaled[i, training_set_scaled.shape[1] - 1])
X_train, y_train = np.array(X_train), np.array(y_train)
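As a rough illustration of what the normalization step computes, the sketch below reproduces min-max scaling to the range 0 to 1 with plain NumPy. The helper names (`minmax_fit`, `minmax_transform`, `minmax_inverse`) and the sample prices are assumptions for this example; in the tutorial itself the same job is done by `MinMaxScaler`.

```python
import numpy as np

def minmax_fit(values):
    """Remember the per-column minimum and maximum of the training data."""
    return values.min(axis=0), values.max(axis=0)

def minmax_transform(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return (values - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Undo the scaling to get back real prices."""
    return scaled * (hi - lo) + lo

closes = np.array([[10.0], [12.0], [11.0], [14.0]])  # toy training closes
lo, hi = minmax_fit(closes)
scaled = minmax_transform(closes, lo, hi)
restored = minmax_inverse(scaled, lo, hi)

# A later price above the training maximum maps outside the 0 to 1 range
new_scaled = minmax_transform(np.array([[16.0]]), lo, hi)

print(scaled.ravel())
print(new_scaled.ravel())
```

The minimum and maximum come from the training data only, which is why the prediction step later has to reuse the same fitted scaler instead of fitting a new one.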
4. Build and train the LSTM model
# Keras is used here. Keras greatly simplifies building the model;
# the actual computation behind it is done by TensorFlow or another backend.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
regressor = Sequential()
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
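To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and one bias vector), which should agree with the output of `regressor.summary()`; the helper names `lstm_params` and `dense_params` are made up for this example.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input per unit, plus one bias per unit."""
    return input_dim * units + units

layers = [
    ("lstm_1", lstm_params(1, 50)),    # input is 1 feature (the closing price)
    ("lstm_2", lstm_params(50, 50)),   # input is the 50 outputs of the layer before
    ("lstm_3", lstm_params(50, 50)),
    ("lstm_4", lstm_params(50, 50)),
    ("dense", dense_params(50, 1)),    # Dropout layers add no parameters
]

for name, count in layers:
    print(name, count)

total = sum(count for _, count in layers)
print("total", total)
```

Most of the parameters sit in the recurrent layers, so changing `units` affects training time far more than changing the 60-day window length does.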
Training takes some time, and how long depends on the amount of data. The console output looks roughly like this:
5. Forecast future prices
First we prepare the input for prediction. For example, once today's closing price is available, it is combined with the closing prices of the previous 59 trading days to form one X. The model trained above then predicts a y value from it, and that y is the forecast price for tomorrow.
import tushare as ts
ts.set_token('xxx')
pro = ts.pro_api()
# Use pro.daily, the same interface as for the training data
# (index_daily is meant for indices, while 000001.SZ is a stock)
df_test = pro.daily(ts_code='000001.SZ', start_date='2020-02-26', end_date='2020-02-26')
# Again reverse the data into chronological order, latest last
df_test = df_test.iloc[::-1]
df_test.reset_index(inplace=True)
# Only the close (closing price) field is needed
dataset_test = df_test.loc[:, ['close']]
# Combine the test data with the earlier training data.
# The training rows go first so the combined series stays in chronological order.
dataset_total = pd.concat((df[['close']], df_test[['close']]), axis = 0)
# Keep only the values (no header): the test rows plus the 60 rows before them
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
# Reshape the array, because Keras expects a specific data format
inputs = inputs.reshape(-1, dataset_test.shape[1])
# Normalize with the scaler that was fitted on the training data
inputs = sc.transform(inputs)
predicted_stock_price = []
# Build the test input X by combining the test data with the earlier training data.
# The model needs the past 60 trading days, so one day's closing price is not enough.
# As written, each window holds the 60 closes before one test row,
# so each prediction is for that test row's date.
X_test = []
for i in range(60, 60 + len(dataset_test)):
    X_test.append(inputs[i-60:i])
X_test = np.array(X_test)
# Reshape the prediction input in the same way as the training input
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], dataset_test.shape[1]))
# Predict with the trained model; the output is a normalized value between 0 and 1
predicted_stock_price = regressor.predict(X_test)
# Convert the normalized values back into real prices to get the predicted closing price
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
If the prediction step above is repeated into the future (using the predicted price for the next trading day as a new input to predict the day after that), we can forecast the stock price for the next several days, as the following figure illustrates.
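The rolling procedure can be sketched as a small loop. To keep the example runnable without a trained network, a stand-in predictor (`mean_model`, the mean of the window) takes the place of the LSTM; both it and the helper name `forecast_recursive` are assumptions for illustration, not part of the original program.

```python
import numpy as np

def forecast_recursive(history, predict_next, window, steps):
    """Predict `steps` values ahead, feeding each prediction back in as input."""
    buf = list(history[-window:])       # start from the most recent `window` values
    predictions = []
    for _ in range(steps):
        x = np.array(buf[-window:])     # the latest window, including earlier predictions
        y_hat = float(predict_next(x))
        predictions.append(y_hat)
        buf.append(y_hat)               # the prediction becomes part of the next input
    return predictions

def mean_model(window_values):
    """Stand-in for the trained model: predicts the mean of the window."""
    return window_values.mean()

history = np.array([1.0, 2.0, 3.0])     # toy, already-scaled closes
preds = forecast_recursive(history, mean_model, window=3, steps=2)
print(preds)
```

With the trained network, `predict_next` would presumably wrap something like `regressor.predict(x.reshape(1, 60, 1))[0, 0]` on scaled values, followed by `sc.inverse_transform` at the end. Because each step builds on earlier predictions, errors accumulate, so forecasts further out are less reliable.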