Financial Time Series Forecasting with Deep Learning : A Systematic Literature Review: 2005-2019

  • 2019-11-29 18:43:18
  • Omer Berat Sezer, Mehmet Ugur Gudelek, Ahmet Murat Ozbayoglu
  • 33

Abstract

Financial time series forecasting is, without a doubt, the top choice of computational intelligence for finance researchers from both academia and financial industry due to its broad implementation areas and substantial impact. Machine Learning (ML) researchers came up with various models and a vast number of studies have been published accordingly. As such, a significant amount of surveys exist covering ML for financial time series forecasting studies. Lately, Deep Learning (DL) models started appearing within the field, with results that significantly outperform traditional ML counterparts. Even though there is a growing interest in developing models for financial time series forecasting research, there is a lack of review papers that were solely focused on DL for finance. Hence, our motivation in this paper is to provide a comprehensive literature review on DL studies for financial time series forecasting implementations. We not only categorized the studies according to their intended forecasting implementation areas, such as index, forex, commodity forecasting, but also grouped them based on their DL model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Long-Short Term Memory (LSTM). We also tried to envision the future for the field by highlighting the possible setbacks and opportunities, so the interested researchers can benefit.

 


Omer Berat Sezer, M. Ugur Gudelek, Ahmet Murat Ozbayoglu
Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey

keywords:
deep learning, finance, computational intelligence, machine learning, time series forecasting, CNN, LSTM, RNN

1 Introduction

The finance industry has always been interested in the successful prediction of financial time series data. Numerous published studies based on Machine Learning (ML) models have reported relatively better performance than classical time series forecasting techniques. Meanwhile, the widespread adoption of automated electronic trading systems, coupled with increasing demand for higher yields, keeps pushing researchers and practitioners to search for better models. Hence, new publications and implementations keep pouring into the finance and computational intelligence literature.

In the last few years, Deep Learning (DL) has emerged strongly as the best performing predictor class within the ML field in various implementation areas. Financial time series forecasting is no exception; an increasing number of prediction models based on various DL techniques have been introduced in the relevant conferences and journals in recent years. Despite the vast number of survey papers covering financial time series forecasting and trading systems using traditional soft computing techniques, to the best of our knowledge, no reviews have been performed in the literature for DL. Hence, we decided to work on such a comprehensive study focusing on DL implementations of financial time series forecasting. Our motivation is two-fold: we aimed not only to provide a state-of-the-art snapshot of academic and industry perspectives on the developed DL models, but also to pinpoint the important and distinctive characteristics of each studied model so that researchers and practitioners can avoid unsatisfactory choices during their system development phase. We also wanted to envision where the industry is heading by indicating possible future directions.

Our fundamental motivation in this paper was to come up with answers for the following research questions:

  1. Which DL models are used for financial time series forecasting?

  2. How does the performance of DL models compare with that of their traditional ML counterparts?

  3. What is the future direction of DL research for financial time series forecasting?

Our focus was solely on DL implementations for financial time series forecasting. For other DL-based financial applications such as risk assessment and portfolio management, interested readers can check the recent survey paper [Ozbayoglu_2019]. Since we singled out financial time series prediction studies in our survey, we omitted other time series forecasting studies that were not focused on financial data. Meanwhile, we included time series research papers that had financial use cases or examples even though the papers themselves were not directly intended for financial time series forecasting. We also decided to include algorithmic trading papers that were based on financial forecasting, but to ignore the ones that did not have a time series forecasting component.

We reviewed journals and conferences for our survey; however, we also included Masters and PhD theses, book chapters, arXiv papers and noteworthy technical publications that came up in web searches. We decided to include only articles written in English.

During our survey, we realized that most of the papers using the term "deep learning" in their description were published in the last 5 years. However, we also encountered some older studies that implemented deep models, such as Recurrent Neural Networks (RNNs) and Jordan-Elman networks, even though the term "deep learning" was not in common usage at their time of publication. So, we decided to include those papers as well.

According to our findings, this will be one of the first comprehensive "financial time series forecasting" survey papers focusing on DL. Many ML reviews for financial time series forecasting exist in the literature; however, we have not encountered any such study on DL. Hence, we wanted to fill this gap by analyzing the developed models and applications accordingly. We hope that, as a result of this paper, researchers and model developers will have a better idea of how they can implement DL models for their studies.

We structured the rest of the paper as follows. Following this brief introduction, Section 2 mentions the existing surveys that focus on ML and soft computing studies for financial time series forecasting. Section 3 covers the existing DL models that are used, such as Convolutional Neural Network (CNN), Long-Short Term Memory (LSTM) and Deep Reinforcement Learning (DRL). Section 4 focuses on the various financial time series forecasting implementation areas using DL, namely stock forecasting, index forecasting, trend forecasting, commodity forecasting, volatility forecasting, foreign exchange forecasting and cryptocurrency forecasting. In each subsection, the problem definition is given, followed by the particular DL implementations. Section 5 presents overall statistical results about our findings, including histograms of the yearly distribution of different subfields, models, publication types, etc. These statistics give a state-of-the-art snapshot of financial time series forecasting studies and also show which areas are already mature and which are promising or new areas that still have room for improvement. Section 6 discusses what has been achieved in academia and industry and what might be needed in the future, highlighting the open areas that need further research. Finally, we conclude in Section 7 by summarizing our findings.

2 Financial Time Series Forecasting with ML

Financial time series forecasting and associated applications have been studied extensively for many years. When ML started gaining popularity, financial prediction applications based on soft computing models became available accordingly. Even though our focus is particularly on DL implementations of financial time series prediction studies, it is beneficial to briefly mention the existing surveys covering ML-based financial time series forecasting studies in order to gain a historical perspective.

In our study, we did not include any survey papers that were focused on specific financial application areas other than forecasting studies. However, we were faced with some review publications that included not only financial time-series studies but also other financial applications. We decided to include those papers in order to maintain the comprehensiveness of our coverage.

Examples of these publications are provided here. There were published books on stock market forecasting [Aliev_2004], trading system development [Dymowa_2011], and practical examples of forex and market forecasting applications [Kovalerchuk_2000] using ML models like Artificial Neural Networks (ANNs), Evolutionary Computations (ECs), Genetic Programming (GP) and Agent-based models [Brabazon_2008].

There were also some existing journal and conference surveys. Bahrammirzaee et al. [Bahrammirzaee_2010] surveyed financial prediction and planning studies along with other financial applications using various Artificial Intelligence (AI) techniques like ANN, Expert Systems and hybrid models. The authors of [Zhang_2004] also compared ML methods in different financial applications including stock market prediction studies. In [Mochn_2007], soft computing models for market and forex prediction and for trading systems were analyzed. Mullainathan and Spiess [Mullainathan_2017] surveyed the prediction process in general from an econometric perspective.

There were also a number of survey papers concentrated on a single particular ML model. Even though these papers focused on one technique, the implementation areas generally spanned various financial applications including financial time series forecasting studies. Among those soft computing methods, EC and ANN attracted the most overall interest.

For the EC studies, Chen wrote a book on Genetic Algorithms (GAs) and GP in Computational Finance [Chen_2002s]. Later, Multiobjective Evolutionary Algorithms (MOEAs) were extensively surveyed on various financial applications including financial time series prediction [Castillo_Tapia_2007; Ponsich_2013; Aguilar_Rivera_2015]. Meanwhile, Rada reviewed EC applications along with Expert Systems for financial investing models [RADA_2008].

For the ANN studies, Li and Ma reviewed implementations of ANN for stock price forecasting and some other financial applications [Li_2010]. The authors of [Tkac_2016] surveyed different implementations of ANN in financial applications including stock price forecasting. Recently, Elmsili and Outtaj covered ANN applications in economics and management research, including economic time series forecasting, in their survey [Elmsili_2018].

There were also several text mining surveys focused on financial applications (which included financial time series forecasting). Mittermayer and Knolmayer compared various text mining implementations that extract market response to news for prediction [Mittermayer_2006]. The authors of [Mitra_2012] focused on news analytics studies for prediction of abnormal returns for trading strategies in their survey. Nassirtoussi et al. reviewed text mining studies for stock or forex market prediction [Nassirtoussi_2014]. The authors of [Kearney_2014] also surveyed text mining-based time series forecasting and trading strategies using textual sentiment. Similarly, Kumar and Ravi [Kumar_2016] reviewed text mining studies for forex and stock market prediction. Lately, Xing et al. [Xing_2017] surveyed natural language-based financial forecasting studies.

Finally, there were application-specific survey papers that focused on particular financial time series forecasting implementations. Among these studies, stock market forecasting attracted the most interest. A number of surveys were published at different times for stock market forecasting studies based on various soft computing methods [Vanstone_2003; Hajizadeh_2010; Nair_2014; Cavalcante_2016; Krollner_2010; Yoo; Preethi_2012; Atsalakis_2009]. Chatterjee et al. [Chatterjee_2000] and Katarya and Mahajan [Katarya_2017] concentrated on ANN-based financial market prediction studies, whereas Hu et al. [Hu_2015] focused on EC implementations for stock forecasting and algorithmic trading models. In a different time series forecasting application, researchers surveyed forex prediction studies using ANN [Huang_2004] and various other soft computing techniques [Pradeepkumar_2018].

Even though many surveys exist for ML implementations of financial time series forecasting, DL has not been surveyed comprehensively so far despite the existence of various DL implementations in recent years. This was our main motivation for the survey. At this point, we would like to cover the various DL models used in financial time series forecasting studies.

3 Deep Learning

DL is a type of ANN that consists of multiple processing layers and enables high-level abstraction to model data. The key advantage of DL models is that they extract good features from the input data automatically using a general-purpose learning procedure. Therefore, in the literature, DL models are used in many applications: image, speech, video and audio reconstruction, natural language understanding (particularly topic classification), sentiment analysis, question answering and language translation [LeCun2015]. The historical improvements of DL models are surveyed in [Schmidhuber_2015].

Financial time series forecasting has been very popular among ML researchers for more than 40 years. The financial community got a new boost lately with the introduction of DL models for financial prediction research, and a lot of new publications appeared accordingly. The reported success of DL over traditional ML models is the main attraction for finance researchers. With more financial time series data and different deep architectures, new DL methods will continue to be proposed. In our survey, we found that in the vast majority of the studies, DL models performed better than their ML counterparts.

In the literature, there are different kinds of DL models: Deep Multilayer Perceptron (DMLP), RNN, LSTM, CNN, Restricted Boltzmann Machines (RBMs), Deep Belief Network (DBN), Autoencoder (AE), and DRL [LeCun2015; Schmidhuber_2015]. Throughout the literature, financial time series forecasting was mostly treated as a regression problem. However, there were also a significant number of studies, in particular on trend prediction, that used classification models to tackle financial forecasting problems. In Section 4, different DL implementations are provided along with their model choices.
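The difference between the regression and classification formulations mentioned above can be illustrated with a minimal sketch. The price values and the helper names `regression_targets` and `trend_labels` are hypothetical and chosen only for illustration; they are not taken from any surveyed study.

```python
def regression_targets(prices):
    """Next-step simple returns: a continuous quantity a regression model could predict."""
    return [(nxt - cur) / cur for cur, nxt in zip(prices, prices[1:])]


def trend_labels(prices):
    """Binary trend labels: 1 if the next price is higher than the current one, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


# Hypothetical closing prices, used only for illustration.
prices = [100.0, 102.0, 101.0, 101.0, 104.0]
returns = regression_targets(prices)  # continuous targets (regression)
labels = trend_labels(prices)         # discrete targets (classification)
```

The same price series yields either continuous targets or class labels; which one is used determines the loss function and the evaluation metrics of the forecasting model.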

3.1 Deep Multi Layer Perceptron (DMLP)

DMLPs are among the first developed ANNs. They differ from shallow nets in that they contain more layers. Even though particular model architectures might vary depending on problem requirements, DMLP models consist of three main types of layers: input, hidden and output. The number of neurons in each layer and the number of layers are hyperparameters of the network. In general, each neuron in the hidden layers has input (x), weight (w) and bias (b) terms. In addition, each neuron has a nonlinear activation function that produces the neuron's output from the weighted sum of the outputs of the preceding neurons. Equation 1 [Goodfellow-et-al-2016] gives the output of a single neuron in the Neural Network (NN). There are different types of nonlinear activation functions. The most commonly used ones are: sigmoid (Equation 2) [Cybenko_1989], hyperbolic tangent (Equation 3) [Kalman_1992], Rectified Linear Unit (ReLU) (Equation 4) [Nair_2010], leaky-ReLU (Equation 5) [Maas_2013], swish (Equation 6) [Ramachandran_2017], and softmax (Equation 7) [Goodfellow-et-al-2016]. The nonlinear activations are compared in [Ramachandran_2017].

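$$y_i = \sigma\Big(\sum_i W_i x_i + b_i\Big) \tag{1}$$
$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{2}$$
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{3}$$
$$R(z) = \max(0, z) \tag{4}$$
$$R(z) = \mathbf{1}(z < 0)\,(\alpha z) + \mathbf{1}(z \geq 0)\,(z) \tag{5}$$
$$f(x) = x\,\sigma(\beta x) \tag{6}$$
$$\mathrm{softmax}(z_i) = \frac{\exp z_i}{\sum_j \exp z_j} \tag{7}$$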
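As a minimal sketch, Equations 1 to 7 can be written directly in Python using only the standard library. The default values `alpha=0.01` and `beta=1.0` are assumptions made for illustration, and subtracting the maximum inside `softmax` is a common numerical-stability step that does not change the result of Equation 7.

```python
import math


def sigmoid(z):
    """Equation 2: logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Equation 3: hyperbolic tangent."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def relu(z):
    """Equation 4: rectified linear unit."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Equation 5: slope alpha for negative inputs (alpha=0.01 is an assumed default)."""
    return alpha * z if z < 0 else z


def swish(x, beta=1.0):
    """Equation 6: x * sigmoid(beta * x) (beta=1.0 is an assumed default)."""
    return x * sigmoid(beta * x)


def softmax(zs):
    """Equation 7: normalized exponentials over a list of scores."""
    m = max(zs)  # shift by the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Equation 1: activation applied to the weighted sum of the inputs plus the bias."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)
```

For example, `neuron_output([0.5, -0.5], [1.0, 1.0], 0.0)` evaluates the sigmoid at a weighted sum of zero.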
DMLP models have appeared in various application areas [Deng_2014_App; LeCun2015]. Using a DMLP model has advantages and disadvantages depending on the problem requirements. With DMLP models, problems such as regression and classification can be solved by modeling the input data [Gardner_1998]. However, if the number of input features increases (e.g. an image as input), the number of parameters in the network grows accordingly due to the fully connected nature of the model, which degrades computational performance and creates storage problems. To overcome this issue, different types of Deep Neural Network (DNN) methods, such as CNN, have been proposed [LeCun2015]. With DMLP, classification and regression can be performed much more efficiently. Figure 1 shows a DMLP model with its layers, the neurons in each layer and the weights between neurons.
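The growth in parameter count with input size can be made concrete with a short sketch. The layer sizes below are hypothetical and serve only to contrast a small feature vector with an image-sized input.

```python
def dense_parameter_count(layer_sizes):
    """Number of weights and biases in a fully connected network.

    layer_sizes lists the neurons per layer, input layer first. Each pair of
    consecutive layers contributes n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architectures: same hidden layers, different input sizes.
small = dense_parameter_count([10, 64, 64, 1])             # 10 input features
large = dense_parameter_count([224 * 224 * 3, 64, 64, 1])  # 224x224 RGB image
```

With identical hidden layers, almost all parameters of the image-sized network sit in the first fully connected layer, which is the cost that weight-sharing architectures such as CNN are designed to avoid.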

Figure 1: Deep Multi Layer Neural Network Forward Pass and Backpropagation LeCun2015
The \glsdmlp learning stage is implemented through backpropagation. The error at the neurons in the output layer is propagated back to the preceding layers. Optimization algorithms are used to find the optimum parameters/variables of the \glsplnn by updating the weights of the connections between the layers. Several optimization algorithms have been developed: \glssgd, \glssgd with Momentum, \glsadagrad, \glsrmsprop, \glsadam Robbins_1951; Sutskever_2013; Duchi_2011; Tieleman_2012; Kingma_2014. Gradient descent is an iterative method for finding the parameters that minimize the cost function. \glssgd is an algorithm that randomly selects a few samples instead of the whole data set for each iteration Robbins_1951. \glssgd with Momentum remembers the update of the previous iteration, which accelerates the gradient descent method Sutskever_2013. \glsadagrad is a modified \glssgd that improves convergence performance over the standard \glssgd algorithm Duchi_2011. \glsrmsprop is an optimization algorithm that adapts the learning rate for each of the parameters. In \glsrmsprop, the learning rate is divided by a running average of the magnitudes of recent gradients for that weight Tieleman_2012. \glsadam is an updated version of \glsrmsprop that uses running averages of both the gradients and the second moments of the gradients. \glsadam combines the advantages of \glsrmsprop (works well in online and non-stationary settings) and \glsadagrad (works well with sparse gradients) Kingma_2014.
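The update rules described above can be sketched on a one-dimensional toy problem. The cost f(w) = (w - 3)^2, the step counts and all default hyperparameter values below are assumptions chosen for illustration. The gradient is exact here, whereas a true stochastic variant would estimate it from a random minibatch.

```python
import math

def grad(w):
    # Gradient of the assumed toy cost f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def gradient_descent(w, steps=200, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd_momentum(w, steps=200, lr=0.1, mu=0.9):
    """Momentum: the velocity v remembers the previous update."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

def adam(w, steps=500, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: running averages of the gradient (m) and its square (v)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Starting from `w = 0.0`, all three routines move towards the minimizer at 3.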

As shown in Figure 1, the effect of backpropagation is transferred to the previous layers. If the effect of the \glssgd updates is gradually lost before it reaches the early layers during backpropagation, the problem is called the vanishing gradient problem in the literature Bengio_1994. In this case, the weight updates in the early layers become negligible and the learning process stalls. A high number of layers in the neural network and increasing complexity cause the vanishing gradient problem.
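A small numerical sketch can make the vanishing gradient effect concrete. It assumes a chain of sigmoid layers in which every layer contributes the same factor, weight times the sigmoid derivative, to the backpropagated gradient. This simplification and the function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_factor(num_layers, weight=1.0, z=0.0):
    """Magnitude of the gradient factor after passing back through
    num_layers identical sigmoid layers (assumed toy setting)."""
    s = sigmoid(z)
    local = weight * s * (1.0 - s)   # sigmoid derivative is at most 0.25
    return abs(local) ** num_layers

for depth in (1, 5, 10, 20):
    # The factor shrinks geometrically with depth.
    print(depth, backprop_factor(depth))
```

Because the sigmoid derivative never exceeds 0.25, the factor decays geometrically with depth unless the weights are large.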

An important issue in \glsdmlp is the choice of the hyperparameters of the network and the method of tuning them. Hyperparameters are the variables of the network that affect the network architecture and the performance of the network. The number of hidden layers, the number of units in each layer, regularization techniques (dropout, L1, L2), network weight initialization (zero, random, He He_2015, Xavier Glorot_2010), activation functions (Sigmoid, \glsrelu, hyperbolic tangent, etc.), learning rate, decay rate, momentum values, number of epochs, batch size (minibatch size), and optimization algorithms (\glssgd, \glsadagrad, \glsrmsprop, \glsadam, etc.) are the hyperparameters of \glsdmlp. Choosing better hyperparameter values/variables for the network results in better performance, so finding the best hyperparameters for the network is a significant issue. In the literature, there are different methods to find the best hyperparameters: \glsmanualsearch, \glsgridsearch, \glsrandomsearch, Bayesian Methods (\glssmbgo, \glsgpa, \glstspea) Bergstra_2011; Bergstra_2012.
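Grid search and random search over a discrete hyperparameter space can be sketched as follows. The helper names, the `space` dictionary layout and the convention that a lower `score` is better (for instance a validation loss) are assumptions made for this example. In practice `score` would train and evaluate a network.

```python
import itertools
import random

def grid_search(space, score):
    """Evaluate every combination in `space`; return (best_score, params)."""
    best = None
    for values in itertools.product(*space.values()):
        params = dict(zip(space.keys(), values))
        s = score(params)
        if best is None or s < best[0]:
            best = (s, params)
    return best

def random_search(space, score, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in space.items()}
        s = score(params)
        if best is None or s < best[0]:
            best = (s, params)
    return best
```

Grid search cost grows with the product of the option counts, while random search keeps the number of evaluations fixed at `n_trials`.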

3.2 Recurrent Neural Network (RNN)

\glsrnn is another type of \glsdl network that is used for time series or sequential data, such as language and speech. \glsplrnn are also used in traditional \glsml models (\glsbptt, Jordan-Elman networks, etc.); however, the time lengths in such models are generally shorter than those used in deep \glsrnn models. Deep \glsplrnn are preferred due to their ability to include longer time periods. Unlike \glsplfnn, \glsplrnn use internal memory to process incoming inputs. \glsplrnn are used in the analysis of time series data in various fields (handwriting recognition, speech recognition, etc.). As stated in the literature, \glsplrnn are good at predicting the next character in a text, language translation applications, and sequential data processing Deng_2014_App; LeCun2015.

The \glsrnn model architecture consists of a varying number of layers and different types of units in each layer. The main difference between \glsrnn and \glsfnn is that each \glsrnn unit takes the current and previous input data at the same time, so the output of an \glsrnn model depends on the previous data. \glsplrnn process input sequences one element at a time. The units in the hidden layer hold information about the history of the input in the "state vector". When the output of the units in the hidden layer is unfolded into discrete time steps, the \glsplrnn are converted into a \glsdmlp LeCun2015. In Figure 2, the information flow in the \glsrnn’s hidden layer is divided into discrete times. The state of the node S at time t is shown as $s_t$, the input value x at time t as $x_t$, and the output value o at time t as $o_t$. The same parameter values (U, W, V) are used at every time step.

Figure 2: RNN cell through time LeCun2015
\glsplrnn can be trained using the \glsbptt algorithm. Optimization algorithms (\glssgd, \glsrmsprop, \glsadam) are used for the weight adjustment process. With the \glsbptt learning method, the error at any time t is propagated back to the inputs and weights of the previous time steps. The difficulty of training \glsrnn is due to the fact that the \glsrnn structure has a backward dependence over time. Therefore, \glsplrnn become very complex in terms of the learning period. Although the main aim of using \glsrnn is to learn long-term dependencies, studies in the literature show that when knowledge is stored for long time periods, it is not easy to learn with \glsrnn (training difficulties on \glsrnn) Pascanu_2013. In order to solve this particular problem, \glspllstm with different structures of \glsann were developed LeCun2015. Equations 8 and 9 illustrate simpler \glsrnn formulations. Equation 10 shows the derivative of the total error with respect to the weights, which is the sum of the error derivatives at each time step t (Richard Socher, CS224d: Deep Learning for Natural Language Processing, Lecture Notes).

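$$h_t = W f(h_{t-1}) + W^{(hx)} x_{[t]} \tag{8}$$
$$y_t = W^{(S)} f(h_t) \tag{9}$$
$$\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W} \tag{10}$$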
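The recurrence in Equations 8 and 9 can be sketched in plain Python. The choice f = tanh, the list-of-lists matrix layout and the function names are assumptions made for the example.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, W, W_hx, W_s, h0):
    """Apply Equations 8 and 9 over the input sequence xs, with f = tanh.

    The same matrices W, W_hx and W_s are reused at every time step.
    Returns the list of outputs y_t and the final hidden state.
    """
    h, outputs = h0, []
    for x in xs:
        f_prev = [math.tanh(v) for v in h]
        # Equation 8: h_t = W f(h_{t-1}) + W^(hx) x_t
        h = [a + b for a, b in zip(matvec(W, f_prev), matvec(W_hx, x))]
        # Equation 9: y_t = W^(S) f(h_t)
        outputs.append(matvec(W_s, [math.tanh(v) for v in h]))
    return outputs, h
```

The hidden state `h` is the only information carried from one time step to the next.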

The hyperparameters of \glsrnn also define the network architecture, and the performance of the network is affected by the hyperparameter choices, as in the \glsdmlp case. The number of hidden layers, the number of units in each layer, regularization techniques, network weight initialization, activation functions, learning rate, momentum values, number of epochs, batch size (minibatch size), decay rate, optimization algorithms, the \glsrnn model (Vanilla \glsrnn, \glsgru, \glslstm), and the sequence length are the hyperparameters of \glsrnn. Finding the best hyperparameters for the network is a significant issue. In the literature, there are different methods to find the best hyperparameters: \glsmanualsearch, \glsgridsearch, \glsrandomsearch, Bayesian Methods (\glssmbgo, \glsgpa, \glstspea) Bergstra_2011; Bergstra_2012.

3.3 Long Short Term Memory (LSTM)

\glslstm hochreiter1997lstm is a type of \glsrnn where the network can remember both short term and long term values. \glslstm networks are the preferred choice of many \glsdl model developers when tackling complex problems like automatic speech recognition and handwritten character recognition. \glslstm models are mostly used with time-series data. They are used in different applications such as \glsnlp, language modeling, language translation, speech recognition, sentiment analysis, predictive analysis, financial time series analysis, etc. Wu_2016; Greff_2016. With attention modules and \glsae structures, \glslstm networks can be more successful on time series data analysis such as language translation Wu_2016.

\glslstm networks consist of \glslstm units, which are combined to form an \glslstm layer. An \glslstm unit is composed of cells having an input gate, an output gate and a forget gate. These three gates regulate the information flow. With these features, each cell remembers the desired values over arbitrary time intervals. Equations 11-15 show the form of the forward pass of the \glslstm unit hochreiter1997lstm ($x_t$: input vector to the \glslstm unit, $f_t$: forget gate’s activation vector, $i_t$: input gate’s activation vector, $o_t$: output gate’s activation vector, $h_t$: output vector of the \glslstm unit, $c_t$: cell state vector, $\sigma_g$: sigmoid function, $\sigma_c$, $\sigma_h$: hyperbolic tangent function, $*$: element-wise (Hadamard) product, $W$, $U$: weight matrices that need to be learned, $b$: bias vector parameters that need to be learned) Greff_2016.

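$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \tag{11}$$
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \tag{12}$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \tag{13}$$
$$c_t = f_t * c_{t-1} + i_t * \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \tag{14}$$
$$h_t = o_t * \sigma_h(c_t) \tag{15}$$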
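One forward step of Equations 11-15 can be sketched for a scalar unit, so that the matrices reduce to single numbers. The scalar simplification, the dictionary of parameters `p` and its key names are assumptions made for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of a scalar LSTM unit (Equations 11-15).

    p maps assumed names such as "Wf", "Uf", "bf" to scalar parameters.
    Returns the new output h_t and cell state c_t.
    """
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])   # forget gate
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])   # input gate
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])   # output gate
    candidate = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])
    c_t = f_t * c_prev + i_t * candidate                        # cell state
    h_t = o_t * math.tanh(c_t)                                  # unit output
    return h_t, c_t
```

With all parameters set to zero, every gate equals 0.5, so the cell state is simply halved at each step.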
\glslstm is a specialized version of \glsrnn. Therefore, the weight updates and preferred optimization methods are the same. In addition, the hyperparameters of \glslstm are just like those of \glsrnn: the number of hidden layers, the number of units in each layer, network weight initialization, activation functions, learning rate, momentum values, the number of epochs, batch size (minibatch size), decay rate, optimization algorithms, sequence length for \glslstm, gradient clipping, gradient normalization, and dropout Reimers_2017; Greff_2016. In order to find the best hyperparameters of \glslstm, the hyperparameter optimization methods that are used for \glsrnn are also applicable to \glslstm Bergstra_2011; Bergstra_2012.

3.4 Convolutional Neural Networks (CNNs)

\glscnn is a type of \glsdnn that consists of convolutional layers, which are based on the convolution operation. \glscnn is the most common model used for vision or image processing based classification problems (image classification, object detection, image segmentation, etc.) Ji_2012; Szegedy_2013; Long_2015. The advantage of \glscnn is its smaller number of parameters compared with vanilla \glsdl models such as \glsdmlp. Filtering with a kernel window function lets \glscnn architectures process images with fewer parameters, which is beneficial for computing and storage. In \glscnn architectures, there are different layers: convolutional, max-pooling, dropout and fully connected \glsmlp layers. The convolutional layer performs the convolution (filtering) operation. The basic convolution operation is shown in Equation 16 (t denotes time, s denotes feature map, w denotes kernel, x denotes input, a denotes variable). In addition, the convolution operation is implemented on two-dimensional images. Equation 17 shows the convolution operation on a two-dimensional image (I denotes input image, K denotes the kernel, m and n denote the dimension of images, i and j denote variables). Consecutive convolutional and max-pooling layers construct the deep network. Equation 18 provides the details about the \glsnn architecture (W denotes weights, x denotes input, b denotes bias, z denotes the output of neurons). At the end of the network, the softmax function is used to get the output. Equations 19 and 20 illustrate the softmax function (y denotes output) Goodfellow-et-al-2016.

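$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t-a) \tag{16}$$
$$S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(m,n)\, K(i-m, j-n) \tag{17}$$
$$z_i = \sum_j W_{i,j} x_j + b_i \tag{18}$$
$$y = \mathrm{softmax}(z) \tag{19}$$
$$\mathrm{softmax}(z_i) = \frac{\exp(z_i)}{\sum_j \exp(z_j)} \tag{20}$$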
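Equations 16 and 17 can be sketched for finite inputs by treating every value outside the given lists as zero ("full" convolution). The function names and the zero-padding convention are assumptions made for the example. Deep learning libraries often implement cross-correlation with other padding conventions instead.

```python
def conv1d(x, w):
    """Equation 16 for finite sequences: s(t) = sum_a x(a) w(t - a)."""
    n = len(x) + len(w) - 1
    s = [0.0] * n
    for t in range(n):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s

def conv2d(I, K):
    """Equation 17 for finite images: S(i,j) = sum_m sum_n I(m,n) K(i-m, j-n)."""
    rows = len(I) + len(K) - 1
    cols = len(I[0]) + len(K[0]) - 1
    S = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for m in range(len(I)):
                for n in range(len(I[0])):
                    if 0 <= i - m < len(K) and 0 <= j - n < len(K[0]):
                        S[i][j] += I[m][n] * K[i - m][j - n]
    return S
```

The same small kernel is reused at every position, which is why a convolutional layer needs far fewer parameters than a fully connected one.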

The backpropagation process is used for model learning of \glscnn. The most commonly used optimization algorithms (\glssgd, \glsrmsprop) are used to find the optimum parameters of \glscnn. The hyperparameters of \glscnn are similar to other \glsdl model hyperparameters: the number of hidden layers, the number of units in each layer, network weight initialization, activation functions, learning rate, momentum values, the number of epochs, batch size (minibatch size), decay rate, optimization algorithms, dropout, kernel size, and filter size. In order to find the best hyperparameters of \glscnn, the usual search algorithms are used: \glsmanualsearch, \glsgridsearch, \glsrandomsearch, and Bayesian Methods Bergstra_2011; Bergstra_2012.

3.5 Restricted Boltzmann Machines (RBMs)

\glsrbm is a generative stochastic \glsann that can learn a probability distribution over its input set Qiu2014. \glsplrbm are mostly used for unsupervised learning Hrasko_2015. \glsplrbm are used in applications such as dimension reduction, classification, feature learning, and collaborative filtering Salakhutdinov_2007. The advantage of \glsplrbm is that they find hidden patterns with an unsupervised method. The disadvantage of \glsplrbm is their difficult training process. "\glsplrbm are tricky because although there are good estimators of the log-likelihood gradient, there are no known cheap ways of estimating the log-likelihood itself" Bengio_2012.

Figure 3: RBM Visible and Hidden Layers Qiu2014
\glsrbm is a bipartite, undirected graphical model that consists of two layers: a visible layer and a hidden layer (Figure 3). The units within a layer are not connected to each other. Each cell is a computational point that processes the input and makes stochastic decisions about whether this node will transmit the input. Inputs are multiplied by specific weights, certain threshold values (bias) are added to the input values, then the calculated values are passed through an activation function. In the reconstruction stage, the outputs re-enter the network as the input, then they exit from the visible layer as the output. The values of the previous input and the values after the processes are compared. The purpose of the comparison is to reduce the difference.

Equation 21 illustrates the probabilistic semantics for an \glsrbm by using its energy function (P denotes the probabilistic semantics for an \glsrbm, Z denotes the partition function, E denotes the energy function, h denotes hidden units, v denotes visible units). Equation 22 illustrates the partition function or the normalizing constant. Equation 23 shows the energy of a configuration (in matrix notation) of the standard type of \glsrbm that has binary-valued hidden and visible units (a denotes bias weights (offsets) for the visible units, b denotes bias weights for the hidden units, W denotes the weight matrix of the connections between hidden and visible units, T denotes the matrix transpose, v denotes visible units, h denotes hidden units) mohamed2009deep; lee2009convolutional.

$$P(v,h) = \frac{1}{Z}\exp(-E(v,h)) \tag{21}$$
$$Z = \sum_{v}\sum_{h}\exp(-E(v,h)) \tag{22}$$
$$E(v,h) = -a^{T}v - b^{T}h - v^{T}Wh \tag{23}$$
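For a very small binary \glsrbm, Equations 21-23 can be evaluated exactly by enumerating every configuration. The sketch below does this. The function names and the list-based layout of `a`, `b` and `W` are assumptions, and brute-force enumeration is only feasible for a handful of units.

```python
import itertools
import math

def energy(v, h, a, b, W):
    """Equation 23: E(v,h) = -a^T v - b^T h - v^T W h."""
    av = sum(ai * vi for ai, vi in zip(a, v))
    bh = sum(bj * hj for bj, hj in zip(b, h))
    vWh = sum(v[i] * W[i][j] * h[j]
              for i in range(len(v)) for j in range(len(h)))
    return -av - bh - vWh

def partition(a, b, W):
    """Equation 22: sum of exp(-E) over all binary configurations."""
    return sum(math.exp(-energy(v, h, a, b, W))
               for v in itertools.product((0, 1), repeat=len(a))
               for h in itertools.product((0, 1), repeat=len(b)))

def joint_prob(v, h, a, b, W):
    """Equation 21: P(v,h) = exp(-E(v,h)) / Z."""
    return math.exp(-energy(v, h, a, b, W)) / partition(a, b, W)
```

The number of terms in `partition` doubles with every added unit, which illustrates why the partition function has to be estimated for realistic model sizes.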

The learning is performed multiple times on the network Qiu2014. The training of \glsplrbm is implemented through minimizing the negative log-likelihood of the model and data. The \glscd algorithm is used as the stochastic approximation algorithm, which replaces the model expectation with an estimate obtained using Gibbs Sampling with a limited number of iterations Hrasko_2015. In the \glscd algorithm, the \glskldivergence is used to measure the distance between the reconstructed probability distribution and the original probability distribution of the input Van_2009.

Momentum, learning rate, weight-cost (decay rate), batch size (minibatch size), regularization method, the number of epochs, the number of layers, initialization of weights, size of visible units, size of hidden units, type of activation units (sigmoid, softmax, \glsrelu, Gaussian units, etc.), loss function, and optimization algorithms are the hyperparameters of \glsplrbm. Similar to the other deep networks, the hyperparameters are searched with \glsmanualsearch, \glsgridsearch, \glsrandomsearch, and Bayesian methods (Gaussian process). In addition to these, \glsais is used to estimate the partition function. The \glscd algorithm is also used for the optimization of \glsplrbm Bergstra_2011; Bergstra_2012; Yao_2016; Carreira_2005.

3.6 Deep Belief Networks (DBNs)

\glsdbn is a type of deep \glsann and consists of a stack of \glsrbm networks (Figure 4). \glsdbn is a probabilistic generative model that consists of latent variables. In \glsdbn, there is no link between the units within each layer. \glspldbn are used to find discriminative, independent features in the input set using unsupervised learning mohamed2009deep. The ability to encode higher-order network structures and fast inference are the advantages of \glspldbn Tamilselvan_2013. Since \glspldbn are composed of \glsplrbm, they share the training difficulties mentioned in the \glsrbm section.

Figure 4: Deep Belief Network Qiu2014

When \glsdbn is trained on the training set in an unsupervised manner, it can learn to reconstruct the input set in a probabilistic way. Then the layers on the network begin to detect discriminating features in the input. After this learning step, supervised learning is carried out to perform the classification Hinton2006. Equation 24 illustrates the probability of generating a visible vector (W: matrix weight of connection between hidden unit h and visible unit v, p(h|W): the prior distribution over hidden vectors) mohamed2009deep.

$$p(v) = \sum_{h} p(h|W)\, p(v|h,W) \tag{24}$$
The \glsdbn training process can be divided into two steps: stacked \glsrbm learning and backpropagation learning. In stacked \glsrbm learning, the iterative \glscd algorithm is used Hrasko_2015. In backpropagation learning, optimization algorithms (\glssgd, \glsrmsprop, \glsadam) are used to train the network Tamilselvan_2013. The hyperparameters of \glspldbn are similar to those of \glsplrbm. Momentum, learning rate, weight-cost (decay rate), regularization method, batch size (minibatch size), the number of epochs, the number of layers, initialization of weights, the number of \glsrbm stacks, size of visible units in the \glsrbm layers, size of hidden units in the \glsrbm layers, type of units (sigmoid, softmax, rectified, Gaussian units, etc.), and optimization algorithms are the hyperparameters of \glspldbn. Similar to the other deep networks, the hyperparameters are searched with \glsmanualsearch, \glsgridsearch, \glsrandomsearch, and Bayesian methods. The \glscd algorithm is also used for the optimization of \glspldbn Bergstra_2011; Bergstra_2012; Yao_2016; Carreira_2005.

3.7 Autoencoders (AEs)

\glsae networks are \glsann types that are used as unsupervised learning models. In addition, \glsae networks are commonly used in \glsdl models, wherein they remap the inputs (features) such that the inputs are more representative for classification. In other words, \glsae networks perform an unsupervised feature learning process, which fits very well with the \glsdl theme. A representation of a data set is learned by reducing the dimensionality with \glsplae. The architecture of \glsplae is similar to that of \glsplffnn. They consist of an input layer, an output layer and one or more hidden layers that connect them together. The number of nodes in the input layer and the number of nodes in the output layer are equal to each other in \glsplae, and they have a symmetrical structure. The most notable advantages of \glsplae are dimensionality reduction and feature learning. Meanwhile, reducing dimensionality and extracting features in \glsplae cause some drawbacks. Focusing on minimizing the loss of data relationships in the encoding of an \glsae causes the loss of some significant data relationships. Hence, this may be considered a drawback of \glsplae Meng_2017.

In general, \glsplae contain two components: an encoder and a decoder. The input $x \in [0,1]^d$ is converted through the function $f(x)$ ($W_1$ denotes the weight matrix of the encoder, $b_1$ denotes the bias vector of the encoder, $\sigma_1$ denotes the element-wise sigmoid activation function of the encoder). The output $h$ is the encoded part of \glsplae (code), also called the latent variables or latent representation. The inverse of the function $f(x)$, called $g(h)$, produces the reconstruction $r$ ($W_2$ denotes the weight matrix of the decoder, $b_2$ denotes the bias vector of the decoder, $\sigma_2$ denotes the element-wise sigmoid activation function of the decoder). Equations 25 and 26 illustrate the simple AE process Vincent_2008. Equation 27 shows the loss function of the \glsae, the \glsmse. In the literature, \glsplae have been used for feature extraction and dimensionality reduction Goodfellow-et-al-2016; Vincent_2008.

$$h = f(x) = \sigma_1(W_1 x + b_1) \tag{25}$$
$$r = g(h) = \sigma_2(W_2 h + b_2) \tag{26}$$
$$L(x,r) = \lVert x - r \rVert^2 \tag{27}$$
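The encoder, decoder and reconstruction loss of Equations 25-27 can be sketched with plain Python lists. The helper names and the list-of-rows weight layout are assumptions. The parameters are passed in untrained, so the sketch shows only the forward computation, not learning.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid(W, b, x):
    """Element-wise sigmoid of W x + b, with W given as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def autoencode(x, W1, b1, W2, b2):
    """Return the code h, the reconstruction r and the squared error loss."""
    h = dense_sigmoid(W1, b1, x)                       # Equation 25
    r = dense_sigmoid(W2, b2, h)                       # Equation 26
    loss = sum((xi - ri) ** 2 for xi, ri in zip(x, r)) # Equation 27
    return h, r, loss
```

Choosing fewer rows in `W1` than input features gives the dimensionality reduction discussed above.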
\glsplae are a specialized version of \glsplffnn. Backpropagation learning is used to update the weights in the network Goodfellow-et-al-2016. Optimization algorithms (\glssgd, \glsrmsprop, \glsadam) are used for the learning process of \glsplae. \glsmse is used as a loss function in \glsplae. In addition, recirculation algorithms may also be used for the training of the \glsplae Goodfellow-et-al-2016. The hyperparameters of \glsplae are similar to \glsdl hyperparameters. Learning rate, weight-cost (decay rate), dropout fraction, batch size (minibatch size), the number of epochs, the number of layers, the number of nodes in each encoder layer, type of activation functions, the number of nodes in each decoder layer, network weight initialization, optimization algorithms, and the number of nodes in the code layer (size of latent representation) are the hyperparameters of \glsplae. Similar to the other deep networks, the hyperparameters are searched with \glsmanualsearch, \glsgridsearch, \glsrandomsearch, and Bayesian methods Bergstra_2011; Bergstra_2012.

3.8 Deep Reinforcement Learning (DRL)

\glsrl is a type of learning method that differs from supervised and unsupervised learning models. It does not need a preliminary data set that has been labeled or clustered beforehand. \glsrl is an ML approach inspired by learning action/behavior, which deals with what actions should be taken by subjects to achieve the highest reward in an environment. It is used in different application areas: game theory, control theory, multi-agent systems, operations research, robotics, information theory, managing investment portfolio, simulation-based optimization, playing Atari games, and statistics sutton1998introduction. Some of the advantages of using \glsrl for control problems are that an agent can be easily re-trained to adapt to changes in the environment and that the system is continually improved while training is constantly performed. An \glsrl agent learns by interacting with its surroundings and observing the results of these interactions. This learning method mimics the basic way in which people learn.

\glsrl is mainly based on \glsmdp, which is used to formalize the \glsrl environment. An \glsmdp is a five-tuple: state (finite set of states), action (finite set of actions), reward function (scalar feedback signal), state transition probability matrix ($p(s',r|s,a)$, $s'$ denotes next state, $r$ denotes reward function, $s$ denotes state, $a$ denotes action), discount factor ($\gamma$, present value of future rewards). The aim of the agent is to maximize the cumulative reward. The return ($G_t$) is the total discounted reward. Equation 28 illustrates the total return ($G_t$ denotes total discounted reward, R denotes rewards, t denotes time, k denotes variable in time).

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \tag{28}$$
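For a finite list of rewards, the discounted return of Equation 28 can be computed with a backward recursion. The function name and the convention that `rewards[0]` holds $R_{t+1}$ are assumptions made for this sketch.

```python
def discounted_return(rewards, gamma):
    """G_t for a finite reward list [R_{t+1}, R_{t+2}, ...].

    Working backwards uses G_t = R_{t+1} + gamma * G_{t+1}, which is
    Equation 28 truncated at the end of the list.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For instance, three rewards of 1 with a discount factor of 0.5 give 1 + 0.5 + 0.25 = 1.75.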

The value function is a prediction of future rewards; it indicates how good a state/action is. Equation 29 illustrates the formulation of the value function (v(s) denotes the value function, E[.] denotes the expectation function, $G_t$ denotes the total discounted reward, s denotes the given state, R denotes the rewards, S denotes the set of states, t denotes time).

$$v(s) = \mathbb{E}[G_t \mid S_t = s] = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s] \tag{29}$$

Policy (π) is the agent’s behavior strategy. It is like a map from state to action. There are two types of value functions to express the actions in the policy: the state-value function ($v_\pi(s)$) and the action-value function ($q_\pi(s,a)$). The state-value function (Equation 30) is the expected return when starting from s and then following policy π ($\mathbb{E}_\pi[\cdot]$ denotes the expectation function). The action-value function (Equation 31) is the expected return when starting from s, taking action a, and then following policy π (A denotes the set of actions, a denotes the given action).

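$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi\Big[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\Big|\, S_t = s\Big] \tag{30}$$
$$q_\pi(s,a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \tag{31}$$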

The optimal state-value function (Equation 32) is the maximum value function over all policies. The optimal action-value function (Equation 33) is the maximum action-value function over all policies.

$$v_*(s) = \max_{\pi} v_\pi(s) \tag{32}$$
$$q_*(s,a) = \max_{\pi} q_\pi(s,a) \tag{33}$$

The \glsrl solutions and methods in the literature are too broad to review in this paper, so we summarize the important issues of \glsrl and the important \glsrl solutions and methods. \glsrl methods are mainly divided into two groups: model-based methods and model-free methods. Model-based methods use a model that is known to the agent beforehand, together with value/policy and experience. The experience can be real (sampled from the environment) or simulated (sampled from the model). Model-based methods are mostly used in robotics and control algorithms Nguyen_2011. Model-free methods are mainly divided into two groups: value-based and policy-based methods. In value-based methods, a policy is produced directly from the value function (e.g. epsilon-greedy). In policy-based methods, the policy is parametrized directly. In value-based methods, there are three main solutions for \glsmdp problems: \glsdp, \glsmc, and \glstd.

In the \glsdp method, problems are solved by exploiting optimal substructure and overlapping subproblems. The full model is known and it is used for planning in \glsmdp. There are two iterations (learning algorithms) in \glsdp: policy iteration and value iteration. The \glsmc method learns directly from experience by running an episode of a game/simulation. \glsmc is a type of model-free method that does not need \glsmdp transitions/rewards. It collects states and returns, and uses the mean of the returns as the value function. \glstd is also a model-free method that learns directly from experience by running the episode. In addition, \glstd learns from incomplete episodes by bootstrapping, like the \glsdp method. The \glstd method thus combines the \glsmc and \glsdp methods. SARSA (state, action, reward, state, action; $S_t$, $A_t$, $R_{t+1}$, $S_{t+1}$, $A_{t+1}$) is a type of \glstd control algorithm. The Q-value (action-value function) is updated with the agent actions. It is an on-policy learning model that learns from actions according to the current policy π. Equation 34 illustrates the update of the action-value function in the SARSA algorithm ($S_t$ denotes current state, $A_t$ denotes current action, t denotes time, R denotes reward, α denotes learning rate, γ denotes discount factor). Q-learning is another \glstd control algorithm. It is an off-policy learning model that learns from actions that do not need to follow the policy π. Equation 35 illustrates the update of the action-value function in the Q-Learning algorithm (a denotes action; the complete algorithms can be found in sutton1998introduction).

$$Q(S_t, A_t) = Q(S_t, A_t) + \alpha\,[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)] \tag{34}$$
$$Q(S_t, A_t) = Q(S_t, A_t) + \alpha\,[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)] \tag{35}$$
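The two updates in Equations 34 and 35 differ only in the bootstrap term, which a tabular sketch makes visible. The function names, the dictionary-based Q-table keyed by `(state, action)` and the default `alpha` and `gamma` values are assumptions made for illustration.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Equation 34: bootstrap from the action actually taken next (on-policy)."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Equation 35: bootstrap from the best next action (off-policy)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Usage sketch with hypothetical states "s0", "s1" and actions "up", "down".
Q = defaultdict(float)
Q[("s1", "up")], Q[("s1", "down")] = 1.0, 2.0
sarsa_update(Q, "s0", "up", 1.0, "s1", "up", alpha=0.5, gamma=0.5)
q_learning_update(Q, "s0", "down", 1.0, "s1", ["up", "down"], alpha=0.5, gamma=0.5)
```

In the usage lines, SARSA bootstraps from the value of the chosen next action (1.0), while Q-learning bootstraps from the larger value (2.0).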

In value-based methods, a policy can be generated directly from the value function (e.g. using epsilon-greedy). Policy-based methods use the policy directly instead of the value function. They have advantages and disadvantages over value-based methods. Policy-based methods are more effective in high-dimensional or continuous action spaces, and have better convergence properties than value-based methods. They can also learn stochastic policies. On the other hand, evaluating a policy is typically inefficient and has high variance, and policy-based methods typically converge to a local rather than the global optimum. In policy-based methods, there are also different solutions: Policy gradient, Reinforce (Monte-Carlo Policy Gradient), Actor-Critic sutton1998introduction (details of policy-based methods can be found in sutton1998introduction).

\glsdrl methods contain \glsplnn. Therefore, \glsdrl hyperparameters are similar to \glsdl hyperparameters. Learning rate, weight-cost (decay rate), dropout fraction, regularization method, batch size (minibatch size), the number of epochs, the number of layers, the number of nodes in each layer, type of activation functions, network weight initialization, optimization algorithms, discount factor, and the number of episodes are the hyperparameters of \glsdrl. Similar to the other deep networks, the hyperparameters are searched with \glsmanualsearch, \glsgridsearch, \glsrandomsearch and Bayesian methods Bergstra_2011; Bergstra_2012.

4 Financial Time Series Forecasting

The most widely studied financial application area is forecasting of a given financial time series, in particular asset price forecasting. Even though some variations exist, the main focus is on predicting the next movement of the underlying asset. More than half of the existing implementations of \glsdl were focused on this area. Even though there are several subtopics of this general problem, including stock price forecasting, index prediction, forex price prediction, commodity (oil, gold, etc.) price prediction, bond price forecasting, volatility forecasting, and cryptocurrency price forecasting, the underlying dynamics are the same in all of these applications.

The studies can also be clustered into two main groups based on their expected outputs: price prediction and price movement (trend) prediction. Even though price forecasting is basically a regression problem, in most of the financial time series forecasting applications, correct prediction of the price is not perceived as important as correctly identifying the directional movement. As a result, researchers consider trend prediction, i.e. forecasting which way the price will change, a more crucial study area compared with exact price prediction. In that sense, trend prediction becomes a classification problem. In some studies, only up or down movements are taken into consideration (2-class problem), whereas up, down or neutral movements (3-class problem) also exist.
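Turning a price series into trend labels for the 2-class or 3-class problem can be sketched as below. The function name, the string labels and the use of a relative-return `threshold` to define the neutral band are assumptions for illustration. The surveyed studies define their classes in different ways.

```python
def trend_labels(prices, threshold=0.0):
    """Label each step as "up", "down" or "neutral".

    A step is "neutral" when the relative price change lies within
    [-threshold, threshold]. With threshold=0.0 only exactly unchanged
    prices are neutral, which approximates the 2-class setting.
    """
    labels = []
    for prev, cur in zip(prices, prices[1:]):
        ret = (cur - prev) / prev
        if ret > threshold:
            labels.append("up")
        elif ret < -threshold:
            labels.append("down")
        else:
            labels.append("neutral")
    return labels
```

A classifier would then be trained on these labels, while a regression model would be trained on the prices themselves.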

\glslstm and its variations, along with some hybrid models, dominate the financial time series forecasting domain. \glslstm, by its nature, utilizes the temporal characteristics of any time series signal; hence, forecasting financial time series is a well-studied and successful implementation of \glslstm. However, some researchers prefer to either extract appropriate features from the time series or transform the time series in such a way that the resulting financial data becomes stationary from a temporal perspective, meaning that even if we shuffle the data order, we will still be able to properly train the model and achieve successful out-of-sample test performance. For those implementations, \glscnn and \glsdfnn were the most commonly chosen \glsdl models.

Various financial time series forecasting implementations using \glsdl models exist in literature. We will cover each of these aforementioned implementation areas in the following subsections. In this survey paper, we examined the papers using the following criteria:

  1. First, we grouped the articles according to their subjects.
  2. Then, we grouped the related papers according to their feature set.
  3. Finally, we grouped each subgroup according to \glsdl models/methods.

For each implementation area, the related papers will be subgrouped and tabulated. Each table will have the following fields to provide the information about the implementation details for the papers within the group: Article (Art.) and Data Set are self-explanatory; Period refers to the time period for training and testing; Feature Set lists the input features used in the study; Lag is the time length of the input vector (e.g. 30d means the input vector has a 30 day window); and Horizon shows how far into the future the model predicts. Some abbreviations are used for these two aforementioned fields: min is minutes, h is hours, d is days, w is weeks, m is months, y is years, s is steps, * is mixed. Method shows the \glsdl models that are used in the study. Performance Criteria provides the evaluation metrics, and finally the Environment (Env.) lists the development framework/software/tools. Some column values might be empty, indicating there was no relevant information in the paper for the corresponding field.
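The Lag and Horizon fields can be illustrated with a small windowing sketch that turns a series into input/target pairs. The function name and the convention that the target is the single value `horizon` steps after the last value of the window are assumptions. Individual studies may predict several steps or use other targets.

```python
def make_windows(series, lag, horizon):
    """Split a series into (inputs, targets).

    Each input holds `lag` consecutive values. Its target is the value
    `horizon` steps after the last value of that window.
    """
    X, y = [], []
    for start in range(len(series) - lag - horizon + 1):
        X.append(series[start:start + lag])
        y.append(series[start + lag + horizon - 1])
    return X, y
```

With daily data, `lag=30` and `horizon=3` would correspond to the "30d" and "3d" entries used in the tables.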

4.1 Stock Price Forecasting

Price prediction of any given stock is the most studied financial application of all, and we observed the same trend within the \glsdl implementations. Depending on the prediction time horizon, different input parameters are chosen, varying from \glshft and intraday price movements to daily, weekly or even monthly stock close prices. Technical and fundamental analysis, social media feeds, sentiment, etc. are also among the parameters used for the prediction models.

Table 1: Stock Price Forecasting Using Only Raw Time Series Data

| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Chong_2017 | 38 stocks in \acrshortkospi | 2010-2014 | Lagged stock returns | 50min | 5min | \acrshortdnn | \acrshortnmse, \acrshortrmse, \acrshortmae, \acrshortmi | - |
| Chen_2015 | China stock market, 3049 Stocks | 1990-2015 | \acrshortochlv | 30d | 3d | \acrshortlstm | Accuracy | Theano, Keras |
| Dezsi_2016 | Daily returns of ‘BRD’ stock in Romanian Market | 2001-2016 | \acrshortochlv | - | 1d | \acrshortlstm | \acrshortrmse, \acrshortmae | Python, Theano |
| Samarawickrama_2017 | 297 listed companies of \acrshortcse | 2012-2013 | \acrshortochlv | 2d | 1d | \acrshortlstm, \acrshortsrnn, \acrshortgru | \acrshortmad, \acrshortmape | Keras |
| M_2018 | 5 stock in \acrshortnse | 1997-2016 | \acrshortochlv, Price data, turnover and number of trades. | 200d | 1..10d | \acrshortlstm, \acrshortrnn, \acrshortcnn, \acrshortmlp | \acrshortmape | - |
| Selvin_2017 | Stocks of Infosys, TCS and CIPLA from \acrshortnse | 2014 | Price data | - | - | \acrshortrnn, \acrshortlstm and \acrshortcnn | Accuracy | - |
| Lee_2018 | 10 stocks in \acrshortsp500 | 1997-2016 | \acrshortochlv, Price data | 36m | 1m | \acrshortrnn, \acrshortlstm, \acrshortgru | Accuracy, Monthly return | Keras, Tensorflow |
| Li_2017 | Stocks data from \acrshortsp500 | 2011-2016 | \acrshortochlv | 1d | 1d | \acrshortdbn | \acrshortmse, \acrshortnorm-rmse, \acrshortmae | - |
| Chen_2018 | High-frequency transaction data of the \acrshortcsi300 futures | 2017 | Price data | - | 1min | \acrshortdnn, \acrshortelm, \acrshortrbf | \acrshortrmse, \acrshortmape, Accuracy | Matlab |
| Krauss_2017 | Stocks in the \acrshortsp500 | 1990-2015 | Price data | 240d | 1d | \acrshortdnn, \acrshortgbt, \acrshortrf | Mean return, \acrshortmdd, Calmar ratio | H2O |
| Chandra_2016 | ACI Worldwide, Staples, and Seagate in \acrshortnasdaq | 2006-2010 | Daily closing prices | 17d | 1d | \acrshortrnn, \acrshortann | \acrshortrmse | - |
| Liu_2017 | Chinese Stocks | 2007-2017 | \acrshortochlv | 30d | 1..5d | \acrshortcnn + \acrshortlstm | Annualized Return, Mxm Retracement | Python |
| Heaton_2016 | 20 stocks in \acrshortsp500 | 2010-2015 | Price data | - | - | \acrshortae + \acrshortlstm | Weekly Returns | - |
| Batres_2015 | \acrshortsp500 | 1985-2006 | Monthly and daily log-returns | * | 1d | \acrshortdbn+\acrshortmlp | Validation, Test Error | Theano, Python, Matlab |
| Yuan_2018 | 12 stocks from \acrshortsse Composite Index | 2000-2017 | \acrshortochlv | 60d | 1..7d | \acrshortdwnn | \acrshortmse | Tensorflow |
| Zhang_2017 | 50 stocks from \acrshortnyse | 2007-2016 | Price data | - | 1d, 3d, 5d | \acrshortsfm | \acrshortmse | - |

In this survey, we first grouped the stock price forecasting articles according to their feature sets: studies using only the raw time series data (price data, \glsochlv) for price prediction, studies using various other data, and papers that used text mining techniques. In the first group, the corresponding \glsdl models were applied directly to the raw time series for price prediction. Table 1 tabulates the stock price forecasting papers that used only raw time series data in the literature. In Table 1, different methods/models are also listed based on four sub-groups: \glsdnn (networks that are deep but without any given topology details) and \glslstm models; multi models; hybrid models; novel methods.

\glsdnn and \glslstm models were solely used in 3 papers. In Chong_2017, \glsdnn and lagged stock returns were used to predict the stock prices in \glskospi. Chen et al. Chen_2015 and Dezsi and Nistor Dezsi_2016 applied the raw price data as the input to \glslstm models.

Meanwhile, some studies implemented multiple \glsdl models for performance comparison using only the raw price (\glsochlv) data for forecasting. Among the noteworthy studies, the authors of Samarawickrama_2017 compared \glsrnn, \glssrnn, \glslstm and \glsgru. Hiransha et al. M_2018 compared \glslstm, \glsrnn, \glscnn and \glsmlp, whereas in Selvin_2017 \glsrnn, \glslstm, \glscnn and \glsarima were preferred. Lee and Yoo Lee_2018 compared 3 \glsrnn models (\glssrnn, \glslstm, \glsgru) for stock price prediction and then constructed a threshold-based portfolio by selecting stocks according to the predictions. Li et al. Li_2017 implemented \glsdbn. Finally, the authors of Chen_2018 compared 4 different \glsml models, namely 1 \glsdl model (\glsae and \glsrbm), \glsmlp, \glsrbf and \glselm, for predicting the next price in 1-minute price data. They also compared the results with different sized datasets. The authors of Krauss_2017 used price data and \glsdnn, \glsgbt, \glsrf methods for the prediction of the stocks in the \glssp500. Chandra and Chan Chandra_2016 used co-operative neuro-evolution, \glsrnn (Elman network) and \glsdfnn for the prediction of stock prices in \glsnasdaq (ACI Worldwide, Staples, and Seagate).

Meanwhile, hybrid models were used in some of the papers. The author of Liu_2017 applied \glscnn+\glslstm in their study. Heaton et al. Heaton_2016 implemented smart indexing with \glsae. The authors of Batres_2015 combined \glsdbn and \glsmlp to construct a stock portfolio by predicting each stock’s monthly log-return and choosing only the stocks that were expected to perform better than the median stock.
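
The median-based selection rule described for Batres_2015 can be sketched as follows. This is only an illustration of the selection step, assuming the predicted monthly log-returns are already available from some model; the tickers, the values and the helper name `select_above_median` are hypothetical.

```python
from statistics import median
from typing import Dict, List


def select_above_median(predicted: Dict[str, float]) -> List[str]:
    """Keep the tickers whose predicted return is strictly above the median."""
    if not predicted:
        raise ValueError("no predictions given")
    cutoff = median(predicted.values())
    return sorted(t for t, r in predicted.items() if r > cutoff)


# Hypothetical predicted monthly log-returns, for illustration only.
predicted_returns = {"AAA": 0.02, "BBB": -0.01, "CCC": 0.005, "DDD": 0.03}
portfolio = select_above_median(predicted_returns)
```

With an even number of stocks the median is the average of the two middle predictions, so exactly half of the stocks are selected when all predictions are distinct.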

In addition, novel approaches were adopted in some of the studies. The author of Yuan_2018 proposed a novel \glsdwnn, which is a combination of \glsrnn and \glscnn. The author of Zhang_2017 implemented the \glssfm recurrent network in their study.

In another group of studies, some researchers again focused on \glslstm based models. However, their input parameters came from various sources including the raw price data, technical and/or fundamental analysis, macroeconomic data, financial statements, news, investor sentiment, etc. Table 2 tabulates the stock price forecasting papers that used various data such as the raw price data, technical and/or fundamental analysis, macroeconomic data in the literature. In Table 2, different methods/models are also listed based on five sub-groups: \glsdnn model; \glslstm and \glsrnn models; multiple and hybrid models; \glscnn model; novel methods.

\glsdnn models were used in some of the stock price forecasting papers within this group. In Abe_2018, a \glsdnn model and 25 fundamental features were used for the prediction of the Japan Index constituents. Feng et al. Feng_2018 also used fundamental features and a \glsdnn model for the prediction. The authors of Fan_2014 used a \glsdnn model with macroeconomic data such as GDP, unemployment rate, inventories, etc. for the prediction of the U.S. low-level disaggregated macroeconomic time series.

\glslstm and \glsrnn models were chosen in some of the studies. Kraus and Feuerriegel Kraus_2017 implemented \glslstm with transfer learning using text mining through financial news and the stock market data. Similarly, the author of Minami_2018 used \glslstm to predict the stock’s next day price using corporate action events and a macroeconomic index. Zhang and Tan Zhang_2018_a implemented DeepStockRanker, an \glslstm based model for stock ranking using 11 technical indicators. In another study Zhuge_2017, the authors used the price time series and emotional data from text posts for predicting the stock opening price of the next day with an \glslstm network. Akita et al. Akita_2016 used textual information and stock prices through Paragraph Vector + \glslstm for forecasting the prices, and comparisons were provided with different classifiers. Ozbayoglu Ozbayoglu_2007 used technical indicators along with the stock data on a Jordan-Elman network for price prediction.

There were also multiple and hybrid models that used mostly technical analysis features as their inputs to the \glsdl model. Several technical indicators were fed into \glslstm and \glsmlp networks in Khare_2017 for intraday price prediction. Recently, Zhou et al. Zhou_2018 used a \glsgan-fd model for stock price prediction and compared their model performances against \glsarima, \glsann and \glssvm. The authors of Singh_2016 used several technical indicator features and time series data with \glspca for dimensionality reduction cascaded with \glsdnn (2-layer \glsffnn) for stock price prediction. In Karaoglu_2017, the authors used market microstructure based trade indicators as inputs to an \glsrnn with Graves \glslstm, detecting the buy-sell pressure of movements in \glsbist in order to perform the price prediction for intelligent stock trading. In Zhou_2018_a, next month’s return was predicted and portfolios of the expected top performers were constructed. Good monthly returns were achieved with \glslstm and \glslstm-\glsmlp models.

Meanwhile, in some of the papers, \glscnn models were preferred. The authors of Abroyan_2017 used 250 features (order details, etc.) for prediction on a private brokerage company’s real data of risky transactions. They used \glscnn and \glslstm for stock price forecasting. The authors of GooglePatent used a \glscnn model with fundamental, technical and market data for the prediction.

Novel methods were also developed in some of the studies. In Tran_2017, FI-2010 dataset: bid/ask and volume were used as the feature set for the forecast. In the study, they proposed \glswmtr, \glsmda. The authors of Feng_2018_a used 57 characteristic features such as Market equity, Market Beta, Industry momentum, Asset growth, etc. as inputs to a Fama-French n-factor model \glsdl for predicting monthly US equity returns in \glsnyse, \glsamex, or \glsnasdaq.

Table 2: Stock Price Forecasting Using Various Data

| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Abe_2018 | Japan Index constituents from WorldScope | 1990-2016 | 25 Fundamental Features | 10d | 1d | \acrshortdnn | Correlation, Accuracy, \acrshortmse | Tensorflow |
| Feng_2018 | Return of \acrshortsp500 | 1926-2016 | Fundamental Features | - | 1s | \acrshortdnn | \acrshortmspe | Tensorflow |
| Fan_2014 | U.S. low-level disaggregated macroeconomic time series | 1959-2008 | GDP, Unemployment rate, Inventories, etc. | - | - | \acrshortdnn | \acrshortr-sq | - |
| Kraus_2017 | \acrshortcdax stock market data | 2010-2013 | Financial news, stock market data | 20d | 1d | \acrshortlstm | \acrshortmse, \acrshortrmse, \acrshortmae, Accuracy, \acrshortauc | TensorFlow, Theano, Python, Scikit-Learn |
| Minami_2018 | Stock of Tsugami Corporation | 2013 | Price data | - | - | \acrshortlstm | \acrshortrmse | Keras, Tensorflow |
| Zhang_2018_a | Stocks in China’s A-share | 2006-2007 | 11 technical indicators | - | 1d | \acrshortlstm | \acrshortareturn, \acrshortir, \acrshortic | - |
| Zhuge_2017 | SCI prices | 2008-2015 | \acrshortochl of change rate, price | 7d | - | EmotionalAnalysis + \acrshortlstm | \acrshortmse | - |
| Akita_2016 | 10 stocks in Nikkei 225 and news | 2001-2008 | Textual information and Stock prices | 10d | - | Paragraph Vector + \acrshortlstm | Profit | - |
| Ozbayoglu_2007 | TKC stock in \acrshortnyse and QQQQ ETF | 1999-2006 | Technical indicators, Price | 50d | 1d | \acrshortrnn (Jordan-Elman) | Profit, \acrshortmse | Java |
| Khare_2017 | 10 Stocks in \acrshortnyse | - | Price data, Technical indicators | 20min | 1min | \acrshortlstm, \acrshortmlp | \acrshortrmse | - |
| Zhou_2018 | 42 stocks in China’s \acrshortsse | 2016 | \acrshortochlv, Technical Indicators | 242min | 1min | \acrshortgan (\acrshortlstm, \acrshortcnn) | \acrshortrmsre, \acrshortdpa, \acrshortgan-F, \acrshortgan-D | - |
| Singh_2016 | Google’s daily stock data | 2004-2015 | \acrshortochlv, Technical indicators | 20d | 1d | (2D)2 \acrshortpca + \acrshortdnn | \acrshortsmape, \acrshortpcd, \acrshortmape, \acrshortrmse, \acrshorthr, \acrshorttr, \acrshortr-sq | R, Matlab |
| Karaoglu_2017 | GarantiBank in \acrshortbist, Turkey | 2016 | \acrshortochlv, Volatility, etc. | - | - | \acrshortplr, Graves \acrshortlstm | \acrshortmse, \acrshortrmse, \acrshortmae, \acrshortrse, \acrshortr-sq | Spark |
| Zhou_2018_a | Stocks in \acrshortnyse, \acrshortamex, \acrshortnasdaq, \acrshorttaq intraday trade | 1993-2017 | Price, 15 firm characteristics | 80d | 1d | \acrshortlstm+\acrshortmlp | Monthly return, \acrshortsr | Python, Keras, Tensorflow in AWS |
| Abroyan_2017 | Private brokerage company’s real data of risky transactions | - | 250 features: order details, etc. | - | - | \acrshortcnn, \acrshortlstm | F1-Score | Keras, Tensorflow |
| GooglePatent | Fundamental and Technical Data, Economic Data | - | Fundamental, technical and market information | - | - | \acrshortcnn | - | - |
| Tran_2017 | The LOB of 5 stocks of Finnish Stock Market | 2010 | FI-2010 dataset: bid/ask and volume | - | * | \acrshortwmtr, \acrshortmda | Accuracy, Precision, Recall, F1-Score | - |
| Feng_2018_a | Returns in \acrshortnyse, \acrshortamex, \acrshortnasdaq | 1975-2017 | 57 firm characteristics | * | - | Fama-French n-factor model \acrshortdl | \acrshortr-sq, \acrshortrmse | Tensorflow |

There were a number of research papers that also used text mining techniques for the feature extraction, but used non-\glslstm models for the stock price prediction. Table 3 tabulates the stock price forecasting papers that used text mining techniques. In Table 3, different methods/models are clustered into three sub-groups: \glscnn and \glslstm models; \glsgru, \glslstm, and \glsrnn models; novel methods.

\glscnn and \glslstm models were adopted in some of the papers. In Ding_2015, events were detected from Reuters and Bloomberg news through text mining, and that information was used for price prediction and stock trading through the \glscnn model. Vargas et al. Vargas_2017 used text mining on \glssp500 index news from Reuters through a \glslstm+\glscnn hybrid model for price prediction and intraday directional movement estimation together. The authors of Lee_2017_b used financial news data and implemented word embedding with Word2vec along with MA and stochastic oscillator to create inputs for \glsrcnn for stock price prediction. The authors of Iwasaki_2018 also used sentiment analysis through text mining and word embeddings from analyst reports and used sentiment features as inputs to a \glsdfnn model for stock price prediction. Different portfolio selections were then implemented based on the projected stock returns.

\glsgru, \glslstm, and \glsrnn models were preferred in the next group of papers. Das et al. Das_2018 implemented sentiment analysis on Twitter posts along with the stock data for price forecasting using \glsrnn. Similarly, the authors of Jiahong_Li_2017 used sentiment classification (neutral, positive, negative) for the stock open or close price prediction with various \glslstm models. They compared their results with \glssvm and achieved higher overall performance. In Zhongshengz_2018, text and price data were used for the prediction of the \glssci prices.

Some novel approaches were also found in some of the papers. The authors of Nascimento_2015 used word embeddings for extracting information from web pages and then combined it with the stock price data for stock price prediction. They compared the \glsar model and \glsrf with and without news. The results showed that embedding news information improved the performance. In Han_2018, financial news and the ACE2005 Chinese corpus were used. Different event types on Chinese companies were classified based on a novel event-type pattern classification algorithm, and the next day stock price change was also predicted using additional inputs.

Table 3: Stock Price Forecasting Using Text Mining Techniques for Feature Extraction

| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Ding_2015 | \acrshortsp500 Index, 15 stocks in \acrshortsp500 | 2006-2013 | News from Reuters and Bloomberg | - | - | \acrshortcnn | Accuracy, \acrshortmcc | - |
| Vargas_2017 | \acrshortsp500 index news from Reuters | 2006-2013 | Financial news titles, Technical indicators | 1d | 1d | \acrshortrcnn | Accuracy | - |
| Lee_2017_b | \acrshorttwse index, 4 stocks in \acrshorttwse | 2001-2017 | Technical indicators, Price data, News | 15d | - | \acrshortcnn + \acrshortlstm | \acrshortrmse, Profit | Keras, Python, TALIB |
| Iwasaki_2018 | Analyst reports on the TSE and Osaka Exchange | 2016-2018 | Text | - | - | \acrshortlstm, \acrshortcnn, \acrshortbi-lstm | Accuracy, R-squared | R, Python, MeCab |
| Das_2018 | Stocks of Google, Microsoft and Apple | 2016-2017 | Twitter sentiment and stock prices | - | - | \acrshortrnn | - | Spark, Flume, Twitter API |
| Jiahong_Li_2017 | Stocks of \acrshortcsi300 index, \acrshortochlv of \acrshortcsi300 index | 2009-2014 | Sentiment Posts, Price data | 1d | 1d | Naive Bayes + \acrshortlstm | Precision, Recall, F1-score, Accuracy | Python, Keras |
| Zhongshengz_2018 | SCI prices | 2013-2016 | Text data and Price data | 7d | 1d | \acrshortlstm | Accuracy, F1-Measure | Python, Keras |
| Nascimento_2015 | Stocks from \acrshortsp500 | 2006-2013 | Text (news) and Price data | 7d | 1d | \acrshortlar+News, \acrshortrf+News | \acrshortmape, \acrshortrmse | - |
| Han_2018 | News from Sina.com, ACE2005 Chinese corpus | 2012-2016 | A set of news text | - | - | Their unique algorithm | Precision, Recall, F1-score | - |

4.2 Index Forecasting

Instead of trying to forecast the price of a single stock, several researchers preferred to predict the stock market index. Indices generally are less volatile than individual stocks, since they are composed of multiple stocks from different sectors and are more indicative of the overall momentum and general state of the economy.

In the literature, different stock market index data were used for the experiments. The most commonly used index data can be listed as follows: \glssp500, \glscsi300, \glsnifty, \glsnikkei225, \glsdjia, \glssse180, \glshsi, \glsszse, \glsftse100, \glstaiex, \glsbist, \glsnasdaq, \glsdow30, \glskospi, \glsvix, \glsvxn, \glsbovespa, \glsomx, \glsnyse. The authors of Bao_2017; Parida_2016; Fischer_2018; Widegren_2017; borovykh_2018; Althelaya_2018; Dingli_2017; Rout_2017; Jeong_2019; Baek_2018; Hansson_2017; Elliot_2017; Ding_2015 used \glssp500 as their dataset. The authors of Bao_2017; Parida_2016; Li_2017a; Namini_2018; Hsieh_2011 used \glsnikkei as their dataset. \glskospi was used in Li_2017a; Jeong_2019; Baek_2018. \glsdjia was used as the dataset in Bao_2017; Namini_2018; Hsieh_2011; Zhang_2015; Bekiros_2013. Besides, the authors of Bao_2017; Li_2017a; Hsieh_2011; Jeong_2019 used \glshsi as the dataset in their studies. \glsszse was used in the studies of Pang_2018; Li_2017a; Deng_2017; Yang_2017.

In addition, there were different methods in the literature for the prediction of the index data. While some of the studies used only the raw time series data, others used various other data such as technical indicators, index data, social media feeds, news from Reuters and Bloomberg, and the statistical features of data (standard deviation, skewness, kurtosis, omega ratio, fund alpha). In this survey, we first grouped the index forecasting articles according to their feature set: the first group contains the studies using only the raw time series data (price/index data, \glsochlv), and the second group contains the studies using various other data. Table 4 tabulates the index forecasting papers using only the raw time series data. Moreover, different methods (models) were used for index forecasting. \glsmlp, \glsrnn, \glslstm, \glsdnn (most probably \glsdfnn, or \glsdmlp) methods were the most used methods for index forecasting. In Table 4, these various methods/models are also listed as four sub-groups: \glsann, \glsdnn, \glsmlp, and \glsfddr models; \glsrl and \glsdl models; \glslstm and \glsrnn models; novel methods.

Table 4: Index Forecasting Using Only Raw Time Series Data

| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Parida_2016 | \acrshortsp500, Nikkei225, USD Exchanges | 2011-2015 | Index data | - | 1d, 5d, 7d, 10d | \acrshortlrnfis with Firefly-Harmony Search | \acrshortrmse, \acrshortmape, \acrshortmae | - |
| Fischer_2018 | \acrshortsp500 Index | 1989-2005 | Index data, Volume | 240d | 1d | \acrshortlstm | Return, \acrshortstd, \acrshortsr, Accuracy | Python, TensorFlow, Keras, R, H2O |
| borovykh_2018 | \acrshortsp500, \acrshortvix | 2005-2016 | Index data | * | 1d | uWN, cWN | \acrshortmase, \acrshorthit, \acrshortrmse | - |
| Althelaya_2018 | \acrshortsp500 Index | 2010-2017 | Index data | 10d | 1d, 30d | Stacked \acrshortlstm, \acrshortbi-lstm | \acrshortmae, \acrshortrmse, R-squared | Python, Keras, Tensorflow |
| Jeong_2019 | \acrshortsp500, \acrshortkospi, \acrshorthsi, and EuroStoxx50 | 1987-2017 | 200-days stock price | 200d | 1d | Deep Q-Learning and \acrshortdnn | Total profit, Correlation | - |
| Baek_2018 | \acrshortsp500, \acrshortkospi200, 10-stocks | 2000-2017 | Index data | 20d | 1d | ModAugNet: \acrshortlstm | \acrshortmse, \acrshortmape, \acrshortmae | Keras |
| Hansson_2017 | \acrshortsp500, Bovespa50, \acrshortomx30 | 2009-2017 | Autoregressive part of the time series | - | 1d | \acrshortlstm | \acrshortmse, Accuracy | Tensorflow, Keras, R |
| Elliot_2017 | \acrshortsp500 | 2000-2017 | Index data | - | 1..4d, 1w, 1..3m | \acrshortglm, \acrshortlstm+\acrshortrnn | \acrshortmae, \acrshortrmse | Python |
| Namini_2018 | Nikkei225, \acrshortixic, \acrshorthsi, \acrshortgspc, \acrshortdjia | 1985-2018 | \acrshortochlv | 5d | 1d | \acrshortlstm | \acrshortrmse | Python, Keras, Theano |
| Zhang_2015 | \acrshortdjia | - | Index data | - | - | Genetic Deep Neural Network | \acrshortmse | Java |
| Bekiros_2013 | Log returns of the \acrshortdjia | 1971-2002 | Index data | 20d | 1d | \acrshortrnn | \acrshorttr, sign rate, PT/HM test, \acrshortmsfe, \acrshortsr, profit | - |
| Pang_2018 | Shanghai A-shares composite index, \acrshortszse | 2006-2016 | \acrshortochlv | 10d | - | Embedded layer + \acrshortlstm | Accuracy, \acrshortmse | Python, Matlab, Theano |
| Deng_2017 | 300 stocks from \acrshortszse, Commodity | 2014-2015 | Index data | - | - | \acrshortfddr, \acrshortdnn + \acrshortrl | Profit, return, \acrshortsr, profit-loss curves | Keras |
| Yang_2017 | Shanghai composite index and \acrshortszse | 1990-2016 | \acrshortochlv | 20d | 1d | Ensembles of \acrshortann | Accuracy | - |
| Lachiheb_2018 | \acrshorttunindex | 2013-2017 | Log returns of index data | - | 5min | \acrshortdnn with hierarchical input | Accuracy, \acrshortmse | Java |
| Yong_2017 | Singapore Stock Market Index | 2010-2017 | \acrshortochl of last 10 days of index | 10d | 3d | Feed-forward \acrshortdnn | \acrshortrmse, \acrshortmape, Profit, \acrshortsr | - |
| Yumlu_2005 | \acrshortbist | 1990-2002 | Index data | 7d | 1d | \acrshortmlp, \acrshortrnn, \acrshortmoe | \acrshorthit, positive/negative \acrshorthit, \acrshortmse, \acrshortmae | - |
| Yan_2017 | SCI | 2012-2017 | \acrshortochlv, Index data | - | 1..10d | Wavelet + \acrshortlstm | \acrshortmape, theil unequal coefficient | - |
| Takahashi_2017 | \acrshortsp500 | 1950-2016 | Index data | 15d | 1d | \acrshortlstm | \acrshortrmse | Keras |
| Bildirici_2010 | \acrshortise100 | 1987-2008 | Index data | - | 2d, 4d, 8d, 12d, 18d | \acrshorttar-\acrshortvec-\acrshortmlp, \acrshorttar-\acrshortvec-\acrshortrbf, \acrshorttar-\acrshortvec-\acrshortrhe | \acrshortrmse | - |
| Psaradellis_2016 | \acrshortvix, \acrshortvxn, \acrshortvxd | 2002-2014 | First five autoregressive lags | 5d | 1d, 22d | \acrshorthar-gasvr | \acrshortmae, \acrshortrmse | - |

\glsann, \glsdnn, \glsmlp, and \glsfddr models were used in some of the studies. In Lachiheb_2018, log returns of the index data were used with \glsdnn with hierarchical input for the prediction of the TUNINDEX data. The authors of Yong_2017 used deep \glsffnn and \glsochl of the last 10 days of index data for prediction. In addition, \glsmlp and \glsann were used for the prediction of index data. In Yumlu_2005, the raw index data was used with \glsmlp, \glsrnn, \glsmoe and \glsegarch for the forecast. In Yang_2017, ensembles of \glsann with \glsochlv of the data were used for the prediction of the Shanghai composite index.

Furthermore, \glsrl and \glsdl methods were used together for the prediction of the index data in some of the studies. In Deng_2017, \glsfddr, \glsdnn and \glsrl methods were used to predict 300 stocks from \glsszse index data and commodity prices. In Jeong_2019, Deep Q-Learning and \glsdnn methods and 200-days stock price dataset were used together for the prediction of \glssp500 index.

Most of the preferred methods for prediction of the index data using the raw time series data were based on \glslstm and \glsrnn. In Bekiros_2013, \glsrnn was used for prediction of the log returns of the \glsdjia index. In Fischer_2018, \glslstm was used to predict \glssp500 Index data. The authors of Althelaya_2018 used stacked \glslstm and \glsbi-lstm methods for \glssp500 Index forecasting. The authors of Yan_2017 used an \glslstm network to predict the next day closing price of the Shanghai stock Index. In their study, they used wavelet decomposition to reconstruct the financial time series for denoising and better learning. In Pang_2018, \glslstm was used for the prediction of the Shanghai A-shares composite index. The authors of Namini_2018 used \glslstm to predict \glsnikkei225, IXIC, HSI, GSPC and \glsdjia index data. In Takahashi_2017 and Baek_2018, \glslstm was also used for the prediction of the \glssp500 and \glskospi200 indices. The authors of Baek_2018 developed an \glslstm based stock index forecasting model called ModAugNet. The proposed method was able to beat \glsb-h in the long term with an overfitting prevention mechanism. The authors of Elliot_2017 compared different \glsml models (linear model), \glsgml and several \glslstm, \glsrnn models for stock index price prediction. In Hansson_2017, \glslstm and the autoregressive part of the time series index data were used for prediction of the \glssp500, \glsbovespa50, \glsomx30 indices.

Also, some studies adopted novel approaches. In Zhang_2015, genetic \glsdnn was used for \glsdjia index forecasting. The authors of borovykh_2018 proposed a new \glsdnn model called the Wavenet convolutional net for time series forecasting. The authors of Bildirici_2010 proposed a (\glstar-\glsvec-\glsrhe) model for forex and stock index return prediction and compared several models. The authors of Parida_2016 proposed a method called \glslrnfis with \glsfhso \glsea to predict the \glssp500 and \glsnikkei225 indices and USD Exchange price data. The authors of Psaradellis_2016 proposed a \glshar with a \glsgasvr model, called \glshar-\glsgasvr, for prediction of the \glsvix, \glsvxn, \glsvxd indices.

In the literature, some of the studies used various input data such as technical indicators, index data, social media news, news from Reuters, Bloomberg, the statistical features of data (standard deviation, skewness, kurtosis, omega ratio, fund alpha). Table 5 tabulates the index forecasting papers using these aforementioned various data. \glsdnn, \glsrnn, \glslstm, \glscnn methods were the most commonly used models in index forecasting. In Table 5, different methods/models are also listed within four sub-groups: \glsdnn model; \glsrnn and \glslstm models; \glscnn model; novel methods.

\glsdnn was used as the classification model in some of the papers. In Chen_2016, \glsdnn and some features of the data (Return, \glssr, \glsstd, Skewness, Kurtosis, Omega ratio, Fund alpha) were used for the prediction. In Widegren_2017, \glsdnn, \glsrnn and technical indicators were used for the prediction of the \glsftse100, \glsomx30, \glssp500 indices.
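
Several of the statistical features listed for Chen_2016 can be derived from a return series alone. The sketch below computes the mean return, standard deviation, Sharpe ratio, skewness and kurtosis under explicit assumptions that are not taken from the paper: population (not sample) moments, non-excess kurtosis, a per-period Sharpe ratio without annualization, and a risk-free rate of zero by default. Omega ratio and fund alpha are omitted, and the helper name `return_features` is hypothetical.

```python
import math
from typing import Dict, Sequence


def return_features(returns: Sequence[float], risk_free: float = 0.0) -> Dict[str, float]:
    """Summary statistics of a return series, using population moments."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(returns) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    if m2 == 0.0:
        raise ValueError("returns have zero variance")
    std = math.sqrt(m2)
    return {
        "mean_return": mean,
        "std": std,
        "sharpe": (mean - risk_free) / std,  # per-period, not annualized
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,  # non-excess; a normal distribution gives 3
    }


# Hypothetical monthly returns, for illustration only.
features = return_features([-0.02, 0.0, 0.02])
```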

In addition, \glsrnn and \glslstm models with various other data were also used for the prediction of the indices. The authors of Hsieh_2011 used \glsrnn and \glsochlv of indices, technical indicators to predict \glsdjia, \glsftse, Nikkei, \glstaiex indices. The authors of Mourelatos_2018 used \glsgasvr, \glslstm for the forecast. The authors of Chen_2018_f used four \glslstm models (technical analysis, attention mechanism and market vector embedded) for the prediction of the daily return ratio of \glshsi300 index. In Li_2017a, \glslstm with wavelet denoising and index data, volume, technical indicators were used for the prediction of the \glshsi, \glssse, \glsszse, \glstaiex, \glsnikkei, \glskospi indices. The authors of Si_2017 used MODRL+\glslstm method to predict Chinese stock-IF-IH-IC contract indices. The authors of Bao_2017 used stacked \glsplae to generate deep features using \glsochl of the stock prices, technical indicators and macroeconomic conditions to feed to \glslstm to predict the future stock prices.

Table 5: Index Forecasting Using Various Data

| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Ding_2015 | \acrshortsp500 Index, 15 stocks in \acrshortsp500 | 2006-2013 | News from Reuters and Bloomberg | - | - | \acrshortcnn | Accuracy, \acrshortmcc | - |
| Lee_2017_b | \acrshorttwse index, 4 stocks in \acrshorttwse | 2001-2017 | Technical indicators, Index data, News | 15d | - | \acrshortcnn + \acrshortlstm | \acrshortrmse, Profit | Keras, Python, \acrshorttalib |
| Bao_2017 | \acrshortcsi300, \acrshortnifty50, \acrshorthsi, \acrshortnikkei225, \acrshortsp500, \acrshortdjia | 2010-2016 | \acrshortochlv, Technical Indicators | - | 1d | \acrshortwt, Stacked autoencoders, \acrshortlstm | \acrshortmape, Correlation coefficient, \acrshorttheil-u | - |
| Widegren_2017 | FTSE100, OMXS 30, SP500, Commodity, Forex | 1993-2017 | Technical indicators | 60d | 1d | \acrshortdnn, \acrshortrnn | Accuracy, p-value | - |
| Dingli_2017 | \acrshortsp500, \acrshortdow30, \acrshortnasdaq100, Commodity, Forex, Bitcoin | 2003-2016 | Index data, Technical indicators | - | 1w, 1m | \acrshortcnn | Accuracy | Tensorflow |
| Rout_2017 | \acrshortbse, \acrshortsp500 | 2004-2012 | Index data, technical indicators | 5d | 1d..1m | \acrshortpso, \acrshorthmrpso, \acrshortde, \acrshortrceflann | \acrshortrmse, \acrshortmape | - |
| Li_2017a | \acrshorthsi, \acrshortsse, \acrshortszse, \acrshorttaiex, \acrshortnikkei, \acrshortkospi | 2010-2016 | Index data, volume, technical indicators | 2d..512d | 1d | \acrshortlstm with wavelet denoising | Accuracy, \acrshortmape | - |
| Hsieh_2011 | \acrshortdjia, \acrshortftse, \acrshortnikkei, \acrshorttaiex | 1997-2008 | \acrshortochlv, Technical indicators | 26d | 1d | \acrshortrnn | \acrshortrmse, \acrshortmae, \acrshortmape, \acrshorttheil-u | C |
| Chen_2016 | Hedge fund monthly return data | 1996-2015 | Return, \acrshortsr, \acrshortstd, Skewness, Kurtosis, Omega ratio, Fund alpha | 12m | 3m, 6m, 12m | \acrshortdnn | Sharpe ratio, Annual return, Cum. return | - |
| Mourelatos_2018 | Stock of National Bank of Greece (ETE). | 2009-2014 | \acrshortftse100, \acrshortdjia, \acrshortgdax, \acrshortnikkei225, EUR/USD, Gold | 1d, 2d, 5d, 10d | 1d | \acrshortgasvr, \acrshortlstm | Return, volatility, \acrshortsr, Accuracy | Tensorflow |
| Chen_2018_f | Daily return ratio of \acrshorths300 index | 2004-2018 | \acrshortochlv, Technical indicators | - | - | Market Vector + Tech. ind. + \acrshortlstm + Attention | \acrshortmse, \acrshortmae | Python, Tensorflow |
| Si_2017 | Chinese stock-IF-IH-IC contract | 2016-2017 | Decisions for index change | 240min | 1min | \acrshortmodrl+\acrshortlstm | Profit and loss, \acrshortsr | - |
| Chen_2018_e | \acrshorths300 | 2015-2017 | Social media news, Index data | 1d | 1d | \acrshortrnn-Boost with \acrshortlda | Accuracy, \acrshortmae, \acrshortmape, \acrshortrmse | Python, Scikit-learn |

Besides, different \glscnn implementations with various data (technical indicators, news, index data) were used in the literature. In Dingli_2017, \glscnn and index data, technical indicators were used for the \glssp500, \glsdow30, \glsnasdaq100 indices and Commodity, Forex, Bitcoin prices. In Ding_2015, \glscnn model with news from Reuters and Bloomberg were used for the prediction of \glssp500 Index and 15 stocks’ prices in \glssp500. In Lee_2017_b, \glscnn + \glslstm and technical indicators, index data, news were used for the forecasting of \glstwse index and 4 stocks’ prices in \glstwse.

In addition, there were some novel methods proposed for the index forecasting. The authors of Rout_2017 used \glsrnn models, \glsrceflann and \glsflann, with their weights optimized using various \glsea like \glspso, HMRPSO and \glspso for time series forecasting. The authors of Chen_2018_e used social media news to predict the index price and index direction with \glsrnn-Boost with \glslda features.

4.3 Commodity Price Forecasting

There were a number of studies particularly focused on the price prediction of a given commodity, such as gold, silver, oil, copper, etc. With the increasing number of commodities that are available for public trading through online stock exchanges, interest in this topic will likely grow in the following years.

In the literature, there were different methods that were used for commodity price forecasting. \glsdnn, \glsrnn, \glsfddr, \glscnn were the most used models to predict the commodity prices. Table 6 provides the details about the commodity price forecasting studies with \glsdl.

In Dingli_2017, the authors used \glscnn to predict the directional price movement for the next week and the next month. Meanwhile, \glsrnn and \glslstm models were used in some of the commodity forecasting studies. In Dixon_2016, \glsdnn was used for commodity forecasting. In Widegren_2017, several datasets (commodity, forex, index) were used, and \glsdnn and \glsrnn were applied to predict the prices of the time series data. The feature set consisted of technical indicators: \glsrsi, \glswilliamr, \glscci, \glspposc, momentum, and \glsema. In S_nchez_Lasheras_2015, the authors used Elman \glsrnn to predict the COMEX copper spot price (through \glsnymex) from daily close prices.
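To make the technical-indicator feature sets mentioned above more concrete, the following minimal Python sketch computes three of the listed indicators from a list of closing prices: the exponential moving average (EMA), momentum, and the relative strength index (RSI). It is an illustration only and does not reproduce the implementation of any surveyed study; the function names, the seeding of the EMA with the first price, the default period of 14, and the choice of a simple-average RSI variant are all our assumptions.

```python
def ema(prices, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    smoothed = [prices[0]]  # seed with the first price (an assumption)
    for price in prices[1:]:
        smoothed.append(alpha * price + (1 - alpha) * smoothed[-1])
    return smoothed


def momentum(prices, period):
    """Price change over `period` steps; None where history is too short."""
    head = [None] * min(period, len(prices))
    tail = [prices[i] - prices[i - period] for i in range(period, len(prices))]
    return head + tail


def rsi(prices, period=14):
    """Simple-average RSI in [0, 100]; None where history is too short."""
    values = [None] * min(period, len(prices))
    for i in range(period, len(prices)):
        # Price changes inside the most recent `period`-step window.
        changes = [prices[j] - prices[j - 1] for j in range(i - period + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = -sum(c for c in changes if c < 0)
        if losses == 0:
            values.append(100.0)  # no down moves in the window
        else:
            values.append(100.0 - 100.0 / (1.0 + gains / losses))
    return values
```

In a forecasting pipeline, such series would typically be aligned by date and stacked as input features next to the raw prices.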

Hybrid and novel models were adopted in some studies. In Zhao_2017, \glsfnn and \glssdae deep models were compared against \glssvr, \glsrw and \glsmrs models for WTI oil price forecasting. As performance criteria, accuracy, \glsmape, \glsrmse were used. In Chen_2017_d, the authors tried to predict WTI crude oil prices using several models including combinations of \glsdbn, \glslstm, \glsarma and \glsrw. \glsmse was used as the performance criterion. In Deng_2017, the authors used \glsfddr for stock price prediction and trading signal generation. They combined \glsdnn and \glsrl. Profit, return, SR, profit-loss curves were used as the performance criteria.

Table 6: Commodity Price Forecasting
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Dingli_2017 \acrshortsp500, \acrshortdow30, \acrshortnasdaq100, Commodity, Forex, Bitcoin 2003-2016 Price data, Technical indicators - 1w, 1m \acrshortcnn Accuracy Tensorflow
Dixon_2016 Commodity, FX future, \acrshortetf 1991-2014 Price Data 100*5min 5min \acrshortdnn \acrshortsr, capability ratio, return C++, Python
Widegren_2017 \acrshortftse100, \acrshortomx30, \acrshortsp500, Commodity, Forex 1993-2017 Technical indicators 60d 1d \acrshortdnn, \acrshortrnn Accuracy, p-value -
S_nchez_Lasheras_2015 Copper prices from \acrshortnymex 2002-2014 Price data - - Elman \acrshortrnn \acrshortrmse R
Zhao_2017 \acrshortwti crude oil price 1986-2016 Price data 1m 1m \acrshortsdae, Bootstrap aggregation Accuracy, \acrshortmape, \acrshortrmse Matlab
Chen_2017_d \acrshortwti Crude Oil Prices 2007-2017 Price data - - \acrshortarma + \acrshortdbn, \acrshortrw + \acrshortlstm \acrshortmse Python, Keras, Tensorflow
Deng_2017 300 stocks from \acrshortszse, Commodity 2014-2015 Price data - - \acrshortfddr, \acrshortdnn + \acrshortrl Profit, return, \acrshortsr, profit-loss curves Keras

4.4 Volatility Forecasting

Volatility is directly related to the price variations in a given time period and is mostly used for risk assessment and asset pricing. Some researchers implemented models for accurately forecasting the underlying volatility of a given asset.
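As a concrete reference for the quantity being forecasted, the sketch below computes one common volatility estimate: the sample standard deviation of log returns, scaled to an annual figure. This is a minimal illustration under our own assumptions (daily data, 252 trading periods per year, hypothetical function names); the surveyed studies use a variety of volatility definitions and data frequencies.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def historical_volatility(prices, periods_per_year=252):
    """Annualized sample standard deviation of log returns.

    Needs at least three prices, i.e. two returns, for a sample std.
    """
    returns = log_returns(prices)
    return statistics.stdev(returns) * math.sqrt(periods_per_year)
```

A series of estimates computed this way over a rolling window could serve as the target variable for the forecasting models discussed in this section.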

In the literature, there were different methods that were used for volatility forecasting. \glslstm, \glsrnn, \glscnn, MM, \glsgarch models were among these methods. Table 7 summarizes the studies that were focused on volatility forecasting. In Table 7, different methods/models are also represented as three sub-groups: \glscnn model; \glsrnn and \glslstm models; hybrid and novel models.

A \glscnn model was used in one volatility forecasting study based on \glshft data Doering_2017.

Meanwhile, \glsrnn and \glslstm models were used in some of the studies. In Tino_2001, the authors used financial time series data to predict volatility changes with Markov Models and Elman \glsrnn for profitable straddle options trading. The authors of Xiong_2015 used the price data and different types of Google Domestic trends with \glslstm. The authors of Zhou_2018_b used \glscsi300 and the daily Baidu search volume of 28 words as the dataset with \glslstm to predict the index volatility. The authors of Kim_2018 developed several \glslstm models integrated with \glsgarch for the prediction of volatility.
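For readers less familiar with the GARCH family that some of these studies combine with neural networks, the standard GARCH(1,1) model updates the conditional variance as sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The sketch below implements only this textbook recursion; it does not reproduce the hybrid models of Kim_2018, and the function name, the parameter values used in any call, and the choice to start from the unconditional variance are our assumptions.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model with given parameters.

    Returns len(returns) + 1 values: the starting variance followed by
    one updated variance per observed return, so the last entry is the
    one-step-ahead variance forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Start from the unconditional (long-run) variance, an assumption.
    variances = [omega / (1.0 - alpha - beta)]
    for r in returns:
        variances.append(omega + alpha * r * r + beta * variances[-1])
    return variances
```

In practice the parameters are estimated from data (for example by maximum likelihood) rather than fixed by hand; the recursion itself stays the same.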

Hybrid and novel approaches were also adopted in some of the studies. In Nikolaev_2013, the \glsrmdn-garch model was proposed. In addition, several models including traditional forecasting models and \glsdl models were compared for the estimation of volatility. The authors of Psaradellis_2016 proposed a novel method called \glshar-gasvr for volatility index forecasting.

Table 7: Volatility Forecasting
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Doering_2017 London Stock Exchange 2007-2008 Limit order book state, trades, buy/sell orders, order deletions - - \acrshortcnn Accuracy, kappa Caffe
Tino_2001 \acrshortdax, \acrshortftse100, call/put options 1991-1998 Price data * * \acrshortmm, \acrshortrnn Ewa-measure, iv, daily profits’ mean and std -
Xiong_2015 \acrshortsp500 2004-2015 Price data, 25 Google Domestic trend dimensions - 1d \acrshortlstm \acrshortmape, \acrshortrmse -
Zhou_2018_b \acrshortcsi 300, 28 words of the daily search volume based on Baidu 2006-2017 Price data and text 5d 5d \acrshortlstm \acrshortmse, \acrshortmape Python, Keras
Kim_2018 \acrshortkospi200, Korea Treasury Bond interest rate, AA-grade corporate bond interest rate, gold, crude oil 2001-2011 Price data 22d 1d \acrshortlstm + \acrshortgarch \acrshortmae, \acrshortmse, \acrshorthmae, \acrshorthmse -
Nikolaev_2013 DEM/GBP exchange rate - Returns - - \acrshortrmdn-garch \acrshortnmse, \acrshortnmae, \acrshorthr, \acrshortwhr -
Psaradellis_2016 \acrshortvix, \acrshortvxn, \acrshortvxd 2002-2014 First five autoregressive lags 5d 1d, 22d \acrshorthar-gasvr \acrshortmae, \acrshortrmse -

4.5 Bond Price Forecasting

Some financial experts follow the changes in the bond prices to analyze the state of the economy, claiming bond prices represent the health of the economy better than the stock market Harvey_1989. Historically, long term rates are higher than the short term rates under normal economic expansion times, whereas just before recessions short term rates pass the long term rates, i.e. the inverted yield curve. Hence, accurate bond price prediction is very useful. However, \glsdl implementations for bond price prediction are very scarce. In one study bianchi_2018, excess bond return was predicted using several \glsml models including \glsrf, \glsae and \glspca network and a 2-3-4-layer \glsdfnn. The 4-layer \glsnn outperformed the other models.
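The inverted yield curve condition described above can be expressed as a simple check on the term spread, i.e. the long term rate minus the short term rate. The sketch below is only an illustration of that definition; the function names and the record layout (a label, a short term rate and a long term rate) are hypothetical, and no particular maturities or data source are implied.

```python
def term_spread(long_rate, short_rate):
    """Long term rate minus short term rate, in the same units as the inputs."""
    return long_rate - short_rate


def is_inverted(long_rate, short_rate):
    """True when the short term rate exceeds the long term rate."""
    return term_spread(long_rate, short_rate) < 0


def inversion_dates(curve):
    """Labels of the observations at which the curve is inverted.

    `curve` is a list of (label, short_rate, long_rate) tuples.
    """
    return [label for label, short, long_ in curve if is_inverted(long_, short)]
```

A spread series of this kind could also be used as an input feature or as a target for the bond forecasting models mentioned above.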

4.6 Forex Price Forecasting

The foreign exchange market has the highest volume among all existing financial markets in the world. It is open 24/7 and trillions of dollars worth of foreign exchange transactions happen in a single day. According to the Bank for International Settlements, foreign-exchange trading had a volume of more than 5 trillion USD a day Venketas_2019. In addition, there are a large number of online forex trading platforms that provide leveraged transaction opportunities to their subscribers. As a result, traders have a strong interest in profitable trading strategies. Hence, there were a number of forex forecasting and trading studies that were based on \glsdl models. Since most global financial transactions are based on the US Dollar, almost all forex prediction research papers include USD in their analyses. However, depending on regional differences and intended research focus, various models were developed accordingly.

In the literature, there were different methods that were used for forex price forecasting. \glsrnn, \glslstm, \glscnn, \glsdbn, \glsdnn, \glsae, \glsmlp methods were among these methods. Table 8 provides details about these implementations. In Table 8, different methods/models are listed as four sub-groups: \glscdbn, \glsdbn, \glsdbn+\glsrbm, and \glsae models; \glsdnn, \glsrnn, \glspsn, and \glslstm models; \glscnn models; hybrid models.

\glscdbn, \glsdbn, \glsdbn+\glsrbm, and \glsae models were used in some of the studies. In Zhang_2014, Fuzzy information granulation integrated with \glscdbn was applied for predicting EUR/USD and GBP/USD exchange rates. They extended \glsdbn with \glscrbm to improve the performance. In Chao_2011, weekly GBP/USD and INR/USD prices were predicted, whereas in Zheng_2017, CNY/USD and INR/USD were the main focus. In both cases, \glsdbn was compared with \glsffnn. Similarly, the authors in Shen_2015 implemented several different \glsdbn networks to predict weekly GBP/USD, BRL/USD and INR/USD exchange rate returns. The researchers in Shen_2016 combined Stacked \glsae and \glssvr for predicting 28 normalized currency pairs using the time series data of (USD, GBP, EUR, JPY, AUD, CAD, CHF).

\glsdnn, \glsrnn, \glspsn, and \glslstm models were preferred in some of the studies. In Dixon_2016, multiple \glsdmlp models were developed for predicting AD and BP futures using 5-minute data in a 130 day period. The authors of Sermpinis_2012_a used \glsmlp, \glsrnn, \glsgp and other \glsml techniques along with traditional regression methods to predict EUR/USD time series. They also integrated Kalman filter, LASSO operator and other models to further improve the results in Sermpinis_2012. They further extended their analyses by including \glspsn and providing comparisons along with traditional forecasters like \glsarima, RW and STAR Sermpinis_2014. To improve the performance they also integrated hybrid time-varying volatility leverage. In SUN_2009, the authors implemented RMB exchange rate forecasting against JPY, HKD, EUR and USD by comparing \glsrw, \glsrnn and \glsffnn performances. In Maknickien__2013, the authors predicted various Forex time series and created portfolios consisting of these investments. Each network used \glslstm (\glsrnn EVOLINO), and different risk appetites for users were tested. The authors of Maknickiene_2014 also used EVOLINO RNN + orthogonal input data for predicting USD/JPY and XAU/USD prices for different periods.

Different \glscnn models were used in some of the studies. In persio_2016, EUR/USD was once again forecasted using multiple \glsdl models including \glsmlp, \glscnn, \glsrnn and Wavelet+\glscnn. The authors of Korczak_2017 implemented forex trading (GBP/PLN) using several different input parameters on a multi-agent based trading environment. One of the agents was using \glsae+\glscnn as the prediction model and outperformed all other models.

Hybrid models were also adopted in some of the studies. The authors of Bildirici_2010 developed several (TAR-VEC-RHE) models for predicting monthly returns for TRY/USD and compared model performances. In Nikolaev_2013, the authors compared several models including traditional forecasting models and \glsdl models for DEM/GBP prediction. The authors in Parida_2016 predicted AUD, CHF, MAX and BRL against USD currency time series data using LRNFIS and compared it with different models. Meanwhile, instead of using LMS based error minimization during the learning, they used \glsfhso.

Table 8: Forex Price Forecasting
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Zhang_2014 EUR/USD, GBP/USD 2009-2012 Price data * 1d \acrshortcdbn-fg Profit -
Chao_2011 GBP/USD, INR/USD 1976-2003 Price data 10w 1w \acrshortdbn \acrshortrmse, \acrshortmae, \acrshortmape, \acrshortda, \acrshortpcc -
Zheng_2017 CNY/USD,INR/USD 1997-2016 Price data - 1w \acrshortdbn \acrshortmape, R-squared -
Shen_2015 GBP/USD, BRL/USD, INR/USD 1976-2003 Price data 10w 1w \acrshortdbn + \acrshortrbm \acrshortrmse, \acrshortmae, \acrshortmape, accuracy, \acrshortpcc -
Shen_2016 Combination of USD, GBP, EUR, JPY, AUD, CAD, CHF 2009-2016 Price data - - Stacked \acrshortae + \acrshortsvr \acrshortmae, \acrshortmse, \acrshortrmse Matlab
Dixon_2016 Commodity, FX future, \acrshortetf 1991-2014 Price Data 100*5min 5min \acrshortdnn \acrshortsr, capability ratio, return C++, Python
Widegren_2017 \acrshortftse100, \acrshortomx30, \acrshortsp500, Commodity, Forex 1993-2017 Technical indicators 60d 1d \acrshortdnn, \acrshortrnn Accuracy, p-value -
Sermpinis_2012_a EUR/USD 2001-2010 Close data 11d 1d \acrshortrnn and more \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheil-u -
Sermpinis_2012 EUR/USD 2002-2010 Price data 13d 1d \acrshortrnn, \acrshortmlp, \acrshortpsn \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheil-u -
Sermpinis_2014 EUR/USD, EUR/GBP, EUR/JPY, EUR/CHF 1999-2012 Price data 12d 1d \acrshortrnn, \acrshortmlp, \acrshortpsn \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheil-u -
SUN_2009 RMB against USD, EUR, JPY, HKD 2006-2008 Price data 10d 1d \acrshortrnn, \acrshortann \acrshortrmse, \acrshortmae, \acrshortmse -
Maknickien__2013 EUR/USD, EUR/JPY, USD/JPY, EUR/CHF, XAU/USD, XAG/USD, QM, QG 2011-2012 Price data - - Evolino \acrshortrnn Correlation between predicted, real values -
Maknickiene_2014 USD/JPY 2009-2010 Price data, Gold - 5d EVOLINO \acrshortrnn + orthogonal input data \acrshortrmse -
persio_2016 \acrshortsp500, EUR/USD 1950-2016 Price data 30d, 30d*min 1d, 1min Wavelet+\acrshortcnn Accuracy, log-loss Keras
Korczak_2017 USD/GBP, \acrshortsp500, \acrshortftse100, oil, gold 2016 Price data - 5min \acrshortae + \acrshortcnn \acrshortsr, % volatility, avg return/trans, rate of return H2O
Bildirici_2010 \acrshortise100, TRY/USD 1987-2008 Price data - 2d, 4d, 8d, 12d, 18d \acrshorttar-\acrshortvec-\acrshortmlp, \acrshorttar-\acrshortvec-\acrshortrbf, \acrshorttar-\acrshortvec-\acrshortrhe \acrshortrmse -
Nikolaev_2013 DEM/GBP exchange rate - Returns - - \acrshortrmdn-\acrshortgarch \acrshortnmse, \acrshortnmae, \acrshorthr, \acrshortwhr -
Parida_2016 \acrshortsp500, \acrshortnikkei225, USD Exchanges 2011-2015 Price data - 1d, 5d, 7d, 10d \acrshortlrnfis with \acrshortfhso \acrshortrmse, \acrshortmape, \acrshortmae -

4.7 Cryptocurrency Price Forecasting

Since cryptocurrencies became a hot topic for discussion in the finance world, lots of studies and implementations started emerging in recent years. Most of the cryptocurrency studies were focused on price forecasting.

The rise of bitcoin from 1000 USD in January 2017 to 20,000 USD in January 2018 has attracted a lot of attention not only from the financial world, but also from ordinary people on the street. Recently, some papers have been published for price prediction and trading strategy development for bitcoin and other cryptocurrencies. Given the attention that the underlying technology has attracted, there is a great chance that some new studies will start appearing in the near future.

In the literature, \glsdnn, \glslstm, \glsgru, \glsrnn, and classical methods (\glsarma, \glsarima, \glsarch, \glsgarch, etc.) were used for cryptocurrency price forecasting. Table 9 tabulates the studies that utilize these methods. In Lopes_2018_thesis, the author combined the opinion market and price prediction for cryptocurrency trading. Text mining combined with two models, \glscnn and \glslstm, was used to extract the opinion. Bitcoin, Litecoin, StockTwits were used as the dataset. \glsochlv of prices, technical indicators, and sentiment analysis were used as the feature set. In McNally_2018, the authors compared Bayesian optimized \glsrnn, \glslstm and \glsarima to predict bitcoin price direction. Sensitivity, specificity, precision, accuracy, RMSE were used as the performance metrics.

Table 9: Cryptocurrency Price Prediction
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Lopes_2018_thesis Bitcoin, Litecoin, StockTwits 2015-2018 \acrshortochlv, technical indicators, sentiment analysis - 30min, 4h, 1d \acrshortcnn, \acrshortlstm, State Frequency Model \acrshortmse Keras, Tensorflow
McNally_2018 Bitcoin 2013-2016 Price data 100d 30d Bayesian optimized \acrshortrnn, \acrshortlstm Sensitivity, specificity, precision, accuracy, \acrshortrmse Keras, Python, Hyperas

4.8 Trend Forecasting

Even though trend forecasting and price forecasting share the same input characteristics, some researchers prefer to predict the price direction of the asset instead of the actual price. This alters the nature of the problem from regression to classification, and the corresponding performance metrics also change. However, it is worth mentioning that these two approaches are not really different; the difference is in the interpretation of the output.
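The shift from regression to classification can be illustrated with a short Python sketch that derives both kinds of targets from the same price series and pairs each with a typical metric. This is a generic illustration under our own assumptions: the function names, the three-class up/flat/down labeling, and the `threshold` parameter are hypothetical and are not taken from any particular surveyed study.

```python
import math


def regression_targets(prices, horizon=1):
    """Price forecasting target: the price `horizon` steps ahead."""
    return [prices[i + horizon] for i in range(len(prices) - horizon)]


def direction_labels(prices, horizon=1, threshold=0.0):
    """Trend forecasting target: +1 (up), -1 (down) or 0 (flat)."""
    labels = []
    for i in range(len(prices) - horizon):
        ret = (prices[i + horizon] - prices[i]) / prices[i]
        if ret > threshold:
            labels.append(1)
        elif ret < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def rmse(y_true, y_pred):
    """Typical regression metric: root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Typical classification metric: share of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For the prices 100, 102, 101, 101, 105 with a one-step horizon, the regression targets are 102, 101, 101, 105 while the direction labels are +1, -1, 0, +1, which shows that only the interpretation of the output changes.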

In the literature, there were different methods for trend forecasting. In this survey, we grouped the articles according to their feature set such as studies using only the raw time series data (only price data, \glsochlv); studies using technical indicators & price data & fundamental data at the same time; studies using text mining techniques and studies using other various data. Table 10 tabulates the trend forecasting using only the raw time series data.

Table 10: Trend Forecasting Using Only Raw Time Series Data
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Das_2018_a \acrshortsp500 stock indexes 1963-2016 Price data 30d 1d \acrshortnn Accuracy, precision, recall, F1-score, \acrshortauroc R, H2o, Python, Tensorflow
Navon_2017 \acrshortspy \acrshortetf, 10 stocks from \acrshortsp500 2014-2016 Price data 60min 30min \acrshortfnn Cumulative gain MatConvNet, Matlab
Yang_2017 Shanghai composite index and \acrshortszse 1990-2016 \acrshortochlv 20d 1d Ensembles of \acrshortann Accuracy -
Saad_1998 10 stocks from \acrshortsp500 - Price data \acrshorttdnn, \acrshortrnn, \acrshortpnn Missed opportunities, false alarms ratio -
persio_2017 GOOGL stock daily price data 2012-2016 Time window of 30 days of \acrshortochlv 22d, 50d, 70d * \acrshortlstm, \acrshortgru, \acrshortrnn Accuracy, Logloss Python, Keras
Hansson_2017 \acrshortsp500, Bovespa50, \acrshortomx30 2009-2017 Autoregressive part of the price data 30d 1..15d \acrshortlstm \acrshortmse, Accuracy Tensorflow, Keras, R
Shen_2018 \acrshorthsi, \acrshortdax, \acrshortsp500 1991-2017 Price data - 1d \acrshortgru, \acrshortgru-\acrshortsvm Daily return % Python, Tensorflow
Chen_2016_d Taiwan Stock Index Futures 2001-2015 \acrshortochlv 240d 1..2d \acrshortcnn with \acrshortgaf, \acrshortmam, Candlestick Accuracy Matlab
Sezer_2019 \acrshortetf and Dow30 1997-2007 Price data \acrshortcnn with feature imaging Annualized return Keras, Tensorflow
Zhou_2019 \acrshortssec, \acrshortnasdaq, \acrshortsp500 2007-2016 Price data 20min 7min \acrshortemd2fnn \acrshortmae, \acrshortrmse, \acrshortmape -
Ausmees_2017 23 cap stocks from the \acrshortomx30 index in Nasdaq Stockholm 2000-2017 Price data and returns 30d * \acrshortdbn \acrshortmae Python, Theano

Different methods and models were used for trend forecasting. In Table 10, these are divided into three sub-groups: \glsann, \glsdnn, and \glsffnn models; \glslstm, \glsrnn, and Probabilistic \glsnn models; novel methods. \glsann, \glsdnn, \glsdfnn, and \glsffnn methods were used in some of the studies. In Das_2018_a, \glsnn with the price data was used for the prediction of the trend of \glssp500 stock indices. The authors of Navon_2017 combined deep \glsfnn with a selective trading strategy unit to predict the next price. The authors of Yang_2017 created an ensemble network of several Backpropagation and \glsadam models for trend prediction.

In the literature, \glslstm, \glsrnn, \glspnn methods with the raw time series data were also used for trend forecasting. In Saad_1998, the authors compared \glstdnn, \glsrnn and \glspnn for trend detection using 10 stocks from \glssp500. The authors of persio_2017 compared 3 different \glsrnn models (basic \glsrnn, \glslstm, \glsgru) to predict the movement of Google stock price. The authors of Hansson_2017 used \glslstm (and other classical forecasting techniques) to predict the trend of the stocks prices. In Shen_2018, \glsgru and \glsgru-\glssvm models were used for the trend of \glshsi, \glsdax, \glssp500 indices.

There were also novel methods that used only the raw time series price/index data in the literature. The author of Chen_2016_d proposed a method that used \glscnn with \glsgaf, \glsmam, Candlestick with converted image data. In Sezer_2019, a novel method, \glscnn with feature imaging, was proposed for the prediction of the buy/sell/hold positions of the \glspletf’ prices and Dow30 stocks’ prices. The authors of Zhou_2019 proposed a method that uses \glsemd2fnn models to forecast the stock close prices’ direction accurately. In Ausmees_2017, \glsdbn with the price data was used for the prediction of the trend of 23 large cap stocks from the \glsomx30 index.

Table 11: Trend Forecasting Using Technical Indicators & Price Data & Fundamental Data
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Raza_2017 \acrshortkse100 index - Price data, several fundamental data - - \acrshortann, \acrshortslp, \acrshortmlp, \acrshortrbf, \acrshortdbn, \acrshortsvm Accuracy -
Sezer_2017 Stocks in Dow30 1997-2017 \acrshortrsi (Technical Indicators) 200d 1d \acrshortdmlp with genetic algorithm Annualized return Spark MLlib, Java
Liang_2017 \acrshortsse Composite Index, \acrshortftse100, PingAnBank 1999-2016 Technical indicators, \acrshortochlv price 24d 1d \acrshortrbm Accuracy -
Troiano_2018 Dow30 stocks 2012-2016 Price data, several technical indicators 40d - \acrshortlstm Accuracy Python, Keras, Tensorflow, \acrshorttalib
Nelson_2017 Stock price from \acrshortibovespa index 2008-2015 Technical indicators, \acrshortochlv of price - 15min \acrshortlstm Accuracy, Precision, Recall, F1-score, % return, Maximum drawdown Keras
song_2018 20 stocks from \acrshortnasdaq and \acrshortnyse 2010-2017 Price data, technical indicators 5d 1d \acrshortlstm, \acrshortgru, \acrshortsvm, \acrshortxgboost Accuracy Keras, Tensorflow, Python
Gudelek_2017 17 \acrshortetf 2000-2016 Price data, technical indicators 28d 1d \acrshortcnn Accuracy, \acrshortmse, Profit, \acrshortauroc Keras, Tensorflow
Sezer_2018 Stocks in Dow30 and 9 Top Volume \acrshortetf 1997-2017 Price data, technical indicators 20d 1d \acrshortcnn with feature imaging Recall, precision, F1-score, annualized return Python, Keras, Tensorflow, Java
Gunduz_2017 Borsa Istanbul 100 Stocks 2011-2015 75 technical indicators, \acrshortochlv of price - 1h \acrshortcnn Accuracy Keras

In the literature, some of the studies used technical indicators & price data & fundamental data at the same time. Table 11 tabulates the trend forecasting papers using technical indicators, price data, fundamental data. In addition, these studies are clustered into three sub-groups: \glsann, \glsmlp, \glsdbn, and \glsrbm models; \glslstm and \glsgru models; novel methods. \glsann, \glsmlp, \glsdbn, and \glsrbm methods were used with technical indicators, price data and fundamental data in some of the studies. In Raza_2017, several classical, \glsml models and \glsdbn were compared for trend forecasting. In Sezer_2017, technical analysis indicator’s (\glsrsi) buy & sell limits were optimized with \glsga which was used for buy-sell signals. After optimization, \glsdmlp was also used for function approximation. The authors of Liang_2017 used technical analysis parameters, \glsochlv of prices and \glsrbm for stock trend prediction.

Besides, \glslstm and \glsgru methods with technical indicators & price data & fundamental data were also used in some of the papers. In Troiano_2018, the crossover and \glsmacd signals were used to predict the trend of the Dow 30 stocks prices. The authors of Nelson_2017 used \glslstm for stock price movement estimation. The author of song_2018 used stock prices, technical analysis features and four different \glsml Models (\glslstm, \glsgru, \glssvm and \glsxgboost) to predict the trend of the stocks prices.

In addition, there were also novel and new methods that used \glscnn with the price data and technical indicators. The authors of Gudelek_2017 converted the time series of price data to 2-dimensional images using technical analysis and classified them with deep \glscnn. Similarly, the authors of Sezer_2018 also proposed a novel technique that converted financial time series data that consisted of technical analysis indicator outputs to 2-dimensional images and classified these images using \glscnn to determine the trading signals. The authors of Gunduz_2017 proposed a method that used \glscnn with correlated features combined together to predict the trend of the stocks prices.
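The general idea of turning a one-dimensional price series into a 2-dimensional, image-like input for a \glscnn can be sketched as follows. This is a simplified illustration and not the procedure of Gudelek_2017 or Sezer_2018: here each row is a simple moving average with a different period and each column is a time step, whereas the cited studies use richer sets of technical indicators. The function names and the min-max scaling to [0, 1] are our assumptions.

```python
def sma(prices, period):
    """Simple moving average; the first value covers prices[0:period]."""
    return [sum(prices[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(prices))]


def indicator_image(prices, periods, width):
    """Build a len(periods) x width matrix scaled to [0, 1].

    Row k holds the last `width` values of the moving average with
    period periods[k]; columns are aligned on the most recent time steps.
    """
    rows = []
    for period in periods:
        series = sma(prices, period)
        if len(series) < width:
            raise ValueError("not enough prices for the requested width")
        rows.append(series[-width:])
    flat = [value for row in rows for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero for flat inputs
    return [[(value - low) / span for value in row] for row in rows]
```

Each resulting matrix can then be labeled (for example with a buy/sell/hold or up/down class) and passed to an image classifier.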

Besides, there were also studies that used text mining techniques in the literature. Table 12 tabulates the trend forecasting papers using text mining techniques. Different methods/models are represented within four sub-groups in that table: \glsdnn, \glsdmlp, and \glscnn with text mining models; \glsgru model; \glslstm, \glscnn, and \glslstm+\glscnn models; novel methods. In the first group of studies, \glsdnn, \glsdmlp, \glscnn with text mining were used for trend forecasting. In Huang_2016, the authors used different models that included \glshmm, \glsdmlp and \glscnn using Twitter moods to predict the next day’s move. In Peng_2016, the authors used the combination of text mining and word embeddings to extract information from financial news and a \glsdnn model for the prediction of the stock trends.

Moreover, \glsgru methods with text mining techniques were also used for trend forecasting. The authors of Huynh_2017 used financial news from Reuters, Bloomberg and stock prices data and \glsbi-gru model to predict the stock movements in the future. The authors of Dang_2018 used Stock2Vec and \glstgru models to generate input data from financial news and stock prices. Then, they used the sign difference between the previous close and next open for the classification of the stock prices. The results were better than the state-of-the-art models.

\glslstm, \glscnn and \glslstm+\glscnn models were also used for trend forecasting. The authors of Verma_2017 combined news data with financial data to classify the stock price movement and assessed them with certain factors. They used an \glslstm model as the \glsnn architecture. The authors of Pinheiro_2017 proposed a novel method that used a character-based neural language model with financial news and \glslstm for trend prediction. In Prosky_2017, sentiment/mood prediction and price prediction based on sentiment, price prediction with text mining and \glsdl models (\glslstm, \glsnn, \glscnn) were used for trend forecasting. The authors of Liu_2018 proposed a method that used two separate \glslstm networks to construct an ensemble network. One of the \glslstm models was used for word embeddings with word2Vec to create an information matrix as input to \glscnn. The other one was used for price prediction using technical analysis features and stock prices.

In the literature, there were also novel and different methods to predict the trend of the time series data. In Yoshihara_2014, the authors proposed a novel method that uses a combination of \glsrbm, \glsdbn and word embedding to create word vectors for a \glsrnn-\glsrbm-\glsdbn network to predict the trend of stock prices. The authors of Shi_2018 proposed a novel method (called DeepClue) that visually interpreted text-based \glsdl models in predicting stock price movements. In their proposed method, financial news, charts and social media tweets were used together to predict the stock price movement. The authors of Zhang_2018 proposed a method that performed information fusion from several news and social media sources to predict the trend of the stocks. The authors of Hu_2018 proposed a novel method that used text mining techniques and Hybrid Attention Networks based on financial news for the forecast of the trend of stocks. The authors of Wang_2018_a combined technical analysis and sentiment analysis of social media (related financial topics) and created the \glsdrse method for classification. The authors of MATSUBARA_2018 proposed a method that used \glsdgm with news articles using the Paragraph Vector algorithm to create the input vector for the prediction of the trend of stocks. The authors of Li_2018 implemented intraday stock price direction classification using financial news and stock prices.

Table 12: Trend Forecasting Using Text Mining Techniques
Art. Data Set Period Feature Set Lag Horizon Method Performance Criteria Env.
Huang_2016 \acrshortsp500, \acrshortnyse Composite, \acrshortdjia, \acrshortnasdaq Composite 2009-2011 Twitter moods, index data 7d 1d \acrshortdnn, \acrshortcnn Error rate Keras, Theano
Peng_2016 News from Reuters and Bloomberg, Historical stock security data 2006-2013 News, price data 5d 1d \acrshortdnn Accuracy -
Huynh_2017 News from Reuters, Bloomberg 2006-2013 Financial news, price data - 1d, 2d, 5d, 7d \acrshortbi-gru Accuracy Python, Keras
Dang_2018 News about Apple, Airbus, Amazon from Reuters, Bloomberg, \acrshortsp500 stock prices 2006-2013 Price data, news, technical indicators - - Two-stream \acrshortgru, stock2vec Accuracy, precision, \acrshortauroc Keras, Python
Verma_2017 \acrshortnifty50 Index, \acrshortnifty Bank/Auto/IT/Energy Index, News 2013-2017 Index data, news 1d, 2d, 5d 1d \acrshortlstm \acrshortmcc, Accuracy -
Pinheiro_2017 News from Reuters, Bloomberg, stock price/index data from \acrshortsp500 2006-2013 News and sentences - 1h, 1d \acrshortlstm Accuracy -
Prosky_2017 30 \acrshortdjia stocks, \acrshortsp500, \acrshortdji, news from Reuters 2002-2016 Price data and features from news articles 1m 1d \acrshortlstm, \acrshortnn, \acrshortcnn and word2vec Accuracy VADER
Liu_2018 APPL from \acrshortsp500 and news from Reuters 2011-2017 News, \acrshortochlv, Technical indicators - 1d \acrshortcnn + \acrshortlstm, \acrshortcnn+\acrshortsvm Accuracy, F1-score Tensorflow
Yoshihara_2014 News, Nikkei Stock Average and 10-Nikkei companies 1999-2008 News, \acrshortmacd - 1d \acrshortrnn, \acrshortrbm+\acrshortdbn Accuracy, P-value -
Shi_2018 News from Reuters and Bloomberg for \acrshortsp500 stocks 2006-2015 Financial news, price data 1d 1d DeepClue Accuracy Dynet software
Zhang_2018 Price data, index data, news, social media data 2015 Price data, news from articles and social media 1d 1d Coupled matrix and tensor Accuracy, \acrshortmcc Jieba
Hu_2018 News and Chinese stock data 2014-2017 Selected words in a news 10d 1d \acrshorthan Accuracy, Annual return -
Wang_2018_a Sina Weibo, Stock market records 2012-2015 Technical indicators, sentences - - DRSE F1-score, precision, recall, accuracy, \acrshortauroc Python
MATSUBARA_2018 Nikkei225, \acrshortsp500, news from Reuters and Bloomberg 2001-2013 Price data and news 1d 1d \acrshortdgm Accuracy, \acrshortmcc, %profit -
Li_2018 News, stock prices from Hong Kong Stock Exchange 2001 Price data and \acrshorttfidf from news 60min (1..6)*5min \acrshortelm, \acrshortdlr, \acrshortpca, \acrshortbelm, \acrshortkelm, \acrshortnn Accuracy Matlab

Moreover, there were also studies that used different data variations in the literature. Table 13 tabulates the trend forecasting papers using these various data clustered into two sub-groups: \glslstm, \glsrnn, \glsgru models; \glscnn model.

\glslstm, \glsrnn, \glsgru methods with various data representations were used in some trend forecasting papers. In Tsantekidis_2017, the authors used the limit order book time series data and the \glslstm method for trend prediction. The authors of Sirignano_2018 proposed a novel method that used limit order book flow and history information for the determination of the stock movements using \glslstm. The results of the proposed method were remarkably stationary. The authors of Chen_2018_e used social media news, \glslda features and an \glsrnn model to predict the trend of the index price. The authors of Buczkowski_2017 proposed a novel method that used expert recommendations (Buy, Hold or Sell) and an ensemble of \glsgru and \glslstm to predict the trend of the stock prices.

CNN models with different data representations were also used for trend prediction. In Tsantekidis_2017_a, the authors used the last 100 entries from the limit order book to create images for stock price prediction using CNN. Using limit order book data to create a 2D matrix-like format with CNN for predicting directional movement was innovative. In Doering_2017, HFT microstructure forecasting with CNN was implemented.
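To make the idea of a 2D matrix-like input more concrete, the following is a minimal sketch of how the most recent limit order book entries could be stacked into a time-by-feature matrix. It is an illustration only and not the preprocessing of any cited study: the function name `lob_window_to_image`, the 40-feature snapshot layout and the per-column z-score scaling are all assumptions made for this example.

```python
from statistics import mean, pstdev


def lob_window_to_image(snapshots, window=100):
    """Stack the last `window` order book snapshots into a 2-D matrix.

    Each snapshot is a flat list of numeric features (hypothetical layout,
    e.g. prices and volumes of several book levels). Rows are time steps,
    columns are features; every column is z-score scaled within the window.
    """
    if len(snapshots) < window:
        raise ValueError("not enough snapshots for the requested window")
    recent = snapshots[-window:]
    width = len(recent[0])
    if any(len(row) != width for row in recent):
        raise ValueError("all snapshots must have the same number of features")

    scaled_columns = []
    for column in zip(*recent):
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            # A constant feature carries no information inside this window.
            scaled_columns.append([0.0] * window)
        else:
            scaled_columns.append([(value - mu) / sigma for value in column])

    # Transpose back so that each row is one time step.
    return [list(row) for row in zip(*scaled_columns)]


# Synthetic data only: 250 snapshots with 40 made-up features each.
synthetic = [[100 + 0.01 * t + level for level in range(40)] for t in range(250)]
image = lob_window_to_image(synthetic, window=100)
```

The resulting 100 x 40 matrix has the same layout as a single-channel image, which is what allows a 2-D convolutional model to be applied to it.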

Table 13: Trend Forecasting Using Various Data
Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env.
Tsantekidis_2017 | Nasdaq Nordic (Kesko Oyj, Outokumpu Oyj, Sampo, Rautaruukki, Wartsila Oyj) | 2010 | Price and volume data in LOB | 100s | 10s, 20s, 50s | LSTM | Precision, Recall, F1-score, Cohen’s k | -
Sirignano_2018 | High-frequency record of all orders | 2014-2017 | Price data, record of all orders, transactions | 2h | - | LSTM | Accuracy | -
Chen_2018_e | Chinese, The Shanghai-Shenzhen 300 Stock Index (HS300) | 2015-2017 | Social media news (Sina Weibo), price data | 1d | 1d | RNN-Boost with LDA | Accuracy, MAE, MAPE, RMSE | Python, Scikit learn
Buczkowski_2017 | ISMIS 2017 Data Mining Competition dataset | - | Expert identifier, class predicted by expert | - | - | LSTM + GRU + FCNN | Accuracy | -
Tsantekidis_2017_a | Nasdaq Nordic (Kesko Oyj, Outokumpu Oyj, Sampo, Rautaruukki, Wartsila Oyj) | 2010 | Price, Volume data, 10 orders of the LOB | - | - | CNN | Precision, Recall, F1-score, Cohen’s k | Theano, Scikit learn, Python
Doering_2017 | London Stock Exchange | 2007-2008 | Limit order book state, trades, buy/sell orders, order deletions | - | - | CNN | Accuracy, kappa | Caffe

5 Current Snapshot of the Field

Figure 5: The histogram of Publication Count in Topics

After reviewing all the research papers specifically targeting financial time series forecasting implementations using DL models, we are now ready to provide some overall statistics about the current state of the studies. The number of papers that we were able to locate and include in our survey was 140. We categorized the papers according to their forecasted asset type. Furthermore, we also analyzed the studies through their DL model choices, frameworks for the development environment, data sets, comparable benchmarks, and some other differentiating criteria like feature sets, number of citations, etc., which we were not able to include in the paper due to space constraints. We will now summarize our notable observations to provide important highlights for interested researchers within the field.

Figure 6: The rate of Publication Count in Topics
Figure 7: Topic-Model Heatmap
Figure 8: The histogram of Publication Count in Years

Figure 5 presents the asset types for which researchers developed their forecasting models. As expected, stock market-related prediction studies dominate the field. Stock price forecasting, trend forecasting and index forecasting were the top three picks for financial time series forecasting research. So far, 46 papers were published for stock price forecasting, 38 for trend forecasting and 33 for index forecasting, respectively. These studies constitute more than 70% of all studies, indicating high interest. They are followed by 19 papers for forex prediction and 7 papers for volatility forecasting. Meanwhile, cryptocurrency forecasting has started attracting researchers; only 3 papers have been published so far, but this number is expected to increase in the coming years Fischer_2019. Figure 6 highlights the rate of publication counts for various implementation areas throughout the years. Meanwhile, Figure 7 provides more details about the choice of DL models over various implementation areas.

Figure 9: The histogram of Publication Count in Publication Types

Figure 8 illustrates the accelerating appetite of researchers in the last 3 years for developing DL models for financial time series implementations. Meanwhile, as Figure 9 indicates, most of the studies were published in journals (57 of them) and conferences (49 papers), even though a considerable number of arXiv papers (11) and graduate theses (6) also exist.

One of the most important questions for researchers is where they can publish their research findings. During our review of the papers, we also carefully investigated where each paper was published. We tabulated our results for the top journals for financial time series forecasting in Figure 10. According to these results, the journals with the most published papers include Expert Systems with Applications, Neurocomputing, Applied Soft Computing, The Journal of Supercomputing, Decision Support Systems, Knowledge-based Systems, European Journal of Operational Research and IEEE Access. Interested researchers should also consider the trend within the last 3 years, as tendencies can vary slightly depending on the particular implementation area.

Figure 11 clearly validates the dominance of RNN based models (65 papers) among all DL model choices, followed by DMLP (23 papers) and CNN (20 papers). The inner circle represents all years considered, while the outer circle covers only the studies within the last 3 years. We should note that RNN is a general model with several versions including LSTM, GRU, etc. Within RNN, researchers mostly prefer LSTM due to the relative ease of its model development phase; however, other types of RNN are also common. Figure 12 provides a snapshot of the RNN model distribution. As mentioned above, LSTM had the highest interest among all with 58 papers, while Vanilla RNN and GRU had 27 and 10 papers respectively. Hence, it is clear that LSTM was the most popular DL model for financial time series forecasting or regression studies.

Figure 10: Top Journals - corresponding numbers next to the bar graph are representing the impact factor of the journals
Figure 11: The Piechart of Publication Count in Model Types

Meanwhile, DMLP and CNN were generally preferred for classification problems. Since time series data generally consists of temporal components, some data preprocessing might be required before the actual classification can occur. Hence, many of these implementations utilize feature extraction and selection techniques along with possible dimensionality reduction methods. Many researchers decided to use DMLP mostly because its shallow version, MLP, has been used extensively before and has a proven track record in many different financial applications, including financial time series forecasting. Consistent with our observations, DMLP was also mostly preferred in stock, index and, in particular, trend forecasting, since the latter is by definition a classification problem with two (uptrend or downtrend) or three (uptrend, stationary or downtrend) classes.
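The two-class and three-class formulation of trend forecasting can be illustrated with a short labeling routine. This is a hedged sketch rather than the labeling rule of any surveyed paper: the function name `label_trend`, the one-step horizon and the 0.1% return threshold are assumptions chosen for the example.

```python
def label_trend(prices, horizon=1, threshold=0.001):
    """Assign a trend class to each time step from the future return.

    A return above `threshold` is labeled "up", below `-threshold` "down",
    and anything in between "stationary". Setting `threshold` to 0 yields
    an (almost) two-class up/down problem.
    """
    labels = []
    for t in range(len(prices) - horizon):
        future_return = prices[t + horizon] / prices[t] - 1.0
        if future_return > threshold:
            labels.append("up")
        elif future_return < -threshold:
            labels.append("down")
        else:
            labels.append("stationary")
    return labels


# Made-up prices: a clear rise, a negligible move, then a clear fall.
example_labels = label_trend([100.0, 101.0, 101.05, 100.0])
```

The choice of threshold directly controls the class balance, so it would normally be tuned together with the classifier rather than fixed in advance.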

Figure 12: Distribution of RNN Models

In addition to DMLP, CNN was also a popular choice for classification type financial time series forecasting implementations. Most of these studies appeared within the last 3 years. As mentioned before, in order to convert the time-varying sequential data into a more stationary, classifiable form, some preprocessing might be necessary. Even though some 1-D representations exist, the 2-D implementation of CNN was more common, mostly inherited from the image recognition applications of CNN in computer vision. In some studies Chen_2016_d; Sezer_2019; Sezer_2017; Sezer_2018; Tsantekidis_2017_a, innovative transformations of financial time series data into an image-like representation have been adopted and impressive performance results have been achieved. As a result, CNN might increase its share of interest for financial time series forecasting in the next few years.

Figure 13: The Preferred Development Environments

As one final note, Figure 13 shows which frameworks and platforms the researchers and developers used while implementing their work. We tried to extract this information from the papers to the best of our effort. However, we need to keep in mind that not every publication provided its development environment. Also, in most of the papers the details were not given, preventing us from building a more thorough comparison chart; i.e., some researchers claimed they used Python, but no further information was given, while some others mentioned the use of Keras or TensorFlow, providing more details. Also, within the “Other” section, the usage of PyTorch has been on the rise in the last year or so, even though it is not visible from the chart. Regardless, Python-related tools were the most influential technologies behind the implementations covered in this survey.

6 Discussion and Open Issues

From an application perspective, even though financial time series forecasting has a relatively narrow focus, i.e. the implementations were mainly based on price or trend prediction, very different and versatile models exist in the literature depending on the underlying DL model. We need to keep in mind that, even though financial time series forecasting is a subset of time series studies, some differences exist due to the profit-making expectations embedded in successful prediction models: higher prediction accuracy might not always translate into a profitable model. Hence, the risk and reward structure must also be taken into consideration. At this point, we will try to elaborate on our observations about these differences in various model designs and implementations.
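The gap between prediction accuracy and profitability can be shown with a small numerical illustration. The numbers below are invented and the helper `accuracy_and_pnl` is hypothetical; returns are simply summed, ignoring compounding and transaction costs.

```python
def accuracy_and_pnl(predicted_directions, actual_returns):
    """Return (directional accuracy, summed profit and loss).

    `predicted_directions` holds +1 (long) or -1 (short) for each period;
    `actual_returns` holds the realized return of the same period.
    """
    hits = 0
    pnl = 0.0
    for direction, realized in zip(predicted_directions, actual_returns):
        if direction * realized > 0:
            hits += 1
        # Being long earns the return, being short earns its negative.
        pnl += direction * realized
    return hits / len(actual_returns), pnl


# Four small correct calls and one large wrong one (made-up values).
returns = [0.002, 0.001, 0.003, -0.05, 0.002]
always_long = [1, 1, 1, 1, 1]
accuracy, pnl = accuracy_and_pnl(always_long, returns)
```

In this toy case the directional accuracy is 80%, yet the summed return is negative because the single wrong call outweighs the four correct ones, which is why risk-aware criteria are often reported next to accuracy.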

6.1 DL Models for financial time series forecasting

According to the publication statistics, LSTM was the preferred choice of most researchers for financial time series forecasting. LSTM and its variations utilized the time-varying data with feedback embedded representations, resulting in higher performances for time series prediction implementations. Since most financial data, one way or another, includes time-dependent components, LSTM was the natural choice in financial time series forecasting problems. Meanwhile, LSTM is a special DL model derived from a more general classifier family, namely RNN.

Careful analysis of Figure 11 illustrates the dominance of RNN (which consists largely of LSTM). As a matter of fact, more than half of the published papers for time series forecasting studies fall into the RNN model category. Regardless of the problem type, price or trend prediction, the ordinal nature of the data representation led researchers to consider RNN, GRU and LSTM as viable model choices. Hence, RNN models were chosen, at least for benchmarking, in many studies for performance comparison against other developed models.

Meanwhile, other models were also used for time series forecasting problems. Among those, DMLP attracted the most interest due to the dominance of its shallow cousin, MLP, and its wide acceptance and long history within the ML community. However, there is a fundamental difference in how DMLP and RNN based models were used for financial time series prediction problems.

DMLP fits well for both regression and classification problems. However, in general, data order independence must be preserved to better utilize the internal working dynamics of such networks, even though some adjustments can be made through the learning algorithm configuration. In most cases, either the trend components of the data need to be removed from the underlying time series, or some data transformations might be needed so that the resulting data becomes stationary. Regardless, some careful preprocessing might be necessary for the DMLP model to be successful. In contrast, RNN based models can work directly with time-varying data, making it easier for researchers to develop DL models.
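As a minimal sketch of such preprocessing, a price series can be converted into log returns (a common way to remove the price-level trend) and then cut into fixed-length lag windows, so that each window becomes one order-independent sample for a feedforward network. The helper names `log_returns` and `make_lag_windows` are assumptions for this example, and the data is synthetic.

```python
import math


def log_returns(prices):
    """Convert a price series into log returns, removing the price level."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def make_lag_windows(series, lag):
    """Build (inputs, targets) pairs where each input holds `lag` past values.

    Every pair is self-contained, so the samples can be shuffled freely
    before being fed to a feedforward model.
    """
    inputs, targets = [], []
    for t in range(lag, len(series)):
        inputs.append(series[t - lag:t])
        targets.append(series[t])
    return inputs, targets


# Synthetic prices growing by 1% per step: the log returns are constant.
prices = [100.0 * 1.01 ** t for t in range(6)]
returns = log_returns(prices)
features, targets = make_lag_windows(returns, lag=3)
```

A recurrent model could instead consume `returns` (or even the raw prices) step by step, which is the convenience that the paragraph above attributes to RNN based models.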

As a result, most of the DMLP implementations had embedded data preprocessing before the learning stage. However, this inconvenience did not prevent researchers from using DMLP and its variations during their model development process. Instead, many versatile data representations were attempted in order to achieve higher overall prediction performance. A combination of fundamental and/or technical analysis parameters along with other features, like financial sentiment obtained through text mining, was embedded into such models. In most of the DMLP studies, the corresponding problem was treated as classification, especially in trend prediction models, whereas RNN based models directly predicted the next value of the time series. Both approaches had some success in beating the underlying benchmark; hence it is not possible to claim victory of one model type over the other. However, as a general rule of thumb, researchers prefer RNN based models for time series regression and DMLP for trend classification (or buy-sell point identification).

Another model that started becoming popular recently is CNN. CNN also works better for classification problems and unlike RNN based models, it is more suitable for either non-time varying or static data representations. The comments for DMLP are also mostly valid for CNN. Furthermore, unlike DMLP, CNN mostly requires locality within the data representation for better-performing classification results. One particular implementation area of CNN is image-based object recognition problems. In recent years, CNN based models dominated this field, handily outperforming all other models. Meanwhile, most financial data is time-varying and it might not be easy to implement CNN directly for financial applications. However, in some recent studies, various independent research groups followed an innovative transformation of 1-D time-varying financial data into 2-D mostly stationary image-like data so that they could utilize the power of CNN through adaptive filtering and implicit dimensionality reduction. Hence, with that approach, they were able to come up with successful models.
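One known way of turning a 1-D series into a 2-D image-like array is the Gramian Angular Summation Field, in which the series is rescaled to [-1, 1], mapped to angles, and every pair of time steps is encoded as the cosine of the sum of their angles. The sketch below is offered purely as an illustration of the general idea; it is an assumption that this particular transformation is representative, and the studies cited in this survey may have used different constructions (for example, stacking technical indicators over time).

```python
import math


def gramian_angular_field(series):
    """Encode a 1-D series as a square matrix G[i][j] = cos(phi_i + phi_j).

    The series is min-max scaled to [-1, 1] and each scaled value x is
    mapped to the angle phi = arccos(x).
    """
    low, high = min(series), max(series)
    if high == low:
        raise ValueError("a constant series cannot be rescaled")
    scaled = [2.0 * (value - low) / (high - low) - 1.0 for value in series]
    # Guard against tiny floating point overshoots outside [-1, 1].
    scaled = [max(-1.0, min(1.0, value)) for value in scaled]
    angles = [math.acos(value) for value in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


# A four-point toy series becomes a 4 x 4 "image".
field = gramian_angular_field([1.0, 2.0, 3.0, 4.0])
```

The matrix is symmetric and keeps the temporal order along its diagonal, which gives the locality that convolutional filters rely on.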

There is also a rising trend to use deep RL based financial algorithmic trading implementations; these are mostly associated with various agent-based models where different agents interact and learn from their interactions. This field has even more opportunities to offer with advancements in financial sentiment analysis through text mining to capture investor psychology; as a result, behavioral finance can benefit from these particular studies associated with RL based learning models coupled with agent-based studies.

Other models including DBN, AE and RBM also were used by several researchers and superior performances were reported in some of their work; but the interested readers need to check these studies case by case to see how they were modelled both from the data representation and learning point of view.

6.2 Discussions on Selected Features

Regardless of the underlying forecasting problem, the raw time series data was almost always embedded directly or indirectly within the feature vector, which is particularly valid for RNN-based models. However, in most of the other model types, other features were also included. Fundamental analysis and technical analysis features were among the most favored choices for stock/index forecasting studies.

Meanwhile, in recent years, financial text mining has been getting more attention, mostly for extracting investor/trader sentiment. The streaming flow of financial news, tweets, statements and blogs allowed researchers to build better and more versatile prediction and evaluation models integrating numerical and textual data. The general methodology involves extracting financial sentiment through text mining and combining that information with fundamental/technical analysis data to achieve better overall performance. It is logical to assume that this trend will continue with the integration of more advanced text mining and NLP techniques.

6.3 Discussions on Forecasted Asset Types

Even though forex price forecasting is always popular among the researchers and practitioners, stock/index forecasting has always had the most interest among all asset groups. Regardless, price/trend prediction and algo-trading models were mostly embedded with these prediction studies.

These days, another hot area in financial time series forecasting research involves cryptocurrencies. Cryptocurrency price prediction is in increasing demand from the financial community. Since the topic is fairly new, we might see more studies and implementations coming in due to high expectations and promising rewards.

There were also a number of publications in commodity price forecasting research, in particular, the price of oil. Oil price prediction is crucial due to its tremendous effect on world economic activities and planning. Meanwhile, gold is considered a safe investment and almost every investor, at one time, considers allocating some portion of their portfolios for gold-related investments. In times of political uncertainties, a lot of people turn to gold for protecting their savings. Even though we have not encountered a noteworthy study for gold price forecasting, due to its historical importance, there might be opportunities in this area for the years to come.

6.4 Open Issues and Future Work

Despite the general motivation for financial time series forecasting remaining fairly unchanged, the means of achieving the financial goals vary depending on the choices and trade-offs between traditional techniques and newly developed models. Since our fundamental focus is on the application of DL for financial time series studies, we will try to assess the current state of the research and extrapolate that into the future.

6.4.1 Model Choices for the Future

The dominance of RNN-based models for price/trend prediction will probably not disappear anytime soon, mainly due to their easy adaptation to most asset forecasting problems. Meanwhile, some enhanced versions of the original LSTM or RNN models, generally integrated with hybrid learning systems, have started becoming more common. Readers need to check individual studies and assess their performances to see which one fits best for their particular needs and domain requirements.

We have observed the increasing interest in 2-D CNN implementations of financial forecasting problems through converting the time series into an image-like data type. This innovative methodology seems to work quite satisfactorily and provides promising opportunities. More studies of this kind will probably continue in the near future.

Nowadays, new models are generated by modifying or enhancing existing models so that better performances can be achieved. Such topologies include GAN, Capsule networks, etc. They have been used in various non-financial studies; however, financial time series forecasting has not been investigated for those models yet. As such, there can be exciting opportunities both from a research and a practical point of view.

Another DL model that has not been investigated thoroughly is Graph CNN. Graphs can be used to represent portfolios, social networks of financial communities, fundamental analysis data, etc. Even though graph algorithms can be applied directly to such configurations, different graph representations can also be implemented for time series forecasting problems. Not much has been done on this particular topic; however, representing the time series data as graphs and then applying graph analysis algorithms, or applying CNN to these graphs, are among the possibilities that researchers can choose.
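The survey does not name a specific graph construction for time series. As one possible illustration, the natural visibility graph connects two time steps whenever the straight line between their values passes above every intermediate value. The sketch below assumes this construction; the function name `natural_visibility_graph` and the toy series are hypothetical.

```python
def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of a series.

    Nodes are time indices. Indices i < j are linked if every point k
    between them lies strictly below the line joining (i, y_i) and (j, y_j).
    """
    edges = set()
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            visible = True
            for k in range(i + 1, j):
                # Height of the line from point i to point j at position k.
                line_height = series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                if series[k] >= line_height:
                    visible = False
                    break
            if visible:
                edges.add((i, j))
    return edges


# A dip in the middle keeps the end points mutually visible.
edges = natural_visibility_graph([3.0, 1.0, 2.0])
```

The resulting edge list can be handed to standard graph analysis routines or used as the adjacency structure of a graph-based neural model.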

As a final note for the future models, we believe deep RL and agent-based models offer great opportunities for researchers. HFT algorithms and robo-advisory systems depend heavily on automated algorithmic trading systems that can decide what to buy and when to buy without any human intervention. These aforementioned models can fit very well in such challenging environments. The rise of the machines will also lead to a technological (and algorithmic) arms race between Fintech companies and quant funds to be the best in their never-ending search for “achieving alpha”. New research in these areas can be just what the doctor ordered.

6.4.2 Future Projections for Financial Time Series Forecasting

Most probably, for the foreseeable future, financial time series forecasting will maintain close research cooperation with other financial application areas like algorithmic trading and portfolio management, as was the case before. However, changes in the available data characteristics and the introduction of new asset classes might not only alter the forecasting strategies of developers, but also force them to look for new or alternative techniques to better adapt to these new challenging working conditions. In addition, metrics like CRPS for evaluating probability distributions might be included for more thorough analysis.
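For a forecast given as a set of samples, CRPS can be estimated from its standard representation CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution F and y is the observed value. The following is a minimal sketch of that sample-based estimator; the function name `crps_from_samples` and the example values are assumptions for illustration.

```python
def crps_from_samples(samples, observed):
    """Estimate CRPS of a sample-based forecast against one observation.

    Uses CRPS = mean|x_i - y| - (1 / (2 m^2)) * sum_ij |x_i - x_j|.
    Lower values indicate a better probabilistic forecast.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one forecast sample is required")
    accuracy_term = sum(abs(x - observed) for x in samples) / m
    spread_term = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return accuracy_term - spread_term


# Two forecast samples bracketing the observed value (made-up numbers).
score = crps_from_samples([1.0, 3.0], observed=2.0)
```

With a single sample the spread term vanishes and the score reduces to the absolute error, which makes CRPS directly comparable to MAE for point forecasts.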

One rising trend, not only for financial time series forecasting, but for all intelligent decision support systems, is the human-computer interaction and NLP research. Within that field, text mining and financial sentiment analysis areas are of particular importance to financial time series forecasting. Behavioral finance may benefit from the new advancements in these fields.

In order to utilize the power of text mining, researchers started developing new data representations like Stock2Vec Dang_2018 that can be useful for combining textual and numerical data for better prediction models. Furthermore, NLP based ensemble models that integrate data semantics with time-series data might increase the accuracy of the existing models.

One area that can benefit a lot from the interconnected financial markets is automated statistical arbitrage trading model development. It has been used in forex and commodity markets before. In addition, many practitioners currently seek arbitrage opportunities in the cryptocurrency markets Fischer_2019, due to the huge number of coins available on various marketplaces. Price disruptions, high volatility and bid-ask spread variations create arbitrage opportunities across different platforms. Some opportunists develop software models that can track these price anomalies for the instant materialization of profits. Also, it is possible to construct pairs trading portfolios across different asset classes using appropriate models. It is possible that DL models can learn (or predict) these opportunities faster and more efficiently than classical rule-based systems. This will also benefit HFT studies that are constantly looking for faster and more efficient trading algorithms and embedded systems with minimum latency. In order to achieve that, Graphics Processing Unit (GPU) or Field Programmable Gate Array (FPGA) based hardware solutions embedded with DL models can be utilized. There is a lack of research on this hardware aspect of financial time series forecasting and algorithmic trading. As long as there is enough computing power available, it is worth investigating the possibilities for better algorithms, since the rewards are high.
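For reference, the kind of classical rule-based pairs trading system mentioned above can be sketched as a z-score rule on the price spread of two assets. This is a simplified illustration, not a method taken from any surveyed study: the function names, the fixed hedge ratio of 1.0 and the entry threshold of 2.0 are assumptions, and the mean and standard deviation are computed over the whole sample, which a live system could not do.

```python
from statistics import mean, pstdev


def spread_zscores(prices_a, prices_b, hedge_ratio=1.0):
    """Standardize the spread a - hedge_ratio * b over the given sample."""
    if len(prices_a) != len(prices_b):
        raise ValueError("both price series must have the same length")
    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    mu, sigma = mean(spread), pstdev(spread)
    if sigma == 0:
        raise ValueError("the spread is constant, so no signal can be formed")
    return [(value - mu) / sigma for value in spread]


def spread_signals(zscores, entry=2.0):
    """Map each z-score to a position under a simple threshold rule."""
    signals = []
    for z in zscores:
        if z > entry:
            signals.append("short_spread")  # spread unusually wide
        elif z < -entry:
            signals.append("long_spread")   # spread unusually narrow
        else:
            signals.append("flat")
    return signals


# Made-up prices: asset A jumps away from asset B on the last step.
zscores = spread_zscores([10.0] * 9 + [15.0], [10.0] * 10)
signals = spread_signals(zscores)
```

A learned model would replace the fixed threshold with a prediction of whether the spread is likely to revert, which is where DL models could add value over such static rules.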

6.5 Responses to our Initial Research Questions

We are now ready to go back to our initially stated research questions. Our question and answer pairs, through our observations, are as follows:

  • 1. Which DL models are used for financial time series forecasting?

    Response: RNN based models (in particular LSTM) are the most commonly used models. Meanwhile, CNN and DMLP have been used extensively in classification type implementations (like trend classification) as long as appropriate data processing is applied to the raw data.

  • 2. How is the performance of DL models compared with traditional machine learning counterparts?

    Response: In the majority of the studies, DL models were better than ML. However, there were also many cases where their performances were comparable. There were even two particular studies (Dezsi_2016; Sermpinis_2014) where ML models performed better than DL models. Meanwhile, the preference for DL implementations over ML models is growing. Advances in computing power, availability of big data, superior performance, implicit feature learning capabilities and user-friendly model development environments for DL models are among the main reasons for this migration.

  • 3. What is the future direction for DL research for financial time series forecasting?

    Response: NLP, semantics and text mining-based hybrid models ensembled with time-series data might be more common in the near future.

7 Conclusions

Financial time series forecasting has been very popular among ML researchers for more than 40 years. The financial community got a new boost lately with the introduction of DL implementations for financial prediction research and a lot of new publications appeared accordingly. In our survey, we wanted to review the existing studies to provide a snapshot of the current research status of DL implementations for financial time series forecasting. We grouped the studies according to their intended asset class along with the preferred DL model associated with the problem. Our findings indicate, even though financial forecasting has a long research history, overall interest within the DL community is on the rise through utilizing new DL models; hence, a lot of opportunities exist for researchers.

8 Acknowledgement

This work is supported by Scientific and Technological Research Council of Turkey (TUBITAK) grant no 215E248.


References