Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005-2019
Abstract
Financial time series forecasting is, without a doubt, the top choice of computational intelligence for finance researchers from both academia and the financial industry due to its broad implementation areas and substantial impact. Machine Learning (ML) researchers have come up with various models, and a vast number of studies have been published accordingly. As such, a significant number of surveys exist covering ML for financial time series forecasting studies. Lately, Deep Learning (DL) models have started appearing within the field, with results that significantly outperform their traditional ML counterparts. Even though there is a growing interest in developing models for financial time series forecasting research, there is a lack of review papers that focus solely on DL for finance. Hence, our motivation in this paper is to provide a comprehensive literature review on DL studies for financial time series forecasting implementations. We not only categorized the studies according to their intended forecasting implementation areas, such as index, forex, and commodity forecasting, but also grouped them based on their DL model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long-Short Term Memory (LSTM). We also tried to envision the future of the field by highlighting the possible setbacks and opportunities, so that interested researchers can benefit.
keywords:
deep learning, finance, computational intelligence, machine learning, time series forecasting, CNN, LSTM, RNN
1 Introduction
The finance industry has always been interested in the successful prediction of financial time series data. Numerous studies have been published that were based on Machine Learning (ML) models with relatively better performances compared to classical time series forecasting techniques. Meanwhile, the widespread application of automated electronic trading systems, coupled with increasing demand for higher yields, keeps forcing researchers and practitioners to search for better models. Hence, new publications and implementations keep pouring into the finance and computational intelligence literature.
In the last few years, Deep Learning (DL) has emerged strongly as the best performing predictor class within the ML field in various implementation areas. Financial time series forecasting is no exception; an increasing number of prediction models based on various DL techniques have been introduced in the appropriate conferences and journals in recent years. Despite the vast number of survey papers covering financial time series forecasting and trading systems using traditional soft computing techniques, to the best of our knowledge, no reviews have been performed in the literature for DL. Hence, we decided to work on such a comprehensive study focusing on DL implementations of financial time series forecasting. Our motivation is twofold: we aimed not only at providing a state-of-the-art snapshot of academic and industry perspectives on the developed DL models, but also at pinpointing the important and distinctive characteristics of each studied model to prevent researchers and practitioners from making unsatisfactory choices during their system development phase. We also wanted to envision where the industry is heading by indicating possible future directions.
Our fundamental motivation in this paper was to come up with answers for the following research questions:

1. Which DL models are used for financial time series forecasting?

2. How does the performance of DL models compare with that of their traditional ML counterparts?

3. What is the future direction for DL research for financial time series forecasting?
Our focus was solely on DL implementations for financial time series forecasting. For other DL-based financial applications such as risk assessment, portfolio management, etc., interested readers can check the recent survey paper Ozbayoglu_2019. Since we singled out financial time series prediction studies in our survey, we omitted other time series forecasting studies that were not focused on financial data. Meanwhile, we included time series research papers that had financial use cases or examples even though the papers themselves were not directly intended for financial time series forecasting. Also, we decided to include algorithmic trading papers that were based on financial forecasting, but ignore the ones that did not have a time series forecasting component.
We reviewed journals and conferences for our survey; however, we also included Masters and PhD theses, book chapters, arXiv papers and noteworthy technical publications that came up in web searches. We only included articles written in English.
During our survey, we realized that most of the papers using the term "deep learning" in their description were published in the last 5 years. However, we also encountered some older studies that implemented deep models, such as Recurrent Neural Networks (RNNs) and Jordan-Elman networks. At their time of publication, the term "deep learning" was not in common usage, so we decided to include those papers as well.
According to our findings, this will be one of the first comprehensive "financial time series forecasting" survey papers focusing on DL. Many ML reviews for financial time series forecasting exist in the literature, whereas we have not encountered any study on DL. Hence, we wanted to fill this gap by analyzing the developed models and applications accordingly. We hope that, as a result of this paper, researchers and model developers will have a better idea of how they can implement DL models for their studies.
We structured the rest of the paper as follows. Following this brief introduction, Section 2 mentions the existing surveys that focus on ML and soft computing studies for financial time series forecasting. In Section 3, we cover the existing DL models that are used, such as Convolutional Neural Network (CNN), Long-Short Term Memory (LSTM), and Deep Reinforcement Learning (DRL). Section 4 focuses on the various financial time series forecasting implementation areas using DL, namely stock forecasting, index forecasting, trend forecasting, commodity forecasting, volatility forecasting, foreign exchange forecasting, and cryptocurrency forecasting. In each subsection, the problem definition is given, followed by the particular DL implementations. In Section 5, overall statistical results about our findings are presented, including histograms of the yearly distribution of different subfields, models, publication types, etc. These statistics provide a state-of-the-art snapshot of financial time series forecasting studies. At the same time, they show the areas that are already mature, compared against promising or new areas that still have room for improvement. Section 6 discusses what has been done through academic and industrial achievements, and what might be needed in the future. The section includes highlights about the open areas that need further research. Finally, we conclude in Section 7 by summarizing our findings.
2 Financial Time Series Forecasting with ML
Financial time series forecasting and associated applications have been studied extensively for many years. When ML started gaining popularity, financial prediction applications based on soft computing models also became available accordingly. Even though our focus is particularly on DL implementations of financial time series prediction studies, it is beneficial to briefly mention the existing surveys covering ML-based financial time series forecasting studies in order to gain historical perspective.
In our study, we did not include any survey papers that were focused on specific financial application areas other than forecasting studies. However, we encountered some review publications that included not only financial time series studies but also other financial applications. We decided to include those papers in order to maintain the comprehensiveness of our coverage.
Examples of these aforementioned publications are provided here. Books have been published on stock market forecasting Aliev_2004, trading system development Dymowa_2011, and practical examples of forex and market forecasting applications Kovalerchuk_2000 using ML models such as Artificial Neural Networks (ANNs), Evolutionary Computation (EC), Genetic Programming (GP) and agent-based models Brabazon_2008.
There were also some existing journal and conference surveys. Bahrammirzaee et al. Bahrammirzaee_2010 surveyed financial prediction and planning studies along with other financial applications using various Artificial Intelligence (AI) techniques such as ANNs, Expert Systems, and hybrid models. The authors of Zhang_2004 also compared ML methods in different financial applications including stock market prediction studies. In Mochn_2007, soft computing models for market and forex prediction and for trading systems were analyzed. Mullainathan and Spiess Mullainathan_2017 surveyed the prediction process in general from an econometric perspective.
There were also a number of survey papers concentrated on a single particular ML model. Even though these papers focused on one technique, the implementation areas generally spanned various financial applications including financial time series forecasting studies. Among those soft computing methods, EC and ANN attracted the most interest overall.
For the EC studies, Chen wrote a book on Genetic Algorithms (GAs) and GP in Computational Finance Chen_2002s. Later, Multiobjective Evolutionary Algorithms (MOEAs) were extensively surveyed on various financial applications including financial time series prediction Castillo_Tapia_2007; Ponsich_2013; Aguilar_Rivera_2015. Meanwhile, Rada reviewed EC applications along with Expert Systems for financial investing models RADA_2008.
For the ANN studies, Li and Ma reviewed implementations of ANN for stock price forecasting and some other financial applications Li_2010. The authors of Tkac_2016 surveyed different implementations of ANN in financial applications including stock price forecasting. Recently, Elmsili and Outtaj covered ANN applications in economics and management research, including economic time series forecasting, in their survey Elmsili_2018.
There were also several text mining surveys focused on financial applications (which included financial time series forecasting). Mittermayer and Knolmayer compared various text mining implementations that extract market response to news for prediction Mittermayer_2006. The authors of Mitra_2012 focused on news analytics studies for prediction of abnormal returns for trading strategies in their survey. Nassirtoussi et al. reviewed text mining studies for stock or forex market prediction Nassirtoussi_2014. The authors of Kearney_2014 also surveyed text mining-based time series forecasting and trading strategies using textual sentiment. Similarly, Kumar and Ravi Kumar_2016 reviewed text mining studies for forex and stock market prediction. Lately, Xing et al. Xing_2017 surveyed natural language-based financial forecasting studies.
Finally, there were application-specific survey papers that focused on particular financial time series forecasting implementations. Among these studies, stock market forecasting attracted the most interest. A number of surveys were published for stock market forecasting studies based on various soft computing methods at different times Vanstone_2003; Hajizadeh_2010; Nair_2014; Cavalcante_2016; Krollner_2010; Yoo; Preethi_2012; Atsalakis_2009. Chatterjee et al. Chatterjee_2000 and Katarya and Mahajan Katarya_2017 concentrated on ANN-based financial market prediction studies, whereas Hu et al. Hu_2015 focused on EC implementations for stock forecasting and algorithmic trading models. In a different time series forecasting application, researchers surveyed forex prediction studies using ANN Huang_2004 and various other soft computing techniques Pradeepkumar_2018.
Even though many surveys exist for ML implementations of financial time series forecasting, DL has not been surveyed comprehensively so far despite the existence of various DL implementations in recent years. Hence, this was our main motivation for the survey. At this point, we would like to cover the various DL models used in financial time series forecasting studies.
3 Deep Learning
DL is a type of Artificial Neural Network (ANN) that consists of multiple processing layers and enables high-level abstraction to model data. The key advantage of DL models is that they extract good features from the input data automatically using a general-purpose learning procedure. Therefore, in the literature, DL models are used in many applications: image, speech, video, and audio reconstruction, natural language understanding (particularly topic classification), sentiment analysis, question answering and language translation LeCun2015. The historical improvements in DL models are surveyed in Schmidhuber_2015.
Financial time series forecasting has been very popular among ML researchers for more than 40 years. The financial community got a new boost lately with the introduction of DL models for financial prediction research, and a lot of new publications appeared accordingly. The success of DL over ML models is the major attraction for finance researchers. With more financial time series data and different deep architectures, new DL methods will be proposed. In our survey, we found that in the vast majority of the studies, DL models performed better than their ML counterparts.
In the literature, there are different kinds of DL models: Deep Multilayer Perceptron (DMLP), Recurrent Neural Network (RNN), Long-Short Term Memory (LSTM), Convolutional Neural Network (CNN), Restricted Boltzmann Machines (RBMs), Deep Belief Network (DBN), Autoencoder (AE), and Deep Reinforcement Learning (DRL) LeCun2015; Schmidhuber_2015. Throughout the literature, financial time series forecasting was mostly considered as a regression problem. However, there were also a significant number of studies, in particular on trend prediction, that used classification models to tackle financial forecasting problems. In Section 4, different DL implementations are provided along with their model choices.
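To make the regression versus classification distinction concrete, the following minimal sketch builds both kinds of targets from one price series. The function name `make_supervised_samples`, the sliding-window layout and the binary up/down label are illustrative assumptions, not a scheme taken from any of the surveyed studies.

```python
def make_supervised_samples(prices, window):
    """Turn a price series into (features, next_price, trend_label) samples.

    Hypothetical helper: the window layout and the up/down rule are assumptions.
    """
    samples = []
    for start in range(len(prices) - window):
        features = prices[start:start + window]      # the last `window` observations
        next_price = prices[start + window]          # regression target
        # Classification target: 1 if the next price is above the latest one.
        trend_label = 1 if next_price > features[-1] else 0
        samples.append((features, next_price, trend_label))
    return samples


closing_prices = [100.0, 101.5, 101.0, 102.2, 101.8]  # made-up example values
for features, next_price, trend_label in make_supervised_samples(closing_prices, window=3):
    direction = "up" if trend_label else "down"
    print(features, "->", next_price, "|", direction)
```

A regression model would be trained on `next_price`, while a trend prediction model would be trained on `trend_label`; the input features are the same in both cases.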
3.1 Deep Multi Layer Perceptron (DMLP)
DMLPs are among the first developed Artificial Neural Networks (ANNs). The difference from shallow nets is that a DMLP contains more layers. Even though particular model architectures might vary depending on different problem requirements, DMLP models consist of three main types of layers: input, hidden and output. The number of neurons in each layer and the number of layers are hyperparameters of the network. In general, each neuron in the hidden layers has input ($x$), weight ($w$) and bias ($b$) terms. In addition, each neuron has a nonlinear activation function, which produces the neuron's output from the cumulative, weighted input of the preceding neurons. Equation 1 Goodfellowetal2016 illustrates the output of a single neuron in a Neural Network (NN). There are different types of nonlinear activation functions. The most commonly used are: sigmoid (Equation 2) Cybenko_1989, hyperbolic tangent (Equation 3) Kalman_1992, Rectified Linear Unit (ReLU) (Equation 4) Nair_2010, leaky ReLU (Equation 5) Maas_2013, swish (Equation 6) Ramachandran_2017, and softmax (Equation 7) Goodfellowetal2016. A comparison of nonlinear activation functions is given in Ramachandran_2017.
$${y}_{i}=\sigma (\sum _{i}{W}_{i}{x}_{i}+{b}_{i})$$  (1) 
$$\sigma (z)=\frac{1}{1+{e}^{-z}}$$  (2) 
$$\mathrm{tanh}(z)=\frac{{e}^{z}-{e}^{-z}}{{e}^{z}+{e}^{-z}}$$  (3) 
$$R(z)=\mathrm{max}(0,z)$$  (4) 
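$$R(z)=\begin{cases}z, & z\geq 0\\ \alpha z, & z<0\end{cases}\quad \text{with a small slope } 0<\alpha <1$$  (5) 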
$$f(x)=x\sigma (\beta x)$$  (6) 
$$\mathrm{softmax}({z}_{i})=\frac{\mathrm{exp}({z}_{i})}{\sum _{j}\mathrm{exp}({z}_{j})}$$  (7) 
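The single-neuron output and the activation functions named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the leak slope `alpha=0.01` and the swish parameter `beta=1.0` are choices made here, and the numerically stable forms of sigmoid and softmax are standard rewrites of the same functions.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written in a form that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent."""
    return math.tanh(z)


def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU; alpha is an assumed small slope for negative inputs."""
    return z if z >= 0 else alpha * z


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta=1.0 is an assumed default."""
    return x * sigmoid(beta * x)


def softmax(zs):
    """Softmax over a list; subtracting the max leaves the result unchanged."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


print(neuron_output([0.5, -0.5], [1.0, 1.0], 0.0))        # sigmoid neuron
print(neuron_output([0.5, -0.5], [1.0, 3.0], 0.0, relu))  # ReLU neuron
print(softmax([1.0, 2.0, 3.0]))
```

Passing the activation as an argument keeps the neuron computation separate from the choice of nonlinearity, which is how the activation function becomes one of the tunable design choices of the network.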
DMLP models have been appearing in various application areas Deng_2014_App; LeCun2015. Using a DMLP model has advantages and disadvantages depending on the problem requirements. Through DMLP models, problems such as regression and classification can be solved by modeling the input data Gardner_1998. However, if the number of input features is increased (e.g. an image as input), the number of parameters in the network increases accordingly due to the fully connected nature of the model, which jeopardizes computational performance and creates storage problems. To overcome this issue, different types of Deep Neural Network (DNN) methods have been proposed (such as CNN) LeCun2015. With DMLP, much more efficient classification and regression processes can be performed. Figure 1 shows a DMLP model with its layers, the neurons in each layer, and the weights between neurons.
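The growth in parameter count caused by full connectivity can be checked with simple arithmetic. In the sketch below, the helper name `dense_param_count` and the example layer sizes (a 28x28 grayscale image versus a 224x224 RGB image feeding the same hidden layer) are assumptions chosen for illustration.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


small_input = dense_param_count([28 * 28, 512, 10])        # 784 input features
large_input = dense_param_count([224 * 224 * 3, 512, 10])  # 150528 input features
print(small_input, large_input, round(large_input / small_input, 1))
```

Only the input size changes between the two calls, yet the total grows from a few hundred thousand parameters to tens of millions, which is the storage and computation issue described above.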
The DMLP learning stage is implemented through backpropagation. The error at the neurons in the output layer is propagated back to the preceding layers. Optimization algorithms are used to find the optimum parameters/variables of the Neural Networks (NNs); they update the weights of the connections between the layers. Different optimization algorithms have been developed: Stochastic Gradient Descent (SGD), SGD with Momentum, Adaptive Gradient Algorithm (AdaGrad), Root Mean Square Propagation (RMSProp), and Adaptive Moment Estimation (ADAM) Robbins_1951; Sutskever_2013; Duchi_2011; Tieleman_2012; Kingma_2014. Gradient descent is an iterative method for finding the parameters that minimize the cost function. SGD is an algorithm that randomly selects a few samples instead of the whole data set for each iteration Robbins_1951. SGD with Momentum remembers the update from each iteration, which accelerates the gradient descent method Sutskever_2013. AdaGrad is a modified SGD that improves convergence performance over the standard SGD algorithm Duchi_2011. RMSProp is an optimization algorithm that adapts the learning rate for each of the parameters. In RMSProp, the learning rate is divided by a running average of the magnitudes of recent gradients for that weight Tieleman_2012. ADAM is an updated version of RMSProp that uses running averages of both the gradients and the second moments of the gradients. ADAM combines the advantages of RMSProp (works well in online and nonstationary settings) and AdaGrad (works well with sparse gradients) Kingma_2014.
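The update rules described above can be written for a single scalar weight as in the sketch below. It follows the commonly cited textbook forms of these optimizers; the function names, the default decay constants, and the toy problem (recovering the slope of y = 2x from random mini-batches) are assumptions made for illustration only.

```python
import math
import random


def sgd_step(w, g, lr):
    """Plain SGD: move against the (mini-batch) gradient g."""
    return w - lr * g


def momentum_step(w, g, velocity, lr, mu=0.9):
    """SGD with momentum: the remembered velocity accumulates past updates."""
    velocity = mu * velocity - lr * g
    return w + velocity, velocity


def adagrad_step(w, g, cache, lr, eps=1e-8):
    """AdaGrad: scale the step by the root of the summed squared gradients."""
    cache += g * g
    return w - lr * g / (math.sqrt(cache) + eps), cache


def rmsprop_step(w, g, avg_sq, lr, rho=0.9, eps=1e-8):
    """RMSProp: divide by a running average of recent squared gradients."""
    avg_sq = rho * avg_sq + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(avg_sq) + eps), avg_sq


def adam_step(w, g, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """ADAM: running averages of the gradient (m) and its square (v),
    with bias correction for step t = 1, 2, ..."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def fit_slope_with_sgd(steps=500, lr=0.1, batch_size=4, seed=0):
    """Toy example: fit w in y = w * x with mini-batch SGD on squared error."""
    rng = random.Random(seed)
    data = [(k / 10, 2.0 * (k / 10)) for k in range(1, 21)]  # true slope is 2
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # a few random samples per iteration
        g = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w = sgd_step(w, g, lr)
    return w


print(fit_slope_with_sgd())
```

The optimizers differ only in the extra state they carry between iterations (a velocity, a sum or running average of squared gradients, or both kinds of averages), which is why each step function returns that state alongside the new weight.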
As shown in Figure 1, the effect of the backpropagation is transferred to the previous layers. If the effect of the SGD update gradually fades by the time it reaches the early layers during backpropagation, this is called the vanishing gradient problem in the literature Bengio_1994. In this case, the weights of the early layers are barely updated and the learning process stalls. A high number of layers in the neural network and increasing complexity cause the vanishing gradient problem.
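A small numerical illustration of the vanishing gradient problem follows. It is a sketch under simplifying assumptions that are not from the source: every layer has a single sigmoid neuron, and all layers share the same hypothetical weight. Because the sigmoid derivative never exceeds 0.25, the backpropagated factor shrinks geometrically with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_reaching_input(num_layers, weight=1.0, x=0.5):
    # Forward pass through a chain of single-neuron sigmoid layers.
    activations = []
    a = x
    for _ in range(num_layers):
        a = sigmoid(weight * a)
        activations.append(a)
    # Backward pass: the chain rule multiplies one factor per layer,
    # weight * sigmoid'(z) = weight * a * (1 - a), and a * (1 - a) <= 0.25.
    grad = 1.0
    for a in reversed(activations):
        grad *= weight * a * (1.0 - a)
    return grad
```

Comparing `gradient_reaching_input(2)` with `gradient_reaching_input(20)` shows how little of the output error reaches the first layer of a deep chain.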
An important issue in DMLP is the set of network hyperparameters and the method used to tune them. Hyperparameters are the variables of the network that affect the network architecture and the performance of the network. The number of hidden layers, the number of units in each layer, regularization techniques (dropout, L1, L2), network weight initialization (zero, random, He He_2015, Xavier Glorot_2010), activation functions (Sigmoid, ReLU, hyperbolic tangent, etc.), learning rate, decay rate, momentum values, number of epochs, batch size (minibatch size), and optimization algorithms (SGD, AdaGrad, RMSProp, Adam, etc.) are the hyperparameters of DMLP. Choosing better hyperparameter values for the network results in better performance, so finding the best hyperparameters is a significant issue. In the literature, there are different methods for finding the best hyperparameters: Manual Search, Grid Search, Random Search, and Bayesian Methods (Sequential Model-Based Global Optimization (SMBGO), the Gaussian Process Approach (GPA), the Tree-structured Parzen Estimator Approach (TSPEA)) Bergstra_2011; Bergstra_2012.
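Grid Search and Random Search can be contrasted with a short sketch. The objective `validation_loss` below is a hypothetical stand-in for training a network and measuring its validation loss, and the search ranges are assumptions made for illustration only.

```python
import itertools
import math
import random

def validation_loss(lr, hidden_units):
    # Hypothetical stand-in for "train the network, return validation loss".
    # Its minimum is at lr = 0.01 and hidden_units = 64.
    return (math.log10(lr) + 2.0) ** 2 + ((hidden_units - 64) / 64.0) ** 2

def grid_search(lrs, units):
    # Evaluate every combination of the listed values.
    return min(itertools.product(lrs, units),
               key=lambda cfg: validation_loss(*cfg))

def random_search(n_trials, seed=0):
    # Sample the learning rate log-uniformly and the layer width uniformly.
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-4, 0), rng.randint(8, 256))
              for _ in range(n_trials)]
    return min(trials, key=lambda cfg: validation_loss(*cfg))
```

Grid Search only finds the optimum if it happens to lie on the grid, whereas Random Search covers each dimension more densely for the same number of trials. Bayesian methods would replace the independent sampling with a model of the loss surface.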
3.2 Recurrent Neural Network (RNN)
RNN is another type of DL network that is used for time series or sequential data, such as language and speech. RNNs are also used in traditional ML models (Backpropagation Through Time (BPTT), Jordan-Elman networks, etc.); however, the time lengths in such models are generally shorter than those used in deep RNN models. Deep RNNs are preferred due to their ability to include longer time periods. Unlike FNNs, RNNs use internal memory to process incoming inputs. RNNs are used in the analysis of time series data in various fields (handwriting recognition, speech recognition, etc.). As stated in the literature, RNNs are good at predicting the next character in a text, language translation applications, and sequential data processing Deng_2014_App; LeCun2015.
The RNN model architecture consists of a varying number of layers and different types of units in each layer. The main difference between RNN and FNN is that each RNN unit takes the current and previous input data at the same time. The output depends on the previous data in the RNN model. RNNs process input sequences one element at a time during their operation. The units in the hidden layer hold information about the history of the input in the "state vector". When the output of the units in the hidden layer is unrolled over discrete time steps, the RNN can be viewed as a DMLP LeCun2015. In Figure 2, the information flow in the RNN’s hidden layer is divided into discrete times. The state of the node $S$ at time $t$ is shown as ${s}_{t}$, the input value $x$ at time $t$ is ${x}_{t}$, and the output value $o$ at time $t$ is shown as ${o}_{t}$. The same parameter values ($U,W,V$) are used at every time step.
RNNs can be trained using the BPTT algorithm. Optimization algorithms (SGD, RMSProp, Adam) are used for the weight adjustment process. With the BPTT learning method, the error change at any time $t$ is reflected in the inputs and weights of the previous time steps. The difficulty of training RNNs is due to the fact that the RNN structure has a backward dependence over time. Therefore, RNNs become very complex in terms of the learning period. Although the main aim of using RNNs is to learn long-term dependencies, studies in the literature show that when knowledge is stored for long time periods, it is not easy to learn with RNNs (training difficulties on RNNs) Pascanu_2013. In order to solve this particular problem, LSTMs with different structures of ANN were developed LeCun2015. Equations 8 and 9 illustrate simpler RNN formulations. Equation 10 shows that the gradient of the total error is the sum of the error gradients at each time step $t$ (footnote 1: Richard Socher, CS224d: Deep Learning for Natural Language Processing, Lecture Notes).
$${h}_{t}=Wf({h}_{t-1})+{W}^{(hx)}{x}_{[t]}$$  (8) 
$${y}_{t}={W}^{(S)}f({h}_{t})$$  (9) 
$$\frac{\partial E}{\partial W}=\sum _{t=1}^{T}\frac{\partial {E}_{t}}{\partial W}$$  (10) 
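The recurrence in Equations 8 and 9 can be written out directly. The sketch below assumes that $f$ is the hyperbolic tangent and represents matrices as nested lists; both choices, and the function names, are illustrative assumptions rather than part of the cited formulation.

```python
import math

def matvec(M, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def tanh_vec(v):
    return [math.tanh(a) for a in v]

def rnn_forward(xs, W, W_hx, W_s, h0):
    # xs: list of input vectors x_t; returns outputs y_t and hidden states h_t.
    h, hs, ys = h0, [], []
    for x in xs:
        # Equation 8: h_t = W f(h_{t-1}) + W^(hx) x_t
        h = [r + i for r, i in zip(matvec(W, tanh_vec(h)), matvec(W_hx, x))]
        # Equation 9: y_t = W^(S) f(h_t)
        ys.append(matvec(W_s, tanh_vec(h)))
        hs.append(h)
    return ys, hs
```

The same three matrices are reused at every step of the loop, which is what makes the hidden state a running summary of the inputs seen so far.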
Hyperparameters of RNN also define the network architecture, and the performance of the network is affected by the parameter choices, as in the DMLP case. The number of hidden layers, the number of units in each layer, regularization techniques, network weight initialization, activation functions, learning rate, momentum values, number of epochs, batch size (minibatch size), decay rate, optimization algorithms, the RNN model (Vanilla RNN, GRU, LSTM), and the sequence length are the hyperparameters of RNN. Finding the best hyperparameters for the network is a significant issue. In the literature, there are different methods for finding the best hyperparameters: Manual Search, Grid Search, Random Search, and Bayesian Methods (SMBGO, GPA, TSPEA) Bergstra_2011; Bergstra_2012.
3.3 Long Short Term Memory (LSTM)
LSTM hochreiter1997lstm is a type of RNN where the network can remember both short-term and long-term values. LSTM networks are the preferred choice of many DL model developers when tackling complex problems like automatic speech recognition and handwritten character recognition. LSTM models are mostly used with time-series data. They are used in different applications such as NLP, language modeling, language translation, speech recognition, sentiment analysis, predictive analysis, financial time series analysis, etc. Wu_2016; Greff_2016. With attention modules and AE structures, LSTM networks can be more successful on time series data analysis such as language translation Wu_2016.
LSTM networks consist of LSTM units. LSTM units are combined to form an LSTM layer. An LSTM unit is composed of cells, each having an input gate, an output gate, and a forget gate. These three gates regulate the information flow. With these features, each cell remembers the desired values over arbitrary time intervals. Equations 11-15 show the forward pass of the LSTM unit hochreiter1997lstm (${x}_{t}$: input vector to the LSTM unit, ${f}_{t}$: forget gate’s activation vector, ${i}_{t}$: input gate’s activation vector, ${o}_{t}$: output gate’s activation vector, ${h}_{t}$: output vector of the LSTM unit, ${c}_{t}$: cell state vector, ${\sigma}_{g}$: sigmoid function, ${\sigma}_{c}$, ${\sigma}_{h}$: hyperbolic tangent function, $*$: element-wise (Hadamard) product, $W$, $U$: weight matrices that need to be learned, $b$: bias vector parameters that need to be learned) Greff_2016.
$${f}_{t}={\sigma}_{g}({W}_{f}{x}_{t}+{U}_{f}{h}_{t-1}+{b}_{f})$$  (11) 
$${i}_{t}={\sigma}_{g}({W}_{i}{x}_{t}+{U}_{i}{h}_{t-1}+{b}_{i})$$  (12) 
$${o}_{t}={\sigma}_{g}({W}_{o}{x}_{t}+{U}_{o}{h}_{t-1}+{b}_{o})$$  (13) 
$${c}_{t}={f}_{t}*{c}_{t-1}+{i}_{t}*{\sigma}_{c}({W}_{c}{x}_{t}+{U}_{c}{h}_{t-1}+{b}_{c})$$  (14) 
$${h}_{t}={o}_{t}*{\sigma}_{h}({c}_{t})$$  (15) 
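One forward step of an LSTM unit, following Equations 11-15, can be sketched as below. The dictionary layout of the parameters and the helper names are assumptions made for this illustration; vectors and matrices are plain Python lists.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    # Computes W x + U h + b for one gate.
    return [sum(w * xi for w, xi in zip(W_row, x))
            + sum(u * hi for u, hi in zip(U_row, h)) + bi
            for W_row, U_row, bi in zip(W, U, b)]

def lstm_step(x, h_prev, c_prev, p):
    # p maps each gate name ("f", "i", "o", "c") to its (W, U, b) triple.
    f = [sigmoid(z) for z in affine(*p["f"], x, h_prev)]   # Equation 11
    i = [sigmoid(z) for z in affine(*p["i"], x, h_prev)]   # Equation 12
    o = [sigmoid(z) for z in affine(*p["o"], x, h_prev)]   # Equation 13
    c_tilde = [math.tanh(z) for z in affine(*p["c"], x, h_prev)]
    # Equation 14: forget part of the old cell state, add the gated candidate.
    c = [ft * cp + it * ct
         for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    # Equation 15: expose a gated view of the cell state as the output.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

With all weights and biases at zero, every gate outputs 0.5, so the cell state is simply halved at each step. Learned weights let the forget gate move toward 1 and preserve the state over long intervals.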
LSTM is a specialized version of RNN. Therefore, the weight updates and preferred optimization methods are the same. In addition, the hyperparameters of LSTM are just like those of RNN: the number of hidden layers, the number of units in each layer, network weight initialization, activation functions, learning rate, momentum values, the number of epochs, batch size (minibatch size), decay rate, optimization algorithms, sequence length, gradient clipping, gradient normalization, and dropout Reimers_2017; Greff_2016. The hyperparameter optimization methods that are used for RNN are also applicable to LSTM Bergstra_2011; Bergstra_2012.
3.4 Convolutional Neural Networks (CNNs)
CNN is a type of DNN that consists of convolutional layers, which are based on the convolution operation. CNN is also the most common model for vision or image processing based classification problems (image classification, object detection, image segmentation, etc.) Ji_2012; Szegedy_2013; Long_2015. The advantage of CNN is its smaller number of parameters compared with vanilla DL models such as DMLP. Filtering with a kernel window function lets CNN architectures process images with fewer parameters, which benefits both computation and storage. CNN architectures contain different layers: convolutional, max-pooling, dropout, and fully connected MLP layers. The convolutional layer performs the convolution (filtering) operation. The basic convolution operation is shown in Equation 16 ($t$ denotes time, $s$ denotes the feature map, $w$ denotes the kernel, $x$ denotes the input, $a$ denotes the summation variable). In addition, the convolution operation is applied to two-dimensional images. Equation 17 shows the convolution operation for a two-dimensional image ($I$ denotes the input image, $K$ denotes the kernel, $m$ and $n$ denote the summation indices over the image dimensions, $i$ and $j$ denote the output position). Besides, consecutive convolutional and max-pooling layers construct the deep network. Equation 18 provides the details of the NN architecture ($W$ denotes weights, $x$ denotes input, $b$ denotes bias, $z$ denotes the output of neurons). At the end of the network, the softmax function is used to get the output. Equations 19 and 20 illustrate the softmax function ($y$ denotes output) Goodfellowetal2016.
$$s(t)=(x*w)(t)=\sum _{a=-\infty}^{\infty}x(a)w(t-a)$$  (16) 
$$S(i,j)=(I*K)(i,j)=\sum _{m}\sum _{n}I(m,n)K(i-m,j-n).$$  (17) 
$${z}_{i}=\sum _{j}{W}_{i,j}{x}_{j}+{b}_{i}.$$  (18) 
$$y=\mathrm{softmax}(z)$$  (19) 
$$\mathrm{softmax}({z}_{i})=\frac{\mathrm{exp}({z}_{i})}{\sum _{j}\mathrm{exp}({z}_{j})}$$  (20) 
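For finite inputs, the sums in Equations 16 and 17 reduce to short loops. The sketch below is an illustration under assumptions not stated in the source: sequences and images are zero outside their stored range, and the "full" output size is returned. Note that many DL libraries compute cross-correlation (without flipping the kernel) under the name convolution, which differs from these equations only in the kernel orientation.

```python
import math

def conv1d(x, w):
    # Equation 16 for finite sequences: s(t) = sum_a x(a) * w(t - a).
    out = [0.0] * (len(x) + len(w) - 1)
    for a, xa in enumerate(x):
        for k, wk in enumerate(w):  # k = t - a
            out[a + k] += xa * wk
    return out

def conv2d(I, K):
    # Equation 17 for finite arrays:
    # S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n).
    rows = len(I) + len(K) - 1
    cols = len(I[0]) + len(K[0]) - 1
    S = [[0.0] * cols for _ in range(rows)]
    for m, I_row in enumerate(I):
        for n, val in enumerate(I_row):
            for p, K_row in enumerate(K):      # p = i - m
                for q, k in enumerate(K_row):  # q = j - n
                    S[m + p][n + q] += val * k
    return S

def softmax(z):
    # Equation 20; subtracting max(z) avoids overflow without changing the result.
    shift = max(z)
    exps = [math.exp(zi - shift) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The kernel has far fewer entries than a fully connected weight matrix over the same input, which is the parameter saving referred to in the text.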
The backpropagation process is used for model learning in CNN. The most commonly used optimization algorithms (SGD, RMSProp) are used to find the optimum parameters of CNN. Hyperparameters of CNN are similar to other DL model hyperparameters: the number of hidden layers, the number of units in each layer, network weight initialization, activation functions, learning rate, momentum values, the number of epochs, batch size (minibatch size), decay rate, optimization algorithms, dropout, kernel size, and filter size. In order to find the best hyperparameters of CNN, the usual search algorithms are used: Manual Search, Grid Search, Random Search, and Bayesian Methods Bergstra_2011; Bergstra_2012.
3.5 Restricted Boltzmann Machines (RBMs)
RBM is a generative stochastic ANN that can learn a probability distribution over the input set Qiu2014. RBMs are mostly used for unsupervised learning Hrasko_2015. RBMs are used in applications such as dimension reduction, classification, feature learning, and collaborative filtering Salakhutdinov_2007. The advantage of RBMs is their ability to find hidden patterns with an unsupervised method. The disadvantage of RBMs is their difficult training process. "RBMs are tricky because although there are good estimators of the log-likelihood gradient, there are no known cheap ways of estimating the log-likelihood itself" Bengio_2012.
RBM is a two-layer, bipartite, undirected graphical model that consists of a visible layer and a hidden layer (Figure 3). Units within the same layer are not connected to each other. Each cell is a computational point that processes the input and makes a stochastic decision about whether this node will transmit the input. Inputs are multiplied by specific weights, certain threshold values (biases) are added, and the resulting values are passed through an activation function. In the reconstruction stage, the outputs re-enter the network as inputs and exit from the visible layer as outputs. The values of the original input and the reconstructed values are compared, and the purpose of training is to reduce the difference between them.
Equation 21 illustrates the probabilistic semantics for an RBM by using its energy function ($P$ denotes the joint probability assigned by the RBM, $Z$ denotes the partition function, $E$ denotes the energy function, $h$ denotes hidden units, $v$ denotes visible units). Equation 22 illustrates the partition function or the normalizing constant. Equation 23 shows the energy of a configuration (in matrix notation) of the standard type of RBM that has binary-valued hidden and visible units ($a$ denotes bias weights (offsets) for the visible units, $b$ denotes bias weights for the hidden units, $W$ denotes the weight matrix of the connections between hidden and visible units, $T$ denotes the transpose of a matrix, $v$ denotes visible units, $h$ denotes hidden units) mohamed2009deep; lee2009convolutional.
$$P(v,h)=\frac{1}{Z}\mathrm{exp}(-E(v,h))$$  (21) 
$$Z=\sum _{v}\sum _{h}\mathrm{exp}(-E(v,h))$$  (22) 
$$E(v,h)=-{a}^{T}v-{b}^{T}h-{v}^{T}Wh$$  (23) 
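For a very small binary RBM, Equations 21-23 can be evaluated exactly by enumerating every configuration. The sketch below is illustrative only: the function names are assumptions, $W$ is stored with one row per visible unit, and the brute-force sum is feasible only for a handful of units, which is the reason the partition function is normally approximated.

```python
import itertools
import math

def energy(v, h, a, b, W):
    # Equation 23: E(v, h) = -a^T v - b^T h - v^T W h
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction

def joint_distribution(a, b, W):
    # Equations 21-22 by brute force over all binary configurations.
    # Only feasible for tiny models: there are 2^(n_v + n_h) terms.
    states = [(v, h)
              for v in itertools.product([0, 1], repeat=len(a))
              for h in itertools.product([0, 1], repeat=len(b))]
    weights = {s: math.exp(-energy(s[0], s[1], a, b, W)) for s in states}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}, Z
```

Configurations with lower energy receive higher probability, so positive weights make the connected visible and hidden units likely to be active together.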
The learning is performed multiple times on the network Qiu2014. The training of RBMs is implemented through minimizing the negative log-likelihood of the model and data. The Contrastive Divergence (CD) algorithm is used as the stochastic approximation algorithm, which replaces the model expectation with an estimate obtained using Gibbs Sampling with a limited number of iterations Hrasko_2015. In the CD algorithm, the KL-Divergence is used to measure the distance between the reconstructed probability distribution and the original probability distribution of the input Van_2009.
Momentum, learning rate, weight-cost (decay rate), batch size (minibatch size), regularization method, the number of epochs, the number of layers, initialization of weights, size of visible units, size of hidden units, type of activation units (sigmoid, softmax, ReLU, Gaussian units, etc.), loss function, and optimization algorithms are the hyperparameters of RBMs. Similar to the other deep networks, the hyperparameters are searched with Manual Search, Grid Search, Random Search, and Bayesian methods (Gaussian process). In addition to these, Annealed Importance Sampling (AIS) is used to estimate the partition function. The CD algorithm is also used for the optimization of RBMs Bergstra_2011; Bergstra_2012; Yao_2016; Carreira_2005.
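A single Contrastive Divergence step with one Gibbs iteration (often written CD-1) can be sketched as follows for a binary RBM. This is a simplified illustration, not the procedure of any cited paper: it processes one training vector at a time, uses hidden probabilities rather than samples in the update statistics, and all names and the learning rate are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, b, W):
    # P(h_j = 1 | v) for each hidden unit j.
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, a, W):
    # P(v_i = 1 | h) for each visible unit i.
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v0, a, b, W, lr, rng):
    # Positive phase: hidden activations driven by the data vector v0.
    ph0 = hidden_probs(v0, b, W)
    h0 = sample(ph0, rng)
    # Negative phase: one Gibbs step gives the reconstruction v1.
    v1 = sample(visible_probs(h0, a, W), rng)
    ph1 = hidden_probs(v1, b, W)
    # Move parameters toward the data statistics and away from the model's.
    new_W = [[W[i][j] + lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
              for j in range(len(b))] for i in range(len(a))]
    new_a = [a[i] + lr * (v0[i] - v1[i]) for i in range(len(a))]
    new_b = [b[j] + lr * (ph0[j] - ph1[j]) for j in range(len(b))]
    return new_a, new_b, new_W
```

Running more Gibbs iterations before computing the negative statistics gives CD-k, which trades extra computation for a less biased gradient estimate.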
3.6 Deep Belief Networks (DBNs)
DBN is a type of deep ANN and consists of a stack of RBM networks (Figure 4). DBN is a probabilistic generative model that consists of latent variables. In DBN, there is no link between units within each layer. DBNs are used to find discriminative, independent features in the input set using unsupervised learning mohamed2009deep. The ability to encode higher-order network structures and fast inference are the advantages of DBNs Tamilselvan_2013. Because DBNs are composed of RBMs, they share the training difficulties described in the RBM section.
When DBN is trained on the training set in an unsupervised manner, it can learn to reconstruct the input set in a probabilistic way. Then the layers of the network begin to detect discriminating features in the input. After this learning step, supervised learning is carried out to perform the classification Hinton2006. Equation 24 illustrates the probability of generating a visible vector ($W$: weight matrix of the connections between hidden unit $h$ and visible unit $v$, $p(h|W)$: the prior distribution over hidden vectors) mohamed2009deep.
$$p(v)=\sum _{h}p(h|W)p(v|h,W)$$  (24) 
The DBN training process can be divided into two steps: stacked RBM learning and backpropagation learning. In stacked RBM learning, the iterative CD algorithm is used Hrasko_2015. In backpropagation learning, optimization algorithms (SGD, RMSProp, Adam) are used to train the network Tamilselvan_2013. The hyperparameters of DBNs are similar to those of RBMs. Momentum, learning rate, weight-cost (decay rate), regularization method, batch size (minibatch size), the number of epochs, the number of layers, initialization of weights, the number of RBM stacks, size of visible units in the RBM layers, size of hidden units in the RBM layers, type of units (sigmoid, softmax, rectified, Gaussian units, etc.), and optimization algorithms are the hyperparameters of DBNs. Similar to the other deep networks, the hyperparameters are searched with Manual Search, Grid Search, Random Search, and Bayesian methods. The CD algorithm is also used for the optimization of DBNs Bergstra_2011; Bergstra_2012; Yao_2016; Carreira_2005.
3.7 Autoencoders (AEs)
AE networks are ANN types that are used as unsupervised learning models. In addition, AE networks are commonly used in DL models, wherein they remap the inputs (features) such that the inputs are more representative for classification. In other words, AE networks perform an unsupervised feature learning process, which fits very well with the DL theme. A representation of a data set is learned by reducing the dimensionality with AEs. AEs are similar to FFNNs in architecture. They consist of an input layer, an output layer, and one or more hidden layers that connect them. The number of nodes in the input layer equals the number of nodes in the output layer, and the network has a symmetrical structure. The most notable advantages of AEs are dimensionality reduction and feature learning. Meanwhile, dimensionality reduction and feature extraction in AEs also have a drawback: focusing on minimizing the loss of data relationships during encoding can still cause some significant data relationships to be lost Meng_2017.
In general, AEs contain two components: an encoder and a decoder. The input $x\in {[0,1]}^{d}$ is converted through the function $f(x)$ (${W}_{1}$ denotes the weight matrix of the encoder, ${b}_{1}$ denotes the bias vector of the encoder, ${\sigma}_{1}$ denotes the element-wise sigmoid activation function of the encoder). The output $h$ is the encoded part of the AE (the code, latent variables, or latent representation). The function $g(h)$, which acts as the inverse of $f(x)$, produces the reconstruction $r$ (${W}_{2}$ denotes the weight matrix of the decoder, ${b}_{2}$ denotes the bias vector of the decoder, ${\sigma}_{2}$ denotes the element-wise sigmoid activation function of the decoder). Equations 25 and 26 illustrate the simple AE process Vincent_2008. Equation 27 shows the loss function of the AE, the MSE. In the literature, AEs have been used for feature extraction and dimensionality reduction Goodfellowetal2016; Vincent_2008.
$$h=f(x)={\sigma}_{1}({W}_{1}x+{b}_{1})$$  (25) 
$$r=g(h)={\sigma}_{2}({W}_{2}h+{b}_{2})$$  (26) 
$$L(x,r)={\|x-r\|}^{2}$$  (27) 
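Equations 25-27 amount to two sigmoid layers followed by a squared-error comparison. The following sketch shows one forward pass with list-based vectors and matrices. The function names are assumptions, and no training step is included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, x):
    # Element-wise sigmoid of W x + b.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def autoencode(x, W1, b1, W2, b2):
    h = layer(W1, b1, x)   # Equation 25: code h = f(x)
    r = layer(W2, b2, h)   # Equation 26: reconstruction r = g(h)
    # Equation 27: squared reconstruction error.
    loss = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return h, r, loss
```

When the code `h` has fewer entries than `x`, a low loss means the network has found a compressed representation from which the input can still be recovered.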
AEs are a specialized version of FFNNs. Backpropagation learning is used to update the weights in the network Goodfellowetal2016. Optimization algorithms (SGD, RMSProp, Adam) are used for the learning process of AEs. MSE is used as a loss function in AEs. In addition, recirculation algorithms may also be used for the training of AEs Goodfellowetal2016. The hyperparameters of AEs are similar to those of other DL models. Learning rate, weight-cost (decay rate), dropout fraction, batch size (minibatch size), the number of epochs, the number of layers, the number of nodes in each encoder layer, type of activation functions, the number of nodes in each decoder layer, network weight initialization, optimization algorithms, and the number of nodes in the code layer (size of the latent representation) are the hyperparameters of AEs. Similar to the other deep networks, the hyperparameters are searched with Manual Search, Grid Search, Random Search, and Bayesian methods Bergstra_2011; Bergstra_2012.
3.8 Deep Reinforcement Learning (DRL)
Reinforcement Learning (RL) is a type of learning method that differs from supervised and unsupervised learning models. It does not need a preliminary data set that has been labeled or clustered beforehand. RL is an ML approach inspired by learning action/behavior, which deals with what actions should be taken by subjects to achieve the highest reward in an environment. It is used in different application areas: game theory, control theory, multi-agent systems, operations research, robotics, information theory, managing investment portfolios, simulation-based optimization, playing Atari games, and statistics sutton1998introduction. Some of the advantages of using RL for control problems are that an agent can be easily retrained to adapt to changes in the environment and that the system is continually improved while training is constantly performed. An RL agent learns by interacting with its surroundings and observing the results of these interactions. This learning method mimics the basic way in which people learn.
RL is mainly based on the Markov Decision Process (MDP). MDP is used to formalize the RL environment. An MDP is a five-tuple: state (finite set of states), action (finite set of actions), reward function (scalar feedback signal), state transition probability matrix ($p({s}^{\prime},r|s,a)$, ${s}^{\prime}$ denotes next state, $r$ denotes reward function, $s$ denotes state, $a$ denotes action), and discount factor ($\gamma $, present value of future rewards). The aim of the agent is to maximize the cumulative reward. The return (${G}_{t}$) is the total discounted reward. Equation 28 illustrates the total return (${G}_{t}$ denotes total discounted reward, $R$ denotes rewards, $t$ denotes time, $k$ denotes the summation index over future time steps).
$${G}_{t}={R}_{t+1}+\gamma {R}_{t+2}+{\gamma}^{2}{R}_{t+3}+\mathrm{\dots}=\sum _{k=0}^{\mathrm{\infty}}{\gamma}^{k}{R}_{t+k+1}$$  (28) 
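For a finite list of rewards, the return in Equation 28 can be accumulated from the last reward backwards. The function name and the list-based interface are assumptions made for this short sketch.

```python
def discounted_return(rewards, gamma):
    # rewards[k] plays the role of R_{t+k+1} in Equation 28.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three rewards of 1 with $\gamma = 0.5$ give $1 + 0.5 + 0.25 = 1.75$. A smaller $\gamma$ makes the agent weigh immediate rewards more heavily.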
The value function is the prediction of future rewards. It indicates how good a state or action is. Equation 29 illustrates the formulation of the value function ($v(s)$ denotes the value function, $E[.]$ denotes the expectation function, ${G}_{t}$ denotes the total discounted reward, $s$ denotes the given state, $R$ denotes the rewards, $S$ denotes the set of states, $t$ denotes time).
$$v(s)=E[{G}_{t}|{S}_{t}=s]=E[{R}_{t+1}+\gamma v({S}_{t+1})|{S}_{t}=s]$$  (29) 
Policy ($\pi $) is the agent’s behavior strategy. It is like a map from state to action. There are two types of value functions to express the actions in the policy: the state-value function (${v}_{\pi}(s)$) and the action-value function (${q}_{\pi}(s,a)$). The state-value function (Equation 30) is the expected return when starting from $s$ and then following policy $\pi $ (${E}_{\pi}[.]$ denotes the expectation function). The action-value function (Equation 31) is the expected return when starting from $s$, taking action $a$, and then following policy $\pi $ ($A$ denotes the set of actions, $a$ denotes the given action).
$${v}_{\pi}(s)={E}_{\pi}[{G}_{t}|{S}_{t}=s]={E}_{\pi}[\sum _{k=0}^{\mathrm{\infty}}{\gamma}^{k}{R}_{t+k+1}|{S}_{t}=s]$$  (30) 
$${q}_{\pi}(s,a)={E}_{\pi}[{G}_{t}|{S}_{t}=s,{A}_{t}=a]$$  (31) 
The optimal state-value function (Equation 32) is the maximum state-value function over all policies. The optimal action-value function (Equation 33) is the maximum action-value function over all policies.
$${v}_{*}(s)=\max_{\pi}{v}_{\pi}(s)$$  (32) 
$${q}_{*}(s,a)=\max_{\pi}{q}_{\pi}(s,a)$$  (33) 
The RL solutions and methods in the literature are too broad to review in this paper. So, we summarize the important issues of RL and the important RL solutions and methods. RL methods are mainly divided into two groups: model-based methods and model-free methods. A model-based method uses a model that is known to the agent beforehand, together with value/policy and experience. The experience can be real (sampled from the environment) or simulated (sampled from the model). Model-based methods are mostly used in robotics and control algorithms Nguyen_2011. Model-free methods are mainly divided into two groups: value-based and policy-based methods. In value-based methods, a policy is produced directly from the value function (e.g. epsilon-greedy). In policy-based methods, the policy is parametrized directly. In value-based methods, there are three main solutions for MDP problems: Dynamic Programming (DP), Monte Carlo (MC), and Temporal Difference (TD).
In the DP method, problems are solved using optimal substructure and overlapping subproblems. The full model is known and it is used for planning in the MDP. There are two iterations (learning algorithms) in DP: policy iteration and value iteration. The MC method learns directly from experience by running an episode of a game/simulation. MC is a type of model-free method that does not need MDP transitions/rewards. It collects states and returns, and it uses the mean of the returns as the value function. TD is also a model-free method that learns directly from experience by running the episode. In addition, TD learns from incomplete episodes by bootstrapping, as the DP method does. The TD method thus combines the MC and DP methods. SARSA (state, action, reward, state, action; ${S}_{t}$, ${A}_{t}$, ${R}_{t}$, ${S}_{t+1}$, ${A}_{t+1}$) is a type of TD control algorithm. The Q-value (action-value function) is updated with the agent’s actions. It is an on-policy learning model that learns from actions taken according to the current policy $\pi $. Equation 34 illustrates the update of the action-value function in the SARSA algorithm (${S}_{t}$ denotes current state, ${A}_{t}$ denotes current action, $t$ denotes time, $R$ denotes reward, $\alpha $ denotes learning rate, $\gamma $ denotes discount factor). Q-learning is another TD control algorithm. It is an off-policy learning model: its update uses the maximizing action in the next state rather than the action chosen by the current policy $\pi $. Equation 35 illustrates the update of the action-value function in the Q-learning algorithm (${a}^{\prime}$ denotes a candidate action in the next state; the complete algorithms can be found in sutton1998introduction).
$$Q({S}_{t},{A}_{t})=Q({S}_{t},{A}_{t})+\alpha [{R}_{t+1}+\gamma Q({S}_{t+1},{A}_{t+1})-Q({S}_{t},{A}_{t})]$$  (34) 
$$Q({S}_{t},{A}_{t})=Q({S}_{t},{A}_{t})+\alpha [{R}_{t+1}+\gamma \max_{{a}^{\prime}}Q({S}_{t+1},{a}^{\prime})-Q({S}_{t},{A}_{t})]$$  (35) 
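The two updates differ only in the bootstrap target, which a tabular sketch makes explicit. The dictionary-based Q-table keyed by `(state, action)` pairs and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # Equation 34: bootstrap from the action actually chosen next (on-policy).
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Equation 35: bootstrap from the greedy action in the next state (off-policy).
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

A table created with `defaultdict(float)` starts every entry at zero. If the next action picked by the behavior policy is not the greedy one, SARSA and Q-learning produce different values from the same transition.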
In the value-based methods, a policy can be generated directly from the value function (e.g. using epsilon-greedy). The policy-based method uses the policy directly instead of using the value function. It has advantages and disadvantages over the value-based methods. The policy-based methods are more effective in high-dimensional or continuous action spaces, and have better convergence properties than the value-based methods. They can also learn stochastic policies. On the other hand, evaluating a policy is typically inefficient and has high variance, and policy-based methods typically converge to a local rather than the global optimum. In the policy-based methods, there are also different solutions: Policy Gradient, REINFORCE (Monte-Carlo Policy Gradient), and Actor-Critic (details of policy-based methods can be found in sutton1998introduction).
DRL methods contain NNs. Therefore, DRL hyperparameters are similar to DL hyperparameters. Learning rate, weight-cost (decay rate), dropout fraction, regularization method, batch size (minibatch size), the number of epochs, the number of layers, the number of nodes in each layer, type of activation functions, network weight initialization, optimization algorithms, discount factor, and the number of episodes are the hyperparameters of DRL. Similar to the other deep networks, the hyperparameters are searched with Manual Search, Grid Search, Random Search, and Bayesian methods Bergstra_2011; Bergstra_2012.
4 Financial Time Series Forecasting
The most widely studied financial application area is forecasting of a given financial time series, in particular asset price forecasting. Even though some variations exist, the main focus is on predicting the next movement of the underlying asset. More than half of the existing implementations of DL were focused on this area. Even though there are several subtopics of this general problem, including stock price forecasting, index prediction, forex price prediction, commodity (oil, gold, etc.) price prediction, bond price forecasting, volatility forecasting, and cryptocurrency price forecasting, the underlying dynamics are the same in all of these applications.
The studies can also be clustered into two main groups based on their expected outputs: price prediction and price movement (trend) prediction. Even though price forecasting is basically a regression problem, in most of the financial time series forecasting applications, correct prediction of the price is not perceived to be as important as correctly identifying the directional movement. As a result, researchers consider trend prediction, i.e. forecasting which way the price will change, a more crucial study area compared with exact price prediction. In that sense, trend prediction becomes a classification problem. In some studies, only up or down movements are taken into consideration (2-class problem), whereas up, down or neutral movements (3-class problem) also exist.
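The 2-class and 3-class formulations can be illustrated by deriving labels from a price series. This is only a sketch: the relative threshold `neutral_band` that defines a neutral move is a hypothetical parameter, and the surveyed studies use their own labeling rules.

```python
def trend_labels(prices, neutral_band=None):
    # Label each step by the direction of the next price change.
    # neutral_band is a hypothetical relative threshold (e.g. 0.002 = 0.2%);
    # when it is None the problem is 2-class (1 = up, 0 = down or unchanged).
    labels = []
    for current, nxt in zip(prices, prices[1:]):
        change = (nxt - current) / current
        if neutral_band is None:
            labels.append(1 if change > 0 else 0)
        elif change > neutral_band:
            labels.append(1)
        elif change < -neutral_band:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

For the prices 100, 101, 100.5, 100.5, the 2-class labels are `[1, 0, 0]`, while a 0.2% neutral band gives the 3-class labels `[1, -1, 0]`.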
LSTM and its variations, along with some hybrid models, dominate the financial time series forecasting domain. LSTM by its nature utilizes the temporal characteristics of any time series signal; hence, forecasting financial time series is a well-studied and successful implementation of LSTM. However, some researchers prefer to either extract appropriate features from the time series or transform the time series in such a way that the resulting financial data becomes stationary from a temporal perspective, meaning that even if we shuffle the data order, we will still be able to properly train the model and achieve successful out-of-sample test performance. For those implementations, CNN and DFNN were the most commonly chosen DL models.
Various financial time series forecasting implementations using DL models exist in the literature. We will cover each of these aforementioned implementation areas in the following subsections. In this survey paper, we examined the papers using the following criteria:

1. First, we grouped the articles according to their subjects.
2. Then, we grouped the related papers according to their feature set.
3. Finally, we grouped each subgroup according to DL models/methods.

For each implementation area, the related papers will be subgrouped and tabulated. Each table will have the following fields to provide information about the implementation details of the papers within the group: Article (Art.) and Data Set are self-explanatory, and Period refers to the time period for training and testing. Feature Set lists the input features used in the study. Lag is the time length of the input vector (e.g. 30d means the input vector has a 30 day window) and Horizon shows how far into the future the model predicts. Some abbreviations are used for these two fields: min is minutes, h is hours, d is days, w is weeks, m is months, y is years, s is steps, * is mixed. Method shows the DL models that are used in the study. Performance Criteria provides the evaluation metrics, and finally Environment (Env.) lists the development framework/software/tools. Some column values might be empty, indicating there was no relevant information in the paper for the corresponding field.
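The Lag and Horizon fields correspond to how a series is cut into supervised samples. The sketch below assumes one common convention, in which the target lies `horizon` steps after the last observation of the input window. Individual papers may define the horizon differently, and the function name is hypothetical.

```python
def make_windows(series, lag, horizon):
    # Each sample uses `lag` consecutive observations as input and the value
    # `horizon` steps after the end of the window as the target.
    inputs, targets = [], []
    for start in range(len(series) - lag - horizon + 1):
        inputs.append(series[start:start + lag])
        targets.append(series[start + lag + horizon - 1])
    return inputs, targets
```

With daily data, `make_windows(prices, 30, 1)` would correspond to a table entry with Lag 30d and Horizon 1d.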
4.1 Stock Price Forecasting
Price prediction of any given stock is the most studied financial application of all. We observed the same trend within the DL implementations. Depending on the prediction time horizon, different input parameters are chosen, varying from High Frequency Trading (HFT) and intraday price movements to daily, weekly or even monthly stock close prices. Also, technical and fundamental analysis, social media feeds, sentiment, etc. are among the different parameters that are used for the prediction models.
Table 1: Stock price forecasting using only raw time series data
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env.

Chong_2017  38 stocks in \acrshortkospi  2010-2014  Lagged stock returns  50min  5min  \acrshortdnn  \acrshortnmse, \acrshortrmse, \acrshortmae, \acrshortmi   
Chen_2015  China stock market, 3049 Stocks  1990-2015  \acrshortochlv  30d  3d  \acrshortlstm  Accuracy  Theano, Keras
Dezsi_2016  Daily returns of ‘BRD’ stock in Romanian Market  2001-2016  \acrshortochlv    1d  \acrshortlstm  \acrshortrmse, \acrshortmae  Python, Theano
Samarawickrama_2017  297 listed companies of \acrshortcse  2012-2013  \acrshortochlv  2d  1d  \acrshortlstm, \acrshortsrnn, \acrshortgru  \acrshortmad, \acrshortmape  Keras
M_2018  5 stocks in \acrshortnse  1997-2016  \acrshortochlv, Price data, turnover and number of trades  200d  1..10d  \acrshortlstm, \acrshortrnn, \acrshortcnn, \acrshortmlp  \acrshortmape   
Selvin_2017  Stocks of Infosys, TCS and CIPLA from \acrshortnse  2014  Price data      \acrshortrnn, \acrshortlstm and \acrshortcnn  Accuracy   
Lee_2018  10 stocks in \acrshortsp500  1997-2016  \acrshortochlv, Price data  36m  1m  \acrshortrnn, \acrshortlstm, \acrshortgru  Accuracy, Monthly return  Keras, TensorFlow
Li_2017  Stocks data from \acrshortsp500  2011-2016  \acrshortochlv  1d  1d  \acrshortdbn  \acrshortmse, \acrshortnormrmse, \acrshortmae   
Chen_2018  High-frequency transaction data of the \acrshortcsi300 futures  2017  Price data    1min  \acrshortdnn, \acrshortelm, \acrshortrbf  \acrshortrmse, \acrshortmape, Accuracy  Matlab
Krauss_2017  Stocks in the \acrshortsp500  1990-2015  Price data  240d  1d  \acrshortdnn, \acrshortgbt, \acrshortrf  Mean return, \acrshortmdd, Calmar ratio  H2O
Chandra_2016  ACI Worldwide, Staples, and Seagate in \acrshortnasdaq  2006-2010  Daily closing prices  17d  1d  \acrshortrnn, \acrshortann  \acrshortrmse   
Liu_2017  Chinese Stocks  2007-2017  \acrshortochlv  30d  1..5d  \acrshortcnn + \acrshortlstm  Annualized Return, Maximum Retracement  Python
Heaton_2016  20 stocks in \acrshortsp500  2010-2015  Price data      \acrshortae + \acrshortlstm  Weekly Returns   
Batres_2015  \acrshortsp500  1985-2006  Monthly and daily log-returns  *  1d  \acrshortdbn+\acrshortmlp  Validation, Test Error  Theano, Python, Matlab
Yuan_2018  12 stocks from \acrshortsse Composite Index  2000-2017  \acrshortochlv  60d  1..7d  \acrshortdwnn  \acrshortmse  TensorFlow
Zhang_2017  50 stocks from \acrshortnyse  2007-2016  Price data    1d, 3d, 5d  \acrshortsfm  \acrshortmse   
In this survey, we first grouped the stock price forecasting articles according to their feature set: studies using only the raw time series data (price data, \glsochlv) for price prediction, studies using various other data, and papers that used text mining techniques. In the first group, the corresponding \glsdl models were applied directly to the raw time series for price prediction. Table 1 tabulates the stock price forecasting papers that used only raw time series data. In Table 1, the methods/models are also listed in four subgroups: \glsdnn (networks that are deep but without any given topology details) and \glslstm models; multi models; hybrid models; novel methods.
\glsdnn and \glslstm models were used on their own in three papers. In Chong_2017, \glsdnn and lagged stock returns were used to predict the stock prices in \glskospi. Chen et al. Chen_2015 and Dezsi and Nistor Dezsi_2016 applied the raw price data as the input to \glslstm models.
Meanwhile, some studies implemented multiple \glsdl models for performance comparison using only the raw price (\glsochlv) data for forecasting. Among the noteworthy studies, the authors of Samarawickrama_2017 compared \glsrnn, \glssrnn, \glslstm and \glsgru. Hiransha et al. M_2018 compared \glslstm, \glsrnn, \glscnn and \glsmlp, whereas in Selvin_2017 \glsrnn, \glslstm, \glscnn and \glsarima were preferred. Lee and Yoo Lee_2018 compared 3 \glsrnn models (\glssrnn, \glslstm, \glsgru) for stock price prediction and then constructed a threshold-based portfolio by selecting stocks according to the predictions. Li et al. Li_2017 implemented \glsdbn. The authors of Chen_2018 compared 4 different \glsml models, namely 1 \glsdl model (\glsae and \glsrbm), \glsmlp, \glsrbf and \glselm, for predicting the next price in 1-minute price data. They also compared the results with different sized datasets. The authors of Krauss_2017 used price data and \glsdnn, \glsgbt and \glsrf methods for the prediction of the stocks in the \glssp500. In Chandra and Chan Chandra_2016, cooperative neuroevolution, \glsrnn (Elman network) and \glsdfnn were used for the prediction of stock prices in \glsnasdaq (ACI Worldwide, Staples, and Seagate).
Meanwhile, hybrid models were used in some of the papers. The author of Liu_2017 applied \glscnn+\glslstm in their study. Heaton et al. Heaton_2016 implemented smart indexing with \glsae. The authors of Batres_2015 combined \glsdbn and \glsmlp to construct a stock portfolio by predicting each stock’s monthly log-return and choosing only the stocks that were expected to outperform the median stock.
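The median-based selection rule described for Batres_2015 can be illustrated with a short sketch. This is only a schematic reading of the rule as summarized above, assuming the model has already produced one predicted monthly log-return per stock; the tickers, the numbers and the function name `select_above_median` are hypothetical.

```python
from statistics import median
from typing import Dict, List


def select_above_median(predicted_returns: Dict[str, float]) -> List[str]:
    """Return the tickers whose predicted return is strictly above the cross-sectional median."""
    cutoff = median(predicted_returns.values())
    return sorted(t for t, r in predicted_returns.items() if r > cutoff)


# Hypothetical predicted monthly log-returns (illustrative numbers only).
predictions = {"AAA": 0.021, "BBB": -0.004, "CCC": 0.013, "DDD": 0.002, "EEE": 0.030}

portfolio = select_above_median(predictions)
print(portfolio)
```

Using a strict inequality means the median stock itself is excluded; whether ties are included is a design choice that the summary above does not specify.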
In addition, novel approaches were adopted in some of the studies. The author of Yuan_2018 proposed a novel \glsdwnn, which is a combination of \glsrnn and \glscnn. The author of Zhang_2017 implemented an \glssfm recurrent network in their study.
In another group of studies, some researchers again focused on \glslstm based models. However, their input parameters came from various sources including the raw price data, technical and/or fundamental analysis, macroeconomic data, financial statements, news, investor sentiment, etc. Table 2 tabulates the stock price forecasting papers that used various data such as the raw price data, technical and/or fundamental analysis, macroeconomic data in the literature. In Table 2, different methods/models are also listed based on five subgroups: \glsdnn model; \glslstm and \glsrnn models; multiple and hybrid models; \glscnn model; novel methods.
\glsdnn models were used in some of the stock price forecasting papers within this group. In Abe_2018, a \glsdnn model and 25 fundamental features were used for the prediction of the Japan Index constituents. Feng et al. Feng_2018 also used fundamental features and a \glsdnn model for the prediction. A \glsdnn model and macroeconomic data such as GDP, unemployment rate, inventories, etc. were used by the authors of Fan_2014 for the prediction of the U.S. low-level disaggregated macroeconomic time series.
\glslstm and \glsrnn models were chosen in some of the studies. Kraus and Feuerriegel Kraus_2017 implemented \glslstm with transfer learning using text mining through financial news and the stock market data. Similarly, the author of Minami_2018 used \glslstm to predict the stock’s next day price using corporate action events and macroeconomic index. Zhang and Tan Zhang_2018_a implemented DeepStockRanker, an \glslstm based model for stock ranking using 11 technical indicators. In another study Zhuge_2017, the authors used the price time series and emotional data from text posts for predicting the stock opening price of the next day with an \glslstm network. Akita et al. Akita_2016 used textual information and stock prices through Paragraph Vector + \glslstm for forecasting the prices, and comparisons with different classifiers were provided. Ozbayoglu Ozbayoglu_2007 used technical indicators along with the stock data on a Jordan-Elman network for price prediction.
There were also multiple and hybrid models that used mostly technical analysis features as their inputs to the \glsdl model. Several technical indicators were fed into \glslstm and \glsmlp networks in Khare_2017 for intraday price prediction. Recently, Zhou et al. Zhou_2018 used a \glsganfd model for stock price prediction and compared their model performances against \glsarima, \glsann and \glssvm. The authors of Singh_2016 used several technical indicator features and time series data with \glspca for dimensionality reduction cascaded with \glsdnn (2-layer \glsffnn) for stock price prediction. In Karaoglu_2017, the authors used market microstructure based trade indicators as inputs into \glsrnn with Graves \glslstm, detecting the buy-sell pressure of movements in \glsbist in order to perform the price prediction for intelligent stock trading. In Zhou_2018_a, next month’s return was predicted and portfolios of the expected top performers were constructed. Good monthly returns were achieved with \glslstm and \glslstm-\glsmlp models.
Meanwhile, \glscnn models were preferred in some of the papers. The authors of Abroyan_2017 used 250 features (order details, etc.) from a private brokerage company’s real data of risky transactions for the prediction. They used \glscnn and \glslstm for stock price forecasting. The authors of GooglePatent used a \glscnn model with fundamental, technical and market data for the prediction.
Novel methods were also developed in some of the studies. In Tran_2017, the FI-2010 dataset (bid/ask and volume) was used as the feature set for the forecast, and the authors proposed \glswmtr and \glsmda. The authors of Feng_2018_a used 57 characteristic features such as Market equity, Market Beta, Industry momentum, Asset growth, etc. as inputs to a Fama-French n-factor model \glsdl for predicting monthly US equity returns in \glsnyse, \glsamex, or \glsnasdaq.
Table 2: Stock price forecasting using various data
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env.

Abe_2018  Japan Index constituents from WorldScope  1990-2016  25 Fundamental Features  10d  1d  \acrshortdnn  Correlation, Accuracy, \acrshortmse  TensorFlow
Feng_2018  Return of \acrshortsp500  1926-2016  Fundamental Features    1s  \acrshortdnn  \acrshortmspe  TensorFlow
Fan_2014  U.S. low-level disaggregated macroeconomic time series  1959-2008  GDP, Unemployment rate, Inventories, etc.      \acrshortdnn  \acrshortrsq   
Kraus_2017  \acrshortcdax stock market data  2010-2013  Financial news, stock market data  20d  1d  \acrshortlstm  \acrshortmse, \acrshortrmse, \acrshortmae, Accuracy, \acrshortauc  TensorFlow, Theano, Python, Scikit-Learn
Minami_2018  Stock of Tsugami Corporation  2013  Price data      \acrshortlstm  \acrshortrmse  Keras, TensorFlow
Zhang_2018_a  Stocks in China’s A-share  2006-2007  11 technical indicators    1d  \acrshortlstm  \acrshortareturn, \acrshortir, \acrshortic   
Zhuge_2017  SCI prices  2008-2015  \acrshortochl of change rate, price  7d    EmotionalAnalysis + \acrshortlstm  \acrshortmse   
Akita_2016  10 stocks in Nikkei 225 and news  2001-2008  Textual information and Stock prices  10d    Paragraph Vector + \acrshortlstm  Profit   
Ozbayoglu_2007  TKC stock in \acrshortnyse and QQQQ ETF  1999-2006  Technical indicators, Price  50d  1d  \acrshortrnn (Jordan-Elman)  Profit, \acrshortmse  Java
Khare_2017  10 Stocks in \acrshortnyse    Price data, Technical indicators  20min  1min  \acrshortlstm, \acrshortmlp  \acrshortrmse   
Zhou_2018  42 stocks in China’s \acrshortsse  2016  \acrshortochlv, Technical Indicators  242min  1min  \acrshortgan (\acrshortlstm, \acrshortcnn)  \acrshortrmsre, \acrshortdpa, \acrshortganF, \acrshortganD   
Singh_2016  Google’s daily stock data  2004-2015  \acrshortochlv, Technical indicators  20d  1d  ${\left(2D\right)}^{2}$ \acrshortpca + \acrshortdnn  \acrshortsmape, \acrshortpcd, \acrshortmape, \acrshortrmse, \acrshorthr, \acrshorttr, \acrshortrsq  R, Matlab
Karaoglu_2017  GarantiBank in \acrshortbist, Turkey  2016  \acrshortochlv, Volatility, etc.      \acrshortplr, Graves \acrshortlstm  \acrshortmse, \acrshortrmse, \acrshortmae, \acrshortrse, \acrshortrsq  Spark
Zhou_2018_a  Stocks in \acrshortnyse, \acrshortamex, \acrshortnasdaq, \acrshorttaq intraday trade  1993-2017  Price, 15 firm characteristics  80d  1d  \acrshortlstm+\acrshortmlp  Monthly return, \acrshortsr  Python, Keras, TensorFlow in AWS
Abroyan_2017  Private brokerage company’s real data of risky transactions    250 features: order details, etc.      \acrshortcnn, \acrshortlstm  F1-Score  Keras, TensorFlow
GooglePatent  Fundamental and Technical Data, Economic Data    Fundamental, technical and market information      \acrshortcnn     
Tran_2017  The LOB of 5 stocks of Finnish Stock Market  2010  FI-2010 dataset: bid/ask and volume    *  \acrshortwmtr, \acrshortmda  Accuracy, Precision, Recall, F1-Score   
Feng_2018_a  Returns in \acrshortnyse, \acrshortamex, \acrshortnasdaq  1975-2017  57 firm characteristics  *    Fama-French n-factor model \acrshortdl  \acrshortrsq, \acrshortrmse  TensorFlow
A number of research papers also used text mining techniques for feature extraction, but used non-\glslstm models for the stock price prediction. Table 3 tabulates the stock price forecasting papers that used text mining techniques. In Table 3, different methods/models are clustered into three subgroups: \glscnn and \glslstm models; \glsgru, \glslstm, and \glsrnn models; novel methods.
\glscnn and \glslstm models were adopted in some of the papers. In Ding_2015, events were detected from Reuters and Bloomberg news through text mining, and that information was used for price prediction and stock trading through the \glscnn model. Vargas et al. Vargas_2017 used text mining on \glssp500 index news from Reuters through a \glslstm+\glscnn hybrid model for price prediction and intraday directional movement estimation together. The authors of Lee_2017_b used financial news data and implemented word embedding with Word2vec along with MA and stochastic oscillator to create inputs for \glsrcnn for stock price prediction. The authors of Iwasaki_2018 also used sentiment analysis through text mining and word embeddings from analyst reports and used sentiment features as inputs to a \glsdfnn model for stock price prediction. Then different portfolio selections were implemented based on the projected stock returns.
\glsgru, \glslstm, and \glsrnn models were preferred in the next group of papers. Das et al. Das_2018 implemented sentiment analysis on Twitter posts along with the stock data for price forecasting using \glsrnn. Similarly, the authors of Jiahong_Li_2017 used sentiment classification (neutral, positive, negative) for the stock open or close price prediction with various \glslstm models. They compared their results with \glssvm and achieved higher overall performance. In Zhongshengz_2018, text and price data were used for the prediction of the \glssci prices.
Novel approaches were also found in some of the papers. The authors of Nascimento_2015 used word embeddings for extracting information from web pages and then combined it with the stock price data for stock price prediction. They compared an \glsar model and \glsrf with and without news. The results showed that embedding news information improved the performance. In Han_2018, financial news and the ACE2005 Chinese corpus were used. Different event types of Chinese companies were classified based on a novel event-type pattern classification algorithm, and the next day stock price change was also predicted using additional inputs.
Table 3: Stock price forecasting using text mining techniques for feature extraction
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env.

Ding_2015  \acrshortsp500 Index, 15 stocks in \acrshortsp500  2006-2013  News from Reuters and Bloomberg      \acrshortcnn  Accuracy, \acrshortmcc   
Vargas_2017  \acrshortsp500 index news from Reuters  2006-2013  Financial news titles, Technical indicators  1d  1d  \acrshortrcnn  Accuracy   
Lee_2017_b  \acrshorttwse index, 4 stocks in \acrshorttwse  2001-2017  Technical indicators, Price data, News  15d    \acrshortcnn + \acrshortlstm  \acrshortrmse, Profit  Keras, Python, TA-Lib
Iwasaki_2018  Analyst reports on the TSE and Osaka Exchange  2016-2018  Text      \acrshortlstm, \acrshortcnn, \acrshortbilstm  Accuracy, R-squared  R, Python, MeCab
Das_2018  Stocks of Google, Microsoft and Apple  2016-2017  Twitter sentiment and stock prices      \acrshortrnn    Spark, Flume, Twitter API
Jiahong_Li_2017  Stocks of \acrshortcsi300 index, \acrshortochlv of \acrshortcsi300 index  2009-2014  Sentiment Posts, Price data  1d  1d  Naive Bayes + \acrshortlstm  Precision, Recall, F1-score, Accuracy  Python, Keras
Zhongshengz_2018  SCI prices  2013-2016  Text data and Price data  7d  1d  \acrshortlstm  Accuracy, F1-Measure  Python, Keras
Nascimento_2015  Stocks from \acrshortsp500  2006-2013  Text (news) and Price data  7d  1d  \acrshortlar+News, \acrshortrf+News  \acrshortmape, \acrshortrmse   
Han_2018  News from Sina.com, ACE2005 Chinese corpus  2012-2016  A set of news text      Their unique algorithm  Precision, Recall, F1-score   
4.2 Index Forecasting
Instead of trying to forecast the price of a single stock, several researchers preferred to predict the stock market index. Indices generally are less volatile than individual stocks, since they are composed of multiple stocks from different sectors and are more indicative of the overall momentum and general state of the economy.
In the literature, different stock market index data were used for the experiments. Most commonly used index data can be listed as follows: \glssp500, \glscsi300, \glsnifty, \glsnikkei225, \glsdjia, \glssse180, \glshsi, \glsszse, \glsftse100, \glstaiex, \glsbist, \glsnasdaq, \glsdow30, \glskospi, \glsvix, \glsvxn, \glsbovespa, \glsomx, \glsnyse. The authors of Bao_2017; Parida_2016; Fischer_2018; Widegren_2017; borovykh_2018; Althelaya_2018; Dingli_2017; Rout_2017; Jeong_2019; Baek_2018; Hansson_2017; Elliot_2017; Ding_2015 used \glssp500 as their dataset. The authors of Bao_2017; Parida_2016; Li_2017a; Namini_2018; Hsieh_2011 used \glsnikkei as their dataset. \glskospi was used in Li_2017a; Jeong_2019; Baek_2018. \glsdjia was used as the dataset in Bao_2017; Namini_2018; Hsieh_2011; Zhang_2015; Bekiros_2013. Besides, the authors of Bao_2017; Li_2017a; Hsieh_2011; Jeong_2019 used \glshsi as the dataset in their studies. \glsszse is used in studies of Pang_2018; Li_2017a; Deng_2017; Yang_2017.
In addition, in the literature, there were different methods for the prediction of the index data. While some of the studies used only the raw time series data, some others used various other data such as technical indicators, index data, social media feeds, news from Reuters, Bloomberg, the statistical features of data (standard deviation, skewness, kurtosis, omega ratio, fund alpha). In this survey, first, we grouped the index forecasting articles according to their feature set such as studies using only the raw time series data (price/index data, \glsochlv); then in the second group we clustered the studies using various other data. Table 4 tabulates the index forecasting papers using only the raw time series data. Moreover, different methods (models) were used for index forecasting. \glsmlp, \glsrnn, \glslstm, \glsdnn (most probably \glsdfnn, or \glsdmlp) methods were the most used methods for index forecasting. In Table 4, these various methods/models are also listed as four subgroups: \glsann, \glsdnn, \glsmlp, and \glsfddr models; \glsrl and \glsdl models; \glslstm and \glsrnn models; novel methods.
Table 4: Index forecasting using only raw time series data
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env.

Parida_2016  \acrshortsp500, Nikkei225, USD Exchanges  2011-2015  Index data    1d, 5d, 7d, 10d  \acrshortlrnfis with Firefly-Harmony Search  \acrshortrmse, \acrshortmape, \acrshortmae   
Fischer_2018  \acrshortsp500 Index  1989-2005  Index data, Volume  240d  1d  \acrshortlstm  Return, \acrshortstd, \acrshortsr, Accuracy  Python, TensorFlow, Keras, R, H2O
borovykh_2018  \acrshortsp500, \acrshortvix  2005-2016  Index data  *  1d  uWN, cWN  \acrshortmase, \acrshorthit, \acrshortrmse   
Althelaya_2018  \acrshortsp500 Index  2010-2017  Index data  10d  1d, 30d  Stacked \acrshortlstm, \acrshortbilstm  \acrshortmae, \acrshortrmse, R-squared  Python, Keras, TensorFlow
Jeong_2019  \acrshortsp500, \acrshortkospi, \acrshorthsi, and EuroStoxx50  1987-2017  200-days stock price  200d  1d  Deep Q-Learning and \acrshortdnn  Total profit, Correlation   
Baek_2018  \acrshortsp500, \acrshortkospi200, 10 stocks  2000-2017  Index data  20d  1d  ModAugNet: \acrshortlstm  \acrshortmse, \acrshortmape, \acrshortmae  Keras
Hansson_2017  \acrshortsp500, Bovespa50, \acrshortomx30  2009-2017  Autoregressive part of the time series    1d  \acrshortlstm  \acrshortmse, Accuracy  TensorFlow, Keras, R
Elliot_2017  \acrshortsp500  2000-2017  Index data    1..4d, 1w, 1..3m  \acrshortglm, \acrshortlstm+\acrshortrnn  \acrshortmae, \acrshortrmse  Python
Namini_2018  Nikkei225, \acrshortixic, \acrshorthsi, \acrshortgspc, \acrshortdjia  1985-2018  \acrshortochlv  5d  1d  \acrshortlstm  \acrshortrmse  Python, Keras, Theano
Zhang_2015  \acrshortdjia    Index data      Genetic Deep Neural Network  \acrshortmse  Java
Bekiros_2013  Log returns of the \acrshortdjia  1971-2002  Index data  20d  1d  \acrshortrnn  \acrshorttr, sign rate, PT/HM test, \acrshortmsfe, \acrshortsr, profit   
Pang_2018  Shanghai A-shares composite index, \acrshortszse  2006-2016  \acrshortochlv  10d    Embedded layer + \acrshortlstm  Accuracy, \acrshortmse  Python, Matlab, Theano
Deng_2017  300 stocks from \acrshortszse, Commodity  2014-2015  Index data      \acrshortfddr, \acrshortdnn + \acrshortrl  Profit, return, \acrshortsr, profit-loss curves  Keras
Yang_2017  Shanghai composite index and \acrshortszse  1990-2016  \acrshortochlv  20d  1d  Ensembles of \acrshortann  Accuracy   
Lachiheb_2018  \acrshorttunindex  2013-2017  Log returns of index data    5min  \acrshortdnn with hierarchical input  Accuracy, \acrshortmse  Java
Yong_2017  Singapore Stock Market Index  2010-2017  \acrshortochl of last 10 days of index  10d  3d  Feedforward \acrshortdnn  \acrshortrmse, \acrshortmape, Profit, \acrshortsr   
Yumlu_2005  \acrshortbist  1990-2002  Index data  7d  1d  \acrshortmlp, \acrshortrnn, \acrshortmoe  \acrshorthit, positive/negative \acrshorthit, \acrshortmse, \acrshortmae   
Yan_2017  SCI  2012-2017  \acrshortochlv, Index data    1..10d  Wavelet + \acrshortlstm  \acrshortmape, Theil unequal coefficient   
Takahashi_2017  \acrshortsp500  1950-2016  Index data  15d  1d  \acrshortlstm  \acrshortrmse  Keras
Bildirici_2010  \acrshortise100  1987-2008  Index data    2d, 4d, 8d, 12d, 18d  \acrshorttar-\acrshortvec-\acrshortmlp, \acrshorttar-\acrshortvec-\acrshortrbf, \acrshorttar-\acrshortvec-\acrshortrhe  \acrshortrmse   
Psaradellis_2016  \acrshortvix, \acrshortvxn, \acrshortvxd  2002-2014  First five autoregressive lags  5d  1d, 22d  \acrshorthargasvr  \acrshortmae, \acrshortrmse   
\glsann, \glsdnn, \glsmlp, and \glsfddr models were used in some of the studies. In Lachiheb_2018, log returns of the index data were used with a \glsdnn with hierarchical input for the prediction of the TUNINDEX data. The authors of Yong_2017 used a deep \glsffnn and \glsochl of the last 10 days of index data for prediction. In addition, \glsmlp and \glsann were used for the prediction of index data. In Yumlu_2005, the raw index data was used with \glsmlp, \glsrnn, \glsmoe and \glsegarch for the forecast. In Yang_2017, ensembles of \glsann with \glsochlv of the data were used for the prediction of the Shanghai composite index.
Furthermore, \glsrl and \glsdl methods were used together for the prediction of the index data in some of the studies. In Deng_2017, \glsfddr, \glsdnn and \glsrl methods were used to predict 300 stocks from \glsszse index data and commodity prices. In Jeong_2019, Deep Q-Learning and \glsdnn methods were used together with a 200-day stock price dataset for the prediction of the \glssp500 index.
Most of the preferred methods for prediction of the index data using the raw time series data were based on \glslstm and \glsrnn. In Bekiros_2013, \glsrnn was used for prediction of the log returns of the \glsdjia index. In Fischer_2018, \glslstm was used to predict \glssp500 Index data. The authors of Althelaya_2018 used stacked \glslstm and \glsbilstm methods for \glssp500 Index forecasting. The authors of Yan_2017 used an \glslstm network to predict the next day closing price of the Shanghai stock index. In their study, they used wavelet decomposition to reconstruct the financial time series for denoising and better learning. In Pang_2018, \glslstm was used for the prediction of the Shanghai A-shares composite index. The authors of Namini_2018 used \glslstm to predict \glsnikkei225, IXIC, HSI, GSPC and \glsdjia index data. In Takahashi_2017 and Baek_2018, \glslstm was also used for the prediction of the \glssp500 and \glskospi200 indices. The authors of Baek_2018 developed an \glslstm based stock index forecasting model called ModAugNet. The proposed method was able to beat \glsbh in the long term with an overfitting prevention mechanism. The authors of Elliot_2017 compared different \glsml models (linear model), \glsglm and several \glslstm, \glsrnn models for stock index price prediction. In Hansson_2017, \glslstm and the autoregressive part of the time series index data were used for prediction of the \glssp500, \glsbovespa50 and \glsomx30 indices.
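As a rough illustration of wavelet-based denoising before feeding a series to a network, the sketch below applies a single level of the Haar transform, shrinks the detail coefficients with a soft threshold, and reconstructs the series. The choice of the Haar wavelet, the single decomposition level, the soft-threshold rule and all function names are assumptions made for this example; the surveyed papers do not necessarily use these settings.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_step, interleaving the reconstructed pairs."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs: List[float], threshold: float) -> List[float]:
    """Shrink each coefficient towards zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal: List[float], threshold: float) -> List[float]:
    """Single-level Haar denoising: keep the approximation, shrink the details."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even for one Haar level")
    approx, detail = haar_step(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Hypothetical noisy closing prices (illustrative numbers only).
prices = [100.0, 101.5, 100.8, 102.9, 102.1, 103.6, 103.0, 104.4]
smoothed = denoise(prices, threshold=0.5)
print([round(p, 3) for p in smoothed])
```

A threshold of zero reproduces the input, while a very large threshold replaces each pair of observations by its mean, so the threshold controls how much short-term fluctuation is removed.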
Also, some studies adopted novel approaches. In Zhang_2015, a genetic \glsdnn was used for \glsdjia index forecasting. The authors of borovykh_2018 proposed a new \glsdnn model, called a Wavenet convolutional net, for time series forecasting. The authors of Bildirici_2010 proposed a \glstar-\glsvec-\glsrhe model for forex and stock index return prediction and compared several models. The authors of Parida_2016 proposed a method called \glslrnfis with \glsfhso \glsea to predict the \glssp500 and \glsnikkei225 indices and USD Exchange price data. The authors of Psaradellis_2016 proposed a \glshar with a \glsgasvr model, called \glshar-\glsgasvr, for prediction of the \glsvix, \glsvxn and \glsvxd indices.
In the literature, some of the studies used various input data such as technical indicators, index data, social media news, news from Reuters, Bloomberg, the statistical features of data (standard deviation, skewness, kurtosis, omega ratio, fund alpha). Table 5 tabulates the index forecasting papers using these aforementioned various data. \glsdnn, \glsrnn, \glslstm, \glscnn methods were the most commonly used models in index forecasting. In Table 5, different methods/models are also listed within four subgroups: \glsdnn model; \glsrnn and \glslstm models; \glscnn model; novel methods.
\glsdnn was used as the classification model in some of the papers. In Chen_2016, \glsdnn and some statistical features of the data (Return, \glssr, \glsstd, Skewness, Kurtosis, Omega ratio, Fund alpha) were used for the prediction. In Widegren_2017, \glsdnn, \glsrnn and technical indicators were used for the prediction of the \glsftse100, \glsomx30 and \glssp500 indices.
In addition, \glsrnn and \glslstm models with various other data were also used for the prediction of the indices. The authors of Hsieh_2011 used \glsrnn with \glsochlv of indices and technical indicators to predict the \glsdjia, \glsftse, Nikkei and \glstaiex indices. The authors of Mourelatos_2018 used \glsgasvr and \glslstm for the forecast. The authors of Chen_2018_f used four \glslstm models (technical analysis, attention mechanism and market vector embedded) for the prediction of the daily return ratio of the \glshs300 index. In Li_2017a, \glslstm with wavelet denoising and index data, volume and technical indicators were used for the prediction of the \glshsi, \glssse, \glsszse, \glstaiex, \glsnikkei and \glskospi indices. The authors of Si_2017 used a MODRL+\glslstm method to predict Chinese stock-IF-IH-IC contract indices. The authors of Bao_2017 used stacked \glsplae to generate deep features from \glsochl of the stock prices, technical indicators and macroeconomic conditions, and fed them to \glslstm to predict the future stock prices.
Table 5: Index forecasting using various data
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env.

Ding_2015  \acrshortsp500 Index, 15 stocks in \acrshortsp500  2006-2013  News from Reuters and Bloomberg      \acrshortcnn  Accuracy, \acrshortmcc   
Lee_2017_b  \acrshorttwse index, 4 stocks in \acrshorttwse  2001-2017  Technical indicators, Index data, News  15d    \acrshortcnn + \acrshortlstm  \acrshortrmse, Profit  Keras, Python, \acrshorttalib
Bao_2017  \acrshortcsi300, \acrshortnifty50, \acrshorthsi, \acrshortnikkei225, \acrshortsp500, \acrshortdjia  2010-2016  \acrshortochlv, Technical Indicators    1d  \acrshortwt, Stacked autoencoders, \acrshortlstm  \acrshortmape, Correlation coefficient, \acrshorttheilu   
Widegren_2017  FTSE100, OMXS 30, SP500, Commodity, Forex  1993-2017  Technical indicators  60d  1d  \acrshortdnn, \acrshortrnn  Accuracy, p-value   
Dingli_2017  \acrshortsp500, \acrshortdow30, \acrshortnasdaq100, Commodity, Forex, Bitcoin  2003-2016  Index data, Technical indicators    1w, 1m  \acrshortcnn  Accuracy  TensorFlow
Rout_2017  \acrshortbse, \acrshortsp500  2004-2012  Index data, technical indicators  5d  1d..1m  \acrshortpso, \acrshorthmrpso, \acrshortde, \acrshortrceflann  \acrshortrmse, \acrshortmape   
Li_2017a  \acrshorthsi, \acrshortsse, \acrshortszse, \acrshorttaiex, \acrshortnikkei, \acrshortkospi  2010-2016  Index data, volume, technical indicators  2d..512d  1d  \acrshortlstm with wavelet denoising  Accuracy, \acrshortmape   
Hsieh_2011  \acrshortdjia, \acrshortftse, \acrshortnikkei, \acrshorttaiex  1997-2008  \acrshortochlv, Technical indicators  26d  1d  \acrshortrnn  \acrshortrmse, \acrshortmae, \acrshortmape, \acrshorttheilu  C
Chen_2016  Hedge fund monthly return data  1996-2015  Return, \acrshortsr, \acrshortstd, Skewness, Kurtosis, Omega ratio, Fund alpha  12m  3m, 6m, 12m  \acrshortdnn  Sharpe ratio, Annual return, Cum. return   
Mourelatos_2018  Stock of National Bank of Greece (ETE)  2009-2014  \acrshortftse100, \acrshortdjia, \acrshortgdax, \acrshortnikkei225, EUR/USD, Gold  1d, 2d, 5d, 10d  1d  \acrshortgasvr, \acrshortlstm  Return, volatility, \acrshortsr, Accuracy  TensorFlow
Chen_2018_f  Daily return ratio of \acrshorths300 index  2004-2018  \acrshortochlv, Technical indicators      Market Vector + Tech. ind. + \acrshortlstm + Attention  \acrshortmse, \acrshortmae  Python, TensorFlow
Si_2017  Chinese stock-IF-IH-IC contract  2016-2017  Decisions for index change  240min  1min  \acrshortmodrl+\acrshortlstm  Profit and loss, \acrshortsr   
Chen_2018_e  \acrshorths300  2015-2017  Social media news, Index data  1d  1d  \acrshortrnnBoost with \acrshortlda  Accuracy, \acrshortmae, \acrshortmape, \acrshortrmse  Python, Scikit-learn
Besides, different \glscnn implementations with various data (technical indicators, news, index data) were used in the literature. In Dingli_2017, \glscnn with index data and technical indicators was used for the \glssp500, \glsdow30 and \glsnasdaq100 indices and Commodity, Forex and Bitcoin prices. In Ding_2015, a \glscnn model with news from Reuters and Bloomberg was used for the prediction of the \glssp500 Index and 15 stocks’ prices in \glssp500. In Lee_2017_b, \glscnn + \glslstm with technical indicators, index data and news was used for the forecasting of the \glstwse index and 4 stocks’ prices in \glstwse.
In addition, some novel methods were proposed for index forecasting. The authors of Rout_2017 used \glsrnn models, \glsrceflann and \glsflann, with their weights optimized using various \glsea like \glspso, HMRPSO and \glsde for time series forecasting. The authors of Chen_2018_e used social media news to predict the index price and index direction with \glsrnnBoost with \glslda features.
4.3 Commodity Price Forecasting
A number of studies focused particularly on the price prediction of a given commodity, such as gold, silver, oil, copper, etc. With the increasing number of commodities that are available for public trading through online stock exchanges, interest in this topic will likely grow in the following years.
In the literature, there were different methods that were used for commodity price forecasting. \glsdnn, \glsrnn, \glsfddr, \glscnn were the most used models to predict the commodity prices. Table 6 provides the details about the commodity price forecasting studies with \glsdl.
In Dingli_2017, the authors used \glscnn to predict the directional price movement for the next week and the next month. Meanwhile, \glsrnn and \glslstm models were used in some of the commodity forecasting studies. In Dixon_2016, \glsdnn was used for commodity forecasting. In Widegren_2017, several datasets (commodity, forex, index) were used, and \glsdnn and \glsrnn were used to predict the prices of the time series data. The feature set consisted of technical indicators: \glsrsi, \glswilliamr, \glscci, \glspposc, momentum, and \glsema. In S_nchez_Lasheras_2015, the authors used Elman \glsrnn to predict the COMEX copper spot price (through \glsnymex) from daily close prices.
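As an illustration of such a feature set, the following minimal Python sketch computes three of the indicators named above (EMA, momentum, and RSI) from a list of closing prices. It is not taken from any of the surveyed studies; the function names, the window lengths, the sample prices, and the simple-average RSI variant are assumptions chosen for brevity.

```python
from typing import List, Optional


def ema(prices: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]  # seed the average with the first price
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def momentum(prices: List[float], lag: int) -> List[Optional[float]]:
    """Price change over `lag` steps; None where the history is too short."""
    return [None if i < lag else prices[i] - prices[i - lag]
            for i in range(len(prices))]


def rsi(prices: List[float], period: int) -> Optional[float]:
    """RSI over the last `period` price changes, using simple averages
    of gains and losses (an assumed variant; smoothed versions also exist)."""
    if len(prices) < period + 1:
        return None
    start = len(prices) - period
    changes = [prices[i] - prices[i - 1] for i in range(start, len(prices))]
    gain = sum(c for c in changes if c > 0) / period
    loss = sum(-c for c in changes if c < 0) / period
    if loss == 0:
        return 100.0  # no losses in the window (convention assumed here)
    return 100.0 - 100.0 / (1.0 + gain / loss)


# Hypothetical closing prices, used only to show how a feature vector is built.
closes = [10.0, 11.0, 10.5, 11.5, 12.0, 11.0]
features = {
    "ema_3": ema(closes, 3)[-1],
    "momentum_2": momentum(closes, 2)[-1],
    "rsi_5": rsi(closes, 5),
}
```

In practice, such a feature vector would be computed for every time step and fed to the forecasting model together with, or instead of, the raw prices.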
Hybrid and novel models were adopted in some studies. In Zhao_2017, \glsfnn and \glssdae deep models were compared against \glssvr, \glsrw and \glsmrs models for WTI oil price forecasting. Accuracy, \glsmape, and \glsrmse were used as the performance criteria. In Chen_2017_d, the authors tried to predict WTI crude oil prices using several models, including combinations of \glsdbn, \glslstm, \glsarma and \glsrw. \glsmse was used as the performance criterion. In Deng_2017, the authors used \glsfddr for stock price prediction and trading signal generation, combining \glsdnn and \glsrl. Profit, return, SR, and profit-loss curves were used as the performance criteria.
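Since RMSE and MAPE recur as performance criteria throughout the surveyed studies, a short sketch of their standard definitions may be helpful. The function names and the sample values below are illustrative assumptions and do not come from any of the cited papers.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.
    Undefined when any actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical prices and forecasts.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mape = mape([100.0, 200.0], [110.0, 180.0])
```

RMSE is expressed in the units of the price itself, whereas MAPE is scale-free, which is one reason why studies covering several assets often report both.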
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Dingli_2017  \acrshortsp500, \acrshortdow30, \acrshortnasdaq100, Commodity, Forex, Bitcoin  2003-2016  Price data, Technical indicators    1w, 1m  \acrshortcnn  Accuracy  Tensorflow 
Dixon_2016  Commodity, FX future, \acrshortetf  1991-2014  Price Data  100*5min  5min  \acrshortdnn  \acrshortsr, capability ratio, return  C++, Python 
Widegren_2017  \acrshortftse100, \acrshortomx30, \acrshortsp500, Commodity, Forex  1993-2017  Technical indicators  60d  1d  \acrshortdnn, \acrshortrnn  Accuracy, p-value   
S_nchez_Lasheras_2015  Copper prices from \acrshortnymex  2002-2014  Price data      Elman \acrshortrnn  \acrshortrmse  R 
Zhao_2017  \acrshortwti crude oil price  1986-2016  Price data  1m  1m  \acrshortsdae, Bootstrap aggregation  Accuracy, \acrshortmape, \acrshortrmse  Matlab 
Chen_2017_d  \acrshortwti Crude Oil Prices  2007-2017  Price data      \acrshortarma + \acrshortdbn, \acrshortrw + \acrshortlstm  \acrshortmse  Python, Keras, Tensorflow 
Deng_2017  300 stocks from \acrshortszse, Commodity  2014-2015  Price data      \acrshortfddr, \acrshortdnn + \acrshortrl  Profit, return, \acrshortsr, profit-loss curves  Keras 
4.4 Volatility Forecasting
Volatility is directly related to the price variations in a given time period and is mostly used for risk assessment and asset pricing. Some researchers implemented models for accurately forecasting the underlying volatility of a given asset.
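One common way to make this notion concrete is to measure volatility as the standard deviation of log returns over a window, scaled to an annual figure. The sketch below follows that convention; the 252-period annualization factor and the function names are assumptions, and the surveyed studies may define their volatility targets differently.

```python
import math
from typing import List


def log_returns(prices: List[float]) -> List[float]:
    """Log return between each pair of consecutive prices."""
    return [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]


def realized_volatility(prices: List[float], periods_per_year: int = 252) -> float:
    """Annualized sample standard deviation of log returns."""
    r = log_returns(prices)
    if len(r) < 2:
        raise ValueError("need at least three prices")
    mean = sum(r) / len(r)
    variance = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(variance * periods_per_year)


# A series growing at a constant rate has (numerically) zero volatility,
# while an oscillating series does not. Both series are hypothetical.
steady = realized_volatility([100.0, 110.0, 121.0, 133.1])
choppy = realized_volatility([100.0, 105.0, 98.0, 104.0, 97.0])
```

A forecasting model would then be trained to predict the next value of such a series from its own history and, possibly, from additional features.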
Different methods were used for volatility forecasting in the literature, including \glslstm, \glsrnn, \glscnn, MM, and \glsgarch models. Table 7 summarizes the studies that focused on volatility forecasting. In Table 7, the methods/models are also presented in three subgroups: \glscnn model; \glsrnn and \glslstm models; hybrid and novel models.
A \glscnn model was used in one volatility forecasting study based on \glshft data Doering_2017.
Meanwhile, \glsrnn and \glslstm models were used in some of the studies. In Tino_2001, the authors used financial time series data to predict volatility changes with Markov Models and Elman \glsrnn for profitable straddle options trading. The authors of Xiong_2015 used the price data and different types of Google Domestic trends with \glslstm. The authors of Zhou_2018_b used \glscsi300 and the daily search volume of 28 words on Baidu as the dataset with \glslstm to predict the index volatility. The authors of Kim_2018 developed several \glslstm models integrated with \glsgarch for the prediction of volatility.
Hybrid and novel approaches were also adopted in some of the studies. In Nikolaev_2013, the \glsrmdngarch model was proposed. In addition, several models, including traditional forecasting models and \glsdl models, were compared for the estimation of volatility. The authors of Psaradellis_2016 proposed a novel method called \glshargasvr for volatility index forecasting.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Doering_2017  London Stock Exchange  2007-2008  Limit order book state, trades, buy/sell orders, order deletions      \acrshortcnn  Accuracy, kappa  Caffe 
Tino_2001  \acrshortdax, \acrshortftse100, call/put options  1991-1998  Price data  *  *  \acrshortmm, \acrshortrnn  Ewameasure, iv, daily profits’ mean and std   
Xiong_2015  \acrshortsp500  2004-2015  Price data, 25 Google Domestic trend dimensions    1d  \acrshortlstm  \acrshortmape, \acrshortrmse   
Zhou_2018_b  \acrshortcsi 300, 28 words of the daily search volume based on Baidu  2006-2017  Price data and text  5d  5d  \acrshortlstm  \acrshortmse, \acrshortmape  Python, Keras 
Kim_2018  \acrshortkospi200, Korea Treasury Bond interest rate, AA-grade corporate bond interest rate, gold, crude oil  2001-2011  Price data  22d  1d  \acrshortlstm + \acrshortgarch  \acrshortmae, \acrshortmse, \acrshorthmae, \acrshorthmse   
Nikolaev_2013  DEM/GBP exchange rate    Returns      \acrshortrmdngarch  \acrshortnmse, \acrshortnmae, \acrshorthr, \acrshortwhr   
Psaradellis_2016  \acrshortvix, \acrshortvxn, \acrshortvxd  2002-2014  First five autoregressive lags  5d  1d, 22d  \acrshorthargasvr  \acrshortmae, \acrshortrmse   
4.5 Bond Price Forecasting
Some financial experts follow the changes in bond prices to analyze the state of the economy, claiming that bond prices represent the health of the economy better than the stock market Harvey_1989. Historically, long-term rates are higher than short-term rates during normal economic expansions, whereas just before recessions short-term rates exceed long-term rates, i.e., the yield curve inverts. Hence, accurate bond price prediction is very useful. However, \glsdl implementations for bond price prediction are very scarce. In one study bianchi_2018, excess bond return was predicted using several \glsml models, including \glsrf, \glsae and \glspca network, and a 2-3-4-layer \glsdfnn. The 4-layer \glsnn outperformed the other models.
4.6 Forex Price Forecasting
The foreign exchange market has the highest volume among all financial markets in the world. It is open 24 hours a day on weekdays, and trillions of dollars' worth of foreign exchange transactions happen in a single day. According to the Bank for International Settlements, foreign-exchange trading had a volume of more than 5 trillion USD a day Venketas_2019. In addition, a large number of online forex trading platforms provide leveraged transaction opportunities to their subscribers. As a result, traders have a huge interest in profitable trading strategies. Hence, there were a number of forex forecasting and trading studies based on \glsdl models. Since most global financial transactions are based on the US dollar, almost all forex prediction research papers include USD in their analyses. However, depending on regional differences and the intended research focus, various models were developed accordingly.
Different methods were used for forex price forecasting in the literature, including \glsrnn, \glslstm, \glscnn, \glsdbn, \glsdnn, \glsae, and \glsmlp. Table 8 provides details about these implementations. In Table 8, the methods/models are listed in four subgroups: \glscdbn, \glsdbn, \glsdbn+\glsrbm, and \glsae models; \glsdnn, \glsrnn, \glspsn, and \glslstm models; \glscnn models; hybrid models.
\glscdbn, \glsdbn, \glsdbn+\glsrbm, and \glsae models were used in some of the studies. In Zhang_2014, fuzzy information granulation integrated with \glscdbn was applied for predicting EUR/USD and GBP/USD exchange rates. The authors extended \glsdbn with \glscrbm to improve the performance. In Chao_2011, weekly GBP/USD and INR/USD prices were predicted, whereas in Zheng_2017, CNY/USD and INR/USD were the main focus. In both cases, \glsdbn was compared with \glsffnn. Similarly, the authors of Shen_2015 implemented several different \glsdbn networks to predict weekly GBP/USD, BRL/USD and INR/USD exchange rate returns. The researchers in Shen_2016 combined Stacked \glsae and \glssvr for predicting 28 normalized currency pairs using the time series data of USD, GBP, EUR, JPY, AUD, CAD, and CHF.
\glsdnn, \glsrnn, \glspsn, and \glslstm models were preferred in some of the studies. In Dixon_2016, multiple \glsdmlp models were developed for predicting AD and BP futures using 5-minute data in a 130-day period. The authors of Sermpinis_2012_a used \glsmlp, \glsrnn, \glsgp and other \glsml techniques along with traditional regression methods for predicting the EUR/USD time series. They also integrated Kalman filter, LASSO operator and other models to further improve the results in Sermpinis_2012. They further extended their analyses by including \glspsn and providing comparisons with traditional forecasters like \glsarima, RW and STAR Sermpinis_2014. To improve the performance, they also integrated hybrid time-varying volatility leverage. In SUN_2009, the authors implemented RMB exchange rate forecasting against JPY, HKD, EUR and USD by comparing \glsrw, \glsrnn and \glsffnn performances. In Maknickien__2013, the authors predicted various forex time series and created portfolios consisting of these investments. Each network used \glslstm (\glsrnn EVOLINO), and different risk appetites for users were tested. The authors of Maknickiene_2014 also used EVOLINO RNN + orthogonal input data for predicting USD/JPY and XAU/USD prices for different periods.
Different \glscnn models were used in some of the studies. In persio_2016, EUR/USD was once again forecast using multiple \glsdl models, including \glsmlp, \glscnn, \glsrnn and Wavelet+\glscnn. The authors of Korczak_2017 implemented forex trading (GBP/PLN) using several different input parameters in a multi-agent-based trading environment. One of the agents used \glsae+\glscnn as the prediction model and outperformed all other models.
Hybrid models were also adopted in some of the studies. The authors of Bildirici_2010 developed several (TAR-VEC-RHE) models for predicting monthly returns for TRY/USD and compared model performances. In Nikolaev_2013, the authors compared several models, including traditional forecasting models and \glsdl models, for DEM/GBP prediction. The authors of Parida_2016 predicted AUD, CHF, MAX and BRL against USD currency time series data using LRNFIS and compared it with different models. Instead of using LMS-based error minimization during learning, they used \glsfhso.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Zhang_2014  EUR/USD, GBP/USD  2009-2012  Price data  *  1d  \acrshortcdbnfg  Profit   
Chao_2011  GBP/USD, INR/USD  1976-2003  Price data  10w  1w  \acrshortdbn  \acrshortrmse, \acrshortmae, \acrshortmape, \acrshortda, \acrshortpcc   
Zheng_2017  CNY/USD, INR/USD  1997-2016  Price data    1w  \acrshortdbn  \acrshortmape, R-squared   
Shen_2015  GBP/USD, BRL/USD, INR/USD  1976-2003  Price data  10w  1w  \acrshortdbn + \acrshortrbm  \acrshortrmse, \acrshortmae, \acrshortmape, accuracy, \acrshortpcc   
Shen_2016  Combination of USD, GBP, EUR, JPY, AUD, CAD, CHF  2009-2016  Price data      Stacked \acrshortae + \acrshortsvr  \acrshortmae, \acrshortmse, \acrshortrmse  Matlab 
Dixon_2016  Commodity, FX future, \acrshortetf  1991-2014  Price Data  100*5min  5min  \acrshortdnn  \acrshortsr, capability ratio, return  C++, Python 
Widegren_2017  \acrshortftse100, \acrshortomx30, \acrshortsp500, Commodity, Forex  1993-2017  Technical indicators  60d  1d  \acrshortdnn, \acrshortrnn  Accuracy, p-value   
Sermpinis_2012_a  EUR/USD  2001-2010  Close data  11d  1d  \acrshortrnn and more  \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheilu   
Sermpinis_2012  EUR/USD  2002-2010  Price data  13d  1d  \acrshortrnn, \acrshortmlp, \acrshortpsn  \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheilu   
Sermpinis_2014  EUR/USD, EUR/GBP, EUR/JPY, EUR/CHF  1999-2012  Price data  12d  1d  \acrshortrnn, \acrshortmlp, \acrshortpsn  \acrshortmae, \acrshortmape, \acrshortrmse, \acrshorttheilu   
SUN_2009  RMB against USD, EUR, JPY, HKD  2006-2008  Price data  10d  1d  \acrshortrnn, \acrshortann  \acrshortrmse, \acrshortmae, \acrshortmse   
Maknickien__2013  EUR/USD, EUR/JPY, USD/JPY, EUR/CHF, XAU/USD, XAG/USD, QM, QG  2011-2012  Price data      Evolino \acrshortrnn  Correlation between predicted, real values   
Maknickiene_2014  USD/JPY  2009-2010  Price data, Gold    5d  EVOLINO \acrshortrnn + orthogonal input data  \acrshortrmse   
persio_2016  \acrshortsp500, EUR/USD  1950-2016  Price data  30d, 30d*min  1d, 1min  Wavelet+\acrshortcnn  Accuracy, logloss  Keras 
Korczak_2017  USD/GBP, \acrshortsp500, \acrshortftse100, oil, gold  2016  Price data    5min  \acrshortae + \acrshortcnn  \acrshortsr, % volatility, avg return/trans, rate of return  H2O 
Bildirici_2010  \acrshortise100, TRY/USD  1987-2008  Price data    2d, 4d, 8d, 12d, 18d  \acrshorttar-\acrshortvec-\acrshortmlp, \acrshorttar-\acrshortvec-\acrshortrbf, \acrshorttar-\acrshortvec-\acrshortrhe  \acrshortrmse   
Nikolaev_2013  DEM/GBP exchange rate    Returns      \acrshortrmdn\acrshortgarch  \acrshortnmse, \acrshortnmae, \acrshorthr, \acrshortwhr   
Parida_2016  \acrshortsp500, \acrshortnikkei225, USD Exchanges  2011-2015  Price data    1d, 5d, 7d, 10d  \acrshortlrnfis with \acrshortfhso  \acrshortrmse, \acrshortmape, \acrshortmae   
4.7 Cryptocurrency Price Forecasting
Since cryptocurrencies became a hot topic for discussion in the finance world, lots of studies and implementations started emerging in recent years. Most of the cryptocurrency studies were focused on price forecasting.
The rise of bitcoin from 1000 USD in January 2017 to 20,000 USD in January 2018 has attracted a lot of attention not only from the financial world, but also from ordinary people on the street. Recently, some papers have been published for price prediction and trading strategy development for bitcoin and other cryptocurrencies. Given the attention that the underlying technology has attracted, there is a great chance that some new studies will start appearing in the near future.
In the literature, \glsdnn, \glslstm, \glsgru, \glsrnn, and classical methods (\glsarma, \glsarima, \glsarch, \glsgarch, etc.) were used for cryptocurrency price forecasting. Table 9 tabulates the studies that utilize these methods. In Lopes_2018_thesis, the author combined the opinion market and price prediction for cryptocurrency trading. Text mining combined with two models, \glscnn and \glslstm, was used to extract the opinion. Bitcoin, Litecoin, and StockTwits were used as the dataset. \glsochlv of prices, technical indicators, and sentiment analysis were used as the feature set. In McNally_2018, the authors compared Bayesian optimized \glsrnn, \glslstm and \glsarima to predict bitcoin price direction. Sensitivity, specificity, precision, accuracy, and RMSE were used as the performance metrics.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Lopes_2018_thesis  Bitcoin, Litecoin, StockTwits  2015-2018  \acrshortochlv, technical indicators, sentiment analysis    30min, 4h, 1d  \acrshortcnn, \acrshortlstm, State Frequency Model  \acrshortmse  Keras, Tensorflow 
McNally_2018  Bitcoin  2013-2016  Price data  100d  30d  Bayesian optimized \acrshortrnn, \acrshortlstm  Sensitivity, specificity, precision, accuracy, \acrshortrmse  Keras, Python, Hyperas 
4.8 Trend Forecasting
Even though trend forecasting and price forecasting share the same input characteristics, some researchers prefer to predict the price direction of the asset instead of the actual price. This alters the nature of the problem from regression to classification, and the corresponding performance metrics also change. However, it is worth mentioning that these two approaches are not really different; the difference is in the interpretation of the output.
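The following minimal sketch illustrates this point: the same price series yields either regression targets (next prices) or classification targets (up/down labels), and a price forecast can be reinterpreted as a direction forecast. The function names, the binary up/down labeling, and the sample values are assumptions made for illustration only.

```python
from typing import List


def direction_labels(prices: List[float]) -> List[int]:
    """Classification targets: 1 if the next price is higher, else 0."""
    return [1 if prices[i + 1] > prices[i] else 0
            for i in range(len(prices) - 1)]


def directional_accuracy(actual: List[float], predicted: List[float]) -> float:
    """Read price forecasts as direction forecasts and score them.

    predicted[i + 1] is the forecast for actual[i + 1], made when
    actual[i] is the last known price.
    """
    steps = len(actual) - 1
    hits = 0
    for i in range(steps):
        actual_up = actual[i + 1] > actual[i]
        predicted_up = predicted[i + 1] > actual[i]
        if actual_up == predicted_up:
            hits += 1
    return hits / steps


# Hypothetical realized prices and one-step-ahead price forecasts.
actual = [100.0, 101.0, 100.0, 102.0]
predicted = [100.0, 100.5, 101.5, 101.0]
labels = direction_labels(actual)
accuracy = directional_accuracy(actual, predicted)
```

A regression model would be evaluated on the forecasts with error metrics such as RMSE, whereas the same forecasts, read as directions, are evaluated with classification metrics such as accuracy.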
In the literature, there were different methods for trend forecasting. In this survey, we grouped the articles according to their feature set: studies using only the raw time series data (only price data, \glsochlv); studies using technical indicators, price data, and fundamental data at the same time; studies using text mining techniques; and studies using various other data. Table 10 tabulates the trend forecasting studies using only the raw time series data.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Das_2018_a  \acrshortsp500 stock indexes  1963-2016  Price data  30d  1d  \acrshortnn  Accuracy, precision, recall, F1-score, \acrshortauroc  R, H2o, Python, Tensorflow 
Navon_2017  \acrshortspy \acrshortetf, 10 stocks from \acrshortsp500  2014-2016  Price data  60min  30min  \acrshortfnn  Cumulative gain  MatConvNet, Matlab 
Yang_2017  Shanghai composite index and \acrshortszse  1990-2016  \acrshortochlv  20d  1d  Ensembles of \acrshortann  Accuracy   
Saad_1998  10 stocks from \acrshortsp500    Price data      \acrshorttdnn, \acrshortrnn, \acrshortpnn  Missed opportunities, false alarms ratio   
persio_2017  GOOGL stock daily price data  2012-2016  Time window of 30 days of \acrshortochlv  22d, 50d, 70d  *  \acrshortlstm, \acrshortgru, \acrshortrnn  Accuracy, Logloss  Python, Keras 
Hansson_2017  \acrshortsp500, Bovespa50, \acrshortomx30  2009-2017  Autoregressive part of the price data  30d  1..15d  \acrshortlstm  \acrshortmse, Accuracy  Tensorflow, Keras, R 
Shen_2018  \acrshorthsi, \acrshortdax, \acrshortsp500  1991-2017  Price data    1d  \acrshortgru, \acrshortgru-\acrshortsvm  Daily return %  Python, Tensorflow 
Chen_2016_d  Taiwan Stock Index Futures  2001-2015  \acrshortochlv  240d  1..2d  \acrshortcnn with \acrshortgaf, \acrshortmam, Candlestick  Accuracy  Matlab 
Sezer_2019  \acrshortetf and Dow30  1997-2007  Price data      \acrshortcnn with feature imaging  Annualized return  Keras, Tensorflow 
Zhou_2019  \acrshortssec, \acrshortnasdaq, \acrshortsp500  2007-2016  Price data  20min  7min  \acrshortemd2fnn  \acrshortmae, \acrshortrmse, \acrshortmape   
Ausmees_2017  23 large-cap stocks from the \acrshortomx30 index in Nasdaq Stockholm  2000-2017  Price data and returns  30d  *  \acrshortdbn  \acrshortmae  Python, Theano 
Different methods and models were used for trend forecasting. In Table 10, these are divided into three subgroups: \glsann, \glsdnn, and \glsffnn models; \glslstm, \glsrnn, and Probabilistic \glsnn models; novel methods. \glsann, \glsdnn, \glsdfnn, and \glsffnn methods were used in some of the studies. In Das_2018_a, \glsnn with price data was used to predict the trend of \glssp500 stock indices. The authors of Navon_2017 combined deep \glsfnn with a selective trading strategy unit to predict the next price. The authors of Yang_2017 created an ensemble network of several Backpropagation and \glsadam models for trend prediction.
In the literature, \glslstm, \glsrnn, and \glspnn methods with the raw time series data were also used for trend forecasting. In Saad_1998, the authors compared \glstdnn, \glsrnn and \glspnn for trend detection using 10 stocks from \glssp500. The authors of persio_2017 compared 3 different \glsrnn models (basic \glsrnn, \glslstm, \glsgru) to predict the movement of the Google stock price. The authors of Hansson_2017 used \glslstm (and other classical forecasting techniques) to predict the trend of stock prices. In Shen_2018, \glsgru and \glsgru-\glssvm models were used to predict the trend of the \glshsi, \glsdax, and \glssp500 indices.
There were also novel methods in the literature that used only the raw time series price/index data. The author of Chen_2016_d proposed a method that used \glscnn with \glsgaf, \glsmam, and Candlestick with converted image data. In Sezer_2019, a novel method, \glscnn with feature imaging, was proposed for the prediction of buy/sell/hold positions for \glspletf and Dow30 stock prices. The authors of Zhou_2019 proposed a method that uses \glsemd2fnn models to forecast the direction of stock close prices accurately. In Ausmees_2017, \glsdbn with price data was used to predict the trend of 23 large-cap stocks from the \glsomx30 index.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Raza_2017  \acrshortkse100 index    Price data, several fundamental data      \acrshortann, \acrshortslp, \acrshortmlp, \acrshortrbf, \acrshortdbn, \acrshortsvm  Accuracy   
Sezer_2017  Stocks in Dow30  1997-2017  \acrshortrsi (Technical Indicators)  200d  1d  \acrshortdmlp with genetic algorithm  Annualized return  Spark MLlib, Java 
Liang_2017  \acrshortsse Composite Index, \acrshortftse100, PingAnBank  1999-2016  Technical indicators, \acrshortochlv price  24d  1d  \acrshortrbm  Accuracy   
Troiano_2018  Dow30 stocks  2012-2016  Price data, several technical indicators  40d    \acrshortlstm  Accuracy  Python, Keras, Tensorflow, \acrshorttalib 
Nelson_2017  Stock price from \acrshortibovespa index  2008-2015  Technical indicators, \acrshortochlv of price    15min  \acrshortlstm  Accuracy, Precision, Recall, F1-score, % return, Maximum drawdown  Keras 
song_2018  20 stocks from \acrshortnasdaq and \acrshortnyse  2010-2017  Price data, technical indicators  5d  1d  \acrshortlstm, \acrshortgru, \acrshortsvm, \acrshortxgboost  Accuracy  Keras, Tensorflow, Python 
Gudelek_2017  17 \acrshortetf  2000-2016  Price data, technical indicators  28d  1d  \acrshortcnn  Accuracy, \acrshortmse, Profit, \acrshortauroc  Keras, Tensorflow 
Sezer_2018  Stocks in Dow30 and 9 Top Volume \acrshortetf  1997-2017  Price data, technical indicators  20d  1d  \acrshortcnn with feature imaging  Recall, precision, F1-score, annualized return  Python, Keras, Tensorflow, Java 
Gunduz_2017  Borsa Istanbul 100 Stocks  2011-2015  75 technical indicators, \acrshortochlv of price    1h  \acrshortcnn  Accuracy  Keras 
In the literature, some of the studies used technical indicators, price data, and fundamental data at the same time. Table 11 tabulates the trend forecasting papers using technical indicators, price data, and fundamental data. In addition, these studies are clustered into three subgroups: \glsann, \glsmlp, \glsdbn, and \glsrbm models; \glslstm and \glsgru models; novel methods. \glsann, \glsmlp, \glsdbn, and \glsrbm methods were used with technical indicators, price data and fundamental data in some of the studies. In Raza_2017, several classical \glsml models and \glsdbn were compared for trend forecasting. In Sezer_2017, the buy and sell limits of a technical analysis indicator (\glsrsi) were optimized with \glsga and used for buy-sell signals. After optimization, \glsdmlp was also used for function approximation. The authors of Liang_2017 used technical analysis parameters, \glsochlv of prices and \glsrbm for stock trend prediction.
Besides, \glslstm and \glsgru methods with technical indicators, price data, and fundamental data were also used in some of the papers. In Troiano_2018, the crossover and \glsmacd signals were used to predict the trend of the Dow 30 stock prices. The authors of Nelson_2017 used \glslstm for stock price movement estimation. The author of song_2018 used stock prices, technical analysis features and four different \glsml models (\glslstm, \glsgru, \glssvm and \glsxgboost) to predict the trend of stock prices.
In addition, there were also novel methods that used \glscnn with the price data and technical indicators. The authors of Gudelek_2017 converted the time series of price data to 2-dimensional images using technical analysis and classified them with deep \glscnn. Similarly, the authors of Sezer_2018 proposed a novel technique that converted financial time series data consisting of technical analysis indicator outputs to 2-dimensional images and classified these images using \glscnn to determine the trading signals. The authors of Gunduz_2017 proposed a method that used \glscnn with correlated features combined together to predict the trend of stock prices.
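The general idea of turning indicator outputs into an image can be sketched as follows: each row of the matrix holds one indicator, each column one time step, and every row is scaled to [0, 1] so that it can be treated as pixel intensities. This is only a schematic illustration and not the procedure of any cited study; the use of simple moving averages as stand-in indicators, the row-wise min-max scaling, and all names are assumptions.

```python
from typing import List


def sma(prices: List[float], window: int, t: int) -> float:
    """Simple moving average of the `window` prices ending at index t."""
    return sum(prices[t - window + 1 : t + 1]) / window


def indicator_image(prices: List[float], end: int, width: int,
                    windows: List[int]) -> List[List[float]]:
    """Build a len(windows) x width matrix for the period ending at `end`.

    Row k holds the SMA with window windows[k] (a stand-in indicator),
    min-max scaled to [0, 1] within the row.
    """
    first_needed = end - width + 1 - max(windows) + 1
    if first_needed < 0 or end >= len(prices):
        raise ValueError("not enough price history for the requested image")
    image = []
    for w in windows:
        row = [sma(prices, w, t) for t in range(end - width + 1, end + 1)]
        lo, hi = min(row), max(row)
        scale = (hi - lo) or 1.0  # avoid division by zero on flat rows
        image.append([(v - lo) / scale for v in row])
    return image


# Hypothetical prices 1.0 .. 10.0; a 3 x 3 image ending at the last step.
prices = [float(p) for p in range(1, 11)]
image = indicator_image(prices, end=9, width=3, windows=[1, 2, 3])
```

Each such matrix, paired with a label such as buy, sell, or hold, would then form one training sample for an image classifier.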
Besides, there were also studies that used text mining techniques in the literature. Table 12 tabulates the trend forecasting papers using text mining techniques. The methods/models are presented in four subgroups in that table: \glsdnn, \glsdmlp, and \glscnn with text mining models; \glsgru model; \glslstm, \glscnn, and \glslstm+\glscnn models; novel methods. In the first group of studies, \glsdnn, \glsdmlp, and \glscnn with text mining were used for trend forecasting. In Huang_2016, the authors used different models, including \glshmm, \glsdmlp and \glscnn, with Twitter moods to predict the next day's move. In Peng_2016, the authors used a combination of text mining and word embeddings to extract information from financial news and a \glsdnn model to predict stock trends.
Moreover, \glsgru methods with text mining techniques were also used for trend forecasting. The authors of Huynh_2017 used financial news from Reuters and Bloomberg, stock price data, and a \glsbigru model to predict future stock movements. The authors of Dang_2018 used Stock2Vec and \glstgru models to generate input data from financial news and stock prices. Then, they used the sign difference between the previous close and next open for the classification of the stock prices. The results were better than those of the state-of-the-art models.
\glslstm, \glscnn and \glslstm+\glscnn models were also used for trend forecasting. The authors of Verma_2017 combined news data with financial data to classify the stock price movement and assessed them with certain factors. They used \glslstm as the \glsnn architecture. The authors of Pinheiro_2017 proposed a novel method that used a character-based neural language model with financial news and \glslstm for trend prediction. In Prosky_2017, sentiment/mood prediction and price prediction based on sentiment, price prediction with text mining and \glsdl models (\glslstm, \glsnn, \glscnn) were used for trend forecasting. The authors of Liu_2018 proposed a method that used two separate \glslstm networks to construct an ensemble network. One of the \glslstm models was used for word embeddings with word2Vec to create a matrix information as input to \glscnn. The other one was used for price prediction using technical analysis features and stock prices.
In the literature, there were also novel and different methods to predict the trend of the time series data. In Yoshihara_2014, the authors proposed a novel method that uses a combination of \glsrbm, \glsdbn and word embedding to create word vectors for an \glsrnn-\glsrbm-\glsdbn network to predict the trend of stock prices. The authors of Shi_2018 proposed a novel method (called DeepClue) that visually interpreted text-based \glsdl models in predicting stock price movements. In their proposed method, financial news, charts and social media tweets were used together to predict the stock price movement. The authors of Zhang_2018 proposed a method that performed information fusion from several news and social media sources to predict the trend of stocks. The authors of Hu_2018 proposed a novel method that used text mining techniques and Hybrid Attention Networks based on financial news to forecast the trend of stocks. The authors of Wang_2018_a combined technical analysis and sentiment analysis of social media (related financial topics) and created the \glsdrse method for classification. The authors of MATSUBARA_2018 proposed a method that used \glsdgm with news articles, using the Paragraph Vector algorithm to create the input vector for the prediction of the trend of stocks. The authors of Li_2018 implemented intraday stock price direction classification using financial news and stock prices.
Art.  Data Set  Period  Feature Set  Lag  Horizon  Method  Performance Criteria  Env. 

Huang_2016  \acrshortsp500, \acrshortnyse Composite, \acrshortdjia, \acrshortnasdaq Composite  2009-2011  Twitter moods, index data  7d  1d  \acrshortdnn, \acrshortcnn  Error rate  Keras, Theano 
Peng_2016  News from Reuters and Bloomberg, Historical stock security data  2006-2013  News, price data  5d  1d  \acrshortdnn  Accuracy   
Huynh_2017  News from Reuters, Bloomberg  2006-2013  Financial news, price data    1d, 2d, 5d, 7d  \acrshortbigru  Accuracy  Python, Keras 
Dang_2018  News about Apple, Airbus, Amazon from Reuters, Bloomberg, \acrshortsp500 stock prices  2006-2013  Price data, news, technical indicators      Two-stream \acrshortgru, stock2vec  Accuracy, precision, \acrshortauroc  Keras, Python 
Verma_2017  \acrshortnifty50 Index, \acrshortnifty Bank/Auto/IT/Energy Index, News  2013-2017  Index data, news  1d, 2d, 5d  1d  \acrshortlstm  \acrshortmcc, Accuracy   
Pinheiro_2017  News from Reuters, Bloomberg, stock price/index data from \acrshortsp500  2006-2013  News and sentences    1h, 1d  \acrshortlstm  Accuracy   
Prosky_2017  30 \acrshortdjia stocks, \acrshortsp500, \acrshortdji, news from Reuters  2002-2016  Price data and features from news articles  1m  1d  \acrshortlstm, \acrshortnn, \acrshortcnn and word2vec  Accuracy  VADER 
Liu_2018  APPL from \acrshortsp500 and news from Reuters  2011-2017  News, \acrshortochlv, Technical indicators    1d  \acrshortcnn + \acrshortlstm, \acrshortcnn+\acrshortsvm  Accuracy, F1-score  Tensorflow 
Yoshihara_2014  News, Nikkei Stock Average and 10 Nikkei companies  1999-2008  News, \acrshortmacd    1d  \acrshortrnn, \acrshortrbm+\acrshortdbn  Accuracy, P-value   
Shi_2018  News from Reuters and Bloomberg for \acrshortsp500 stocks  2006-2015  Financial news, price data  1d  1d  DeepClue  Accuracy  Dynet software 
Zhang_2018  Price data, index data, news, social media data  2015  Price data, news from articles and social media  1d  1d  Coupled matrix and tensor  Accuracy, \acrshortmcc  Jieba 
Hu_2018  News and Chinese stock data  2014-2017  Selected words in news  10d  1d  \acrshorthan  Accuracy, Annual return   
Wang_2018_a  Sina Weibo, Stock market records  2012-2015  Technical indicators, sentences      DRSE  F1-score, precision, recall, accuracy, \acrshortauroc  Python 
MATSUBARA_2018  Nikkei225, \acrshortsp500, news from Reuters and Bloomberg  2001-2013  Price data and news  1d  1d  \acrshortdgm  Accuracy, \acrshortmcc, %profit   
Li_2018  News, stock prices from Hong Kong Stock Exchange  2001  Price data and \acrshorttfidf from news  60min  (1..6)*5min  \acrshortelm, \acrshortdlr, \acrshortpca, \acrshortbelm, \acrshortkelm, \acrshortnn  Accuracy  Matlab 
Moreover, there were also studies that used different data variations in the literature. Table 13 tabulates the trend forecasting papers using these various data clustered into two subgroups: \glslstm, \glsrnn, \glsgru models; \glscnn model.
\glslstm, \glsrnn, and \glsgru methods with various data representations were used in some trend forecasting papers. In Tsantekidis_2017, the authors used the limit order book time series data and the \glslstm method for trend prediction. The authors of Sirignano_2018 proposed a novel method that used limit order book flow and history information to determine stock movements using \glslstm. The results of the proposed method were remarkably stationary. The authors of Chen_2018_e used social media news, \glslda features and an \glsrnn model to predict the trend of the index price. The authors of Buczkowski_2017 proposed a novel method that used expert recommendations (Buy, Hold or Sell) and an ensemble of \glsgru and \glslstm to predict the trend of stock prices.
\glscnn models with different data representations were also used for trend prediction. In Tsantekidis_2017_a, the authors used the last 100 entries from the limit order book to create images for stock price prediction using \glscnn. Using the limit order book data to create a 2D matrix-like format with \glscnn for predicting directional movement was innovative. In Doering_2017, \glshft microstructure forecasting with \glscnn was implemented.
| Art. | Data Set | Period | Feature Set | Lag | Horizon | Method | Performance Criteria | Env. |
|---|---|---|---|---|---|---|---|---|
| Tsantekidis_2017 | Nasdaq Nordic (Kesko Oyj, Outokumpu Oyj, Sampo, Rautaruukki, Wartsila Oyj) | 2010 | Price and volume data in LOB | 100s | 10s, 20s, 50s | LSTM | Precision, Recall, F1-score, Cohen's kappa | - |
| Sirignano_2018 | High-frequency record of all orders | 2014-2017 | Price data, record of all orders, transactions | 2h | - | LSTM | Accuracy | - |
| Chen_2018_e | Chinese, the Shanghai-Shenzhen 300 Stock Index (HS300) | 2015-2017 | Social media news (Sina Weibo), price data | 1d | 1d | RNN-Boost with LDA | Accuracy, MAE, MAPE, RMSE | Python, Scikit-learn |
| Buczkowski_2017 | ISMIS 2017 Data Mining Competition dataset | - | Expert identifier, class predicted by expert | - | - | LSTM + GRU + FCNN | Accuracy | - |
| Tsantekidis_2017_a | Nasdaq Nordic (Kesko Oyj, Outokumpu Oyj, Sampo, Rautaruukki, Wartsila Oyj) | 2010 | Price, volume data, 10 orders of the LOB | - | - | CNN | Precision, Recall, F1-score, Cohen's kappa | Theano, Scikit-learn, Python |
| Doering_2017 | London Stock Exchange | 2007-2008 | Limit order book state, trades, buy/sell orders, order deletions | - | - | CNN | Accuracy, kappa | Caffe |
5 Current Snapshot of the Field
After reviewing all the research papers specifically targeting financial time series forecasting implementations using DL models, we are now ready to provide some overall statistics about the current state of the studies. We were able to locate 140 papers to include in our survey. We categorized the papers according to their forecasted asset type. Furthermore, we also analyzed the studies through their DL model choices, frameworks for the development environment, data sets, comparable benchmarks, and some other differentiating criteria such as feature sets and number of citations, which we were not able to include in the paper due to space constraints. We will now summarize our notable observations to provide important highlights for interested researchers within the field.
Figure 5 presents the asset types for which researchers developed forecasting models. As expected, stock market-related prediction studies dominate the field. Stock price forecasting, trend forecasting and index forecasting were the top three picks for financial time series forecasting research. So far, 46 papers were published for stock price forecasting, 38 for trend forecasting and 33 for index forecasting. These studies constitute more than 70% of all studies, indicating high interest. They are followed by 19 papers for forex prediction and 7 papers for volatility forecasting. Meanwhile, cryptocurrency forecasting has started attracting researchers; only 3 papers have been published so far, but this number is expected to increase in the coming years Fischer_2019. Figure 6 highlights the publication counts for the various implementation areas throughout the years. Meanwhile, Figure 7 provides more details about the choice of DL models over the various implementation areas.
Figure 8 illustrates the accelerating appetite of researchers in the last 3 years for developing DL models for financial time series implementations. Meanwhile, as Figure 9 indicates, most of the studies were published in journals (57 of them) and conferences (49 papers), even though a considerable number of arXiv papers (11) and graduate theses (6) also exist.
One of the most important questions for researchers is where to publish their research findings. During our review of the papers, we also carefully investigated where each paper was published. We tabulated our results for the top journals for financial time series forecasting in Figure 10. According to these results, the journals with the most published papers include Expert Systems with Applications, Neurocomputing, Applied Soft Computing, The Journal of Supercomputing, Decision Support Systems, Knowledge-Based Systems, European Journal of Operational Research and IEEE Access. Interested researchers should also consider the trend within the last 3 years, as tendencies can vary slightly depending on the particular implementation area.
A careful analysis of Figure 11 confirms the dominance of RNN-based models (65 papers) among all DL model choices, followed by DMLP (23 papers) and CNN (20 papers). The inner circle represents all years considered, whereas the outer circle covers only the studies within the last 3 years. We should note that RNN is a general model with several versions, including LSTM and GRU. Within RNN, researchers mostly prefer LSTM due to the relative ease of its model development phase; however, other types of RNN are also common. Figure 12 provides a snapshot of the RNN model distribution. As mentioned above, LSTM had the highest interest among all with 58 papers, while Vanilla RNN and GRU had 27 and 10 papers, respectively. Hence, it is clear that LSTM was the most popular DL model for financial time series forecasting or regression studies.
Meanwhile, DMLP and CNN were generally preferred for classification problems. Since time series data generally consists of temporal components, some data preprocessing might be required before the actual classification can occur. Hence, a lot of these implementations utilize feature extraction and selection techniques along with possible dimensionality reduction methods. A lot of researchers decided to use DMLP mostly because its shallow version, MLP, has been used extensively before and has a proven track record in many different financial applications, including financial time series forecasting. Consistent with our observations, DMLP was also mostly preferred in stock, index and, in particular, trend forecasting, since the latter is, by definition, a classification problem with two (uptrend or downtrend) or three (uptrend, stationary or downtrend) classes.
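The two- and three-class formulation of trend forecasting mentioned above can be sketched as a simple labeling step. The function name, the use of the forward return and the `threshold` parameter that defines the "stationary" band are illustrative assumptions; the surveyed papers use a variety of labeling rules.

```python
import numpy as np

def trend_labels(prices, horizon=1, threshold=0.0):
    """Label each time step by the sign of its forward return.

    Returns +1 (uptrend), -1 (downtrend) or 0 (stationary) for every
    index that has a price `horizon` steps ahead.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    prices = np.asarray(prices, dtype=np.float64)
    forward_return = prices[horizon:] / prices[:-horizon] - 1.0
    labels = np.zeros(forward_return.shape, dtype=np.int64)
    labels[forward_return > threshold] = 1
    labels[forward_return < -threshold] = -1
    return labels

prices = [100.0, 101.0, 101.05, 99.0, 99.0]
# Moves smaller than 0.1% in either direction count as stationary.
three_class = trend_labels(prices, threshold=0.001)
print(three_class)  # [ 1  0 -1  0]
```

With `threshold=0.0` the stationary class only captures exactly unchanged prices, which approximates the two-class (uptrend or downtrend) setting.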
In addition to DMLP, CNN was also a popular choice for classification-type financial time series forecasting implementations. Most of these studies appeared within the last 3 years. As mentioned before, in order to convert the time-varying sequential data into a more stationary, classifiable form, some preprocessing might be necessary. Even though some 1D representations exist, the 2D implementation of CNN was more common, mostly inherited from the image recognition applications of CNN in computer vision. In some studies Chen_2016_d; Sezer_2019; Sezer_2017; Sezer_2018; Tsantekidis_2017_a, innovative transformations of financial time series data into an image-like representation have been adopted and impressive performance results have been achieved. As a result, CNN might increase its share of interest for financial time series forecasting in the next few years.
As one final note, Figure 13 shows which frameworks and platforms the researchers and developers used while implementing their work. We tried to extract this information from the papers to the best of our ability. However, we need to keep in mind that not every publication reported its development environment. Also, in most of the papers, the details were not given, which prevented us from building a more thorough comparison chart; for example, some researchers stated that they used Python without further information, while others mentioned the use of Keras or TensorFlow, providing more details. Also, within the “Other” section, the usage of PyTorch has been on the rise in the last year or so, even though this is not visible from the chart. Regardless, Python-related tools were the most influential technologies behind the implementations covered in this survey.
6 Discussion and Open Issues
From an application perspective, financial time series forecasting has a relatively narrow focus, i.e. the implementations were mainly based on price or trend prediction; nevertheless, depending on the underlying DL model, very different and versatile models exist in the literature. We need to keep in mind that, even though financial time series forecasting is a subset of time series studies, the profit-making expectations embedded in successful prediction models create some differences: higher prediction accuracy does not always translate into a profitable model. Hence, the risk and reward structure must also be taken into consideration. At this point, we will try to elaborate on our observations about these differences in various model designs and implementations.
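The gap between prediction accuracy and profitability can be seen in a small worked example. The return values and the always-long prediction below are made up for illustration, and transaction costs are ignored.

```python
import numpy as np

# Hypothetical daily returns of an asset and a model's direction calls.
actual_returns = np.array([0.01, 0.01, 0.01, 0.01, -0.08])
predicted_direction = np.array([1, 1, 1, 1, 1])  # +1 = long, -1 = short

# Directional accuracy: share of days where the predicted sign was right.
hits = np.sign(actual_returns) == predicted_direction
accuracy = hits.mean()

# Compounded return of following the calls with all capital, no costs.
strategy_returns = predicted_direction * actual_returns
cumulative_return = np.prod(1.0 + strategy_returns) - 1.0

print(f"accuracy = {accuracy:.2f}")
print(f"cumulative return = {cumulative_return:.4f}")
```

Here the model is right on four days out of five (80% accuracy), yet the single large miss leaves the strategy with a loss of roughly 4%, which is why risk-aware metrics are usually reported alongside accuracy.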
6.1 DL Models for financial time series forecasting
According to the publication statistics, LSTM was the preferred choice of most researchers for financial time series forecasting. LSTM and its variations utilized the time-varying data with feedback-embedded representations, resulting in higher performance for time series prediction implementations. Since most financial data, one way or another, includes time-dependent components, LSTM was the natural choice in financial time series forecasting problems. Meanwhile, LSTM is a special DL model derived from a more general model family, namely RNN.
Careful analysis of Figure 11 illustrates the dominance of RNN (which consists largely of LSTM). As a matter of fact, more than half of the published papers for time series forecasting studies fall into the RNN model category. Regardless of the problem type, price or trend prediction, the ordinal nature of the data representation led researchers to consider RNN, GRU and LSTM as viable model choices. Hence, RNN models were chosen, at least for benchmarking, in a lot of studies for performance comparison against other developed models.
Meanwhile, other models were also used for time series forecasting problems. Among those, DMLP attracted the most interest due to the dominance of its shallow cousin, MLP, and its wide acceptance and long history within the ML community. However, there is a fundamental difference in how DMLP and RNN-based models were used for financial time series prediction problems.
DMLP fits well for both regression and classification problems. However, in general, data order independence must be preserved to better utilize the internal working dynamics of such networks, even though some adjustments can be made through the configuration of the learning algorithm. In most cases, either the trend components of the data need to be removed from the underlying time series, or some data transformations might be needed so that the resulting data becomes stationary. Regardless, some careful preprocessing might be necessary for the DMLP model to be successful. In contrast, RNN-based models can work directly with time-varying data, making it easier for researchers to develop DL models.
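One common way to carry out the preprocessing described above is to replace prices with log returns and cut the result into fixed-length windows, so that each sample is self-contained and can be shuffled freely. The sketch below assumes this particular transformation and a hypothetical helper name `make_supervised`; the surveyed studies use many other detrending and feature engineering schemes.

```python
import numpy as np

def make_supervised(prices, lag=5):
    """Turn a price series into (X, y) pairs of log returns.

    Log-differencing removes the price level trend; each row of X holds
    `lag` consecutive returns and y is the return that follows them.
    """
    prices = np.asarray(prices, dtype=np.float64)
    returns = np.diff(np.log(prices))
    if returns.size <= lag:
        raise ValueError("series too short for the requested lag")
    # Each window has lag inputs followed by one target value.
    windows = np.lib.stride_tricks.sliding_window_view(returns, lag + 1)
    return windows[:, :lag], windows[:, lag]

# Synthetic, steadily trending prices (1% log growth per step).
prices = 100.0 * np.exp(np.cumsum(np.full(20, 0.01)))
X, y = make_supervised(prices, lag=5)
print(X.shape, y.shape)  # (14, 5) (14,)
```

Although the synthetic price level keeps rising, every return in `X` and `y` is the same constant, which is the kind of stationary input a feedforward network can handle without knowing the position of a sample in time.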
As a result, most of the DMLP implementations had embedded data preprocessing before the learning stage. However, this inconvenience did not prevent researchers from using DMLP and its variations during their model development process. Instead, a lot of versatile data representations were attempted in order to achieve higher overall prediction performance. A combination of fundamental and/or technical analysis parameters, along with other features such as financial sentiment obtained through text mining, was embedded into such models. In most of the DMLP studies, the corresponding problem was treated as classification, especially in trend prediction models, whereas RNN-based models directly predicted the next value of the time series. Both approaches had some success in beating the underlying benchmark; hence, it is not possible to claim victory of one model type over the other. However, as a general rule of thumb, researchers prefer RNN-based models for time series regression and DMLP for trend classification (or buy-sell point identification).
Another model that has recently started becoming popular is CNN. CNN also works better for classification problems and, unlike RNN-based models, it is more suitable for non-time-varying or static data representations. The comments for DMLP are also mostly valid for CNN. Furthermore, unlike DMLP, CNN mostly requires locality within the data representation for better-performing classification results. One particular implementation area of CNN is image-based object recognition. In recent years, CNN-based models have dominated this field, handily outperforming all other models. Meanwhile, most financial data is time-varying, and it might not be easy to apply CNN directly to financial applications. However, in some recent studies, various independent research groups followed an innovative transformation of 1D time-varying financial data into 2D, mostly stationary, image-like data so that they could utilize the power of CNN through adaptive filtering and implicit dimensionality reduction. With that approach, they were able to come up with successful models.
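To make the 1D-to-2D idea concrete, the sketch below builds a small image-like matrix whose rows are moving averages with increasing periods and whose columns are consecutive days, so that neighbouring cells are related along both axes. The choice of simple moving averages, the periods, the image size and the min-max scaling are all assumptions for illustration and do not reproduce the exact construction of any cited study.

```python
import numpy as np

def sma(series, period):
    """Simple moving average; the first period-1 entries are NaN."""
    out = np.full(series.shape, np.nan)
    cumsum = np.cumsum(np.insert(series, 0, 0.0))
    out[period - 1:] = (cumsum[period:] - cumsum[:-period]) / period
    return out

def series_to_image(prices, periods=(2, 3, 4, 5), width=4):
    """Build a len(periods) x width matrix from the latest `width` days.

    Row i holds the moving average with period periods[i]; columns run
    over consecutive days.
    """
    prices = np.asarray(prices, dtype=np.float64)
    if prices.size < max(periods) + width - 1:
        raise ValueError("series too short for the requested image")
    rows = [sma(prices, p)[-width:] for p in periods]
    image = np.vstack(rows)
    # Min-max scale to [0, 1], as is common for image inputs.
    low, high = image.min(), image.max()
    if high == low:
        return np.zeros_like(image)
    return (image - low) / (high - low)

prices = np.arange(1.0, 11.0)  # synthetic series: 1, 2, ..., 10
image = series_to_image(prices)
print(image.shape)  # (4, 4)
```

Ordering the rows by period is what gives the matrix the locality that a convolutional filter relies on; a random row order would remove that structure.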
There is also a rising trend to use deep RL-based financial algorithmic trading implementations; these are mostly associated with various agent-based models where different agents interact and learn from their interactions. This field has even more opportunities to offer with advancements in financial sentiment analysis through text mining to capture investor psychology; as a result, behavioral finance can benefit from these particular studies associated with RL-based learning models coupled with agent-based studies.
Other models, including DBN, AE and RBM, were also used by several researchers, and superior performance was reported in some of their work; however, interested readers need to check these studies case by case to see how they were modelled from both the data representation and the learning point of view.
6.2 Discussions on Selected Features
Regardless of the underlying forecasting problem, the raw time series data was almost always embedded, directly or indirectly, within the feature vector; this is particularly true for RNN-based models. However, in most of the other model types, other features were also included. Fundamental analysis and technical analysis features were among the most favored choices for stock/index forecasting studies.
Meanwhile, in recent years, financial text mining has been getting more attention, mostly for extracting investor/trader sentiment. The streaming flow of financial news, tweets, statements and blogs has allowed researchers to build better and more versatile prediction and evaluation models that integrate numerical and textual data. The general methodology involves extracting financial sentiment through text mining and combining that information with fundamental/technical analysis data to achieve better overall performance. It is logical to assume that this trend will continue with the integration of more advanced text mining and NLP techniques.
6.3 Discussions on Forecasted Asset Types
Even though forex price forecasting is always popular among researchers and practitioners, stock/index forecasting has always attracted the most interest among all asset groups. Regardless, price/trend prediction and algo-trading models were mostly embedded within these prediction studies.
These days, another hot area in financial time series forecasting research involves cryptocurrencies. Cryptocurrency price prediction is in increasing demand from the financial community. Since the topic is fairly new, we might see more studies and implementations coming in due to high expectations and promising rewards.
There were also a number of publications in commodity price forecasting research, in particular on the price of oil. Oil price prediction is crucial due to its tremendous effect on world economic activities and planning. Meanwhile, gold is considered a safe investment, and almost every investor, at one time or another, considers allocating some portion of their portfolio to gold-related investments. In times of political uncertainty, a lot of people turn to gold to protect their savings. Even though we have not encountered a noteworthy study on gold price forecasting, due to its historical importance, there might be opportunities in this area in the years to come.
6.4 Open Issues and Future Work
Despite the general motivation for financial time series forecasting remaining fairly unchanged, the means of achieving the financial goals vary depending on the choices and tradeoffs between the traditional techniques and newly developed models. Since our fundamental focus is on the application of DL to financial time series studies, we will try to assess the current state of the research and extrapolate that into the future.
6.4.1 Model Choices for the Future
The dominance of RNN-based models for price/trend prediction will probably not disappear anytime soon, mainly due to their easy adaptation to most asset forecasting problems. Meanwhile, some enhanced versions of the original LSTM or RNN models, generally integrated with hybrid learning systems, have started becoming more common. Readers need to check individual studies and assess their performance to see which one best fits their particular needs and domain requirements.
We have observed increasing interest in 2D CNN implementations of financial forecasting problems that convert the time series into an image-like data type. This innovative methodology seems to work quite satisfactorily and provides promising opportunities. More studies of this kind will probably appear in the near future.
Nowadays, new models are generated from older models by modifying or enhancing them so that better performance can be achieved. Such topologies include GAN, Capsule networks, etc. They have been used in various non-financial studies; however, financial time series forecasting has not been investigated for those models yet. As such, there can be exciting opportunities from both the research and the practical point of view.
Another DL model that has not been investigated thoroughly is Graph CNN. Graphs can be used to represent portfolios, social networks of financial communities, fundamental analysis data, etc. Even though graph algorithms can be applied directly to such configurations, different graph representations can also be implemented for time series forecasting problems. Not much has been done on this particular topic; however, representing the time series data as graphs and applying graph analysis algorithms, or applying CNN on these graphs, are among the possibilities that researchers can choose.
As a final note on future models, we believe deep RL and agent-based models offer great opportunities for researchers. HFT algorithms and robo-advisory systems depend heavily on automated algorithmic trading systems that can decide what to buy and when to buy without any human intervention. The aforementioned models can fit very well in such challenging environments. The rise of the machines will also lead to a technological (and algorithmic) arms race between Fintech companies and quant funds to be the best in their never-ending search for “achieving alpha”. New research in these areas can be just what the doctor ordered.
6.4.2 Future Projections for Financial Time Series Forecasting
Most probably, for the foreseeable future, financial time series forecasting will maintain close research cooperation with other financial application areas such as algorithmic trading and portfolio management, as was the case before. However, changes in the available data characteristics and the introduction of new asset classes might not only alter the forecasting strategies of developers, but also force them to look for new or alternative techniques to better adapt to these challenging new working conditions. In addition, metrics like CRPS for evaluating probability distributions might be included for more thorough analysis.
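For readers unfamiliar with CRPS (continuous ranked probability score), a minimal sample-based estimate can be written with the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution F and y is the observed value. The function name and the inputs below are illustrative; this is a plain estimator sketch, not an implementation taken from any surveyed study.

```python
import numpy as np

def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    Lower is better; for a single-point forecast it reduces to the
    absolute error.
    """
    samples = np.asarray(samples, dtype=np.float64)
    term_obs = np.abs(samples - observation).mean()
    # Mean absolute difference over all pairs of forecast samples.
    term_spread = np.abs(samples[:, None] - samples[None, :]).mean()
    return float(term_obs - 0.5 * term_spread)

point_score = crps_from_samples([2.0], 3.0)        # absolute error: 1.0
spread_score = crps_from_samples([0.0, 2.0], 1.0)  # 1.0 - 0.5 * 1.0 = 0.5
print(point_score, spread_score)
```

The pairwise term costs O(n^2) memory in the number of samples, which is acceptable for a few thousand draws but would need a sorted-sample formulation for larger ensembles.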
One rising trend, not only for financial time series forecasting but for all intelligent decision support systems, is human-computer interaction and NLP research. Within that field, text mining and financial sentiment analysis are of particular importance to financial time series forecasting. Behavioral finance may benefit from new advancements in these fields.
In order to utilize the power of text mining, researchers started developing new data representations like Stock2Vec Dang_2018 that can be useful for combining textual and numerical data for better prediction models. Furthermore, NLP-based ensemble models that integrate data semantics with time series data might increase the accuracy of the existing models.
One area that can benefit a lot from interconnected financial markets is automated statistical arbitrage trading model development. It has been used in forex and commodity markets before. In addition, a lot of practitioners currently seek arbitrage opportunities in the cryptocurrency markets Fischer_2019, due to the huge number of coins available on various marketplaces. Price disruptions, high volatility and bid-ask spread variations create arbitrage opportunities across different platforms. Some opportunists develop software models that can track these price anomalies for the instant materialization of profits. Also, it is possible to construct pairs trading portfolios across different asset classes using appropriate models. It is possible that DL models can learn (or predict) these opportunities faster and more efficiently than classical rule-based systems. This will also benefit HFT studies that are constantly looking for faster and more efficient trading algorithms and embedded systems with minimum latency. In order to achieve that, Graphics Processing Unit (GPU) or Field Programmable Gate Array (FPGA) based hardware solutions embedded with DL models can be utilized. There is a lack of research on this hardware aspect of financial time series forecasting and algorithmic trading. As long as there is enough computing power available, it is worth investigating the possibilities for better algorithms, since the rewards are high.
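The classical rule-based baseline that a DL model would have to beat in pairs trading can be sketched as a z-score rule on the log-price spread of two assets. The one-to-one hedge ratio, the 20-step lookback, the entry threshold of 2 and both function names are assumptions for illustration only.

```python
import numpy as np

def spread_zscore(price_a, price_b, lookback=20):
    """Z-score of the latest log-price spread against its recent history."""
    a = np.log(np.asarray(price_a, dtype=np.float64))
    b = np.log(np.asarray(price_b, dtype=np.float64))
    spread = (a - b)[-lookback:]  # assumes a hedge ratio of one
    std = spread.std()
    if std == 0.0:
        return 0.0
    return float((spread[-1] - spread.mean()) / std)

def signal(z, entry=2.0):
    """-1: sell A and buy B, +1: buy A and sell B, 0: stay flat."""
    if z > entry:
        return -1
    if z < -entry:
        return 1
    return 0

# Synthetic example: A jumps 10% on the last step while B stays flat.
price_a = [100.0] * 19 + [110.0]
price_b = [100.0] * 20
z = spread_zscore(price_a, price_b)
print(signal(z))  # -1
```

The rule bets on the spread reverting to its recent mean; a learned model would replace the fixed threshold with a prediction of whether and how fast that reversion occurs.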
6.5 Responses to our Initial Research Questions
We are now ready to go back to our initially stated research questions. Our question and answer pairs, through our observations, are as follows:

1.
Which DL models are used for financial time series forecasting?
Response: RNN-based models (in particular LSTM) are the most commonly used models. Meanwhile, CNN and DMLP have been used extensively in classification-type implementations (like trend classification) as long as appropriate data processing is applied to the raw data.

2.
How does the performance of DL models compare with that of their traditional machine learning counterparts?
Response: In the majority of the studies, DL models were better than ML. However, there were also many cases where their performances were comparable. There were even two particular studies (Dezsi_2016; Sermpinis_2014) where ML models performed better than DL models. Meanwhile, the preference for DL implementations over ML models is growing. Advances in computing power, availability of big data, superior performance, implicit feature learning capabilities and user-friendly model development environments for DL models are among the main reasons for this migration.

3.
What is the future direction for DL research for financial time series forecasting?
Response: NLP, semantics and text mining-based hybrid models ensembled with time series data might be more common in the near future.
7 Conclusions
Financial time series forecasting has been very popular among ML researchers for more than 40 years. The financial community got a new boost lately with the introduction of DL implementations for financial prediction research, and a lot of new publications appeared accordingly. In our survey, we wanted to review the existing studies to provide a snapshot of the current research status of DL implementations for financial time series forecasting. We grouped the studies according to their intended asset class along with the preferred DL model associated with the problem. Our findings indicate that, even though financial forecasting has a long research history, overall interest within the DL community is on the rise through the use of new DL models; hence, a lot of opportunities exist for researchers.
8 Acknowledgement
This work is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) grant no 215E248.