Refinements of Barndorff-Nielsen and Shephard model: an analysis of crude oil price with machine learning
Abstract
A commonly used stochastic model for derivative and commodity market analysis is the Barndorff-Nielsen and Shephard (BNS) model. Though this model is very efficient and analytically tractable, it suffers from the absence of long-range dependence and from many other issues. For this paper, the analysis is restricted to crude oil price dynamics. A simple way of improving the BNS model with the implementation of various machine learning algorithms is proposed. This refined BNS model is more efficient and has fewer parameters than other models that are used in practice as improvements of the BNS model. The procedure and the model show the application of data science for extracting a “deterministic component” out of processes that are usually considered to be completely stochastic. Empirical applications validate the efficacy of the proposed model for long-range dependence.
Key Words: Machine Learning, Deep Learning, Stochastic Model, Lévy Processes, Subordinator.
1 Introduction
One of the most prominent tools in modern big data analysis is machine learning, which is about extracting knowledge from large data sets. The application of machine learning methods has recently become ubiquitous in everyday life, and machine learning has had a tremendous influence on the way data-driven research is done today. The tools can be applied to diverse scientific problems such as understanding stars, finding distant planets, discovering new particles, analyzing DNA sequences, and providing personalized cancer treatments.
Crude oil is a commodity of fundamental importance. Consequently, an analysis of the dynamics of crude oil price time series is crucial: it allows one to ascertain the potential impacts of oil price shocks on several economies and on other financial assets (see [30]). As observed in [29], long-range dependence is evident in various energy futures markets. Many other existing works are dedicated to the analysis of the dynamics of crude oil prices. In [15], various econometric models used to forecast crude oil prices are summarized and interpreted. In [14], a deep learning model is applied to crude oil prices and a hybrid crude oil price forecasting model is provided. In [13], oil producers’ decisions in Cournot competitions are described through continuum dynamic mean field games. In related work (see [12]), a modified Hotelling’s rule for games with stochastic demand is discussed. In [26], machine learning algorithms are implemented to analyze the oil price dynamics for the Bakken region in the United States.
Paper [23] uses a convolutional neural network to forecast crude oil prices through online media text mining. Paper [2] discusses applications of the hierarchical conceptual model and the artificial neural network-quantitative model to crude oil prices. In [32], denoising autoencoders and bootstrap aggregation are combined to forecast crude oil prices. Paper [17] evaluates the accuracy of machine learning support vector regression models for forecasting crude oil prices.
The application of machine learning to other financial data is also becoming more common. In [21], a machine learning algorithm is applied to state-contingent claims and stochastic discount factors in financial markets. In [22], a machine learning algorithm is implemented to determine whether bank-differentiating factors influence firm choices in initial public offerings. In [25], a multicriteria decision aid model is used in an attempt to replicate the credit ratings of Asian banks.
A commonly used stochastic model for derivative and commodity market analysis is the Barndorff-Nielsen and Shephard (BNS) model (see [4, 6, 7, 8, 19, 20, 28, 31]). Though this model is very efficient and simple to use, it suffers from the absence of long-range dependence and from many other issues. In this paper, we propose a simple way of improving the BNS model with the implementation of various machine learning algorithms. After that, we validate the performance of the model: we use staging data sets that are close to production and observe how the model behaves; if it gives good results, the model is deployed. Finally, feedback is used to determine whether the model meets the business need for which it was built.
In this paper, we apply machine learning to the analysis of crude oil price data. In order to understand the data, we collect ten years of daily historical price data for crude oil. We then conduct an exploratory data analysis, in which we look at basic statistics of the data, such as the mean, median, and mode, and at the correlations between the different labels. This exploratory data analysis gives direction to the model building. Empirical analysis shows the presence of long memory in crude oil time series; however, the intensity of the long-range dependence decreases over time. It is well established that the classical BNS model is not well suited to such data. In this paper, based on machine learning algorithms, we derive a refined BNS model and apply it to crude oil price dynamics.
The organization of the paper is as follows. In Section 2, we briefly describe the BNS model and why an improvement of this model is necessary for the analysis of crude oil price data. We find that the improvement of the model depends on machine learning analysis of the crude oil price data. The data analysis is provided in Section 3. A brief conclusion is provided in Section 4.
2 An improved Barndorff-Nielsen and Shephard model
Many models in recent literature try to capture the stochastic behavior of time series. For example, in the case of the BNS model, the stock or commodity price $S={({S}_{t})}_{t\ge 0}$ on some filtered probability space $(\mathrm{\Omega},\mathcal{F},{({\mathcal{F}}_{t})}_{0\le t\le T},\mathbb{P})$ is modeled by
$${S}_{t}={S}_{0}\mathrm{exp}({X}_{t}),$$  (2.1) 
$$d{X}_{t}=(\mu +\beta {\sigma}_{t}^{2})dt+{\sigma}_{t}d{W}_{t}+\rho d{Z}_{\lambda t},$$  (2.2) 
$$d\sigma_t^2=-\lambda\sigma_t^2\,dt+dZ_{\lambda t},\quad\sigma_0^2>0,$$  (2.3)
where the parameters $\mu,\beta,\rho,\lambda\in\mathbb{R}$ with $\lambda>0$ and $\rho\le 0$, $r$ is the risk-free interest rate, and the stock or commodity is traded up to a fixed horizon date $T$. In this model, $W_t$ is a Brownian motion and the process $Z_t$ is a subordinator. Also, $W_t$ and $Z_t$ are assumed to be independent, and $(\mathcal{F}_t)$ is assumed to be the usual augmentation of the filtration generated by the pair $(W_t,Z_t)$.
However, empirical data suggest that volatility ($\sigma_t$) usually fails to respond immediately to sudden fluctuations of a stock or commodity price. The issue of the market’s delayed response was raised in several papers (see [5, 11, 16]). Paper [3] deals with this issue through a delayed option price formula in which the volatility has the form $\sigma(S_{t-b})$ for some delay parameter $b>0$.
However, the results and the theoretical framework are far from satisfactory, and several problems with the above model remain:

1.
Empirical results show that the jumps in volatility and stock or commodity price are positively correlated. However, unlike what is suggested by the model, they may not occur at the same time.

2.
For empirical data, the delay parameter $b$ is not deterministic.

3.
The performance of the model varies considerably depending both on the length and on the density of time in the observed time series. Slow convergence of the estimation procedure is essentially caused by high serial correlation between the latent variables and the parameters. The problem is particularly acute for a sparsely observed time series, or whenever the time series contains many data points.

4.
The BNS model does not incorporate the long-range dependence property, and it fails significantly over longer time ranges. On some occasions, even for time spans as short as two weeks, the model is unable to consistently capture the essential features of the related time series.
Some of these problems are addressed in various recent works. For example, in [27], the author presents a generalized version of the BNS model. Assuming ${Z}_{t}$ and ${Z}_{t}^{*}$ to be two independent Lévy subordinators, define
$$d\tilde{Z}_{\lambda t}=\rho'\,dZ_{\lambda t}+\sqrt{1-\rho'^2}\,dZ_{\lambda t}^{*},$$  (2.4)
which is also a Lévy subordinator provided $0\le\rho'\le 1$. Thus, for $0\le\rho'\le 1$, $Z$ and $\tilde{Z}$ are positively correlated Lévy subordinators. Suppose the dynamics of $S_t$ are given by (2.1) and (2.2), where $\sigma_t$ is given by
$$d\sigma_t^2=-\lambda\sigma_t^2\,dt+d\tilde{Z}_{\lambda t},\quad\sigma_0^2>0,$$  (2.5)
where $\tilde{Z}=(\tilde{Z}_{\lambda t})$ is a subordinator that is independent of $W$ but positively correlated with $Z$, as described above. Assume that the dynamics of $S=(S_t)$ are given by (2.1), (2.2), and (2.5). In [27], it is shown that this generalized model can fit the option price and the volatility in a correlated but different way, which is not possible in the classical BNS model. This result is used for pricing vanilla options and for developing theorems on parameter estimation for some particular variance processes.
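The construction (2.4) can be checked numerically. The following is a minimal sketch, assuming (as an illustration only, not a choice made in [27]) that $Z$ and $Z^*$ are independent copies of the same compound Poisson subordinator with exponentially distributed jumps; the function names and parameter values are hypothetical. Under that assumption, $\text{Var}(\tilde{Z}_1)=\rho'^2\,\text{Var}(Z_1)+(1-\rho'^2)\,\text{Var}(Z_1)=\text{Var}(Z_1)$, so the correlation between the increments of $Z$ and $\tilde{Z}$ equals $\rho'$.

```python
import numpy as np

def compound_poisson_increments(rate, alpha, n, rng):
    """n i.i.d. unit-time increments of a compound Poisson subordinator:
    Poisson(rate) many jumps per step, each jump Exp(alpha) distributed."""
    counts = rng.poisson(rate, size=n)
    # A sum of k i.i.d. Exp(alpha) jumps is Gamma(k, scale=1/alpha); it is zero when k == 0.
    sums = rng.gamma(np.maximum(counts, 1), 1.0 / alpha)
    return np.where(counts > 0, sums, 0.0)

def correlated_subordinator_increments(rho_prime, rate, alpha, n, seed=0):
    """Increments of Z and of Z~ = rho' Z + sqrt(1 - rho'^2) Z*, as in (2.4)."""
    if not 0.0 <= rho_prime <= 1.0:
        raise ValueError("rho' must lie in [0, 1] for Z~ to be a subordinator")
    rng = np.random.default_rng(seed)
    dZ = compound_poisson_increments(rate, alpha, n, rng)
    dZ_star = compound_poisson_increments(rate, alpha, n, rng)   # independent copy Z*
    dZ_tilde = rho_prime * dZ + np.sqrt(1.0 - rho_prime**2) * dZ_star
    return dZ, dZ_tilde

dZ, dZ_tilde = correlated_subordinator_increments(0.6, rate=3.0, alpha=10.0, n=200_000)
print(np.corrcoef(dZ, dZ_tilde)[0, 1])   # close to 0.6 when Z and Z* share the same law
```

Because both coefficients in (2.4) are nonnegative, every increment of $\tilde{Z}$ is nonnegative, which is why the restriction $0\le\rho'\le 1$ keeps $\tilde{Z}$ a subordinator.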
The literature (see [4, 18]) shows that superpositions of Ornstein-Uhlenbeck (OU) type processes can be used to achieve long-range dependence. A limiting procedure creates processes that are self-similar with stationary increments. However, paper [6] warns against fitting a large number of OU processes via a formal likelihood-based method. An alternative approach is to use heavy-tailed jump distributions in the model.
In this paper, we address issues #2, #3, and #4 described above. We show that for crude oil price dynamics, the jump is not completely stochastic. On the contrary, there is a deterministic element in the crude oil price that can be exploited to apply the existing models over an extended period of time. We show from an empirical analysis that the dynamics of $X_t$ in (2.2) can be described more accurately by a convex combination of two independent subordinators, $Z$ and $Z^{(b)}$:
$$dX_t=(\mu+\beta\sigma_t^2)\,dt+\sigma_t\,dW_t+\rho\left((1-\theta)\,dZ_{\lambda t}+\theta\,dZ_{\lambda t}^{(b)}\right),$$  (2.6)
where $\theta\in[0,1]$ is a deterministic parameter whose value we determine with several machine learning algorithms. In (2.6), $\lambda>0$ is a time-scale parameter, and the process $Z^{(b)}$ is a subordinator with greater intensity than $Z$, that is, a subordinator with a larger Lévy density. For instance, if the Lévy densities of $Z$ and $Z^{(b)}$ are given by $\nu_1\alpha e^{-\alpha x}$ and $\nu_2\alpha e^{-\alpha x}$, respectively (for $\alpha,\nu_1,\nu_2>0$ and $x>0$), then $\nu_2>\nu_1$.
In this case (2.5) will be given by
$$d\sigma_t^2=-\lambda\sigma_t^2\,dt+(1-\theta')\,dZ_{\lambda t}+\theta'\,dZ_{\lambda t}^{(b)},\quad\sigma_0^2>0,$$  (2.7)
where, as before, ${\theta}^{\prime}\in [0,1]$ is deterministic. For simplicity, we assume $\theta ={\theta}^{\prime}$ for the rest of this paper.
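To make the refined dynamics concrete, here is a minimal Euler-scheme sketch of (2.1), (2.6), and (2.7) with $\theta'=\theta$. It assumes, for illustration, that $Z$ and $Z^{(b)}$ are compound Poisson subordinators with the exponential Lévy densities $\nu_1\alpha e^{-\alpha x}$ and $\nu_2\alpha e^{-\alpha x}$ mentioned above, so that over a step of length $\Delta t$ the time-changed process $Z_{\lambda t}$ has a Poisson($\nu\lambda\Delta t$) number of Exp($\alpha$) jumps. All parameter values and function names are hypothetical, and letting $\theta$ vary from step to step is an extension used here only to illustrate a switch from $Z$ to $Z^{(b)}$.

```python
import numpy as np

def jump_increments(nu, alpha, lam, dt, n_steps, rng):
    """Increments of Z_{lambda t} over steps of length dt for the subordinator with
    Levy density nu*alpha*exp(-alpha x): Poisson(nu*lam*dt) jumps, each Exp(alpha)."""
    counts = rng.poisson(nu * lam * dt, size=n_steps)
    sums = rng.gamma(np.maximum(counts, 1), 1.0 / alpha)   # sum of k Exp(alpha) jumps
    return np.where(counts > 0, sums, 0.0)

def simulate_refined_bns(S0, mu, beta, rho, lam, sigma2_0, nu1, nu2, alpha,
                         theta, T=1.0, n_steps=252, seed=0):
    """Euler scheme for (2.1), (2.6) and (2.7) with theta' = theta.
    `theta` may be a scalar or an array with one value per time step."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    if lam * dt >= 1.0:
        raise ValueError("lam * dt must be < 1 so the Euler variance update stays positive")
    theta = np.broadcast_to(np.asarray(theta, dtype=float), (n_steps,))
    dZ = jump_increments(nu1, alpha, lam, dt, n_steps, rng)    # base subordinator Z
    dZb = jump_increments(nu2, alpha, lam, dt, n_steps, rng)   # more intense Z^(b), nu2 > nu1
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    mixed = (1.0 - theta) * dZ + theta * dZb                   # convex combination in (2.6)-(2.7)

    X = np.zeros(n_steps + 1)
    sigma2 = np.empty(n_steps + 1)
    sigma2[0] = sigma2_0
    for k in range(n_steps):
        X[k + 1] = (X[k] + (mu + beta * sigma2[k]) * dt
                    + np.sqrt(sigma2[k]) * dW[k] + rho * mixed[k])
        sigma2[k + 1] = sigma2[k] - lam * sigma2[k] * dt + mixed[k]
    return S0 * np.exp(X), sigma2

# Hypothetical parameters; theta switches from 0 to 1 halfway through the year.
theta_path = np.r_[np.zeros(126), np.ones(126)]
S, sigma2 = simulate_refined_bns(S0=60.0, mu=0.0, beta=0.0, rho=-0.5, lam=2.0,
                                 sigma2_0=0.1, nu1=5.0, nu2=20.0, alpha=50.0,
                                 theta=theta_path)
```

With $\theta=0$ throughout, the sketch reduces to an Euler discretization of the classical model (2.1)-(2.3).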
Theorem 2.1.
If the jump measure associated with the subordinator $Z$ is $J_Z$ and $J(s)=\int_0^s\int_{\mathbb{R}^+}J_Z(\lambda\,d\tau,dy)$, then for the log-return of the classical BNS model given by (2.1), (2.2), and (2.3),
$$\text{Corr}(X_t,X_s)=\frac{\int_0^s\sigma_\tau^2\,d\tau+\rho^2J(s)}{\sqrt{\left(\int_0^t\sigma_\tau^2\,d\tau+t\rho^2\lambda\,\text{Var}(Z_1)\right)\left(\int_0^s\sigma_\tau^2\,d\tau+s\rho^2\lambda\,\text{Var}(Z_1)\right)}},$$  (2.8)
for $t>s$.
Proof.
Clearly, for $t>s$,
$$\text{Cov}({X}_{t},{X}_{s})={\int}_{0}^{s}{\sigma}_{\tau}^{2}\mathit{d}\tau +{\rho}^{2}{\int}_{0}^{s}{\int}_{{\mathbb{R}}^{+}}{J}_{Z}(\lambda d\tau ,dy).$$ 
Note that the instantaneous variance of the log-return is given by $({\sigma}_{t}^{2}+{\rho}^{2}\lambda \text{Var}({Z}_{1}))dt$. Consequently, we obtain (2.8). ∎
Note that for a fixed $s$, if $t$ increases, then $\text{Corr}({X}_{t},{X}_{s})$ quickly decreases. The proof of the following result is very similar to the proof of Theorem 2.1.
Theorem 2.2.
If the jump measures associated with the subordinators $Z$ and $Z^{(b)}$ are $J_Z$ and $J_Z^{(b)}$, respectively, and $J(s)=\int_0^s\int_{\mathbb{R}^+}J_Z(\lambda\,d\tau,dy)$, $J^{(b)}(s)=\int_0^s\int_{\mathbb{R}^+}J_Z^{(b)}(\lambda\,d\tau,dy)$, then for the log-return of the refined BNS model given by (2.1), (2.6), and (2.7),
$$\text{Corr}(X_t,X_s)=\frac{\int_0^s\sigma_\tau^2\,d\tau+\rho^2(1-\theta)^2J(s)+\rho^2\theta^2J^{(b)}(s)}{\sqrt{\alpha(t)\,\alpha(s)}},$$  (2.9)
for $t>s$, where $\alpha(\nu)=\int_0^\nu\sigma_\tau^2\,d\tau+\nu\rho^2\lambda\left((1-\theta)^2\,\text{Var}(Z_1)+\theta^2\,\text{Var}(Z_1^{(b)})\right)$.
Proof.
We observe for $t>s$,
$$\text{Cov}(X_t,X_s)=\int_0^s\sigma_\tau^2\,d\tau+\rho^2(1-\theta)^2\int_0^s\int_{\mathbb{R}^+}J_Z(\lambda\,d\tau,dy)+\rho^2\theta^2\int_0^s\int_{\mathbb{R}^+}J_Z^{(b)}(\lambda\,d\tau,dy).$$
Also, the variances of the log-returns $X_t$ and $X_s$ are $\alpha(t)=\int_0^t\sigma_\tau^2\,d\tau+t\rho^2\lambda\left((1-\theta)^2\,\text{Var}(Z_1)+\theta^2\,\text{Var}(Z_1^{(b)})\right)$ and $\alpha(s)=\int_0^s\sigma_\tau^2\,d\tau+s\rho^2\lambda\left((1-\theta)^2\,\text{Var}(Z_1)+\theta^2\,\text{Var}(Z_1^{(b)})\right)$, respectively. Consequently, we obtain (2.9). ∎
Note that as $\theta $ is constantly adjusted, for a fixed $s$, the value of $t$ always has an upper limit. Consequently, $\text{Corr}({X}_{t},{X}_{s})$ never becomes “too small”. This is the major difference between the results in Theorem 2.1 and Theorem 2.2.
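Formula (2.9) is straightforward to evaluate once its ingredients are available. The sketch below is a hypothetical helper, not code from the paper: it takes the integrated variance as a function and $J(s)$, $J^{(b)}(s)$ as numbers, and obtains $\text{Var}(Z_1)$ and $\text{Var}(Z_1^{(b)})$ from the exponential Lévy densities $\nu\alpha e^{-\alpha x}$, for which $\text{Var}(Z_1)=\int_0^\infty x^2\,\nu\alpha e^{-\alpha x}\,dx=2\nu/\alpha^2$. With constant $\sigma^2$ and the illustrative choices $J(s)=\lambda s\,\text{Var}(Z_1)$ and $J^{(b)}(s)=\lambda s\,\text{Var}(Z_1^{(b)})$, the numerator equals $\alpha(s)$ for every $\theta$, so with $s=1$ the correlation reduces to $1/\sqrt{t}$, which makes the decay in $t$ for fixed $s$ easy to see.

```python
import math

def z1_moments(nu, alpha):
    """Mean and variance of Z_1 for the Levy density nu*alpha*exp(-alpha*x), x > 0
    (the integrals of x and x^2 against the density)."""
    return nu / alpha, 2.0 * nu / alpha**2

def refined_bns_corr(t, s, int_sigma2, J_s, Jb_s, rho, lam, var_z1, var_zb1, theta):
    """Corr(X_t, X_s) from (2.9) for t > s; theta = 0 recovers (2.8).
    int_sigma2(u) must return int_0^u sigma_tau^2 dtau; J_s and Jb_s are J(s) and J^(b)(s)."""
    if not t > s:
        raise ValueError("(2.9) is stated for t > s")

    def alpha_of(u):
        # alpha(u) as defined in Theorem 2.2
        return int_sigma2(u) + u * rho**2 * lam * ((1 - theta)**2 * var_z1 + theta**2 * var_zb1)

    numerator = int_sigma2(s) + rho**2 * ((1 - theta)**2 * J_s + theta**2 * Jb_s)
    return numerator / math.sqrt(alpha_of(t) * alpha_of(s))

# Hypothetical inputs: constant sigma^2 = 0.1 and exponential Levy densities with nu2 > nu1.
lam, s = 2.0, 1.0
_, var_z1 = z1_moments(nu=0.5, alpha=10.0)     # 2 * 0.5 / 100 = 0.01
_, var_zb1 = z1_moments(nu=2.0, alpha=10.0)    # 2 * 2.0 / 100 = 0.04
args = dict(int_sigma2=lambda u: 0.1 * u, rho=-0.5, lam=lam,
            var_z1=var_z1, var_zb1=var_zb1,
            J_s=lam * s * var_z1, Jb_s=lam * s * var_zb1)   # illustrative J(s), J^(b)(s)
for t in (2.0, 4.0, 8.0):
    # With these inputs both values equal 1/sqrt(t).
    print(t, refined_bns_corr(t, s, theta=0.0, **args), refined_bns_corr(t, s, theta=0.5, **args))
```

In practice $J(s)$ and $J^{(b)}(s)$ are path quantities, so they would be supplied from data or simulation rather than chosen as above.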
The advantages of the refined BNS dynamics given by (2.1), (2.6), and (2.7) over the existing models are significant. First, this minor change incorporates long-range dependence without altering the structure of the model; the model is more efficient, yet has far fewer parameters than the superposition models. Second, the performance of the model for a sparsely observed time series is improved. Third, an estimate of the delay parameter $b$ (mentioned in #2) can be obtained. Finally, and possibly most importantly, the procedure and the model show how data science can extract a deterministic component from processes that have so far been considered completely stochastic. In this paper, we restrict our analysis to crude oil price dynamics; however, the method could possibly be applied to any compatible time series.
3 Data analysis
Crude oil is a commodity of fundamental importance, so an analysis of the dynamics of its price time series is crucial: it allows one to ascertain the potential impacts of oil price shocks on several economies and on other financial assets (see [30]). As observed in [29], long-range dependence is evident in various energy futures markets. Empirical analysis shows the presence of long memory in crude oil time series; however, the intensity of the long-range dependence decreases over time. As described at the beginning of Section 2, the classical BNS model is not well suited to such data. On the other hand, Theorem 2.2 shows that the refined BNS model proposed in this paper can be applied in this case.
We consider crude oil price data over a period of 10 years: the West Texas Intermediate (WTI or NYMEX) crude oil price data set for the period June 1, 2009 to May 30, 2019 (Figure 1). The set contains a total of $2,530$ data points. For convenience, we index the dates (for available data) from 0 (June 1, 2009) to 2529 (May 30, 2019). Table 1 summarizes various estimates for the data set.

Table 1.

| | Daily Price Change | Daily Price Change % |
|---|---|---|
| Mean | -0.0047 | 0.01370% |
| Median | 0.04399 | 0.06521% |
| Maximum | 7.62 | 12.32% |
| Minimum | -8.90 | -10.53% |

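The statistics in Table 1 can be computed from a series of daily closes along the following lines. This is a minimal NumPy sketch with made-up prices (not the WTI data), and the function name is hypothetical. It also illustrates a useful identity: because the daily changes telescope, their mean equals the last close minus the first close, divided by the number of changes, so the sign of the mean daily change is the sign of the net price change over the sample.

```python
import numpy as np

def daily_change_summary(close):
    """Mean, median, maximum and minimum of daily price changes and daily % changes."""
    close = np.asarray(close, dtype=float)
    change = np.diff(close)                  # S_i - S_{i-1}
    pct = 100.0 * change / close[:-1]        # change relative to the previous close, in %

    def stats(x):
        return {"mean": x.mean(), "median": np.median(x), "max": x.max(), "min": x.min()}

    return {"change": stats(change), "pct": stats(pct)}

close = [40.0, 42.0, 39.9, 40.0]   # hypothetical closes, not the WTI data
summary = daily_change_summary(close)
# The mean daily change telescopes to (last close - first close) / (number of changes).
assert np.isclose(summary["change"]["mean"], (close[-1] - close[0]) / (len(close) - 1))
print(summary)
```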
To turn the data set into a classification problem, we apply the following procedure (Steps 1 through 5):

1.
We conduct exploratory data analysis.

2.
We consider the daily close price for the historical oil price data. From the plots, we identify a value of $K$ that defines a “big jump” in the crude oil close price, and we identify the dates on which the close price is $K$ “points” or more below the close price of the previous day (for example, if $K=1\%$, we find the dates on which the close price is at least $1\%$ below that of the previous business day).

3.
We create a new dataframe from the old one where “features” (columns) will be seven consecutive close prices. For example, if the close prices are
$${a}_{1},{a}_{2},{a}_{3},{a}_{4},{a}_{5},{a}_{6},{a}_{7},{a}_{8},{a}_{9},{a}_{10},\mathrm{\cdots};$$ then the first row of the data set will contain
$${a}_{1},{a}_{2},{a}_{3},{a}_{4},{a}_{5},{a}_{6},{a}_{7};$$ second row of the data set will contain
$${a}_{2},{a}_{3},{a}_{4},{a}_{5},{a}_{6},{a}_{7},{a}_{8};$$ etc.

4.
We create a new target column for the new dataframe (created in the preceding step) as follows: $\theta=1$ for each set of seven close prices that immediately precedes at least two jumps of size $K$ (or more) within the following seven days. Otherwise, we label the target column by $\theta=0$.
For example, suppose we identified $a_8$ and $a_{10}$ as “big jumps”. Then $\theta=1$ for the first row $a_1,a_2,a_3,a_4,a_5,a_6,a_7$.

5.
We run various machine learning classification algorithms in which the input is the close price for seven consecutive days and the output is the $\theta$-value (0 or 1). We evaluate the classification report and the confusion matrix in each case.
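Steps 2 through 4 can be sketched as follows. This is a minimal NumPy illustration under two assumptions not spelled out above: a “big jump” is a drop of at least the fraction $K$ relative to the previous close, and rows whose following seven-day horizon runs past the end of the data are dropped. The function names are hypothetical. The usage lines reproduce the worked example of Step 4.

```python
import numpy as np

def big_jump_flags(close, K=0.02):
    """Step 2: flags[i] is True when close[i] is at least a fraction K below close[i-1]."""
    close = np.asarray(close, dtype=float)
    flags = np.zeros(len(close), dtype=bool)
    flags[1:] = close[1:] <= (1.0 - K) * close[:-1]
    return flags

def make_theta_dataset(close, K=0.02, window=7, horizon=7, min_jumps=2):
    """Steps 3-4: rows of `window` consecutive closes, labelled theta = 1 when at least
    `min_jumps` big jumps occur in the `horizon` days immediately following the row."""
    close = np.asarray(close, dtype=float)
    flags = big_jump_flags(close, K)
    n_rows = len(close) - window - horizon + 1
    if n_rows < 1:
        raise ValueError("series too short for the chosen window and horizon")
    X = np.array([close[j:j + window] for j in range(n_rows)])
    theta = np.array([int(flags[j + window:j + window + horizon].sum() >= min_jumps)
                      for j in range(n_rows)])
    return X, theta

# Worked example of Step 4: a_8 and a_10 are big jumps (drops of at least 2%).
close = [100.0] * 7 + [97.0, 97.0, 94.0, 94.0, 94.0, 94.0, 94.0]   # a_1, ..., a_14
X, theta = make_theta_dataset(close, K=0.02)
print(X[0], theta[0])   # the first row is a_1, ..., a_7 and its label is theta = 1
```

Increasing `window` corresponds to the longer input windows discussed next, and `K` is the big-jump threshold of Step 2.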
We will show that we can find $\theta $ with reasonable accuracy and use this for (2.6). The result can be improved by adjusting the value of $K$ in Step 2. The result can be further improved by increasing the number of days (in Step 3) from seven to a higher number. It is worth noting that the various deep learning models provide a value of $\theta $ between $0$ and $1$. In Step 4, we approximate that by $0$ or $1$. However, the actual value of $\theta $ may be directly used in (2.6).
Figures 2, 3, and 4 provide various visualizations of crude oil close prices. Figures 5 and 6 provide a histogram of the daily price change and a histogram of the daily percentage change, respectively. We partition this data set in various ways; for each partition, we use a train-test split with respect to a given date. The figures are listed below.

Figure 1:
West Texas Intermediate (WTI or NYMEX) crude oil prices data set for the period June 1, 2009 to May 30, 2019 (crude oil close price).

Figure 2:
Yearly boxplot for the close oil price.

Figure 3:
Distribution plot for close oil price.

Figure 4:
Bar chart for close oil price.

Figure 5:
Histogram for daily change in close oil price.

Figure 6:
Histogram for daily change percentage in close oil price.
For the following analysis we use $K=2\%$, i.e., $\theta =1$ for the set of seven close prices that immediately precede at least two jumps of size $2\%$ (or more) in the following seven days. Otherwise, we use $\theta =0$.
We run various supervised learning algorithms on the crude oil price data, beginning with logistic regression (LR) and random forest (RF) classification of the data set. For logistic regression classification, given testing data $X$, $\mathbb{P}(\theta=1\mid X)=\frac{1}{1+e^{-\beta_0-\beta_1\cdot X}}$, where the scalar $\beta_0$ and the vector $\beta_1$ are determined from the training set by maximizing the appropriate log-likelihood function. The random forest classifier aggregates many decision trees, each built with a random sample of features. By randomly leaving out candidate features at each split, the random forest decorrelates the trees, so that averaging reduces the variance of the resulting model.
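For concreteness, the logistic regression step can be written out directly: $\beta_0$ and $\beta_1$ maximize the Bernoulli log-likelihood, whose gradient with respect to the linear predictor is $\theta-\mathbb{P}(\theta=1\mid X)$. The sketch below uses plain gradient ascent in NumPy together with a chronological train-test split (earlier rows for training, later rows for testing, as in a split with respect to a given date). It is an illustration only, with hypothetical names and synthetic standardized features; the implementation used for the tables is not specified, and a library routine such as scikit-learn's `LogisticRegression` would more likely be used in practice. Raw close prices would typically be standardized before fitting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, theta, lr=0.1, n_iter=5000):
    """Estimate beta_0 and beta_1 by gradient ascent on the average Bernoulli log-likelihood."""
    beta0, beta1 = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(beta0 + X @ beta1)      # current P(theta = 1 | X)
        resid = theta - p                   # gradient w.r.t. the linear predictor
        beta0 += lr * resid.mean()
        beta1 += lr * (X.T @ resid) / len(theta)
    return beta0, beta1

def chronological_split(X, theta, n_train):
    """Train on the first n_train rows (earlier dates), test on the rest (later dates)."""
    return X[:n_train], theta[:n_train], X[n_train:], theta[n_train:]

# Hypothetical standardized features and labels, not the crude oil data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))
theta = (X[:, 0] - X[:, 6] + 0.3 * rng.normal(size=300) > 0).astype(int)
X_tr, y_tr, X_te, y_te = chronological_split(X, theta, n_train=200)
b0, b1 = fit_logistic(X_tr, y_tr)
pred = (sigmoid(b0 + X_te @ b1) > 0.5).astype(int)
print("test accuracy:", (pred == y_te).mean())
```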
After that, we implement various deep learning techniques:

(A)
A neural network with two hidden layers (with activations $\mathrm{tanh}$ and ReLU) and an output layer (with a softmax activation function). For simplicity we approximate $\theta $ in (2.6) with 0 (“no big jump”) and 1 (“big jump”). For this approximation, we take $\theta =1$ if the output probability for the softmax activation function corresponding to $\theta =1$ is more than $0.3$.

(B)
Long short-term memory (LSTM) along with the neural network described in (A). LSTM is an artificial recurrent neural network (RNN) architecture that is designed to avoid the vanishing gradient problem. This problem is especially prominent when a vanilla RNN, constructed from regular neural network nodes, is used to model dependencies between time series values that are separated by a significant number of days. LSTM has built-in feedback connections that make it well suited to financial time series. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell retains values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.

(C)
LSTM along with a batch normalizer (BN) and the neural network described in (A). A batch normalizer standardizes and rescales the output of a given layer in the deep network. To increase the stability of a neural network, batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. It also reduces how much the hidden unit values shift around (known as internal covariate shift). This process centers all the inputs around zero, so there is not much change in each layer’s input. Consequently, layers in the network can learn from backpropagation simultaneously, without waiting for the previous layer to learn, which speeds up the training of networks.
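The output rule of network (A) and the batch normalization operation used in (C) can be sketched in NumPy. This is a minimal forward pass with hypothetical layer sizes (7 inputs, hidden layers of 16 and 8 units) and untrained random weights, intended only to show how the softmax output is turned into $\theta$ with the 0.3 threshold; the LSTM layers of (B) and (C) are not reproduced, and `batch_norm` shows the training-mode normalization step in isolation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_A(X, params):
    """Network (A): tanh hidden layer, ReLU hidden layer, 2-unit softmax output."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)             # column 0: P(theta=0), column 1: P(theta=1)

def theta_from_probs(probs, threshold=0.3):
    """theta = 1 when the softmax probability of theta = 1 exceeds the threshold."""
    return (probs[:, 1] > threshold).astype(int)

def batch_norm(h, gamma=1.0, shift=0.0, eps=1e-5):
    """Training-mode batch normalization: standardize each unit with the batch mean and
    standard deviation, then rescale by gamma and add a shift."""
    return gamma * (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps) + shift

# Hypothetical layer sizes (7 -> 16 -> 8 -> 2) and untrained random weights.
rng = np.random.default_rng(0)
params = []
for n_in, n_out in [(7, 16), (16, 8), (8, 2)]:
    params += [rng.normal(scale=n_in ** -0.5, size=(n_in, n_out)), np.zeros(n_out)]
X = rng.normal(size=(5, 7))
probs = forward_A(X, params)
print(theta_from_probs(probs))
```

A threshold below 0.5 makes the classifier more willing to predict $\theta=1$, trading precision for recall on the rarer “big jump” class.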
Note that, once the value of $\theta $ is obtained from the training data, we use this value for the refined BNS model (in (2.6) and (2.7)). In particular, we use this deterministic $\theta $ value for the testing data. In addition to that, this deterministic value of $\theta $ can be used for prediction using the refined BNS model.
For the following tables (Table 2 through Table 13), we provide classification reports for various machine learning algorithms. For the testing data, true positive, true negative, false positive, and false negative are denoted as TP, TN, FP, and FN, respectively. In the context of this study, “TP” and “TN” are the cases when the model correctly predicts $\theta =1$, and $\theta =0$, respectively. Also, in the context of this study, “FP” is the case when $\theta =0$ is predicted as $\theta =1$; and “FN” is the case when $\theta =1$ is predicted as $\theta =0$. The following measurements are standard:
$$\text{precision}=\frac{\text{TP}}{\text{TP}+\text{FP}},$$
$$\text{recall}=\frac{\text{TP}}{\text{TP}+\text{FN}}.$$
The f1-score is the harmonic mean of precision and recall. The scores for each class measure how accurately the classifier assigns data points to that particular class compared to all other classes. The support is the number of samples of the true response that lie in that class.
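As a check on these definitions, the following sketch computes per-class precision, recall, f1-score, and support from confusion-matrix counts; the function names are hypothetical, and library routines such as scikit-learn's `classification_report` produce the same quantities. The counts TN = 45, FP = 12, FN = 28, TP = 16 are not reported in the paper: they are inferred here as the unique integer counts that reproduce, after rounding, the LSTM (B) column of the first classification table (Table 2), given its supports of 57 and 44.

```python
def class_scores(tp, fp, fn):
    """Precision, recall, f1-score and support for one class, given its own TP, FP, FN."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0.0 when the class is never predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

def binary_report(tp, fp, fn, tn):
    """Per-class report for theta = 1 (the positive class) and theta = 0."""
    return {
        1: class_scores(tp, fp, fn),
        # For theta = 0 the roles swap: TN are its hits, FN are wrongly predicted as 0,
        # and FP are true zeros that were missed.
        0: class_scores(tn, fn, fp),
    }

report = binary_report(tp=16, fp=12, fn=28, tn=45)
for label in (0, 1):
    r = report[label]
    print(label, round(r["precision"], 2), round(r["recall"], 2), round(r["f1"], 2), r["support"])
```

The zero-division convention mirrors the 0.00 entries in the LR columns, where no $\theta=1$ case is identified correctly.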

Table 2.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.56 | 0.57 | 0.56 | 0.62 | 0.56 |
| recall, $\theta=0$ | 0.96 | 0.91 | 0.70 | 0.79 | 0.65 |
| f1-score, $\theta=0$ | 0.71 | 0.70 | 0.62 | 0.69 | 0.60 |
| support, $\theta=0$ | 57 | 57 | 57 | 57 | 57 |
| precision, $\theta=1$ | 0.00 | 0.50 | 0.41 | 0.57 | 0.43 |
| recall, $\theta=1$ | 0.00 | 0.11 | 0.27 | 0.36 | 0.34 |
| f1-score, $\theta=1$ | 0.00 | 0.19 | 0.33 | 0.44 | 0.38 |
| support, $\theta=1$ | 44 | 44 | 44 | 44 | 44 |


Table 3.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.83 | 0.83 | 0.83 | 0.83 | 0.81 |
| recall, $\theta=0$ | 0.99 | 0.91 | 0.54 | 0.62 | 0.62 |
| f1-score, $\theta=0$ | 0.91 | 0.87 | 0.65 | 0.71 | 0.70 |
| support, $\theta=0$ | 168 | 168 | 168 | 168 | 168 |
| precision, $\theta=1$ | 0.00 | 0.12 | 0.15 | 0.16 | 0.11 |
| recall, $\theta=1$ | 0.00 | 0.06 | 0.42 | 0.36 | 0.24 |
| f1-score, $\theta=1$ | 0.00 | 0.08 | 0.23 | 0.22 | 0.15 |
| support, $\theta=1$ | 33 | 33 | 33 | 33 | 33 |


Table 4.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.92 | 0.92 | 0.91 | 0.91 | 0.92 |
| recall, $\theta=0$ | 1.00 | 0.92 | 0.58 | 0.58 | 0.58 |
| f1-score, $\theta=0$ | 0.96 | 0.92 | 0.71 | 0.71 | 0.71 |
| support, $\theta=0$ | 185 | 185 | 185 | 185 | 185 |
| precision, $\theta=1$ | 0.00 | 0.07 | 0.06 | 0.07 | 0.07 |
| recall, $\theta=1$ | 0.00 | 0.06 | 0.31 | 0.38 | 0.38 |
| f1-score, $\theta=1$ | 0.00 | 0.06 | 0.10 | 0.12 | 0.12 |
| support, $\theta=1$ | 16 | 16 | 16 | 16 | 16 |


Table 5.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.48 | 0.48 | 0.47 | 0.48 | 0.50 |
| recall, $\theta=0$ | 1.00 | 1.00 | 0.96 | 1.00 | 0.98 |
| f1-score, $\theta=0$ | 0.64 | 0.65 | 0.63 | 0.65 | 0.66 |
| support, $\theta=0$ | 48 | 48 | 48 | 48 | 48 |
| precision, $\theta=1$ | 0.00 | 1.00 | 0.50 | 1.00 | 0.86 |
| recall, $\theta=1$ | 0.00 | 0.02 | 0.04 | 0.04 | 0.11 |
| f1-score, $\theta=1$ | 0.00 | 0.04 | 0.07 | 0.07 | 0.20 |
| support, $\theta=1$ | 53 | 53 | 53 | 53 | 53 |


Table 6.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.45 | 0.48 | 0.52 | 0.43 | 0.48 |
| recall, $\theta=0$ | 0.96 | 0.94 | 0.62 | 0.45 | 0.83 |
| f1-score, $\theta=0$ | 0.62 | 0.64 | 0.56 | 0.44 | 0.61 |
| support, $\theta=0$ | 47 | 47 | 47 | 47 | 47 |
| precision, $\theta=1$ | 0.00 | 0.70 | 0.60 | 0.50 | 0.60 |
| recall, $\theta=1$ | 0.00 | 0.13 | 0.50 | 0.48 | 0.22 |
| f1-score, $\theta=1$ | 0.00 | 0.22 | 0.55 | 0.49 | 0.32 |
| support, $\theta=1$ | 54 | 54 | 54 | 54 | 54 |


Table 7.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.56 | 0.51 | 0.54 | 0.61 | 0.50 |
| recall, $\theta=0$ | 0.17 | 0.59 | 0.12 | 0.10 | 0.06 |
| f1-score, $\theta=0$ | 0.26 | 0.55 | 0.20 | 0.17 | 0.11 |
| support, $\theta=0$ | 114 | 114 | 114 | 114 | 114 |
| precision, $\theta=1$ | 0.43 | 0.33 | 0.43 | 0.44 | 0.43 |
| recall, $\theta=1$ | 0.83 | 0.26 | 0.86 | 0.92 | 0.92 |
| f1-score, $\theta=1$ | 0.57 | 0.29 | 0.57 | 0.59 | 0.58 |
| support, $\theta=1$ | 87 | 87 | 87 | 87 | 87 |


Table 8.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.64 | 0.66 | 0.73 | 0.69 | 0.69 |
| recall, $\theta=0$ | 0.60 | 0.71 | 0.26 | 0.25 | 0.18 |
| f1-score, $\theta=0$ | 0.62 | 0.68 | 0.38 | 0.37 | 0.29 |
| support, $\theta=0$ | 136 | 136 | 136 | 136 | 136 |
| precision, $\theta=1$ | 0.26 | 0.26 | 0.34 | 0.33 | 0.33 |
| recall, $\theta=1$ | 0.29 | 0.22 | 0.80 | 0.77 | 0.83 |
| f1-score, $\theta=1$ | 0.28 | 0.24 | 0.48 | 0.46 | 0.47 |
| support, $\theta=1$ | 65 | 65 | 65 | 65 | 65 |


Table 9.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.75 | 0.77 | 0.65 | 0.81 | 0.72 |
| recall, $\theta=0$ | 1.00 | 0.82 | 0.22 | 0.38 | 0.34 |
| f1-score, $\theta=0$ | 0.86 | 0.79 | 0.33 | 0.52 | 0.46 |
| support, $\theta=0$ | 76 | 76 | 76 | 76 | 76 |
| precision, $\theta=1$ | 0.00 | 0.30 | 0.21 | 0.28 | 0.23 |
| recall, $\theta=1$ | 0.00 | 0.24 | 0.64 | 0.72 | 0.60 |
| f1-score, $\theta=1$ | 0.00 | 0.27 | 0.32 | 0.40 | 0.33 |
| support, $\theta=1$ | 25 | 25 | 25 | 25 | 25 |


Table 10.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.92 | 0.93 | 0.95 | 0.92 | 0.93 |
| recall, $\theta=0$ | 1.00 | 0.96 | 0.39 | 0.76 | 0.67 |
| f1-score, $\theta=0$ | 0.96 | 0.94 | 0.55 | 0.84 | 0.78 |
| support, $\theta=0$ | 93 | 93 | 93 | 93 | 93 |
| precision, $\theta=1$ | 0.00 | 0.20 | 0.10 | 0.08 | 0.09 |
| recall, $\theta=1$ | 0.00 | 0.12 | 0.75 | 0.25 | 0.38 |
| f1-score, $\theta=1$ | 0.00 | 0.15 | 0.17 | 0.12 | 0.14 |
| support, $\theta=1$ | 8 | 8 | 8 | 8 | 8 |


Table 11.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.94 | 0.93 | 0.96 | 0.95 | 0.94 |
| recall, $\theta=0$ | 1.00 | 0.84 | 0.67 | 0.56 | 0.67 |
| f1-score, $\theta=0$ | 0.97 | 0.88 | 0.79 | 0.70 | 0.79 |
| support, $\theta=0$ | 95 | 95 | 95 | 95 | 95 |
| precision, $\theta=1$ | 0.00 | 0.00 | 0.09 | 0.07 | 0.06 |
| recall, $\theta=1$ | 0.00 | 0.00 | 0.50 | 0.50 | 0.33 |
| f1-score, $\theta=1$ | 0.00 | 0.00 | 0.15 | 0.12 | 0.10 |
| support, $\theta=1$ | 6 | 6 | 6 | 6 | 6 |


Table 12.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.74 | 0.76 | 0.75 | 0.76 | 0.78 |
| recall, $\theta=0$ | 1.00 | 0.99 | 0.99 | 0.87 | 0.79 |
| f1-score, $\theta=0$ | 0.85 | 0.86 | 0.85 | 0.81 | 0.78 |
| support, $\theta=0$ | 75 | 75 | 75 | 75 | 75 |
| precision, $\theta=1$ | 0.00 | 0.67 | 0.50 | 0.38 | 0.36 |
| recall, $\theta=1$ | 0.00 | 0.08 | 0.04 | 0.23 | 0.35 |
| f1-score, $\theta=1$ | 0.00 | 0.14 | 0.07 | 0.29 | 0.35 |
| support, $\theta=1$ | 26 | 26 | 26 | 26 | 26 |


Table 13.

| | LR | RF | Neural Network (A) | LSTM (B) | BN (C) |
|---|---|---|---|---|---|
| precision, $\theta=0$ | 0.77 | 0.78 | 0.77 | 0.79 | 0.83 |
| recall, $\theta=0$ | 1.00 | 0.96 | 0.92 | 0.92 | 0.75 |
| f1-score, $\theta=0$ | 0.87 | 0.86 | 0.84 | 0.85 | 0.79 |
| support, $\theta=0$ | 154 | 154 | 154 | 154 | 154 |
| precision, $\theta=1$ | 0.00 | 0.45 | 0.32 | 0.43 | 0.38 |
| recall, $\theta=1$ | 0.00 | 0.11 | 0.13 | 0.21 | 0.49 |
| f1-score, $\theta=1$ | 0.00 | 0.17 | 0.18 | 0.29 | 0.43 |
| support, $\theta=1$ | 47 | 47 | 47 | 47 | 47 |

To make the BNS model applicable over a long time range, a single Lévy subordinator is clearly not effective. If a large fluctuation in the future can be anticipated from the historical data (i.e., $\theta=1$) with the help of machine learning algorithms, we can “switch” from the initial Lévy subordinator ($Z$) to the more intense Lévy subordinator ($Z^{(b)}$) that corresponds to larger fluctuations. On the other hand, if no big fluctuation in the future can be anticipated from the historical data (i.e., $\theta=0$), we can “switch” from $Z^{(b)}$ back to $Z$. In this way, a single equation (2.6) can be used to describe the crude oil dynamics even over a longer time period.
The tables show that logistic regression is ineffective at detecting future big jumps ($\theta =1$) from the historical data: its precision, recall and f1-score for $\theta =1$ are zero in all three cases. In most cases, the neural network (A), the LSTM (B) or the LSTM with a batch normalizer (C) works better than the random forest classifier. In addition, the predictions for $\theta =1$ improve when the algorithms are trained on more data points. To keep the model simple, only two hidden layers are used; the results improve if the number of hidden layers is increased. Note also that the softmax activation function in the output layers of (A), (B) and (C) in fact provides probabilities for $\theta =0$ and $\theta =1$. With appropriate scaling, those probabilities can be used in lieu of $(1-\theta )$ and $\theta $ in (2.6).
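The remark about softmax outputs can be made concrete with a short sketch. It assumes the output layer produces two logits, one for $\theta =0$ and one for $\theta =1$. Their softmax gives weights $p_0$ and $p_1=1-p_0$, which stand in for $(1-\theta )$ and $\theta $ when mixing the increments of $Z$ and ${Z}^{(b)}$. The text mentions an appropriate scaling that is not specified here, so the sketch uses the probabilities unscaled. The function names and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax (the maximum logit is subtracted first)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blended_increment(logits, dz, dz_b):
    """Mix the two subordinator increments with softmax weights.

    p0 plays the role of (1 - theta) and p1 the role of theta; no extra
    scaling is applied (assumption).
    """
    p0, p1 = softmax(logits)
    return p0 * dz + p1 * dz_b


probs = softmax([1.2, -0.3])
print([round(p, 3) for p in probs])
print(blended_increment([1.2, -0.3], dz=0.002, dz_b=0.03))
```

A hard 0/1 decision is recovered when one probability is close to 1. Intermediate probabilities give a partial switch between the two subordinators.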
Once a good estimate of $\theta $ is available, it can be incorporated into (2.6). Suppose the initial description of the BNS dynamics uses $Z$ (or ${Z}^{(b)}$) as the Lévy subordinator. This leads to one of two options: (1) if $\theta =0$ is established, we continue with (or update to) the subordinator $Z$; (2) if $\theta =1$ is established, we update to (or continue with) the subordinator ${Z}^{(b)}$. The machine learning algorithms can be run dynamically to decide whether to continue with or update the background driving Lévy process in the BNS model.
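The two options reduce to a simple rule: whichever subordinator is currently in use, $\theta =0$ selects $Z$ and $\theta =1$ selects ${Z}^{(b)}$. The sketch below runs this rule dynamically over a rolling window of log-returns. The classifier is a hypothetical stand-in for the trained LR, RF or neural network models: it flags $\theta =1$ when the largest recent absolute log-return exceeds a threshold. The window length, threshold, prices and names are all assumptions.

```python
import math


def next_subordinator(theta):
    """Options (1) and (2): theta = 0 -> Z, theta = 1 -> Z^(b),
    regardless of which subordinator is currently in use."""
    return "Z_b" if theta == 1 else "Z"


def toy_classifier(window_returns, threshold=0.03):
    """Hypothetical stand-in for a trained model: flag theta = 1 when the
    largest recent absolute log-return exceeds `threshold`."""
    return 1 if max(abs(r) for r in window_returns) > threshold else 0


def run_dynamic(prices, window=5, classify=toy_classifier, start="Z"):
    """Re-estimate theta at each step and continue or update the subordinator."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    current, events = start, []
    for t in range(window, len(log_returns) + 1):
        theta = classify(log_returns[t - window:t])
        new = next_subordinator(theta)
        events.append((t, theta, "continue" if new == current else "update"))
        current = new
    return current, events


# Hypothetical price series with one sharp drop.
prices = [60.0, 60.3, 60.1, 60.4, 60.2, 60.5, 57.0, 56.8, 57.1, 57.0, 57.2, 57.1]
final, events = run_dynamic(prices)
print(final)
for event in events:
    print(event)
```

In this example the drop moves the model to ${Z}^{(b)}$ while the large return is inside the window. The model returns to $Z$ once the window no longer contains it.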
As a result, the analysis shows that, for crude oil price dynamics, the jump component is not completely stochastic. It contains a deterministic element ($\theta $) that allows the existing models to be applied over an extended period of time. Thus the new model incorporates long range dependence while keeping the model tractable. It is more efficient and, at the same time, has far fewer parameters than the superposition models.
4 Conclusion
We observe that a classical BNS model may not appropriately represent crude oil price dynamics. In this paper, we implement various machine learning algorithms to estimate the likelihood of an upcoming large fluctuation in the crude oil price. Once these estimates are obtained, the classical BNS model is modified (or not, depending on the estimates) with respect to its background driving Lévy subordinator. This modification enables long range dependence in the new model without significantly changing the model. Moreover, it introduces only one extra parameter (i.e., $\theta $) compared to the classical model. We show that the parameter $\theta $ is deterministic and can be obtained from the empirical data using various machine learning techniques.
In this paper, we apply machine learning algorithms to empirical data in order to improve the mathematical model for commodity price dynamics. In a sequel to this work, we plan to carry out this analysis for other financial time series. We also observe that the stochastic equation for the volatility dynamics does not play a crucial role in the present analysis. The results may differ, and improve, if that equation can be appropriately analyzed for an empirical data set.
Acknowledgment: The authors would like to thank the anonymous reviewers for their careful reading of the manuscript and for suggesting points to improve the quality of the paper.
References
 [1]
 [2] Abdullah S. N. & Zeng X. (2010), Machine learning approach for crude oil price prediction with Artificial Neural Networks-Quantitative (ANN-Q) model, The 2010 International Joint Conference on Neural Networks (IJCNN), doi: 10.1109/IJCNN.2010.5596602.
 [3] Arriojas M., Hu Y., Mohammed S.-E. & Pap G. (2007), A Delayed Black and Scholes Formula, Stoch. Anal. Appl., 25, 471–492.
 [4] Barndorff-Nielsen O. E. (2001), Superposition of Ornstein-Uhlenbeck Type Processes, Theory Probab. Appl., 45, 175–194.
 [5] Bernard V. & Thomas J. (1989), Post-earnings-announcement drift: delayed price response or risk premium?, J. Account. Res., 27, 1–36.
 [6] Barndorff-Nielsen O. E. & Shephard N. (2001), Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics, J. R. Stat. Soc. Ser. B Stat. Methodol., 63, 167–241.
 [7] Barndorff-Nielsen O. E. & Shephard N. (2001), Modelling by Lévy Processes for Financial Econometrics, In Lévy Processes: Theory and Applications (eds O. E. Barndorff-Nielsen, T. Mikosch & S. Resnick), 283–318, Birkhäuser.
 [8] Barndorff-Nielsen O. E., Jensen J. L. & Sørensen M. (1998), Some stationary processes in discrete and continuous time, Adv. in Appl. Probab., 30, 989–1007.
 [9] Benth F. E., Karlsen K. H. & Reikvam K. (2003), Merton’s portfolio optimization problem in a Black and Scholes market with non-Gaussian stochastic volatility of Ornstein-Uhlenbeck type, Math. Finance, 13, 215–244.
 [10] Black F. & Scholes M. (1973), The pricing of options and corporate liabilities, J. Political Econ., 81, 637–659.
 [11] Booth G., Kallunki J. & Martikainen T. (1997), Delayed price response to the announcements of earnings and its components in Finland, European Account. Rev., 6, 377–392.
 [12] Brown I., Funk J. & Sircar R. (2017), Oil Prices & Dynamic Games Under Stochastic Demand, Available at SSRN: https://ssrn.com/abstract=3047390 or http://dx.doi.org/10.2139/ssrn.3047390.
 [13] Chan P. & Sircar R. (2017), Fracking, Renewables, and Mean Field Games, SIAM Review, 59(3), 588–615.
 [14] Chen Y., He K. & Tso G. K. F. (2017), Forecasting Crude Oil Prices: a Deep Learning based Model, Procedia Computer Science, 122, 300–307.
 [15] Frey G., Manera M., Markandya A. & Scarpa E. (2009), Econometric Models for Oil Price Forecasting: A Critical Survey, CESifo Forum, ifo Institute – Leibniz Institute for Economic Research at the University of Munich, 10(1), 29–44.
 [16] Grinblatt M. & Keloharju M. (2001), What makes investors trade?, J. Finance, 56, 589–616.
 [17] He X. J. (2018), Crude Oil Prices Forecasting: Time Series vs. SVR Models, Journal of International Technology and Information Management, 27(2), 25–42.
 [18] Habtemicael S., Ghebremichael M. & SenGupta I. (2019), Volatility and Variance Swap Using Superposition of the Barndorff-Nielsen and Shephard type Lévy Processes, To appear in Sankhya B, https://doi.org/10.1007/s13571-017-0145-y.
 [19] Issaka A. & SenGupta I. (2017), Analysis of variance based instruments for Ornstein-Uhlenbeck type models: swap and price index, Annals of Finance, 13(4), 401–434.
 [20] Issaka A. & SenGupta I. (2017), Feynman path integrals and asymptotic expansions for transition probability densities of some Lévy driven financial markets, Journal of Applied Mathematics and Computing, 54, 159–182.
 [21] Jiang J. & Tian W. (2018), Semi-nonparametric approximation and index options, Annals of Finance, in press, https://doi.org/10.1007/s10436-018-0341-4.
 [22] Kulkarni K. S. & Sabarwal T. (2007), To what extent are investment bank-differentiating factors relevant for firms floating moderate-sized IPOs?, Annals of Finance, 3(3), 297–327.
 [23] Li X., Shang W. & Wang S. (2019), Text-based crude oil price forecasting: A deep learning approach, International Journal of Forecasting, 35(4), 1548–1560.
 [24] Nicolato E. & Venardos E. (2003), Option Pricing in Stochastic Volatility Models of the Ornstein-Uhlenbeck type, Math. Finance, 13, 445–466.
 [25] Pasiouras F., Gaganis C. & Doumpos M. (2007), A multicriteria discrimination approach for the credit rating of Asian banks, Annals of Finance, 3(3), 351–367.
 [26] Roberts M. & SenGupta I. (2019), Infinitesimal generators for two-dimensional Lévy process-driven hypothesis testing, To appear in Annals of Finance, https://doi.org/10.1007/s10436-019-00355-y.
 [27] SenGupta I. (2016), Generalized BNS stochastic volatility model for option pricing, International Journal of Theoretical and Applied Finance, 19(02), 1650014 (23 pages).
 [28] SenGupta I., Wilson W. & Nganje W. (2019), Barndorff-Nielsen and Shephard model: oil hedging with variance swap and option, Mathematics and Financial Economics, 13(2), 209–226.
 [29] Sensoy A. & Hacihasanoglu E. (2014), Time-varying long range dependence in energy futures markets, Energy Economics, 46(C), 318–327.
 [30] Tabak B. M. & Cajueiro D. O. (2007), Are the crude oil markets becoming weakly efficient over time? A test for time-varying long-range dependence in prices and volatility, Energy Economics, 29(1), 28–38.
 [31] Wilson W., Nganje W., Gebresilasie S. & SenGupta I. (2019), Barndorff-Nielsen and Shephard model for hedging energy with quantity risk, High Frequency, 2(3-4), 202–214.
 [32] Zhao Y., Li J. & Yu L. (2017), A deep learning ensemble approach for crude oil price forecasting, Energy Economics, 66(C), 9–16.