Stochastic Volatility Model

The class of stochastic volatility models (Shephard, 2005) would be a typical example.

From: Handbook of Statistics, 2012

Nonparametric Estimation in a Stochastic Volatility Model

Jürgen Franke , ... Jens-Peter Kreiss , in Recent Advances and Trends in Nonparametric Statistics, 2003

2 A nonparametric stochastic volatility model

We consider some asset with price S(t) at time t and, following Taylor (1994), define the return from an integer time t − 1 to time t as

$R_t = \log \dfrac{S(t)}{S(t-1)}.$

To estimate a stochastic volatility model like (1.1) and (1.2), discretized versions of these equations are considered. Wiggins (1987) and Chesney and Scott (1989) use the Euler approximation

(2.3) $R_t = \mu + \sigma_{t-1} W_t$

(2.4) $\log \sigma_t = \alpha + \phi\,(\log \sigma_{t-1} - \alpha) + \vartheta\, W_t^*$

$(W_t, W_t^*)$ denote i.i.d. bivariate standard normal random variables with zero mean and correlation ρ. In (2.3), the lagged quantity $\sigma_{t-1}$ appears as the stochastic volatility for period t. This is rather advantageous for statistical purposes, as we will see clearly later on.

As another simplification of (1.1), Taylor (1994) considers

(2.5) $R_t = \mu + \sigma_t W_t$,

and he called (2.3), (2.4) a lagged autoregressive random variance (LARV) model, as log σt follows a linear autoregressive scheme. Analogously, (2.5), (2.4), together, is called a contemporaneous autoregressive random variance (CARV) model.

In this paper, we consider nonparametric generalizations of these models. We start with the lagged case and study it in detail, whereas we give a short discussion of the contemporaneous case at the end of Section 3.

We replace (2.4) by a nonlinear nonparametric model for $\xi_t = \log \sigma_t$:

(2.6) $\xi_t = m(\xi_{t-1}) + \eta_t$,

where the $\eta_t$ denote i.i.d. zero-mean normal random variables with variance $\sigma_\eta^2$, and m is an arbitrary autoregression function for which we only require certain smoothness assumptions.

In order to ensure that the Markov chain $(\xi_t)$ possesses nice probabilistic properties, e.g. geometric ergodicity and β-mixing (absolute regularity) or α-mixing (strong mixing) with geometrically decaying mixing coefficients, it suffices (because of the assumption of normally distributed innovations $\eta_t$) to assume an appropriate drift condition on m, e.g.

(A1) $\limsup_{|x| \to \infty} \left| \dfrac{m(x)}{x} \right| < 1$,

cf. Doukhan (1994), Proposition 6 (page 107). Then, in particular, ξt has a unique stationary distribution with density pξ .

We want to estimate m using kernel-type estimates. The usual Nadaraya-Watson estimates are, however, not applicable as we cannot observe the volatility σt or its logarithm ξt directly. The available data are the asset prices St or the returns Rt which are related to σt by (2.3). Taking logarithms and using the abbreviations

$X_t = \tfrac{1}{2}\log (R_t - \mu)^2 - \mu_\varepsilon, \qquad \varepsilon_t = \tfrac{1}{2}\log W_t^2 - \mu_\varepsilon$

with $\mu_\varepsilon = E\big[\tfrac{1}{2}\log W_t^2\big] = -0.63518$ (Scott (1987)), we get

(2.7) $X_t = \xi_{t-1} + \varepsilon_t$,

where the $\varepsilon_t$ are i.i.d. zero-mean random variables distributed as $\tfrac{1}{2}$ times the logarithm of a $\chi^2_1$-random variable, centered around 0. The correlation between the standard normal random variable $W_t$, appearing in the definition of $\varepsilon_t$, and $\eta_t$ of (2.6) is ρ. Together, (2.6) and (2.7) form a nonparametric autoregressive model with errors-in-variables, as $\xi_t$ cannot be observed directly but is known only through its convolution with the i.i.d. random variables $\varepsilon_t$. Plugging (2.7) into (2.6), we obtain the following equation for $X_t$ alone:

(2.8) $X_t = m(X_{t-1} - \varepsilon_{t-1}) + \eta_{t-1} + \varepsilon_t$.

Remark. Assumption (A1) also implies geometric ergodicity, including geometric β-mixing and strong mixing, for the process $(X_t)$.
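The interplay of (2.3), (2.6), and (2.7) is easy to see in a short simulation. The following Python sketch uses purely hypothetical ingredients (the drift function m, all parameter values, and the sample size are illustrative choices, with m satisfying the drift condition (A1)); it generates the lagged model and forms the observable proxies $X_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
mu, rho, sigma_eta = 0.0005, -0.3, 0.3   # hypothetical model parameters
mu_eps = -0.63518                        # E[(1/2) * log W_t^2]

def m(x):
    # hypothetical autoregression function; it satisfies (A1) because
    # limsup_{|x| -> inf} |m(x)/x| = 0.8 < 1
    return 0.8 * x + 0.2 * np.sin(x)

xi = np.zeros(n)          # xi_t = log sigma_t, eq. (2.6)
R = np.zeros(n)           # returns, eq. (2.3)
for t in range(1, n):
    W = rng.standard_normal()
    W_star = rho * W + np.sqrt(1 - rho**2) * rng.standard_normal()  # corr(W, W*) = rho
    xi[t] = m(xi[t - 1]) + sigma_eta * W_star                       # eq. (2.6)
    R[t] = mu + np.exp(xi[t - 1]) * W                               # eq. (2.3)

# observable proxies, eq. (2.7): X_t = xi_{t-1} + eps_t
X = 0.5 * np.log((R[1:] - mu) ** 2) - mu_eps
print(np.corrcoef(X, xi[:-1])[0, 1])     # X_t is a noisy version of xi_{t-1}
```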

URL: https://www.sciencedirect.com/science/article/pii/B978044451378650020X

Special Volume: Mathematical Modeling and Numerical Methods in Finance

Huyên Pham , ... Wolfgang J. Runggaldier , in Handbook of Numerical Analysis, 2009

5.1 The model

We consider a stochastic volatility model where for simplicity we have only one risky asset with observable price ( S k ) whose dynamics is given by

$S_{k+1} = S_k \exp\!\left[\left(r - \tfrac{1}{2} X_k^2\right)\delta + X_k \sqrt{\delta}\,\varepsilon_{k+1}\right], \quad k = 1, \dots, n, \qquad S_0 = s_0 > 0$

where (ɛ k ) k is a Gaussian white noise sequence, X k is the unobservable volatility process, δ = 1/n represents the discretization time step over the interval [0, 1], and r is the riskless interest rate per unit of time.

We denote by S 0 the riskless asset price with dynamics

$S^0_{k+1} = S^0_k\, e^{r\delta}.$

Notice that the conditional law of S k+1 given (X k , S k ) has a density given by

$g(X_k, S_k, s') = \dfrac{1}{s' \sqrt{2\pi \delta X_k^2}} \exp\!\left[ - \dfrac{\left(\ln s' - \ln S_k - \left(r - \tfrac{1}{2} X_k^2\right)\delta\right)^2}{2 X_k^2 \delta} \right], \quad s' > 0,$

and notice that, as the first derivative of g with respect to s′ is bounded, the hypothesis H4 is satisfied.

The volatility (X k ) is described by a Markov chain taking three possible values x b < x m < x h in (0, ∞). Its probability transition matrix is given by

(5.1) $P_k = \begin{pmatrix} 1 - (P_{bm} + P_{bh})\delta & P_{bm}\delta & P_{bh}\delta \\ P_{mb}\delta & 1 - (P_{mb} + P_{mh})\delta & P_{mh}\delta \\ P_{hb}\delta & P_{hm}\delta & 1 - (P_{hb} + P_{hm})\delta \end{pmatrix}.$

The volatility (X k ) is a Markov-chain approximation à la Kushner (see Kushner and Dupuis [2001]) of a mean-reverting process

$dX_t = \lambda(x_0 - X_t)\, dt + \eta\, dW_t.$

Denoting by Δ > 0 the spatial step, this corresponds to a probability transition matrix of the form (5.1) with

$x_b = x_0 - \Delta, \quad x_m = x_0, \quad x_h = x_0 + \Delta,$

and

$P_{bm} = \lambda + \dfrac{\eta^2}{2\Delta^2}, \quad P_{bh} = 0, \qquad P_{mb} = \dfrac{\eta^2}{2\Delta^2}, \quad P_{mh} = \dfrac{\eta^2}{2\Delta^2}, \qquad P_{hb} = 0, \quad P_{hm} = \lambda + \dfrac{\eta^2}{2\Delta^2},$

with the condition that $1 - \left(\lambda + \frac{\eta^2}{2\Delta^2}\right)\delta > 0$ and $1 - \frac{\eta^2}{\Delta^2}\,\delta > 0$, so that the diagonal entries of (5.1) are nonnegative.

In order to hedge the European put option with strike K, we invest an initial capital v 0 in the risky asset following a self-financing strategy. Recall that the wealth process is given by

(5.2) $V^\alpha_{k+1} = V^\alpha_k\, e^{r\delta} + \alpha_k \left[ S_{k+1} - S_k\, e^{r\delta} \right],$

where α k represents the number of shares of asset S k held in the portfolio at time k. Observe that (5.2) verifies the hypothesis H2, and recall that the control process (α k ) is adapted with respect to the filtration (F k S ) generated by the observation process.

In what follows, we will work with the log price instead of the price and we set Y k = ln S k .
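As a concrete illustration of this setup, the sketch below simulates the observable price together with the hidden three-state volatility chain. All numerical values (r, λ, η, x₀, Δ, the number of steps) are hypothetical choices, only meant to satisfy the positivity conditions above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250                               # number of time steps
delta = 1.0 / n                       # discretization step
r, s0 = 0.02, 100.0                   # hypothetical rate and initial price
lam, eta, x0, Delta = 4.0, 0.3, 0.3, 0.1   # hypothetical mean-reversion parameters

states = np.array([x0 - Delta, x0, x0 + Delta])        # x_b < x_m < x_h
P_bm = P_hm = lam + eta**2 / (2 * Delta**2)
P_mb = P_mh = eta**2 / (2 * Delta**2)
P = np.array([[1 - P_bm * delta, P_bm * delta, 0.0],
              [P_mb * delta, 1 - (P_mb + P_mh) * delta, P_mh * delta],
              [0.0, P_hm * delta, 1 - P_hm * delta]])
assert (P >= 0).all()                 # the conditions on delta guarantee this

S = np.empty(n + 1)
S[0] = s0
regime = 1                            # start the hidden chain in x_m
for k in range(n):
    X_k = states[regime]
    S[k + 1] = S[k] * np.exp((r - 0.5 * X_k**2) * delta
                             + X_k * np.sqrt(delta) * rng.standard_normal())
    regime = rng.choice(3, p=P[regime])   # unobservable volatility switch

Y = np.log(S)                         # log prices Y_k = ln S_k used for filtering
```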

URL: https://www.sciencedirect.com/science/article/pii/S1570865908000094

Special Volume: Mathematical Modeling and Numerical Methods in Finance

Olivier Pironneau , Yves Achdou , in Handbook of Numerical Analysis, 2009

7.3 American options with stochastic volatility

In this paragraph, we discuss the pricing of an American put option with the Stein-Stein stochastic volatility model. The parameters of the model are given by (7.51). The domain truncation is also the same as in the example of the European option. Piecewise linear finite elements are used for the discretization. We have chosen to use the first-order operator splitting or projection scheme described in Section 6.4. A similar method has been studied by Ikonen and Toivanen [2004] for Heston's model. In order to capture the exercise boundary, we have adapted the mesh in the variables S and y. In Fig. 7.3, we have plotted the contours of the pricing function 1 year to maturity: the exercise zone clearly appears; indeed, it corresponds to the zone where the pricing function matches the payoff function S ↦ K − S, that is, where the contours are vertical straight lines. Figure 7.3 has to be compared with Fig. 7.1 for the European option. In Fig. 7.4, we have plotted the exercise region 1 year to maturity. The mesh is visible; it is refined near the exercise boundary.

Fig. 7.3. The pricing function of the American option 1 year to maturity. The exercise zone is clearly visible.


Fig. 7.4. The exercise zone 1 year to maturity. One clearly sees that the mesh has been refined near the exercise boundary.

Ikonen and Toivanen [2006] have proposed specific finite-difference methods and solution procedures for American options with Heston's model. The discretization and the grid are specially designed so that the resulting matrix is an M-matrix. The scheme is a seven-point scheme, and upwinding is used when necessary. A specific alternating direction splitting scheme is proposed. Each substep consists of solving a one-dimensional linear complementarity problem by the Brennan-Schwartz algorithm (see Brennan and Schwartz [1977]). Three directions are used: the two axes and the first diagonal. For these algorithms to work, the tridiagonal matrices used in the substeps must be M-matrices. This is not true in general, but the scheme and the grid have been designed so that this condition holds. It is also necessary that the exercise boundary intersect each of these directions at most once. This condition is not proved, but no counterexample has been found in the computations. Ikonen and Toivanen [2006] have compared this solution procedure with four other methods, in particular the previously described projection scheme and a multigrid algorithm; they have shown that the alternating direction method performs best. On the other hand, this last solution procedure has been tailored to the problem, and its robustness has to be assessed.
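For concreteness, here is a hedged sketch of the one-dimensional Brennan-Schwartz step that each directional substep relies on. The function name and interface are illustrative, not taken from Ikonen and Toivanen; the method assumes the tridiagonal matrix is an M-matrix and that the exercise region is an interval at the lower end of the grid (as for an American put discretized in S):

```python
import numpy as np

def brennan_schwartz(a, d, c, b, g):
    """Sketch of the Brennan-Schwartz algorithm for the tridiagonal linear
    complementarity problem  A x >= b,  x >= g,  (A x - b)^T (x - g) = 0.

    a, d, c are the sub-, main and super-diagonals of A (a[0] and c[-1]
    are unused); g is the obstacle (the payoff)."""
    n = len(d)
    d = np.asarray(d, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    # backward elimination of the super-diagonal
    for i in range(n - 2, -1, -1):
        w = c[i] / d[i + 1]
        d[i] -= w * a[i + 1]
        b[i] -= w * b[i + 1]
    # forward substitution with projection onto the obstacle
    x = np.empty(n)
    x[0] = max(b[0] / d[0], g[0])
    for i in range(1, n):
        x[i] = max((b[i] - a[i] * x[i - 1]) / d[i], g[i])
    return x
```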

URL: https://www.sciencedirect.com/science/article/pii/S1570865908000112

Special Volume: Mathematical Modeling and Numerical Methods in Finance

Chuan-Hsiang Han , Jean-Pierre Fouque , in Handbook of Numerical Analysis, 2009

4.1 Multiscale stochastic volatility models

Following Fouque, Papanicolaou, Sircar and Solna [2003] , we consider the following class of multiscale stochastic volatility models, under a risk-neutral pricing probability measure IP * parametrized by the combined market prices of volatility risk (Λ1, Λ2):

(4.1) $dS_t = r S_t\, dt + \sigma_t S_t\, dW_t^{(0)},$

$\sigma_t = f(Y_t, Z_t),$
$dY_t = \left[ \dfrac{1}{\varepsilon}\, c_1(Y_t) + \dfrac{g_1(Y_t)}{\sqrt{\varepsilon}}\, \Lambda_1(Y_t, Z_t) \right] dt + \dfrac{g_1(Y_t)}{\sqrt{\varepsilon}} \left( \rho_1\, dW_t^{(0)} + \sqrt{1 - \rho_1^2}\; dW_t^{(1)} \right),$
$dZ_t = \left[ \delta\, c_2(Z_t) + \sqrt{\delta}\, g_2(Z_t)\, \Lambda_2(Y_t, Z_t) \right] dt + \sqrt{\delta}\, g_2(Z_t) \left( \rho_2\, dW_t^{(0)} + \rho_{12}\, dW_t^{(1)} + \sqrt{1 - \rho_2^2 - \rho_{12}^2}\; dW_t^{(2)} \right),$

where $S_t$ is the underlying asset price process with a constant risk-free interest rate r. The random stochastic volatility $\sigma_t$ is driven by two stochastic processes $Y_t$ and $Z_t$, varying on the time scales ε and 1/δ, respectively (ε is intended to be a short time scale, while 1/δ is thought of as a longer time scale). The vector $(W_t^{(0)*}, W_t^{(1)*}, W_t^{(2)*})$ consists of three independent standard Brownian motions. The instant correlation coefficients ρ₁, ρ₂, and ρ₁₂ satisfy $|\rho_1| < 1$ and $\rho_2^2 + \rho_{12}^2 < 1$. The volatility function f is assumed to be bounded and bounded away from zero to avoid degeneracy, though these assumptions are not crucial and can be relaxed to accommodate, for instance, Heston-type models with a Cox-Ingersoll-Ross (CIR) stochastic volatility factor. The coefficient functions of $Y_t$, namely $c_1$ and $g_1$, are assumed to be such that under the physical probability measure ($\Lambda_1 = \Lambda_2 = 0$), $Y_t$ is ergodic. The Ornstein-Uhlenbeck process is a typical example, obtained by defining $c_1(y) = m_1 - y$ and $g_1(y) = \nu_1\sqrt{2}$, so that 1/ε is the rate of mean reversion, $m_1$ is the long-run mean, and $\nu_1$ is the long-run standard deviation. Its invariant distribution is $N(m_1, \nu_1^2)$.

The coefficient functions of Z t , namely, c 2 and g 2, are assumed to be smooth enough in order to satisfy existence and uniqueness conditions for diffusions. The combined risk premia Λ1 and Λ2 are assumed to be smooth, bounded, and dependent on the variables y and z only. Within this setup, the joint process (S t , Y t , Z t ) is Markovian. We refer to Fouque, Papanicolaou, Sircar and Solna [2003] for a detailed discussion on this class of models.
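A minimal Euler discretization of (4.1) illustrates the two time scales. Everything below is a hypothetical instance: both factors are taken to be Ornstein-Uhlenbeck (so $c_i(x) = m_i - x$ and $g_i(x) = \nu_i\sqrt{2}$), the volatility function f and all parameter values are illustrative, and the market prices of volatility risk are set to zero:

```python
import numpy as np

rng = np.random.default_rng(2)

r, eps, dlt = 0.04, 0.01, 0.05            # rate, fast scale epsilon, slow scale delta
m1, nu1, m2, nu2 = -1.0, 0.5, -1.0, 0.3
rho1, rho2, rho12 = -0.3, -0.3, 0.0

f = lambda y, z: np.exp(y + z)            # hypothetical sigma_t = f(Y_t, Z_t)
c1 = lambda y: m1 - y
g1 = lambda y: nu1 * np.sqrt(2.0)
c2 = lambda z: m2 - z
g2 = lambda z: nu2 * np.sqrt(2.0)
Lam1 = lambda y, z: 0.0                   # zero market prices of risk for simplicity
Lam2 = lambda y, z: 0.0

T, n = 1.0, 2000
dt = T / n
S, Y, Z = 100.0, m1, m2
for _ in range(n):
    dW0, dW1, dW2 = np.sqrt(dt) * rng.standard_normal(3)
    S += r * S * dt + f(Y, Z) * S * dW0
    Y += (c1(Y) / eps + g1(Y) / np.sqrt(eps) * Lam1(Y, Z)) * dt \
         + g1(Y) / np.sqrt(eps) * (rho1 * dW0 + np.sqrt(1 - rho1**2) * dW1)
    Z += (dlt * c2(Z) + np.sqrt(dlt) * g2(Z) * Lam2(Y, Z)) * dt \
         + np.sqrt(dlt) * g2(Z) * (rho2 * dW0 + rho12 * dW1
                                   + np.sqrt(1 - rho2**2 - rho12**2) * dW2)

print(S, f(Y, Z))   # terminal price and spot volatility of one simulated path
```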

Under the stochastic volatility models considered, the American option price at time t with an integrable payoff function H is given by

(4.2) $P^{\varepsilon,\delta}(t, x, y, z) = \operatorname*{ess\,sup}_{t \le \tau \le T} \; \mathbb{E}^*\!\left\{ e^{-r(\tau - t)} H(S_\tau) \,\middle|\, S_t = x,\; Y_t = y,\; Z_t = z \right\},$

where τ denotes any stopping time greater than or equal to t, bounded by T, and is adapted to the completion of the natural filtration generated by Brownian motions (W t (0)*, W t (1)*, W t (2)*). We consider a typical American put option pricing problem, namely, H(x) = (K - x)+.

URL: https://www.sciencedirect.com/science/article/pii/S1570865908000045

Bayesian nonparametric methods for financial and macroeconomic time series analysis

Maria Kalli , in Flexible Bayesian Regression Modelling, 2020

4.5 Conclusion

In this chapter, we showed how Bayesian nonparametric priors can be used to estimate the conditional distribution of asset returns, capture long-range dependence in SV models and explain the joint dynamic behaviour of macroeconomic time series. For all three cases we showed that the out-of-sample predictive performance of the resulting Bayesian nonparametric model was superior to other competitive models.

However, the Dirichlet process, DPM and SBP are not the only Bayesian nonparametric priors that can be used in the analysis of financial and macroeconomic time series. In the 2010s other Bayesian nonparametric priors have been developed using normalisations of completely random measures (see [54]). We believe that these priors should be used in the analysis of financial time series, because they provide a more flexible construction for the weights of mixture models. For example, they can be used in the analysis of ultrahigh-frequency data, where evidence of nonstationarity together with long-range dependence exists.

Since the seminal work of [57] there has been a lot of work in developing dependent random measures based on stick breaking constructions but these had not been used in financial time series analysis until recently. [44] use the hierarchical Dirichlet process (HDP) of [73] to capture the time dependence of the realised covariance (RCOV) matrix and estimate its conditional distribution. The HDP is a distribution over multiple correlated probability measures, G 1 , , G r , sharing the same atom locations. Each probability measure is generated from independent Dirichlet processes with shared precision parameter and base measure, which is generated from a Dirichlet process itself. [9] generalise the HDP to hierarchical constructions with normalised random measures, while [31] develop correlated random measures which do not involve such hierarchical construction. We believe that these measures are a more flexible alternative to the stick breaking constructed ones, as they are not constrained by the stochastic ordering of the mixture weights. These methods will be useful for modelling more complex financial and macroeconomic data.

URL: https://www.sciencedirect.com/science/article/pii/B9780128158623000123

Conceptual Econometrics Using R

Yong Li , ... Tao Zeng , in Handbook of Statistics, 2019

6.2.2 Specification testing for SV models

The dataset used here contains the daily returns on AUD/USD exchange rates from January 2005 to December 2012. Following a suggestion of a referee, before we apply BMT to the SV model, we first test the i.i.d. normal model with constant mean and constant variance given by

(27) $y_t = \alpha + \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$

An AR(1) model is used as the expanded model

(28) $y_t = \alpha + \beta y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).$

The Bayesian MCMC method is implemented to estimate the parameters with the following vague prior

$\alpha \sim N(0, 100\sigma^2), \quad \beta \sim N(0, 100\sigma^2), \quad \sigma^2 \sim \Gamma(0.001, 0.001).$

For the above two models, we draw 20,000 MCMC samples from the posterior distribution and compute BMT.

The critical value of χ 2(1) is 6.63 at the 1% significance level. BMT is 251.52, rejecting the i.i.d. normal model. This conclusion is not surprising as the volatility of stock returns is stochastic. However, J 1 is 0.2858 (i.e., J 0  =   251.23) which is less than the critical value of χ 2(1). Using J 1 alone only suggests that we cannot reject β  =   0 in Model (28). This conclusion is also not surprising as the weekly returns have very weak serial correlations.

Next, we change the null model to the following basic SV model,

(29) $y_t = \alpha + \exp(h_t/2)\, u_t, \quad u_t \overset{\text{i.i.d.}}{\sim} N(0,1), \qquad h_t = \mu + \phi\,(h_{t-1} - \mu) + \tau v_t, \quad v_t \overset{\text{i.i.d.}}{\sim} N(0,1).$

The expanded model is as follows:

(30) $y_t = \alpha + \beta_1 y_{t-1} + \exp(h_t/2)\, u_t, \quad u_t \overset{\text{i.i.d.}}{\sim} N(0,1), \qquad h_t = \mu + \phi\,(h_{t-1} - \mu) + \tau v_t, \quad v_t \overset{\text{i.i.d.}}{\sim} N(0,1).$

The following vague priors are used

$\alpha \sim N(0, 100), \quad \phi \sim \text{Beta}(1, 1), \quad \tau^2 \sim \Gamma(0.001, 0.001), \quad \beta_1 \sim N(0.5, 100).$

To obtain BMT, we draw 110,000 MCMC samples from the posterior distribution, discard the first 10,000 as burn-in, and store the remaining samples as effective observations in both models. In this case, BMT = 0.4279, which is less than the critical value of χ²(1), suggesting that the basic SV model is not misspecified.
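As a sketch of how the reported statistics are confronted with the χ²(1) critical value, and of the data-generating process under the null (29), one might write the following in Python. The parameter values used to simulate from (29) are hypothetical, not the posterior estimates from the chapter:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

# simulate returns from the basic SV null model (29); hypothetical parameters
alpha, mu, phi, tau = 0.0, -9.0, 0.97, 0.15
n = 2000
h = np.empty(n)
y = np.empty(n)
h[0] = mu
for t in range(n):
    if t > 0:
        h[t] = mu + phi * (h[t - 1] - mu) + tau * rng.standard_normal()
    y[t] = alpha + np.exp(h[t] / 2) * rng.standard_normal()

# decision rule of the chapter: compare the statistic with the chi-square(1)
# critical value at the 1% level (6.63)
crit = chi2.ppf(0.99, df=1)
bmt = 0.4279              # value reported in the text for the SV null
print(crit, "reject" if bmt > crit else "do not reject")
```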

URL: https://www.sciencedirect.com/science/article/pii/S016971611830107X

Special Volume: Mathematical Modeling and Numerical Methods in Finance

Hélyette Geman , in Handbook of Numerical Analysis, 2009

2.7 The CGMY process with stochastic volatility

Carr, Geman, Madan and Yor [2002] introduced a pure jump Lévy process to model stock prices, defined by its Lévy density

$k_{\mathrm{CGMY}}(x) = \begin{cases} C\,\dfrac{e^{-Mx}}{x^{1+Y}}, & x > 0 \\[2pt] C\,\dfrac{e^{-G|x|}}{|x|^{1+Y}}, & x < 0 \end{cases}$

and showed that the parameter Y characterizes the activity intensity of the market to which the process is calibrated.

Like any Lévy process, the CGMY process has independent increments, which do not allow one to capture effects such as volatility clustering that are well documented in the finance literature. In order to better calibrate the volatility surface, Carr, Geman, Madan and Yor [2003] proposed introducing stochastic volatility into the CGMY model in the form of a time change, leading to a return process

(2.3) R ( t ) = X CGMY ( T ( t ) ) ,

where the time change is meant to create autocorrelations of returns and clustering of volatility. Since the time change has to be increasing, they chose for T(t) the integral of a mean-reverting positive process, namely, the square-root process:

$T(t) = \int_0^t y(u)\, du,$

where

$dy(t) = k(\eta - y)\, dt + \lambda \sqrt{y}\, dB(t).$
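A short Euler sketch of this stochastic clock makes the construction concrete; the parameter values (k, η, λ) and the truncation scheme below are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# hypothetical parameters for the mean-reverting square-root clock
k, eta, lam = 2.0, 1.0, 0.5
T, n = 1.0, 10_000
dt = T / n

y = np.empty(n + 1)
Tt = np.empty(n + 1)          # T(t) = integral_0^t y(u) du, the business time
y[0], Tt[0] = eta, 0.0
for i in range(n):
    # full-truncation Euler step keeps the square-root argument nonnegative
    y_pos = max(y[i], 0.0)
    y[i + 1] = y[i] + k * (eta - y_pos) * dt \
               + lam * np.sqrt(y_pos * dt) * rng.standard_normal()
    Tt[i + 1] = Tt[i] + y_pos * dt

# Tt[-1] is the random business time at calendar time T; the time-changed
# return process of (2.3) would then be evaluated at this clock value.
print(Tt[-1])
```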

The process described in (2.3) performs much better when calibrating S&P option prices through strikes and maturities (see Carr, Geman, Madan and Yor [2003]).

To conclude this section, we should observe that the representation

(2.4) R ( t ) = X ( T ( t ) ) ,

where X is not necessarily a Brownian motion and T, the time change, is chosen to translate the desired properties of stochastic volatility, may be quite powerful if the two processes X and T are fully known, in particular in terms of trajectories. In order to price exotic options, one can build Monte Carlo simulations of the stock process and avoid the hurdles created by the unobservable nature of volatility in stochastic volatility models.

We have shown that by changing the probability measure (and the numéraire in the economy) or changing the clock, asset price processes can be expressed as martingales or even Brownian motion. The martingale representation is immediately extended to contingent claims in the case of complete markets where there is a unique martingale measure for each chosen numéraire. In the case of incomplete markets, we are facing many martingale measures; moreover, self-financing portfolios are not in general numéraire invariant, nor, in turn, are the pricing and hedging of contingent claims (see Gouriéroux and Laurent [1998] for the case of the minimal variance measure). These elements, among others, illustrate the numerous difficulties attached to incomplete markets. Given the importance of the numéraire-invariance property, for instance, when managing a book of options involving several currencies, this feature may be a constraint one wishes to incorporate when choosing among the different answers to market incompleteness.

URL: https://www.sciencedirect.com/science/article/pii/S1570865908000161

Volatility as an Asset Class and the Smile

Salih N. Neftci , in Principles of Financial Engineering (Second Edition), 2008

13. How to Explain the Smile

The volatility smile is an empirical phenomenon that violates the assumptions of the Black-Scholes world. At the same time, the volatility smile is related to the implied volatilities obtained from the Black-Scholes formula. This may give rise to confusion. The smile suggests that the Black-Scholes formula is not valid, while at the same time, the trader obtains the smile using the very same Black-Scholes formula. Is there an internal inconsistency?

The answer is no. To clarify the point, we use an analogy that is unrelated to the present discussion, but illustrates what market conventions are. Consider the 3-month Libor rate Lt . What is the present value of, say, $100 that will be received in 3 months' time? We saw in Chapter 3 that all we need to do is calculate the ratio:

(64) $\dfrac{100}{\left(1 + L_t\, \tfrac{1}{4}\right)}$

An economist who is used to a different de-compounding may disagree and use the following present value formula:

(65) $\dfrac{100}{\left(1 + L_t\right)^{\frac{1}{4}}}$

Who is right? The answer depends on the market convention. If Lt is quoted under the condition that formula (64) be used, then formula (65) would be wrong if used with the same Lt . However, we can always calculate a new L t * using the equivalence:

(66) $\dfrac{100}{\left(1 + L_t\, \tfrac{1}{4}\right)} = \dfrac{100}{\left(1 + L_t^*\right)^{\frac{1}{4}}}$

Then, the formula

(67) $\dfrac{100}{\left(1 + L_t^*\right)^{\frac{1}{4}}}$

used with L t * would also yield the correct present value. The point is, the market is quoting an interest rate Lt with the condition that it is used with formula (64). If for some odd reason a client wants to use formula (65), then the market would quote L t * instead of Lt . The result would be the same since, whether we use formula (64) with Lt , as the market does, or formula (65) with L t * , we would obtain the same present value. In other words, the question of which formula is correct depends on how the market quotes the variable of interest.
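The equivalence of the two quoting conventions is easy to verify numerically; the short sketch below uses a hypothetical 3-month Libor quote of 4%:

```python
# numerical check of the two quoting conventions with a hypothetical quote
L = 0.04
pv_market = 100 / (1 + L * 0.25)          # formula (64), the market convention
# the rate L* that makes the alternative formula (65) give the same PV
L_star = (100 / pv_market) ** 4 - 1       # solves 100/(1+L*)^(1/4) = pv_market
pv_alt = 100 / (1 + L_star) ** 0.25       # formula (67)
print(pv_market, pv_alt)                  # identical present values
```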

This goes for options also. The Black-Scholes formula may be the wrong formula if we substitute one particular volatility, but may give the right answer if we use another volatility. And the latter may be different than the real-world volatility at that instant. But traders can still use a particular volatility to obtain the right option price from this "wrong" formula, just as in the earlier present value example. This particular volatility, when associated with the Black-Scholes formula, may give the correct value for the option even though the assumptions leading to the formula are not satisfied.

Thus, suppose the arbitrage-free option price obtained under the "correct" assumptions is given by

(68) C ( S t , t , T , K , σ t * , θ t )

where K is the strike price, T is the expiration date, and St is the underlying asset price. The (vector) variable θt represents all the other parameters that enter the "correct" formula and that may not be taken into account by the Black-Scholes world. For example, the volatility may be stochastic, and some parameters that influence the volatility dynamics may indirectly enter the formula and be part of θt . 14 The critical point here is the meaning that is attached to σ t * . We assume for now that it is the correct instantaneous volatility as of time t.

The (correct) pricing function in equation (68) may be more complex and may not even have a closed form solution in contrast to the Black-Scholes formula, F(St, t, σ). Suppose traders ignore equation (68) but prefer to use the formula F(St, t,σ), even though the latter is "wrong." Does this mean traders will calculate the wrong price?

Not necessarily. The "wrong" formula F(St, t,σ) can very well yield the same option price as C ( S t , t , K , σ t * , θ t ) if the trader uses in F(St, t,σ), another volatility, σ, such that the two formulas give the same correct price:

(69) C ( S t , t , T , K , σ t * , θ t ) = F ( S t , t , σ )

Thus, we may be able to get the correct option price from the "unrealistic" Black-Scholes formula if we proceed as follows:

1.

We quote the Ki -strike option volatilities σi directly at every instant t, under the condition that the Black-Scholes formula be used to obtain the option value. Then, liquid and arbitrage-free markets will supply "correct" observations of the ATM volatility σ0. 15

2.

For out-of-the-money options, we use the Black-Scholes formula with a new volatility denoted by σ ( S K i , S ) , and

(70) σ ( S K i , S ) = σ 0 + f ( S K i , S )

where f(.) is, in general, positive and implies a smile effect. The adjustment made to the ATM volatility, σ0, is such that when σ ( S K i , S ) is used in the Black-Scholes formula, it gives the correct value for the Ki strike option:

(71) F ( S t , t , K i , σ 0 + f ( S K i , S ) ) = C ( S t , t , K i , σ t * , θ t )

The adjustment factor f ( S K i , S ) is determined by the trader's experience, knowledge, and the trading environment at that instant. The relationships between risk reversal, butterfly, and ATM volatilities discussed in the previous section can also be used here. 16

The trader, thus, adjusts the volatility of the non-ATM options so that the wrong formula gives the correct answer, even though what is used in the Black-Scholes formula may not be the "correct" instantaneous realized volatility of the St process.

The f ( S K i , S ) is, therefore, an adjustment required by the imperfections of the Black-Scholes formula in adequately representing the real-world environment. The upshot is that when we plot σ ( S K i , S ) against Ki/S or Ki we get a smile, or a skew curve, depending on the time and the sector we are working with.

For what types of situations should the volatilities be adjusted? At least three inconsistencies of the Black-Scholes assumptions with the real world can be corrected for by adjusting the volatilities across the strike Ki . The first is the lognormal process assumption. The second is the fact that if asset prices fall dramatically during a relatively short period of time, this could increase the "fear factor" and volatility would increase. The third involves the organizational and regulatory assumptions concerning financial markets. We discuss these in more detail next.

13.1. Case 1: Nongeometric Price Processes

Suppose the underlying obeys the true risk-neutral dynamics described by the SDE:

(72) $dS_t = r S_t\, dt + \sigma S_t^{\alpha}\, dW_t, \qquad t \in [0, \infty)$

With α = 1,S t would be lognormal. Everything else being conformable to the Black-Scholes world, there would be no smile in the implied volatilities.

The case of α < 1 would require an adjustment to the volatility coefficient used in the Black-Scholes formula as the strike changes. This is true, since, unlike in the case of α = 1, now the percentage volatility is dependent on the level of St . We divide by St to obtain

(73) $\dfrac{dS_t}{S_t} = r\, dt + \sigma S_t^{\alpha - 1}\, dW_t, \qquad t \in [0, \infty)$

The percentage volatility is given by the term σ S t α - 1 . This percentage volatility will be a decreasing function of St if α < 1. As St declines, the percentage volatility increases. Thus, the trader needs to use higher implied volatility parameters in the Black-Scholes formula for put options with lower and lower strike prices. This means that the more out-of-the-money the put option is, the higher the volatility used in the Black-Scholes formula must be.

This illustrates the idea that although the trader knows that the Black-Scholes world is far from reality, the volatility is adjusted so that the original Black-Scholes framework is preserved and that a "wrong" formula can still give the correct option value.

13.2. Case 2: Possibility of Crash

Suppose a put option series has an expiration of two months. All options are identical except for their strike. They run from ATM to deep out-of-the-money. Suppose also that the current level of St is 100. The liquid put options have strikes 90, 80, 70, and 60.

Here is what the 90-strike option implies. If the option expires in-the-money, then the market would have fallen by at least 10% in two months. This is a big fall, perhaps, but not a disaster. In contrast, if the 60-strike put expires in-the-money, this would imply a 40% drop in two months. This is clearly an unusual event, and unusual events lead to sudden spikes in volatility. Thus, the 60-strike option is relatively more associated with events that are labeled as crises and, everything else the same, this option would, in all likelihood, be in-the-money when the volatility is very high. But when this option becomes in-the-money, its gamma, which originally is close to zero, will also be higher. Thus, the trader who sells this option would have higher cash payouts due to delta hedge adjustments. So, to compensate for these potentially higher cash payments, the trader would use higher and higher vol parameters in put options that are more and more out-of-the-money, and, hence, are more and more likely to be associated with a crisis situation.

This explanation is consistent with the smiley shapes observed in reality. Note that in FX markets, sudden drops and sudden increases would mean higher volatility because in each case one of the observed currencies could be falling dramatically. So the smile will be more or less symmetric. But in the case of equity markets, a sudden increase in equity prices may be an important event, but not a crisis at all. For traders (excluding the shorts) this is a "happy" outcome, and the volatility may not increase much. In contrast, when asset prices suddenly crash, this increases the fear factor and the volatilities may spike. Thus, in equity markets the smile is expected to be mostly one-sided if this explanation is correct. It turns out that empirical data support this contention. Out-of-the-money equity puts have a smile; but out-of-the-money equity calls exhibit almost no smile.

Example:

Consider Table 15-2 which displays the prices of options with June 2002 expiry, on January 10, 2002, and ignore issues related to Americanness or any possible payouts. These data are collected at the same time as those discussed in the earlier example. In this case, the options are longer dated and expire in about 6 months. First, we obtain the volatility smile for these data.

TABLE 15-2. OEX Options with June 21, 2002, Expiry (collected 9:46 CBOT on January 10, 2002)

Calls Bid Ask Puts Bid Ask
Jun 440 153.4 156.4 Jun 440 4.2 4.8
Jun 460 134.8 137.8 Jun 460 5.6 6.3
Jun 480 116.7 119.7 Jun 480 7.4 8.1
Jun 500 99.2 102.2 Jun 500 9.9 10.6
Jun 520 82.6 85.6 Jun 520 12.9 14.4
Jun 540 67.2 69.7 Jun 540 17.2 18.7
Jun 560 52.7 55.2 Jun 560 22.7 24.2
Jun 580 39.8 41.8 Jun 580 29.3 31.3
Jun 600 28.6 30.6 Jun 600 38.3 40.3
Jun 620 19.9 21.4 Jun 620 49.5 51.5
Jun 640 12.8 14.3 Jun 640 62.2 64.7
Jun 660 8 8.7 Jun 660 76.9 79.9
Jun 680 4.7 5.4 Jun 680 93.7 96.7
Jun 700 2.55 3.2 Jun 700 111.6 114.6

The data are collected at the same instant, and since the current value of the underlying index is the same in each case, the division by St0 is not a major issue, but we still prefer to graph the volatility smile against the K S .

We extract ask prices for the eight out-of-the-money puts and consider the 600-put as being in-the-money. This way we can calculate nine implied vols. The price data that we use are shown in Table 15-2. We consider first the out-of-the-money put asking prices listed in the sixth column of this table. This will give nine prices.

Ignoring other complications that may exist in reality, we use the Black-Scholes formula straightforwardly with

(74) $S_{t_0} = 589.15, \qquad r = 1.90\%, \qquad t = \tfrac{152}{365} = 0.416$

We solve the equations

(75) $P\big(589.15,\, K_i,\, 1.90,\, \sigma_{K_i},\, 0.416\big) = P_{K_i}, \qquad i = 1, \dots, 9$

and obtain the nine implied volatilities σKi . Using Mathematica, we obtain the following result, which shows the value of Ki/S and the corresponding implied vols for out-of-the-money puts:

K/S Vol
0.74 0.26
0.78 0.26
0.81 0.26
0.84 0.25
0.88 0.25
0.91 0.24
0.95 0.23
0.98 0.22
1.01 0.21

This is shown in Figure 15-15. Clearly, as the moneyness of the puts decreases, the volatility increases. Option market makers will conclude that, if in 6 months, U.S. equity markets were to drop by 25%, then the fear factor would increase volatility from 21% to 26%. By selecting the seven out-of-the-money call prices, we get the implied vols for out-of-the-money calls.

FIGURE 15-15.

K/S Vol
0.98 0.23
1.01 0.22
1.05 0.21
1.08 0.20
1.12 0.19
1.15 0.19
1.18 0.18

Here, the situation is different. We see that as moneyness of the calls decreases, the volatility also decreases.

Option market makers may now think that if, in 6 months, U.S. equity markets were to increase by 20%, then the fear factor would decrease and so would volatility.
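The inversion step behind these tables can be sketched as follows. The code uses the inputs of (74) and the nine put ask prices of Table 15-2, but it is only an illustration: since it applies the plain Black-Scholes put formula with no dividend or early-exercise adjustment, the implied volatilities it returns need not coincide exactly with the Mathematica output quoted in the text.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_put(S, K, r, sigma, t):
    """Black-Scholes European put price (no dividends)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return K * np.exp(-r * t) * norm.cdf(-d2) - S * norm.cdf(-d1)

def implied_vol(price, S, K, r, t):
    # root of the pricing equation (75) in sigma, bracketed on [0.0001, 2]
    return brentq(lambda s: bs_put(S, K, r, s, t) - price, 1e-4, 2.0)

# inputs of equation (74) and the June put ask prices of Table 15-2
S0, r, t = 589.15, 0.019, 152 / 365
put_asks = {440: 4.8, 460: 6.3, 480: 8.1, 500: 10.6, 520: 14.4,
            540: 18.7, 560: 24.2, 580: 31.3, 600: 40.3}
for K, price in put_asks.items():
    print(round(K / S0, 2), round(implied_vol(price, S0, K, r, t), 3))
```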

The fear of a crash that leads to a smile phenomenon can, under some conditions, be represented analytically using the so-called jump processes. We discuss this modeling approach next.

13.2.1. Modeling Crashes

Consider again the standard geometric Brownian motion case:

(76) $dS_t = r S_t\, dt + \sigma S_t\, dW_t, \qquad t \in [0, \infty)$

Wt is a Wiener process under the risk-neutral probability P ˜ . Now, keep the volatility parameterization the same, but instead, add a jump component as discussed in Lipton (2002). For example, let

(77) $dS_t = r S_t\, dt + \sigma S_t\, dW_t + S_t\left[(e^{j} - 1)\, dJ_t - \lambda m\, dt\right], \qquad t \in [0, \infty)$

Some definitions are needed regarding the term $(e^{j} - 1)\, dJ_t - \lambda m\, dt$. The j is the size of a random logarithmic jump. The size of the jump is not related to the occurrence of the jump, which is represented by the term $dJ_t$. If the jump is of size zero, then $(e^{j} - 1) = 0$ and the jump term does not matter.

The term dJt is a Poisson-type process. In general, at time t, it equals zero. But, with "small" probability, it can equal one. The probability of this happening depends on the length of the interval we are looking at, and on the size of the intensity coefficient λ. The jump can heuristically be modeled as follows

(78) $dJ_t = \begin{cases} 0 & \text{with probability } 1 - \lambda\, dt \\ 1 & \text{with probability } \lambda\, dt \end{cases}$

where 0 < dt is an infinitesimally short interval. Finally, m is the expected value of $(e^{j} - 1)$:

(79) $E_t^{\tilde{P}}\left[(e^{j} - 1)\right] = m$

Thus, we see that, for an infinitesimal interval we can heuristically write

(80) $E_t^{\tilde{P}}\left[(e^{j} - 1)\, dJ_t\right] = E_t^{\tilde{P}}\left[(e^{j} - 1)\right] E_t^{\tilde{P}}\left[dJ_t\right]$

(81) $= m\left[0 \cdot (1 - \lambda\, dt) + 1 \cdot \lambda\, dt\right]$

(82) $= m \lambda\, dt$

According to this, the expected value of the term $(e^{j} - 1)\, dJ_t - \lambda m\, dt$ is zero.

This jump-diffusion model captures some crash phenomena. Stock market crashes, major defaults, 9/11-type events, and currency devaluations can be modeled as rare but discrete events that lead to jumps in prices.
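A minimal simulation of the jump-diffusion (77) shows how the pieces fit together; the jump-size distribution (here a Gaussian log-jump j) and all parameter values are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(5)

# hypothetical parameters: jump intensity lam, Gaussian log jump sizes j
r, sigma, lam = 0.02, 0.2, 0.5
mu_j, sig_j = -0.15, 0.10
m = np.exp(mu_j + 0.5 * sig_j**2) - 1           # m = E[e^j - 1], eq. (79)

T, n = 1.0, 252
dt = T / n
S = np.empty(n + 1)
S[0] = 100.0
for t in range(n):
    dW = np.sqrt(dt) * rng.standard_normal()
    dJ = rng.random() < lam * dt                # Poisson-type indicator, eq. (78)
    j = rng.normal(mu_j, sig_j) if dJ else 0.0
    # dynamics of eq. (77); the -lam*m*dt term compensates the jumps so that
    # the expected jump contribution over dt is zero
    S[t + 1] = S[t] + S[t] * (r * dt + sigma * dW + (np.exp(j) - 1) * dJ - lam * m * dt)

print(S[-1])
```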

The way these types of jumps create a smile can be heuristically explained as follows: In a world where the Black-Scholes assumptions hold, with a geometric St process, a constant volatility parameter σ ˜ , and no jumps, the volatility trade yields the arbitrage relation:

(83) $\tfrac{1}{2} C_{ss}\, \tilde{\sigma}^2 S^2 + C_t + r C_s S - r C = 0$

With a jump term added to the geometric process as in equation (77), the corresponding arbitrage relation becomes

(84) $\tfrac{1}{2} C^*_{ss}\, \sigma^2 S^2 + C^*_t + (r - \lambda m) C^*_s S - r C^* + \lambda E_t^{\tilde{P}}\left[ C^*(S e^{j}, t) - C^*(S, t) \right] = 0$

where $\tilde{P}$ is the risk-neutral probability. Suppose we decide to use, as a convention, the Black-Scholes formula, but believe that the true PDE is the one in equation (84). Then, we would select $\tilde{\sigma}$ such that the Black-Scholes formula yields the same option value as the other PDE would yield.

For example, out-of-the-money options will have much smaller gammas, $C_{ss}$. If the expected jump is negative, then $\tilde{\sigma}$ will be bigger the more out-of-the-money the option is. As the expiration date T increases, $C_{ss}$ will increase and the smile will be less pronounced.

13.3. Other Explanations

Many other effects can cause a volatility smile. One is stochastic volatility. Consider a local volatility specification using

(85) $dS_t = \mu S_t\, dt + \sigma S_t^{\alpha}\, dW_t, \qquad t \in [0, \infty)$

with, say, α < 1. In this specification, percentage volatility will be stochastic since it depends on the random variable St . But often this specification does not express what is meant by models of stochastic volatility. What is captured by stochastic volatility is a situation where an additional Wiener process dBt , possibly correlated with dWt , affects the dynamics of percentage volatility. For example, we can write

(86) $dS_t = \mu S_t\, dt + \sigma_t S_t\, dW_t, \qquad t \in [0, \infty)$

(87) $d\sigma_t = a(\sigma_t, S_t)\, dt + \kappa \sigma_t\, dB_t, \qquad t \in [0, \infty)$

where κ is the parameter representing the (constant) percentage volatility of the volatility of St . In this model, the volatility itself is driven by some random increments that originate in the volatility market only. These shocks are only partially correlated with the innovation terms dWt affecting the price data.

It can be shown that stochastic volatility generates a volatility smile. In fact, with stochastic volatility, we can perform an analysis similar to the PDE with a jump process (see Lipton (2002)). The result will essentially be similar. However, it is important to emphasize that, everything else being the same, this model may be incomplete in the sense that there may not be enough instruments to hedge the risks associated with dWt and dBt completely, and form a risk-free, self-financing portfolio. The jump-diffusion model discussed in the previous section may entail the same problem. To the extent that the jump part and the diffusion part are affected by different processes, the model may not be complete.

13.3.1. Structural and Regulatory Explanations

Tax effects (Merton, 1976) and the capital requirements associated with carrying out-of-the-money options in options books may also lead to a smile in implied volatility. We briefly touch on the second point.

The argument involves the concept of gamma. A negative gamma position is considered to be more risky, the more out-of-the-money the option is. Essentially, negative gamma means that the market maker has sold options and delta hedged them, and that he or she is paying the gamma through the rebalancing of this hedge. If the option is deep out-of-the-money, gamma would be close to zero. Yet, if the option suddenly becomes in-the-money, the gamma could spike, especially if the option is about to expire. This may cause significant losses. Out-of-the-money options, therefore, involve substantial risk and require more capital. Due to such costs, the market maker may want to sell the out-of-the-money option at a higher price than warranted by the ATM volatility.

URL: https://www.sciencedirect.com/science/article/pii/B9780123735744500184

Time Series Analysis: Methods and Applications

Dag Tjøstheim , in Handbook of Statistics, 2012

5 Time-varying parameters and state-space models

5.1 Introduction

A very general model is

y t = g ( z t , θ , ε t )

where g is a known function, z t is a vector of explanatory variables possibly including lagged values of yt , θ is an unknown parameter vector, and εt is an error term. The parameter θ contributes to describing the dynamics of the process. In this sense, for a fixed θ , it represents the state of the system. Different values of θ , different states, may lead to quite different dynamics.

Sometimes, it is assumed that θ = θ t varies in time, yielding a time-varying dynamics, and then it becomes important to estimate θ t as a function of time, not least if one's objective is to make forecasts of future values of the series {yt }. Time dependence can be introduced in two ways, deterministic or stochastic. The first option leads to nonstationarity of {yt }. Unless relatively strict regularity conditions are imposed on the time-variation of { θ t } , it is difficult to analyze in practice and to make forecasts. For instance, one may assume that { θ t } is constant in time except for sudden changes or breaks at (usually) unknown time points, or there may be a smooth parameterized transition, or { θ t } may be slowly time-varying in some specified way, for instance resulting in a slowly time-varying spectrum; see Priestley (1965), Dahlhaus (1997, 2001), Dahlhaus et al. (1999), and Dahlhaus and Subba Rao (2006).

In spite of the progress in this area, the other alternative of modelling { θ t } as a random process has been more common. Letting { θ t } be stochastic still allows {yt } to be stationary under some regularity conditions. Moreover, employing the structural properties of { θ t } , if it is estimated, it can be predicted, which in turn can be used to make forecasts for {yt }. Using such a set-up leads to so-called state-space processes, see e.g., Durbin and Koopman (2001) for an introduction. State-space models are sometimes divided into observation and parameter driven models using the terminology of Cox (1981); see also Davis et al. (2003, 2005). In observation driven models, the process { θ t } is generated by observations. One example is the conditional variance process {ht } in a GARCH model, where {ht } is an unobserved component process driven by {yt }, although GARCH models are not usually thought of as being state-space models. In a parameter driven model, the observations are not involved in the driving mechanism for { θ t } . The class of stochastic volatility models ( Shephard, 2005) would be a typical example.

The case where the state space for { θ t } is continuous corresponds to smooth changes of the dynamics for the {yt }-process. There is also a growing literature for the situation where the state space of { θ t } is discrete, and then usually finite. These models are usually called finite regime models or hidden Markov chain models, when a Markov assumption is added.

5.2 Nonlinear state-space models

Although there seems to be no consensus precisely as to what constitutes a nonlinear state-space model, several authors have considered models of the form

(34) y t = a ( θ t ) + b ( z t ) + ε t

(35) $\theta_t = c(\theta_{t-1}) + \eta_t$

where a, b, and c are vector functions. A model that combines the z-dependence and the time-varying parameter aspect is the one in which the above observational Eq. (34) is replaced by

(36) y t = g ( z t , θ t ) + ε t

which, unlike (34), is nonadditive in z t and θ t . To our knowledge, such models have not been much treated in the literature. A related example, though, is the observation driven STAR model with a stochastically varying parameter, which has been analyzed in Anderson and Low (2006).

The conventional nonlinear state-space model (34)–(35) has been treated essentially by three approaches, the extended Kalman filter, the Kitagawa grid approximation, and Monte Carlo methods. These approaches complement each other in that different model assumptions are needed. A fourth method is based on Gaussian approximation using linearization as in the extended Kalman filter combined with Monte Carlo techniques.

The idea of the extended Kalman filter is to linearize $a(\theta_t)$ and $c(\theta_t)$ around $\theta_{t|t-1}$ and $\theta_{t|t}$, respectively. Here $\theta_{t|s} = E\{\theta_t \mid \mathcal{F}^z_s, \mathcal{F}^y_s\}$, where $\mathcal{F}^z_s$ and $\mathcal{F}^y_s$ are the σ-algebras generated by $\{z_u, u \le s\}$ and $\{y_u, u \le s\}$, respectively. We then have

$a(\theta_t) = a(\theta_{t|t-1}) + \dfrac{da}{d\theta}(\theta_{t|t-1})\,(\theta_t - \theta_{t|t-1})$

where higher-order terms are neglected. Similarly,

$c(\theta_t) = c(\theta_{t|t}) + \dfrac{dc}{d\theta}(\theta_{t|t})\,(\theta_t - \theta_{t|t}).$

Inserting these in (34) and (35), we have

(37) $y_t = a(\theta_{t|t-1}) + \dfrac{da}{d\theta}(\theta_{t|t-1})\,(\theta_t - \theta_{t|t-1}) + b(z_t) + \varepsilon_t$

and

(38) $\theta_{t+1} = c(\theta_{t|t}) + \dfrac{dc}{d\theta}(\theta_{t|t})\,(\theta_t - \theta_{t|t}) + \eta_{t+1}.$

Now, using the definitions of θ t | t 1 and θ t | t , these are functions of variables observed at time t − 1 and t, and it is seen that the above equations can be identified with a linear time-varying parameter Kalman system with a time-varying intercept in the state equation, and a Kalman algorithm can then be set up.
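A scalar version of this extended Kalman recursion can be sketched as follows; the measurement function a and the autoregression c used in the example are hypothetical, and the b(z_t) term of (34) is omitted for brevity:

```python
import numpy as np

def ekf_step(theta_pred, P_pred, y, a, da, c, dc, var_eps, var_eta):
    """One extended-Kalman update/prediction cycle for the scalar model
    y_t = a(theta_t) + eps_t,  theta_{t+1} = c(theta_t) + eta_{t+1},
    i.e. (34)-(35) without the b(z_t) term.  a, c are callables and
    da, dc their derivatives; linearization points are the current
    predicted and filtered means, as described in the text."""
    # measurement update: linearize a(.) around theta_{t|t-1}
    H = da(theta_pred)
    K = P_pred * H / (H * P_pred * H + var_eps)        # Kalman gain
    theta_filt = theta_pred + K * (y - a(theta_pred))  # theta_{t|t}
    P_filt = (1.0 - K * H) * P_pred
    # time update: linearize c(.) around theta_{t|t}
    F = dc(theta_filt)
    theta_next = c(theta_filt)                         # theta_{t+1|t}
    P_next = F * P_filt * F + var_eta
    return theta_filt, P_filt, theta_next, P_next

# hypothetical example functions
a = lambda x: np.exp(x / 2)          # measurement function
da = lambda x: 0.5 * np.exp(x / 2)
c = lambda x: 0.9 * x                # state autoregression
dc = lambda x: 0.9
```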

If there is strong nonlinearity in the series, the first-order extended Kalman filter will not work too well. A useful alternative is the Kitagawa (1987) grid approximation. This method has the extra advantage that no Gaussian assumption is needed for the generating processes { ε t } and { η t } . Such an assumption is crucial for the extended Kalman algorithm, since its derivation is still based on the simple formulas for conditional Gaussian distributions.

In the absence of a Gaussian assumption, the first two conditional moments no longer describe the contemporaneous conditional structure of the model. Instead of updating the first two conditional moments, the task is to update the entire density function $f(\theta_t \mid \mathcal{F}^y_{t-1})$ to $f(\theta_{t+1} \mid \mathcal{F}^y_t)$. This is too ambitious, but, following Kitagawa, the update will be made on a finite grid of points $\theta^{(0)}, \dots, \theta^{(N)}$. This implies that the input consists of the N + 1 values $f(\theta_t = \theta^{(i)} \mid \mathcal{F}^y_{t-1})$, $i = 0, \dots, N$, and the problem consists in producing the update $f(\theta_{t+1} = \theta^{(i)} \mid \mathcal{F}^y_t)$, $i = 0, \dots, N$. Using the Kitagawa grid approach, one can update the conditional density $f(\theta_t \mid \mathcal{F}^y_{t-1})$ to $f(\theta_{t+1} \mid \mathcal{F}^y_t)$ for a finite (and fixed) set of grid points $\theta^{(0)}, \dots, \theta^{(N)}$ using numerical integration. For complex (and multivariate) problems, one may encounter numerical instabilities. A standard way of evaluating integrals is via Monte Carlo simulation. It is, therefore, perhaps not so surprising that recently a number of Monte Carlo methods have been developed where numerical integration is avoided in updating the filter density $f(\theta_t \mid \mathcal{F}^y_{t-1})$. Typically, the filter is updated for a set of stochastic values of $\theta_t$ (as opposed to a fixed set of values in the Kitagawa method). This is often combined with importance sampling techniques to obtain the so-called particle filters.
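A bootstrap particle filter along these lines can be sketched in a few lines; the initialization, the multinomial resampling scheme, and the stochastic-volatility observation density used in the example are illustrative choices, not a specific algorithm from the references:

```python
import numpy as np

def bootstrap_filter(y, c, sd_eta, loglik, n_part=1000, rng=None):
    """Sketch of a bootstrap particle filter for the state equation
    theta_t = c(theta_{t-1}) + eta_t.  `loglik(y_t, theta)` evaluates the
    log observation density; no Gaussian assumption is needed."""
    rng = rng or np.random.default_rng()
    particles = rng.normal(0.0, 1.0, n_part)          # crude initial draw
    means = []
    for yt in y:
        # propagate each particle through the state equation (prediction)
        particles = c(particles) + sd_eta * rng.standard_normal(n_part)
        # importance weights from the observation density (update)
        logw = loglik(yt, particles)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * particles))           # filtered mean of theta_t
        # resampling replaces numerical integration on a fixed grid
        particles = rng.choice(particles, size=n_part, p=w)
    return np.array(means)

# example observation log-density (up to a constant) for a stochastic
# volatility model: y_t | theta_t ~ N(0, exp(theta_t))
loglik = lambda yt, th: -0.5 * (th + yt**2 * np.exp(-th))
```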

In Koopman et al. (2005), Koopman and Ooms (2006), and Menkveld et al. (2007), the authors look at a nonlinear Gaussian system and use a Gaussian approximation to obtain computationally efficient algorithms, which they then have applied on a number of economic problems

All of the above is concerned with a continuous state space. There is also a large branch of literature, see e.g., Cappé et al. (2005), on the discrete case, again distinguishing between observation and parameter driven models. The class of hidden Markov chains belongs to the latter category. The hidden Markov model represents a mixture of regimes in that a different AR or other parametric model results for each value of $\theta_t$. Such a mixture can be obtained by various means and can be extended to a mixture of other types of models. The modeling is typically made directly on the conditional density of {y_t} given past values of {y_t} and possible explanatory variables. Models of this kind are often called mixture models (cf. Wong and Li, 2000). Estimation of parameters in nonlinear state-space models is still in its infancy, and the nonstationary case has hardly been touched upon. Maximum likelihood methods combined with importance sampling have been considered by Shephard and Pitt (1997), Durbin and Koopman (2000), and Davis and Rodrigues-Yam (2005). Work on central limit theorems and asymptotic distributions has been done by Bickel et al. (1998), Jensen and Petersen (1999), and Douc et al. (2004). A full overview of statistical inference in hidden Markov chains can be found in Cappé et al. (2005).

URL: https://www.sciencedirect.com/science/article/pii/B9780444538581000041

Dynamic nonparametric filtering with application to volatility estimation

Ming-Yen Cheng , ... Vladimir Spokoiny , in Recent Advances and Trends in Nonparametric Statistics, 2003

4 Applications to volatility estimation

Let S 1,…, ST be the prices of an asset or yields of a bond. The return of such an asset or bond process is usually described via the conditional heteroscedastic model:

(9) R t = σ t ε t

where $R_t = \log(S_t/S_{t-1})$, $\sigma_t$ is the predictable volatility process, and the $\varepsilon_t$'s are the standardized innovations.

4.1 Connections with filtering problems

The volatility is associated with the notion of risks. For many purposes in financial practice, such as portfolio optimization, option pricing and prediction of Value-at-Risk, one would be interested in predicting the volatility σt based on the past observations S 1,…, S t    1 of the asset process. The distribution of the innovation process can be skewed or have heavy tails. To produce a robust procedure, following [15], we consider the power transformation Yt   =   |Rt | γ for some γ. Then, the model (9) can be written in the following semi-martingale form:

(10) $Y_t = C_\gamma \sigma_t^\gamma + D_\gamma \sigma_t^\gamma \xi_t \equiv f_t + v_t \xi_t$

with $C_\gamma = E|\varepsilon_t|^\gamma$, $D_\gamma^2 = \operatorname{Var}|\varepsilon_t|^\gamma$, and $\xi_t = D_\gamma^{-1}(|\varepsilon_t|^\gamma - C_\gamma)$. Mercurio and Spokoiny [15] argued that the choice γ = 1/2 leads to a nearly Gaussian distribution of the 'innovations' $\xi_t$ when $\varepsilon_t \sim N(0,1)$. In particular, $E\, e^{u\xi_t} \le e^{a u^2/2}$ with $a \le 1.005$, a condition in Lemma 1.

Now the original problem is clearly equivalent to estimating the drift coefficient $f_t = C_\gamma \sigma_t^\gamma$ from the 'observations' $Y_s = |R_s|^\gamma$, $s = 1, \dots, t-1$. The semi-martingale representation (10) is a specific case of the model (1) with $v_t = D_\gamma \sigma_t^\gamma$. Hence, the techniques introduced in Section 3 are still applicable.

There is a large literature on the estimation of volatility. In addition to the famous parametric models such as ARCH and GARCH (see [10] and [12]), stochastic volatility models have also received a lot of attention (see, for example, [1], [2] and [4] and references therein). We here consider only the ARCH and GARCH models in addition to the nonparametric methods (MA and ES) in Section 3.

4.2 Choice of orders of ARCH and GARCH

Commonly used parametric techniques for modeling volatility are the ARCH [6] and GARCH [3] models. See [10] and [12] for an overview of the field. In the current context, the ARCH model assumes the following autoregressive structure:

$E\big(Y_t \mid \mathcal{F}_{t-1}\big) = \theta_1 Y_{t-1} + \cdots + \theta_p Y_{t-p}$

The coefficients θ  =   (θ 1,…, θp ) can be estimated by using the least-squares approach:

$\hat{\theta}_p = \left( \sum_{s=t_0}^{t-1} X_{s,p} X_{s,p}^{\top} \right)^{-1} \sum_{s=t_0}^{t-1} X_{s,p} Y_s$

with $X_{s,p} = (Y_{s-1}, \dots, Y_{s-p})^{\top}$. The estimate $\hat{f}_{t,p}$ is then defined by $\hat{f}_{t,p} = X_{t,p}^{\top} \hat{\theta}_p$. As in Section 3, the order p can be chosen by minimizing the prediction error:

(11) $\hat{p} = \arg\inf_{p \le p^*} \sum_{s=t_0}^{t-1} \big( Y_s - \hat{f}_{s,p} \big)^2$

The upper bound p* should be sufficiently large to reduce possible approximation errors. To facilitate computation, t in (11) can be replaced by T, the length of the time series in the in-sample period. The approach is a global choice of the order of an ARCH model.
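The least-squares fit and the order selection rule (11) translate directly into code; the sketch below is illustrative (the simulated return series is a placeholder and the function interface is not from the chapter):

```python
import numpy as np

def ar_filter_order(Y, p_max, t0):
    """Least-squares fit of the autoregressive filter of Section 4.2 applied
    to transformed data Y_t (e.g. Y_t = |R_t|^(1/2)), with the order chosen
    by minimizing the in-sample prediction error as in (11)."""
    best = None
    for p in range(1, p_max + 1):
        # design matrix with rows X_{s,p} = (Y_{s-1}, ..., Y_{s-p})
        X = np.column_stack([Y[t0 - j:len(Y) - j] for j in range(1, p + 1)])
        y = Y[t0:]
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
        pred_err = np.sum((y - X @ theta) ** 2)
        if best is None or pred_err < best[0]:
            best = (pred_err, p, theta)
    return best[1], best[2]          # selected order p-hat and coefficients

# usage with a placeholder return series and gamma = 1/2
rng = np.random.default_rng(6)
R = 0.01 * rng.standard_normal(1000)
Y = np.abs(R) ** 0.5
p_hat, theta_hat = ar_filter_order(Y, p_max=15, t0=50)
```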

The volatility process σt in GARCH(p, q) is modeled as

$\sigma_t^2 = c_0 + \alpha_1 \sigma_{t-1}^2 + \cdots + \alpha_p \sigma_{t-p}^2 + \beta_1 R_{t-1}^2 + \cdots + \beta_q R_{t-q}^2.$

The coefficients $\alpha_j, \beta_k$ can be estimated by using the maximum likelihood method. See for example Fan and Yao [10]. The estimates $\hat{\alpha}_j$ and $\hat{\beta}_k$ are then used to construct the filter

$\hat{f}_{t,p,q} = C_\gamma \left( \sum_{j=1}^{p} \hat{\alpha}_j \sigma_{t-j}^2 + \sum_{k=1}^{q} \hat{\beta}_k R_{t-k}^2 \right)^{\gamma/2}.$

The order (p,q) can be chosen to minimize a quantity that is similar to (11).

GARCH(1,1) is one of the most frequently used models in volatility estimation for financial time series. It has been observed to fit many financial time series well. To simplify the computations, we mainly focus on GARCH(1,1) rather than the general GARCH(p, q) in our simulation studies.

4.3 Simulated financial time series

We simulated time series from the volatility models

GARCH(1,1): $\sigma_t^2 = 0.00005 + 0.85\,\sigma_{t-1}^2 + 0.1\,R_{t-1}^2$
GARCH(1,3): $\sigma_t^2 = 0.00002 + 0.8\,\sigma_{t-1}^2 + 0.02\,R_{t-1}^2 + 0.05\,R_{t-2}^2 + 0.11\,R_{t-3}^2$
ARCH(2): $\sigma_t^2 = 0.00085 + 0.1\,R_{t-1}^2 + 0.05\,R_{t-2}^2$

As shown in (10), the problem of volatility estimation is closely related to the filtering problems in Section 3. Therefore, the effectiveness of each method can be gauged by the MAFE and MSFE of Section 3.4. Tables 2 and 3 summarize the results for γ = 0.5 and γ = 2 in a format similar to Table 1. Table 4 summarizes the results using "rank" as a measure. For example, for the GARCH(1,3) model (second block), using the untransformed data $Y_t = R_t^2$ (right block), in terms of MAFE, among 500 simulations the GARCH(1,1), ES and AR methods ranked, respectively, 334, 162 and 4 times in the first place, 159, 309 and 32 times in the second place, and 7, 29 and 464 times in the third place.

Table 2. Relative MAFE performance. ES and AR filtering of Yt   =   |Rt |1/2. Empirical mean (first row), sample standard deviation (second row), first quartile (third row), median (fourth row), and third quartile (fifth row) of MAFE ratios.

Model Relative to GARCH(1,1) Relative to ideal counterparts
ES global AR global ES global AR global
GARCH(1,1) 2.898 3.078 1.026 1.095
1.747 2.045 0.060 0.164
1.816 1.900 0.998 1.000
2.464 2.544 1.006 1.060
3.381 3.564 1.050 1.187
GARCH(1,3) 1.485 1.610 1.034 1.063
0.246 0.304 0.070 0.122
1.314 1.401 1.000 1.000
1.482 1.571 1.000 1.034
1.639 1.794 1.051 1.120
ARCH(2) 2.914 1.330 1.000 1.061
1.448 0.731 0.000 0.131
1.899 0.797 1.000 1.000
2.575 1.139 1.000 1.000
3.473 1.626 1.000 1.111

Table 3. Relative MAFE performance. ES and AR filtering of $Y_t = R_t^2$. Empirical mean (first row), sample standard deviation (second row), first quartile (third row), median (fourth row), and third quartile (fifth row) of MAFE ratios.

Model Relative to GARCH(1,1) Relative to ideal counterparts
ES global AR global ES global AR global
GARCH(1,1) 2.119 2.789 1.115 1.171
.413 1.823 0.152 0.249
1.283 1.655 1.010 1.000
1.815 2.340 1.055 1.101
2.449 3.318 1.165 1.308
GARCH(1,3) 1.111 2.147 1.132 1.179
0.222 2.381 0.181 0.291
0.971 1.448 1.000 1.000
1.092 1.778 1.070 1.108
1.220 1.171 1.181 1.325
ARCH(2) 2.484 0.964 1.002 1.152
1.229 0.562 0.032 0.353
1.632 0.565 1.000 1.000
2.166 0.816 1.000 1.000
2.939 1.237 1.000 1.213

Table 4. Rank performance of GARCH(1,1), ES global, and AR global.

Model Filtering Yt   =   |Rt |1/2 Filtering Yt   = R 2 t
GARCH(1,1) ES AR GARCH (1,1) ES AR
GARCH(1,1) 487 9 4 451 42 7
11 286 203 40 383 77
2 205 293 9 75 416
GARCH(1,3) 491 7 2 334 162 4
8 347 145 159 309 32
1 146 353 7 29 464
ARCH(2) 299 0 201 183 0 317
197 4 299 310 8 182
4 496 0 7 492 1

First of all, from Tables 2 and 3, the ES and AR filters with their parameters chosen from data perform nearly as well as their corresponding estimators using the ideal filtering parameters. This is consistent with our theoretical result, which is the theme of our study. The GARCH(1,1) and ES estimation methods are quite robust. When the true model is GARCH(1,1), the GARCH(1,1) method performs the best, as expected, followed by ES global and then AR global. When the true model is the GARCH(1,3), which can still be reasonably well approximated by a GARCH(1,1) model, the performance of the GARCH(1,1) method and the ES method is nearly the same, though the GARCH(1,1) method performs somewhat better. It is clear that the relative performance of the GARCH(1,1) method deteriorates from the GARCH(1,1) model to the GARCH(1,3) model. When the series comes from the ARCH(2) model, the AR filter performs the best, as expected.

4.4 Applications

We apply the GARCH(1,1) approach $\hat{f}_{t,1,1}$, the adaptive global ES smoothing $\hat{f}_{t,\hat{\lambda}}$, and the global AR smoothing $\hat{f}_{t,\hat{p}}$ to estimate the volatility of the log-returns of the S&P500 index and the three-month Treasury Bills. For the ES and AR approaches, we consider the square-root transformation $Y_t = |R_t|^{1/2}$, which yields more stable estimates than the square transformation $Y_t = R_t^2$. The order of the AR filtering was searched among the candidate set $\mathcal{P} = \{1, \dots, 15\}$, and the collection of grids of ES smoothing parameters was taken to be $\Lambda = \{5 \times 1.2^k,\ k = 0, 1, \dots, 15\}$. For the real data, we do not know the true volatility. Hence, we use the Average of Prediction Errors (APE) as a measure of effectiveness:

$\mathrm{APE}_1 = \dfrac{1}{T - t_0 + 1} \sum_{t=t_0}^{T} \big( |R_t| - C_1 \hat{\sigma}_t \big)^2 \quad \text{and} \quad \mathrm{APE}_2 = \dfrac{1}{T - t_0 + 1} \sum_{t=t_0}^{T} \big( R_t^2 - \hat{\sigma}_t^2 \big)^2.$

As noted in [8], the prediction errors consist of stochastic errors and estimation (filtering) errors. The former is independent of the estimation method and dominates the latter. Therefore, a small percentage improvement in prediction error implies a large improvement in the filtering error.

The three-month Treasury bills data consist of weekly observations (Fridays' closing) of interest rates of the three-month Treasury bills, from August 1954 to December 1999. The rates are based on quotes at the official close of the U.S. government securities market on a discount basis. To attenuate the time effects, we divided the entire series into four sub-series. The gaps between the time periods are the length t 0 used for the subsequent series. The volatility is computed based on the difference of the yields series. The relative performance of global ES and global AR smoothing and GARCH(1,1) is given in Table 5. The values are smaller than one most of the time and are sometimes as small as 0.696. This implies that with the adaptive choice of filtering parameters, the exponential smoothing and the autoregressive model outperform the GARCH(1,1) model for the periods studied. Figure 4 depicts first one hundred lags of the autocorrelation of the absolute returns and the absolute returns divided by the standard deviations estimated by the three methods. The horizontal lines indicate the 95% confidence limits. All of the three estimation methods explain well the volatility: the standardized returns rarely exhibit significant correlations.

Table 5. Relative prediction performance for yields of three-month Treasury Bills over four different periods

Time period ES f t , λ relative to GARCH(1,1) AR f t , p relative to GARCH(1,1)
APE1 APE2 APE1 APE2
12/09/55–07/02/65 1.012 1.038 1.051 0.979
06/09/67–12/31/76 0.956 0.889 0.983 0.858
12/08/78–07/01/88 0.772 0.696 0.840 0.724
06/08/90–12/31/99 1.004 0.879 0.989 0.948

Figure 4. First one hundred lags of the autocorrelation function (ACF). Left to right: ACF of the absolute log-returns, ACF of the absolute log-returns divided by volatility estimated by GARCH(1,1) model, global ES, and global AR.

From top to bottom: time periods 12/09/55–07/02/65, 06/09/67–12/31/76, 12/08/78–07/01/88, and 06/08/90–12/31/99.

The S&P500 data consist of the daily closing of the Standard and Poor 500 index. The volatility estimation methods are applied to the data in the time periods 03/08/90–18/07/94 and 08/12/94–20/11/98. Again the AR and ES methods with our adaptive choice of filtering parameters provide satisfactory estimate of the underlying volatility. The ACF plots of the standardized log-returns (not shown here, similar to Figure 4) indicate success of the three methods. The relative performance against GARCH(1,1) is shown in Table 6. Again, the ES and AR filters with filtering parameters chosen by data outperform the GARCH(1,1).

Table 6. Relative prediction performance for the Standard and Poor 500 index over two time periods.

Time period ES f t , λ relative to GARCH(1,1) AR f t , p relative to GARCH(1,1)
APE1 APE2 APE1 APE2
03/08/90–18/07/94 0.950 0.883 1.002 0.983
08/12/94–20/11/98 0.993 0.952 1.031 0.898

The adaptive local ES filter and local AR filter were also applied to the above two data sets. We do not report the details here to save space. They both perform reasonably well. However, the local ES method does not perform as well as the global one. The local AR filter performs quite well and is often better than the global AR filter for the two financial series that we examined.

URL: https://www.sciencedirect.com/science/article/pii/B9780444513786500211