Deep Simulation

We have understood controlled neural differential equations as a common structural framework for many sorts of neural networks. So far we have abstractly considered the initial value as input, even though it became clear in applications such as Deep Portfolio Optimization or Deep Hedging that the dependence on the control variable is equally interesting. In the sequel we shall investigate this further.

Consider a controlled neural differential equation $$ dX_t = \sum_{i=0}^d V_i(X_t) \, d u^i(t) \, , \; X_0 = y \in \mathbb{R}^m $$ for neural networks $ V_i : \mathbb{R}^m \to \mathbb{R}^m $, $ i = 0,\ldots,d$, and $ d+1 $ continuous, finite variation controls $ u^i $. We are interested in the behaviour of the map $$ \text{ control $u$ } \mapsto \text{ solution process } X \, . $$ Taylor expansion along controls gives a satisfying answer to this question. Let us study this in some detail.

By the fundamental theorem of calculus we obtain for every smooth test function $ f $ that $$ f(X_t) = f(y) + \sum_{i=0}^d \int_0^t V_i f(X_s) \, du^i(s) $$ for $ 0 \leq t $. This equation can be inserted into itself, which leads to a generalized version of Taylor expansion for controlled ODEs $$ f(X_t) = \sum_{k=0}^M \sum_{(i_1,\ldots,i_k) \in {\{0,\ldots,d\}}^k} V_{i_1} \cdots V_{i_k} f(y) \int_{0 \leq t_1 \leq \ldots \leq t_k \leq t} du^{i_1}(t_1) \cdots du^{i_k}(t_k) + R_M(f,t) \, , $$ for $ M \geq 0 $, with the remainder satisfying $$ R_M(f,t) = \sum_{(i_1,\ldots,i_{M+1}) \in {\{0,\ldots,d\}}^{M+1}} \int_{0 \leq t_1 \leq \ldots \leq t_{M+1} \leq t} V_{i_1} \cdots V_{i_{M+1}} f(X_{t_{1}}) \, du^{i_1}(t_1) \cdots du^{i_{M+1}}(t_{M+1}) \, . $$ Notice that a vector field $ V $ acts on the test function $ f $ as a transport (first order differential) operator, i.e. $$ Vf(x) := df(x)\big(V(x)\big) $$ for $ x \in \mathbb{R}^m $.
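As a quick sanity check, consider the one-dimensional linear case $ V(x) = ax $ with the single control $ u(t) = t $: then $ V^k f(y) = a^k y $ for $ f = \mathrm{id} $, the $k$-fold iterated integral equals $ t^k/k! $, and the expansion recovers $ y e^{at} $. A minimal numerical sketch (parameters chosen arbitrarily):

```python
import math

# One-dimensional linear sanity check of the controlled Taylor expansion:
# dX = a*X du with u(t) = t and X_0 = y has solution X_t = y*exp(a*t);
# the k-th expansion term is a^k * y * t^k / k! (iterated integral = t^k/k!).
a, y, t, M = 0.7, 1.3, 0.5, 10
expansion = sum(a**k * y * t**k / math.factorial(k) for k in range(M + 1))
exact = y * math.exp(a * t)
print(exact - expansion)  # the remainder R_M decays factorially in M
```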

Let us put this into an algebraic setup to better understand the structure of iterated integrals. Consider the algebra of non-commutative formal power series generated by $ d+1 $ non-commuting (free) indeterminates $ e_0,\ldots,e_d $ (this mimics the action of the vector fields, which generically do not commute). Every element of this algebra can be written as $$ Z = \sum_{k=0}^\infty \sum_{(i_1,\ldots,i_k) \in {\{0,\ldots,d\}}^k} a_{i_1,\ldots,i_k} e_{i_1} \cdots e_{i_k} \, , $$ without any further convergence assumption. This algebra is denoted by $ \mathbb{A}_{d+1} $. Dually we can consider the free vector space on finite words in the letters $ \{ 0, \ldots, d \} $. As a subspace of $ \mathbb{A}_{d+1} $ we distinguish the Lie algebra $ \mathfrak{g} $ generated by $ e_0,\ldots,e_d$. Its associated exponential image is denoted by $ G $ and is a group.
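The (concatenation) product in $ \mathbb{A}_{d+1} $, truncated at some word length, is easy to model concretely: represent an element as a dictionary mapping words, i.e. tuples of letters, to coefficients. A small sketch (names are ad hoc, not from the code below):

```python
from collections import defaultdict

# A truncated model of the free algebra A_{d+1}: elements are dicts mapping
# words (tuples of letters 0..d) to coefficients; the product concatenates
# words and is extended bilinearly.
def word_mult(a, b, truncation=None):
    c = defaultdict(float)
    for w1, x in a.items():
        for w2, y in b.items():
            w = w1 + w2
            if truncation is None or len(w) <= truncation:
                c[w] += x * y
    return dict(c)

e0 = {(0,): 1.0}
e1 = {(1,): 1.0}
# The indeterminates do not commute: e0*e1 and e1*e0 are different words.
print(word_mult(e0, e1))  # {(0, 1): 1.0}
print(word_mult(e1, e0))  # {(1, 0): 1.0}
```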

By Chow's theorem 'every' point in $G$ can be represented as $$ \sum_{k \geq 0} \sum_{(i_1,\ldots,i_k) \in {\{0,\ldots,d\}}^k} \int_{0 \leq t_1 \leq \dots \leq t_k \leq t} d u^{i_1}(t_1)\ldots d u^{i_k}(t_k) e_{i_1} \ldots e_{i_k} $$ for some continuous finite variation path $ u $ taking values in $ \mathbb{R}^{d+1} $ and some time $ t > 0$. Of course each expression of the above type lies in $G$. This actually means that iterated integrals 'fill' the group $G$, and therefore $ G $ encodes all algebraic relations among iterated integrals.
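One such algebraic relation can be checked numerically: for the first two levels the shuffle identity $ \int du^i du^j + \int du^j du^i = \Delta u^i \, \Delta u^j $ holds along any smooth path, where $ \Delta u^i $ denotes the increment over $[0,t]$. A sketch with Riemann sums (the path is chosen arbitrarily):

```python
import numpy as np

# Level-one and level-two iterated integrals of a smooth control u in R^2,
# approximated by left-point Riemann sums; they satisfy the shuffle identity
# S^{ij} + S^{ji} = S^i * S^j, one of the relations encoded in the group G.
n = 20000
t = np.linspace(0.0, 1.0, n + 1)
u = np.stack([np.sin(t), t**2], axis=1)        # the two control components
du = np.diff(u, axis=0)
S1 = du.sum(axis=0)                            # level one: increments u^i(1) - u^i(0)
S2 = np.zeros((2, 2))                          # level two: iterated integrals
for i in range(2):
    for j in range(2):
        S2[i, j] = np.sum((u[:-1, i] - u[0, i]) * du[:, j])
print(S2[0, 1] + S2[1, 0] - S1[0] * S1[1])     # tends to 0 as n grows
```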

In order to apply the above theory to the actual calculation of controlled ODEs one has to make the calculation of iterated integrals tractable, which is non-trivial since they constitute an infinite dimensional system. To calculate all iterated integrals up to order $M$ in $ d+1 $ letters, actually $ \frac{(d+1)^{M+1}-1}{d} $ quantities have to be computed, a number exponential in $ M $.
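The count is just the geometric sum $ \sum_{k=0}^M (d+1)^k $, since there are $ (d+1)^k $ words of length $k$:

```python
# Number of iterated integrals (words) up to order M in d+1 letters:
# sum_{k=0}^M (d+1)^k = ((d+1)^(M+1) - 1) / d -- exponential growth in M.
def count_iterated_integrals(d, M):
    return ((d + 1) ** (M + 1) - 1) // d

print(count_iterated_integrals(2, 10))  # already 88573 terms for d = 2, M = 10
```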

An elegant way out might be to consider lower dimensional representations of iterated integrals which share their properties. This will be achieved by an application of the Johnson-Lindenstrauss Theorem, whose proof we shall sketch in the sequel.

Let us first state the formal assertion: for any $ 0 < \epsilon < 1 $ and any integer $ N \geq 1 $, any dimension $ k $ with $$ k \geq \frac{24 \log N}{3 \epsilon^2 - 2 \epsilon^3} $$ and any set $ A $ of $ N $ points in some $ \mathbb{R}^m $ there exists a map $ \pi: \mathbb{R}^m \to \mathbb{R}^k $ such that $$ {|| x - y||}^2 (1-\epsilon) \leq{|| \pi(x) -\pi( y) ||}^2 \leq {|| x - y ||}^2 (1+\epsilon) $$ for all points $ x,y \in A $, i.e. the geometry of $ A $ is almost preserved under the projection. Norms are $ L^2$ norms here.
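Before turning to the proof, a quick empirical illustration (dimensions and $\epsilon$ chosen arbitrarily): a scaled Gaussian random matrix nearly preserves all pairwise squared distances of a point cloud.

```python
import numpy as np

# Empirical Johnson-Lindenstrauss illustration: project N points from R^m to
# R^k with a Gaussian matrix scaled by 1/sqrt(k) and inspect the ratios of
# squared pairwise distances before and after.
rng = np.random.default_rng(0)
m, k, N = 1000, 500, 20
A = rng.normal(size=(N, m))                  # the point cloud
R = rng.normal(size=(k, m)) / np.sqrt(k)     # random projection
P = A @ R.T
ratios = []
for i in range(N):
    for j in range(i + 1, N):
        ratios.append(np.sum((P[i] - P[j]) ** 2) / np.sum((A[i] - A[j]) ** 2))
print(min(ratios), max(ratios))  # concentrated around 1
```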

The proof consists of three steps: let $ R $ denote a random matrix with independent identically distributed standard normal entries of dimension $ k \times m $.

  1. For $ u \in \mathbb{R}^m $ and $ v := \frac{1}{\sqrt{k}} R u $ it holds that $$ E[{||v||}^2] = {||u||}^2 \, , $$ which is just a calculation with independent standard normals. Notice also that the entries of $ v $ are independent centered normal random variables with variance $ \frac{{||u||}^2}{k} $.
  2. It holds that $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \big) \leq \frac{1}{N^2} \, . $$ Indeed $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \big) = P\big (\exp(\lambda Z) \geq \exp(\lambda (1+\epsilon) k) \big ) $$ where $ Z := k \frac{{||v||}^2}{{||u||}^2} $ is a $ \chi^2 $ random variable with $ k $ degrees of freedom. By Markov's inequality we arrive at $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \big) \leq {\big( \frac{1}{\sqrt{1-2\lambda}\exp(\lambda(1+\epsilon))}\big)}^k \, . $$ For $ \lambda = \frac{\epsilon}{2(1+\epsilon)} $ we obtain $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \big) \leq \big( (1+\epsilon)\exp(-\epsilon) \big)^{k/2} \, , $$ which yields by $ \log (1+a) \leq a - a^2/2 + a^3/3 $ that $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \big) \leq \frac{1}{N^2} \, . $$ In a completely analogous manner (replace $ \lambda $ by $ - \lambda $) we arrive for $ \lambda = \frac{\epsilon}{2(1-\epsilon)} $ at the upper bound $$ \big( (1-\epsilon)\exp(\epsilon) \big)^{k/2} \, , $$ which in turn can be estimated by $ \log (1-a) \leq -a - a^2/2 $. Whence the result $$ P\big( {||v||}^2 \geq {||u||}^2 (1+\epsilon) \text{ or } {||v||}^2 \leq {||u||}^2 (1-\epsilon) \big) \leq \frac{2}{N^2} \, . $$
  3. A union bound over all $ \binom{N}{2} \leq \frac{N^2}{2} $ pairs of points yields $$ P\big(\text{ all ratios of squared distances lie in } [1-\epsilon,1+\epsilon] \big) \geq 1 - \frac{N(N-1)}{2} \cdot \frac{2}{N^2} = \frac{1}{N} > 0 \, , $$ whence there exists a matrix $ R $ which does the job.
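The exponential moment estimate in step 2 can also be checked by simulation: the upper tail of a $\chi^2$ random variable with $k$ degrees of freedom indeed stays below $ \big( (1+\epsilon)\exp(-\epsilon) \big)^{k/2} $. A small Monte Carlo sketch (parameters arbitrary):

```python
import numpy as np

# Monte Carlo check of the chi-square tail estimate from step 2:
# P(Z >= (1+eps)*k) <= ((1+eps)*exp(-eps))**(k/2) for Z ~ chi^2 with k dof.
rng = np.random.default_rng(1)
k, eps, trials = 24, 0.5, 200_000
Z = rng.chisquare(k, size=trials)
empirical = np.mean(Z >= (1 + eps) * k)
bound = ((1 + eps) * np.exp(-eps)) ** (k / 2)
print(empirical, bound)  # the empirical tail lies well below the bound
```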

This result can now be applied to the differential equation in $ \mathbb{A}_{d+1} $ which generates the iterated integrals, namely $$ d Z_t = \sum_{i=0}^d Z_t e_i du^i(t) \, . $$ The random projection of $ Z_t $ will almost preserve the geometry of the iterated integrals (which is necessary for the quality of the regression), and the random projection can be calculated by means of a random controlled differential equation, which facilitates calculations tremendously.

In the sequel we shall see two instances where this is applied: first we learn an unknown stochastic differential equation.

In [99]:
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
In [100]:
d=2
M=30 

def nilpotent(M):
    B = np.zeros((M,M))
    for i in range(2,M):
        B[i,i-1]=1.0
    return B

def canonical(i,M):
    e = np.zeros((M,1))
    e[i,0]=1.0
    return e

def vectorfieldoperator(state,increment):
    d = np.shape(increment)[0]
    N = np.shape(state)[0]
    direction = np.zeros((N,1))
    for i in range(d):
        helper = np.zeros((N,1))
        for j in range(N):
            helper[j]=np.sin((j+1)*state[j,0])
        direction=direction + helper*increment[i]
    return direction

def vectorfield2d(state,increment):
    # SABR-type pair of vector fields: the first control moves both components,
    # the second control only the first (price-like) component
    return (np.array([(2.0*np.sqrt(state[1]**2))**0.7, 1.0*state[1]])*increment[0]
            + np.array([(2.0*np.sqrt(state[1]**2))**0.7, 0.0*state[1]])*increment[1])

def vectorfield3d(state,increment):
    return (np.array([np.sin(state[0])*np.exp(-state[2]), np.cos(state[1]), -state[2]*state[1]])*increment[0]
            + np.array([np.sin(state[1]), np.cos(state[0]), -state[0]*state[1]])*increment[1])

def vectorfield(state,increment):
    return 5*np.exp(-state)*increment[0] + 5*np.cos(state)*increment[1]
def randomAbeta(d,M):
    A = []
    beta = []
    for i in range(d):
        B = 0.0*nilpotent(M) + np.random.normal(0.0,0.5,size=(M,M)) 
        B = np.random.permutation(B)
        A = A + [B]
        beta = beta + [canonical(i,M)+np.random.normal(0.0,0.1,size=(M,1))]
    return [A,beta]

Abeta = randomAbeta(d,M)
A = Abeta[0]
beta = Abeta[1]

def sigmoid(x):
    return np.tanh(x)

def reservoirfield(state,increment):
    value = np.zeros((M,1))
    for i in range(d):
        value = value + sigmoid(np.matmul(A[i],state) + beta[i])*increment[i]
    return value
In [112]:
class SDE:
    def __init__(self,timehorizon,initialvalue,dimension,dimensionBM,dimensionR,vectorfield,timesteps,):
        self.timehorizon = timehorizon
        self.initialvalue = initialvalue # np array
        self.dimension = dimension
        self.dimensionBM = dimensionBM
        self.dimensionR = dimensionR
        self.vectorfield = vectorfield
        self.timesteps = timesteps

    def path(self):
        BMpath = [np.zeros(self.dimensionBM)]
        SDEpath = [np.array([1.0, self.initialvalue])]
        for i in range(self.timesteps):
            helper = np.random.normal(0,np.sqrt(self.timehorizon/self.timesteps),self.dimensionBM)
            BMpath = BMpath + [BMpath[-1]+helper]
            SDEpath = SDEpath + [np.exp(-0.0*self.timehorizon/self.timesteps)*(SDEpath[-1]+self.vectorfield(SDEpath[-1],helper))]

        return [BMpath, SDEpath]
    
    def anypath(self):
        BMpath = [np.zeros(self.dimensionBM)]
        SDEpath = [np.array([1.0, self.initialvalue])]#[np.ones((self.dimension,1))*self.initialvalue]
        
        for i in range(self.timesteps):
            helper = np.cos(BMpath[-1]*50)*self.timehorizon/self.timesteps#np.random.normal(0,np.sqrt(self.timehorizon/self.timesteps),self.dimensionBM)
            BMpath = BMpath + [BMpath[-1]+helper]
            SDEpath = SDEpath + [np.exp(-0.0*self.timehorizon/self.timesteps)*(SDEpath[-1]+self.vectorfield(SDEpath[-1],helper))]
            
        return [BMpath, SDEpath]
        
    def reservoir(self,BMpath):
        reservoirpath = [canonical(0,self.dimensionR)*self.initialvalue]
        for i in range(self.timesteps):
            increment = BMpath[i+1]-BMpath[i]
            reservoirpath = reservoirpath + [np.exp(-0.0*self.timehorizon/self.timesteps)*(reservoirpath[-1]+reservoirfield(reservoirpath[-1],increment))]
        return reservoirpath    
        
In [113]:
Sabr = SDE(1,1.0,2,d,M,vectorfield2d,10000)
training = Sabr.path()
In [114]:
f1,p1=plt.subplots(2,1,figsize=(6,6),sharey=True)
p1[0].plot(training[0][:10000],'r')
p1[1].plot(training[1][:10000],'g')
#plt.savefig('trainingpath.pdf')
plt.show()
In [115]:
BMpath=training[0]
Y = training[1]
Ydata = np.squeeze(Y)
Ydatadiff = np.diff(Ydata,axis=0)
Ytrain = np.concatenate((Ydata[:1000],Ydatadiff[:1000:1]),axis=0)
np.shape(Ytrain)
Out[115]:
(2000, 2)
In [116]:
X=Sabr.reservoir(BMpath)
np.shape(X)
Xdata = np.squeeze(X)
Xdatadiff = np.diff(Xdata,axis=0)
Xtrain=np.concatenate((Xdata[:1000],Xdatadiff[:1000:1]),axis=0)
np.shape(Xtrain)
Out[116]:
(2000, 30)
In [117]:
from sklearn import linear_model
import pandas as pd
lm = linear_model.LinearRegression()
model = lm.fit(Xtrain,Ytrain)
plt.plot(model.predict(Xtrain),'r')
plt.plot(Ytrain,'b')
plt.show()
model.score(Xtrain,Ytrain)
model.coef_
Out[117]:
array([[ 1.00012792,  0.55387202, -0.97116232,  0.92713923,  1.56673332,
        -0.43665076, -1.29558234, -0.24080606, -0.5466374 , -0.33390654,
        -0.38413194, -0.33618094, -1.33634164,  0.02514181, -0.31821885,
        -0.72255828, -0.24063695, -0.10196031, -0.46445299,  0.14928589,
        -0.9466267 ,  0.34732252, -0.46316145, -1.40821566,  0.07139127,
        -0.38030789, -0.44398579,  0.27376517,  0.8513087 , -0.52421979],
       [ 1.00040623,  0.73596326, -0.78360438,  0.40981875,  0.24787034,
         0.07532344, -0.4046941 ,  0.19191621, -0.1745398 ,  0.25490734,
         0.63030905, -0.57391784, -0.39330352, -0.19727881, -0.30068266,
        -0.50753692, -0.20748391,  0.09986918, -0.36194681, -0.03092786,
        -0.47247133,  0.38407192, -0.02708139, -0.16990073,  0.26360567,
        -0.46333975, -0.44003509,  0.48343491,  0.24871577, -0.02894263]])
In [118]:
f,p=plt.subplots(2,1,figsize=(6,6),sharey=True)

N=2

for i in range(N):
    p[i].plot(model.predict(Xdata[:2000])[:,i],'b')
    p[i].plot(Ydata[:2000][:,i],'g')
plt.savefig('training.pdf')
plt.show()
In [119]:
generalization = Sabr.path()
BMpath = generalization[0]
Y = generalization[1]
Ydata = np.squeeze(Y)
In [120]:
X = Sabr.reservoir(BMpath)
Xdata = np.squeeze(X)

N=2

fig,p=plt.subplots(N, 1, figsize=(6,6),sharex=True, sharey=True)
for i in range(N):
    p[i].plot(model.predict(Xdata[:1000])[:,i],'b')
    p[i].plot(Ydata[:1000][:,i],'g')
#plt.savefig('generalization.pdf')
plt.show()

A second very interesting strand of applications is to build econometric models, i.e. data-driven scenario generators which are estimated by regression.

In [24]:
import numpy as np
from scipy import linalg
from matplotlib import pyplot as plt
import pandas as pd
import pandas_datareader.data as web
In [25]:
end = '2015-01-01'
start = '2007-01-01'
get_px = lambda x: web.DataReader(x, 'yahoo', start=start, end=end)['Adj Close']

symbols = ['GOOG', 'AAPL','TLT','MSFT']
# raw adjusted close prices
data = pd.DataFrame({sym:get_px(sym) for sym in symbols})
# log returns
lrets = np.log(data/data.shift(1)).dropna()
In [121]:
lrets.loc['2007-01-04'].values
Out[121]:
array([ 0.02195283,  0.032963  , -0.00167618,  0.00604542])
In [122]:
Omega = (lrets
         .rolling(30)
         .cov()
         .dropna()) # this will get you to 66 windows instead of 125 with NaNs
In [123]:
dates = lrets.index#Omega.index.get_level_values(0) # or just the index of your base returns
In [124]:
dates = dates[30:]
In [125]:
covdata = dict(zip(dates, [Omega.loc[date].values for date in dates]))
In [126]:
marketBM = dict(zip(dates, [np.matmul(np.linalg.inv(linalg.sqrtm(Omega.loc[date].values)),lrets.loc[date].values) for date in dates]))
In [127]:
plt.plot([marketBM[date][0] for date in dates])
plt.show()
In [128]:
K=4
BMmarketpath = np.zeros((len(dates),K))
mean=np.zeros(K)
for k in range(K):
    mean[k] = np.mean([marketBM[l][k] for l in dates])
In [129]:
for i in range(K):
    BMmarketpath[:,i] = np.cumsum([np.sqrt(1/250)*(marketBM[date][i]-mean[i])
                                                   for date in dates])
plt.plot(BMmarketpath)
plt.show()
In [141]:
Kmarket = 4
Ymarket = np.zeros((len(dates),Kmarket)) 
for i in range(Kmarket):
    Ymarket[:,i]=np.cumsum([lrets.loc[date].values[i] for date in dates])
Ymarket = np.exp(Ymarket)
In [142]:
plt.plot(Ymarket)
plt.show()
In [132]:
Ymarketdiff = np.diff(Ymarket,axis=0)
Ymarkettrain = np.concatenate((Ymarket[:1000],Ymarketdiff[:1000:1]),axis=0)
np.shape(Ymarkettrain)
Out[132]:
(2000, 4)
In [133]:
d=K
M=150

def nilpotent(M):
    B = np.zeros((M,M))
    for i in range(2,M):
        B[i,i-1]=1.0
    return B

def canonical(i,M):
    e = np.zeros((M,1))
    e[i,0]=1.0
    return e

def randomAbeta(d,M):
    A = []
    beta = []
    for i in range(d):
        #B = 0.1*np.identity(M)+np.random.normal(0.0,.5,size=(M,M))
        B = np.random.normal(0.0,0.5,size=(M,M)) # 0.1 for scen-gen, 1.5 for SABR
        B = np.random.permutation(B)
        #B = np.identity(M)
        #B = sp.linalg.sqrtm(np.matmul(B,np.transpose(B)))
        A = A + [B]
        beta = beta + [canonical(i,M)+np.random.normal(0.0,0.5,size=(M,1))]
    return [A,beta]

Abeta = randomAbeta(d,M)
A = Abeta[0]
beta = Abeta[1]

def sigmoid(x):
    return np.tanh(x)

def reservoirfield(state,increment):
    value = np.zeros((M,1))
    for i in range(d):
        value = value + sigmoid(np.matmul(A[i],state) + beta[i])*increment[i]
    return value
In [134]:
BMmarketpathlist = [BMmarketpath[i,:] for i in range(len(dates))]
Reservoir = SDE(1,0.5,2,K,M,vectorfield2d,len(dates)-1)
X=Reservoir.reservoir(BMmarketpathlist)
np.shape(X)
Xdata = np.squeeze(X)
Xdatadiff = np.diff(Xdata,axis=0)
Xtrain=np.concatenate((Xdata[:1000],Xdatadiff[:1000:1]),axis=0)
np.shape(Xtrain)
Out[134]:
(2000, 150)
In [135]:
from sklearn import linear_model
import pandas as pd
lm = linear_model.LinearRegression()
model = lm.fit(Xtrain,Ymarkettrain)
plt.plot(model.predict(Xtrain),'r')
plt.plot(Ymarkettrain,'b')
plt.show()
model.score(Xtrain,Ymarkettrain)
model.coef_
Out[135]:
array([[ 3.87425924e-01,  5.32648960e-03, -1.96880709e-02,
         2.52835122e-01, -1.64763329e-02, -8.92314000e-02,
        -3.33018458e-02, -2.35884695e-02, -8.61521405e-02,
        -1.43609911e-01,  9.20825754e-02, -4.26181724e-02,
        -9.98374619e-02, -1.24301632e-01, -1.39950926e-01,
         1.01470291e-01, -1.24969025e-01,  5.22818689e-02,
        -1.65588396e-01,  4.31918438e-02, -9.13134832e-02,
         3.45690687e-03,  3.18623426e-02, -2.24204371e-02,
        -2.49312511e-02,  1.30594129e-01, -1.96040147e-03,
        -7.04446092e-02, -1.85248801e-01, -9.57481680e-02,
        -1.49362091e-02,  1.40466049e-01, -3.20604405e-02,
        -6.93874805e-02,  2.89453956e-02, -2.33432277e-02,
        -2.31122435e-01,  1.43090223e-01, -1.09219853e-01,
        -9.52942561e-02,  4.22228256e-02, -7.65850294e-02,
        -1.17715170e-01,  4.00837332e-02, -4.66936370e-03,
        -3.62145808e-02,  8.64870272e-02, -2.91464473e-02,
        -2.03345835e-01,  2.11719054e-01,  1.19724381e-02,
         6.91977614e-02,  2.22857427e-01,  1.18312269e-01,
        -1.42051643e-01, -2.68333925e-01, -1.67474242e-01,
        -1.68027558e-02, -9.95033925e-02, -2.31630531e-02,
         1.13611310e-01, -5.26135938e-03, -1.92299648e-01,
         3.67076738e-02, -2.86069650e-03,  1.29837136e-01,
         1.04399187e-01, -1.46022632e-01,  2.09892226e-02,
        -3.62316794e-02,  1.01983477e-02, -5.43894027e-02,
        -1.42983824e-01,  1.65131118e-01, -9.56667854e-02,
         1.09194021e-01, -1.19504233e-01,  5.91681569e-03,
        -1.88037790e-01, -1.14629572e-01, -1.08601208e-02,
         1.14791424e-01, -7.12450279e-02,  5.29549764e-02,
        -1.41886648e-01,  1.17968175e-01,  1.48981962e-01,
         2.18427156e-02,  1.09157012e-01,  4.82874284e-02,
         6.51042014e-02, -2.22000001e-02,  9.64544898e-02,
        -6.36452984e-02,  7.91956821e-02, -8.52629260e-02,
        -1.62389101e-02, -5.74210516e-02, -3.14517798e-02,
         2.48157306e-02,  7.70363495e-02, -1.44419076e-01,
         6.32327763e-02,  6.29867911e-02,  4.71182135e-02,
         1.55965425e-01,  3.51069571e-02, -5.46220756e-03,
         1.02490675e-01, -7.08994253e-02,  5.73203677e-02,
         1.25688590e-01, -3.19018088e-02, -8.56503979e-02,
         7.30878012e-02, -2.30468947e-02, -4.88634854e-02,
         7.83423920e-02, -1.41198876e-01,  3.60121540e-02,
         3.20544485e-02, -4.07535431e-02,  1.75268368e-01,
        -1.53373721e-02,  6.23978290e-02, -5.77027370e-02,
         1.69583819e-01,  1.62756440e-01,  5.28125852e-02,
        -2.05673578e-01,  1.62468090e-01, -1.46212848e-01,
         7.60689937e-02, -1.91741899e-02, -5.29541483e-02,
        -2.06552672e-02, -2.59675048e-02, -1.88793439e-02,
         3.40968009e-02, -1.59187714e-01, -6.73754578e-03,
        -1.14189825e-01, -4.65978956e-02,  7.25952349e-02,
         2.00464030e-01, -1.42987836e-01, -2.53991484e-02,
        -1.09031305e-01, -1.85561815e-01,  2.85503998e-01],
       [ 4.04678128e-01,  1.47950550e-02, -1.18232025e-01,
         2.02222376e-01, -3.26191521e-03, -9.35432977e-02,
        -1.76085030e-02, -5.47971640e-02, -6.20843475e-02,
        -1.02060532e-01,  4.03317352e-02, -9.16328121e-02,
        -3.87101100e-02, -1.18960209e-01, -6.42457313e-02,
         4.08951110e-02, -8.48951053e-02,  3.12234231e-02,
        -1.07079841e-01,  6.14771401e-02, -9.00857927e-02,
         9.61397808e-03,  7.47995646e-02, -6.26145326e-02,
         1.02300490e-02,  9.75286569e-02, -1.19730654e-02,
         6.58770318e-03, -1.59628427e-01, -4.37386970e-03,
         3.29486748e-02,  1.28300948e-01, -7.61714721e-02,
        -9.43261567e-02,  2.97493256e-02, -1.55241277e-03,
        -1.37911557e-01,  1.09361021e-01, -1.03180151e-01,
        -1.19960730e-01,  1.50593712e-02, -1.07065913e-01,
        -5.78548947e-02,  2.76557726e-02,  3.93451885e-03,
        -5.11979697e-02,  1.52505670e-01, -5.22032700e-02,
        -1.20929474e-01,  1.70451476e-01,  5.20060563e-02,
         5.15345309e-02,  2.44960058e-01,  4.52378949e-02,
        -1.15261700e-01, -2.68201394e-01, -1.87604336e-01,
         4.83872085e-03, -8.42707843e-02, -3.88687139e-02,
         7.24737233e-02, -2.06860141e-02, -1.36362896e-01,
         3.76417742e-02,  3.89738125e-02,  1.01325582e-01,
         8.84148403e-02, -1.03813065e-01,  1.73987636e-02,
        -4.60784845e-02,  1.85129621e-02, -1.06561262e-01,
        -1.61256862e-01,  8.63610787e-02, -9.08970995e-02,
         1.29140510e-01, -1.01589311e-01, -3.71680508e-02,
        -1.88761960e-01, -1.17205690e-01, -4.49057415e-02,
         1.14982937e-01, -3.59492609e-02,  8.23084695e-02,
        -1.56460678e-01,  1.07465860e-01,  9.44813761e-02,
         4.02265784e-02,  8.76473374e-02,  5.70784205e-02,
         4.32767238e-02, -3.45713029e-02, -2.02583217e-04,
        -5.56839836e-02,  3.29583694e-02, -1.44423151e-01,
         2.00924411e-02, -6.18444972e-02, -7.98389343e-02,
         4.35140136e-02,  9.46395887e-02, -1.05509011e-01,
         9.17891400e-02,  4.62799529e-02,  1.05967602e-01,
         1.34030928e-01,  3.55247410e-03,  1.11125238e-04,
         1.10101146e-01, -4.71013145e-02,  9.17171993e-02,
         1.19331218e-01, -6.45639710e-03, -1.00667785e-01,
         8.79272755e-02, -5.32827510e-02, -1.65489205e-02,
         1.34292299e-02, -7.36712367e-02,  4.97272152e-03,
        -1.52602496e-02, -1.76076689e-02,  2.21933349e-01,
         1.71918410e-02,  5.86634167e-02, -7.43276917e-02,
         8.36396878e-02,  1.10780219e-01,  5.19483842e-02,
        -1.95307520e-01,  1.93100930e-01, -1.14880556e-01,
         7.65862423e-02, -3.74654749e-02, -9.12920727e-02,
         4.95033201e-03,  9.13997429e-03, -5.36833584e-02,
         4.06406057e-02, -1.04496223e-01, -5.55926885e-02,
        -1.61649399e-01, -2.02630613e-02,  5.94294445e-02,
         1.63710769e-01, -1.47199727e-01, -3.98678832e-02,
        -9.25450016e-02, -1.48617642e-01,  1.77282017e-01],
       [ 3.84980259e-01,  3.75960879e-02, -1.10140274e-01,
         2.03920380e-01, -9.52179251e-03, -7.45344407e-02,
        -2.74594971e-02, -7.06466001e-02, -3.20097652e-02,
        -1.15536022e-01,  2.89697909e-02, -8.21330670e-02,
        -3.32206226e-02, -9.62349073e-02, -5.10514694e-02,
         2.84012421e-02, -9.43805811e-02,  3.18692780e-02,
        -9.61116257e-02,  7.25567801e-02, -8.03076012e-02,
        -8.07981532e-03,  7.47232500e-02, -5.93689566e-02,
        -3.91902272e-05,  9.51621202e-02,  3.49701913e-03,
         4.44918055e-02, -1.54711655e-01,  1.19292062e-02,
         8.61452432e-03,  1.16012281e-01, -5.39185334e-02,
        -9.82126479e-02,  3.29019676e-02, -5.24636768e-04,
        -1.21211997e-01,  1.15927055e-01, -9.09826327e-02,
        -1.23081291e-01,  1.50395346e-02, -1.10907717e-01,
        -7.46041352e-02,  4.51271286e-02, -2.75979336e-02,
        -6.60339775e-02,  1.69140981e-01, -6.01047215e-02,
        -1.05622239e-01,  1.56582819e-01,  5.84663660e-02,
         4.75460481e-02,  1.84504371e-01,  1.76332885e-02,
        -9.13964657e-02, -2.46368488e-01, -1.90456340e-01,
        -1.28962082e-02, -8.74739364e-02, -5.11990072e-02,
         5.08990593e-02, -2.86628157e-02, -9.80873867e-02,
        -9.39782028e-03, -3.50255735e-04,  6.75898193e-02,
         8.97296024e-02, -8.34264533e-02,  4.79534218e-03,
        -4.95662032e-02,  4.76885213e-02, -1.20169771e-01,
        -1.35411318e-01,  6.59882456e-02, -9.32186922e-02,
         1.28298729e-01, -7.25537924e-02, -2.02321277e-02,
        -1.22714819e-01, -1.01038150e-01, -4.34630460e-02,
         1.05432353e-01, -8.87280951e-02,  7.71318283e-02,
        -1.17136023e-01,  9.76263964e-02,  7.81205172e-02,
         5.42472047e-02,  7.80202977e-02,  2.83719420e-02,
         5.93532874e-02, -7.24702985e-03, -9.14648282e-03,
        -6.38746741e-02,  4.64610243e-02, -1.48107594e-01,
         1.33057173e-02, -4.56443213e-02, -7.61325992e-02,
         6.33689653e-02,  7.78302674e-02, -1.14546381e-01,
         9.07161778e-02,  3.40992436e-02,  1.09546082e-01,
         1.45684625e-01,  1.97283270e-03, -8.42277092e-03,
         9.26473510e-02, -7.24642473e-02,  9.38050785e-02,
         1.03324952e-01,  1.51955883e-02, -8.48475145e-02,
         7.71827766e-02, -4.87655592e-02, -2.04297268e-03,
         1.78096570e-02, -8.64994310e-02,  2.07876661e-02,
        -3.24142139e-02, -2.32847283e-02,  2.16842353e-01,
         1.37065444e-02,  4.96057504e-02, -6.89914460e-02,
         8.67294674e-02,  8.87189115e-02,  6.72406531e-02,
        -1.87091306e-01,  1.94377303e-01, -1.13590897e-01,
         6.09525570e-02, -4.81337838e-02, -6.73892533e-02,
         2.45272587e-02,  1.57998278e-02, -4.71789337e-02,
         3.97621997e-02, -7.25246664e-02, -5.63366000e-02,
        -1.59556904e-01, -1.41109730e-02,  8.86965679e-02,
         1.29731248e-01, -1.18661096e-01, -5.14155829e-02,
        -1.36656223e-02, -1.31702274e-01,  1.31716148e-01],
       [ 4.06434027e-01,  1.85540211e-02, -1.14171609e-01,
         1.93983139e-01,  1.62589730e-04, -9.98614598e-02,
        -9.48181686e-03, -8.03288301e-02, -1.68336716e-02,
        -1.12372772e-01,  5.71284950e-03, -6.79496275e-02,
        -1.62214885e-02, -9.95529149e-02, -5.80241662e-02,
         7.69814558e-03, -1.03083723e-01,  3.13877090e-02,
        -1.32112101e-01,  7.50265061e-02, -7.96493948e-02,
        -1.71059756e-02,  8.95064798e-02, -9.13629195e-02,
         3.04562716e-02,  1.05354295e-01,  1.19529344e-03,
         4.77547576e-02, -1.39333477e-01,  2.12374748e-02,
        -9.63173405e-03,  9.74698682e-02, -4.76431367e-02,
        -9.59366362e-02,  4.79636990e-02, -2.57412420e-03,
        -1.33227053e-01,  1.13122885e-01, -1.02434353e-01,
        -1.07275519e-01,  2.25157246e-02, -1.40003181e-01,
        -6.75239929e-02,  4.19066962e-02, -3.77802732e-02,
        -6.36657659e-02,  1.60493507e-01, -3.47313310e-02,
        -1.22884560e-01,  2.01149700e-01,  3.06012289e-02,
         5.18859793e-02,  1.85733301e-01,  2.22909915e-02,
        -6.40796009e-02, -2.58660666e-01, -1.90412084e-01,
        -1.50071368e-02, -7.42092153e-02, -6.11473693e-02,
         4.71220784e-02, -1.61049541e-02, -1.13033864e-01,
        -5.21344558e-04,  1.75719194e-02,  9.66113475e-02,
         7.01908664e-02, -8.60291442e-02,  1.83516187e-02,
        -5.18338385e-02,  5.57515080e-02, -1.22623217e-01,
        -1.28869785e-01,  5.76722825e-02, -8.84545707e-02,
         1.26459707e-01, -7.20228235e-02, -6.35045228e-02,
        -1.27352155e-01, -1.04085331e-01, -2.87760786e-02,
         1.04159797e-01, -9.86291263e-02,  9.09029735e-02,
        -1.42546410e-01,  9.38539734e-02,  1.04217374e-01,
         5.79736116e-02,  7.98731873e-02,  1.93039972e-02,
         7.08583735e-02, -1.02833442e-02, -1.27377595e-02,
        -7.11248674e-02,  4.56246432e-02, -1.55930630e-01,
         1.93303461e-02, -6.62468032e-02, -7.89725182e-02,
         6.29326863e-02,  9.04316318e-02, -1.08852262e-01,
         1.05923711e-01,  1.46391662e-02,  1.37881153e-01,
         1.25539018e-01, -2.70825675e-03, -3.40100804e-02,
         8.68870140e-02, -7.69915674e-02,  7.20164191e-02,
         1.16653993e-01,  1.86228764e-02, -9.23331327e-02,
         9.42758264e-02, -6.02543933e-02, -3.83338388e-03,
         1.19286469e-02, -5.95492915e-02, -4.26503979e-03,
        -3.12865092e-02, -1.34487444e-02,  2.37362262e-01,
         1.33825851e-02,  4.57456339e-02, -6.34220955e-02,
         9.69300313e-02,  9.78532393e-02,  7.27899209e-02,
        -1.92108100e-01,  1.88422629e-01, -8.30514579e-02,
         8.21034275e-02, -4.76077310e-02, -8.18529064e-02,
         2.09567599e-02,  1.36311413e-02, -6.49621273e-02,
         6.30175289e-02, -1.38291892e-02, -3.86323678e-02,
        -1.36641356e-01, -9.10860822e-03,  5.31535709e-02,
         1.32625102e-01, -1.17540869e-01, -3.33742301e-02,
        -2.68216848e-02, -1.23394804e-01,  1.58725485e-01]])
In [136]:
f,p=plt.subplots(Kmarket,1,figsize=(6,6),sharey=True)

for i in range(Kmarket):
    p[i].plot(model.predict(Xdata[:2000])[:,i],'b')
    p[i].plot(Ymarket[:2000][:,i],'g')
plt.show()
In [137]:
BMmarketpathlistrandom = [np.zeros(K)]
for i in range(len(dates)):
    BMmarketpathlistrandom = BMmarketpathlistrandom + [BMmarketpathlistrandom[-1] + np.random.normal(0,1/np.sqrt(250),K)]
Reservoir = SDE(1,0.5,2,K,M,vectorfield2d,len(dates)-1)
Xtest=Reservoir.reservoir(BMmarketpathlistrandom)
np.shape(Xtest)
Xtestdata = np.squeeze(Xtest)

plt.plot(model.predict(Xtestdata[:1000]))
plt.show()

A third example concerns learning high dimensional or even infinite dimensional SDEs, like, e.g., HJM type equations.

In [1]:
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
In [15]:
d = 2 # number of driving Brownian motions
M = 50 # dimension of the reservoir driven by the same number of Brownian motions by
       # the reservoir vector field which uses random matrices defined below
In [16]:
def sigmoid(x):
    return np.tanh(x)

def grid(dimension):
    B = np.zeros((dimension,1))
    for i in range(dimension):
        B[i,0]=  i + 1.0
    return B

def nilpotent(M):
    B = np.zeros((M,M))
    for i in range(2,M):
        B[i,i-1]=1.0
    return B

def canonical(i,M):
    e = np.zeros((M,1))
    e[i,0]=1.0
    return e
In [30]:
def randomAbeta(d,M):
    A = []
    beta = []
    for i in range(d):
        B = 0.0*nilpotent(M) + np.random.normal(0.0,1.5,size=(M,M))
        A = A + [B]
        beta = beta + [canonical(i,M)+np.random.normal(0.0,0.5,size=(M,1))]
    return [A,beta]

Abeta = randomAbeta(d,M)
A = Abeta[0]
beta = Abeta[1]


def reservoirfield(state,increment):
    value = np.zeros((M,1))
    for i in range(d):
        value = value + sigmoid(np.matmul(A[i],state) + beta[i])*increment[i]
    return value

Now an HJM-like vector field driven by $d$ Brownian motions is introduced. The maturity discretization is given by 'dimension' and vectorized by the function grid.

In [31]:
dimension = 500

def hjmfield(state,increment):
    value = np.zeros((dimension,1))
    for i in range(d):
        value = value + (0.5*((0.01*grid(dimension))**i)*
                         np.exp(-0.005*i*grid(dimension))*increment[i])
    return value

initialvalue = np.exp(-0.005*grid(dimension))

Here the essential class is defined. SDE chooses a random reservoir when instantiated. Then one can generate paths from it by the method path (output: one trajectory and the corresponding Brownian motion). The second method, reservoir, calculates the values of the reservoir along a given Brownian path (which has to be passed as an argument). Packing everything into efficient arrays is not done well :-)

In [32]:
class SDE:
    def __init__(self,timehorizon,initialvalue,dimension,dimensionBM,dimensionR,vectorfield,timesteps,):
        self.timehorizon = timehorizon
        self.initialvalue = initialvalue # np array
        self.dimension = dimension
        self.dimensionBM = dimensionBM
        self.dimensionR = dimensionR
        self.vectorfield = vectorfield
        self.timesteps = timesteps

    def path(self):
        BMpath = [np.zeros(self.dimensionBM)]
        SDEpath = [self.initialvalue]
        
        for i in range(self.timesteps):
            helper = np.random.normal(0,np.sqrt(self.timehorizon/self.timesteps),self.dimensionBM)
            BMpath = BMpath + [BMpath[-1]+helper]
            SDEpath = SDEpath + [SDEpath[-1]+np.matmul(nilpotent(dimension)-np.identity(dimension),SDEpath[-1])*self.timehorizon/self.timesteps+self.vectorfield(SDEpath[-1],helper)]
        return [BMpath, SDEpath]
        
    def reservoir(self,BMpath):
        reservoirpath = [canonical(1,self.dimensionR)]
        for i in range(self.timesteps):
            increment = BMpath[i+1]-BMpath[i]
            reservoirpath = reservoirpath + [reservoirpath[-1]+np.matmul(nilpotent(self.dimensionR)-np.identity(self.dimensionR),reservoirpath[-1])*self.timehorizon/self.timesteps+reservoirfield(reservoirpath[-1],increment)]
        return reservoirpath    
        
In [33]:
plt.plot(initialvalue)
plt.show()

An instance called HJM is created with the above initial value. One path is stored in training, which is then fed into a simple regression.

In [34]:
HJM = SDE(1,initialvalue,dimension,d,M,hjmfield,10000)
training = HJM.path()
In [35]:
f1,p1=plt.subplots(2,1,figsize=(6,6),sharey=True)
p1[0].plot(training[0][:10000],'r')
Yplot = training[1]
Yplot = np.squeeze(Yplot)
p1[1].plot(Yplot[:10000],'g')
plt.show()

Here the Brownian path is extracted which is then fed into the reservoir.

In [36]:
BMpath=training[0]
Y = training[1]
Ydata = np.squeeze(Y)
Ydatadiff = np.diff(Ydata,axis=0)
Ytrain = np.concatenate((Ydata[:3000],Ydatadiff[:3000:1]),axis=0)
np.shape(Ytrain)
Out[36]:
(6000, 500)
In [37]:
X=HJM.reservoir(BMpath)
np.shape(X)
Xdata = np.squeeze(X)
Xdatadiff = np.diff(Xdata,axis=0)
Xtrain=np.concatenate((Xdata[:3000],Xdatadiff[:3000:1]),axis=0)
np.shape(Xtrain)
Out[37]:
(6000, 50)
In [38]:
from sklearn import linear_model
import pandas as pd
lm = linear_model.LinearRegression()
model = lm.fit(Xtrain,Ytrain)
plt.plot(model.predict(Xtrain),'r')
plt.plot(Ytrain,'b')
plt.show()
model.score(Xtrain,Ytrain)
model.coef_
Out[38]:
array([[ 6.18111345e-03,  9.86496322e-01,  1.43285029e-02, ...,
         4.01697973e-02,  2.35281795e-01, -6.75035313e-03],
       [ 6.12089073e-03,  9.81579424e-01,  1.42832878e-02, ...,
         3.99536851e-02,  2.34069736e-01, -6.78774164e-03],
       [ 7.86811456e-04,  9.72573983e-01,  4.78589461e-02, ...,
         5.64196652e-02,  2.08432295e-01, -9.45880887e-03],
       ...,
       [-2.61377562e-03,  8.18831190e-02,  6.63718725e-03, ...,
         3.28503386e-03,  1.23161894e-02, -1.28382365e-02],
       [-2.61425597e-03,  8.14752239e-02,  6.62040471e-03, ...,
         3.27059459e-03,  1.22280024e-02, -1.28294225e-02],
       [-2.61473026e-03,  8.10693636e-02,  6.60370824e-03, ...,
         3.25625512e-03,  1.21402663e-02, -1.28206292e-02]])
In [39]:
plt.plot(model.predict(Xdata[:6000])[3000,:])
plt.plot(Ydata[:6000][3000,:])
plt.show()

Now generalizations are considered.

In [42]:
generalization = HJM.path()
BMpathgen = generalization[0]
Ygen = generalization[1]
Ygendata = np.squeeze(Ygen)

Xgen = HJM.reservoir(BMpathgen)
Xgendata = np.squeeze(Xgen)

N=4

fig,p=plt.subplots(N, 1, figsize=(6,6), sharey=True)
for i in range(N):
    j = np.random.randint(0,dimension)
    p[i].plot(model.predict(Xgendata[:1000])[:,j],'b')
    p[i].plot(Ygendata[:1000][:,j],'g')

plt.show()