Deep Hedging

The lecture is devoted to pricing and hedging from a machine learning perspective. First we develop some basic tools from mathematical finance and learn to understand pricing and hedging as questions of convex analysis. Then we consider several machine learning views on these questions.

Basic principles of pricing and hedging

We shall first go through basic principles of modeling in mathematical finance.

To model a financial market we shall consider a probability space $ (\Omega,\mathcal{F},P) $ together with a filtration $ (\mathcal{F}_t) $.

We shall denote by $ S^i $ the price of asset $i$ and by $ H^i $ the holdings in asset $i$ (which can be any real number here, since no market frictions are assumed). The strategy is predictable: $ H^i_t $ is fully determined at time $ t - 1 $ using the information available at that time.

First we learn to write portfolio values, i.e. $ V_t = \sum_{i=0}^d H^i_tS^i_t $, as P & L processes (up to the initial value) in case the portfolio is self-financing. Self-financing means that $$ \sum_{i=0}^d H^i_{t+1}S^i_t = \sum_{i=0}^d H^i_tS^i_t \, , $$ which in turn leads to $$ V_{t+1} - V_t = \sum_{i=0}^d H^i_{t+1} (S^i_{t+1} - S^i_{t}) \, . $$ This means that the change in value of the portfolio comes from the change in the prices and nothing else.

This formula allows for a simplification. If we divide everything by the value of $S^0$, the price of the $0$-th asset, then one term in the above sum vanishes. We denote $ X^i_t = S^i_t/S^0_t $. When discounted, e.g. by $S^0$, this means $$ \frac{V_t}{S^0_t} - \frac{V_0}{S^0_0} = (H \bullet X)_t = \sum_{s=0}^{t-1} \sum_{i=1}^d H^i_{s+1} (X^i_{s+1} - X^i_{s})\, . $$ Notice that the inner sum only starts at $i=1$ because $ X^0_t = 1 $.

The right hand side is a P & L process. The argument can be turned around: given a portfolio value which is a constant plus a P & L process, we can of course construct a corresponding self-financing portfolio.
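To make this identity concrete, here is a minimal numerical sketch in numpy (the two risky assets, the toy prices and the holdings are made up purely for illustration): we build a self-financing portfolio forward in time and check that its discounted value equals the initial value plus the discounted P & L process.

import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 2                              # time steps and number of risky assets

# toy price paths: S[0] is the numeraire, S[1], S[2] are risky assets
S = np.ones((d + 1, T + 1))
S[0] = 1.05 ** np.arange(T + 1)          # numeraire (bank account)
S[1:] = 100 * np.exp(np.cumsum(0.02 * rng.standard_normal((d, T + 1)), axis=1))

# predictable holdings H[:, t] held over the period (t-1, t]
H = rng.standard_normal((d + 1, T + 1))

# make the portfolio self-financing by adjusting the numeraire position:
# sum_i H[i, t+1] * S[i, t] = sum_i H[i, t] * S[i, t]
V0 = 10.0
H[0, 1] = (V0 - H[1:, 1] @ S[1:, 0]) / S[0, 0]
for t in range(1, T):
    V_t = H[:, t] @ S[:, t]
    H[0, t + 1] = (V_t - H[1:, t + 1] @ S[1:, t]) / S[0, t]

# discounted prices and discounted portfolio value
X = S / S[0]
V = np.array([H[:, t] @ S[:, t] for t in range(1, T + 1)])

# discounted P&L: sum_{s<t} sum_i H[i, s+1] * (X[i, s+1] - X[i, s])
pnl = np.cumsum(np.sum(H[1:, 1:] * np.diff(X[1:], axis=1), axis=0))

assert np.allclose(V / S[0, 1:] - V0 / S[0, 0], pnl)
print("discounted value = initial value + discounted P&L: OK")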

We can ask whether arbitrages are possible and under which conditions (for which price) payoffs can be dominated (super-hedged) by self-financing portfolios.

A model is free of arbitrage if there is no self-financing portfolio which starts at zero and has a non-negative outcome that is strictly positive with positive probability.

This leads us to the valuation problem: what is the value of a payoff $f$ due at time $T$? We can answer that by constructing appropriate (super-)hedging portfolios. Superhedging just means "dominating".

Let us consider this question in the one period case, i.e. $T=1$. The states of the world are denoted by $\omega$; hence we are interested in finding the smallest $ x $ such that for all $ \omega $ $$ f(\omega) \leq x + \sum_{i=1}^d H_1^i (X^i_1(\omega)-X^i_0(\omega)) \, . $$ This is equivalent to characterizing the cone $ C:= \{ (H \bullet X)_1 - g \text{ for all possible strategies } H \text{ and } g \geq 0\} $.

This is a geometric question: the solution is $ C = \{ e \text{ such that } E_Q[e] \leq 0 \text{ for all equivalent martingale measures } Q \} $. This yields the beautiful formula $$ \sup_{Q \in \mathcal{M}} E_Q[f] = \inf \{x \text{ such that there is a strategy } H \text{ with } f \leq x + (H \bullet X)_1 \} \, . $$ The set $\mathcal{M} $ is the set of equivalent martingale measures. A super-hedging portfolio is a self-financing portfolio (i.e. the value process is the initial value of the portfolio plus the P&L process -- all in discounted terms) dominating a certain payoff.
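In the one period case the right hand side is a finite linear program: minimize $x$ over $(x,H)$ subject to $ f(\omega) \leq x + \sum_i H^i (X^i_1(\omega) - X^i_0) $ for all $\omega$. The following sketch solves it with scipy.optimize.linprog for a made-up one period trinomial market with a single risky asset (the numbers are purely illustrative, not taken from the lecture); by the duality above the optimal value coincides with $ \sup_{Q \in \mathcal{M}} E_Q[f] $.

import numpy as np
from scipy.optimize import linprog

# one-period trinomial market (illustrative numbers): one risky asset,
# discounted price today X0, three possible discounted prices tomorrow
X0 = 100.0
X1 = np.array([80.0, 100.0, 120.0])        # states omega = 1, 2, 3
f  = np.maximum(X1 - 100.0, 0.0)           # call payoff in each state

# variables z = (x, H): minimize x subject to
#   x + H * (X1(omega) - X0) >= f(omega)  for every omega
c    = np.array([1.0, 0.0])                # objective: the premium x
A_ub = np.column_stack([-np.ones_like(X1), -(X1 - X0)])
b_ub = -f
res  = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])

x_super, H_super = res.x
print("superhedging price:", x_super)      # expected: 10 (hedge H = 1/2)
print("superhedging strategy:", H_super)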

We have seen in one-step binomial and trinomial models that, for a given payoff, pricing and hedging are in a fundamental duality relationship: the largest arbitrage-free price equals the smallest price of a super-hedging portfolio. Superhedging prices can be calculated by backwards induction.

This yields the following pricing formula: if a contract with payoff $f$ is liquidly traded at price $ \pi_t(f) $ at intermediate times $ t $, then there exists an equivalent martingale measure $Q$ for the given market constituted by $X^1,\dots,X^d$ such that

$$ E_Q\big[ \frac{f}{S^0_T} \,\big|\, \mathcal{F}_t \big] = \frac{\pi_t(f)}{S^0_t} \, . $$ Furthermore, models are free of arbitrage if and only if there exists an equivalent pricing measure, which in particular assigns the value $0$ to P & L processes. Hence prices of payoffs can be calculated by taking expectations with respect to this equivalent pricing measure, i.e. a Monte Carlo evaluation is possible.
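As a small illustration of the last point, here is a Monte Carlo sketch under an a priori chosen pricing measure (the Black-Scholes dynamics and the parameters are used here only for illustration; they reappear in the deep hedging experiment below): the discounted stock is an exponential martingale and the call price is estimated as a plain expectation.

import numpy as np

rng = np.random.default_rng(1)
S0, K, sigma, T = 100.0, 100.0, 0.2, 30.0 / 365.0
n_samples = 200_000

# under the pricing measure Q (numeraire identically 1) the stock is the
# exponential martingale S_T = S_0 * exp(sigma * W_T - sigma^2 * T / 2)
W_T = np.sqrt(T) * rng.standard_normal(n_samples)
S_T = S0 * np.exp(sigma * W_T - 0.5 * sigma**2 * T)

payoff = np.maximum(S_T - K, 0.0)
price = payoff.mean()
stderr = payoff.std(ddof=1) / np.sqrt(n_samples)
print(f"Monte Carlo call price: {price:.4f} +/- {2 * stderr:.4f}")
# should be close to the Black-Scholes value of roughly 2.29 for these parameters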

Let us consider these principles in the following modeling situations. First we draw the tree in undiscounted terms in the format time / numeraire / stock price. For simplicity we take the numeraire equal to $1$, but one can easily adapt that. Next we draw the tree in discounted terms in the format time / price.

In [1]:
import numpy as np
from itertools import product


# EUROPEAN
S0 = 100.
u  = 1.2
d  = 0.9
payoffE = lambda S: np.maximum(S-S0,0.)

timesteps = 2
outcomes = {0, 1}
trajectories = set(product(outcomes, repeat = timesteps))


import pygraphviz as PG
from IPython.display import Image
binomialtreeforward = PG.AGraph(directed=True, strict=True)
binomialtreeforward.edge_attr.update(len='2.0',color='blue')
binomialtreeforward.node_attr.update(color='red')
binomialtreeforwarddiscounted = PG.AGraph(directed=True, strict=True)
binomialtreeforwarddiscounted.edge_attr.update(len='2.0',color='blue')
binomialtreeforwarddiscounted.node_attr.update(color='red')
binomialtreebackward = PG.AGraph(directed=True, strict=True)
binomialtreebackward.edge_attr.update(len='2.0',color='blue')
binomialtreebackward.node_attr.update(color='red')
process = {(omega,0):S0 for omega in trajectories}
numeraire = {(omega,0):1. for omega in trajectories}
discountedprocess = {(omega,0):S0 for omega in trajectories}

#construct process by forward steps
for time in range(1,timesteps+1):
    for omega in trajectories:
        shelper = process[(omega,time-1)]*u**(omega[time-1])*d**(1.-omega[time-1])
        process.update({(omega,time):shelper})
        
for time in range(1,timesteps+1):
    for omega in trajectories:
        #shelper = process[(omega,time-1)]*u**(omega[time-1])*d**(1.-omega[time-1])
        numeraire.update({(omega,time):1.0})
        
for time in range(1,timesteps+1):
    for omega in trajectories:
        binomialtreeforward.add_edge('%d / %d / %d'% (time-1,numeraire[(omega,time-1)],process[(omega,time-1)]),
                                     '%d / %d / %d'% (time,numeraire[(omega,time)],process[(omega,time)]))

#for time in range(1,timesteps+1):
#    for omega in trajectories:
#        binomialtreeforward.add_edge('%d, %d'% (time-1,numeraire[(omega,time-1)]),'%d, %d'% (time,numeraire[(omega,time)]))

Image(binomialtreeforward.draw(format='png',prog='dot')) 
Out[1]:
In [2]:
for time in range(0,timesteps+1):
    for omega in trajectories:
        shelper = process[(omega,time)]/numeraire[(omega,time)]
        discountedprocess.update({(omega,time):shelper})
        
for time in range(1,timesteps+1):
    for omega in trajectories:
        binomialtreeforwarddiscounted.add_edge('%d / %d'% (time-1,discountedprocess[(omega,time-1)]),'%d / %d'% (time,discountedprocess[(omega,time)]))

#for time in range(1,timesteps+1):
#    for omega in trajectories:
#        binomialtreeforward.add_edge('%d, %d'% (time-1,numeraire[(omega,time-1)]),'%d, %d'% (time,numeraire[(omega,time)]))


Image(binomialtreeforwarddiscounted.draw(format='png',prog='dot')) 
Out[2]:
In [4]:
def condprob(omega,time):
    # risk-neutral probability of an up-move at the node (omega, time):
    # q = (X_t - X_{t+1}^down) / (X_{t+1}^up - X_{t+1}^down),
    # i.e. the weight that makes the discounted process a martingale over one step
    omegahelperu = list(omega)
    omegahelperd = list(omega)
    omegahelperu[time]=1
    omegahelperd[time]=0
    omegahelperu = tuple(omegahelperu)
    omegahelperd = tuple(omegahelperd)
    return (discountedprocess[(omega,time)]-discountedprocess[(omegahelperd,time+1)])/(discountedprocess[(omegahelperu,time+1)]-discountedprocess[(omegahelperd,time+1)])

The previous function calculates the conditional probabilities which make the discounted process a martingale at each node. Finally the price of a European call payoff is calculated in a backwards manner. The tree is drawn in the format time / stock price / derivative price.

In [5]:
processbackward = {(omega,timesteps):(payoffE(process[(omega,timesteps)])/numeraire[(omega,timesteps)]) for omega in trajectories}
#backwardssteps: European
for time in reversed(range(0,timesteps)):
    for omega in trajectories:
        shelper=0                                   
        omegahelperu = list(omega)
        omegahelperd = list(omega)
        omegahelperu[time]=1
        omegahelperd[time]=0
        omegahelperu = tuple(omegahelperu)
        omegahelperd = tuple(omegahelperd)
        shelper = processbackward[(omegahelperu,time+1)]*condprob(omega,time)+processbackward[(omegahelperd,time+1)]*(1-condprob(omega,time))
        processbackward.update({(omega,time):shelper})

for time in range(1,timesteps+1):
    for omega in trajectories:
        binomialtreebackward.add_edge('%d / %f / %f'% (time-1,discountedprocess[(omega,time-1)],processbackward[(omega,time-1)]),'%d / %f / %f'% (time,discountedprocess[(omega,time)],processbackward[(omega,time)]))

Image(binomialtreebackward.draw(format='png',prog='dot'))       
Out[5]:

Deep Hedging

In the sequel we construct deep hedges in a high-dimensional frictionless market. In principle transaction costs can easily be added as well; however, the loss function should then be changed appropriately.

The code is based on the great work of my master's student Michael Giegrich, which in turn builds on the paper Deep Hedging by Hans Bühler, Lukas Gonon, Josef Teichmann and Ben Wood.

First a parameter class and several functions needed in the sequel are defined. Then - in tensorflow - a feedforward network with two hidden layers is defined for every time step $ i $ and for every asset $ S^i $ where we need a hedging strategy. Its input is the price at time $ i $ and the strategy at time $ i - 1 $; feeding in the previous strategy is motivated by transaction costs on the one hand, and can be seen as a way to account for memory effects on the other hand. Be aware that the $L^2$ distance used to measure the quality of the hedge is not appropriate in the presence of transaction costs.

The loss function is given by $$ E_Q\Big[{\big( Y - (\delta \bullet S)_T - p \big)}^2\Big] \, , $$ where $ Y $ denotes a payoff, $ \delta $ the strategies to be learned and $ p $ the price to be learned. $Q$ is an a priori chosen martingale measure.
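For orientation, here is a minimal numpy sketch of this loss with simplified, made-up shapes and names (the actual TensorFlow implementation below additionally treats the first price increment separately and allows for transaction costs): the empirical loss is simply the mean squared hedging error over simulated paths.

import numpy as np

def quadratic_hedging_loss(paths, strategies, payoff, price):
    """Empirical version of E_Q[(Y - (delta . S)_T - p)^2].

    paths:      (n_samples, time_steps, n_assets) simulated prices under Q
    strategies: (n_samples, time_steps - 1, n_assets) holdings delta_t over (t, t+1]
    payoff:     (n_samples,) payoff Y
    price:      scalar premium p
    """
    increments = paths[:, 1:, :] - paths[:, :-1, :]        # S_{t+1} - S_t
    pnl = np.sum(strategies * increments, axis=(1, 2))     # (delta . S)_T
    return np.mean((payoff - pnl - price) ** 2)

# toy usage with made-up data
rng = np.random.default_rng(0)
paths = 100 + np.cumsum(rng.standard_normal((1000, 31, 1)), axis=1)
payoff = np.maximum(paths[:, -1, 0] - 100.0, 0.0)
strategies = rng.uniform(0, 1, size=(1000, 30, 1))         # untrained strategies
print(quadratic_hedging_loss(paths, strategies, payoff, price=2.3))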

For reasons of illustration we actually use correlated Black-Scholes models to generate prices, and we use call options on a certain number of the most relevant assets. This allows us to easily write down a Black-Scholes hedge.

Let us recall briefly the Black-Scholes theory: we consider $ d = 1 $, a bank account process $ S^0 = 1 $, and $$ S_t = S_0 \exp(\sigma W_t + \mu t) $$ for $ t \in [0,T] $. $ W $ denotes a Wiener process. $ \sigma > 0 $ is called the volatility and $ \mu $ the drift.

We ask whether we can write a given payoff profile $ f(S_T) $ as a premium plus a hedging portfolio, $$ f(S_T) = x + (H \bullet S)_T \, . $$ If this formula holds, it holds with respect to the physical measure but also with respect to any other equivalent measure, hence we can solve it with respect to a martingale measure $ Q \sim P $.

The idea to construct the hedging strategy $ H $ comes from Ito's formula. Consider a $C^2$ function $ v $ on $ [0,T] \times \mathbb{R}_{> 0} $ with $ v(0,S) = f(S) $; then Ito's formula yields $$ v(T-t,S_t) = v(T,S_0) + ( \partial_S v(T-\cdot,S) \bullet S)_t + \int_0^t \Big( - \partial_t + \frac{\sigma^2 S_s^2}{2} \partial^2_S \Big) v(T-s,S_s) \, ds \, . $$ If $ v $ satisfies the Black-Scholes PDE $$ \partial_t v(t,S) = \frac{\sigma^2 S^2}{2} \partial^2_Sv(t,S) $$ for $ t \in [0,T] $ and $ S > 0 $, the last integral vanishes and we obtain the hedging strategy as the so-called Delta, $ H_t = \partial_S v(T-t,S_t) $, and the premium $ x = v(T,S_0) $.

This PDE can be solved explicitly by calculating $$ v(T-t,S_t) = E\Big[f\Big(S_t\exp\big(\sigma (W_T - W_t) - \frac{\sigma^2 (T-t)}{2}\big)\Big)\Big|S_t\Big] \, , $$ which yields for $ f(S):=(S-K)_+ $ the famous Black-Scholes formula $$ v(u,S) = E\big[(S\exp(\sigma W_u - \frac{\sigma^2 u}{2}) - K)_+\big] = S \, \Phi\Big(\frac{\log(\frac{S}{K}) + \frac{\sigma^2 u}{2}}{\sigma \sqrt{u}}\Big) - K \, \Phi\Big(\frac{\log(\frac{S}{K}) -\frac{\sigma^2 u}{2}}{\sigma \sqrt{u}} \Big) \, . $$
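As a sanity check of this formula and of the Delta hedge it induces, here is a small sketch (parameters chosen as in the experiment below; the time discretization is coarse, so the replication is only approximate): we compute the closed-form price and then run a discrete Delta hedge along one simulated path.

import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(3)
S0, K, sigma, T, n = 100.0, 100.0, 0.2, 30.0 / 365.0, 30
dt = T / n

def bs_call(S, K, sigma, u):
    # v(u, S) = S Phi(d1) - K Phi(d2) with r = 0
    d1 = (np.log(S / K) + 0.5 * sigma**2 * u) / (sigma * np.sqrt(u))
    d2 = d1 - sigma * np.sqrt(u)
    return S * ss.norm.cdf(d1) - K * ss.norm.cdf(d2)

def bs_delta(S, K, sigma, u):
    # Delta = partial_S v(u, S) = Phi(d1)
    return ss.norm.cdf((np.log(S / K) + 0.5 * sigma**2 * u) / (sigma * np.sqrt(u)))

# one price path under the martingale measure (exponential martingale)
W = np.cumsum(np.sqrt(dt) * rng.standard_normal(n))
S = np.concatenate([[S0], S0 * np.exp(sigma * W - 0.5 * sigma**2 * dt * np.arange(1, n + 1))])

# discrete Delta hedging: premium plus accumulated P&L
x = bs_call(S0, K, sigma, T)
pnl = sum(bs_delta(S[i], K, sigma, T - i * dt) * (S[i + 1] - S[i]) for i in range(n))

print("premium x         :", x)                       # roughly 2.29
print("x + (H . S)_T     :", x + pnl)
print("payoff (S_T - K)_+:", max(S[-1] - K, 0.0))     # approximately replicated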

In [6]:
# by Michael Giegrich and Josef Teichmann

import os
import time
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as ss

# Set Parameters
np.random.seed(2)
tf.set_random_seed(2)
global_step = tf.Variable(0, trainable=False)
t0 = time.time()
epsilon = 0.
class parameters():
    def __init__(self):
        self.NUM_EPOCHS = 40
        self.TIME_STEPS = 30
        self.NUM_ASSETS = 1#50
        self.NUM_RELAVENT = 1 
        self.NUM_SAMPLES = 500000
        self.BATCH_SIZE = 258
        self.NUM_BATCHES = self.NUM_SAMPLES // self.BATCH_SIZE
        self.hidden_dim1 = 2*self.NUM_ASSETS
        self.hidden_dim2 = 2*self.NUM_ASSETS
        self.LEARNING_RATE = 0.0005
        self.NUM_TEST = 10000
        self.training = 1
        self.decay = 0.999
        self.minimal_loss = 100.0 #Determines if model is saved
        self.NUM_VALIDATION = 10000
        self.L1_LAMBDA = 0.001 #Parameter for L1-Penalty
        self.correlated = True
        self.corr = 1
In [7]:
#############################################
# Generates Model Hedge And Price For Calls #
#############################################
def delta_BS(S, sigma = 0.2, K = 100., r = 0., T =30./365.):
    delta_hedge = []
    time_steps = int(T*365)
    for i in range(time_steps-1):
        i_ratio = float(i)/365.
        delta = (ss.norm.cdf((np.log(S[:,i]/K) +(r+sigma**2/2)*(T-i_ratio))/(sigma*np.sqrt(T-i_ratio))))
        delta_hedge.append(delta)
    delta_hedge = np.stack(delta_hedge, axis=1)
    return delta_hedge
def d1(S0, K, r, sigma, T):
    return (np.log(S0/K) + (r + sigma**2 / 2) * T)/(sigma * np.sqrt(T)) 
def d2(S0, K, r, sigma, T):
    return (np.log(S0 / K) + (r -sigma**2 / 2) * T) / (sigma * np.sqrt(T)) 
def BlackScholes(S0, sigma = 0.2, K = 100., r = 0., T =30./365.):
    return (S0 * ss.norm.cdf(d1(S0, K, r, sigma, T)) - K * np.exp(-r * T)* ss.norm.cdf(d2(S0, K, r, sigma, T)))
In [8]:
import math as m

####################
# Generate Samples #
####################
def SampleGenerator(NumOfAssets ,corr, NumOfRelevant = 5,InitPrice =  100, rf_rate = 0., dt = 1./365.,TimeSteps = 30, sigma = 0.2, K = 100.):
    S0 = np.ones(NumOfAssets)*InitPrice
    features = np.zeros((TimeSteps, NumOfAssets))
    L = np.linalg.cholesky(corr)
    features[0, 0:NumOfAssets] = 0.   # first row is a zero padding row
    features[1, 0:NumOfAssets] = S0   # actual prices start in the second row at the initial price
    for i in range(2, TimeSteps):
        W = m.sqrt(dt)*np.dot(L,np.random.randn(NumOfAssets))
        features[i, 0:NumOfAssets] =(features[i-1, 0:NumOfAssets]*np.exp((rf_rate*np.ones(NumOfAssets)-0.5*sigma*sigma)*dt+sigma*W)) 
    FinalPrices = features[TimeSteps-1, 0:NumOfRelevant]
    value = np.maximum(FinalPrices - K, 0)  
    value = np.sum(value)
    return features, value

###############################
# Generate Correlation Matrix #
###############################
def corr(NumOfAssets):
    dim = NumOfAssets
    A = np.random.normal(3, 8,size=(dim,dim))
    cov = np.dot(A,np.transpose(A))
    D = np.identity(dim)/np.sqrt(np.diag(cov))
    corr = np.dot(D,np.dot(cov,D))
    return corr 

#################
# Generate Data #
#################
def DataGenerator(NumOfSamples, TimeSteps = 30,NumOfAssets = 50, corr = 1, correlated = False,NumOfRelevant = 5):
    if not(correlated): 
        corr = np.eye(NumOfAssets)
    features = np.zeros((NumOfSamples, TimeSteps, NumOfAssets))
    labels = np.zeros((NumOfSamples))
    for i in range(1, NumOfSamples+1):
        features[i-1, :, :], labels[i-1] = SampleGenerator(
        NumOfAssets, TimeSteps=TimeSteps, corr=corr, 
        NumOfRelevant = NumOfRelevant)
    labels = labels.reshape((NumOfSamples, 1))
    return features, labels
In [9]:
##########################
# Generate Price Tensors #
##########################
def price(FLAGS):
    price = (BlackScholes(100, T = float(FLAGS.TIME_STEPS)/365.)*np.ones((FLAGS.BATCH_SIZE))*FLAGS.NUM_RELAVENT)
    q = tf.constant(price, dtype = tf.float32)
    return q
def price_test(FLAGS):
    price = (BlackScholes(100, T = float(FLAGS.TIME_STEPS)/365.)*np.ones((FLAGS.NUM_TEST))*FLAGS.NUM_RELAVENT)
    q = tf.constant(price, dtype = tf.float32)
    return q
def price_valid(FLAGS):
    price = (BlackScholes(100, T = float(FLAGS.TIME_STEPS)/365.)*np.ones((FLAGS.NUM_VALIDATION))*FLAGS.NUM_RELAVENT)
    q = tf.constant(price, dtype = tf.float32)
    return q

###############################
# Functions for Data-Pipeline #
###############################
def generate_data(FLAGS):
    X, Y = DataGenerator(FLAGS.NUM_SAMPLES,TimeSteps = FLAGS.TIME_STEPS, corr = FLAGS.corr,correlated = FLAGS.correlated, NumOfAssets = FLAGS.NUM_ASSETS,NumOfRelevant = FLAGS.NUM_RELAVENT)
    return X, Y
def generate_test_data(FLAGS):
    X, Y = DataGenerator(FLAGS.NUM_TEST,TimeSteps = FLAGS.TIME_STEPS, corr = FLAGS.corr,correlated = FLAGS.correlated, NumOfAssets = FLAGS.NUM_ASSETS,NumOfRelevant = FLAGS.NUM_RELAVENT)
    return X, Y
def generate_validation_data(FLAGS):
    X, Y = DataGenerator(FLAGS.NUM_VALIDATION,TimeSteps = FLAGS.TIME_STEPS, corr = FLAGS.corr,correlated = FLAGS.correlated, NumOfAssets = FLAGS.NUM_ASSETS,NumOfRelevant = FLAGS.NUM_RELAVENT)
    return X, Y  
def generate_batch(FLAGS, raw_data):
    raw_X, raw_y = raw_data
    # Create Batches from Raw Data
    num_batches = FLAGS.NUM_SAMPLES // FLAGS.BATCH_SIZE 
    data_X = np.zeros([FLAGS.BATCH_SIZE, FLAGS.NUM_ASSETS,FLAGS.TIME_STEPS], dtype=np.float32)
    data_y = np.zeros([FLAGS.BATCH_SIZE], dtype=np.float32)
    for i in range(num_batches):
        data_X = (raw_X[FLAGS.BATCH_SIZE * i:FLAGS.BATCH_SIZE * (i+1), :, :])
        data_y = raw_y[FLAGS.BATCH_SIZE * i:FLAGS.BATCH_SIZE * (i+1)]
        yield (data_X, data_y)
def generate_epochs(FLAGS):
    for i in range(FLAGS.NUM_EPOCHS):
        yield generate_batch(FLAGS, FLAGS.data)

#######################
# Batch Normalization #
#######################
def batch_norm(FLAGS, x, name):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        if FLAGS.training==2:
            FLAGS.decay = 0
        elif FLAGS.training==3:
            FLAGS.decay = FLAGS.NUM_SAMPLES/(FLAGS.NUM_SAMPLES
                                             +FLAGS.NUM_TEST)
        param_shape = [x.get_shape()[-1]]
        batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
        pop_mean = tf.get_variable('moving_mean', param_shape,tf.float32, initializer=tf.constant_initializer(0.0),trainable=False)
        pop_var = tf.get_variable('moving_variance', param_shape,tf.float32, initializer=tf.constant_initializer(1.0),trainable=False)
        train_mean_op = tf.assign(pop_mean, pop_mean * FLAGS.decay + batch_mean * (1 - FLAGS.decay))
        train_var_op = tf.assign(pop_var, pop_var * FLAGS.decay + batch_var * (1 - FLAGS.decay))
        if FLAGS.training==1:
            with tf.control_dependencies([train_mean_op, train_var_op]):
                return tf.nn.batch_normalization(x, batch_mean, batch_var, 0, 1, 1e-3)
        elif FLAGS.training==2:
            with tf.control_dependencies([train_mean_op, train_var_op]):
                return tf.nn.batch_normalization(x, batch_mean, batch_var, 0, 1, 1e-3)
        elif FLAGS.training==3:
            with tf.control_dependencies([train_mean_op, train_var_op]):
                return tf.nn.batch_normalization(x, pop_mean,pop_var, 0, 1, 1e-3)

###################
# Define RNN Cell #
###################
def rnn_cell(FLAGS, rnn_input, state, name):
    with tf.variable_scope('rnn_cell'+ str(name), reuse=True):
      input_ = tf.concat( [rnn_input, state], axis = 1)
      W1 = tf.get_variable('W1',[FLAGS.NUM_ASSETS*2, FLAGS.hidden_dim1],initializer=tf.random_normal_initializer(stddev=0.1))
      b1 = tf.get_variable('b1', [FLAGS.hidden_dim1],initializer=tf.constant_initializer(0.0))
      W2 = tf.get_variable('W2',[FLAGS.hidden_dim1, FLAGS.hidden_dim2],initializer=tf.random_normal_initializer(stddev=0.1))
      b2 = tf.get_variable('b2', [FLAGS.hidden_dim2],initializer=tf.constant_initializer(0.0))
      W3 = tf.get_variable('W3',[FLAGS.hidden_dim2, FLAGS.NUM_ASSETS], initializer=tf.random_normal_initializer(stddev=0.1))
      b3 = tf.get_variable('b3', [FLAGS.NUM_ASSETS],initializer=tf.constant_initializer(0.0))
      input_ = batch_norm(FLAGS, input_, 'Layer_0')
      out1 = tf.matmul(input_, W1) + b1
      hidden_out1 = tf.nn.relu(out1)
      out2 = batch_norm(FLAGS, hidden_out1, 'Layer_1')
      out2 = tf.matmul(out2, W2) + b2
      hidden_out2 = tf.nn.relu(out2)
      output =tf.matmul(hidden_out2, W3) + b3
      return output, W3, b3, W2, b2, W1, b1

#########################
# Define Loss Functions #
#########################
def QuadraticHedgingLossRegularized(FLAGS, outputs,W3, b3,
 W2, b2, W1, b1, y, X, q):
    PriceChanges = X[:, 1:, :] - X[:, :-1, :]
    PortfolioValue = tf.reduce_sum(tf.reduce_sum(tf.multiply(PriceChanges, outputs), axis=1),axis=1)
    one = tf.ones(tf.squeeze(y).shape, dtype = tf.float32)
    penalization = (tf.reduce_sum(tf.abs(W3), [0,1,2])+tf.reduce_sum(tf.abs(b3),[0,1])+tf.reduce_sum(tf.abs(W2),[0,1,2])+tf.reduce_sum(tf.abs(b2),[0,1])+tf.reduce_sum(tf.abs(W1),[0,1,2])+tf.reduce_sum(tf.abs(b1),[0,1]))
    loss = tf.reduce_mean(tf.square(PortfolioValue - tf.squeeze(y)+ q*one)) + FLAGS.L1_LAMBDA*penalization
    return loss
def QuadraticHedgingLoss(FLAGS, outputs, y, X, q):
    PriceChanges = X[:, 1:, :] - X[:, :-1, :]
    PriceChanges = PriceChanges[:,1:,:] # normalize first entry to one
    helper0 = 0.05*X[:,1:2,:] # normalize first entry to one
    PriceChanges = tf.concat([helper0,PriceChanges],axis=1) # normalize first entry to one
    Prices = X[:,1:,:]
    #print(Prices.shape)
    StrategyChangeshelper = outputs[:,1:,:]-outputs[:,:-1,:]
    StrategyChangeshelper = StrategyChangeshelper[:,1:,:]
    #print(StrategyChangeshelper.shape)
    helper1=outputs[:,1:2,:]
    helper2=outputs[:,FLAGS.TIME_STEPS-2:FLAGS.TIME_STEPS-1,:]
    x = tf.zeros(helper1.shape,dtype=tf.float32)
    StrategyChanges=tf.concat([helper1, StrategyChangeshelper, helper2],axis=1)
    #print(StrategyChanges.shape)
    PortfolioValue = tf.reduce_sum(
        tf.reduce_sum(tf.multiply(PriceChanges,outputs),axis=1),axis=1)
    TransactionCost = tf.reduce_sum(
        tf.reduce_sum(tf.multiply(tf.abs(Prices),tf.abs(StrategyChanges)),axis=1),axis=1)
    one = tf.ones(tf.squeeze(y).shape, dtype = tf.float32)
    loss = tf.reduce_mean(tf.square(PortfolioValue - epsilon*TransactionCost - tf.squeeze(y) + 0.0*q*one))
    return loss
# Calculate Portfolio Losses
def HedgingLossNP(outputs, y, X, p):
    PriceChanges = X[:, 1:, :] - X[:, :-1, :]
    PriceChanges[:,0,:] = 0.05*PriceChanges[:,0,:] # normalize first entry to one
    Prices = X[:,1:,:]
    StrategyChangeshelper = outputs[:,1:,:]-outputs[:,:-1,:]
    StrategyChangeshelper = StrategyChangeshelper[:,1:,:]
    helper1=outputs[:,1:2,:]
    helper2=outputs[:,FLAGS.TIME_STEPS-2:FLAGS.TIME_STEPS-1,:]
    x = np.zeros(helper1.shape)
    StrategyChanges=np.concatenate([helper1, StrategyChangeshelper, helper2],axis=1)
    TransactionCost = np.sum(np.sum(np.abs(Prices)*np.abs(StrategyChanges),axis=1),axis=1)
    PortfolioValue = np.sum(np.sum(PriceChanges*outputs,axis=1),axis=1)
    one = np.ones(np.squeeze(y).shape, dtype = np.float32)
    loss = PortfolioValue + 0.0*p*one-np.squeeze(y)-epsilon*TransactionCost
    return loss 

def HedgingLossNPcum(outputs, y, X, p):
    PriceChanges = X[:, 1:, :] - X[:, :-1, :]
    PriceChanges[:,0,:] = 0.05*PriceChanges[:,0,:] # normalize first entry to one
    Prices = X[:,1:,:]
    StrategyChangeshelper = outputs[:,1:,:]-outputs[:,:-1,:]
    StrategyChangeshelper = StrategyChangeshelper[:,1:,:]
    helper1=outputs[:,1:2,:]
    helper2=outputs[:,FLAGS.TIME_STEPS-2:FLAGS.TIME_STEPS-1,:]
    x = np.zeros(helper1.shape)
    StrategyChanges=np.concatenate([helper1, StrategyChangeshelper, helper2],axis=1)
    TransactionCost = np.cumsum(np.sum(np.abs(Prices)*np.abs(StrategyChanges),axis=2),axis=1)
    #TransactionCost = TransactionCost[:,1:]
    PortfolioValue = np.cumsum(np.sum(PriceChanges*outputs,axis=2),axis=1)
    one = np.ones(np.squeeze(PortfolioValue.shape), dtype = np.float32)
    loss = PortfolioValue + 0.0*p*one - epsilon*TransactionCost
    return loss 

######################
# Define Model Class #
######################
class model(object):
    def __init__(self, FLAGS):
        #Placeholder for Data
        self.X = tf.placeholder(tf.float32, 
            [FLAGS.BATCH_SIZE, FLAGS.TIME_STEPS, FLAGS.NUM_ASSETS],
            name='input_placeholder')
        self.y = tf.placeholder(tf.float32, [FLAGS.BATCH_SIZE],
            name='labels_placeholder')
        self.X_train = tf.placeholder(tf.float32, 
            [FLAGS.NUM_SAMPLES, FLAGS.TIME_STEPS, FLAGS.NUM_ASSETS],
            name='input_train_placeholder')
        self.X_valid = tf.placeholder(tf.float32, 
            [FLAGS.NUM_VALIDATION, FLAGS.TIME_STEPS, FLAGS.NUM_ASSETS],
            name='valid_input_placeholder')
        self.y_valid = tf.placeholder(tf.float32,[FLAGS.NUM_VALIDATION], name='valid_labels_placeholder')
        self.X_test = tf.placeholder(tf.float32, 
            [FLAGS.NUM_TEST, FLAGS.TIME_STEPS, FLAGS.NUM_ASSETS],
            name='test_input_placeholder')
        self.y_test = tf.placeholder(tf.float32, [FLAGS.NUM_TEST],
            name='test_labels_placeholder')
        # Define RNN Cell
        self.initial_state = tf.zeros([FLAGS.BATCH_SIZE,FLAGS.NUM_ASSETS])
        for i in range(0, FLAGS.TIME_STEPS-1):
            with tf.variable_scope('rnn_cell'+ str(i)):
                W1 = tf.get_variable('W1',
                    [FLAGS.NUM_ASSETS*2, FLAGS.hidden_dim1])
                b1 = tf.get_variable('b1', [FLAGS.hidden_dim1],
                    initializer=tf.constant_initializer(0.0))
                W2 = tf.get_variable('W2',
                    [FLAGS.hidden_dim1, FLAGS.hidden_dim2])
                b2 = tf.get_variable('b2', [FLAGS.hidden_dim2],
                    initializer=tf.constant_initializer(0.0))
                W3 = tf.get_variable('W3',
                    [FLAGS.hidden_dim2, FLAGS.NUM_ASSETS])
                b3 = tf.get_variable('b3', [FLAGS.NUM_ASSETS],
                    initializer=tf.constant_initializer(0.0))

        # Build Network Graph
        state = self.initial_state
        weights3 = []
        bias3 = []
        weights2 = []
        bias2 = []
        weights1 = []
        bias1 = []
        rnn_outputs = []
        for i in range(0, FLAGS.TIME_STEPS-1):
            FLAGS.counter = i
            state, weight_matrix3, bias_vector3,  weight_matrix2, bias_vector2, weight_matrix1, bias_vector1 = rnn_cell(FLAGS, self.X[:, i, :], state, name=i)
            rnn_outputs.append(state)
            weights3.append(weight_matrix3)
            bias3.append(bias_vector3)
            weights2.append(weight_matrix2)
            bias2.append(bias_vector2)
            weights1.append(weight_matrix1)
            bias1.append(bias_vector1)
        rnn_outputs = tf.stack(rnn_outputs, axis=1)
        weights3 = tf.stack(weights3, axis=1)
        bias3 = tf.stack(bias3, axis=1)
        weights2 = tf.stack(weights2, axis=1)
        bias2 = tf.stack(bias2, axis=1)
        weights1 = tf.stack(weights1, axis=1)
        bias1 = tf.stack(bias1, axis=1)        
        self.hedging_weights = rnn_outputs
        self.test = rnn_outputs

    #Build Loss Graph and Optimization Step
        self.q =  price(FLAGS)             
        self.total_loss = QuadraticHedgingLoss(FLAGS,self.hedging_weights, self.y, self.X, self.q)
        #QuadraticHedgingLossRegularized(FLAGS, self.hedging_weights, weights3, bias3, weights2, bias2,weights1, bias1, self.y, self.X, self.q)
        optimizer = tf.train.AdamOptimizer(FLAGS.LEARNING_RATE)
        self.quad_loss = QuadraticHedgingLoss(FLAGS,self.hedging_weights, self.y, self.X, self.q)
        self.train_step = optimizer.minimize(self.total_loss,global_step = global_step)
        
    def step(self, sess, batch_X, batch_y):
        input_feed = {self.X: batch_X,
                      self.y: np.squeeze(batch_y)}
        output_feed = [self.hedging_weights,
                       self.total_loss,
                       self.quad_loss,
                       self.train_step,
                       self.q
                       ]
        outputs = sess.run(output_feed, input_feed)
        return (outputs[0], outputs[1], outputs[2],outputs[3], outputs[4])

    # Graphs for Sampling
    def sample_train(self, FLAGS):
        initial_state = tf.zeros(
            [FLAGS.NUM_SAMPLES, FLAGS.NUM_ASSETS])
        hedging_weights = []
        state = initial_state
        for i in range(0, FLAGS.TIME_STEPS-1):
                FLAGS.counter = i
                state,_,_,_,_,_,_ = rnn_cell(FLAGS,self.X_train[:, i, :], state, name = i)
                hedging_weights.append(state)
        hedging_weights = tf.stack(hedging_weights, axis = 1)
        return hedging_weights

    def sample(self, FLAGS):
        initial_state = tf.zeros([FLAGS.NUM_TEST, FLAGS.NUM_ASSETS])
        hedging_weights = []
        state = initial_state
        for i in range(0, FLAGS.TIME_STEPS-1):
                FLAGS.counter = i
                state,_,_,_,_,_,_ = rnn_cell(FLAGS,self.X_test[:, i, :], state, name = i)
                hedging_weights.append(state)
        hedging_weights = tf.stack(hedging_weights, axis = 1)
        return hedging_weights

    def sample_validate(self, FLAGS):
        initial_state = tf.zeros(
            [FLAGS.NUM_VALIDATION, FLAGS.NUM_ASSETS])
        hedging_weights = []
        state = initial_state
        for i in range(0, FLAGS.TIME_STEPS-1):
                FLAGS.counter = i
                state,_,_,_,_,_,_ = rnn_cell(FLAGS, 
                    self.X_valid[:, i, :], state, name = i)
                hedging_weights.append(state)
        hedging_weights = tf.stack(hedging_weights, axis = 1)
        return hedging_weights

# Intializes Model
def create_model(sess, FLAGS):
    char_model = model(FLAGS)
    sess.run(tf.global_variables_initializer())
    return char_model
In [10]:
#if __name__ == '__main__':
FLAGS = parameters()
#Generate Correlation Matrix
if FLAGS.correlated:
    FLAGS.corr = corr(FLAGS.NUM_ASSETS)
#Generate Data
train_data = generate_data(FLAGS)
FLAGS.data = train_data
X_valid, y_valid = generate_validation_data(FLAGS)
X_test, y_test = generate_test_data(FLAGS)
EarlyTrainLoss =np.zeros(FLAGS.NUM_EPOCHS)
EarlyValidateLoss =np.zeros(FLAGS.NUM_EPOCHS)

with tf.Session() as sess:
    model = create_model(sess, FLAGS)
    saver = tf.train.Saver()
        
#########################
# Train Hedging Network #
#########################
    for idx, epoch in enumerate(generate_epochs(FLAGS)):
        training_losses = []
        quad_losses = []
        for step, (input_X, input_y) in enumerate(epoch):
            predictions, total_loss, quad_loss, after_loss, q = model.step(sess, input_X, input_y)
            training_losses.append(total_loss)
            quad_losses.append(quad_loss)
        model_weights = delta_BS(input_X,T = float(FLAGS.TIME_STEPS)/365.)
        model_weights = np.concatenate([model_weights[:,:,0:FLAGS.NUM_RELAVENT],np.zeros([FLAGS.BATCH_SIZE,FLAGS.TIME_STEPS-1,FLAGS.NUM_ASSETS-FLAGS.NUM_RELAVENT])], axis = 2)
        model_hedge_loss = q+HedgingLossNP(model_weights, input_y, input_X, q)
        deep_hedge_loss = HedgingLossNP(predictions, input_y, input_X, q)
        plt.hist(model_hedge_loss,range=(np.min(deep_hedge_loss),np.max(deep_hedge_loss)),color='green')
        plt.show()
        plt.hist(deep_hedge_loss)
        plt.show()
        i = np.random.randint(0,257,1)[0]
        model_hedge_loss_tra = q[i:i+1]+HedgingLossNPcum(model_weights[i:i+1,:,:], input_y[i:i+1,:], input_X[i:i+1,:,:], q[i:i+1])
        deep_hedge_loss_tra = HedgingLossNPcum(predictions[i:i+1,:,:], input_y[i:i+1,:], input_X[i:i+1,:,:], q[i:i+1])
        plt.plot(model_hedge_loss_tra[0,:],color='green')
        plt.show()
        plt.plot(deep_hedge_loss_tra[0,:])
        plt.show()
        print("BS price: ", q[i:i+1])
        print("Deep Hedge price:", deep_hedge_loss_tra[0,0])
        print("Epoch %i, Loss: %.3f" % (idx,np.mean(training_losses)))
        print("Unregularized Loss: %.3f" % (np.mean(quad_losses)))
        FLAGS.training = 2
        train_batch = model.sample_train(FLAGS)
        X_train, y_train = train_data
        test_weights = sess.run([train_batch],{model.X_train: X_train})
        FLAGS.training = 3
        p = price_valid(FLAGS)
        validate_graph = model.sample_validate(FLAGS)
        loss = QuadraticHedgingLoss(FLAGS, validate_graph,model.y_valid, model.X_valid, p)
        validate_loss,test_weights,certain_equivalent=sess.run([loss, validate_graph, model.q],{model.X_valid: X_valid,model.y_valid:np.squeeze(y_valid)})
        print("Validate Loss: %.3f" % (validate_loss))
        FLAGS.training = 1

        #Save Model for Early Stopping
        if validate_loss < FLAGS.minimal_loss:
            FLAGS.minimal_loss = validate_loss
            save_path = saver.save(sess, './model/my-test-model')
            
        EarlyTrainLoss[idx] = np.mean(training_losses)
        EarlyValidateLoss[idx] = validate_loss

    # Generate Batch Normalization Parameters for Testing
    FLAGS.training = 2
    train_batch = model.sample_train(FLAGS)
    X_train, y_train = train_data
    test_weights = sess.run([train_batch], {model.X_train: X_train})

    # Generate Predictions
    FLAGS.training = 3
    p = price_test(FLAGS)
    test_graph = model.sample(FLAGS)
    loss = QuadraticHedgingLoss(FLAGS, test_graph,model.y_test, model.X_test, p)
    test_loss, test_weights, certain_equivalent = sess.run([loss, test_graph, model.q],{model.X_test: X_test,model.y_test: np.squeeze(y_test)})
    print("Loss on Test Data (unstopped): %.3f" % (test_loss))
    p_np = (BlackScholes(100, T = float(FLAGS.TIME_STEPS)/365.) * FLAGS.NUM_RELAVENT)

#######################################
# Evaluate Network for Early Stopping #
#######################################
new_saver = tf.train.import_meta_graph('./model/my-test-model.meta', clear_devices=True)
with tf.Session() as sess1:
    # Generate Batch Normalization Parameters for Testing
    new_saver.restore(sess1,tf.train.latest_checkpoint('./model/'))
    FLAGS.training = 2
    train_batch = model.sample_train(FLAGS)
    X_train, y_train = train_data
    test_weights = sess1.run([train_batch],{model.X_train: X_train})
    
    # Generate Predictions
    FLAGS.training = 3
    p = price_test(FLAGS)
    test_graph = model.sample(FLAGS)
    loss = QuadraticHedgingLoss(FLAGS, test_graph,model.y_test, model.X_test, p)
    test_loss, test_weights, certain_equivalent = sess1.run([loss, test_graph, model.q],{model.X_test: X_test, model.y_test: np.squeeze(y_test)})
    print("Loss on Test Data (stopped): %.3f" % (test_loss))

# Print Runtime 
t1 = time.time()
total_time=t1-t0
print("Runtime (in sec): %.3f" % (total_time))
BS price:  [2.2871506]
Deep Hedge price: 2.1729061007499695
Epoch 0, Loss: 12.482
Unregularized Loss: 12.482
Validate Loss: 4.020
BS price:  [2.2871506]
Deep Hedge price: 2.203982323408127
Epoch 1, Loss: 3.661
Unregularized Loss: 3.661
Validate Loss: 3.299
BS price:  [2.2871506]
Deep Hedge price: 2.197788804769516
Epoch 2, Loss: 3.055
Unregularized Loss: 3.055
Validate Loss: 2.770
BS price:  [2.2871506]
Deep Hedge price: 2.202514559030533
Epoch 3, Loss: 2.600
Unregularized Loss: 2.600
Validate Loss: 2.367
BS price:  [2.2871506]
Deep Hedge price: 2.209967076778412
Epoch 4, Loss: 2.266
Unregularized Loss: 2.266
Validate Loss: 2.107
BS price:  [2.2871506]
Deep Hedge price: 2.207842767238617
Epoch 5, Loss: 1.996
Unregularized Loss: 1.996
Validate Loss: 1.822
BS price:  [2.2871506]
Deep Hedge price: 2.209264785051346
Epoch 6, Loss: 1.764
Unregularized Loss: 1.764
Validate Loss: 1.665
BS price:  [2.2871506]
Deep Hedge price: 2.207951098680496
Epoch 7, Loss: 1.567
Unregularized Loss: 1.567
Validate Loss: 1.446
BS price:  [2.2871506]
Deep Hedge price: 2.2038696706295013
Epoch 8, Loss: 1.340
Unregularized Loss: 1.340
Validate Loss: 1.315
BS price:  [2.2871506]
Deep Hedge price: 2.2042720019817352
Epoch 9, Loss: 1.129
Unregularized Loss: 1.129
Validate Loss: 1.077
BS price:  [2.2871506]
Deep Hedge price: 2.203870266675949
Epoch 10, Loss: 0.981
Unregularized Loss: 0.981
Validate Loss: 0.931
BS price:  [2.2871506]
Deep Hedge price: 2.203322649002075
Epoch 11, Loss: 0.863
Unregularized Loss: 0.863
Validate Loss: 0.795
BS price:  [2.2871506]
Deep Hedge price: 2.2026705741882324
Epoch 12, Loss: 0.765
Unregularized Loss: 0.765
Validate Loss: 0.734
BS price:  [2.2871506]
Deep Hedge price: 2.2046495974063873
Epoch 13, Loss: 0.732
Unregularized Loss: 0.732
Validate Loss: 0.709
BS price:  [2.2871506]
Deep Hedge price: 2.2029629349708557
Epoch 14, Loss: 0.679
Unregularized Loss: 0.679
Validate Loss: 0.548
BS price:  [2.2871506]
Deep Hedge price: 2.2014255821704865
Epoch 15, Loss: 0.501
Unregularized Loss: 0.501
Validate Loss: 0.476
BS price:  [2.2871506]
Deep Hedge price: 2.199166864156723
Epoch 16, Loss: 0.465
Unregularized Loss: 0.465
Validate Loss: 0.457
BS price:  [2.2871506]
Deep Hedge price: 2.1994291245937347
Epoch 17, Loss: 0.453
Unregularized Loss: 0.453
Validate Loss: 0.448
BS price:  [2.2871506]
Deep Hedge price: 2.1985286474227905
Epoch 18, Loss: 0.447
Unregularized Loss: 0.447
Validate Loss: 0.441
BS price:  [2.2871506]
Deep Hedge price: 2.198183238506317
Epoch 19, Loss: 0.442
Unregularized Loss: 0.442
Validate Loss: 0.435
BS price:  [2.2871506]
Deep Hedge price: 2.198328971862793
Epoch 20, Loss: 0.440
Unregularized Loss: 0.440
Validate Loss: 0.436
BS price:  [2.2871506]
Deep Hedge price: 2.19819113612175
Epoch 21, Loss: 0.438
Unregularized Loss: 0.438
Validate Loss: 0.433
BS price:  [2.2871506]
Deep Hedge price: 2.1980053186416626
Epoch 22, Loss: 0.437
Unregularized Loss: 0.437
Validate Loss: 0.431
BS price:  [2.2871506]
Deep Hedge price: 2.197982221841812
Epoch 23, Loss: 0.436
Unregularized Loss: 0.436
Validate Loss: 0.429
BS price:  [2.2871506]
Deep Hedge price: 2.1980978548526764
Epoch 24, Loss: 0.436
Unregularized Loss: 0.436
Validate Loss: 0.428
BS price:  [2.2871506]
Deep Hedge price: 2.198011726140976
Epoch 25, Loss: 0.435
Unregularized Loss: 0.435
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.1979403495788574
Epoch 26, Loss: 0.435
Unregularized Loss: 0.435
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.198023945093155
Epoch 27, Loss: 0.434
Unregularized Loss: 0.434
Validate Loss: 0.431
BS price:  [2.2871506]
Deep Hedge price: 2.19827339053154
Epoch 28, Loss: 0.433
Unregularized Loss: 0.433
Validate Loss: 0.428
BS price:  [2.2871506]
Deep Hedge price: 2.198563665151596
Epoch 29, Loss: 0.430
Unregularized Loss: 0.430
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.200251668691635
Epoch 30, Loss: 0.423
Unregularized Loss: 0.423
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.2004279494285583
Epoch 31, Loss: 0.420
Unregularized Loss: 0.420
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.200448662042618
Epoch 32, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.422
BS price:  [2.2871506]
Deep Hedge price: 2.2003985941410065
Epoch 33, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.421
BS price:  [2.2871506]
Deep Hedge price: 2.200426757335663
Epoch 34, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.427
BS price:  [2.2871506]
Deep Hedge price: 2.200474292039871
Epoch 35, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.422
BS price:  [2.2871506]
Deep Hedge price: 2.2006018459796906
Epoch 36, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.425
BS price:  [2.2871506]
Deep Hedge price: 2.2005370259284973
Epoch 37, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.428
BS price:  [2.2871506]
Deep Hedge price: 2.2006164491176605
Epoch 38, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.424
BS price:  [2.2871506]
Deep Hedge price: 2.2006314992904663
Epoch 39, Loss: 0.419
Unregularized Loss: 0.419
Validate Loss: 0.427
Loss on Test Data (unstopped): 0.422
INFO:tensorflow:Restoring parameters from ./model/my-test-model
Loss on Test Data (stopped): 0.417
Runtime (in sec): 1969.434
In [24]:
################
# Save Results #
################

np.save('deep_weights', test_weights)
deep_hedge_loss = HedgingLossNP(test_weights, y_test, X_test, p_np)
model_weights = delta_BS(X_test,T = float(FLAGS.TIME_STEPS)/365.)
#model_weights = np.concatenate([model_weights[:,:,0:FLAGS.NUM_RELAVENT],np.zeros([FLAGS.NUM_TEST,FLAGS.TIME_STEPS-1,FLAGS.NUM_ASSETS-FLAGS.NUM_RELAVENT])], axis = 2)
if FLAGS.NUM_ASSETS-FLAGS.NUM_RELAVENT != 0:
    model_weights = np.concatenate(
        [model_weights[:,:,0:FLAGS.NUM_RELAVENT],
        np.zeros([FLAGS.NUM_TEST,FLAGS.TIME_STEPS-1,
        FLAGS.NUM_ASSETS-FLAGS.NUM_RELAVENT])], axis = 2)
np.save('model_weights', model_weights)
model_hedge_loss = HedgingLossNP(model_weights, y_test, X_test, p_np)+p_np
np.save('stock_prices', X_test)
df = pd.DataFrame(deep_hedge_loss)
df.to_csv('deep_hedge_loss.csv')
df = pd.DataFrame(model_hedge_loss)
df.to_csv('model_hedge_loss.csv')
In [25]:
stock_prices=np.load('stock_prices.npy')
model_hedge_loss = pd.read_csv('model_hedge_loss.csv')
deep_hedge_loss = pd.read_csv('deep_hedge_loss.csv')
In [26]:
for i in range(10):
    plt.plot(stock_prices[i,1:,:])
plt.show()
In [27]:
model_hedge_loss.hist(column='0',color='green')
deep_hedge_loss.hist(column='0')
Out[27]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f1f86881748>]],
      dtype=object)
In [28]:
plt.show()
In [29]:
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = bytes("<stripped %d bytes>"%size, 'utf-8')#"<stripped %d bytes>"%size
    return strip_def

def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))
In [30]:
#show_graph(tf.get_default_graph().as_graph_def())
In [31]:
weights=np.load('deep_weights.npy')
In [32]:
S = stock_prices[7,1:,:]
S = S.T
delta = delta_BS(S, sigma = 0.2, K = 100., r = 0., T =30./365.)
plt.plot(delta[0])
plt.plot(weights[7,1:])
plt.show()