compiled 2018-05-31

Introduction

Who is this Guy?

Proprietary/Principal Trading

Brian Peterson:

  • quant, author, open source advocate
  • author or co-author of over 10 packages for using R in Finance
  • organization admin for R's participation in Google Summer of Code
  • Lecturer, University of Washington Computational Finance and Risk Management
  • has managed quant trading teams at several Chicago proprietary trading firms over time …

Proprietary Trading:

  • proprietary or principal traders are a specific "member" structure with the exchanges
  • high barriers to entry, large up-front capital requirements
  • many strategies pursued in this structure have capacity constraints
  • benefits on the other side are low fees, potentially high leverage
  • money management needs to be very focused on drawdowns, leverage, volatility

Backtesting, art or science?

 

 

Back-testing. I hate it - it's just optimizing over history. You never see a bad back-test. Ever. In any strategy. - Josh Diedesch (2014), CalSTRS

 

 

Every trading system is in some form an optimization. - Tomasini and Jaekle (2009)

Moving Beyond Assumptions

Many system developers consider

"I hypothesize that this strategy idea will make money"

to be adequate.

Instead, strive to:

  • understand your business constraints and objectives
  • build a hypothesis for the system
  • build the system in pieces
  • test the system in pieces
  • measure how likely it is that you have overfit

Constraints and Objectives

Constraints

  • capital available
  • products you can trade
  • execution platform

Benchmarks

  • published or synthetic?
  • what are the limitations?
  • are you held to it, or just measured against it?

Objectives

  • formulate objectives for testability
  • make sure they reflect your real business goals

Building a Hypothesis

 

To create a testable idea (a hypothesis):

  • formulate a declarative conjecture
  • make sure the conjecture is predictive
  • define the expected outcome
  • describe means of verifying/testing

 

Good/complete hypothesis statements include:

  • what is being analyzed (the subject)
  • dependent variable(s) (the result/prediction)
  • independent variables (inputs to the model)
  • the anticipated possible outcomes, including direction or comparison
  • the means by which you will validate or refute each hypothesis

Tools

R in Finance trade simulation toolchain

Building Blocks

Filters

  • select the instruments to trade
  • categorize market characteristics that are favorable to the strategy

Indicators

  • values derived from market data
  • includes all common "technicals"

Signals

  • describe the interaction between filters, market data, and indicators
  • can be viewed as a prediction at a point in time

Rules

  • make path-dependent actionable decisions

Installing blotter and quantstrat

  • on Windows, you will need Rtools
install.packages('devtools') # if you don't have it installed
install.packages('PerformanceAnalytics')
install.packages('FinancialInstrument')

devtools::install_github('braverock/blotter')
devtools::install_github('braverock/quantstrat')

Our test strategy - MACD

library(quantstrat) # also loads blotter, FinancialInstrument, quantmod, and TTR

stock.str <- 'EEM'

currency('USD')
stock(stock.str,currency='USD',multiplier=1)

startDate='2003-12-31'
initEq=100000
portfolio.st='macd'
account.st='macd'

initPortf(portfolio.st,symbols=stock.str)
initAcct(account.st,portfolios=portfolio.st,initEq = initEq)
initOrders(portfolio=portfolio.st)

strategy.st<-portfolio.st
# define the strategy
strategy(strategy.st, store=TRUE)
## get data 
getSymbols(stock.str,
           from=startDate,
           adjust=TRUE,
           src='tiingo') # src='tiingo' requires an API key; see ?getSymbols
  • we'll use MACD as a simple trend follower for illustration
  • I am not advocating MACD as an investment strategy
  • we hypothesize that our MACD strategy will detect durable trends
  • we also hypothesize that it will get chopped up by sideways markets because of the lag in the moving average

 

  • don't pay too much attention to the code; this entire presentation is written in rmarkdown (Xie 2014), with references in BibTeX via JabRef, and has been compiled directly from the source for this presentation

Evaluating the Strategy

Test the System in Pieces

How to Screw Up Less

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey (1962) p. 13

 

Fail quickly, think deeply, or both?

 

No matter how beautiful your theory, no matter how clever you are or what your name is, if it disagrees with experiment, it’s wrong. - Richard P. Feynman (1965)

Add the indicator

#MA parameters for MACD
fastMA = 12 
slowMA = 26 
signalMA = 9
maType="EMA"

#one indicator
add.indicator(strategy.st, name = "MACD", 
                  arguments = list(x=quote(Cl(mktdata)),
                                   nFast=fastMA, 
                                   nSlow=slowMA),
                  label='_' 
)

 

 

 

MACD is a two moving average cross system that seeks to measure:

  • the momentum of the change
  • the divergence between the two moving averages

This is classical technical analysis, used for illustration only; it is not widely deployed in production.
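
For intuition about what the indicator computes, here is a minimal stand-alone sketch using TTR's MACD on the ttrc sample data that ships with TTR (not part of the strategy above):

library(TTR)

data(ttrc)   # sample OHLCV data shipped with TTR
m <- MACD(ttrc$Close, nFast = 12, nSlow = 26, nSig = 9, maType = "EMA")
# 'macd' is the (by default, percentage) difference between the fast and slow
# EMAs; 'signal' is an EMA of the macd line
tail(m)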

Measuring Indicators

What do you think you're measuring? A good indicator measures something in the market:

  • a theoretical "fair value" price, or
  • the impact of a factor on that price, or
  • turning points, direction, or slope of the series

Make sure the indicator is testable:

  • hypothesis and tests for the indicator
    • standard errors
    • t-tests or p-values
  • goodness of fit
  • custom 'perfect foresight' model built from a periodogram or signature plot

If your indicator doesn't have testable information content, throw it out and start over.

Specifying tests for Indicators

  • facts to support or refute

  • general tests
    • MAFE/MSFE (mean absolute/squared forecast error)
    • confusion matrix
    • standard errors
    • Monte Carlo errors
  • specific tests
    • tests related to the model
    • tests of the prediction
    • tests of the relationships between variables

General Diagnostics for Indicators

  • Euclidean distance
    • often squared
    • or de-meaned
    • or lagged to line things up
  • clustering models
  • piece-wise linear decomposition
  • mean squared forecast error
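
As a concrete illustration of a couple of these diagnostics, a minimal sketch on synthetic data; every name here is hypothetical and stands in for your own indicator and target:

set.seed(42)
px  <- 100 + cumsum(rnorm(500, 0.02, 1))    # synthetic price path
fwd <- c(diff(px, lag = 5), rep(NA, 5))     # 5-period forward change (the target)
ind <- TTR::EMA(px, 10) - TTR::EMA(px, 30)  # a simple trend indicator

ok   <- complete.cases(ind, fwd)
msfe <- mean((ind[ok] - fwd[ok])^2)                     # mean squared forecast error
dist <- sqrt(sum((scale(ind[ok]) - scale(fwd[ok]))^2))  # de-meaned Euclidean distance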

Add the Signals

#two signals
add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="gt",
                            threshold=0,
                            cross=TRUE),
           label="signal.gt.zero"
)
   
add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="lt",
                            threshold=0,
                            cross=TRUE),
           label="signal.lt.zero"
)

Combining Signals

Signals are often combined:

"A & B" should both be true.

This is a composite signal, and serves to reduce the dimensionality of the decision space.

A lower-dimensional space is easier to measure, but is at higher risk of overfitting.

Avoid overfitting while combining signals by making sure that your process has a strong economic or theoretical basis before writing code or running tests.
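
For illustration, quantstrat can express a composite signal directly via sigFormula; a sketch, where trend.up is a hypothetical filter column assumed to already exist in mktdata:

add.signal(strategy.st,
           name = "sigFormula",
           arguments = list(formula = "signal.gt.zero == 1 & trend.up == 1",
                            cross = TRUE),
           label = "composite.entry")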

Measuring Signals

Signals make predictions so all the literature on forecasting is applicable:

  • mean squared forecast error, BIC, etc.
  • box plots or additive models for forward expectations
  • "revealed performance" approach of Racine and Parmeter (2012)
  • re-evaluate assumptions about the method of action of the strategy
  • detect information bias or luck before moving on
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_', 
                 variable = list(n = fastMA),
                 label = 'nFAST'
)
#> [1] "macd"

add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_', 
                 variable = list(n = slowMA),
                 label = 'nSLOW'
)
#> [1] "macd"

Run Signal Analysis Study

sa_buy <- apply.paramset.signal.analysis(
            strategy.st, 
            paramset.label='signal_analysis', 
            portfolio.st=portfolio.st, 
            sigcol = 'signal.gt.zero',
            sigval = 1,
            on=NULL,
            forward.days=50,
            cum.sum=TRUE,
            include.day.of.signal=FALSE,
            obj.fun=signal.obj.slope,
            decreasing=TRUE,
            verbose=TRUE)
#> Applying Parameter Set:  12, 26
sa_sell <- apply.paramset.signal.analysis(
             strategy.st, 
             paramset.label='signal_analysis', 
             portfolio.st=portfolio.st, 
             sigcol = 'signal.lt.zero',
             sigval = 1,
             on=NULL,
             forward.days=10,
             cum.sum=TRUE,
             include.day.of.signal=FALSE,
             obj.fun=signal.obj.slope,
             decreasing=TRUE,
             verbose=TRUE)
#> Applying Parameter Set:  12, 26

Look at Buy Signal

Look at Buy Signal (cont.)

Look at Buy Signal (cont.)

Look at Sell Signal

Add the Rules

# entry
add.rule(strategy.st,
         name='ruleSignal', 
         arguments = list(sigcol="signal.gt.zero",
                          sigval=TRUE, 
                          orderqty=100, 
                          ordertype='market', 
                          orderside='long', 
                          threshold=NULL),
         type='enter',
         label='enter',
         storefun=FALSE
)
# exit
add.rule(strategy.st,name='ruleSignal', 
         arguments = list(sigcol="signal.lt.zero",
                          sigval=TRUE, 
                          orderqty='all', 
                          ordertype='market', 
                          orderside='long', 
                          threshold=NULL,
                          orderset='exit2'),
         type='exit',
         label='exit'
)

Measuring Rules

If your signal process doesn't have predictive power, stop now.

 

  • rules should refine the way the strategy 'listens' to signals
  • entries may be passive or aggressive, or may layer or pyramid into a position
  • exits may have their own signal process, or may be derived empirically
  • risk rules should be added near the end, for empirical 'stops' or to meet business constraints

Beware of Rule Burden:

  • having too many rules is an invitation to overfitting
  • adding rules after being disappointed in backtest results is almost certainly an exercise in overfitting (data snooping)
  • strategies with fewer rules are more likely to be robust out of sample

Run the Strategy

start_t<-Sys.time()
out<-applyStrategy(strategy.st , 
                   portfolios=portfolio.st,
                   parameters=list(nFast=fastMA, 
                                   nSlow=slowMA,
                                   nSig=signalMA,
                                   maType=maType),
                   verbose=TRUE)
end_t<-Sys.time()

start_pt<-Sys.time()
updatePortf(Portfolio=portfolio.st)
end_pt<-Sys.time()
#> [1] "Running the backtest (applyStrategy):"
#> Time difference of 0.7516775 secs
#> [1] "trade blotter portfolio update (updatePortf):"
#> Time difference of 0.08920383 secs

Initial Results

chart.Posn(Portfolio=portfolio.st,Symbol=stock.str)
plot(add_MACD(fast=fastMA, slow=slowMA, signal=signalMA,maType="EMA"))

Parameter Optimization

Parameter Optimization

Every trading system is in some form an optimization. - Tomasini and Jaekle (2009)

 

  • all strategies have parameters: What are the right ones?
  • locate a parameter combination that most closely matches the hypotheses and objectives
  • look for stable regions of both in and out of sample performance
  • even your initial parameter choices are an optimization; you've chosen values you believe may be optimal
  • parameter optimization just adds process and tools to your investment decision

What are good parameters?

  • parameters are all the non-data inputs to your strategy
  • they are the knobs and levers that control the model
  • good parameters are parsimonious

    • not too many of them
    • not too loosely defined
    • each has a clear impact on the strategy performance
    • each is testable in a backtest
  • production strategies have additional parameters that are specific to the production environment

Limiting the number of parameters

  • focus on the major drivers defined by your hypotheses
  • if necessary, refine the hypotheses
  • use ROC or effective parameter testing (Hastie, Tibshirani, and Friedman 2009) to identify the major drivers in a small training and testing set
  • limiting free parameters

    • limits the opportunities to cherry pick lucky combinations
    • reduces the required adjustment for data mining bias (Bailey and López de Prado 2014; Harvey and Liu 2015)
    • reduces number of trials (resulting in faster parameter searches)

Too Many Free Parameters

  • increases your chance of overfitting
  • larger numbers of free parameters vastly increase the amount of data you need
  • more parameters lower your degrees of freedom, and

    • increases the chance that your chosen parameters are a false discovery
  • goal should be to eliminate free parameters before running parameter optimization

Moving from Free to Non-Free Parameters

  • eliminate parameters where possible
  • fix/freeze tertiary parameters next
  • can you use production data to set less important parameters empirically?
  • is there a model fitting method you could use instead? ( e.g. regression, maximum likelihood, Bayes)
  • does this still count as a free parameter?

    • it depends on the complexity of the model/approach used to fit
    • does your fitting model have parameters too?
    • what assumptions are you making?

Robust Parameters

  • small parameter changes lead to small changes in P&L and objective expectations
  • out of sample deterioration is not large, on average
  • parameter choices have a sound theoretical or economic basis
  • parameter variation should produce correlated differences in multiple objectives

quantstrat Parameters

  • quantstrat parameters are added via add.distribution

    • they are identified with a parameter set (paramset) label via the paramset.label argument
    • the component.type argument tells quantstrat where to look for a variable (indicators, signals, rules, etc.)
    • the component.label argument must match the label used for the component earlier in the strategy
    • the variable argument tells quantstrat what to act on
  • relationships (constraints) between parameters are set via add.distribution.constraint

quantstrat::add.distribution()

.FastMA = (3:15)
.SlowMA = (20:60)
# .nsamples = 200 
#for random parameter sampling, 
# less important if you're using doParallel or doMC

### MA paramset
add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .FastMA),
                 label = 'nFAST'
)

add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .SlowMA),
                 label = 'nSLOW'
)

add.distribution.constraint(strategy.st,
                            paramset.label = 'MA',
                            distribution.label.1 = 'nFAST',
                            distribution.label.2 = 'nSLOW',
                            operator = '<',
                            label = 'MA'
)

quantstrat::apply.paramset()

  • creates param.combos from distributions and constraints
  • runs applyStrategy() on portfolio for each param. combo
  • nsamples: draws random selection
  • audit: file name to store all portfolios and order books
  • this takes over an hour on my laptop, and less than two minutes on my workstation
.paramaudit <- new.env()
ps_start <- Sys.time()
paramset.results  <- apply.paramset(strategy.st, 
                           paramset.label='MA', 
                           portfolio.st=portfolio.st, 
                           account.st=account.st, 
#                          nsamples=.nsamples,
                           audit=.paramaudit,
                           store=TRUE,
                           verbose=FALSE)
ps_end   <- Sys.time()

paramset results

Running the parameter search (apply.paramset):
Time difference of 1.244973 mins
Total trials: 543

plot(paramset.results$cumPL[-1,], major.ticks = 'years', grid.ticks.on = 'years')

Search Process

  • by default, quantstrat will use a brute force search by combining all parameters
  • since the full brute force search space may be too large, you can use the nsamples argument to apply.paramset to sample from the whole parameter space
  • more sophisticated sampling may be done using an optimizer (on my todo list to generalize)

Parameter Regions

  • Tomasini and Jaekle (2009) and Pardo (2008) both recommend finding stable parameter regions
  • ideally, those regions will be "obvious" from first principles
  • you can easily use plot to compare one parameter and one output, less easily 2-3 parameters
  • for more complex parameter surfaces, 3-d surfaces or heat maps (e.g. via contourplot) are better
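
As an example of the heat map approach, a sketch over the paramset.results object created above, assuming (as in recent quantstrat versions) that the tradeStats component carries the nFAST/nSLOW columns alongside Profit.To.Max.Draw:

library(lattice)

ts <- paramset.results$tradeStats
levelplot(Profit.To.Max.Draw ~ nFAST * nSLOW, data = ts,
          xlab = 'nFAST', ylab = 'nSLOW',
          main = 'Profit to max drawdown over the parameter surface')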

Parameter distributions - Profit to Max Drawdown

Overfitting

Things to Watch Out For, or, Types of Overfitting

Look Ahead Bias

  • directly using knowledge of future events

Data Mining Bias

  • caused by testing multiple configurations and parameters over multiple runs, with adjustments between backtest runs
  • exhaustive searches may or may not introduce biases

Data Snooping

  • knowledge of the data set can contaminate your choices
  • making changes after failures without having strong experimental design
NOTE: We just did all three of these things by optimizing over the entire series

Degrees of Freedom

Pardo (2008, 130–31) describes the degrees of freedom of a strategy as:

  • the number of observations
  • minus all observations used by your indicators

In parameter optimization, we should consider the sum of observations used by all different parameter combinations.

The goal should be to retain 95% or more degrees of freedom, or 'free observations', even after the parameter search.

Applying Degrees of Freedom calculation

degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='trial')
#> 
#>  Degrees of freedom report for strategy: macd 
#>  Total market observations: 3628 
#>  Degrees of freedom consumed by strategy: 1248 
#>  Total degrees of freedom remaining: 2380 
#>  % Degrees of Freedom:  65.6 %
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='sum')
#> 
#>  Degrees of freedom report for strategy: macd 
#>  Total market observations: 3628 
#>  Degrees of freedom consumed by strategy: 26204 
#>  Total degrees of freedom remaining: -22576 
#>  % Degrees of Freedom:  -622.27 %
  • there is a lot more information in the returned object, see ?dof for more details

Implications for Torture and Training sets

  • to increase degrees of freedom, you may:

    • increase the length of market data consumed by the backtest
    • increase the number of symbols examined
    • decrease the number of free parameters
    • decrease the ranges of the parameters examined
  • torture and training data sets should be large enough to still have reasonable statistical confidence when you move to walk forward
  • this underlines the difficulty of doing extensive parameter searches on low frequency data

Multiple testing bias

Investment theory, not computational power, should motivate what experiments are worth conducting. (Bailey and López de Prado 2014, 10)

 

  • we do want to test multiple hypotheses
  • as we perform more tests, the likelihood of "discovering" an apparently significant result which is actually random increases
  • as we perform more tests, this likelihood approaches 100%
  • we need to be prepared to detect and correct for this multiple testing bias
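
The arithmetic behind that last point, for independent tests at a 5% significance level (543 matches the trial count of our parameter search; correlated trials, like ours, need the corrections discussed next):

alpha  <- 0.05
trials <- c(1, 10, 100, 543)
round(1 - (1 - alpha)^trials, 3)   # P(at least one spurious 'discovery')
#> [1] 0.050 0.401 0.994 1.000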

Deflated Sharpe

Bailey and López de Prado (2014) describe a way of adjusting the observed Sharpe Ratio of a candidate strategy by taking the variance of the trials, and the skewness and kurtosis of the returns, into account.

  • corrects for selection bias from multiple testing on the same or related data (the parameter optimization)
  • adjusts for non-normality of observed returns
  • establishes theoretical maximum Sharpe for a series of related trials

  • we have implemented a version with Kipnis (2017) as SharpeRatio.deflated

Applying Deflated Sharpe Ratio

dsr <- SharpeRatio.deflated(portfolios='macd',strategy='macd', audit=.paramaudit)
  obs.Sharpe  max.Sharpe  deflated.Sharpe  p.value   nTrials
  0.57        3.077243    0.5177464        0.091673  543
  • small numbers of trials will result in a small adjustment
  • maximum computed Sharpe ratio relies on correlation of trials and moments of the distribution
  • p-value is the assumed significance of the adjusted backtest, so low p-value suggests a low probability of overfitting

Haircut Sharpe Ratio

  • Harvey and Liu (2015) propose another method of correcting for the multiple testing bias
  • they describe three ways of adjusting the Sharpe Ratio for multiple related tests
  • this research was mostly aimed at the proliferation of equity 'risk premia', but it is applicable to backtests where we have all the information as well
  • Jasen Mackie (2016) implemented these methods in R, and
  • we have ported those methods to quantstrat as SharpeRatio.haircut

Applying the Sharpe Ratio Haircut

hsr <- SharpeRatio.haircut(portfolios='macd',strategy='macd',audit=.paramaudit)
#> Warning in log(x): NaNs produced
#> 
#>  Sharpe Ratio Haircut Report: 
#> 
#>  Observed Sharpe Ratio: 0.57 
#>  Sharpe Ratio corrected for autocorrelation: 0.57 
#> 
#>  Bonferroni Adjustment: 
#>  Adjusted P-value = 1 
#>  Haircut Sharpe Ratio = 0 
#>  Percentage Haircut = 1 
#> 
#>  Holm Adjustment: 
#>  Adjusted P-value = 1 
#>  Haircut Sharpe Ratio = 0 
#>  Percentage Haircut = 1 
#> 
#>  BHY Adjustment: 
#>  Adjusted P-value = 0.9981303 
#>  Haircut Sharpe Ratio = 0.0007425807 
#>  Percentage Haircut = 0.9986972 
#> 
#>  Average Adjustment: 
#>  Adjusted P-value = 0.9993768 
#>  Haircut Sharpe Ratio = 0.0002475267 
#>  Percentage Haircut = 0.9995657

Monte Carlo and the bootstrap

Sampling from limited information

  • estimate the 'true' properties of a distribution from incomplete information
  • evaluate the likelihood (test the hypothesis) that a particular result is
    • not the result of chance
    • not overfit
  • understand confidence intervals for other descriptive statistics on the backtest
  • simulate different paths that the results might have taken, if the ordering had been different
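
For intuition, a tiny generic bootstrap on a synthetic stand-in for daily P&L; the mcsim and txnsim tools below do this properly against the actual portfolio:

set.seed(7)
pnl  <- rnorm(1000, mean = 0.01, sd = 1)   # synthetic stand-in for daily P&L
boot <- replicate(1000, mean(sample(pnl, replace = TRUE)))
quantile(boot, c(0.025, 0.975))            # 95% confidence interval for mean daily P&L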

History of Monte Carlo and bootstrap simulation

  • Laplace was the first to describe the mathematical properties of sampling from a distribution
  • Mahalanobis extended this work in the 1930s to describe sampling from dependent distributions, and anticipated the block bootstrap by examining these dependencies
  • Monte Carlo simulation was developed by Stan Ulam and John von Neumann (with computation by Françoise Ulam) as part of the hydrogen bomb program in 1946 (Richard Rhodes, Dark Sun, p. 304)
  • computational implementation of Monte Carlo simulation was constructed by Nicholas Metropolis on the ENIAC and MANIAC machines
  • Metropolis was an author in 1953 of the prior distribution sampler extended by W. K. Hastings to the modern Metropolis-Hastings form in 1970
  • Maurice Quenouille and John Tukey created 'jackknife' simulation in the 1950s
  • Bradley Efron described the modern bootstrap in 1979

Simulation from the equity curve using daily P&L

Sampling Without replacement:

  • results have the same mean and final P&L; this allows inference on likely error bounds

Sampling With Replacement:

  • provides multiple paths
  • block sampling, with replacement:
    • mimics some of the autocorrelation structure of returns
    • may create deeper drawdowns if down streaks are effectively repeated
  • choosing block size
    • a fraction of the average holding period: 1/5 or 1/4 of the holding period is a good first guess (see the sketch after this list)
    • block size equal to observed significant autocorrelation
    • variable distribution of block size centered around one of the above, with tails
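
A sketch of the first rule of thumb above, deriving a starting block size from blotter's per-trade statistics; this assumes the backtest above has been run, and that perTradeStats returns its duration column as a difftime:

pts      <- perTradeStats(Portfolio = "macd", Symbol = stock.str)
avg.hold <- as.numeric(mean(pts$duration), units = "days")  # average holding period in days
l.guess  <- max(1, round(avg.hold / 5))   # 1/5 of the average holding period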

Disadvantages of Sampling from portfolio P&L:

  • not transparent
  • potentially unrealistic
  • really only a statistical confidence model
  • path won't line up with historical market regimes

Empirical Example, with replacement

rsim <- mcsim(  Portfolio = "macd"
               , Account = "macd"
               , n=1000
               , replacement=TRUE
               , l=1, gap=10)
rblocksim <-  mcsim(  Portfolio = "macd"
               , Account = "macd"
               , n=1000
               , replacement=TRUE
               , l=10, gap=10)

P&L Quantiles:

            0%      25%     50%    75%    100%
rsim       -0.04    0.0039  0.014  0.027  0.095
rblocksim  -0.04    0.0038  0.014  0.027  0.091

Empirical Example, With replacement, cont.

INSERT CSCV/PBO HERE

other bootstrapping methods

  • White’s Data Mining Reality Check from White (2000) http://www.cristiandima.com/white-s-reality-check-for-data-snooping-in-r/

  • bootstrap optimization as an option in the Leverage Space Trading Model (Vince 2009)

  • discussion in Aronson (2006, 230–40)
  • using resampled market data, with or without multi-asset dependence, to train or run the system
  • k-fold cross validation
  • combinatorially symmetric cross-validation (CSCV) and probability of backtest overfitting (PBO) from Bailey et al. (2017)

Simulation with round turn trades

  • resampling entries and exits
    • round turn size, direction, duration are sampled from the trades
    • also resample from any flat periods
    • applied in order to market data as new transactions at the then-prevalent price
  • trade expectations in the random-trade model, compared to backtest expectations
    • Drawdowns and tail risk, as in other simulation types

Dis/advantages of bootstrapping trades

Disadvantages:

  • much more complicated to model trade dynamics
  • maintaining constraints e.g. max position

Advantages:

  • can more closely compare strategy to random entries and exits with same overall dynamic
  • creates a distribution around the trading dynamics, not just the daily P&L
  • effectively creates simulated traders with the same style as strategy but no skill

  • best for modeling "skill vs. luck"

Outline of Trade Resampling Process

Extract Stylized Facts from the observed series:

  • duration, quantity, direction of round turns
  • total time in market
  • %-time long/short/flat
  • number of layers and maximum position

For each replicate:

  • sample, w/ or w/o replacement, from long/short/flat trades
  • construct the first/base layer for the replicate
  • first layer preserves long/short/flat ratios
  • add long/short layers on top of long/short periods

For the collections of start/qty/duration:

  • construct portfolios for all replicates
  • construct opening/closing transactions from start/qty/duration collections
  • apply those transactions to each replicate portfolio

Empirical Example

# nrtxsim <- txnsim( Portfolio = "macd"
#                  , n=250
#                  , replacement=FALSE
#                  , tradeDef = 'increased.to.reduced')

wrtxsim <- txnsim( Portfolio = "macd"
                 , n=250
                 , replacement=TRUE
                 , tradeDef = 'increased.to.reduced')

Comments:

  • without replacement samples identical number of trades, randomizing start date
  • with replacement samples number of trades to get correct total duration
  • entry and exit prices are set at the time of each trade, from current market

Empirical Example, With replacement

P&L Quantiles:

0%     25%   50%   75%    100%
-3041  208   914   1746   6283

Overfitting summary

  • the best way to prevent overfitting is via careful experiment design
  • if your signal process isn't predictive, you should never get to the point of constructing a full backtest
  • avoid free parameters where possible
  • use as much data, and as many instruments, as you can

Walk Forward

Walk Forward

  • walk forward analysis periodically reparameterizes the strategy
  • in production, this could be done daily, weekly, monthly, or quarterly
  • rolling window analysis is the most common; it assumes an evolving parameter space
  • anchored analysis assumes that residual information from earlier periods helps you make decisions now

Applying Walk Forward

  • apply parameter optimization via walk.forward
  • consider choice of objective

    • Sharpe ratio
    • minimize drawdown
    • profit to max draw
    • multiple objective optimization
    • the default choice, best in-sample performance, is generally a bad idea (see the sketch after this list)
  • be careful about performing walk forward analysis then making changes

  • more trials increases bias
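
walk.forward exposes obj.func and obj.args for exactly this; a hedged sketch, with argument names as in recent quantstrat versions, that selects the training-set winner by profit to max drawdown instead of the default net trading P&L:

# hypothetical custom objective: index of the best paramset by the chosen statistic
best.by <- function(x) { which(x == max(x, na.rm = TRUE)) }

# passed to walk.forward() below, e.g.:
#   obj.func = best.by,
#   obj.args = list(x = quote(tradeStats.list$Profit.To.Max.Draw))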

quantstrat::walk.forward

wfportfolio <- "wf.macd"
initPortf(wfportfolio,symbols=stock.str)
#> [1] "wf.macd"
initOrders(portfolio=wfportfolio)
wf_start <- Sys.time()
wfaresults <- walk.forward(strategy.st, 
                           paramset.label='MA', 
                           portfolio.st=wfportfolio, 
                           account.st=account.st, 
#                           nsamples=100,
                           period='months',
                           k.training = 48,
                           k.testing = 12,
                           verbose = FALSE,
                           anchored = FALSE,
                           audit.prefix = NULL,
                           savewf = FALSE,
                           include.insamples = TRUE,
                           psgc=TRUE
                          )
#> [1] "=== training MA on 2003-12-31/2007-10-31"
#> [1] "=== testing param.combo 533 on 2007-11-01/2008-10-31"
#>     nFAST nSLOW
#> 533    15    60
#> [1] "2008-04-29 00:00:00 EEM 100 @ 40.1341581132"
#> [1] "=== training MA on 2004-11-01/2008-10-31"
#> [1] "=== testing param.combo 325 on 2008-11-03/2009-10-30"
#>     nFAST nSLOW
#> 325    15    44
#> [1] "=== training MA on 2005-11-01/2009-10-30"
#> [1] "=== testing param.combo 195 on 2009-11-02/2010-10-29"
#>     nFAST nSLOW
#> 195    15    34
#> [1] "2010-02-08 00:00:00 EEM -100 @ 31.7488855213"
#> [1] "2010-03-19 00:00:00 EEM 100 @ 35.694013308"
#> [1] "2010-05-18 00:00:00 EEM -100 @ 33.8495379792"
#> [1] "=== training MA on 2006-11-01/2010-10-29"
#> [1] "=== testing param.combo 13 on 2010-11-01/2011-10-31"
#>    nFAST nSLOW
#> 13    15    20
#> [1] "2011-03-31 00:00:00 EEM 100 @ 42.0677276172"
#> [1] "2011-05-24 00:00:00 EEM -100 @ 40.3084056753"
#> [1] "2011-07-25 00:00:00 EEM 100 @ 41.4420713187"
#> [1] "=== training MA on 2007-11-01/2011-10-31"
#> [1] "=== testing param.combo 208 on 2011-11-01/2012-10-31"
#>     nFAST nSLOW
#> 208    15    35
#> [1] "2011-11-15 00:00:00 EEM 100 @ 35.1741112291"
#> [1] "2011-11-28 00:00:00 EEM -200 @ 33.0249712822"
#> [1] "2012-01-23 00:00:00 EEM 100 @ 36.7344639248"
#> [1] "2012-04-25 00:00:00 EEM -100 @ 37.0613458789"
#> [1] "=== training MA on 2008-11-03/2012-10-31"
#> [1] "=== testing param.combo 273 on 2012-11-01/2013-10-31"
#>     nFAST nSLOW
#> 273    15    40
#> [1] "2013-05-16 00:00:00 EEM 100 @ 39.0432867026"
#> [1] "2013-06-06 00:00:00 EEM -100 @ 36.4554031459"
#> [1] "=== training MA on 2009-11-02/2013-10-31"
#> [1] "=== testing param.combo 26 on 2013-11-01/2014-10-31"
#>    nFAST nSLOW
#> 26    15    21
#> [1] "2014-04-01 00:00:00 EEM 100 @ 38.0793609613"
#> [1] "=== training MA on 2010-11-01/2014-10-31"
#> [1] "=== testing param.combo 364 on 2014-11-03/2015-10-30"
#>     nFAST nSLOW
#> 364    15    47
#> [1] "2015-02-26 00:00:00 EEM 100 @ 38.364624064"
#> [1] "2015-03-17 00:00:00 EEM -200 @ 36.4750652703"
#> [1] "2015-04-09 00:00:00 EEM 100 @ 40.1225717974"
#> [1] "=== training MA on 2011-11-01/2015-10-30"
#> [1] "=== testing param.combo 312 on 2015-11-02/2016-10-31"
#>     nFAST nSLOW
#> 312    15    43
#> [1] "2015-11-03 00:00:00 EEM 100 @ 33.5843280749"
#> [1] "2015-11-23 00:00:00 EEM -200 @ 33.1202468353"
#> [1] "=== training MA on 2012-11-01/2016-10-31"
#> [1] "=== testing param.combo 299 on 2016-11-01/2017-10-31"
#>     nFAST nSLOW
#> 299    15    42
#> [1] "=== training MA on 2013-11-01/2017-10-31"
#> [1] "=== testing param.combo 13 on 2017-11-01/2018-05-30"
#>    nFAST nSLOW
#> 13    15    20
#> Warning in .updatePosPL(Portfolio = pname, Symbol = as.character(symbol), :
#> Could not parse //2018-05-30 as ISO8601 string, or one/bothends of the
#> range were outside the available prices: 2003-12-31/2018-05-30. Using all
#> data instead.
wf_end <-Sys.time()

Walk Forward Results

#> 
#>  Running the walk forward search: 
#> 
#> Time difference of 28.85891 mins
#>  Total trials: 6939
Trade statistics, portfolio wf.macd, symbol EEM:

  Num.Txns             19
  Num.Trades           8
  Net.Trading.PL       -3577.657
  Avg.Trade.PL         -447.2072
  Med.Trade.PL         -304.0869
  Largest.Winner       32.6882
  Largest.Loser        -1056.624
  Gross.Profits        32.6882
  Gross.Losses         -3610.345
  Std.Dev.Trade.PL     383.9985
  Std.Err.Trade.PL     135.764
  Percent.Positive     12.5
  Percent.Negative     87.5
  Profit.Factor        0.009054
  Avg.Win.Trade        32.6882
  Med.Win.Trade        32.6882
  Avg.Losing.Trade     -515.7636
  Med.Losing.Trade     -349.3854
  Avg.Daily.PL         -447.2072
  Med.Daily.PL         -304.0869
  Std.Dev.Daily.PL     383.9985
  Std.Err.Daily.PL     135.764
  Ann.Sharpe           -18.48756
  Max.Drawdown         -4200.986
  Profit.To.Max.Draw   -0.8516232
  Avg.WinLoss.Ratio    0.0633782
  Med.WinLoss.Ratio    0.0935591
  Max.Equity           249.2224
  Min.Equity           -3951.764
  End.Equity           -3577.657

Walk Forward Results (cont.)

chart.forward(wfaresults)

ADD WFA OOS STATS HERE

      testing.timespan        nFAST  nSLOW
533   2007-11-01/2008-10-31   15     60
325   2008-11-03/2009-10-30   15     44
195   2009-11-02/2010-10-29   15     34
13    2010-11-01/2011-10-31   15     20
208   2011-11-01/2012-10-31   15     35
273   2012-11-01/2013-10-31   15     40
26    2013-11-01/2014-10-31   15     21
364   2014-11-03/2015-10-30   15     47
312   2015-11-02/2016-10-31   15     43
299   2016-11-01/2017-10-31   15     42
131   2017-11-01/2018-05-30   15     20
  • testing period OOS ranks, w/ chosen paramset
  • OOS deterioration for best/chosen paramset

Risk of Ruin

Strong hypotheses guard against risk of ruin.

I hypothesize that this strategy idea will make money.

Specifying hypotheses at the beginning reduces the urge to modify them later and to:

  • adjust expectations while testing
  • revise the objectives
  • construct ad hoc hypotheses

Seek to answer what and why before going too far.

Future Work

  • more overfitting work
  • modular objective for walk forward and parameter optimization
  • optimizer for parameter optimization and walk forward objectives
  • more machine learning examples

we always have more work than time, so please talk to us if you want to work on these

Conclusion

  • backtesting is a process of trial and error (mostly error)
  • build a process that is grounded in stating and testing hypotheses
  • trials should be kept track of, and counted against your eventual success
  • multiple adjustments exist; examine:

    • as many as you have data for
    • the ones that make sense for your trade
    • methods that can help you answer questions
  • stay skeptical of your results

Thank You for Your Attention

 

Thanks to all the contributors to quantstrat and blotter, especially Ross Bennett, Peter Carl, Jasen Mackie, Joshua Ulrich, my team, and my family, who make it possible.

©2018 Brian G. Peterson brian@braverock.com

This work is licensed under a Creative Commons Attribution 4.0 International License

The rmarkdown (Xie 2014) source code for this document may be found on github

#> prepared using blotter: 0.14.2  and quantstrat: 0.14.3

All views expressed in this presentation are those of Brian Peterson, and do not necessarily reflect the opinions, policies, or practices of Brian's employers.

All remaining errors or omissions should be attributed to the author.

References

Aronson, David. 2006. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley.

Bailey, David H, and Marcos López de Prado. 2014. “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality.” Journal of Portfolio Management 40 (5): 94–107. doi:10.3905/jpm.2014.40.5.094.

Bailey, David H, Jonathan M Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2017. “The Probability of Backtest Overfitting.” Journal of Computational Finance 20 (4): 39–69. doi:10.21314/JCF.2016.322.

Feynman, Richard P, Robert B Leighton, Matthew Sands, and EM Hafner. 1965. The Feynman Lectures on Physics. Vols. 1-3.

Harvey, Campbell R., and Yan Liu. 2015. “Backtesting.” The Journal of Portfolio Management 41 (1): 13–28. http://ssrn.com/abstract=2345489.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer.

Kipnis, Ilya. 2017. “QuantstratTrader Blog.” July 16. https://quantstrattrader.wordpress.com/.

Mackie, Jasen. 2016. “Backtesting – Harvey & Liu (2015).” November 17. https://opensourcequant.wordpress.com/2016/11/17/r-view-backtesting-harvey-liu-2015/.

Pardo, Robert. 2008. The Evaluation and Optimization of Trading Strategies. Second ed. John Wiley & Sons.

Racine, Jeffrey S, and Christopher F Parmeter. 2012. “Data-Driven Model Evaluation: A Test for Revealed Performance.” McMaster University. http://socserv.mcmaster.ca/econ/rsrch/papers/archive/2012-13.pdf.

Tomasini, Emilio, and Urban Jaekle. 2009. Trading Systems: A New Approach to System Development and Portfolio Optimisation. Harriman House.

Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics. JSTOR, 1–67. http://projecteuclid.org/euclid.aoms/1177704711.

Vince, Ralph. 2009. The Leverage Space Trading Model: Reconciling Portfolio Management Strategies and Economic Theory. John Wiley & Sons.

White, Halbert L. 2000. “System and Method for Testing Prediction Models and/or Entities.” Google Patents. http://www.google.com/patents/US6088676.

Xie, Yihui. 2014. “R Markdown — Dynamic Documents for R.” http://rmarkdown.rstudio.com/.