compiled 2018-05-31

Introduction

Who is this Guy?

Proprietary/Principal Trading

Brian Peterson:

  • quant, author, open source advocate
  • author or co-author of over 10 packages for using R in Finance
  • organization admin for R's participation in Google Summer of Code
  • Lecturer, University of Washington Computational Finance and Risk Management
  • has managed quant trading teams at several Chicago proprietary trading firms over time …

Proprietary Trading:

  • proprietary or principal traders are a specific "member" structure with the exchanges
  • high barriers to entry, large up-front capital requirements
  • many strategies pursued in this structure have capacity constraints
  • benefits on the other side are low fees, potentially high leverage
  • money management needs to be very focused on drawdowns, leverage, volatility

Backtesting, art or science?

 

 

Back-testing. I hate it - it's just optimizing over history. You never see a bad back-test. Ever. In any strategy. - Josh Diedesch (2014), CalSTRS

 

 

Every trading system is in some form an optimization. - Tomasini and Jaekle (2009)

Moving Beyond Assumptions

Many system developers consider

"I hypothesize that this strategy idea will make money"

to be adequate.

Instead, strive to:

  • understand your business constraints and objectives
  • build a hypothesis for the system
  • build the system in pieces
  • test the system in pieces
  • measure how likely it is that you have overfit

Constraints and Objectives

Constraints

  • capital available
  • products you can trade
  • execution platform

Benchmarks

  • published or synthetic?
  • what are the limitations?
  • are you held to it, or just measured against it?

Objectives

  • formulate objectives for testability
  • make sure they reflect your real business goals

Building a Hypothesis

 

To create a testable idea (a hypothesis):

  • formulate a declarative conjecture
  • make sure the conjecture is predictive
  • define the expected outcome
  • describe means of verifying/testing

 

Good/complete hypothesis statements include:

  • what is being analyzed (the subject)
  • dependent variable(s) (the result/prediction)
  • independent variables (inputs to the model)
  • the anticipated possible outcomes, including direction or comparison
  • the means by which you will validate or refute each hypothesis

Tools

R in Finance trade simulation toolchain

Building Blocks

Filters

  • select the instruments to trade
  • categorize market characteristics that are favorable to the strategy

Indicators

  • values derived from market data
  • includes all common "technicals"

Signals

  • describe the interaction between filters, market data, and indicators
  • can be viewed as a prediction at a point in time

Rules

  • make path-dependent actionable decisions

Installing blotter and quantstrat

  • on Windows, you will need Rtools
install.packages('devtools') # if you don't have it installed
install.packages('PerformanceAnalytics')
install.packages('FinancialInstrument')

devtools::install_github('braverock/blotter')
devtools::install_github('braverock/quantstrat')

Our test strategy - MACD

library(quantstrat) # also loads blotter, FinancialInstrument, quantmod, and TTR

stock.str <- 'EEM'

currency('USD')
stock(stock.str,currency='USD',multiplier=1)

startDate='2003-12-31'
initEq=100000
portfolio.st='macd'
account.st='macd'

initPortf(portfolio.st,symbols=stock.str)
initAcct(account.st,portfolios=portfolio.st,initEq = initEq)
initOrders(portfolio=portfolio.st)

strategy.st<-portfolio.st
# define the strategy
strategy(strategy.st, store=TRUE)
## get data 
getSymbols(stock.str,
           from=startDate,
           adjust=TRUE,
           src='tiingo') # src='tiingo' requires an API key; see ?getSymbols
  • we'll use MACD as a simple trend follower for illustration
  • I am not advocating MACD as an investment strategy
  • we hypothesize that our MACD strategy will detect durable trends
  • we also hypothesize that it will get chopped up by sideways markets because of the lag in the moving average

 

  • don't pay too much attention to the code; this entire presentation is written in rmarkdown (Xie 2014), with references in BibTeX via JabRef, and has been compiled directly from the source for this presentation

Evaluating the Strategy

Test the System in Pieces

How to Screw Up Less

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey (1962) p. 13

 

Fail quickly, think deeply, or both?

 

No matter how beautiful your theory, no matter how clever you are or what your name is, if it disagrees with experiment, it’s wrong. - Richard P. Feynman (1965)

Add the indicator

#MA parameters for MACD
fastMA = 12 
slowMA = 26 
signalMA = 9
maType="EMA"

#one indicator
add.indicator(strategy.st, name = "MACD", 
                  arguments = list(x=quote(Cl(mktdata)),
                                   nFast=fastMA, 
                                   nSlow=slowMA),
                  label='_' 
)

 

 

 

MACD is a two moving average cross system that seeks to measure:

  • the momentum of the change
  • the divergence between the two moving averages

This is classical technical analysis, used for illustration only; it is not widely deployed in production.
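
For intuition about what the indicator computes, here is a minimal stand-alone sketch using TTR's MACD on the ttrc sample data that ships with TTR (not part of the strategy above):

library(TTR)

data(ttrc)   # sample OHLCV data shipped with TTR
m <- MACD(ttrc$Close, nFast = 12, nSlow = 26, nSig = 9, maType = "EMA")
# 'macd' is the (by default, percentage) difference between the fast and slow
# EMAs; 'signal' is an EMA of the macd line
tail(m)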

Measuring Indicators

What do you think you're measuring? A good indicator measures something in the market:

  • a theoretical "fair value" price, or
  • the impact of a factor on that price, or
  • turning points, direction, or slope of the series

Make sure the indicator is testable:

  • hypothesis and tests for the indicator
    • standard errors
    • t-tests or p-values
  • goodness of fit
  • custom 'perfect foresight' model built from a periodogram or signature plot

If your indicator doesn't have testable information content, throw it out and start over.

Specifying tests for Indicators

  • facts to support or refute

  • general tests
    • MAFE/MSFE (mean absolute/squared forecast error)
    • confusion matrix
    • standard errors
    • Monte Carlo errors
  • specific tests
    • tests related to the model
    • tests of the prediction
    • tests of the relationships between variables

General Diagnostics for Indicators

  • Euclidean distance
    • often squared
    • or de-meaned
    • or lagged to line things up
  • clustering models
  • piece-wise linear decomposition
  • mean squared forecast error
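
As a concrete illustration of a couple of these diagnostics, a minimal sketch on synthetic data; every name here is hypothetical and stands in for your own indicator and target:

set.seed(42)
px  <- 100 + cumsum(rnorm(500, 0.02, 1))    # synthetic price path
fwd <- c(diff(px, lag = 5), rep(NA, 5))     # 5-period forward change (the target)
ind <- TTR::EMA(px, 10) - TTR::EMA(px, 30)  # a simple trend indicator

ok   <- complete.cases(ind, fwd)
msfe <- mean((ind[ok] - fwd[ok])^2)                     # mean squared forecast error
dist <- sqrt(sum((scale(ind[ok]) - scale(fwd[ok]))^2))  # de-meaned Euclidean distance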

Add the Signals

#two signals
add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="gt",
                            threshold=0,
                            cross=TRUE),
           label="signal.gt.zero"
)
   
add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="lt",
                            threshold=0,
                            cross=TRUE),
           label="signal.lt.zero"
)

Combining Signals

Signals are often combined:

"A & B" should both be true.

This is a composite signal, and serves to reduce the dimensionality of the decision space.

A lower-dimensional space is easier to measure, but is at higher risk of overfitting.

Avoid overfitting while combining signals by making sure that your process has a strong economic or theoretical basis before writing code or running tests.
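
For illustration, quantstrat can express a composite signal directly via sigFormula; a sketch, where trend.up is a hypothetical filter column assumed to already exist in mktdata:

add.signal(strategy.st,
           name = "sigFormula",
           arguments = list(formula = "signal.gt.zero == 1 & trend.up == 1",
                            cross = TRUE),
           label = "composite.entry")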

Measuring Signals

Signals make predictions so all the literature on forecasting is applicable:

  • mean squared forecast error, BIC, etc.
  • box plots or additive models for forward expectations
  • "revealed performance" approach of Racine and Parmeter (2012)
  • re-evaluate assumptions about the method of action of the strategy
  • detect information bias or luck before moving on
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_', 
                 variable = list(n = fastMA),
                 label = 'nFAST'
)
#> [1] "macd"

add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_', 
                 variable = list(n = slowMA),
                 label = 'nSLOW'
)
#> [1] "macd"

Run Signal Analysis Study

sa_buy <- apply.paramset.signal.analysis(
            strategy.st, 
            paramset.label='signal_analysis', 
            portfolio.st=portfolio.st, 
            sigcol = 'signal.gt.zero',
            sigval = 1,
            on=NULL,
            forward.days=50,
            cum.sum=TRUE,
            include.day.of.signal=FALSE,
            obj.fun=signal.obj.slope,
            decreasing=TRUE,
            verbose=TRUE)
#> Applying Parameter Set:  12, 26
sa_sell <- apply.paramset.signal.analysis(
             strategy.st, 
             paramset.label='signal_analysis', 
             portfolio.st=portfolio.st, 
             sigcol = 'signal.lt.zero',
             sigval = 1,
             on=NULL,
             forward.days=10,
             cum.sum=TRUE,
             include.day.of.signal=FALSE,
             obj.fun=signal.obj.slope,
             decreasing=TRUE,
             verbose=TRUE)
#> Applying Parameter Set:  12, 26

Look at Buy Signal

Look at Buy Signal (cont.)

Look at Buy Signal (cont.)

Look at Sell Signal

Add the Rules

# entry
add.rule(strategy.st,
         name='ruleSignal', 
         arguments = list(sigcol="signal.gt.zero",
                          sigval=TRUE, 
                          orderqty=100, 
                          ordertype='market', 
                          orderside='long', 
                          threshold=NULL),
         type='enter',
         label='enter',
         storefun=FALSE
)
# exit
add.rule(strategy.st,name='ruleSignal', 
         arguments = list(sigcol="signal.lt.zero",
                          sigval=TRUE, 
                          orderqty='all', 
                          ordertype='market', 
                          orderside='long', 
                          threshold=NULL,
                          orderset='exit2'),
         type='exit',
         label='exit'
)

Measuring Rules

If your signal process doesn't have predictive power, stop now.

 

  • rules should refine the way the strategy 'listens' to signals
  • entries may be passive or aggressive, or may layer or pyramid into a position
  • exits may have their own signal process, or may be derived empirically
  • risk rules should be added near the end, for empirical 'stops' or to meet business constraints

Beware of Rule Burden:

  • having too many rules is an invitation to overfitting
  • adding rules after being disappointed in backtest results is almost certainly an exercise in overfitting (data snooping)
  • strategies with fewer rules are more likely to be robust out of sample

Run the Strategy

start_t<-Sys.time()
out<-applyStrategy(strategy.st , 
                   portfolios=portfolio.st,
                   parameters=list(nFast=fastMA, 
                                   nSlow=slowMA,
                                   nSig=signalMA,
                                   maType=maType),
                   verbose=TRUE)
end_t<-Sys.time()

start_pt<-Sys.time()
updatePortf(Portfolio=portfolio.st)
end_pt<-Sys.time()
#> [1] "Running the backtest (applyStrategy):"
#> Time difference of 0.7516775 secs
#> [1] "trade blotter portfolio update (updatePortf):"
#> Time difference of 0.08920383 secs

Initial Results

chart.Posn(Portfolio=portfolio.st,Symbol=stock.str)
plot(add_MACD(fast=fastMA, slow=slowMA, signal=signalMA,maType="EMA"))

Parameter Optimization

Parameter Optimization

Every trading system is in some form an optimization. - Tomasini and Jaekle (2009)

 

  • all strategies have parameters: What are the right ones?
  • locate a parameter combination that most closely matches the hypotheses and objectives
  • look for stable regions of both in and out of sample performance
  • even your initial parameter choices are an optimization; you've chosen values you believe may be optimal
  • parameter optimization just adds process and tools to your investment decision

What are good parameters?

  • parameters are all the non-data inputs to your strategy
  • they are the knobs and levers that control the model
  • good parameters are parsimonious

    • not too many of them
    • not too loosely defined
    • each has a clear impact on the strategy performance
    • each is testable in a backtest
  • production strategies have additional parameters that are specific to the production environment

Limiting the number of parameters

  • focus on the major drivers defined by your hypotheses
  • if necessary, refine the hypotheses
  • use ROC or effective parameter testing (Hastie, Tibshirani, and Friedman 2009) to identify the major drivers in a small training and testing set
  • limiting free parameters

    • limits the opportunities to cherry pick lucky combinations
    • reduces the required adjustment for data mining bias (Bailey and López de Prado 2014; Harvey and Liu 2015)
    • reduces number of trials (resulting in faster parameter searches)

Too Many Free Parameters

  • increases your chance of overfitting
  • larger numbers of free parameters vastly increase the amount of data you need
  • more parameters lower your degrees of freedom, and

    • increases the chance that your chosen parameters are a false discovery
  • goal should be to eliminate free parameters before running parameter optimization

Moving from Free to Non-Free Parameters

  • eliminate parameters where possible
  • fix/freeze tertiary parameters next
  • can you use production data to set less important parameters empirically?
  • is there a model fitting method you could use instead? ( e.g. regression, maximum likelihood, Bayes)
  • does this still count as a free parameter?

    • it depends on the complexity of the model/approach used to fit
    • does your fitting model have parameters too?
    • what assumptions are you making?

Robust Parameters

  • small parameter changes lead to small changes in P&L and objective expectations
  • out of sample deterioration is not large, on average
  • parameter choices have a sound theoretical or economic basis
  • parameter variation should produce correlated differences in multiple objectives

quantstrat Parameters

  • quantstrat parameters are added via add.distribution

    • they are identified with a parameter set (paramset) label via the paramset.label argument
    • the component.type argument tells quantstrat where to look for a variable (indicators, signals, rules, etc.)
    • the component.label argument must match the label used for the component earlier in the strategy
    • the variable argument tells quantstrat what to act on
  • relationships (constraints) between parameters are set via add.distribution.constraint

quantstrat::add.distribution()

.FastMA = (3:15)
.SlowMA = (20:60)
# .nsamples = 200 
#for random parameter sampling, 
# less important if you're using doParallel or doMC

### MA paramset
add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .FastMA),
                 label = 'nFAST'
)

add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .SlowMA),
                 label = 'nSLOW'
)

add.distribution.constraint(strategy.st,
                            paramset.label = 'MA',
                            distribution.label.1 = 'nFAST',
                            distribution.label.2 = 'nSLOW',
                            operator = '<',
                            label = 'MA'
)

quantstrat::apply.paramset()

  • creates param.combos from distributions and constraints
  • runs applyStrategy() on portfolio for each param. combo
  • nsamples: draws random selection
  • audit: file name to store all portfolios and order books
  • this takes over an hour on my laptop, and less than two minutes on my workstation
.paramaudit <- new.env()
ps_start <- Sys.time()
paramset.results  <- apply.paramset(strategy.st, 
                           paramset.label='MA', 
                           portfolio.st=portfolio.st, 
                           account.st=account.st, 
#                          nsamples=.nsamples,
                           audit=.paramaudit,
                           store=TRUE,
                           verbose=FALSE)
ps_end   <- Sys.time()

paramset results

Running the parameter search (apply.paramset):
Time difference of 1.244973 mins
Total trials: 543

plot(paramset.results$cumPL[-1,], major.ticks = 'years', grid.ticks.on = 'years')

Search Process

  • by default, quantstrat will use a brute force search by combining all parameters
  • since the full brute force search space may be too large, you can use the nsamples argument to apply.paramset to sample from the whole parameter space
  • more sophisticated sampling may be done using an optimizer (on my todo list to generalize)

Parameter Regions

  • Tomasini and Jaekle (2009) and Pardo (2008) both recommend finding stable parameter regions
  • ideally, those regions will be "obvious" from first principles
  • you can easily use plot to compare one parameter and one output, less easily 2-3 parameters
  • for more complex parameter surfaces, 3-d surfaces or heat maps (e.g. via contourplot) are better
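
As an example of the heat map approach, a sketch over the paramset.results object created above, assuming (as in recent quantstrat versions) that the tradeStats component carries the nFAST/nSLOW columns alongside Profit.To.Max.Draw:

library(lattice)

ts <- paramset.results$tradeStats
levelplot(Profit.To.Max.Draw ~ nFAST * nSLOW, data = ts,
          xlab = 'nFAST', ylab = 'nSLOW',
          main = 'Profit to max drawdown over the parameter surface')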

Parameter distributions - Profit to Max Drawdown

Overfitting

Things to Watch Out For, or, Types of Overfitting

Look Ahead Bias

  • directly using knowledge of future events

Data Mining Bias

  • caused by testing multiple configurations and parameters over multiple runs, with adjustments between backtest runs
  • exhaustive searches may or may not introduce biases

Data Snooping

  • knowledge of the data set can contaminate your choices
  • making changes after failures without having strong experimental design
NOTE: We just did all three of these things by optimizing over the entire series

Degrees of Freedom

Pardo (2008, 130–31) describes the degrees of freedom of a strategy as:

  • the number of observations
  • minus all observations used by your indicators

In parameter optimization, we should consider the sum of observations used by all different parameter combinations.

The goal should be to retain 95% or more degrees of freedom, or 'free observations', even after the parameter search.

Applying Degrees of Freedom calculation

degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='trial')
#> 
#>  Degrees of freedom report for strategy: macd 
#>  Total market observations: 3628 
#>  Degrees of freedom consumed by strategy: 1248 
#>  Total degrees of freedom remaining: 2380 
#>  % Degrees of Freedom:  65.6 %
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='sum')
#> 
#>  Degrees of freedom report for strategy: macd 
#>  Total market observations: 3628 
#>  Degrees of freedom consumed by strategy: 26204 
#>  Total degrees of freedom remaining: -22576 
#>  % Degrees of Freedom:  -622.27 %
  • there is a lot more information in the returned object, see ?dof for more details

Implications for Torture and Training sets

  • to increase degrees of freedom, you may:

    • increase the length of market data consumed by the backtest
    • increase the number of symbols examined
    • decrease the number of free parameters
    • decrease the ranges of the parameters examined
  • torture and training data sets should be large enough to still have reasonable statistical confidence when you move to walk forward
  • this underlines the difficulty of doing extensive parameter searches on low frequency data

Multiple testing bias

Investment theory, not computational power, should motivate what experiments are worth conducting. (Bailey and López de Prado 2014, 10)

 

  • we do want to test multiple hypotheses
  • as we perform more tests, the likelihood of "discovering" an apparently significant result which is actually random increases
  • as we perform more tests, this likelihood approaches 100%
  • we need to be prepared to detect and correct for this multiple testing bias
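
The arithmetic behind that last point, for independent tests at a 5% significance level (543 matches the trial count of our parameter search; correlated trials, like ours, need the corrections discussed next):

alpha  <- 0.05
trials <- c(1, 10, 100, 543)
round(1 - (1 - alpha)^trials, 3)   # P(at least one spurious 'discovery')
#> [1] 0.050 0.401 0.994 1.000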

Deflated Sharpe

Bailey and López de Prado (2014) describe a way of adjusting the observed Sharpe Ratio of a candidate strategy by taking the variance of the trials, and the skewness and kurtosis of the returns, into account.

  • corrects for selection bias from multiple testing on the same or related data (the parameter optimization)
  • adjusts for non-normality of observed returns
  • establishes theoretical maximum Sharpe for a series of related trials

  • we have implemented a version with Kipnis (2017) as SharpeRatio.deflated

Applying Deflated Sharpe Ratio

dsr <- SharpeRatio.deflated(portfolios='macd',strategy='macd', audit=.paramaudit)
  obs.Sharpe  max.Sharpe  deflated.Sharpe  p.value   nTrials
  0.57        3.077243    0.5177464        0.091673  543
  • small numbers of trials will result in a small adjustment
  • maximum computed Sharpe ratio relies on correlation of trials and moments of the distribution
  • p-value is the assumed significance of the adjusted backtest, so low p-value suggests a low probability of overfitting

Haircut Sharpe Ratio

  • Harvey and Liu (2015) propose another method of correcting for the multiple testing bias
  • they describe three ways of adjusting the Sharpe Ratio for multiple related tests
  • this research was mostly aimed at the proliferation of equity 'risk premia', but it is applicable to backtests where we have all the information as well
  • Jasen Mackie (2016) implemented these methods in R, and
  • we have ported those methods to quantstrat as SharpeRatio.haircut

Applying the Sharpe Ratio Haircut

hsr <- SharpeRatio.haircut(portfolios='macd',strategy='macd',audit=.paramaudit)
#> Warning in log(x): NaNs produced
#> 
#>  Sharpe Ratio Haircut Report: 
#> 
#>  Observed Sharpe Ratio: 0.57 
#>  Sharpe Ratio corrected for autocorrelation: 0.57 
#> 
#>  Bonferroni Adjustment: 
#>  Adjusted P-value = 1 
#>  Haircut Sharpe Ratio = 0 
#>  Percentage Haircut = 1 
#> 
#>  Holm Adjustment: 
#>  Adjusted P-value = 1 
#>  Haircut Sharpe Ratio = 0 
#>  Percentage Haircut = 1 
#> 
#>  BHY Adjustment: 
#>  Adjusted P-value = 0.9981303 
#>  Haircut Sharpe Ratio = 0.0007425807 
#>  Percentage Haircut = 0.9986972 
#> 
#>  Average Adjustment: 
#>  Adjusted P-value = 0.9993768 
#>  Haircut Sharpe Ratio = 0.0002475267 
#>  Percentage Haircut = 0.9995657

Monte Carlo and the bootstrap

Sampling from limited information

  • estimate the 'true' properties of a distribution from incomplete information
  • evaluate the likelihood (test the hypothesis) that a particular result is
    • not the result of chance
    • not overfit
  • understand confidence intervals for other descriptive statistics on the backtest
  • simulate different paths that the results might have taken, if the ordering had been different
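
For intuition, a tiny generic bootstrap on a synthetic stand-in for daily P&L; the mcsim and txnsim tools below do this properly against the actual portfolio:

set.seed(7)
pnl  <- rnorm(1000, mean = 0.01, sd = 1)   # synthetic stand-in for daily P&L
boot <- replicate(1000, mean(sample(pnl, replace = TRUE)))
quantile(boot, c(0.025, 0.975))            # 95% confidence interval for mean daily P&L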

History of Monte Carlo and bootstrap simulation

  • Laplace was the first to describe the mathematical properties of sampling from a distribution
  • Mahalanobis extended this work in the 1930s to describe sampling from dependent distributions, and anticipated the block bootstrap by examining these dependencies
  • Monte Carlo simulation was developed by Stan Ulam and John von Neumann (with computation by Françoise Ulam) as part of the hydrogen bomb program in 1946 (Richard Rhodes, Dark Sun, p. 304)
  • computational implementation of Monte Carlo simulation was constructed by Nicholas Metropolis on the ENIAC and MANIAC machines
  • Metropolis was an author in 1953 of the prior distribution sampler extended by W. K. Hastings to the modern Metropolis-Hastings form in 1970
  • Maurice Quenouille and John Tukey created 'jackknife' simulation in the 1950s
  • Bradley Efron described the modern bootstrap in 1979

Simulation from the equity curve using daily P&L

Sampling Without replacement:

  • results have the same mean and final P&L; this allows inference on likely error bounds

Sampling With Replacement:

  • provides multiple paths
  • block sampling, with replacement:
    • mimics some of the autocorrelation structure of returns
    • may create deeper drawdowns if down streaks are effectively repeated
  • choosing block size
    • a fraction of the average holding period: 1/5 or 1/4 of the holding period is a good first guess (see the sketch after this list)
    • block size equal to observed significant autocorrelation
    • variable distribution of block size centered around one of the above, with tails
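
A sketch of the first rule of thumb above, deriving a starting block size from blotter's per-trade statistics; this assumes the backtest above has been run, and that perTradeStats returns its duration column as a difftime:

pts      <- perTradeStats(Portfolio = "macd", Symbol = stock.str)
avg.hold <- as.numeric(mean(pts$duration), units = "days")  # average holding period in days
l.guess  <- max(1, round(avg.hold / 5))   # 1/5 of the average holding period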

Disadvantages of Sampling from portfolio P&L:

  • not transparent
  • potentially unrealistic
  • really only a statistical confidence model
  • path won't line up with historical market regimes

Empirical Example, with replacement

rsim <- mcsim(  Portfolio = "macd"
               , Account = "macd"
               , n=1000
               , replacement=TRUE
               , l=1, gap=10)
rblocksim <-  mcsim(  Portfolio = "macd"
               , Account = "macd"
               , n=1000
               , replacement=TRUE
               , l=10, gap=10)

P&L Quantiles:

            0%      25%     50%    75%    100%
rsim       -0.04    0.0039  0.014  0.027  0.095
rblocksim  -0.04    0.0038  0.014  0.027  0.091

Empirical Example, With replacement, cont.

INSERT CSCV/PBO HERE

other bootstrapping methods

  • White’s Data Mining Reality Check from White (2000) http://www.cristiandima.com/white-s-reality-check-for-data-snooping-in-r/

  • bootstrap optimization as an option in the Leverage Space Trading Model (Vince 2009)

  • discussion in Aronson (2006, 230–40)
  • using resampled market data, with or without multi-asset dependence, to train or run the system
  • k-fold cross validation
  • combinatorially symmetric cross-validation (CSCV) and probability of backtest overfitting (PBO) from Bailey et al. (2017)

Simulation with round turn trades

  • resampling entries and exits
    • round turn size, direction, duration are sampled from the trades
    • also resample from any flat periods
    • applied in order to market data as new transactions at the then-prevalent price
  • trade expectations in the random-trade model, compared to backtest expectations
    • Drawdowns and tail risk, as in other simulation types

Dis/advantages of bootstrapping trades

Disadvantages:

  • much more complicated to model trade dynamics
  • maintaining constraints e.g. max position

Advantages:

  • can more closely compare strategy to random entries and exits with same overall dynamic
  • creates a distribution around the trading dynamics, not just the daily P&L
  • effectively creates simulated traders with the same style as strategy but no skill

  • best for modeling "skill vs. luck"

Outline of Trade Resampling Process

Extract Stylized Facts from the observed series:

  • duration, quantity, direction of round turns
  • total time in market
  • %-time long/short/flat
  • number of layers and maximum position

For each replicate:

  • sample, w/ or w/o replacement, from long/short/flat trades
  • construct the first/base layer for the replicate
  • first layer preserves long/short/flat ratios
  • add long/short layers on top of long/short periods

For the collections of start/qty/duration:

  • construct portfolios for all replicates
  • construct opening/closing transactions from start/qty/duration collections
  • apply those transactions to each replicate portfolio

Empirical Example

# nrtxsim <- txnsim( Portfolio = "macd"
#                  , n=250
#                  , replacement=FALSE
#                  , tradeDef = 'increased.to.reduced')

wrtxsim <- txnsim( Portfolio = "macd"
                 , n=250
                 , replacement=TRUE
                 , tradeDef = 'increased.to.reduced')

Comments:

  • without replacement samples identical number of trades, randomizing start date
  • with replacement samples number of trades to get correct total duration
  • entry and exit prices are set at the time of each trade, from current market

Empirical Example, With replacement

P&L Quantiles:

0%     25%   50%   75%    100%
-3041  208   914   1746   6283

Overfitting summary

  • the best way to prevent overfitting is via careful experiment design
  • if your signal process isn't predictive, you should never get to the point of constructing a full backtest
  • avoid free parameters where possible
  • use as much data, and as many instruments, as you can

Walk Forward

Walk Forward

  • walk forward analysis periodically reparameterizes the strategy
  • in production, this could be done daily, weekly, monthly, or quarterly
  • rolling window analysis is the most common; it assumes an evolving parameter space
  • anchored analysis assumes that residual information from earlier periods helps you make decisions now

Applying Walk Forward

  • apply parameter optimization via walk.forward
  • consider choice of objective

    • Sharpe ratio
    • minimize drawdown
    • profit to max draw
    • multiple objective optimization
    • the default choice, best in-sample performance, is generally a bad idea (see the sketch after this list)
  • be careful about performing walk forward analysis then making changes

  • more trials increases bias
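
walk.forward exposes obj.func and obj.args for exactly this; a hedged sketch, with argument names as in recent quantstrat versions, that selects the training-set winner by profit to max drawdown instead of the default net trading P&L:

# hypothetical custom objective: index of the best paramset by the chosen statistic
best.by <- function(x) { which(x == max(x, na.rm = TRUE)) }

# passed to walk.forward() below, e.g.:
#   obj.func = best.by,
#   obj.args = list(x = quote(tradeStats.list$Profit.To.Max.Draw))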

quantstrat::walk.forward

wfportfolio <- "wf.macd"
initPortf(wfportfolio,symbols=stock.str)
#> [1] "wf.macd"
initOrders(portfolio=wfportfolio)
wf_start <- Sys.time()
wfaresults <- walk.forward(strategy.st, 
                           paramset.label='MA', 
                           portfolio.st=wfportfolio, 
                           account.st=account.st, 
#                           nsamples=100,
                           period='months',
                           k.training = 48,
                           k.testing = 12,
                           verbose = FALSE,
                           anchored = FALSE,
                           audit.prefix = NULL,
                           savewf = FALSE,
                           include.insamples = TRUE,
                           psgc=TRUE
                          )
#> [1] "=== training MA on 2003-12-31/2007-10-31"
#> [1] "=== testing param.combo 533 on 2007-11-01/2008-10-31"
#>     nFAST nSLOW
#> 533    15    60
#> [1] "2008-04-29 00:00:00 EEM 100 @ 40.1341581132"
#> [1] "=== training MA on 2004-11-01/2008-10-31"
#> [1] "=== testing param.combo 325 on 2008-11-03/2009-10-30"
#>     nFAST nSLOW
#> 325    15    44
#> [1] "=== training MA on 2005-11-01/2009-10-30"
#> [1] "=== testing param.combo 195 on 2009-11-02/2010-10-29"
#>     nFAST nSLOW
#> 195    15    34
#> [1] "2010-02-08 00:00:00 EEM -100 @ 31.7488855213"
#> [1] "2010-03-19 00:00:00 EEM 100 @ 35.694013308"
#> [1] "2010-05-18 00:00:00 EEM -100 @ 33.8495379792"
#> [1] "=== training MA on 2006-11-01/2010-10-29"
#> [1] "=== testing param.combo 13 on 2010-11-01/2011-10-31"
#>    nFAST nSLOW
#> 13    15    20
#> [1] "2011-03-31 00:00:00 EEM 100 @ 42.0677276172"
#> [1] "2011-05-24 00:00:00 EEM -100 @ 40.3084056753"
#> [1] "2011-07-25 00:00:00 EEM 100 @ 41.4420713187"
#> [1] "=== training MA on 2007-11-01/2011-10-31"
#> [1] "=== testing param.combo 208 on 2011-11-01/2012-10-31"
#>     nFAST nSLOW
#> 208    15    35
#> [1] "2011-11-15 00:00:00 EEM 100 @ 35.1741112291"
#> [1] "2011-11-28 00:00:00 EEM -200 @ 33.0249712822"
#> [1] "2012-01-23 00:00:00 EEM 100 @ 36.7344639248"
#> [1] "2012-04-25 00:00:00 EEM -100 @ 37.0613458789"
#> [1] "=== training MA on 2008-11-03/2012-10-31"
#> [1] "=== testing param.combo 273 on 2012-11-01/2013-10-31"
#>     nFAST nSLOW
#> 273    15    40
#> [1] "2013-05-16 00:00:00 EEM 100 @ 39.0432867026"
#> [1] "2013-06-06 00:00:00 EEM -100 @ 36.4554031459"
#> [1] "=== training MA on 2009-11-02/2013-10-31"
#> [1] "=== testing param.combo 26 on 2013-11-01/2014-10-31"
#>    nFAST nSLOW
#> 26    15    21
#> [1] "2014-04-01 00:00:00 EEM 100 @ 38.0793609613"
#> [1] "=== training MA on 2010-11-01/2014-10-31"
#> [1] "=== testing param.combo 364 on 2014-11-03/2015-10-30"
#>     nFAST nSLOW
#> 364    15    47
#> [1] "2015-02-26 00:00:00 EEM 100 @ 38.364624064"
#> [1] "2015-03-17 00:00:00 EEM -200 @ 36.4750652703"
#> [1] "2015-04-09 00:00:00 EEM 100 @ 40.1225717974"
#> [1] "=== training MA on 2011-11-01/2015-10-30"
#> [1] "=== testing param.combo 312 on 2015-11-02/2016-10-31"
#>     nFAST nSLOW
#> 312    15    43
#> [1] "2015-11-03 00:00:00 EEM 100 @ 33.5843280749"
#> [1] "2015-11-23 00:00:00 EEM -200 @ 33.1202468353"
#> [1] "=== training MA on 2012-11-01/2016-10-31"
#> [1] "=== testing param.combo 299 on 2016-11-01/2017-10-31"
#>     nFAST nSLOW
#> 299    15    42
#> [1] "=== training MA on 2013-11-01/2017-10-31"
#> [1] "=== testing param.combo 13 on 2017-11-01/2018-05-30"
#>    nFAST nSLOW
#> 13    15    20
#> Warning in .updatePosPL(Portfolio = pname, Symbol = as.character(symbol), :
#> Could not parse //2018-05-30 as ISO8601 string, or one/bothends of the
#> range were outside the available prices: 2003-12-31/2018-05-30. Using all
#> data instead.
wf_end <-Sys.time()

Walk Forward Results

#> 
#>  Running the walk forward search: 
#> 
#> Time difference of 28.85891 mins
#>  Total trials: 6939
Trade statistics, portfolio wf.macd, symbol EEM:

  Num.Txns             19
  Num.Trades           8
  Net.Trading.PL       -3577.657
  Avg.Trade.PL         -447.2072
  Med.Trade.PL         -304.0869
  Largest.Winner       32.6882
  Largest.Loser        -1056.624
  Gross.Profits        32.6882
  Gross.Losses         -3610.345
  Std.Dev.Trade.PL     383.9985
  Std.Err.Trade.PL     135.764
  Percent.Positive     12.5
  Percent.Negative     87.5
  Profit.Factor        0.009054
  Avg.Win.Trade        32.6882
  Med.Win.Trade        32.6882
  Avg.Losing.Trade     -515.7636
  Med.Losing.Trade     -349.3854
  Avg.Daily.PL         -447.2072
  Med.Daily.PL         -304.0869
  Std.Dev.Daily.PL     383.9985
  Std.Err.Daily.PL     135.764
  Ann.Sharpe           -18.48756
  Max.Drawdown         -4200.986
  Profit.To.Max.Draw   -0.8516232
  Avg.WinLoss.Ratio    0.0633782
  Med.WinLoss.Ratio    0.0935591
  Max.Equity           249.2224
  Min.Equity           -3951.764
  End.Equity           -3577.657

Walk Forward Results (cont.)

chart.forward(wfaresults)

ADD WFA OOS STATS HERE

      testing.timespan        nFAST  nSLOW
533   2007-11-01/2008-10-31   15     60
325   2008-11-03/2009-10-30   15     44
195   2009-11-02/2010-10-29   15     34
13    2010-11-01/2011-10-31   15     20
208   2011-11-01/2012-10-31   15     35
273   2012-11-01/2013-10-31   15     40
26    2013-11-01/2014-10-31   15     21
364   2014-11-03/2015-10-30   15     47
312   2015-11-02/2016-10-31   15     43
299   2016-11-01/2017-10-31   15     42
131   2017-11-01/2018-05-30   15     20
  • testing period OOS ranks, w/ chosen paramset
  • OOS deterioration for best/chosen paramset

Risk of Ruin

Strong hypotheses guard against risk of ruin.

I hypothesize that this strategy idea will make money.

Specifying hypotheses at the beginning reduces the urge to modify them later and to:

  • adjust expectations while testing
  • revise the objectives
  • construct ad hoc hypotheses

Seek to answer what and why before going too far.

Future Work

  • more overfitting work
  • modular objective for walk forward and parameter optimization
  • optimizer for parameter optimization and walk forward objectives
  • more machine learning examples

we always have more work than time, so please talk to us if you want to work on these

Conclusion

  • backtesting is a process of trial and error (mostly error)
  • build a process that is grounded in stating and testing hypotheses
  • trials should be kept track of, and counted against your eventual success
  • multiple adjustments exist; examine:

    • as many as you have data for
    • the ones that make sense for your trade
    • methods that can help you answer questions
  • stay skeptical of your results

Thank You for Your Attention

 

Thanks to all the contributors to quantstrat and blotter, especially Ross Bennett, Peter Carl, Jasen Mackie, Joshua Ulrich, my team, and my family, who make it possible.

©2018 Brian G. Peterson brian@braverock.com

This work is licensed under a Creative Commons Attribution 4.0 International License

The rmarkdown (Xie 2014) source code for this document may be found on github

#> prepared using blotter: 0.14.2  and quantstrat: 0.14.3

All views expressed in this presentation are those of Brian Peterson, and do not necessarily reflect the opinions, policies, or practices of Brian's employers.

All remaining errors or omissions should be attributed to the author.

References

Aronson, David. 2006. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley.

Bailey, David H, and Marcos López de Prado. 2014. “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality.” Journal of Portfolio Management 40 (5): 94–107. doi:10.3905/jpm.2014.40.5.094.

Bailey, David H, Jonathan M Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2017. “The Probability of Backtest Overfitting.” Journal of Computational Finance 20 (4): 39–69. doi:10.21314/JCF.2016.322.

Feynman, Richard P, Robert B Leighton, Matthew Sands, and EM Hafner. 1965. The Feynman Lectures on Physics. Vols. 1-3.

Harvey, Campbell R., and Yan Liu. 2015. “Backtesting.” The Journal of Portfolio Management 41 (1): 13–28. http://ssrn.com/abstract=2345489.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer.

Kipnis, Ilya. 2017. “QuantstratTrader Blog.” July 16. https://quantstrattrader.wordpress.com/.

Mackie, Jasen. 2016. “Backtesting – Harvey & Liu (2015).” November 17. https://opensourcequant.wordpress.com/2016/11/17/r-view-backtesting-harvey-liu-2015/.

Pardo, Robert. 2008. The Evaluation and Optimization of Trading Strategies. Second ed. John Wiley & Sons.

Racine, Jeffrey S, and Christopher F Parmeter. 2012. “Data-Driven Model Evaluation: A Test for Revealed Performance.” McMaster University. http://socserv.mcmaster.ca/econ/rsrch/papers/archive/2012-13.pdf.

Tomasini, Emilio, and Urban Jaekle. 2009. Trading Systems: A New Approach to System Development and Portfolio Optimisation. Harriman House.

Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics. JSTOR, 1–67. http://projecteuclid.org/euclid.aoms/1177704711.

Vince, Ralph. 2009. The Leverage Space Trading Model: Reconciling Portfolio Management Strategies and Economic Theory. John Wiley & Sons.

White, Halbert L. 2000. “System and Method for Testing Prediction Models and/or Entities.” Google Patents. http://www.google.com/patents/US6088676.

Xie, Yihui. 2014. “R Markdown — Dynamic Documents for R.” http://rmarkdown.rstudio.com/.