compiled 2018-05-31
Brian Peterson, Proprietary Trading
Back-testing. I hate it - it's just optimizing over history. You never see a bad back-test. Ever. In any strategy. - Josh Diedesch (2014), CalSTRS
Every trading system is in some form an optimization. - Emilio Tomasini (2009)
Many system developers consider
"I hypothesize that this strategy idea will make money"
to be an adequate hypothesis.
Instead, strive to specify, before testing:
Constraints
Benchmarks
Objectives
To create a testable idea (a hypothesis):
good/complete Hypothesis Statements include:
Filters
Indicators
Signals
Rules
install.packages('devtools') # if you don't have it installed
install.packages('PerformanceAnalytics')
install.packages('FinancialInstrument')
devtools::install_github('braverock/blotter')
devtools::install_github('braverock/quantstrat')
stock.str <- 'EEM'
currency('USD')
stock(stock.str, currency='USD', multiplier=1)
startDate = '2003-12-31'
initEq    = 100000
portfolio.st = 'macd'
account.st   = 'macd'
initPortf(portfolio.st, symbols=stock.str)
initAcct(account.st, portfolios=portfolio.st, initEq=initEq)
initOrders(portfolio=portfolio.st)
strategy.st <- portfolio.st
# define the strategy
strategy(strategy.st, store=TRUE)
## get data
getSymbols(stock.str, from=startDate, adjust=TRUE, src='tiingo')
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey (1962) p. 13
Fail quickly, think deeply, or both?
No matter how beautiful your theory, no matter how clever you are or what your name is, if it disagrees with experiment, it’s wrong. - Richard P. Feynman (1965)
# MA parameters for MACD
fastMA   = 12
slowMA   = 26
signalMA = 9
maType   = "EMA"

# one indicator
add.indicator(strategy.st, name = "MACD",
              arguments = list(x=quote(Cl(mktdata)),
                               nFast=fastMA,
                               nSlow=slowMA),
              label='_'
)
MACD is a two moving average cross system that seeks to measure the momentum of a price series.
Classical technical analysis, for example only, not widely deployed in production
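As a concrete illustration, MACD can be sketched in a few lines of base R. The quantstrat code above uses TTR's MACD implementation; this simplified version, and its simulated price series, are for illustration only:

```r
# Minimal MACD sketch in base R (illustration only; quantstrat uses TTR::MACD).
ema <- function(x, n) {
  k <- 2 / (n + 1)                       # standard EMA smoothing constant
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in 2:length(x)) out[i] <- k * x[i] + (1 - k) * out[i - 1]
  out
}
macd_sketch <- function(price, nFast = 12, nSlow = 26, nSig = 9) {
  macd   <- ema(price, nFast) - ema(price, nSlow)  # fast EMA minus slow EMA
  signal <- ema(macd, nSig)                        # smoothed MACD line
  list(macd = macd, signal = signal)
}
set.seed(42)                                       # simulated price series
price <- 100 + cumsum(rnorm(300, 0.05, 1))
m <- macd_sketch(price)
```

The signal line crossing zero (as used by sigThreshold below) corresponds to the fast EMA crossing the slow EMA in smoothed form.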
What do you think you're measuring? A good indicator measures something in the market:
Make sure the indicator is testable: it should produce facts that can support or refute a hypothesis.
If your indicator doesn't have testable information content, throw it out and start over.
# two signals
add.signal(strategy.st, name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="gt",
                            threshold=0,
                            cross=TRUE),
           label="signal.gt.zero"
)
add.signal(strategy.st, name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="lt",
                            threshold=0,
                            cross=TRUE),
           label="signal.lt.zero"
)
Signals are often combined:
"A & B": both signals should be true.
This is a composite signal, and serves to reduce the dimensionality of the decision space.
A lower dimensioned space is easier to measure, but is at higher risk of overfitting.
Avoid overfitting while combining signals by making sure that your process has a strong economic or theoretical basis before writing code or running tests.
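A composite signal can be sketched in base R. The two component signals here are random stand-ins, not real market signals (e.g. a MACD cross and a long-MA filter would be realistic choices); the point is only that requiring both conditions shrinks the set of entry opportunities:

```r
# Composite signal sketch: "A & B" fires no more often than either component.
set.seed(1)
macd_cross_up <- rnorm(100) > 0       # stand-in for signal A
above_long_ma <- rnorm(100) > -0.5    # stand-in for signal B (a filter)
composite     <- macd_cross_up & above_long_ma  # both must be true
c(entries.A = sum(macd_cross_up), entries.composite = sum(composite))
```

Fewer, higher-conviction entries are exactly the dimensionality reduction described above, which is also why the combined signal must be justified before testing rather than tuned afterward.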
Signals make predictions, so all of the literature on forecasting is applicable:
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_',
                 variable = list(n = fastMA),
                 label = 'nFAST'
)
#> [1] "macd"
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_',
                 variable = list(n = slowMA),
                 label = 'nSLOW'
)
#> [1] "macd"
sa_buy <- apply.paramset.signal.analysis(
            strategy.st,
            paramset.label='signal_analysis',
            portfolio.st=portfolio.st,
            sigcol = 'signal.gt.zero',
            sigval = 1,
            on=NULL,
            forward.days=50,
            cum.sum=TRUE,
            include.day.of.signal=FALSE,
            obj.fun=signal.obj.slope,
            decreasing=TRUE,
            verbose=TRUE)
#> Applying Parameter Set: 12, 26
sa_sell <- apply.paramset.signal.analysis(
             strategy.st,
             paramset.label='signal_analysis',
             portfolio.st=portfolio.st,
             sigcol = 'signal.lt.zero',
             sigval = 1,
             on=NULL,
             forward.days=10,
             cum.sum=TRUE,
             include.day.of.signal=FALSE,
             obj.fun=signal.obj.slope,
             decreasing=TRUE,
             verbose=TRUE)
#> Applying Parameter Set: 12, 26
# entry
add.rule(strategy.st, name='ruleSignal',
         arguments = list(sigcol="signal.gt.zero",
                          sigval=TRUE,
                          orderqty=100,
                          ordertype='market',
                          orderside='long',
                          threshold=NULL),
         type='enter',
         label='enter',
         storefun=FALSE
)
# exit
add.rule(strategy.st, name='ruleSignal',
         arguments = list(sigcol="signal.lt.zero",
                          sigval=TRUE,
                          orderqty='all',
                          ordertype='market',
                          orderside='long',
                          threshold=NULL,
                          orderset='exit2'),
         type='exit',
         label='exit'
)
If your signal process doesn't have predictive power, stop now.
Beware of Rule Burden:
start_t <- Sys.time()
out <- applyStrategy(strategy.st,
                     portfolios=portfolio.st,
                     parameters=list(nFast=fastMA,
                                     nSlow=slowMA,
                                     nSig=signalMA,
                                     maType=maType),
                     verbose=TRUE)
end_t <- Sys.time()

start_pt <- Sys.time()
updatePortf(Portfolio=portfolio.st)
end_pt <- Sys.time()
#> [1] "Running the backtest (applyStrategy):"
#> Time difference of 0.7516775 secs
#> [1] "trade blotter portfolio update (updatePortf):"
#> Time difference of 0.08920383 secs
chart.Posn(Portfolio=portfolio.st, Symbol=stock.str)
plot(add_MACD(fast=fastMA, slow=slowMA, signal=signalMA, maType="EMA"))
good parameters are parsimonious
production strategies have additional parameters that are specific to the production environment
limiting free parameters
more parameters lower your degrees of freedom
the goal should be to eliminate free parameters before running parameter optimization
does this still count as a free parameter?
quantstrat parameters are added via add.distribution
relationships (constraints) between parameters are set via add.distribution.constraint
.FastMA = (3:15)
.SlowMA = (20:60)
# .nsamples = 200 # for random parameter sampling,
#                 # less important if you're using doParallel or doMC

### MA paramset
add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', # this is the label given to the indicator in the strat
                 variable = list(n = .FastMA),
                 label = 'nFAST'
)
add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', # this is the label given to the indicator in the strat
                 variable = list(n = .SlowMA),
                 label = 'nSLOW'
)
add.distribution.constraint(strategy.st,
                 paramset.label = 'MA',
                 distribution.label.1 = 'nFAST',
                 distribution.label.2 = 'nSLOW',
                 operator = '<',
                 label = 'MA'
)
.paramaudit <- new.env()
ps_start <- Sys.time()
paramset.results <- apply.paramset(strategy.st,
                                   paramset.label='MA',
                                   portfolio.st=portfolio.st,
                                   account.st=account.st,
                                   # nsamples=.nsamples,
                                   audit=.paramaudit,
                                   store=TRUE,
                                   verbose=FALSE)
ps_end <- Sys.time()
Running the parameter search (apply.paramset):
Time difference of 1.244973 mins
Total trials: 543
plot(paramset.results$cumPL[-1,], major.ticks = 'years', grid.ticks.on = 'years')
Look Ahead Bias
Data Mining Bias
Data Snooping
Pardo (2008, 130–31) describes the degrees of freedom of a strategy as the number of market observations in the test, less the number of observations consumed by the strategy's indicators and parameters.
In parameter optimization, we should consider the sum of observations used by all different parameter combinations.
The goal should be to retain 95% or more free observations, even after the parameter search.
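The degrees-of-freedom arithmetic is simple; this base-R sketch reproduces the percentages in the report below from the observation counts it prints:

```r
# Degrees-of-freedom arithmetic behind the degrees.of.freedom() report.
dof_pct <- function(total.obs, consumed) {
  remaining <- total.obs - consumed
  round(100 * remaining / total.obs, 2)
}
dof_pct(3628, 1248)   # 'trial' method: one parameter combination
dof_pct(3628, 26204)  # 'sum' method: observations consumed by all 543 combos
```

The 'sum' method goes sharply negative because the parameter search consumes the same 3628 observations many times over, which is precisely the overfitting warning the report is making.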
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='trial')
#>
#> Degrees of freedom report for strategy: macd
#> Total market observations: 3628
#> Degrees of freedom consumed by strategy: 1248
#> Total degrees of freedom remaining: 2380
#> % Degrees of Freedom: 65.6 %
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='sum')
#>
#> Degrees of freedom report for strategy: macd
#> Total market observations: 3628
#> Degrees of freedom consumed by strategy: 26204
#> Total degrees of freedom remaining: -22576
#> % Degrees of Freedom: -622.27 %
to increase degrees of freedom, you may use more market data (a longer history or more symbols), or consume fewer observations (fewer free parameters, fewer trials, shorter indicator lookbacks)
this underlines the difficulty of doing extensive parameter searches on low frequency data
Bailey and López de Prado (2014) describe a way of adjusting the observed Sharpe Ratio of a candidate strategy by taking the variance of the trials and the skewness and kurtosis into account.
It establishes a theoretical maximum Sharpe ratio for a series of related trials.
we have implemented a version with Kipnis (2017) as SharpeRatio.deflated
dsr <- SharpeRatio.deflated(portfolios='macd',strategy='macd', audit=.paramaudit)
obs.Sharpe | max.Sharpe | deflated.Sharpe | p.value | nTrials |
---|---|---|---|---|
0.57 | 3.077243 | 0.5177464 | 0.091673 | 543 |
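The expected maximum Sharpe ratio across N related trials, the benchmark against which the observed Sharpe is deflated, can be sketched in base R. This is an illustration of the Bailey and López de Prado (2014) formulas, not the SharpeRatio.deflated implementation; var_trials is an assumed variance of the trial Sharpe ratios:

```r
# Expected maximum Sharpe ratio under N trials (Bailey & Lopez de Prado 2014).
expected_max_sharpe <- function(var_trials, n_trials) {
  gamma_em <- 0.5772156649  # Euler-Mascheroni constant
  sqrt(var_trials) * ((1 - gamma_em) * qnorm(1 - 1 / n_trials) +
                       gamma_em * qnorm(1 - 1 / (n_trials * exp(1))))
}
# Probabilistic Sharpe ratio of the observed SR against that benchmark,
# adjusted for skewness and kurtosis of returns (normal defaults shown).
deflated_sharpe <- function(sr_obs, sr_max, n_obs, skew = 0, kurt = 3) {
  pnorm(((sr_obs - sr_max) * sqrt(n_obs - 1)) /
        sqrt(1 - skew * sr_obs + (kurt - 1) / 4 * sr_obs^2))
}
sr0 <- expected_max_sharpe(var_trials = 0.1, n_trials = 543)  # assumed variance
```

Note that the Sharpe ratios here are per-period (non-annualized); more trials and more dispersion among trials both raise the hurdle the observed Sharpe must clear.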
hsr <- SharpeRatio.haircut(portfolios='macd', strategy='macd', audit=.paramaudit)
#> Warning in log(x): NaNs produced
#> Sharpe Ratio Haircut Report:
#>
#> Observed Sharpe Ratio: 0.57
#> Sharpe Ratio corrected for autocorrelation: 0.57
#>
#> Bonferroni Adjustment:
#> Adjusted P-value = 1
#> Haircut Sharpe Ratio = 0
#> Percentage Haircut = 1
#>
#> Holm Adjustment:
#> Adjusted P-value = 1
#> Haircut Sharpe Ratio = 0
#> Percentage Haircut = 1
#>
#> BHY Adjustment:
#> Adjusted P-value = 0.9981303
#> Haircut Sharpe Ratio = 0.0007425807
#> Percentage Haircut = 0.9986972
#>
#> Average Adjustment:
#> Adjusted P-value = 0.9993768
#> Haircut Sharpe Ratio = 0.0002475267
#> Percentage Haircut = 0.9995657
Sampling Without replacement:
Sampling With Replacement:
Disadvantages of Sampling from portfolio P&L:
rsim <- mcsim(Portfolio = "macd",
              Account = "macd",
              n = 1000,
              replacement = TRUE,
              l = 1,
              gap = 10)
rblocksim <- mcsim(Portfolio = "macd",
              Account = "macd",
              n = 1000,
              replacement = TRUE,
              l = 10,
              gap = 10)
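A simplified sketch of what mcsim does: resample the strategy's daily P&L with replacement, in blocks of length l, and look at the distribution of outcomes. The P&L vector here is simulated for illustration, not drawn from the macd portfolio:

```r
# Block bootstrap of daily P&L with replacement (mcsim, simplified).
set.seed(7)
pnl <- rnorm(1000, mean = 2, sd = 50)   # stand-in for daily portfolio P&L
block_resample <- function(pnl, l = 10) {
  # sample block start points with replacement, then stitch blocks together
  starts <- sample(seq_len(length(pnl) - l + 1),
                   size = ceiling(length(pnl) / l), replace = TRUE)
  resampled <- unlist(lapply(starts, function(s) pnl[s:(s + l - 1)]))
  resampled[seq_along(pnl)]             # trim to the original length
}
sims <- replicate(1000, sum(block_resample(pnl)))
quantile(sims, c(0, .25, .5, .75, 1))   # distribution of terminal P&L
```

Blocks (l > 1) preserve short-run autocorrelation in the P&L series that single-observation resampling (l = 1) destroys, which is why mcsim is run both ways above.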
P&L Quantiles:
0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|
-0.04 | 0.0039 | 0.014 | 0.027 | 0.095 |
0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|
-0.04 | 0.0038 | 0.014 | 0.027 | 0.091 |
White’s Data Mining Reality Check from White (2000) http://www.cristiandima.com/white-s-reality-check-for-data-snooping-in-r/
bootstrap optimization as an option in the Leverage Space Trading Model (Vince 2009)
combinatorially symmetric cross-validation (CSCV) and probability of backtest overfitting (PBO) from Bailey et al. (2017)
Disadvantages:
Advantages:
effectively creates simulated traders with the same style as strategy but no skill
best for modeling "skill vs. luck"
Extract Stylized Facts from the observed series:
For each replicate:
For the collections of start/qty/duration:
# nrtxsim <- txnsim(Portfolio = "macd",
#                   n = 250,
#                   replacement = FALSE,
#                   tradeDef = 'increased.to.reduced')

wrtxsim <- txnsim(Portfolio = "macd",
                  n = 250,
                  replacement = TRUE,
                  tradeDef = 'increased.to.reduced')
Comments:
P&L Quantiles:
0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|
-3041 | 208 | 914 | 1746 | 6283 |
consider choice of objective
be careful about performing walk forward analysis then making changes
more trials increase bias
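The rolling (non-anchored) train/test windows used by walk.forward can be sketched in base R. k.training and k.testing mirror the 48/12-month settings used below; n_periods is an assumed monthly index length:

```r
# Rolling (non-anchored) walk-forward windows, sketched in base R.
wf_windows <- function(n_periods, k.training, k.testing) {
  # each window trains on k.training periods, then tests on the next k.testing
  starts <- seq(1, n_periods - k.training - k.testing + 1, by = k.testing)
  lapply(starts, function(s) list(
    train = s:(s + k.training - 1),
    test  = (s + k.training):(s + k.training + k.testing - 1)
  ))
}
w <- wf_windows(n_periods = 174, k.training = 48, k.testing = 12)
length(w)  # number of train/test splits over an assumed ~14.5 years of months
```

Each training window selects the best parameter combination in-sample; only the out-of-sample test segments are stitched together into the walk-forward equity curve.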
wfportfolio <- "wf.macd"
initPortf(wfportfolio, symbols=stock.str)
#> [1] "wf.macd"
initOrders(portfolio=wfportfolio)
wf_start <- Sys.time()
wfaresults <- walk.forward(strategy.st,
                           paramset.label='MA',
                           portfolio.st=wfportfolio,
                           account.st=account.st,
                           # nsamples=100,
                           period='months',
                           k.training = 48,
                           k.testing = 12,
                           verbose = FALSE,
                           anchored = FALSE,
                           audit.prefix = NULL,
                           savewf = FALSE,
                           include.insamples = TRUE,
                           psgc=TRUE
)
#> [1] "=== training MA on 2003-12-31/2007-10-31"
#> [1] "=== testing param.combo 533 on 2007-11-01/2008-10-31"
#>     nFAST nSLOW
#> 533    15    60
#> [1] "2008-04-29 00:00:00 EEM 100 @ 40.1341581132"
#> [1] "=== training MA on 2004-11-01/2008-10-31"
#> [1] "=== testing param.combo 325 on 2008-11-03/2009-10-30"
#>     nFAST nSLOW
#> 325    15    44
#> [1] "=== training MA on 2005-11-01/2009-10-30"
#> [1] "=== testing param.combo 195 on 2009-11-02/2010-10-29"
#>     nFAST nSLOW
#> 195    15    34
#> [1] "2010-02-08 00:00:00 EEM -100 @ 31.7488855213"
#> [1] "2010-03-19 00:00:00 EEM 100 @ 35.694013308"
#> [1] "2010-05-18 00:00:00 EEM -100 @ 33.8495379792"
#> [1] "=== training MA on 2006-11-01/2010-10-29"
#> [1] "=== testing param.combo 13 on 2010-11-01/2011-10-31"
#>    nFAST nSLOW
#> 13    15    20
#> [1] "2011-03-31 00:00:00 EEM 100 @ 42.0677276172"
#> [1] "2011-05-24 00:00:00 EEM -100 @ 40.3084056753"
#> [1] "2011-07-25 00:00:00 EEM 100 @ 41.4420713187"
#> [1] "=== training MA on 2007-11-01/2011-10-31"
#> [1] "=== testing param.combo 208 on 2011-11-01/2012-10-31"
#>     nFAST nSLOW
#> 208    15    35
#> [1] "2011-11-15 00:00:00 EEM 100 @ 35.1741112291"
#> [1] "2011-11-28 00:00:00 EEM -200 @ 33.0249712822"
#> [1] "2012-01-23 00:00:00 EEM 100 @ 36.7344639248"
#> [1] "2012-04-25 00:00:00 EEM -100 @ 37.0613458789"
#> [1] "=== training MA on 2008-11-03/2012-10-31"
#> [1] "=== testing param.combo 273 on 2012-11-01/2013-10-31"
#>     nFAST nSLOW
#> 273    15    40
#> [1] "2013-05-16 00:00:00 EEM 100 @ 39.0432867026"
#> [1] "2013-06-06 00:00:00 EEM -100 @ 36.4554031459"
#> [1] "=== training MA on 2009-11-02/2013-10-31"
#> [1] "=== testing param.combo 26 on 2013-11-01/2014-10-31"
#>    nFAST nSLOW
#> 26    15    21
#> [1] "2014-04-01 00:00:00 EEM 100 @ 38.0793609613"
#> [1] "=== training MA on 2010-11-01/2014-10-31"
#> [1] "=== testing param.combo 364 on 2014-11-03/2015-10-30"
#>     nFAST nSLOW
#> 364    15    47
#> [1] "2015-02-26 00:00:00 EEM 100 @ 38.364624064"
#> [1] "2015-03-17 00:00:00 EEM -200 @ 36.4750652703"
#> [1] "2015-04-09 00:00:00 EEM 100 @ 40.1225717974"
#> [1] "=== training MA on 2011-11-01/2015-10-30"
#> [1] "=== testing param.combo 312 on 2015-11-02/2016-10-31"
#>     nFAST nSLOW
#> 312    15    43
#> [1] "2015-11-03 00:00:00 EEM 100 @ 33.5843280749"
#> [1] "2015-11-23 00:00:00 EEM -200 @ 33.1202468353"
#> [1] "=== training MA on 2012-11-01/2016-10-31"
#> [1] "=== testing param.combo 299 on 2016-11-01/2017-10-31"
#>     nFAST nSLOW
#> 299    15    42
#> [1] "=== training MA on 2013-11-01/2017-10-31"
#> [1] "=== testing param.combo 13 on 2017-11-01/2018-05-30"
#>    nFAST nSLOW
#> 13    15    20
#> Warning in .updatePosPL(Portfolio = pname, Symbol = as.character(symbol), :
#> Could not parse //2018-05-30 as ISO8601 string, or one/both ends of the
#> range were outside the available prices: 2003-12-31/2018-05-30. Using all
#> data instead.
wf_end <- Sys.time()
#> Running the walk forward search:
#>
#> Time difference of 28.85891 mins
#> Total trials: 6939
Statistic | EEM
---|---
Portfolio | wf.macd
Symbol | EEM
Num.Txns | 19
Num.Trades | 8
Net.Trading.PL | -3577.657
Avg.Trade.PL | -447.2072
Med.Trade.PL | -304.0869
Largest.Winner | 32.6882
Largest.Loser | -1056.624
Gross.Profits | 32.6882
Gross.Losses | -3610.345
Std.Dev.Trade.PL | 383.9985
Std.Err.Trade.PL | 135.764
Percent.Positive | 12.5
Percent.Negative | 87.5
Profit.Factor | 0.009054
Avg.Win.Trade | 32.6882
Med.Win.Trade | 32.6882
Avg.Losing.Trade | -515.7636
Med.Losing.Trade | -349.3854
Avg.Daily.PL | -447.2072
Med.Daily.PL | -304.0869
Std.Dev.Daily.PL | 383.9985
Std.Err.Daily.PL | 135.764
Ann.Sharpe | -18.48756
Max.Drawdown | -4200.986
Profit.To.Max.Draw | -0.8516232
Avg.WinLoss.Ratio | 0.0633782
Med.WinLoss.Ratio | 0.0935591
Max.Equity | 249.2224
Min.Equity | -3951.764
End.Equity | -3577.657
chart.forward(wfaresults)
param.combo | testing.timespan | nFAST | nSLOW |
---|---|---|---|
533 | 2007-11-01/2008-10-31 | 15 | 60 |
325 | 2008-11-03/2009-10-30 | 15 | 44 |
195 | 2009-11-02/2010-10-29 | 15 | 34 |
13 | 2010-11-01/2011-10-31 | 15 | 20 |
208 | 2011-11-01/2012-10-31 | 15 | 35 |
273 | 2012-11-01/2013-10-31 | 15 | 40 |
26 | 2013-11-01/2014-10-31 | 15 | 21 |
364 | 2014-11-03/2015-10-30 | 15 | 47 |
312 | 2015-11-02/2016-10-31 | 15 | 43 |
299 | 2016-11-01/2017-10-31 | 15 | 42 |
13 | 2017-11-01/2018-05-30 | 15 | 20 |
Strong hypotheses guard against risk of ruin.
I hypothesize that this strategy idea will make money.
Specifying hypotheses at the beginning reduces the urge to modify them later, and to:
- adjust expectations while testing
- revise the objectives
- construct ad hoc hypotheses
seek to answer what and why before going too far
we always have more work than time, so please talk to us if you want to work on these
multiple adjustments exist; examine more than one
stay skeptical of your results
Thank You for Your Attention
Thanks to all the contributors to quantstrat and blotter, especially Ross Bennett, Peter Carl, Jasen Mackie, Joshua Ulrich, my team, and my family, who make it possible.
©2018 Brian G. Peterson brian@braverock.com
This work is licensed under a Creative Commons Attribution 4.0 International License
The rmarkdown (Xie 2014) source code for this document may be found on github
#> prepared using blotter: 0.14.2 and quantstrat: 0.14.3
All views expressed in this presentation are those of Brian Peterson, and do not necessarily reflect the opinions, policies, or practices of Brian's employers.
All remaining errors or omissions should be attributed to the author.
Aronson, David. 2006. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley.
Bailey, David H, and Marcos López de Prado. 2014. “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality.” Journal of Portfolio Management 40 (5): 94–107. doi:10.3905/jpm.2014.40.5.094.
Bailey, David H, Jonathan M Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2017. “The Probability of Backtest Overfitting.” Journal of Computational Finance 20 (4): 39–69. doi:10.21314/JCF.2016.322.
Feynman, Richard P, Robert B Leighton, Matthew Sands, and EM Hafner. 1965. The Feynman Lectures on Physics. Vols. 1-3.
Harvey, Campbell R., and Yan Liu. 2015. “Backtesting.” The Journal of Portfolio Management 41 (1): 13–28. http://ssrn.com/abstract=2345489.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer.
Kipnis, Ilya. 2017. “QuantstratTrader Blog.” July 16. https://quantstrattrader.wordpress.com/.
Mackie, Jasen. 2016. “Backtesting – Harvey & Liu (2015).” November 17. https://opensourcequant.wordpress.com/2016/11/17/r-view-backtesting-harvey-liu-2015/.
Pardo, Robert. 2008. The Evaluation and Optimization of Trading Strategies. Second ed. John Wiley & Sons.
Racine, Jeffrey S, and Christopher F Parmeter. 2012. “Data-Driven Model Evaluation: A Test for Revealed Performance.” McMaster University. http://socserv.mcmaster.ca/econ/rsrch/papers/archive/2012-13.pdf.
Tomasini, Emilio, and Urban Jaekle. 2009. Trading Systems: A New Approach to System Development and Portfolio Optimisation.
Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics. JSTOR, 1–67. http://projecteuclid.org/euclid.aoms/1177704711.
Vince, Ralph. 2009. The Leverage Space Trading Model: Reconciling Portfolio Management Strategies and Economic Theory. John Wiley & Sons.
White, Halbert L. 2000. “System and Method for Testing Prediction Models and/or Entities.” Google Patents. http://www.google.com/patents/US6088676.
Xie, Yihui. 2014. “R Markdown — Dynamic Documents for R.” http://rmarkdown.rstudio.com/.