Factors to help predict Out-of-Sample Performance
Author: sedelstein
Creation Date: 6/2/2016 12:50 PM

sedelstein

#1

Here is a new paper dated 3/9/16 from the folks at Quantopian that people here may be interested in reading.

The paper, "All that Glitters Is Not Gold: Comparing Backtest and Out-of-Sample Performance on a Large Cohort of Trading Algorithms" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2745220), offers some interesting factors that might be useful to have in the Performance+ visualizer.

Some of the factors are already available to us. The attached graph is an excerpt which ranks the factors the authors found to be important.

Some interesting ones include

tail_ratio - Ratio between the 95th and (absolute) 5th percentile of the daily returns distribution. This was found to be the most important factor. For example, a tail ratio of 0.25 means that losses are four times as bad as profits.

skewness - skewness of daily returns distribution
kurtosis - kurtosis of daily returns distribution
sharpe_std - Standard deviation of rolling 6 month Sharpe ratio
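
For anyone who wants to experiment with these factors outside the visualizer, here is a minimal Python sketch of the four definitions listed above. It is not the paper's code or the MS123 implementation. The linear-interpolation percentile, the use of excess kurtosis, the 126-day window standing in for "6 months", the zero risk-free rate and the 252-day annualization are all assumptions on my part.

```python
import math
import statistics


def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def tail_ratio(returns):
    """95th percentile of daily returns divided by the absolute 5th percentile."""
    return percentile(returns, 95) / abs(percentile(returns, 5))


def skewness(returns):
    """Population skewness: the third standardized moment."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return sum((r - mean) ** 3 for r in returns) / (len(returns) * std ** 3)


def kurtosis(returns):
    """Population excess kurtosis: fourth standardized moment minus 3 (assumed convention)."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return sum((r - mean) ** 4 for r in returns) / (len(returns) * std ** 4) - 3.0


def sharpe_std(returns, window=126, periods_per_year=252):
    """Standard deviation of the rolling annualized Sharpe ratio (risk-free rate assumed 0)."""
    sharpes = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        vol = statistics.stdev(chunk)
        if vol > 0:
            sharpes.append(statistics.fmean(chunk) / vol * math.sqrt(periods_per_year))
    return statistics.pstdev(sharpes) if len(sharpes) > 1 else float("nan")


# Made-up daily returns: the worst days are 4x the size of the best days.
example = [-0.04] * 10 + [0.0] * 80 + [0.01] * 10
print(tail_ratio(example))  # about 0.25, matching the "four times as bad" example
```

The made-up series at the end reproduces the 0.25 example quoted above: the 95th percentile is +1% and the 5th percentile is -4%.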


Food for thought for those of us interested in avoiding overfitting.

Regards

Carova

#2
It would be great if these factors could be added to the MS123 Scorecard.

Would that be possible Eugene?

Vince

Eugene

#3
Steve, thanks for the find. Most of the metrics aren't suitable, for different reasons (skewness and kurtosis are too specific, and the rolling Sharpe ratio is too big an effort), but let's see whether Tail Ratio is useful or not. At the moment I'm pretty skeptical about the tail ratio because it isn't bulletproof.

Consider a simple example. The 95th percentile of a system's EOD returns is 43419 and the absolute 5th percentile is 66144. The tail ratio is therefore 0.65 meaning that the loss is only ~1.5 times greater than the win. This must be feasible enough to stomach, right? Well, here's the actual backtested equity behind that 0.65 tail ratio with a close to total loss of capital - its Net Profit is -74%:



Vince, why do you think it'd be great? If you had tested them before then what were your findings? Don't you find that skewness and kurtosis are too specific to fit the paradigm of the MS123 Scorecard? At least I do. If I were to consider adding something I'd give it a go first in the visualizer to see if it proves useful and only then go along.

sedelstein

#4
Hi Eugene

Were the dollar P&L figures above for a constant dollar amount per trade? I would have thought the returns should be in percent, with the strategy executed using a constant percentage of the portfolio allocated to each trade.

At any rate, I thought the paper might be helpful to the folks here.

Eugene

#5
Steve, the test was done using Percent Equity sizing and the daily returns were absolute. In my experiments, values below 0.50 were also reported in some cherry-picked tests (thereby confirming that the percentile formula is correct), but most of the time the TR stays above 0.7 even for desperately losing systems.

Taking the returns of a losing system in percent doesn't make a difference: the tail ratio numbers still make no sense to me. Thanks for the suggestion anyway.
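
To make the point concrete, here is a small Python sketch on a made-up return series (not the backtest discussed above). The hypothetical system wins 1.0% and loses 1.2% on alternating days. The percentile helper uses linear interpolation, which is an assumption about the formula.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def tail_ratio(returns):
    """95th percentile of daily returns divided by the absolute 5th percentile."""
    return percentile(returns, 95) / abs(percentile(returns, 5))


# Hypothetical losing system: +1.0% and -1.2% on alternating days, 500 days in total.
daily_returns = [0.010, -0.012] * 250

# Compound the percentage returns into an equity multiple.
equity = 1.0
for r in daily_returns:
    equity *= 1.0 + r

print(f"tail ratio: {tail_ratio(daily_returns):.2f}")
print(f"net profit: {equity - 1.0:.0%}")
```

The tail ratio comes out at about 0.83 while the account loses roughly 41%. The ratio only compares the sizes of the two tails and says nothing about how the losses compound, so a steadily losing system can still report a ratio well above 0.7.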

Eugene

#6
Although it will not make its way to the Scorecard, look for Tail Ratio in Performance+ in the upcoming release of MS123 Visualizers.

Carova

#7
Hi Eugene!

QUOTE:
Vince, why do you think it'd be great? If you had tested them before then what were your findings? Don't you find that skewness and kurtosis are too specific to fit the paradigm of the MS123 Scorecard? At least I do. If I were to consider adding something I'd give it a go first in the visualizer to see if it proves useful and only then go along.


The "tails ratio" has a strong basis in both theory and practice. It is based on the observation that financial returns do not follow a Gaussian Distribution (with exponential tails) but are rather much closer to a Pareto or Cauchy Distribution (with power law tails). This is often referred to as "fat tails" in returns. Several academic and financial studies have shown that the vast majority of returns (both positive and negative) results from these tails. As a result, you would expect that any trading strategy that reduced either the positive or negative tail would significantly impact future results. I have used something similar to sort my strategies prior to extensive testing, and have found it of value.
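
As a rough illustration of how different the two kinds of tails are, the Python sketch below compares the probability of an observation landing more than k units from the center under a standard normal and a standard Cauchy distribution. Treating one standard deviation and one Cauchy scale unit as comparable is an assumption made purely for illustration (the Cauchy distribution has no finite standard deviation), and the function names are my own.

```python
import math


def normal_tail(k):
    """P(|X| > k) for a standard normal, computed with the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy (location 0, scale 1)."""
    return 1.0 - 2.0 * math.atan(k) / math.pi


for k in (2, 4, 8):
    print(f"k={k}: normal {normal_tail(k):.2e}, cauchy {cauchy_tail(k):.2e}")
```

At k = 4 the normal tail is on the order of 1e-5 while the Cauchy tail is still above 10%, which is the "fat tails" effect in its most extreme form.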

QUOTE:
Although it will not make its way to the Scorecard, look for Tail Ratio in Performance+ in the upcoming release of MS123 Visualizers.


The value of this would be limited. This type of metric is MUCH MORE valuable during an optimization than ex post facto.

Vince

Eugene

#8
Thank you for the feedback Vince. Let's first evaluate the tail ratio numbers in backtest mode, and if they are reasonable we may consider going along.

Eugene

#9
For what it's worth, check out the tail ratio on the Performance+ tab in new version 2016.07.

sedelstein

#10
FWIW, and for those who are interested, here is another paper, dated last month, with some interesting metrics that I had not seen before.
This is not a request for support, unless perhaps the user community thinks it worthwhile.

Don't Stand So Close to Sharpe

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2792592

The reference section has interesting papers as well.

Their FT ratio seems to have better OOS performance and has a fairly low correlation (0.6ish) with the Sharpe Ratio.

Carova

#11
Interesting statistic (FT ratio) that I have never heard of prior to your post. It does make some sense, and it appears to be somewhat similar to the Tails Ratio in that it is examining the contributions of the positive and negative portions of the distribution (where most of the gains and losses are located).

For the past few years I have been using the sum of the top 2.5% of trades (gains) over the sum of the bottom 2.5% of trades (losses) as a ranking metric for non-trend-following systems, and have found it to have significant value. Unfortunately it seems to have little value for trend-following systems.
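
For readers who want to try the same idea, here is a minimal Python sketch of such a ratio. The function name, the rounding of the 2.5% cut-off to a whole number of trades, and the handling of a loss-free tail are all assumptions for illustration, not a description of any existing tool.

```python
def extreme_trade_ratio(trade_profits, fraction=0.025):
    """Sum of the best `fraction` of trades divided by |sum of the worst `fraction`|."""
    if not trade_profits:
        raise ValueError("need at least one trade")
    ordered = sorted(trade_profits)
    # Number of trades in each tail; at least one so short lists still work.
    count = max(1, round(len(ordered) * fraction))
    worst = sum(ordered[:count])
    best = sum(ordered[-count:])
    if worst >= 0:
        # Assumed convention: no net loss in the worst tail gives an infinite ratio.
        return float("inf")
    return best / abs(worst)


# Made-up trade results: 80 trades with profits from -40 to +39.
trades = list(range(-40, 40))
print(extreme_trade_ratio(trades))
```

With 80 trades the cut-off is two trades per tail, so the example divides 39 + 38 by 40 + 39.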

Vince