Any suggestions on statistics?

Author: mikesblack

Creation Date: 9/8/2009 3:12 PM

It is somewhat confusing knowing which methods and tests are relevant for the analysis and testing of time series. Sometimes I get results that seem very good, but I realize I may be misapplying the standard statistical methods found in texts, e.g. binomial probability, z-tests, t-tests, etc.

Are there any good books, websites, or videos that someone can suggest? I have read quite a bit from other members here and, thanks to them, have definitely furthered my understanding, but much of it is above my comprehension at this time. The posts have illuminated many misunderstandings I had on the subject. I would like to fill in the gaps in my knowledge: not just plugging numbers into formulas, but possessing a solid grasp of the concepts.

Thanks kindly.

Hi Michael,

statistics is a big field, and the statistics that are relevant to trading systems are few and far between, for a variety of reasons. Also, the list of caveats and exceptions that comes with applying statistics to what we do is a mile long.

I have several chapters about this in my book, which explain the statistically correct way to compare trading systems, compute z-scores, interpret correlations, etc. See: http://www.harriman-house.com/pages/book.htm?BookCode=426483

One of the biggest issues with applying statistics in trading is making sure that the statistic you are using meets the necessary preconditions for valid application. Essentially, you can apply almost any statistic to a set of data, but if the data doesn't meet the preconditions, then the result is meaningless.
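As an illustration of checking preconditions before trusting a normality-based test like a z-test, here is a rough sketch (not from Bruce's book; the tolerance thresholds are arbitrary assumptions for illustration):

```python
import math

def moment_check(returns, skew_tol=1.0, kurt_tol=3.0):
    """Crude sanity check before applying a normality-based test
    (z-test, t-test) to a set of trade returns.  The tolerance
    thresholds are illustrative assumptions, not standard values."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    sd = math.sqrt(var)
    # Sample skewness: asymmetry of the return distribution.
    skew = sum((r - mean) ** 3 for r in returns) / (n * sd ** 3)
    # Excess kurtosis: fat tails relative to a normal distribution.
    excess_kurt = sum((r - mean) ** 4 for r in returns) / (n * sd ** 4) - 3.0
    ok = abs(skew) < skew_tol and abs(excess_kurt) < kurt_tol
    return skew, excess_kurt, ok
```

If `ok` comes back `False`, the data departs noticeably from normality and a z-score computed on it should be treated with suspicion rather than reported as significant.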

So rather than worry about specific tests, is there something specific you are trying to find out?

Cheers,

Bruce

Thanks Bruce. There are many concepts that I would like to develop. I have many questions and thoughts. Here are a few of them.

1. How to data mine (look for the best stocks for a system) while avoiding curve fitting, and how to have some metric for determining the significance of those results.

2. I develop rules from a 10-year window on S&P 100 stocks. Results look good, so I apply them to the WL100, NASDAQ 100, S&P 500, and Russell 1000. Those results look good over the same time frame for all the indexes. The results are good, but some are better than others.

I back-test on the years prior to that 10-year window. Results there are also good, but characteristically different (e.g. Sharpe ratios are better on the earlier data).

First: I'd like to know whether there is a checklist for determining if the results I am getting are significant, and how best to make those determinations.

Second: Is there a way to examine the persistence of the results I am getting, and how much can I expect the results to change with time?

Third: How can I determine if and when the system is no longer working?

3. Methods to account for survivorship bias.

4. Amount of data:

I have concerns about using 20 years of stock data (offered through Fidelity). I wonder how to think about forecasting one to five years or more based on 20 years of history.

I think of an analogy. Suppose I only knew about the growth trends of a person through the first 20 years of life. With millions of samples, I could be fairly precise in describing those trends, but I would be completely inaccurate if I applied the observed growth of the first 20 years of life to the next 20, or even to the next five years or one year.

At some point I would have to recognize that the trends have stopped and reject the future projections.

It's like the stock market. From the late 1800s forward there has been a positive bias. From my understanding, since the '50s the markets have had roughly a 70% ratio of winning to losing months. Certainly a fair coin toss wouldn't yield such results, so I assume most long-term buy-and-holders use this thinking as security for long-term index investing. (So long as one can sit through 10-year windows of flat returns, as in the '70s and the 2000s, all is OK. This all neglects inflation.)

So 100 years of data can't forecast the 100-year flood, but it can show patterns that, on balance, occur with a frequency greater than chance would suggest. It has been the case for 99 years, so I presume the 100th year will be the same. But how about the 101st?
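The coin-toss comparison above can be made precise with an exact binomial tail probability: how likely would a fair coin be to produce at least the observed number of winning months? A sketch with illustrative numbers (42 positive months out of 60, i.e. a 70% rate; these are made-up figures, not actual market data):

```python
import math

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance a fair coin
    (p = 0.5) produces at least k winning months out of n."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical example: 42 of the last 60 months were positive.
p_value = binom_sf(42, 60)
```

A small p-value here only says the win rate is unlikely under a fair coin; it says nothing about whether the bias will persist into the 101st year, which is the harder question.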

Hi Michael,

there are quite a few questions here, and many of them deal with forecasting.

The first thing you should know is that statistics are not forecasts. There are methods for forecasting (for example, neural networks), which is my main field. However, that has nothing to do with statistics.

Think of statistics as providing a more formal vindication of some previous hypothesis. Most of statistics deals with comparing two sets of "things" or comparing a "thing" to a set of "things".

Thus we can gain confidence that two sets of things are "reasonably" similar (subject to a degree of probability), or that one thing is most likely a member of a particular set of things (subject to mean, variance, etc., and again subject to some degree of probability).

Further (depending on how you feel about the debate in finance regarding returns and normal distributions), you can assess the order of occurrence of the members of your set.

There are some further specific forms of statistical test that are valid for our work, but they are rather niche and have many conditions to meet before they are valid.

Also, whether a particular statistic can be applied to a set of data depends very much on the distribution of that data. Hence, there aren't any general guidelines, except the (overused and generally misused) central limit theorem, and checking whether your statistic is modestly tolerant of departures from normality.
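One distribution-free alternative to leaning on the central limit theorem is the bootstrap, which resamples the observed trades instead of assuming a distribution for them. A minimal sketch (an illustration of the general technique, not a method from Bruce's book), assuming only that the trades are independent draws from one distribution:

```python
import random

def bootstrap_mean_ci(returns, n_boot=5000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean return.
    Resamples the data with replacement, so it makes no normality
    assumption; it still assumes independent, identically
    distributed trades."""
    rng = random.Random(seed)
    n = len(returns)
    # Mean of each of n_boot resampled data sets, sorted.
    means = sorted(
        sum(rng.choice(returns) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

If the interval straddles zero, the data does not support a nonzero average edge at that confidence level, regardless of how the returns are distributed.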

Hope that helps,

Bruce

Thanks very much Bruce.

I am quite interested in studying this further.

I look forward to reading your book.

Mike

Mike, what a fantastic set of questions you ask. I agree with all that Bruce writes, but would add a few other comments as well. Note, I have comments but really no answers; these are the same issues that I worry about. I would add one other big question to your list, and would similarly ask for any help / feedback on the extent to which others worry about this or take it into consideration in their own system design.

And the question is: HOW TO HANDLE DATA QUALITY ISSUES? It seems like I go through similar steps as you do: come up with what I perceive to be a winning formula / methodology, test it somewhat rigorously over various time periods and different data sets, and then move into implementation mode. Only when running the system each day do the data quality issues become apparent.

I happen to own KRE today and yesterday as a result of one of my systems. I had a good day in it yesterday, and it closed at 4:00 at 20.89 (I was watching it fairly carefully). Lo and behold, this morning I open up my trading platform and the market-center official close for the symbol was 20.30. This isn't even an error from the data providers; for some reason, the exchange(s) chose 20.30 as the official close. Now my system, which runs on EOD data, will forever view yesterday's price action as a loss when it really was a gain, and any buys triggered by systems based on yesterday's close of 20.30 will record what appears to be a big win today, when in reality the ETF closed down one cent from what was an executable fill at the close.

My worry is that this happens all the time, and when we backtest, we get system results that are skewed by the bad data and not actually achievable in real-time trading. If anyone has any suggestions about how to handle this, I'd love to hear them.
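One partial defense (a sketch of one possible approach, not a complete answer) is to log the last traded price yourself at the close and cross-check it against the vendor's official EOD close the next morning, flagging any date where the two disagree by more than a threshold. The symbol, dates, and prices below are hypothetical, modeled on the KRE example:

```python
def flag_close_mismatches(official, observed, tol=0.005):
    """Compare the vendor's official EOD closes against closes you
    recorded yourself (e.g. the 4:00 last trade) and flag dates
    where they disagree by more than `tol` (0.5% by default).
    Both arguments are dicts mapping date string -> close price."""
    flags = []
    for date, obs in sorted(observed.items()):
        off = official.get(date)
        if off is None:
            flags.append((date, "missing in official feed"))
        elif abs(off - obs) / obs > tol:
            flags.append((date, f"official {off} vs observed {obs}"))
    return flags

# Hypothetical KRE-style discrepancy: watched close 20.89,
# official close 20.30 -- nearly a 3% gap, so it gets flagged.
alerts = flag_close_mismatches({"2009-09-08": 20.30},
                               {"2009-09-08": 20.89})
```

Flagged dates can then be excluded from backtests or corrected by hand, so at least the known discrepancies don't silently skew the results.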

To some of your points: I was fortunate to attend a very interesting seminar this summer presented by Baruch College and Bloomberg Alpha: ARPM'09 (Advanced Risk and Portfolio Management). The presenter / author is generous enough to put all the course materials, as well as his entire textbook in slide form, on his website, which can be accessed here: http://www.baruch.cuny.edu/math/arpm2009/course.html and here: http://www.symmys.com -- the book is a good buy as well, by the way. You may find some of the material interesting. I've been focused on using PCA as a forecasting tool, with some initially promising results (although I'm still early in the development / testing).

My own view and hope is that statistics is a tool to help with constructing and evaluating forecasts. As you clearly indicate, one has to be very careful in how various statistics are applied to data. For example, applying many statistics to price data is generally taboo, while applying them to return data is potentially useful. I tend to focus only on dividend-adjusted return data (actually log returns) and only use price series when I'm buying or selling.
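The log-return bookkeeping described above is a one-liner. A minimal sketch, assuming the input series is already dividend-adjusted:

```python
import math

def log_returns(adj_closes):
    """Daily log returns from a dividend-adjusted close series.
    Log returns add across days, which makes multi-day statistics
    simpler than working with raw prices."""
    return [math.log(b / a) for a, b in zip(adj_closes, adj_closes[1:])]
```

For example, two consecutive +10% days each give the same log return, and their sum equals the log return of the whole two-day move; that additivity is the usual reason for preferring log returns in statistical work.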

Finally, I'd add that, given everything I've read and learned on the subject of building trading systems, there is clearly no one right way to do it. Given the thought you've put into your questions, I'd bet the common-sense approach you take to designing and building systems is as good as most other approaches out there and would probably serve as a "best practice" for many.

Regards,

Steve

I'll check out those leads. Thanks!

About data: I tend to think that if there are errors in the data, then so long as the errors are not too frequent, extreme, or skewed, they should wash out as the sample size increases. The issue takes on a different meaning, however, for thinly traded stocks and older split-adjusted stocks. For example, if average trades are ±2%, a series of 10% discrepancies can really distort results, even if the errors are random.

Survivorship bias in the testing data is of most concern to me.

Yes, common sense does seem the most potent ingredient in a researcher's toolkit. I have seen experts in every field become fixated on the measurement tools their field embraces and mistake the measurements for plain observation. Experts can become quite precise about how inaccurate they are. They can become entrenched in supporting a wrong idea, especially in the face of the obvious. Check the link below.

I find the debates among economists, politicians, and various other schools of thought interesting. Ambiguity is hard to accept, so rather than admitting doubt, one subscribes to the models one knows and loves, even if they're bunk. (I think of the story of a woman looking for a ring on a dark road under a streetlamp. When asked where she thought she had lost her ring, she answered, "Across the street." When further asked, "Why are you looking here, then?", she answered, "Because it is much easier to see under this lamp.")

-----

Tuesday, 9 September 2008

How to be sure that your beliefs are not just a load of bull

In his film, Al Gore quoted Upton Sinclair: "You can't make somebody understand something if their salary depends upon them not understanding it." That's true, but we need to remember people who work for lobby groups and charities have salaries, too, and that wages are not the only things we value. A deeper, broader version of Sinclair's maxim would be, "You can't make somebody understand something if their existing world view depends upon them not understanding it."

Latest column in The Herald, by Julian Baggini: http://www.heraldscotland.com/how-to-be-sure-that-your-beliefs-are-not-just-a-load-of-bull-1.839140
