Genetic Optimizer - Guidelines / How To for selection of parameters

kribel

#1

8/14/2013 12:59 AM

Hello WealthLab users & support,

I am writing this post in the hope that those of you who are familiar with the Genetic Optimizer in WealthLab could help create a user guideline or a how-to.

I think there are some users out there who would like to use the Genetic Optimizer but do not know how to calculate appropriate values for Population Count and Generation Count, or what to choose for Selection method, Crossover method, Mutation method and Mutation Probability.

Given the number of strategy parameters and the number of possible combinations reported by the Exhaustive optimizer, we could create a guideline for figuring out these settings.

What do you think? Can we do that?

Many thanks,
Konstantin

Eugene

#2

8/14/2013 1:20 AM

Here's the guideline/how-to:

Genetic Optimizer

It describes what to select for Selection/Crossover/Mutation method. As to the numeric parameters, no formal rule exists for finding the best combination. Feel free to experiment.

kribel

#3

8/14/2013 1:28 AM

Hi Eugene,

Many thanks. That covers one part of the guideline.

I understand that there is no formal rule for the best combination, but I am sure other users have experience with it. Therefore I think we could come up with a best-practice guideline for calculating these parameters. Don't you think?

I hope Genetic Optimizer users will share their experience in this thread.

Cheers,
Konstantin

LenMoz

#4

8/14/2013 11:22 AM

Hi Konstantin,

My experience with the Genetic option of the Optimizer...

First, some background:
I have two usual backtest scenarios, both using 10 years of data. The first is a 210-symbol backtest that takes one and a half minutes to run; the second is an S&P 500 backtest that takes three and a half minutes. My strategies have about 10 parameters and include a NeuroLab component. So, given how long a single backtest takes, exhaustive optimization is not a realistic option. My genetic optimizer runs take from 20 minutes to 3 hours, depending on how many exhaustive possibilities there are.

Optimization Technique
1. Before optimizing, I use multi-symbol backtesting to give me reasonable ranges for each parameter.
2. I find I have to limit the optimizer to fewer than 100 exhaustive possibilities, more like 75 (my system only has 3GB of memory). This limits me to varying only 3 to 4 parameters per optimizer run, so I may do several optimization runs, varying different combinations of parameters.
3. For "Settings", I use a population count of 75 or 100 and a generation count of 20, for no particular reason except that it works for me.
4. I suspect, but can't prove, a memory leak (from NeuroLab?), so I usually restart WLP immediately before doing an optimizer run.
5. Perhaps this is obvious, but I start with wide ranges and big increments, then narrow the ranges with smaller increments in subsequent optimizer runs. I never make the increments too small (less than 5% of the range) for fear of overtraining, or tuning to an outlier.
6. On my system, if the optimizer's initial estimate of time remaining exceeds 2 days, chances are the optimization run will crash before finishing. This could be a function of my hardware.
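Points 2 and 5 above can be checked before launching a run. The sketch below is only an illustration, not a Wealth-Lab feature: it assumes each parameter is swept from a start to a stop value (inclusive) by a fixed step, multiplies the per-parameter value counts to get the exhaustive combination count, and flags any step smaller than 5% of its range. The parameter names and ranges are hypothetical.

```python
from math import prod

MAX_RUNS = 75  # Len's practical ceiling on exhaustive possibilities (his hardware)

def grid_size(start, stop, step):
    """Number of values a parameter takes when swept start..stop (inclusive) by step."""
    return int(round((stop - start) / step)) + 1

def exhaustive_runs(ranges):
    """Product of the per-parameter value counts, i.e. the exhaustive combination count."""
    return prod(grid_size(*r) for r in ranges)

def step_too_small(start, stop, step, min_fraction=0.05):
    """Len's rule of thumb: flag increments smaller than 5% of the range."""
    return step < min_fraction * (stop - start)

# Hypothetical strategy parameters as (start, stop, step)
ranges = {
    "StopLoss": (85, 95, 5),
    "Period": (10, 50, 10),
    "Threshold": (0.5, 2.0, 0.5),
}

total = exhaustive_runs(ranges.values())
print(total, "combinations;", "OK" if total <= MAX_RUNS else "too many, vary fewer parameters")
for name, r in ranges.items():
    if step_too_small(*r):
        print(f"{name}: step is smaller than 5% of its range")
```

Here the three hypothetical parameters give 3 × 5 × 4 = 60 combinations, under the 75 ceiling, and no step is flagged.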

The biggest problem I have is out-of-memory(?) crashes, which generally occur 2 hours into the run. Not fun. My 3GB of memory may be a factor.

Hope this helps.

Len


Eugene

#5

8/14/2013 2:09 PM

Len,

Chances of memory leaks are minimal in .NET, as the runtime takes care of memory allocation and the Garbage Collector (improved in .NET 4) reclaims unused memory; more likely, some objects never get disposed (or get disposed, but not fast enough). 3 GB is a rather small amount, so you might really benefit from installing, say, 8-16 GB and switching to 64-bit WLP by downloading and installing 64-bit Wealth-Lab Pro from Fidelity's website.

kribel

#6

8/15/2013 12:30 PM

Hi Len, Eugene,

I see the discussion is going off topic, though it is still interesting to me. Therefore I opened another thread to take this discussion a little further. Here is the link: http://www.wealth-lab.com/Forum/Posts/How-to-tweak-the-system-to-boost-WealthLab-Optimizer-speed-33480/

Back to the topic. Len, thank you very much for your input! This is what I understand from your description:

100 parameter combinations or less means:
Population Count = 75..100
Generation Count = 20

Do you actually really mean 100 combinations or rather 100k?

@Eugene:
The Genetic Optimizer guideline (http://www2.wealth-lab.com/WL5WIKI/GeneticOptimizer.ashx) says that the default settings can be used for up to 200k combinations. That is:
Population Count = 100
Generation Count = 20

The number of combinations is whatever the Exhaustive optimizer shows, divided by the number of symbols in the selected data set. Is that right?

I also found out that there are maximum values:
Population Count Max = 1000
Generation Count Max = 100

The question is: which values do I use if I go beyond 200k parameter combinations? Can the rule of three (a simple linear proportion) be used here? Or do we need a more complex formula to calculate appropriate values for the Genetic Optimizer settings?

Is anybody out there using the Genetic Optimizer with more than 200k parameter combinations?
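To make the rule-of-three question concrete, here is what a purely linear scaling of the defaults would look like. This is a hypothetical sketch, not an official Wealth-Lab formula: it takes the defaults (100/20) and the 200K reference from the guide, the maxima (1000/100) from above, and scales both counts by combinations / 200K, capped at the maxima.

```python
DEFAULT_POP, DEFAULT_GEN = 100, 20   # defaults quoted from the guide
REFERENCE_RUNS = 200_000             # "up to 200K runs" per the guide
MAX_POP, MAX_GEN = 1000, 100         # maximum values found in the settings

def rule_of_three(combinations):
    """Hypothetical linear scaling of the defaults (NOT an official formula)."""
    if combinations <= REFERENCE_RUNS:
        return DEFAULT_POP, DEFAULT_GEN
    factor = combinations / REFERENCE_RUNS
    pop = min(MAX_POP, round(DEFAULT_POP * factor))
    gen = min(MAX_GEN, round(DEFAULT_GEN * factor))
    return pop, gen

for n in (150_000, 400_000, 10_000_000):
    print(n, rule_of_three(n))
```

One thing the sketch makes visible: if both counts grow linearly, the number of chromosomes scored (roughly population × generations) grows with the square of the factor. At 400K combinations the result is (200, 40), i.e. about 8,000 scorings versus 2,000 at the defaults, four times the work for twice the combinations. Scaling only one of the two counts would keep the growth linear.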

Many thanks,
Konstantin

LenMoz

#7

8/15/2013 12:56 PM

Konstantin, (Sorry for the off topic)

QUOTE:
100 parameter combinations or less means:
Population Count = 75..100
Generation Count = 20

Do you actually really mean 100 combinations or rather 100k?

I don't know the internals of the Optimizer (Opt) or the theory behind it, but I did not conclude that population count is ever multiplied by generation count. That seems prohibitive. My assumption was that Opt would generate a population of 75 or 100 members, possibly with duplicates, and combine the best genes for up to 20 iterations, unless the goal converges in fewer. Since in my case it takes minutes to evaluate one member (set of parameters), that more or less matched my experience in terms of timing.

Also, in my experience, even when the initial time estimate is more than 24 hours, don't despair: the Opt run may well complete in under 2 hours.
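One way to reason about run time is to bound the number of backtests. For a generic GA (not necessarily Wealth-Lab's implementation), each generation scores at most about one population's worth of chromosomes, so population × generations is a rough ceiling. If, and this is an assumption not confirmed in this thread, results for parameter sets already evaluated are reused, the number of distinct backtests also cannot exceed the exhaustive combination count. The function name and the 1.5-minute figure (taken from Len's 210-symbol backtest) are illustrative.

```python
def max_backtests(population, generations, exhaustive_combinations=None):
    """Rough upper bound on backtests for a generic GA.

    ASSUMPTION: duplicate parameter sets are cached, so distinct backtests
    are also capped by the exhaustive combination count when it is given.
    """
    scorings = population * generations
    if exhaustive_combinations is None:
        return scorings
    return min(scorings, exhaustive_combinations)

minutes_per_backtest = 1.5  # Len's 210-symbol scenario
n = max_backtests(75, 20, exhaustive_combinations=75)
print(n, n * minutes_per_backtest / 60)  # prints: 75 1.875
```

Under that assumption, a 75-member, 20-generation run over 75 exhaustive possibilities needs at most 75 backtests, under two hours at 1.5 minutes each, rather than 1,500.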

Len

LenMoz

#8

8/15/2013 3:14 PM

Konstantin,

Would you care to describe your Optimization problem? You say,

QUOTE:
The question is which values do I use if I go beyond 200k of parameter combinations?

That sounds like the number for an exhaustive calculation. The advantage of the Genetic algorithm is that it avoids work by creating a smaller set of random combinations, then crossing parameters among better-performing chromosomes. Poorly performing combinations don't get to the subsequent generation.
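The mechanism described above (a random initial population, crossing parameters among better performers, poor performers dropped) can be sketched as a generic GA over discrete parameter grids. This is not Wealth-Lab's code: tournament selection, uniform crossover, random-reset mutation, elitism and the score cache are common textbook choices picked here for brevity, and Wealth-Lab's Selection/Crossover/Mutation options may work differently. `toy_fitness` is a hypothetical stand-in for a backtest.

```python
import random

def genetic_search(grids, fitness, population=20, generations=10,
                   mutation_rate=0.1, seed=0):
    """Generic GA sketch. grids: allowed values per parameter; fitness: higher is better."""
    rng = random.Random(seed)
    cache = {}  # score each distinct chromosome (parameter set) only once

    def score(chrom):
        if chrom not in cache:
            cache[chrom] = fitness(chrom)
        return cache[chrom]

    def tournament(pop):
        # Selection: the better of two random chromosomes becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Generation 0: random combinations drawn from the grids.
    pop = [tuple(rng.choice(g) for g in grids) for _ in range(population)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: the best survives unchanged
        while len(children) < population:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each gene is taken from one parent at random.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally replace a gene with a random grid value.
            for i, values in enumerate(grids):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(values)
            children.append(tuple(child))
        pop = children  # poorly performing chromosomes rarely reach this list
    best = max(pop, key=score)
    return best, score(best), len(cache)

def toy_fitness(params):
    """Hypothetical stand-in for a backtest: peaks at stop loss 90, period 30."""
    stop_loss, period = params
    return -((stop_loss - 90) ** 2 + (period - 30) ** 2)

grids = [list(range(80, 100, 5)), list(range(10, 60, 10))]
best, value, backtests = genetic_search(grids, toy_fitness)
print(best, value, backtests)
```

The third returned value counts distinct parameter sets actually scored; with only 20 possible combinations in this toy grid it can never exceed 20, however many generations run.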

Also, I'll restate my initial point 5...

QUOTE:
5. Perhaps this is obvious, but I start with wide ranges and big increments, then narrow the ranges with smaller increments in subsequent optimizer runs. I never make the increments too small (less than 5% of the range) for fear of overtraining, or tuning to an outlier.

For example, I have a "stop loss" parameter, which I typically set between 85 and 95 (%). For the initial optimization, I'll use a starting value of 85, an ending value of 95, and a step size of 5, or even 10. For this parameter, the Optimizer has only 3 values (step 5) or 2 values (step 10) to work with. If the results of that Opt run show that all the best-performing chromosomes have 85, I might wonder whether lower is better and do a 2nd run from 80 (not available in the 1st run) to 85, step 5. The point I'm trying to make is that too small a step size tunes the strategy to the data and won't generalize well, not to mention that it multiplies the time the optimizer run takes. I see in another thread that you are considering hardware solutions. Simplifying the problem is another approach.

If this is not helpful to you, let me know.
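The stop-loss example can be spelled out in a few lines. `sweep` lists the values an inclusive start/stop/step range yields; `next_range` is a hypothetical helper (not part of Wealth-Lab) capturing the "if the best value sits on an edge, probe past it" idea with the same step.

```python
def sweep(start, stop, step):
    """Values tried for start..stop (inclusive) in increments of step."""
    values, v = [], start
    while v <= stop:
        values.append(v)
        v += step
    return values

def next_range(start, stop, step, best):
    """Hypothetical helper: if the best value is on an edge, shift the window past it."""
    if best == start:
        return (start - step, start, step)
    if best == stop:
        return (stop, stop + step, step)
    return None  # best is interior: no need to extend the range

print(sweep(85, 95, 5))               # [85, 90, 95]
print(sweep(85, 95, 10))              # [85, 95]
print(next_range(85, 95, 5, best=85)) # (80, 85, 5)
print(sweep(*next_range(85, 95, 5, best=85)))  # [80, 85]
```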

Best Regards,
Len


kribel

#9

8/26/2013 1:35 AM

Hi Len,

QUOTE:
I see in another thread that you are considering hardware solutions. Simplifying the problem is another approach.

That is right! Nevertheless, simplifying the problem is not always an option, and it is certainly no reason to leave hardware out of scope.

QUOTE:
Would you care to describe your Optimization problem?

I understand the way the Genetic Optimizer works. My problem is how to calculate the right values for population and generation count when the required runs exceed 200k.

The Genetic Optimizer guide says:

QUOTE:
No formal rule exists for finding the best combination. We recommend starting your optimization using default values for up to 200K runs required.

The 200k required runs mentioned in the guide is a reference number from the Exhaustive Optimizer. I guess it is the value shown in the Runs Required field when the Exhaustive Optimizer is selected in Wealth-Lab, divided by the number of symbols in the selected data set.

The formula would be: