Any plans for multicore support?
Author: SunriseMan
Creation Date: 4/11/2011 7:57 PM

SunriseMan

#1
Are there any plans to add multicore support to WL? Almost every computer being sold (even netbooks) is now multicore, so I'm sure this would be popular. I'd really like to see the speed-up that could occur with backtesting and optimization if all four of my cores were utilized.

I saw a question about whether this would be included in WL6 (obviously not), but I haven't seen anyone address whether it's on the roadmap for a future release.

Eugene

#2
Let's say, it's not going to appear in version 6.2 -- its highlight will be multi-system backtesting (a new type of strategy called Combination Strategy).

Everyone interested, please consider calling your Fidelity rep and telling them you need multi-core support in WL6.

joannakim

#3
Please excuse my ignorance, but what would multi-core support in WLP do? I have a quad-core Intel chip in my PC. I would love to know more about how multi-core support would enhance WLP's performance. Thanks

Eugene

#4
To quote the topic starter:
QUOTE:
"I'd really like to see the speed-up that could occur with backtesting and optimization if all four of my cores were utilized."

dan_rozenberg

#5
Would WL really benefit from multi-core support (same as multi-threading?)? Aren't most of WL's calculations done in a linear fashion?

Eugene

#6
Right now WL 6.2 doesn't utilize all of a CPU's cores. Consequently, the typical CPU load of a heavy backtest on a quad core would be 25% vs. 100% with multi-core support. Add to that parallel optimizations, or parallelizing a single optimization run.

hlh

#7
In the good old WL4 times one could copy WL.exe to WL1.exe, WL2.exe, ... to run 4 instances of WL on a quad-core CPU and split long optimization runs over the 4 processes. Can I do the same with WL6 (6.2 Dev. 64-bit)? I thought I read somewhere here in the forum or the Wiki that you can start WL as another user to get another instance. This did not work for me (I tried to start the very same exe). Is there a way to do that?

Eugene

#8
Yes you can. Cumbersome but should work:

"Workspaces" provide a quick and convenient way to do what might otherwise require multiple instances. Still, it's possible to run several copies of Wealth-Lab 6 under different Windows user names.

dan_rozenberg

#9
Eugene, are there any plans to add multi-core support to Exhaustive backtesting in the next version (6.3)?

The backtester could run each run of the optimization on a different core, if it was coded right, correct?

Eugene

#10
QUOTE:
The backtester could run each run of the optimization on a different core, if it was coded right, correct?

It is coded right. Only .NET 4.0 brings true parallelism (well, .NET 3.5 with Parallel Extensions, to be precise), and we're on .NET 2.0.

Furthermore, the existing Optimizer API is not sufficient to create your optimizers supporting parallel optimization (been there done that). Unless you go out of your way (but please don't ask me).
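
A minimal sketch of what "each run of the optimization on a different core" could look like with the .NET 4.0 Task Parallel Library. It is not Wealth-Lab's Optimizer API: `RunBacktest`, `Optimize` and `BestParameter` are hypothetical names, and the toy fitness function merely stands in for a real backtest.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelOptimizerSketch
{
    // Hypothetical stand-in for one backtest run: maps a parameter value
    // to a fitness score. A real run would execute the Strategy instead.
    public static double RunBacktest(int parameter)
    {
        // Toy fitness curve that peaks at parameter == 37.
        return -Math.Pow(parameter - 37, 2);
    }

    // Runs one backtest per parameter value in [start, stop] and returns the scores.
    // Each iteration writes only to its own array slot, so no locking is needed.
    public static double[] Optimize(int start, int stop, Func<int, double> backtest)
    {
        double[] scores = new double[stop - start + 1];
        Parallel.For(start, stop + 1, p =>
        {
            scores[p - start] = backtest(p);
        });
        return scores;
    }

    // Returns the parameter value with the highest score.
    public static int BestParameter(int start, double[] scores)
    {
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best]) best = i;
        }
        return start + best;
    }
}
```

Because every run is independent, the loop body needs no locks. For example, `Optimize(1, 100, RunBacktest)` followed by `BestParameter(1, scores)` picks parameter 37 for this toy curve.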

dan_rozenberg

#11
Ok, understood. Thanks!

hlh

#12
As long as WL still does not use multiple cores, I wonder whether running with Hyper-Threading on an Intel i7-2600K, for example, would even be slower than with HT turned off, or than simply using an Intel i5-2500K, which does not support HT.

I need to upgrade RAM to 16 GB, hopefully to let WL handle intraday data for more symbols and years, so I am thinking of upgrading the mainboard and CPU on the development computer as well. I plan to go for the i7-2600K anyway (not the i5-2500K), but then I realized that one difference between those CPUs is HT, and the question of whether HT might even slow down single-core apps like WL came to mind.

Is the Turbo Boost feature something that would or could speed up a single process? Is it recommended to use HT or to turn it off for WL?

And: please make WL support multicore a.s.a.p. For me and the many other traders and WL users I work with, this would be one of the two most important issues! The other one is stability when running it over more data! Way more important than some fancy features for the time being. Thanks a lot!

abegy

#13
Agree with you! Must be one of the top priorities.

Eugene

#14
What does the phrase "make WL support multicore" really mean to you?

If it means "magically parallelize my Strategy's calculations" then it's not realistic to expect.

Besides the obvious multi-core optimization enhancement which would speed things up, in which tool do you expect to get the boost from parallel extensions, if at some point in time they appeared by virtue of .NET 4.0?

jalalfeghhi1

#15
Eugene,
I have a 12 core processor and am getting about 8% cpu utilization. I have toyed with the idea of writing my own optimization which can leverage the new multi core design. It would solve another problem that I have. Because I develop code in CSharp, the built-in optimizer does not allow me to remove some of the parameters to speed up the process. This is very annoying and requires me to change my code.

One issue that I see is the built-in 1-parameter and 2-parameter graphs, which are really nice. While I can develop the calculations rather quickly, I have no idea how to re-create the built-in graphs that work with sliders.

Do you have any suggestions for me? Is there any way I can gain access to the built-in graphics capabilities? Any other ideas?

-thanks, J

Eugene

#16
J,

The built-in graphic capabilities (i.e. the part of the Exhaustive optimizer that does the charting) are proprietary Fidelity code (closed source). Consequently, anyone who wishes to recreate them in their own optimizer would have to do it from scratch. You might want to raise this question in a different thread, as it's getting too specific for this general thread.

Cone

#17
If I'm not mistaken, Wealth-Lab uses TeeChart for those graphs. Price charting, on the other hand, comes from proprietary Wealth-Lab code.

Eugene

#18
Right, but TeeChart controls are licensed for use in Wealth-Lab and therefore shouldn't work in 3rd party assemblies out of the box (without purchasing a license).

Eugene

#19
I think that "add multi-core CPU support" sounds too generic. Based on user demand, it would be a milestone if the product offered exhaustive optimization boosted by multi-core support. But from any practical standpoint, updating the .NET framework version to 4.0 first is a prerequisite because it brings support for parallelizing tasks. I'd vote for these 2 items, .NET4 and parallelized optimization, for 6.4 or 6.5.

jalalfeghhi1

#20
Eugene, Cone,
Thanks much for your feedback. I did a bit of research and I agree that WLP support for .NET 4.0 is essential to implement exhaustive optimization. I will give Fidelity a call to push it from our side.

-Best, J

festipower

#21
Hello to all.

I have been working for a long time on multithreaded applications using the .NET platform, and I can say that .NET 4.0 isn't required at all to parallelize Wealth-Lab backtests and optimizations. It could make it easier (or not, depending on the approach), but it isn't necessary at all.

On the other hand, I think the best strategy to accelerate the execution would be a multi-level parallelization:

------1.-Execution of the strategy on each symbol of a Data Set in parallel (strategy-level parallelization):
This parallelization seems a priori a good candidate for multiple threads, using semaphores or other structures to synchronize access to data structures shared by all threads (the Data Set?). With proper design, I suspect that the level of parallelization could be very high. The solution does not seem excessively complicated:
A)-A thread reads the Data Set.
B)-As it reads each symbol, it launches another thread executing the strategy for that symbol.
C)-When all the threads have finished executing the strategy, another thread is responsible for conducting the 'position sizing' tasks (this cannot be easily parallelized).

There should be a way (perhaps an overridable property in class WealthScript) to enable or disable the parallelization of a strategy, as there may be cases where the strategy should be executed in the traditional mode (sequentially).

------2.-Execution of the backtests composing an optimization in parallel (optimization-level parallelization):
This parallelization could be implemented very easily, in my opinion, using Background Workers (or Tasks in .NET 4).

The best type of parallelization would, in my view, be type 1 (strategy-level parallelization), especially for backtests on Data Sets with a large number of symbols. In addition, this parallelization would also directly benefit the optimizers.
Type 2 parallelization would be good especially for optimization of strategies using a single symbol or Data Sets with few symbols.

Ideally both parallelizations should be implemented in Wealth-Lab.
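
Steps A) to C) of the strategy-level scheme could be sketched roughly as follows, using only APIs that already exist in .NET 2.0 (`ThreadPool`, `ManualResetEvent`, `Interlocked`, `lock`). This is an illustration under assumptions, not Wealth-Lab code: `SymbolStrategy`, `Run` and `SizePositions` are hypothetical names, the per-symbol strategy is reduced to "return a trade count", and symbols are assumed to be unique.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PerSymbolBacktestSketch
{
    // Hypothetical per-symbol strategy run: returns the number of raw trades found.
    public delegate int SymbolStrategy(string symbol);

    public static Dictionary<string, int> Run(IList<string> symbols, SymbolStrategy strategy)
    {
        Dictionary<string, int> rawTrades = new Dictionary<string, int>();
        if (symbols.Count == 0) return rawTrades;

        int pending = symbols.Count;
        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            // Steps A and B: walk the Data Set and queue one work item per symbol.
            foreach (string symbol in symbols)
            {
                string s = symbol; // private copy for the closure
                ThreadPool.QueueUserWorkItem(delegate
                {
                    int trades = strategy(s);
                    // The result table is the only shared structure, so guard it.
                    lock (rawTrades) { rawTrades[s] = trades; }
                    // The last work item to finish releases the waiting thread.
                    if (Interlocked.Decrement(ref pending) == 0) allDone.Set();
                });
            }
            allDone.WaitOne(); // block until every symbol has finished
        }
        return rawTrades;
    }

    // Step C: position sizing stays sequential, after all symbols are done.
    // Here it is reduced to capping the total number of trades.
    public static int SizePositions(Dictionary<string, int> rawTrades, int maxPositions)
    {
        int total = 0;
        foreach (int n in rawTrades.Values) total += n;
        return Math.Min(total, maxPositions);
    }
}
```

The sketch omits error handling: an exception inside a queued work item would need to be caught and reported, otherwise the waiting thread is never released.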


Eugene

#22
Carlos,

Thank you for your suggestions. We will forward them to the developers.

Re: .NET4. Using the .NET4 features rather than dealing with the thread pool and locks directly would make the resulting code more efficient. But upgrading to .NET 4.0 would not only let us take advantage of the new parallel programming features: we would also be able to utilize .NET4 assemblies in WL6.x and benefit from the enhanced garbage collection, to name a few.
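
One way the .NET 4 features can reduce manual locking is the `Parallel.For` overload with thread-local state. The sketch below is only an illustration of that overload, not Wealth-Lab code: `BestRunSketch`, `Run` and `FindBest` are hypothetical names, and `backtest` stands in for a real optimization run.

```csharp
using System;
using System.Threading.Tasks;

public static class BestRunSketch
{
    // Result of one hypothetical optimization run.
    public struct Run
    {
        public int Parameter;
        public double Score;
    }

    // Finds the best-scoring parameter in [start, stop].
    // Each worker keeps its own running best (thread-local state), so the lock
    // is taken once per worker at the end rather than once per run.
    public static Run FindBest(int start, int stop, Func<int, double> backtest)
    {
        object gate = new object();
        Run best = new Run { Parameter = start, Score = double.NegativeInfinity };

        Parallel.For<Run>(start, stop + 1,
            // localInit: the starting "best so far" for each worker
            () => new Run { Parameter = start, Score = double.NegativeInfinity },
            // body: run one backtest and update the worker's local best
            (p, loopState, local) =>
            {
                double score = backtest(p);
                if (score > local.Score)
                {
                    local = new Run { Parameter = p, Score = score };
                }
                return local;
            },
            // localFinally: merge each worker's best into the shared result
            local =>
            {
                lock (gate)
                {
                    if (local.Score > best.Score) best = local;
                }
            });

        return best;
    }
}
```

If several parameters tie for the best score, which one is returned depends on scheduling.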

festipower

#23
Eugene,

Yes, you are totally right about .NET4. I think the same.

I just wanted to say that .NET4 isn't required in order to make the program multithreaded.

The problem that Wealth-Lab solves when backtesting and optimizing, and the way it arrives at the solution, is easy to parallelize regardless of the .NET version used. The time spent executing those tasks would be greatly reduced on modern multicore hardware.

Regards.

dan_rozenberg

#24
Hi Eugene,

Is there an ETA for version 6.3?

Eugene

#25
Hi Dan,

The features we're talking about here (the topic says "multicore support") will not make it into version 6.3.

Cone

#26
ETA next week, unless something changes.

However, while speaking of changes, the S. Monitor enhancement (with respect to Fidelity data) had to be delayed until 6.4. Consequently, 6.3 is essentially a maintenance-only release, i.e., bug fixes.

skalman99

#27
How to utilize several CPU cores when doing exhaustive backtesting:
----------------------------------------------------------------------

1. Put several copies of the (almost) same strategy in same C#-file. They need to have separate GUIDs and names:

namespace Strategies
{
public class Strategy1 : WealthScript ...

public class Strategy2 : WealthScript ...

}

2. Assume you need to test StrategyParameter strapParam1 for values 1 to 100. In Strategy1 include line
strapParam1 = CreateParameter("param1", 1, 1, 50, 1);.
In Strategy2 use this line instead:
strapParam1 = CreateParameter("param1", 51, 51, 100, 1);

3. Now open 2 new separate workspaces. In the first workspace open Strategy1, in the second open Strategy2. Click optimize in each
of the workspaces. Voila, CPU utilization will be twice as high. (This can of course be repeated for the number of cores you have.)

Regards Jon Brewer
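
The manual split in steps 2 and 3 can be computed for any number of cores. The helper below is a hypothetical sketch (the name `ParameterRangeSplitter` is not part of Wealth-Lab), assuming an integer parameter with a step of 1.

```csharp
using System;
using System.Collections.Generic;

public static class ParameterRangeSplitter
{
    // Splits the inclusive range [start, stop] into `parts` contiguous,
    // non-overlapping sub-ranges whose sizes differ by at most one.
    // Each pair holds (sub-range start, sub-range stop).
    public static List<KeyValuePair<int, int>> Split(int start, int stop, int parts)
    {
        if (parts < 1) throw new ArgumentOutOfRangeException("parts");
        if (stop < start) throw new ArgumentException("stop must be >= start");

        int count = stop - start + 1;
        parts = Math.Min(parts, count);   // never create empty sub-ranges
        int baseSize = count / parts;
        int remainder = count % parts;    // the first `remainder` parts get one extra value

        var ranges = new List<KeyValuePair<int, int>>();
        int lo = start;
        for (int i = 0; i < parts; i++)
        {
            int size = baseSize + (i < remainder ? 1 : 0);
            ranges.Add(new KeyValuePair<int, int>(lo, lo + size - 1));
            lo += size;
        }
        return ranges;
    }
}
```

`Split(1, 100, 2)` yields the two ranges used above, 1 to 50 and 51 to 100; each pair would become the start and stop values in one strategy copy's CreateParameter line.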

Eugene

#28
Alternative ways:

A. Launch two (three...) WL instances with the same strategy in different Windows accounts (usernames). Tweak the Strategy Parameters so they don't overlap.

B. Use Genetic Optimizer.

dan_rozenberg

#29
Now that 6.3 is out... what are our chances for multicore support in 6.4? This is the one thing I pray for every night before I go to bed!!

Cone

#30
It's not in the cards for 6.4, but at least 6.4 should be built on .NET 4.0. Baby steps...

hlh

#31
Reading that garbage collection would work better in .NET4: that alone would be worth stopping everything else at Fidelity (including their core brokerage business) until it is done in WL.

Multicore usage is a NO BRAINER and A MUST for software whose core functionality is to loop through a bunch of data (series) and do mathematical operations. Whoever has not used the Optimizer extensively has not developed or backtested a trading strategy. So even if WL only allowed starting n optimizations on an n-core machine without this WL instance trick (which I did not manage to get to work), this would be a huge step in the right (multi-core) direction.

All the fancy stuff is nice, but first of all a stable and fast backtesting engine (which, from time to time, gives back some of the enormous gigabytes of RAM it consumes) is key (for me at least). Being able to use .NET4 stuff would be very cool too (but let us not forget that some bugs need to be fixed as well).

P.S.: I asked once but have had no answer so far: some of these Intel CPUs have Hyper-Threading. For the old single-core WL versions 6.3 and 6.4, is it (theoretically and/or practically) better to switch that off, so that WL would not end up using only half of a physical core, making it even slower? Or does Windows use the full core if required anyway, even if HT is on? Thx!

Eugene

#32
Well, then our definitions of "no brainer" are drastically different. It's not a no-brainer to add multi-core support to a mature application of this scale. What may be natural these days, with the advent of the Task Parallel Library in .NET 4.0, was nowhere near as easy during Wealth-Lab development in 2007, when .NET 2.0 was pretty much new.

Furthermore, don't forget that 1) multi-core CPUs were not installed in every PC in 2007-2008, and 2) C# Strategies by themselves offered a 2-10x speed boost compared to slow, interpreted ChartScripts.

20/20 hindsight is always easy. ;)

Eugene

#33
P.S. Having multi-core CPU support would be a blessing, but GPGPU support might even surpass it speed-wise...

hlh

#34
I do not argue with the history (and in hindsight I am always good, I think I have that in common with all the financial news which, after the close, can always tell us why it went up, down or sideways today).

I am speaking about the present, today, where I let freeware run on my PC that uses cores I wasn't even aware I have and, as you mentioned, also uses my video card to do ugly-fast calculations.

Even my cell phone, once invented for making calls (not sure if it still can do that), nowadays has a quad-core CPU to multitask between Angry Birds and the Facebook app flawlessly. So I am looking forward to WL catching up ;-)

And still, any opinion from anyone on that Hyper-Threading question is very much appreciated.

Thanks!

HendersonTrader

#35
Regarding hyper-threading technology:
1) A Pentium 4 (e.g. one physical core, two logical cores) with Windows XP and hyper-threading enabled in the BIOS was slower for W/L 5.x.
2) A current i7 (e.g. four physical cores, eight logical cores) with Windows 7 runs the current W/L release in the same elapsed time whether HT is enabled or disabled in the BIOS.
The current Intel implementation of hyper-threading comes very close to eliminating the performance hit. Current HT does not dispatch the second logical core on a physical core when W/L is utilizing a high percentage of a physical core.

dirkp

#36
Hi guys,

it's been quiet here in this thread. I just wanted to get an update on the multicore support issue. WL6.4 has been recently released. Will the next versions 6.5/6.6 have multicore support?

Thanks!!

Eugene

#37
The Wealth-Lab Strategy backtesting has been fairly fast in its present state, and honestly, I do not think that it requires any multicore support.

What appears to be the natural target for enhancement by the parallel execution (i.e. multicore support) is the Optimization tool, especially since 6.4 is based on .NET 4 and can utilize Tasks. Unfortunately, the Optimizer API design in Version 6 is absolutely not suitable for developing 3rd party multicore-enabled optimizers.

If only we were able to get the Optimizer API enhanced, that could move us forward. I've been trying to lobby this point of view for a long time, but unfortunately, I don't have any good news to tell you at this point.

dan_rozenberg

#38
Eugene, whom should I contact in order to lobby for multi-core optimization as well?

Cone

#39
We (MS123) are your Wealth-Lab Developer contact. Fidelity Wealth-Lab Pro customers should call their reps. Ultimately, changes to the thick client (Wealth-Lab) are business decisions made by Fidelity, which can be demand-driven, but they compete for the same resources allocated by other planning. It's a balancing act.

dirkp

#40
Thanks for the update! My question was concerning the optimization tool as this can take up a long time. Anyways hopefully you can lobby this for us Eugene. Good luck!!

kribel

#41
Hello Eugene, Cone,

It has been a year since the last update in this thread. What is the current state? Could you please give us an update?

I found this website: http://www2.wealth-lab.com/WL5Wiki/Print.aspx?Page=OpenIssues

As I can see, there is the following topic:
(98348) Optimizer API not sufficient to create parallelized, multi-threaded optimizers. Leverage benefits of multi-core CPUs in Optimizer.

And it is in bold. What does that mean? What does bold stand for? Does it mean that it is going to be implemented in the next release?

Cheers,
Konstantin

PS:
Where can I find release notes for each WealthLab release?

Eugene

#42
QUOTE:
It has been a year since the last update in this thread. What is the current state? Could you please give us an update?

No, not at this time. Wealth-Lab modifications are in the hands of Fidelity, with regard to their business interests and business planning. But we made sure they're aware of how high the demand is for speedy, parallelized optimizations.

QUOTE:
Where can I find release notes for each WealthLab release?

We don't keep them anymore, if you mean Change History. For what's new in the latest build, you can always see it in the User Guide > What's new.

QUOTE:
And it is in bold. What does that mean? What does bold stand for? Does it mean that it is going to be implemented in the next release?

The list contains high-priority bugs along with deferred low-priority ones, in no particular order. Bold does not mean anything in particular, except that we may consider those to be the issues to focus on first.

kribel

#43
Hello Eugene,

many thanks for your quick reply! Here are a few further questions.

QUOTE:
No, not at this time. Wealth-Lab modifications are in the hands of Fidelity, with regard to their business interests and business planning. But we made sure they're aware of how high the demand is for speedy, parallelized optimizations.


How can we reach Fidelity to find out more about the implementation status? Would it be helpful if users asking for multi-core support signed a petition and we presented it to Fidelity management?

QUOTE:
For what's new in the latest build, you can always see it in the User Guide > What's new.

Can I also see it before I install it?

Many thanks,
Konstantin

Eugene

#44
1 - No / No.
2 - Although the complete change log is only available after installation, we always put a brief "what's new" note on the "Home Page" tool in Wealth-Lab Pro/Developer whenever a newer build becomes available.

sourkraut

#45
Here's a look at the other side:

Just how much improvement could we expect from multi-core support?

Do you remember the slowest element of computers, the I/O (disks, keyboard, screen)?
CPUs have only a single data bus and address bus. Both buses are used simultaneously, by one core at a time. Usually a core will tie up the buses for several CPU cycles at a time. During that time, the other cores must wait for their turn.

So if a CPU process takes one cycle to complete, but the core requires three cycles to load the data, and after its one-cycle process must wait for three to ten cycles to write the results back to RAM, this one-cycle process might have taken 14 cycles.

Sure, in the meantime, the other cores did something similar. But supposing the same scenario for each core, it might take 50+ cycles to perform four similar operations in parallel, while a single-core processor might complete them in only 28 cycles (3 in, 1 inside, 3 out for each operation).
This is of course all speculation, and there are several ways to truly improve speed, but do not expect a fourfold increase from a four-core processor.

Perhaps if you have a multi-bus (super) computer you could see such improvements. Where each core has its own data and address bus, you only have to worry about your particular RAM location being in use by another CPU. But that would be one heck of an expensive machine.
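
For a rough upper bound to set against this speculation, a common model is Amdahl's law, which is a different model from the bus-cycle arithmetic above: if only a fraction p of the work can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). A small sketch (the class and method names are illustrative):

```csharp
using System;

public static class SpeedupEstimate
{
    // Amdahl's law: upper bound on the speedup when only `parallelFraction`
    // of the work (0..1) can be spread across `cores` cores.
    public static double Amdahl(double parallelFraction, int cores)
    {
        if (parallelFraction < 0.0 || parallelFraction > 1.0)
            throw new ArgumentOutOfRangeException("parallelFraction");
        if (cores < 1)
            throw new ArgumentOutOfRangeException("cores");

        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }
}
```

With p = 0.8 and four cores the bound works out to 2.5x rather than 4x; only with p = 1.0 does it reach the full 4x. The model ignores memory and bus contention, which can lower the real figure further.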


I too would like to see speed improvements. However, I would much rather see the bugs corrected that make WL a crash-prone Goonybird. True, the editor is much improved, but new sources of crashes are appearing faster than fixes for old ones (Data Manager, Extension Manager).

To me, fixes for the crash problems are much more important than the promise of questionable results from multi-core support.


Eb

kribel

#46
@sourkraut:

As you already said:
QUOTE:
This is of course all speculation...


Speculations remain speculations until they get tested. Therefore I do not see any point looking for excuses to postpone this improvement.

Cheers,
Konstantin

Cone

#47
sourkraut doesn't work for Fidelity or MS123, so it wasn't an excuse. Multi-core support isn't even on the table right now. Sorry to disappoint, but WFO is next.

kribel

#48
Hi Cone,

I am aware of that.

QUOTE:
Wealth-Lab modifications are in the hands of Fidelity, with regard to their business interests and business planning. But we made sure they're aware of how high the demand is for speedy, parallelized optimizations.


Therefore I am not losing my hope. ;)

JDardon

#49
So it's been a while now and WFO is already a reality. Eugene, any news on plans for multi core optimization?

Cone

#50
There are no current plans.

Eugene

#51
However, at MS123 we are investigating the idea of parallelizing the code of Community Indicators (where possible) and maybe some of the most-used indicators (like SMA) on which the existing library code depends. If it works out, existing Strategy code that uses them may run faster in the Optimizer if the data is big enough. Currently, we're making no promises that this will be done, and no ETA exists.

You might want to call Fidelity and let them know that yet another customer wants faster multi-core optimizations, though.
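
As a rough illustration of what parallelizing an indicator might involve, here is a sketch of a simple moving average over plain arrays. It does not use Wealth-Lab's DataSeries or the Community Indicators code; `ParallelSmaSketch.Sma` is a hypothetical helper, and .NET 4.0 is assumed for `Parallel.For`.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSmaSketch
{
    // Simple moving average over plain arrays (not Wealth-Lab DataSeries).
    // Bars before the first full window are set to NaN.
    public static double[] Sma(double[] values, int period)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (period < 1) throw new ArgumentOutOfRangeException("period");

        double[] result = new double[values.Length];
        int firstValid = Math.Min(period - 1, values.Length);
        for (int i = 0; i < firstValid; i++)
        {
            result[i] = double.NaN;
        }

        // Every output bar depends only on the read-only input, so bars can be
        // computed independently and in any order.
        Parallel.For(firstValid, values.Length, bar =>
        {
            double sum = 0.0;
            for (int k = bar - period + 1; k <= bar; k++)
            {
                sum += values[k];
            }
            result[bar] = sum / period;
        });
        return result;
    }
}
```

Each bar re-sums its whole window, which costs more total work than a sequential running sum, so a gain is plausible only for long series, in line with the "if the data is big enough" caveat above.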

JDardon

#52
Sigh... Ok will call Fidelity and push them a little. I remember seeing some time ago that there was a page with development requests on which customers could issue a limited amount of votes. Does that still exist? That would really help making the point to Fidelity that SO MANY people require multi core capabilities on the optimization engine (which just about every other trading tool in the market already provides).

Is there any existing guideline in how to take advantage of parallelization in one's own indicators (such as the one you will implement later this year)? We should be able to optimize our own code already with such a guideline.


Eugene

#53
QUOTE:
Is there any existing guideline in how to take advantage of parallelization in one's own indicators (such as the one you will implement later this year)? We should be able to optimize our own code already with such a guideline.

Indicator speed improvement through parallelizing

superticker

#54
QUOTE:
[by festipower:] ... best strategy to accelerate ... [WL] would be a multi-level parallelization:
----1) Execution of the strategy on each symbol of a Data Set in parallel (strategy level parallelization):
This parallelization seems a priori a good candidate to be done by using multiple threads, using structures such as semaphores or other structures to synchronize access to data structures shared by all threads (or Data Sets??). With proper design, I suspect that the level of parallelization could be very high. The solution does not seem excessively complicated:
A)-A thread reads the Data Set.
B)-As it reads each symbol, it launches another thread executing the strategy for that symbol.
C)-When all the threads have finished executing the strategy, another thread is responsible for conducting the "position sizing" tasks (This cannot be easily parallelized).

I totally agree with festipower's comments with the exception of the semaphore part. Record locking is a database problem, not an application problem. If WL must control data access, it should do so through database record locking, not "directly" using semaphores. (The use of semaphores is for the database to do.)

But what makes WL fast today is its simplicity. If you add database record locking to the mix, you'll take a major speed hit because you're tying the hands of the Windows scheduler. Very bad idea in a simulation program, which must command minimal context-switch overhead.

Moreover, most strategies only need write access to the cache entries they create for a given symbol, not across all symbols. So if the WL DataSeries cache incorporated a hash key to include the symbol name (which I think it already does), that would be protection enough from write conflicts with other threads/tasks crunching different symbols for that particular strategy. The exception would be if a strategy was writing to an external symbol (or cache entry) where threaded tasks could have cache write conflicts.
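
A sketch of the symbol-keyed cache idea, assuming .NET 4.0. The cache layout and the "symbol|seriesName" key format are assumptions for illustration, not Wealth-Lab's actual DataSeries cache. One caveat: even when workers write disjoint keys, a plain `Dictionary` is not safe for concurrent writers, so the sketch uses `ConcurrentDictionary`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class SymbolKeyedCacheSketch
{
    // Hypothetical cache: key = "symbol|seriesName", value = computed series.
    public static ConcurrentDictionary<string, double[]> Build(
        string[] symbols, Func<string, double[]> computeSeries)
    {
        var cache = new ConcurrentDictionary<string, double[]>();

        // One unit of work per symbol; each one only writes keys that start
        // with its own symbol, so two workers never write the same entry.
        Parallel.ForEach(symbols, symbol =>
        {
            string key = symbol + "|SMA20"; // illustrative series name
            cache[key] = computeSeries(symbol);
        });
        return cache;
    }
}
```

A strategy that writes to another symbol's entries would break the "disjoint keys" assumption, which is the exception case described above.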

QUOTE:
... [There] should be a way (perhaps an override property at class WealthScript) to enable or disable the parallelization of strategy, as there may be cases where the strategy should be executed ... sequentially.

And that's the point. For those few strategies that require write access to external symbols, one has to disallow parallelism altogether. This way WL preserves its simplicity and avoids database record locking (or semaphores). Honestly, doing any kind of heavy database activity in a simulation program is a bad idea because of overhead.
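
The opt-out could look roughly like this. `StrategySketch` and `AllowParallelSymbols` are hypothetical names for illustration, not part of the WealthScript class; .NET 4.0 is assumed for `Parallel.ForEach`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical base class for illustration; not part of the Wealth-Lab API.
public abstract class StrategySketch
{
    // Strategies that write to external symbols would override this to return false.
    public virtual bool AllowParallelSymbols { get { return true; } }

    public abstract void Execute(string symbol);
}

// Example strategy that opts out and records the order in which symbols ran.
public sealed class SequentialOnlyStrategy : StrategySketch
{
    public readonly List<string> Visited = new List<string>();
    public override bool AllowParallelSymbols { get { return false; } }
    public override void Execute(string symbol) { Visited.Add(symbol); }
}

public static class StrategyRunnerSketch
{
    public static void RunAll(StrategySketch strategy, IEnumerable<string> symbols)
    {
        if (strategy.AllowParallelSymbols)
        {
            // Symbols are independent, so they may run in any order on any core.
            Parallel.ForEach(symbols, symbol => strategy.Execute(symbol));
        }
        else
        {
            // Traditional sequential mode, in Data Set order.
            foreach (string symbol in symbols)
            {
                strategy.Execute(symbol);
            }
        }
    }
}
```

With the flag off, no locking is needed anywhere, which keeps the sequential path as simple as it is today.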

QUOTE:
----2) Execution of the backtests composing an optimization in parallel (optimization level parallelization):
This parallelization may be implemented using Background Workers (or .NET 4 tasks); very easily in my opinion.

The best type of parallelization would be ... the parallelization of type 1 (strategy level parallelization), especially for backtests in Data Sets with a large number of symbols. In addition, this parallelization would also directly benefit the optimizers. Type 2 parallelization would be good especially for optimization of strategies using single symbol or Data Sets with few symbols.

Agreed. Optimizers may need to be optionally tweaked to take full advantage of this.

-----
There is a dark side to multi-threading to achieve more multi-core utilization. Does the processor chip have enough cache memory to keep all the processor cores stoked with data? That is, is there enough cache to fit this multi-core problem on chip? I think with daily bars, the answer is "yes"; otherwise, the answer is "no". So, to minimize simulation time, WL needs a method to disable some of the cores (or all core parallelism) when the bar "scale" goes beyond daily bars. This adjustment needs to be done while the WL parameter optimizer is running, by monitoring Intel-chip cache-miss rates. Some assembly is required (because C# code won't work here).

I don't believe the Windows OS supports getting into the supervisory mode of the processor chip. That's for device drivers to do. So one needs to write a read-only (for monitoring only) Windows device driver. WL can then use that device driver as a window into the processor's cache miss rate. If the miss rate gets too high, I would disable the parallelism altogether for that bar scale and range.
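
Reading hardware cache-miss counters is beyond what a short example can show. A much cruder substitute, named plainly as such, is to time a small sample of runs at several degrees of parallelism and keep the fastest. The sketch below assumes .NET 4.0; `ParallelismTunerSketch` and `PickDegree` are hypothetical names, and `work` stands in for one optimization run.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParallelismTunerSketch
{
    // Times `sampleRuns` calls of `work` at each candidate degree of parallelism
    // and returns the candidate with the shortest wall-clock time.
    // Candidates must be positive (or -1 for "no limit").
    public static int PickDegree(int[] candidates, int sampleRuns, Action<int> work)
    {
        if (candidates == null || candidates.Length == 0)
            throw new ArgumentException("at least one candidate is required");

        int bestDegree = candidates[0];
        long bestTicks = long.MaxValue;
        foreach (int degree in candidates)
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
            Stopwatch timer = Stopwatch.StartNew();
            Parallel.For(0, sampleRuns, options, i => work(i));
            timer.Stop();

            if (timer.ElapsedTicks < bestTicks)
            {
                bestTicks = timer.ElapsedTicks;
                bestDegree = degree;
            }
        }
        return bestDegree;
    }
}
```

The returned value could then be used as `MaxDegreeOfParallelism` for the full optimization; a result of 1 amounts to disabling the parallelism for that bar scale and range. Timings are noisy, so a real tuner would repeat the measurement.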