Hard drive size requirements for backtesting on large data sets
Author: torpedo333
Creation Date: 1/14/2011 9:50 PM

torpedo333

#1
Does anyone have any suggestions as to how large a hard drive I should buy to run simulations on large data sets? I plan on backtesting systems on 1,000 symbols, going back 10 years each, using weekly, daily, and intraday data. (Cone, are you out there?)

Torpedo

Eugene

#2
An ASCII file containing 10 years of 1-minute DTOHLCV data for a futures contract trading (almost) 24/7 would take up to 120-170 megabytes. (A U.S. stock would take much less than that, since its trading session lasts only 6 1/2 hours.) The space taken by daily data is negligible.

Multiply that number by your symbol count (1,000). Even though there aren't 1,000 futures contracts trading 24/7 in the whole world, that would come to 120-170 gigabytes. For U.S. stocks, divide the amount by at least 3 and you have only 40-60 GB of disk space to secure. Add 20-30 GB for the operating system and apps, figure out how much the swap/hibernation files will take depending on installed RAM, and you have a good estimate.

Long story short: even the least capacious hard drive in stock, i.e. 250-320 GB, would be fine for this task.
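The estimate above can be sketched as a quick calculation. The per-symbol figures and overheads are the rough numbers from this post, not measurements, so treat them as assumptions:

```python
# Back-of-envelope disk-space estimate for 1-minute ASCII OHLCV data.
# All constants are the rough figures quoted in this thread, not measurements.

MB_PER_FUTURES_SYMBOL = 170   # upper bound: 10 years of 1-min data, ~24/7 session
STOCK_SESSION_DIVISOR = 3     # U.S. stocks trade ~6.5h/day, so divide by at least 3

def disk_estimate_gb(symbols,
                     mb_per_symbol=MB_PER_FUTURES_SYMBOL,
                     session_divisor=STOCK_SESSION_DIVISOR,
                     os_overhead_gb=30):
    """Rough total disk requirement in GB for a universe of U.S. stocks."""
    data_gb = symbols * mb_per_symbol / session_divisor / 1024
    return data_gb + os_overhead_gb

print(round(disk_estimate_gb(1000)))  # roughly 85 GB for 1,000 stocks
```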

Cone

#3
HDD isn't a problem here. What you should be concerned about is the memory required. For large intraday simulations, you'll need a 64-bit OS and an absolute minimum of 4GB, preferably 8GB or more.

As a rule of thumb, with 2GB you can run a 1-min intraday backtest on 100 symbols over 10 months of data (assuming 390 bars per day). With 4GB, you can probably get up to 2 years.
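This rule of thumb can be scaled to other universe sizes and date ranges. The bytes-per-bar figure below is derived from the 2GB/100-symbol/10-month data point, and the 21 trading days per month is a standard approximation; both are assumptions, not measurements:

```python
# Scaling Cone's rule of thumb: 2 GB handles a 1-min backtest on
# 100 symbols over 10 months (390 bars/day, ~21 trading days/month).

BARS_PER_DAY = 390
TRADING_DAYS_PER_MONTH = 21

def total_bars(symbols, months):
    """Total 1-minute bars loaded for a backtest."""
    return symbols * months * TRADING_DAYS_PER_MONTH * BARS_PER_DAY

# Implied memory cost per bar, including overhead (~260 bytes/bar)
BYTES_PER_BAR = 2 * 1024**3 / total_bars(100, 10)

def ram_needed_gb(symbols, months):
    """Estimated RAM in GB for a 1-min backtest of this size."""
    return total_bars(symbols, months) * BYTES_PER_BAR / 1024**3

print(ram_needed_gb(1000, 24))  # 1,000 symbols over 2 years -> 48.0 GB
```

By this estimate, the original poster's 1,000-symbol, 10-year intraday test would far exceed typical desktop RAM, which is why splitting the run into smaller date ranges or symbol batches is the practical approach.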

torpedo333

#4
Thanks for the comments Cone and Eugene.

I'm planning on buying a PC with a 160GB PCI-Express solid-state drive, plus a 1 TB hard drive for data backup. What do you guys know about solid-state drives vs. regular hard drives?

Thanks.
Torpedo

P.S. The PC will have 12GB of RAM

Eugene

#5
Re: SSD vs. HDD differences, try Google; it's the best source of information out there. As Cone said, HDD performance is almost never the bottleneck for typical WL tasks; memory and CPU have a much larger impact. The more, the better.