I like using MySQL to power my backtests. It’s just the data store that I’m fastest with. So I naturally wanted to load the data we bought from Quandl.com into MySQL for analysis. The problem is that the file is just too big to load directly via LOAD DATA INFILE, and the solution is to break it up into chunks. That’s easy with GNU split – and here, 100,000 rows per chunk is a good size.
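A minimal sketch of that step, assuming GNU split and a placeholder file name (not the actual one from my data directory):

# if the CSV has a header row, drop it first: tail -n +2 quandl_adj_eod.csv > body.csv
# split into 100,000-line chunks named chunk_aa, chunk_ab, ...
split -l 100000 quandl_adj_eod.csv chunk_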
You’ll also notice how I designed the table with indexes:
CREATE TABLE `adj_eod` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `symbol` varchar(10) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `open` decimal(35,20) DEFAULT NULL,
  `high` decimal(35,20) DEFAULT NULL,
  `low` decimal(35,20) DEFAULT NULL,
  `close` decimal(35,20) DEFAULT NULL,
  `volume` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`symbol`,`date`),
  KEY `symbol` (`symbol`),
  KEY `date` (`date`)
);
I generally like to look up information by symbol and date, so I added keys there. The unique key on (symbol, date) documents that there is exactly one row per symbol per day and is also valuable for decreasing lookup times (though there is an insertion penalty).
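For instance, a lookup of this shape (the ticker, date range, and database name are just placeholders) uses the (symbol, date) index for both the filter and the ordering:

mysql quandl <<'SQL'
SELECT `date`, `close`
FROM adj_eod
WHERE symbol = 'AAPL'
  AND `date` BETWEEN '2013-01-01' AND '2013-12-31'
ORDER BY `date`;
SQL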
Two more notes: sometimes I was only interested in data from a certain time period, so I filtered the rows with the awk utility. And I wasn’t interested in certain fields in the Quandl CSV, so I told MySQL’s LOAD DATA INFILE to discard them by assigning them to a throwaway user variable (e.g. @dummy) in the column list.
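A rough sketch of that awk filter, assuming the date is the second comma-separated field and is formatted YYYY-MM-DD (both assumptions about the file, and the cutoff is arbitrary):

# keep only rows dated on or after the cutoff; ISO dates compare correctly as strings
awk -F, '$2 >= "2010-01-01"' chunk_aa > chunk_aa.filtered

The column-discarding side shows up in the script sketch below, where unwanted fields are routed to @dummy in the LOAD DATA INFILE column list.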
Anyway, here’s the script.
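A minimal sketch of what such a script can look like; the file names, chunk prefix, database name, date cutoff, and Quandl column order below are placeholders rather than the exact details:

#!/usr/bin/env bash
# Sketch: split the Quandl CSV into chunks, filter by date, and load each chunk into adj_eod.
# Assumes MySQL credentials live in ~/.my.cnf and that local_infile is enabled on the server.
set -euo pipefail

CSV=quandl_adj_eod.csv   # placeholder input file name
DB=quandl                # placeholder database name
CUTOFF=2010-01-01        # earliest date to keep

# 1. Split the CSV into 100,000-row chunks (chunk_aa, chunk_ab, ...).
#    If the file has a header row, strip it first with: tail -n +2
split -l 100000 "$CSV" chunk_

# 2. Filter each chunk by date (date assumed to be field 2), then load it.
#    Adjust the column list and the number of @dummy placeholders to match the real file layout.
for f in chunk_*; do
    awk -F, -v cutoff="$CUTOFF" '$2 >= cutoff' "$f" > "$f.filtered"
    mysql --local-infile=1 "$DB" <<SQL
LOAD DATA LOCAL INFILE '$PWD/$f.filtered'
INTO TABLE adj_eod
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(symbol, date, open, high, low, close, volume, @dummy, @dummy);
SQL
    rm -- "$f" "$f.filtered"
done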
I welcome comments and suggestions for improvement.