Hi everyone,
I was looking through a book for an algorithm to detect clusters within financial data. I found a decent one, but I then wanted to test it on some real data.
So I had to get the data from somewhere, and I chose Yahoo! Finance. The thing is, I wanted to do it in an automated way so that I could reuse the module in future projects.
I remembered Luca Bolognese’s presentation of F# a few years back (certainly one of the best presentations I’ve ever seen) and I thought it was a good time to give it a try.
The idea is quite simple: you can import data from Yahoo! Finance by parsing the CSV files the website produces. These files are actually generated automatically, which means that you can access them by querying the right URL with the right parameters.
The following module shows how to generate the right query to get the CSV file from Yahoo! Finance:
module QueryModule

open System

// Yahoo! expects a zero-based month (January = 0), hence d.Month - 1
let dateToList (d:DateTime) = [d.Month - 1; d.Day; d.Year]

// a, b, c = start month, day, year; d, e, f = end month, day, year
let parameters = ['a';'b';'c';'d';'e';'f']

let parametersToString (d1:DateTime) (d2:DateTime) =
    let datesList = (dateToList d1) @ (dateToList d2)
    // Pair each parameter name with the next date component and join with "&"
    let rec innerFunc (dList: int list) pars =
        match pars with
        | [] -> ""
        | x::[] -> x.ToString() + "=" + dList.Head.ToString()
        | x::y -> x.ToString() + "=" + dList.Head.ToString() + "&" + (innerFunc dList.Tail y)
    innerFunc datesList parameters

let getFileUrl (ticker:string) start stop =
    "http://ichart.finance.yahoo.com/table.csv?s=" + ticker + "&" + (parametersToString start stop) + "&ignore=.csv"
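To make the shape of the query concrete, here is a minimal sketch of the same idea written with List.zip and String.concat instead of explicit recursion. The name buildQuery is hypothetical (it is not part of the module above), and the sketch assumes that Yahoo!'s month parameter is zero-based, so it subtracts one from the month.

```fsharp
open System

// Hypothetical compact variant of the query builder (not part of QueryModule).
// Assumption: the month parameters (a and d) are zero-based, January = 0.
let buildQuery (start: DateTime) (stop: DateTime) =
    let values =
        [ start.Month - 1; start.Day; start.Year
          stop.Month - 1; stop.Day; stop.Year ]
    List.zip ['a'; 'b'; 'c'; 'd'; 'e'; 'f'] values
    |> List.map (fun (name, value) -> sprintf "%c=%d" name value)
    |> String.concat "&"

let url =
    "http://ichart.finance.yahoo.com/table.csv?s=MSFT&"
    + buildQuery (DateTime(2000, 1, 1)) (DateTime(2010, 1, 1))
    + "&ignore=.csv"

printfn "%s" url
// prints http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2000&d=0&e=1&f=2010&ignore=.csv
```

Zipping the names with the values fails loudly if the two lists ever get out of step, which is the main reason I would consider this variant.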
The next step is to write a module that downloads this data. Done synchronously, this would be pretty straightforward, and I wouldn’t waste your time with more code here.
What I’m trying to do instead is download these files asynchronously. Doing this in C# is possible, but it requires some heavy coding and the result will most likely contain some bugs. F# (and functional programming languages in general) makes it much easier.
Let’s see how to implement an asynchronous function in F# then:
module DownloadModule

open QueryModule
open Microsoft.FSharp.Control.WebExtensions
open System
open System.Net
open System.IO

// Turn the raw CSV text into a list of (date, value of the seventh column) pairs
let parseData (rawData:string) =
    rawData.Split('\n')
    |> Array.toList
    |> List.tail                               // skip the header line
    |> List.map (fun x -> x.Split(','))
    |> List.filter (fun x -> x.Length = 7)     // drop empty or incomplete lines
    |> List.map (fun x -> (Convert.ToDateTime(x.[0]), float x.[6]))

let getCSV ticker dStart dEnd =
    async {
        let query = getFileUrl ticker dStart dEnd
        let req = WebRequest.Create(query)
        // Wait for the response without blocking the thread
        use! resp = req.AsyncGetResponse()
        use stream = resp.GetResponseStream()
        use reader = new StreamReader(stream)
        let content = reader.ReadToEnd()
        let ts = parseData content
        return ts
    }
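The whole trick in getCSV resides in the “async” block and the “use!” keyword. Basically, “use!” (like “let!”) tells F# to wait for the result without blocking the current thread: the rest of the block resumes once the response has arrived, and the response is disposed of when it goes out of scope.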
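Here is a small sketch of that behaviour that runs without any network access. The names fakeDownload, started and work are made up for the illustration, and Async.Sleep stands in for the real web request. It also shows that merely defining an async block runs nothing; the work only starts when the block is executed.

```fsharp
// Hypothetical stand-in for a slow download: Async.Sleep plays the role of AsyncGetResponse
let fakeDownload (ticker: string) =
    async {
        // do! gives the thread back while "waiting"; the next line resumes afterwards
        do! Async.Sleep 100
        return ticker.Length }

let started = ref false

let work =
    async {
        started.Value <- true
        // let! binds the result of another async computation once it is available
        let! n = fakeDownload "MSFT"
        return n * 2 }

// Defining the block above has not run anything yet
printfn "started before run: %b" started.Value   // prints false

let result = Async.RunSynchronously work
printfn "result: %d, started after run: %b" result started.Value   // prints 8 and true
```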
You can now run getCSV in parallel to download multiple tickers as follows:
let testPrices =
    ["MSFT";"YHOO"]
    |> List.map (fun x -> getCSV x (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010")))
    |> Async.Parallel
    |> Async.RunSynchronously;;
This works fine and it’s much faster than it would have been if the list had been processed sequentially.
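If you want to play with Async.Parallel without hitting the network, a sketch like the following may help. The simulate function and its delays are invented for the example. One thing worth knowing is that the result array follows the order of the input list, not the order in which the computations finish.

```fsharp
// Hypothetical stand-in for a download that takes delayMs milliseconds
let simulate (ticker: string) (delayMs: int) =
    async {
        do! Async.Sleep delayMs
        return ticker + " done" }

let results =
    [ ("MSFT", 200); ("YHOO", 50) ]
    |> List.map (fun (ticker, delayMs) -> simulate ticker delayMs)
    |> Async.Parallel          // combine the computations into a single Async<string[]>
    |> Async.RunSynchronously  // start them and block until all have finished

// YHOO finishes first, but the results keep the input order
printfn "%A" results
```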
I pushed the example a little further by thinking: “What if I want to do some operation on the time series now?”
Well, I could wait until the parallel execution is done and then run the following function to get the returns synchronously:
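module AnalysisModule

open System

// Return between consecutive observations: p(i) / p(i-1) - 1, dated at observation i.
// Seq.pairwise walks the list once instead of calling List.nth for every index.
let getReturns (prices:(DateTime * float) list) =
    prices
    |> Seq.pairwise
    |> Seq.map (fun ((_, previous), (date, current)) -> (date, current / previous - 1.0))
    |> Seq.toList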
That wouldn’t be optimal; once the download of MSFT is done, I don’t need to wait until the download of YHOO is done as well before starting to process the returns…
So, how can I do this easily, without modifying any of the functions I wrote previously? By wrapping each download and its processing in a new async block:
let testReturns =
    ["MSFT";"YHOO"]
    |> List.map (fun ticker ->
        async {
            let! prices = getCSV ticker (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010"))
            return getReturns prices
        })
    |> Async.Parallel
    |> Async.RunSynchronously;;
This way, the returns for each ticker are computed as soon as its own download completes, without waiting for the other downloads to finish!
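To try the composition without any network access, here is a minimal sketch with made-up prices. The names fakePrices and returnsOf are hypothetical: fakePrices stands in for getCSV, and returnsOf applies the same formula as getReturns, written here with Seq.pairwise.

```fsharp
open System

// Hypothetical stand-in for getCSV: returns made-up prices after a short delay
let fakePrices (ticker: string) =
    async {
        do! Async.Sleep 50
        let day n = DateTime(2010, 1, n)
        return [ (day 4, 100.0); (day 5, 110.0); (day 6, 99.0) ] }

// Same formula as getReturns: p(i) / p(i-1) - 1, dated at observation i
let returnsOf (prices: (DateTime * float) list) =
    prices
    |> Seq.pairwise
    |> Seq.map (fun ((_, previous), (date, current)) -> (date, current / previous - 1.0))
    |> Seq.toList

let demo =
    [ "MSFT"; "YHOO" ]
    |> List.map (fun ticker ->
        async {
            // The returns are computed as soon as this ticker's prices arrive
            let! prices = fakePrices ticker
            return (ticker, returnsOf prices) })
    |> Async.Parallel
    |> Async.RunSynchronously

for (ticker, rets) in demo do
    printfn "%s: %d returns" ticker rets.Length
```

With three prices per ticker you get two returns each, roughly +10% and -10% for the numbers above.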
Hope you enjoyed this little demo!
See you next time!