Asynchronous Financial Data processing using F#

Hi everyone,

I was looking at a book to find some algorithm to detect clusters within financial data. I managed to find a decent algorithm for that matter, but I then wanted to test it on some real data.

Hence, I had to get the data from somewhere, and I chose Yahoo! Finance. The thing is, I wanted to do it in an automated way so that I could reuse the module in further projects.

I remembered Luca Bolognese’s presentation of F# a few years back (certainly one of the best presentations I’ve ever seen) and I thought it was a good time to give it a try.

The idea is quite simple: you can import data from Yahoo! Finance by parsing the CSV files the website produces. These files are actually generated automatically, which means that you can access them by querying the right URL with the right parameters.

The following module shows how to generate the right query to get the CSV file from Yahoo! Finance:

module QueryModule
open System
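// Yahoo! Finance expects zero-based months in the query (January = 0)
let dateToList (d:DateTime) = [d.Month - 1; d.Day; d.Year]

// a, b, c = start month, day, year; d, e, f = end month, day, year
let parameters = ['a';'b';'c';'d';'e';'f']

let parametersToString (d1:DateTime) (d2:DateTime) =
    let datesList = (dateToList d1) @ (dateToList d2)
    // Pairs each parameter name with its value and joins the pairs with '&'
    let rec innerFunc (dList: int list) (pars: char list) =
        match pars with
        | [] -> ""
        | [x] -> x.ToString() + "=" + dList.Head.ToString()
        | x::rest -> x.ToString() + "=" + dList.Head.ToString() + "&" + (innerFunc dList.Tail rest)

    innerFunc datesList parameters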

let getFileUrl (ticker:string) start stop =
    "http://ichart.finance.yahoo.com/table.csv?s="+ticker+"&"+(parametersToString start stop)+"&ignore=.csv"

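To make the query format concrete, here is a minimal sketch of the same idea written with List.zip and String.concat instead of explicit recursion. The names parameterNames, buildQuery and exampleQuery are mine, and the zero-based month is an assumption about what the endpoint expects:

```fsharp
open System

// Assumption: the endpoint expects zero-based months (January = 0).
let dateToList (d: DateTime) = [ d.Month - 1; d.Day; d.Year ]

// a, b, c = start month, day, year; d, e, f = end month, day, year.
let parameterNames = [ "a"; "b"; "c"; "d"; "e"; "f" ]

// Pairs every parameter name with its value and joins the pairs with '&'.
let buildQuery (start: DateTime) (stop: DateTime) =
    List.zip parameterNames (dateToList start @ dateToList stop)
    |> List.map (fun (name, value) -> sprintf "%s=%d" name value)
    |> String.concat "&"

let exampleQuery = buildQuery (DateTime(2000, 1, 1)) (DateTime(2010, 1, 1))
// exampleQuery = "a=0&b=1&c=2000&d=0&e=1&f=2010"
```
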
The next step is to write a module that downloads this data. Done synchronously, this would be pretty straightforward, and I wouldn’t waste your time with more code.

What I’m trying to do here is to download these files asynchronously. Doing this using C# is possible but requires some heavy coding and the result will most likely contain some bugs. F# (and functional programming languages in general) allows you to do it much more easily.

Let’s see how to implement an asynchronous function in F# then:

module DownloadModule
open QueryModule
open Microsoft.FSharp.Control.WebExtensions
open System
open System.Net
open System.IO

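// Turns the raw CSV text into a list of (date, adjusted close) pairs;
// the adjusted close is the seventh column
let parseData (rawData:string) =
    rawData.Split('\n')
    |> Array.toList
    |> List.tail                               // skip the header line
    |> List.map (fun x -> x.Split(','))
    |> List.filter (fun x -> x.Length = 7)     // drop empty or malformed lines
    |> List.map (fun x -> (Convert.ToDateTime(x.[0]), float x.[6]))

// Builds an asynchronous workflow that downloads and parses the prices of one ticker
let getCSV ticker dStart dEnd =
    async {
        let query = getFileUrl ticker dStart dEnd
        let req = WebRequest.Create(query)
        use! resp = req.AsyncGetResponse()
        use stream = resp.GetResponseStream()
        use reader = new StreamReader(stream)
        let content = reader.ReadToEnd()
        let ts = parseData content
        return ts
    }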

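The whole trick in this code resides in the “async” block and the “use!” keyword. Basically, “use!” tells F# to start the request and suspend the workflow, without blocking a thread, until the response arrives; the response is then bound to “resp” and disposed of at the end of the block.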
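
To try the mechanism without touching the network, here is a small sketch that replaces the web request with a stand-in. The names fakeDownload, parseSample and getSample, as well as the sample prices, are made up for illustration; “let!” plays the same role as “use!” above, minus the disposal:

```fsharp
open System
open System.Globalization

// Hypothetical sample in the seven-column layout parseData expects.
let sampleCsv =
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    + "2010-01-05,30.85,31.10,30.64,30.96,49749600,30.96\n"
    + "2010-01-04,30.62,31.10,30.59,30.95,38409100,30.95\n"

// Stand-in for the web request: waits without blocking a thread, then returns the text.
let fakeDownload (ticker: string) =
    async {
        do! Async.Sleep 100
        return sampleCsv
    }

// Same steps as parseData, with a culture-independent date parse.
let parseSample (rawData: string) =
    rawData.Split('\n')
    |> Array.toList
    |> List.tail
    |> List.map (fun line -> line.Split(','))
    |> List.filter (fun fields -> fields.Length = 7)
    |> List.map (fun fields ->
        (DateTime.Parse(fields.[0], CultureInfo.InvariantCulture), float fields.[6]))

let getSample ticker =
    async {
        // let! suspends this workflow until fakeDownload completes, then binds its result.
        let! content = fakeDownload ticker
        return parseSample content
    }

let sample = getSample "MSFT" |> Async.RunSynchronously
// sample holds two (date, adjusted close) pairs, newest first
```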

You can now run this function in parallel to download multiple tickers as follows:

let testPrices=
    ["MSFT";"YHOO"]
    |>List.map (fun x -> getCSV x (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010")))
    |> Async.Parallel
    |> Async.RunSynchronously;;

This works fine, and it’s much faster than processing the list sequentially, since the downloads overlap instead of running one after the other.
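
A rough way to see the difference for yourself, sketched with a simulated download so that it runs offline. slowFetch and timed are hypothetical helpers, and the 300 ms delay is arbitrary:

```fsharp
open System.Diagnostics

// Hypothetical stand-in for getCSV: each "download" takes about 300 ms.
let slowFetch (ticker: string) =
    async {
        do! Async.Sleep 300
        return ticker.Length
    }

let tickers = [ "MSFT"; "YHOO"; "GOOG" ]

// Runs a computation and returns its result together with the elapsed milliseconds.
let timed (work: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = work ()
    result, sw.ElapsedMilliseconds

// One after the other: roughly the sum of the individual delays.
let sequentialResult, sequentialMs =
    timed (fun () ->
        tickers |> List.map (fun t -> slowFetch t |> Async.RunSynchronously))

// All at once: roughly the longest individual delay.
let parallelResult, parallelMs =
    timed (fun () ->
        tickers
        |> List.map slowFetch
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.toList)

printfn "sequential: %d ms, parallel: %d ms" sequentialMs parallelMs
```

Both runs return the same values; only the elapsed time differs.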

I pushed the example a little further by thinking: “What if I want to do some operation on the time series now?”.

Well, I could wait until the parallel execution is done and then run the following function to get the returns synchronously:

module AnalysisModule
open System

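let getReturns (prices:(DateTime * float) list) =
    // Pairs each price with its predecessor in the list (List.nth would rescan the list on every call);
    // each return is dated with the second element of the pair
    prices
    |> List.pairwise
    |> List.map (fun ((_, previous), (date, current)) -> (date, current / previous - 1.0))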
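
One thing to watch: the return is computed between neighbours in list order, so the order of the rows matters. Here is a hedged sketch that sorts by date first, assuming the rows may arrive newest first; returnsByDate and the prices are made up for illustration:

```fsharp
open System

// Sorts oldest to newest, then computes current / previous - 1 for each consecutive pair.
let returnsByDate (prices: (DateTime * float) list) =
    prices
    |> List.sortBy fst
    |> List.pairwise
    |> List.map (fun ((_, previous), (date, current)) -> (date, current / previous - 1.0))

// Hypothetical prices, listed newest first: 100 -> 110 -> 99 in calendar order.
let demoPrices =
    [ (DateTime(2010, 1, 6), 99.0)
      (DateTime(2010, 1, 5), 110.0)
      (DateTime(2010, 1, 4), 100.0) ]

let demoReturns = returnsByDate demoPrices
// Two returns: about +10% dated 2010-01-05 and about -10% dated 2010-01-06
```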

Waiting like that wouldn’t be optimal: once the download of MSFT is done, I don’t need to wait until the download of YHOO is done as well before starting to process the returns…

So, how can I do this easily, without modifying any of the functions I wrote previously?

let testReturns =
    ["MSFT";"YHOO"]
    |> List.map (fun ticker -> async {
                        let! prices = getCSV ticker (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010"))
                        return getReturns prices
                   })
    |>Async.Parallel
    |>Async.RunSynchronously;;

This way, the returns of each ticker are computed as soon as its download completes, in parallel with the other downloads, and I get really great performance!

Hope you enjoyed this little demo!

See you next time!

Functional approach to portfolio modeling

Good evening everybody, I’ve been paying attention to portfolio modelling for the past few months. When you tackle such a problem, you first think about how to represent a portfolio as an object, so that you can dive into your C#/C++/Java code and start making money ASAP. However, you’ll soon find yourself cornered by numerous problems, especially when you want to backtest different allocation strategies.

The object-oriented approach

When people model a portfolio, they usually see it as a mapping between assets and weights, associated with a date. This, however, misinterprets what a portfolio is. What the programmer describes in such a model is in fact a snapshot of the portfolio state at some time t. If you were to make a change between two dates, all the subsequent instances of the portfolio would become erroneous, and the programmer would have to recompute them all to get the right simulation.

Let’s take an example. Say we have a portfolio going through time t=1,2,3. We assume the portfolio holds two assets, Microsoft (MSFT) and Yahoo (YHOO), and that the allocation strategy is an equally weighted portfolio (weights={0.5,0.5}). Here’s what the implementation would look like (in C#):

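using System;
using System.Collections.Generic;
using System.Linq;

class Portfolio
{
     public DateTime date;
     public Dictionary<Asset,double> allocation = new Dictionary<Asset,double>();

     public Portfolio() {}

     public void Optimize()
     {
          int n = allocation.Count;
          // Copy the keys first: the dictionary cannot be modified while it is
          // being enumerated, and KeyValuePair.Value is read-only anyway.
          foreach (var asset in allocation.Keys.ToList())
          {
                allocation[asset] = 1.0 / n;   // 1.0, not 1: integer division would give 0
          }
      }
}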

class History
{
    public Dictionary<DateTime,Portfolio> history = new Dictionary<DateTime,Portfolio>();

    public void Add(DateTime t, Portfolio p)
    {
         history.Add(t,p);
    }
}

Now assume you want to add another stock (STO) to the portfolio at time 2. The previous implementation needs to be extended as follows:

class History
{
    public Dictionary<DateTime,Portfolio> history = new Dictionary<DateTime,Portfolio>();

    public void Add(DateTime t, Portfolio p)
    {
         history.Add(t,p);
         // Strictly later states only: the entry just added at t would always match >=
         if (history.Any(x=>x.Key>t))
         {
              /* Compare the new portfolio composition with the subsequent states
               * Take the necessary operation to adjust the portfolio.
               * OUCH!!!! This is complicated.
              */
         }
    }
}

As you can see in the comments I added, a backward change requires recomputing the portfolio at times 2 and 3. This is computationally intensive, and it is the kind of function you really do not fancy writing. I’m not even discussing the probability that some bug will creep in, or the changes you would have to make to support more complicated backward operations. Furthermore, if you wanted to see how the portfolio behaved before the change, all the information about the previous simulation would be lost. This is because the class actually stores the state of the portfolio, not the portfolio itself. “Thanks for the heads-up Einstein! You got anything better to do?” Well, as a matter of fact, yes.

The functional approach

I would like to introduce a different way of representing a portfolio, one that is especially meaningful in a functional environment (F#, Scala). First of all, I define a portfolio snapshot as a list of tuples, each pairing an asset with its weight (a double).

To me, a portfolio is a strategy more than an object. In terms of allocation, the strategy outputs portfolio snapshots with weights, and these weights can in turn generate buy/sell orders to adjust the position of a “real” portfolio (a basket of real assets). The whole point is that the portfolio is in fact just a set of operations. In my opinion, the correct way of representing an operation in a programming language is a function. This includes adding an asset to the portfolio, optimizing a portfolio and so on. The portfolio is hence a list of tuples of type DateTime*Operation (the date being the time at which the operation should occur). Let’s give these concepts formal definitions (in F#):

type Action<'a> = 'a -> 'a

type Change<'a> = System.DateTime * Action<'a>

type History<'a> = Change<'a> list
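
To make these three types concrete before moving to portfolios, here is a toy sketch where the “object” is just an integer balance. deposit, doubleIt, applyChange and the dates are invented for the example:

```fsharp
type Action<'a> = 'a -> 'a
type Change<'a> = System.DateTime * Action<'a>
type History<'a> = Change<'a> list

// Two hypothetical actions on an integer balance.
let deposit (amount: int) : Action<int> = fun balance -> balance + amount
let doubleIt : Action<int> = fun balance -> balance * 2

// A history is nothing more than dated functions in a list.
let balanceHistory : History<int> =
    [ (System.DateTime(2011, 1, 1), deposit 100)
      (System.DateTime(2011, 2, 1), doubleIt) ]

// Applies the action of a single change to a value.
let applyChange ((_, action): Change<'a>) (value: 'a) : 'a = action value

// Applying the two changes by hand, in date order, starting from 0.
let afterJanuary = applyChange (balanceHistory.[0]) 0
let afterFebruary = applyChange (balanceHistory.[1]) afterJanuary
// afterJanuary = 100, afterFebruary = 200
```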

An action is a function taking a value of some type as input and returning a modified version of it (actually, a new instance with the modification included). A change is an action happening at a certain time. Finally, a history is a list of changes. Simple. Now, how do we apply this to portfolio modeling? First, some more definitions:

type Weight=float

type Asset = string

type AssetAlloc=Weight * Asset

type PortfolioAlloc= AssetAlloc list

type PortfolioAction=Action<PortfolioAlloc>

type PortfolioChange = Change<PortfolioAlloc>

type Portfolio = History<PortfolioAlloc>

Thanks to the F# syntax, the code is pretty much self-explanatory. Now, let’s define a simple portfolio action that adds an asset to the portfolio:

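let addAsset (ass:Asset) (w:Weight) (pFolioAlloc: PortfolioAlloc) : PortfolioAlloc =
    match List.tryFind (fun p -> snd p = ass) pFolioAlloc with
    // Asset already present: drop its old entry, then add the new weight.
    // The parentheses matter: :: binds tighter than |>, so without them
    // the filter would also remove the entry that was just added.
    |Some _ -> (w,ass) :: (pFolioAlloc |> List.filter (fun pa -> snd pa <> ass))
    |None -> (w,ass) :: pFolioAlloc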
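
As a quick check of the intended behaviour, here is a shorter variant that does the same job without the match: it drops any existing entry for the asset, then prepends the new one. upsertAsset is a hypothetical name, not part of the model above:

```fsharp
type Weight = float
type Asset = string
type PortfolioAlloc = (Weight * Asset) list

// Removes any existing entry for the asset, then prepends the new (weight, asset) pair.
let upsertAsset (asset: Asset) (w: Weight) (alloc: PortfolioAlloc) : PortfolioAlloc =
    (w, asset) :: List.filter (fun (_, a) -> a <> asset) alloc

let start : PortfolioAlloc = [ (1.0, "S&P500") ]

// A new asset is simply added in front.
let withMsft = upsertAsset "MSFT" 0.0 start
// [(0.0, "MSFT"); (1.0, "S&P500")]

// An existing asset has its weight replaced instead of being duplicated.
let reweighted = upsertAsset "S&P500" 0.4 withMsft
// [(0.4, "S&P500"); (0.0, "MSFT")]
```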

For those of you not familiar with functional programming, addAsset might look complicated, but if you look a bit into the language (particularly pattern matching), you’ll see it’s actually quite trivial. Let’s continue with the optimization of the portfolio:

let equWeightPortfolio (pFAlloc:PortfolioAlloc) : PortfolioAlloc =
    let w:Weight = 1.0/(float pFAlloc.Length)
    pFAlloc |> List.map (fun alloc -> (w, snd alloc))

The equal-weighting function is trivial. Finally, we need a function that evaluates the portfolio. This requires applying all the changes in succession to an initial portfolio allocation (most of the time, an empty portfolio). This kind of operation is well known in functional programming; it simply consists of folding a list:

let getPortfolioAllocFromInit (pFolio:Portfolio) (t : System.DateTime) (init:PortfolioAlloc) =
    pFolio |> List.filter (fun pc -> fst pc <= t)
    |> List.sortBy (fun pc -> fst pc)
    |> List.map (fun pc -> snd pc)
    |> List.fold (fun alloc action -> action(alloc)) init

let getPortfolioAlloc (pFolio:Portfolio) (t : System.DateTime) = getPortfolioAllocFromInit pFolio t []
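
The same filter, sort and fold pipeline works for any history, not only portfolios. Here is a minimal generic sketch on integers; replay, day and steps are illustrative names of my own:

```fsharp
type Change<'a> = System.DateTime * ('a -> 'a)

// Keeps the changes dated on or before t, orders them by date, and folds them over init.
let replay (history: Change<'a> list) (t: System.DateTime) (init: 'a) : 'a =
    history
    |> List.filter (fun (date, _) -> date <= t)
    |> List.sortBy fst
    |> List.fold (fun state (_, action) -> action state) init

let day (d: int) = System.DateTime(2011, 1, d)

// Hypothetical history on an integer, deliberately listed out of date order.
let steps : Change<int> list =
    [ (day 3, (fun x -> x * 10))
      (day 1, (fun x -> x + 2)) ]

let atDay1 = replay steps (day 1) 0   // only "+ 2" applies: 2
let atDay3 = replay steps (day 3) 0   // "+ 2" first, then "* 10": 20
```

Because the changes are sorted before folding, the position of a change in the list does not matter, only its date.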

getPortfolioAllocFromInit implements the general logic: it keeps the operations dated up to t, sorts them by date and applies them sequentially. getPortfolioAlloc just calls it with an empty initial portfolio. The model is applied as follows:

let addSPAction : PortfolioAction = addAsset "S&P500" 0.0;;
let addMSAction : PortfolioAction = addAsset "MSFT" 0.0;;
let myPortfolio:Portfolio = [(System.DateTime.Parse("01.01.2011"),addSPAction);
                            (System.DateTime.Parse("31.01.2011"),equWeightPortfolio);
                            (System.DateTime.Parse("01.02.2011"),addMSAction);
                            (System.DateTime.Parse("28.02.2011"),equWeightPortfolio);
                            ];;
let myPortfolioAlloc t = getPortfolioAlloc myPortfolio t;;

let endAlloc = myPortfolioAlloc System.DateTime.Now;;

let janAlloc = myPortfolioAlloc(System.DateTime.Parse("01.02.2011"));;

which outputs:

val endAlloc : PortfolioAlloc = [(0.5, "MSFT"); (0.5, "S&P500")]
val janAlloc : PortfolioAlloc = [(0.0, "MSFT"); (1.0, "S&P500")]

We hence see how trivial it is to compute the state of the portfolio at different times. With this representation, altering a portfolio at time t=2 just means adding a function to a list; it does not change the subsequent operations. The state of the portfolio (the snapshot) is computed on demand. Let’s try it:

let addYHOOAction:PortfolioAction = addAsset "YHOO" 0.0
let myPortfolio2:Portfolio= (System.DateTime.Parse("30.01.2011"),addYHOOAction)::myPortfolio
let myPortfolioAlloc2 t = getPortfolioAlloc myPortfolio2 t
let endAlloc2 = myPortfolioAlloc2 System.DateTime.Now
let janAlloc2 = myPortfolioAlloc2(System.DateTime.Parse("01.02.2011"));;

which outputs:

val endAlloc2 : PortfolioAlloc =
  [(0.3333333333, "MSFT"); (0.3333333333, "YHOO"); (0.3333333333, "S&P500")]
val janAlloc2 : PortfolioAlloc =
  [(0.0, "MSFT"); (0.5, "YHOO"); (0.5, "S&P500")]
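
The list-based model also makes it cheap to keep earlier simulations around. Here is a rough sketch, assuming each change also records when it was entered into the model; VersionedChange, its field names, allocAsKnownAt and the dates are hypothetical extensions, not part of the code above:

```fsharp
type Weight = float
type Asset = string
type PortfolioAlloc = (Weight * Asset) list

// Hypothetical extension: each change records when it was entered into the model.
type VersionedChange =
    { Created: System.DateTime
      Effective: System.DateTime
      Action: PortfolioAlloc -> PortfolioAlloc }

// Allocation at time t, using only the changes that had been entered by knownAt.
let allocAsKnownAt (knownAt: System.DateTime) (t: System.DateTime) (changes: VersionedChange list) =
    changes
    |> List.filter (fun c -> c.Created <= knownAt && c.Effective <= t)
    |> List.sortBy (fun c -> c.Effective)
    |> List.fold (fun alloc c -> c.Action alloc) []

let add (asset: Asset) : PortfolioAlloc -> PortfolioAlloc =
    fun alloc -> (0.0, asset) :: alloc

let equalize (alloc: PortfolioAlloc) : PortfolioAlloc =
    let w = 1.0 / float alloc.Length
    alloc |> List.map (fun (_, a) -> (w, a))

let d (month: int) (dayOfMonth: int) = System.DateTime(2011, month, dayOfMonth)

let changes =
    [ { Created = d 1 1; Effective = d 1 1; Action = add "S&P500" }
      { Created = d 1 1; Effective = d 1 31; Action = equalize }
      // Entered in March, but effective back in January.
      { Created = d 3 1; Effective = d 1 30; Action = add "YHOO" } ]

// The 1 February allocation as simulated before and after the backdated change.
let original = allocAsKnownAt (d 2 1) (d 2 1) changes   // [(1.0, "S&P500")]
let revised = allocAsKnownAt (d 3 1) (d 2 1) changes    // [(0.5, "YHOO"); (0.5, "S&P500")]
```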

As you can see, adding backward operations takes no rocket science! Note that we could have done so for any portfolio action… The model could of course be improved, but the idea is there. Keeping track of the old simulation would just mean storing the creation date of each change. I hope you enjoyed the ride, please feel free to comment! See you next time!