Tag Archives: functional-programming

C# Parallel Processing for Stock Paths simulation

Good evening everyone!

2013 hasn’t started very well for me as I’ve been sick for the past week and completely unable to work on anything even on a bit of CFA. Thankfully, I’m now starting to get better and I can again put a few words – or pieces of code – together.

I decided to write this quick post because my fiancee’s sister Elsa forwarded me a few questions she had about the material of an exam she had this morning in Embedded Systems. As you might guess, some of the material was dedicated to handling concurrency issues which made me realize that I never discussed parallel programming in C#.

Besides, I just bought a brand new laptop as my previous one was really looking like it was going to die any minute and, without a computer to play around with, I would really be like a professional hockey player without a stick  (I hope you like the metaphor). As a matter of fact, this new computer (an Asus G55V) is quite powerful: it has an Intel i7 2.4 Ghz processor with 12 GB of RAM. With this kind of setup, I should be able to run 7 computations at a time (using each of the 7 cores of the Intel i7). Cool huh?

Therefore, I decided to create a little, very simple example of parallel processing in C# applied to quantitative finance. The classic case of parallel processing in this field is usually referring to so-called Monte-Carlo simulations, where you simulate a very large amount of different paths for the price of a stock price according to a given model.

I will split this post in two distinct parts: the model and the simulation. For those of you who are not interested about the financial aspects of this post, feel free to just jump to Part II.

Part I : the model

Definition

I chose the model to be the Geometric Brownian Motion, which is used for the well-known Black-Scholes famework. The model states the dynamics of a price process $S_t$ as follows:

$$ dS_t = \mu S_t dt + \sigma S_t dW_t $$

Note that the return are intrinsically modeled like this:

$$\frac{dS_t}{S_t} = \mu dt + \sigma dW_t \sim \mathcal{N}(\mu ~ dt, \sigma^2 dt)$$

To simulate this price process using a programming language, I need to discretize the equation:

$$ \Delta S_t = S_{t + \Delta t} – S_t = \mu S_t \Delta t + \sigma S_t W_{\Delta t} $$

For simplicity’s sake, I’ll assume my model to be looking at daily prices and I will simulate prices every day so $\Delta t = 1$:

$$ \Delta S_t = S_{t + 1} – S_t = \mu S_t + \sigma S_t Z_t \quad \text{with} \quad Z_t \sim \mathcal{N}(0,1) $$

The implementation

The implementation is very simple: I created a very simple StockPath class which is supposed to simulate a the path of the prices of an arbitrary stock. I gave it the following private methods:

/// <summary>
/// Computes the change in price (delta S) between two points in the paths.
/// </summary>
/// <param name="currentStockPrice">The price of the stack at the first (known) point.</param>
/// <param name="randomNoise">The random noise used to estimate the change.</param>
/// <returns>The absolute change in price between two points according to the model.</returns>
/// <remarks>The underlying model assumes a geometric brownian motion.</remarks>
private double _computeStockPriceChange(double currentStockPrice, double randomNoise)
{
    return currentStockPrice * _mean * _timeInterval + _standardDeviation * currentStockPrice * randomNoise;
}

/// <summary>
/// Computes the next stock price given a <paramref name="currentStockPrice"/>.
/// </summary>
/// <param name="currentStockPrice">The price of the stack at the first (known) point.</param>
/// <param name="randomNoise">The random noise used to estimate the change.</param>
/// <returns>The next stock price according to the model. This value will never be less than 0.0.</returns>
/// <remarks>
/// The model makes sure that a price cannot actually go below 0.0.
/// The underlying model assumes a geometric brownian motion.
private double _computeNextStockPrice(double currentStockPrice, double randomNoise)
{
    return Math.Max(currentStockPrice + _computeStockPriceChange(currentStockPrice, randomNoise), 0.0);
}

Note that I also made sure that the stock price $S_t$ would never go below 0.0 (this is not the case in the mathematical model I stated above).

The computation of the path is performed using Math.Net library to generate the normal variables, and is done in the constructor:

/// <summary>
/// Initializes a stock path.
/// </summary>
/// <param name="nbrSteps">The number of steps in the path.</param>
/// <param name="mean">The mean of the returns.</param>
/// <param name="std">The standard deviation of the returns.</param>
/// <param name="timeInterval">The time between two evaluation points.</param>
/// <param name="initialPoint">The starting point of a path.</param>
public StockPath(int nbrSteps, double mean, double std, double timeInterval, double initialPoint)
{
	//Assigns internal setup variables
	_timeInterval = timeInterval;
	_mean = mean;
	_standardDeviation = std;

	//Using Math.Net library to compute the required random noise.
	Normal normal = Normal.WithMeanStdDev(0, 1);
	IEnumerable<double> returns = normal.Samples().Take(nbrSteps);

	//Explicit implementation of aggregation mechanism.
	_lastPrice = returns.Aggregate(initialPoint, _computeNextStockPrice);
}

Note that for simplicity and to diminish the computation time, I do not save each step of the price computation: I save only the last price.

Part II: simulation

For those of you who weren’t interested in the model specifics, this is where the main topic of this posts starts. We previously defined how we compute each path. The main problem with path computation is that you cannot parallelize it because you need to know $S_t$ to compute $S_{t+1}$. However, you can compute different paths in parallel, as they are completely independent from each other. To demonstrate the advantage of using parallel computations, I will want to simulate a very large amount of paths (100’000 of them) each of which will have 10 years of daily data (2520 points).

Why is it useful to compute the paths in parallel?

Let’s take a simple real-life example: cooking. Assume you invited a group of friends for dinner. You decided to have smoked salmon and toasts for appetizers and a pizza for the main. Now, what you would most probably do is prepare the pizza, put it in the oven, and while it is cooking, you would make the toasts, and while the toasts are in the toaster, you would put the salmon in a plate and so on. Once everything is done, you put everything in the table. In classic programming, you just cannot do that.

Haven’t we already been doing exactly that forever?

No, but everything was made to make you believe so. To keep it simple, a processor (CPU) can only do a single thing at a time: that is, if you were a processor, you would be stuck in front of the oven while the pizza is cooking. Some of you might argue that old computers (who had only a single CPU) managed to do several things at a time: you could still move the mouse or open a text editor while running an installation. That’s not really true! Actually, what the computer was doing was to alternate between different tasks. To come back to the kitchen example, if you were a computer, you could stop the oven, put the first glass on the table, switch back the oven, wait a little, stop the oven again, put the second glass on the table, restart the oven, wait again a little, and so on… This could look like you’re doing several things at the same time, but in fact, you’re splitting your time between several tasks. In short, a CPU performs computations sequentially. The problem with this “fake” parallelization is that although it looks like everything happens at the same time, you actually waste a lot of time because you can’t do many things at the same time.

What about today?

Modern computers are equipped with multiple cores. For example, my computer has an Intel i7 processor: it has 7 different cores that can process computations on their own. This means that you can compute 7 different things in parallel. They work as a team; for the kitchen example the team in the kitchen. This does not mean that I can compute anything in parallel, I need the tasks to be independent (at least to some extent). Now, this doesn’t mean that I can really compute things 7 times quicker. The reason for that is because the computer needs to perform some operations to make synchronize the results together once they are done. In the kitchen example, the team need to make sure that they do not do things recklessly; they need the plates to be ready at a given time; they need to organize themselves. Overall, there is an organisational overhead, but it is offset by the ability of being able to perform several things at the same time.

Is it difficult to perform parallel computations in C#?

There are several ways to perform parallel computing in C#. Today, I would like to discuss the simplest one: PLINQ. In my functional programming post,  I already showed how to use the classical LINQ framework to handle collections functionally. What is incredible about PLINQ is that it automatically handles the parallel aspect of collection operations after the function AsParallel() is called. So, first, I define a creator function which creates a new stock path for a given index (the index is actually useless in our case, but enables us to treat the computations functionally):

Func<int,StockPath> creator = x => new StockPath(nbrPoints, mean, stdDev);

What I want to do is to make sure that this creator is called in parallel several times. Here is now the function I implemented to run a simulation of several paths of a given amounts of points:

/// <summary>
/// Runs a simulation and prints results on the console.
/// </summary>
/// <param name="nbrPoints">The number of points for each path to be generated.</param>
/// <param name="nbrPaths">The number of paths to be generated.</param>
/// <param name="simulationName">The name of the simulation</param>
/// <param name="creator">The function used to create the <seealso cref="StockPath"/> from a given index.</param>
/// <param name="mode">The <see cref="Program.ExecutionMode"/></param>
public static void RunSimulation(int nbrPoints, int nbrPaths, string simulationName, Func<int,StockPath> creator, ExecutionMode mode)
{
	Stopwatch stopWatch = new Stopwatch();
	StockPath[] paths = new StockPath[nbrPaths];
	IEnumerable<int> indices = Enumerable.Range(0, nbrPaths - 1);
	Console.WriteLine("Starting " + simulationName + " simulation.");
	stopWatch.Start();

	switch (mode)
	{
		case ExecutionMode.CLASSIC:
			paths = indices.Select(creator).ToArray();
			break;
		case ExecutionMode.PARALLEL:
			paths = indices.AsParallel().Select(creator).ToArray();
			break;
		default:
			throw new ArgumentException("Unknown execution mode", "mode");
	}

	stopWatch.Stop();
	Console.WriteLine("End of " + simulationName + " simulation.");
	var lastPrices = paths.Select(x => x.LastPrice);
	Console.WriteLine("Min price: " + lastPrices.Min().ToString("N2"));
	Console.WriteLine("Max price: " + lastPrices.Max().ToString("N2"));
	Console.WriteLine("Computation time: " + stopWatch.Elapsed.TotalSeconds.ToString("N2") + " sec");
}

The two most important lines are the number #20 and #23. The rest of the function basically computes the executions time and prints the results on the console. Running the simulation in both modes (sequential and parallel) gives me the following results:

Execution mode Computation time
Sequential 32.33 sec
Parallel 06.29 sec

As you can see, running the paths in parallel improved greatly the computation time of the simulation; it’s about 5 times quicker. As expected, we did not perform 7 times better (the number of available cores) because we had to do the extra background work of synchronizing the results at the end of the computation.

Summary

Given the extreme simplicity (only adding the AsParallel() call) of the code to get into parallel mode, using PLINQ seems to be a very good and very accessible solution for simple cases such as this one.

Of course the performance will vary with the processor being used but in general, I would recommend using this PLINQ solution. There are a few important things to think about when using parallelism: think about whether it is really useful:

  • Are the computations really independent?
  • Are there enough computations to compensate for the extra work required to synchronize the results?

The whole project is available on my GitHub account here: https://github.com/SRKX/ParallelStockPaths.

I hope you enjoyed the demo,

See you next time,

Jeremie

Why should we use functional programming: a simple example

Good evening,

I’m just taking the time to add more content on functional programming. When I talk about this paradigm with programmers who haven’t had the time to work with it, they always tell me that the code is difficult to understand and that it would be difficult to maintain. I would like to show a simple example which shows that this statement is wrong.

Because I would agree that if you work in an environment where nobody works in F# it is not very likely that you will switch programming technology, I will use C# for my example. Indeed, C# supports some functional programming features as discussed in this book:

Functional Programming in C#: Classic Programming Techniques for Modern Projects (Wrox Programmer to Programmer)

I will take a simple example to demonstrate why functional programming can actually be much simpler to read and help developers avoid making mistakes. Let’s assume you have a list of integers and that you want to get the sum of the squares of the elements of the list.

In imperative programming (the paradigm used normally in C#) you would proceed as follows:

int[] myList = new int[] {1,2,3};

double res=0;
for (int i=0; i<3; i++)
{
   res+=Math.Pow(myList[i],2);
}

return res;

As you can see, it takes about 6 lines (with clear formatting) because you have to setup a result variable, and the you have to make sure that this variable is implemented by the square of each number. You also need to see that you don’t make any mistake in the definition of the loop as it depends on the length of the list.

Now, let’s see how you can do exactly the same thing using functional programming:

int[] myList = new int[] {1,2,3};

return myList.Select(x=>Math.Pow(x,2)).Sum();

As a matter of fact, you can see that the “algorithm” is much shorter there is no problem considering the length of the list or setting up a return variable. I also believe that the code is easier to read, because you do not have to tell the program everything it has to do, you just explain the logic and the actual implementation happens in the background.

That’s it for today,

See you next time!

Asynchronous Financial Data processing using F#

Hi everyone,

I was looking at a book to find some algorithm to detect clusters within financial data. I managed to find a decent algorithm for that matter, but I then wanted to test it on some real data.

Hence, I had to get the data from somewhere and I chose to use Yahoo! Finance. The thing is, I wanted to do it in an automatized way in order to be able to re-use the module for further projects.

I remembered Luca Bolognese’s presentation of F# a few years back (certainly one of the best presentations I’ve ever seen) and I thought it was a good time to give it a try.

The idea is quite simple: you can import data from Yahoo! Finance by parsing the CSV files the website produces. These files are actually generated automatically, which means that you can access them by querying the right URL with the right parameters.

The following module shows how to generate the right query to get the CSV file from Yahoo! Finance:

module QueryModule
open System
let dateToList (d:DateTime) = [d.Month;d.Day;d.Year]

let parameters =['a';'b';'c';'d';'e';'f']

let parametersToString (d1:DateTime) (d2:DateTime)=
    let datesList = (dateToList d1) @ (dateToList d2)
    let rec innerFunc (dList: int list) pars =
        match pars with
        | [] -> ""
        | x::[] -> x.ToString() + "=" + dList.Head.ToString()
        | x::y -> x.ToString() + "=" + dList.Head.ToString() + "&" + (innerFunc dList.Tail y)

    innerFunc datesList parameters

let getFileUrl (ticker:string) start stop =
    "http://ichart.finance.yahoo.com/table.csv?s="+ticker+"&"+(parametersToString start stop)+"&ignore=.csv"

The idea is now to write a module to download this data. This would be pretty straightforward and I wouldn’t make you waste your time by reading further code right here.

What I’m trying to do here is to download these files asynchronously. Doing this using C# is possible but requires some heavy coding and the result will most likely contain some bugs. F# (and functional programming languages in general) allows you to do it much more easily.

Let’s see how to implement an asynchronous function in F# then:

module DownloadModule
open QueryModule
open Microsoft.FSharp.Control.WebExtensions
open System
open System.Net
open System.IO

let parseData (rawData:string) =
    rawData.Split('\n')
    |> Array.toList
    |> List.tail
    |> List.map (fun x -> x.Split(','))
    |> List.filter (fun x -> x.Length = 7)
    |> List.map (fun x -> (Convert.ToDateTime(x.[0]),float x.[6]))

let getCSV ticker dStart dEnd =
    async   {
            let query = getFileUrl ticker dStart dEnd
            let req = WebRequest.Create(query)
            use! resp = req.AsyncGetResponse()
            use stream= resp.GetResponseStream()
            use reader = new StreamReader(stream)
            let content = reader.ReadToEnd()
            let ts = parseData content
            return ts
            }

The whole trick in this code resides in the “async” block and the “use!” keyword. Basically, “use!” tells to F# not to wait for the result, that is, to proceed asynchronously if possible.

You now can run this function in parallel to download multiple ticker as follows:

let testPrices=
    ["MSFT";"YHOO"]
    |>List.map (fun x -> getCSV x (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010")))
    |> Async.Parallel
    |> Async.RunSynchronously;;

This works fine and it’s much faster than it would have been if the list was processed sequentially.

I pushed the example a little further by thinking: “What if I want to do some operation on the time series now?”.

Well, I could wait until the parallel execution is done and the run the following function to get the returns synchronously:

module AnalysisModule
open System

let getReturns (prices:(DateTime *float)list) =
    [for i in 1..(prices.Length-1) -> i]
    |> List.map (fun i ->(fst (List.nth prices i), (snd (List.nth prices i))/(snd (List.nth prices (i-1) )) - 1.0))

That wouldn’t be optimal; once the download of MSFT is done, I don’t need to wait until the download of YHOO is done as well before starting to process the returns…

So, how can I do this easily, without modifying any of the previous function I wrote.

let testReturns =
    ["MSFT";"YHOO"]
    |> List.map (fun ticker -> async {
                        let! prices = getCSV ticker (DateTime.Parse("01.01.2000")) (DateTime.Parse("01.01.2010"))
                        return getReturns prices
                   })
    |>Async.Parallel
    |>Async.RunSynchronously;;

This way, everything is computed in parallel and I have a really great performance!

Hope you enjoyed this little demo!

See you next time!

Functional approach to portfolio modeling

Good evening everybody, I’ve been paying attention to portfolio modelling for the past few months. When you tackle such problem, you first try to think about how you could represent a portfolio as an object so that you can dive into your C#/C++/Java code so that you can start making money ASAP. However, you’ll soon find yourself cornered in numerous problems, especially when you want to backtest different allocation strategies.

The object-oriented approach

Usually, when people model a portfolio, they will see it as a mapping between assets and weights associated to a date. There is however a misinterpretation of what a portfolio is. Indeed, what the common programmer describer above in his model is in fact a snapshot of the portfolio stat at some time t. If you were to make some changes in between two dates (all the subsequent instances of the portfolio will then be erroneous and the programmer would have to recompute them all to get the right simulation. Let’s take an example. Say we have a portfolio going through time t=1,2,3. We assume the stock has two assets, Microsoft (MSFT) and Yahoo (YHOO), and that the allocation strategy is to have an equally weighted portfolio (weights={0.5,0.5}). Here’s how the implementation would look like (in C#):

class Portfolio
{
     public DateTime date;
     public Dictionary<Asset,double> allocation;

     public Portfolio() {}

     public void Optimize()
     {
          int n=allocation.Count;
          foreach (var pair in allocation)
          {
                pair.Value=1/n;
          }
      }
}

class History
{
    public Dictionary<DateTime,Portfolio> history;

    public Add(DateTime t, Portfolio p)
    {
         history.Add(t,p);
    }
}

Now assume you want to add another stock (STO) to the portfolio at time 2, the previous implementation needs to be extended as follows:

class History
{
    public Dictionary<DateTime,Portfolio> history;

    public Add(DateTime t, Portfolio p)
    {
         history.Add(t,p);
         if (history.Any(x=>x.Key>=t))
         {
              /* Compare the new portfolio composition with the subsequent states
               * Take the necessary operation to adjust the portfolio.
               * OUCH!!!! This is complicated.
              */
         }
    }
}

As you can see in the comments I added, backward changes requires recomputing the portfolio at time 2 and 3. This is computationally intensive and the kind of function you really do not fancy writing. I’m not even discussing the probability that some bug will exist or the change that you would have to add if you were to make more complicated backward operations. Furthermore, assuming you want to see how the portfolio behave before a change, all the information about the previous simulation would be lost. This is because the class actually stores the stateof the portfolio, not really the portfolio itself. “Thanks for the heads-up Einstein! You got anything better to do?” Well, as a matter of fact, yes.

The functional approach

I would like to introduce a different way of representing a portfolio; a way which would be especially meaningful in a functional environment (F#, Scala). First of all, I would like to define a portfolio snapshot as a list of tuples of an Asset and a double representing each asset and its weight. To me, a portfolio is a strategy more than an object. In terms of allocation, the strategy outputs portfolio snapshots with weights, and these weights can actually generate buy/sell order to adjust a “real” portfolio (a basket of real assets) position. But the whole point here is that the portfolio is in fact just as set of operations. In my opinion, the correct way of representing an operation in a programming language is a function. In our case, an operation would be a function. This includes adding an asset to the portfolio, optimizing a portfolio and so on. The portfolio is hence a list of tuples of type DateTime*Operation (the date being the time at which the operation should occur). Let’s just define some formal definitions to these concepts (in F#):

type Action<'a>='a->'a

type Change<'a> = System.DateTime * Action<'a>;

type History<'a>; = Change<'a> list

An action is a function taking some type as an input and return a modified version of this object (actually, a new instance of the object with the modification included). A changeis an action happening at a certain time. Finally, a history is a list of changes. Simple. Now, how do we apply this to portfolio modeling? First, some more definitions:

type Weight=float

type Asset = string

type AssetAlloc=Weight * Asset

type PortfolioAlloc= AssetAlloc list

type PortfolioAction=Action<PortfolioAlloc>

type PortfolioChange = Change<PortfolioAlloc>

type Portfolio = History<PortfolioAlloc>

Thanks to the F# syntax, the code is pretty much self-explanatory. Now, let’s define a simple portfolio action consisting in adding an asset to the portfolio:

let addAsset (ass:Asset) (w:Weight) (pFolioAlloc: PortfolioAlloc) : PortfolioAlloc =
    match List.tryFind (fun p -> snd p = ass) pFolioAlloc with
    |Some _ -> (w,ass) :: pFolioAlloc |> List.filter (fun pa -> snd pa <> ass)
    |None -> (w,ass) :: pFolioAlloc

For those of you not familiar with functional programming, this might look complicated, but if you look a bit into the language (particularly Pattern Matching), you’ll see it’s actually quite trivial. Let’s continue with the optimization of the portfolio:

let equWeightPortfolio (pFAlloc:PortfolioAlloc) : PortfolioAlloc =
    let w:Weight = 1.0/(float pFAlloc.Length)
    pFAlloc |> List.map (fun alloc -> (w, snd alloc))

Trivial. Finally, we need a function that will evaluate the portfolio. This requires the application in succession of all the changes to an initial portfolio allocation (most of the time, an empty portfolio, initially). This kind of operation is well-known in functional programming, it simply consists in foldinga list:

let getPortfolioAllocFromInit (pFolio:Portfolio) (t : System.DateTime) (init:PortfolioAlloc) =
    pFolio |> List.filter (fun pc -> fst pc <= t)
    |> List.sortBy (fun pc -> fst pc)
    |> List.map (fun pc -> snd pc)
    |> List.fold (fun alloc action -> action(alloc)) init

let getPortfolioAlloc (pFolio:Portfolio) (t : System.DateTime) = getPortfolioAllocFromInit pFolio t []

The first function implements the general logic: first sorting the operations and then applying them sequentially. The second function just modifies the first one by giving an initial empty portfolio. The application of this model is as follows:

let addSPAction : PortfolioAction = addAsset "S&P500" 0.0;;
let addMSAction : PortfolioAction = addAsset "MSFT" 0.0;;
let myPortfolio:Portfolio = [(System.DateTime.Parse("01.01.2011"),addSPAction);
                            (System.DateTime.Parse("31.01.2011"),equWeightPortfolio);
                            (System.DateTime.Parse("01.02.2011"),addMSAction);
                            (System.DateTime.Parse("28.02.2011"),equWeightPortfolio);
                            ];;
let myPortfolioAlloc t = getPortfolioAlloc myPortfolio t;

let endAlloc = myPortfolioAlloc System.DateTime.Now;;

let janAlloc = myPortfolioAlloc(System.DateTime.Parse("01.02.2011"));;

which outputs:

val endAlloc : PortfolioAlloc = [(0.5, "MSFT"); (0.5, "S&P500")]
val janAlloc : PortfolioAlloc = [(0.0, "MSFT"); (1.0, "S&P500")]

When hence see how trivial it is to compute the state of the portfolio at different times. With this representation, altering a portfolio at time t=2 means actually adding a function to a list, but does not change the subsequent operations. The state of the portfolio (the snapshot) is computed on demand. Let’s try doing so:

let addYHOOAction:PortfolioAction = addAsset "YHOO" 0.0
let myPortfolio2:Portfolio= (System.DateTime.Parse("30.01.2011"),addYHOOAction)::myPortfolio
let myPortfolioAlloc2 t = getPortfolioAlloc myPortfolio2 t
let endAlloc2 = myPortfolioAlloc2 System.DateTime.Now
let janAlloc2 = myPortfolioAlloc2(System.DateTime.Parse("01.02.2011"));;

which outputs:

val endAlloc2 : PortfolioAlloc =
  [(0.3333333333, "MSFT"); (0.3333333333, "YHOO"); (0.3333333333, "S&P500")]
val janAlloc2 : PortfolioAlloc =
  [(0.0, "MSFT"); (0.5, "YHOO"); (0.5, "S&P500")]

As you can see, no rocket science to add backward operations! Note that we could have done so for any portfolio action… The model could be improved of course, but the idea is here. Keeping track of the old simulation would just mean keeping the date of creation of the change. I hope you enjoyed the ride, please feel free to comment! See you next time!