CFA Level II: Quantitative Methods, Multiple Regression

Following my first post about Level II, and specifically correlation, I am now moving on to the main topic of this year’s curriculum: multiple regression. Before I get started, I want to mention that the program covers regression in two steps: it starts by discussing the method with one independent variable and then moves on to multiple variables. I will only talk about the multiple-variable version, because it is the general case. If you understand the general framework, you can answer any question about the specific one-variable case.

Multiple regression is all about describing a dependent variable $Y$ with a linear combination of $K$ independent variables $X_k, k \in \{1, \dots, K\}$. This is expressed mathematically as follows:

$$Y_i = b_0 + b_1 X_{1,i} + \dots + b_K X_{K,i} + \epsilon_i$$

Basically, you are trying to estimate the variable $Y_i$ with the values of the different $X_{k,i}$. The regression process consists of estimating the parameters $b_0, b_1, \dots, b_K$ with an optimization method (ordinary least squares, which minimizes the sum of squared residuals) over a sample of size $n$: there are $n$ known values for each independent variable $X_k$ and for the dependent variable $Y$, indexed by $i$. When all $X_k=0$, $Y$ takes the default value $b_0$ (called the intercept). The error term $\epsilon_i$ is there because the model will not be able to determine $Y_i$ exactly; there is hence a residual part of the value of $Y$ that the model leaves unexplained, and which is assumed to be normally distributed with mean 0 and constant variance for all $i$.

So to sum up, the inputs are:

• $n$ known values of $Y$
• $n$ known values of each $X_k$

and the outputs are:

• $b_0, b_1, \dots, b_K$

So, say you have a set of new values for each $X_k$: you can then estimate the value of $Y$, denoted $\hat{Y}$, by computing:

$$\hat{Y} = b_0 + b_1 X_1 + \dots + b_K X_K$$
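To make the inputs and outputs concrete, here is a minimal sketch in Python using only the standard library. It assumes the coefficients are estimated by ordinary least squares through the normal equations; the function names (`solve_linear_system`, `fit_ols`, `predict`) and the toy data are illustrative choices of mine, not part of the curriculum.

```python
# Minimal ordinary-least-squares sketch in pure Python.
# All names and the toy data are illustrative assumptions.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(xs, ys):
    """xs: n observations, each a list of K values; ys: n values.
    Returns [b_0, b_1, ..., b_K] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]  # column of ones for b_0
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, x_new):
    """Y-hat = b_0 + b_1 X_1 + ... + b_K X_K for one new observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], x_new))

# Toy sample generated exactly by Y = 1 + 2*X1 - 3*X2 (no error term)
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]

coeffs = fit_ols(xs, ys)        # estimates of b_0, b_1, b_2
y_hat = predict(coeffs, [6, 2])  # estimate for a new set of X values
```

Because the toy data contains no error term, the fit should recover the generating coefficients up to rounding. In practice you would use a statistics package, which also reports the standard errors discussed later in this post.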

The most important part of this section is the enumeration of its underlying assumptions:

1. There is a linear relationship between the independent variables $X_k$ and the dependent variable $Y$.
2. The independent variables $X_k$ are not linearly related to each other (no multicollinearity).
3. The expected value of $\epsilon_i$ is 0 for all $i$.
4. The variance of $\epsilon_i$ is constant for all $i$.
5. The error term $\epsilon$ is normally distributed.
6. The error terms $\epsilon_1, \dots, \epsilon_n$ are not correlated with each other.

If one of these assumptions does not hold for the sample being analyzed, then the model is misspecified; we will see in a subsequent post how to detect this problem and how to handle it. Note that point 2 only concerns collinearity among the independent variables; there is no problem if $Y$ is correlated with one of the $X_k$: that is exactly what we are looking for.

Now, remember that in my first post on Quant Methods I said that we would have to compute the statistical significance of estimated parameters. This is exactly what we are going to do now. The thing is, the output parameters of a regression (the coefficients $b_0, b_1, \dots, b_K$) are only statistical estimates, so there is uncertainty about them. Therefore, the regression algorithm usually also outputs the standard error $s_{b_k}$ of each parameter estimate. This allows us to build a statistical test to determine whether the estimate $\hat{b}_k$ is statistically different from some hypothesized value $b_k$ at a confidence level of $1-\alpha$. The null hypothesis $H_0$, which we want to reject, is that the true coefficient equals the hypothesized value $b_k$. The test statistic is:

$$t=\frac{\hat{b}_k - b_k}{s_{b_k}}$$

If the null hypothesis $H_0$ is true, the variable $t$ follows a t-distribution with $n-K-1$ degrees of freedom. So, you can simply look up the critical value $c_\alpha$ in the t-distribution table for the desired level of confidence. If $t>c_\alpha$ or if $t<-c_\alpha$, then $H_0$ can be rejected and we can conclude that $\hat{b}_k$ is statistically different from $b_k$.

Usually, we are asked to determine whether some estimate $\hat{b}_k$ is statistically significant. As explained in the previous post, this means that we want to test the null hypothesis $H_0: b_k = 0$. So, you can just run the same test as before with $b_k=0$:

$$t=\frac{\hat{b}_k}{s_{b_k}}$$
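As a quick illustration of the mechanics, the test can be sketched in a few lines of Python. The estimate, its standard error and the critical value below are made-up numbers: in an exam question they would be given, or the critical value would be read from a t-table (about 2.01 for a two-tailed 5% test with roughly 50 degrees of freedom).

```python
def t_statistic(b_hat, b_hypothesized, std_error):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    return (b_hat - b_hypothesized) / std_error

def is_rejected(t, critical_value):
    """Two-tailed test: reject H0 when t falls outside [-c, +c]."""
    return abs(t) > critical_value

# Hypothetical regression output (not taken from any real data set)
b_hat = 0.50
std_error = 0.20
critical_value = 2.01  # assumed table value, two-tailed, alpha = 5%

t = t_statistic(b_hat, 0.0, std_error)       # significance test: b_k = 0
significant = is_rejected(t, critical_value)
```

With these numbers the statistic is 0.50 / 0.20 = 2.5, which lies beyond the critical value, so the coefficient would be declared statistically significant at the 5% level.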

That’s it for today. The concepts presented here are essential to succeed in the Quantitative Methods part of the CFA Level II exam. They are nonetheless quite easy to grasp and the formulas are very simple. Next, we will look at how to analyze how well a regression model explains the dependent variable.

CFA Level II: Economics, Exchange Rates Basics

Good evening everyone,

My weekly task is to go through the Economics part of the Level II curriculum. I was a bit afraid of it, because it was clearly my weak point at Level I and because I think that this topic covers a lot of material compared to its allocated number of questions.

At this level, the first challenge is to take into account the bid-ask spread for currency exchange rates. Just as we saw for security markets at Level I, exchange rates do not have a single “value”. That is, you cannot buy and sell a currency at the same price instantaneously. This is because you need to go through a dealer who has to make money for providing liquidity: this economic gain is provided by the bid-ask spread.

Let’s go back to the basics by looking at the exchange rate $\frac{CHF}{EUR}$. The currency in the denominator is the base currency; it is the asset being traded. The currency in the numerator is the price currency; it is the currency used to price the underlying asset, which is in this case another currency. This is exactly as if you were trading a stock $S$. The price in CHF could be seen as the $\frac{CHF}{S}$ “exchange rate”, i.e. the number of CHF being offered for one unit of $S$. Now, as mentioned before, exchange rates are quoted with bid and ask prices:

$$\frac{CHF}{EUR} = 1.21 \quad – \quad 1.22$$

This means that $\frac{CHF}{EUR}_\text{bid}$ is 1.21 and $\frac{CHF}{EUR}_\text{ask}$ is 1.22. Again, you are trading the base currency: here Euros.

• The bid price is the highest price at which you can sell it to the dealer.
• The ask price is the lowest price at which you can buy it from the dealer.

If you want to make sure you got it right, just check that you can’t instantaneously buy the base currency at a given price (which you believe to be the ask) and sell it at a higher price (which you believe to be the bid). In this example, you can buy one EUR for 1.22 CHF and sell it instantaneously for 1.21 CHF, making a loss of 1.21-1.22=-0.01 CHF. In fact, the loss can be seen as the price of liquidity, which is the service provided by the dealer and for which he has to be compensated. So the lower value is the bid, the higher value is the ask (also called the offer).
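The sanity check above is easy to express in code. The following Python sketch uses the quote from this post; the function name `round_trip_pnl` is a hypothetical helper of mine.

```python
def round_trip_pnl(bid, ask, units=1.0):
    """Profit (in price currency) of buying `units` of the base currency
    at the ask and immediately selling them back at the bid."""
    cost = units * ask      # you buy from the dealer at the ask
    proceeds = units * bid  # you sell to the dealer at the bid
    return proceeds - cost

# CHF/EUR quoted 1.21 - 1.22: the base currency is the EUR
bid, ask = 1.21, 1.22
pnl = round_trip_pnl(bid, ask)  # about -0.01 CHF per EUR traded
```

If the result were positive, you would have mixed up the bid and the ask, since a dealer never offers a free round trip.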

Recall from Level I that you could convert exchange rates by doing:

$$\frac{CHF}{EUR} = \frac{1}{\frac{EUR}{CHF}}$$

This is simple algebra and it works fine as long as you don’t have the bid-ask spread to take into account. The problem is that at Level II, you do. To invert the exchange rate with this higher level of complexity, you have to learn the following formula:

$$\frac{EUR}{CHF}_\text{bid} = \frac{1}{\frac{CHF}{EUR}_\text{ask}}$$

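This might look complicated at first, but the intuition is simple: selling CHF for EUR is the same trade as buying EUR with CHF, so the bid of one quote is the inverse of the ask of the other. I also have a trick to help you memorize it. Follow these steps: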

• Define what you want on the left-hand side of the equation (currency in the numerator, currency in the denominator, bid or ask).

$$\frac{A}{B}_\text{side}$$

• On the right-hand side of the equation, write the inverse function:

$$\frac{A}{B}_\text{side}=\frac{1}{\cdot}$$

• On the right-hand side, replace the $\cdot$ by the inverted exchange rate:

$$\frac{A}{B}_\text{side}=\frac{1}{\frac{B}{A}_\cdot}$$

• Finally, replace the remaining dot on the right-hand side by the opposite side:

$$\frac{A}{B}_\text{side}=\frac{1}{\frac{B}{A}_\text{opp. side}}$$

Let’s show how this works using our base example:

$$\frac{EUR}{CHF}_\text{bid}=\frac{1}{\frac{CHF}{EUR}_\text{ask}}$$
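Here is a small Python sketch of the rule, applied to the 1.21 / 1.22 quote used throughout this post. The function name `invert_quote` is a hypothetical helper, and the quote is passed as a (bid, ask) pair.

```python
def invert_quote(bid, ask):
    """Given the (bid, ask) of A/B, return the (bid, ask) of B/A.
    Each side is the inverse of the OPPOSITE side of the original quote."""
    inverted_bid = 1.0 / ask
    inverted_ask = 1.0 / bid
    return inverted_bid, inverted_ask

chf_per_eur = (1.21, 1.22)                # CHF/EUR bid, ask
eur_per_chf = invert_quote(*chf_per_eur)  # EUR/CHF bid, ask

# Inverting twice should give back the original quote
round_trip = invert_quote(*eur_per_chf)
```

The inverted quote is roughly 0.8197 / 0.8264, and the bid is still below the ask. Inverting each side with itself instead would put the higher number on the bid side, which is the mistake the four steps above are designed to avoid.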

Simple. You can apply this method in either direction to suit your needs. You might wonder what you need this for: it is required to compute cross rates, which will be the subject of another post. Until then, grasp the concepts presented here and stay tuned to this blog!

CFA Level II, Quantitative Methods: Correlation

Good evening everyone,

So I’m finally getting started writing about the CFA Level II material, and as I proceed in the classic order of the curriculum, I will start with the Quantitative Methods part. If you have a bit of experience in quantitative finance, I believe this part is quite straightforward. It covers many different things, so I will write a separate post for each topic to keep each of them as short as possible. This will hopefully make it as readable and accessible as possible for all types of readers.

As a brief introductory note, I would say that this section is very much like the rest of the CFA Level II curriculum: it builds on the concepts learnt in Level I. Let me write that again: the concepts, not necessarily all the formulas. For the Quant Methods part, you have to be comfortable with the Hypothesis Testing part I recapped in this post.

So let’s get started with the first post, which will be about correlation. This part is actually quite easy because you’ve pretty much seen everything at Level I. Let me restate the two simple definitions of the sample covariance and the sample correlation of two samples $X$ and $Y$:

$$\text{cov}_{X,Y} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

$$r_{X,Y} = \frac{\text{cov}_{X,Y}}{s_X s_Y}$$

where $s_X$ and $s_Y$ are the standard deviations of the respective samples.

The problem with the sample covariance is that it doesn’t really give you a good idea of the strength of the relationship between the two processes; it depends very much on the variance of each sample. This is where the sample correlation is actually useful, because it is bounded: $r_{X,Y} \in [-1,1]$. Taking the simple example of using the same sample twice, we have $\text{cov}_{X,X} = s_X^2$ and $r_{X,X}=1$.
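The two definitions translate directly into code. The sketch below is plain Python (not the MATLAB used later in this post), with made-up data and helper names of my own. It also illustrates the point about scale: multiplying one sample by 10 multiplies the covariance by 10 but leaves the correlation unchanged.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_cov(x, y):
    """Sample covariance with the n-1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    """Sample correlation: covariance divided by both standard deviations."""
    s_x = sqrt(sample_cov(x, x))  # cov(X, X) is the sample variance
    s_y = sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

# Made-up data, for illustration only
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 9.0, 12.0]
y_scaled = [10.0 * v for v in y]

cov_xy = sample_cov(x, y)
cov_xy_scaled = sample_cov(x, y_scaled)  # 10 times larger
r_xy = sample_corr(x, y)
r_xy_scaled = sample_corr(x, y_scaled)   # same as r_xy
r_xx = sample_corr(x, x)                 # a sample against itself
```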

In general, I would say that the Quant Methods part of Level II mainly focuses on understanding the underlying models and their assumptions, not on learning and applying formulas, which was more the case in Level I. For correlation, there is one key thing to understand: it detects a linear relationship between two samples. This means that the correlation is only useful to detect a relationship of the kind $Y = aX+b+\epsilon$.

A good way to visualize whether there is a correlation between two samples is to look at a scatter plot. To create a few examples, I decided to use MATLAB, as we can generate random processes and scatter plots very easily. So I will create 4 processes:

• A basic $X \sim \mathcal{N}(0,1)$ of size $n=1000$
• A process which is a linear combination of $X$.
• A process which is not a linear combination of $X$.
• And another process $Y \sim \mathcal{N}(0,1)$ of size $n=1000$ independent from $X$.

%Parameter definition
n=1000;
%Process definitions
x=randn(n,1);
linear_x=5+2*x;
squared_x=x.^2;
y=randn(n,1);
%Plotting
figure();
scatter(x,y);
figure();
scatter(x,squared_x);
figure();
scatter(x,linear_x);


The script presented above generates 3 different scatter plots, which I will now present and comment on. First, let’s take the example of the two independent processes X and Y. In this instance, there is no correlation by construction. This gives a scatter plot like this:

Now, let’s look at the scatter plot for the process which is a linear combination of X. Obviously, this is an example of two processes with positive correlation (actually, perfect correlation).

Graphically, we can say that there is correlation between two samples if the points form a line on the scatter plot. If the line has a positive slope (it goes from bottom-left to top-right), the correlation will be positive. If it has a negative slope (it goes from top-left to bottom-right), the correlation will be negative. The magnitude of the correlation is determined by how closely the points lie on a straight line. In the example above, all the points lie perfectly on a straight line. So a general framework to “estimate” correlation from a scatter plot would be the following:

• Do the points lie approximately on a straight line?
    • If no, then there is no correlation; stop here.
    • If yes, then there is correlation; continue.
• Is the slope of the line positive or negative?
    • If positive, then the correlation $r > 0$.
    • If negative, then the correlation $r < 0$.
    • If there is no slope (vertical or horizontal straight line), then one of the samples is constant: its standard deviation is 0, so $r$ is undefined and we treat this as no correlation.
• How closely do the points follow a straight line?
    • The closer they are to a straight line, the closer $|r|$ is to 1.

In the example above, the points look like a straight line, the slope is positive, and the points are very much on a straight line, so $r \simeq 1$.

Finally, let’s look at the third process which is simply the squared values of X, so the relationship is not linear:

If we apply the decision framework presented above, we can clearly see that the points do not lie on a straight line, and hence we conclude that there is no correlation between the samples.

Now let’s look at the values given by MATLAB for the correlations:

| Samples | Correlation |
|---|---|
| Independent processes X and Y | 0.00 |
| X and Linear Combination of X | 1.00 |
| X and squared X | 0.02 |

As we can see, MATLAB confirms what we determined looking at the scatter plots.
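For readers without MATLAB, the same experiment can be sketched in Python with the standard library only. This is my own re-implementation, not the script used to produce the numbers above: the seed is arbitrary, so the two near-zero correlations will differ slightly from the table, while the linear case should still come out at 1.

```python
import random
from math import sqrt

def corr(a, b):
    """Sample correlation of two equally sized lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)
    s_a = sqrt(sum((u - ma) ** 2 for u in a) / (n - 1))
    s_b = sqrt(sum((v - mb) ** 2 for v in b) / (n - 1))
    return cov / (s_a * s_b)

rng = random.Random(42)  # arbitrary seed, for reproducibility only
n = 1000

x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # X ~ N(0, 1)
y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent Y ~ N(0, 1)
linear_x = [5 + 2 * v for v in x]            # linear function of X
squared_x = [v ** 2 for v in x]              # nonlinear function of X

r_independent = corr(x, y)      # expected to be close to 0
r_linear = corr(x, linear_x)    # expected to be 1 up to rounding
r_squared = corr(x, squared_x)  # expected to be close to 0
```

Note that `squared_x` is fully determined by `x`, yet its correlation with `x` is close to zero: the dependence is real but not linear.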

I now want to explain a very important concept in the Level II curriculum: the statistical significance of an estimation. Quite simply, given the fact that we estimated some value $\hat{b}$, we say that this estimation is statistically significant with some confidence level $(1-\alpha)$, if we manage to reject the hypothesis $H_0 : b=0$ with that confidence level. To do so, we will perform a statistical test which will depend on the value we are trying to estimate.

For this post, we would like to determine whether the sample correlation $r$ we estimate is statistically significant. There is a simple test given by the CFA Institute:

$$t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}}$$

This variable $t$ follows a Student’s t-distribution with $n-2$ degrees of freedom. This means that, if you know $\alpha$, you can simply look up the critical value $t_c$ in the distribution table and reject the hypothesis $H_0 : r=0$ if $t < -t_c$ or $t > t_c$. Although you are supposed to know how to compute this test yourself, you can also be given the p-value of the estimated statistic. Recall from Level I that the p-value is the smallest $\alpha$ for which $H_0$ can be rejected. Quite simply, if the p-value is smaller than the $\alpha$ you consider for the test, you can reject the null hypothesis $H_0$ and conclude that the estimated value is statistically significant.
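As a worked example, here is the test applied in Python to the “X and squared X” correlation from the table above ($r = 0.02$, $n = 1000$). The critical value of about 1.96 is the usual two-tailed 5% value for a large number of degrees of freedom; I am assuming it here rather than computing it.

```python
from math import sqrt

def correlation_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.02
n = 1000
critical_value = 1.96  # assumed: two-tailed, alpha = 5%, large sample

t = correlation_t_stat(r, n)        # roughly 0.63
reject_h0 = abs(t) > critical_value  # False: not statistically significant
```

Since the statistic is well inside the interval $[-t_c, t_c]$, the null hypothesis cannot be rejected, so this correlation is not statistically significant.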

Let’s look at the p-values MATLAB gives me for the correlation we discussed in the example:

| Samples | Correlation | P-Value |
|---|---|---|
| Independent processes X and Y | -0.01 | 0.77 |
| X and Linear Combination of X | 1.00 | 0.00 |
| X and squared X | 0.02 | 0.53 |

If we consider an $\alpha$ of 5%, we can see that we cannot reject the null hypothesis for the independent and squared processes (their p-values, 0.77 and 0.53, are well above 0.05), so these correlations are not statistically significant. On the other hand, the correlation with the linear combination of X has a p-value close to 0, which means that it is statistically significant for virtually any $\alpha$.

One last word on the important points to keep in mind about correlation:

• Correlation detects only a linear relationship; not a nonlinear one.
• Correlation is very sensitive to outliers (“weird” points, often erroneous, included in the datasets).
• Correlation can be spurious; you can detect a statistically significant correlation whereas there is no economic rationale behind it.

I hope you enjoyed this first post on the CFA Level II, and I’ll be back shortly with more!

C# Parallel Processing for Stock Paths simulation

Good evening everyone!

2013 hasn’t started very well for me, as I’ve been sick for the past week and completely unable to work on anything, even a bit of CFA. Thankfully, I’m now starting to get better and I can again put a few words, or pieces of code, together.

I decided to write this quick post because my fiancee’s sister Elsa forwarded me a few questions she had about the material of an exam she had this morning in Embedded Systems. As you might guess, some of the material was dedicated to handling concurrency issues which made me realize that I never discussed parallel programming in C#.

Besides, I just bought a brand new laptop, as my previous one really looked like it was going to die any minute and, without a computer to play around with, I would really be like a professional hockey player without a stick (I hope you like the metaphor). As a matter of fact, this new computer (an Asus G55V) is quite powerful: it has an Intel i7 2.4 GHz processor with 12 GB of RAM. With this kind of setup, I should be able to run several computations at a time, one per core. (Note that “i7” is a product name, not a core count: mobile i7 chips of this generation typically have 4 physical cores, which the operating system sees as 8 logical cores thanks to Hyper-Threading.) Cool huh?

Therefore, I decided to create a little, very simple example of parallel processing in C# applied to quantitative finance. The classic case of parallel processing in this field is the so-called Monte Carlo simulation, where you simulate a very large number of different paths for the price of a stock according to a given model.

I will split this post into two distinct parts: the model and the simulation. For those of you who are not interested in the financial aspects of this post, feel free to jump straight to Part II.

Part I : the model

Definition

I chose the model to be the Geometric Brownian Motion, which is used in the well-known Black-Scholes framework. The model states the dynamics of a price process $S_t$ as follows:

$$dS_t = \mu S_t dt + \sigma S_t dW_t$$

Note that the returns are intrinsically modeled like this:

$$\frac{dS_t}{S_t} = \mu dt + \sigma dW_t \sim \mathcal{N}(\mu ~ dt, \sigma^2 dt)$$

To simulate this price process using a programming language, I need to discretize the equation:

$$\Delta S_t = S_{t + \Delta t} - S_t = \mu S_t \Delta t + \sigma S_t \Delta W_t \quad \text{with} \quad \Delta W_t \sim \mathcal{N}(0, \Delta t)$$

For simplicity’s sake, I’ll assume my model to be looking at daily prices and I will simulate prices every day so $\Delta t = 1$:

$$\Delta S_t = S_{t + 1} - S_t = \mu S_t + \sigma S_t Z_t \quad \text{with} \quad Z_t \sim \mathcal{N}(0,1)$$
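Before moving to the C# implementation, here is what this discretized recursion looks like as a short Python sketch. It is only an illustration of the equation above: the function name, parameter names and numeric values are my own assumptions, and the sketch keeps a general `dt` (the noise is scaled by the square root of `dt`, which equals 1 in the daily setting used in this post).

```python
import random

def simulate_last_price(s0, mu, sigma, n_steps, dt=1.0, rng=None):
    """Iterate S <- S + mu*S*dt + sigma*S*sqrt(dt)*Z, with Z ~ N(0, 1),
    and return only the final price of the path."""
    rng = rng or random.Random()
    s = s0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal shock for this step
        s = s + mu * s * dt + sigma * s * (dt ** 0.5) * z
    return s

# Hypothetical daily parameters: 10 years of 252 trading days
rng = random.Random(0)  # arbitrary seed, for reproducibility only
last_price = simulate_last_price(100.0, 0.0003, 0.01, 2520, rng=rng)

# Sanity check with no volatility: the path is pure compounding
deterministic = simulate_last_price(100.0, 0.01, 0.0, 3)  # 100 * 1.01**3
```

With `sigma` set to 0 the random term vanishes and the price simply compounds at `mu` per step, which is a convenient way to check the recursion.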

The implementation

The implementation is very simple: I created a StockPath class which simulates the path of the prices of an arbitrary stock. I gave it the following private methods:

/// <summary>
/// Computes the change in price (delta S) between two points in the path.
/// </summary>
/// <param name="currentStockPrice">The price of the stock at the first (known) point.</param>
/// <param name="randomNoise">The random noise used to estimate the change.</param>
/// <returns>The absolute change in price between two points according to the model.</returns>
/// <remarks>The underlying model assumes a geometric Brownian motion.</remarks>
private double _computeStockPriceChange(double currentStockPrice, double randomNoise)
{
    // Drift term (mu * S * dt) plus diffusion term (sigma * S * sqrt(dt) * Z).
    // With _timeInterval = 1, as in this post, sqrt(dt) = 1.
    return currentStockPrice * _mean * _timeInterval
        + _standardDeviation * currentStockPrice * Math.Sqrt(_timeInterval) * randomNoise;
}

/// <summary>
/// Computes the next stock price given a <paramref name="currentStockPrice"/>.
/// </summary>
/// <param name="currentStockPrice">The price of the stock at the first (known) point.</param>
/// <param name="randomNoise">The random noise used to estimate the change.</param>
/// <returns>The next stock price according to the model. This value will never be less than 0.0.</returns>
/// <remarks>
/// The model makes sure that a price cannot actually go below 0.0.
/// The underlying model assumes a geometric Brownian motion.
/// </remarks>
private double _computeNextStockPrice(double currentStockPrice, double randomNoise)
{
    // Floor the price at 0.0: a stock price cannot be negative.
    return Math.Max(currentStockPrice + _computeStockPriceChange(currentStockPrice, randomNoise), 0.0);
}


Note that I also made sure that the stock price $S_t$ would never go below 0.0 (this is not the case in the mathematical model I stated above).

The computation of the path is performed using the Math.NET library to generate the normal variables, and is done in the constructor:

// Requires: using System.Collections.Generic; using System.Linq;
//           using MathNet.Numerics.Distributions;
/// <summary>
/// Initializes a stock path.
/// </summary>
/// <param name="nbrSteps">The number of steps in the path.</param>
/// <param name="mean">The mean of the returns.</param>
/// <param name="std">The standard deviation of the returns.</param>
/// <param name="timeInterval">The time between two evaluation points.</param>
/// <param name="initialPoint">The starting point of a path.</param>
public StockPath(int nbrSteps, double mean, double std, double timeInterval, double initialPoint)
{
    //Assigns internal setup variables
    _timeInterval = timeInterval;
    _mean = mean;
    _standardDeviation = std;

    //Using Math.Net library to compute the required random noise.
    Normal normal = Normal.WithMeanStdDev(0, 1);
    IEnumerable<double> returns = normal.Samples().Take(nbrSteps);

    //Explicit implementation of aggregation mechanism:
    //each noise value turns the current price into the next one.
    _lastPrice = returns.Aggregate(initialPoint, _computeNextStockPrice);
}


Note that for simplicity and to diminish the computation time, I do not save each step of the price computation: I save only the last price.

Part II: simulation

For those of you who weren’t interested in the model specifics, this is where the main topic of this post starts. We previously defined how we compute each path. The main problem with path computation is that you cannot parallelize it, because you need to know $S_t$ to compute $S_{t+1}$. However, you can compute different paths in parallel, as they are completely independent from each other. To demonstrate the advantage of using parallel computations, I will simulate a very large number of paths (100,000 of them), each of which will have 10 years of daily data (2520 points).

Why is it useful to compute the paths in parallel?

Let’s take a simple real-life example: cooking. Assume you invited a group of friends for dinner. You decided to have smoked salmon and toasts for appetizers and a pizza for the main. Now, what you would most probably do is prepare the pizza, put it in the oven, and while it is cooking, you would make the toasts, and while the toasts are in the toaster, you would put the salmon on a plate, and so on. Once everything is done, you put everything on the table. In classic, sequential programming, you just cannot do that.

Haven’t we already been doing exactly that forever?

No, but everything was made to make you believe so. To keep it simple, a processor (CPU) can only do a single thing at a time: that is, if you were a processor, you would be stuck in front of the oven while the pizza is cooking. Some of you might argue that old computers (which had only a single CPU) managed to do several things at a time: you could still move the mouse or open a text editor while running an installation. That’s not really true! Actually, what the computer was doing was alternating between different tasks. To come back to the kitchen example, if you were a computer, you could stop the oven, put the first glass on the table, switch the oven back on, wait a little, stop the oven again, put the second glass on the table, restart the oven, wait again a little, and so on… This could look like you’re doing several things at the same time, but in fact, you’re splitting your time between several tasks. In short, a CPU performs computations sequentially. The problem with this “fake” parallelization is that although it looks like everything happens at the same time, you don’t gain any time, because only one thing is actually being done at any given moment.

Modern computers are equipped with multiple cores. For example, my computer has an Intel i7 processor, which has several cores (or logical cores) that can each process computations on their own. This means that you can compute several different things in parallel. The cores work as a team, like the team in the kitchen in our example. This does not mean that I can compute anything in parallel: I need the tasks to be independent (at least to some extent). Nor does it mean that with $N$ cores I can really compute things $N$ times quicker. The reason is that the computer needs to perform some extra operations to split the work and to synchronize the results once they are done. In the kitchen example, the team needs to make sure that they do not do things recklessly; they need the plates to be ready at a given time; they need to organize themselves. Overall, there is an organisational overhead, but it is offset by the ability to perform several things at the same time.

Is it difficult to perform parallel computations in C#?

There are several ways to perform parallel computing in C#. Today, I would like to discuss the simplest one: PLINQ. In my functional programming post,  I already showed how to use the classical LINQ framework to handle collections functionally. What is incredible about PLINQ is that it automatically handles the parallel aspect of collection operations after the function AsParallel() is called. So, first, I define a creator function which creates a new stock path for a given index (the index is actually useless in our case, but enables us to treat the computations functionally):

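// timeInterval and initialPrice are assumed to be defined next to nbrPoints, mean and stdDev.
Func<int,StockPath> creator = x => new StockPath(nbrPoints, mean, stdDev, timeInterval, initialPrice);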


What I want to do is to make sure that this creator is called several times in parallel. Here is the function I implemented to run a simulation of several paths with a given number of points:

/// <summary>
/// Runs a simulation and prints results on the console.
/// </summary>
/// <param name="nbrPoints">The number of points for each path to be generated.</param>
/// <param name="nbrPaths">The number of paths to be generated.</param>
/// <param name="simulationName">The name of the simulation</param>
/// <param name="creator">The function used to create the <seealso cref="StockPath"/> from a given index.</param>
/// <param name="mode">The <see cref="Program.ExecutionMode"/></param>
public static void RunSimulation(int nbrPoints, int nbrPaths, string simulationName, Func<int,StockPath> creator, ExecutionMode mode)
{
    Stopwatch stopWatch = new Stopwatch();
    StockPath[] paths = new StockPath[nbrPaths];
    IEnumerable<int> indices = Enumerable.Range(0, nbrPaths); // Range(start, count): exactly nbrPaths indices
    Console.WriteLine("Starting " + simulationName + " simulation.");
    stopWatch.Start();

    switch (mode)
    {
        case ExecutionMode.CLASSIC:
            paths = indices.Select(creator).ToArray(); // sequential: one path after the other
            break;
        case ExecutionMode.PARALLEL:
            paths = indices.AsParallel().Select(creator).ToArray(); // PLINQ spreads the paths over the cores
            break;
        default:
            throw new ArgumentException("Unknown execution mode", "mode");
    }

    stopWatch.Stop();
    Console.WriteLine("End of " + simulationName + " simulation.");
    var lastPrices = paths.Select(x => x.LastPrice);
    Console.WriteLine("Min price: " + lastPrices.Min().ToString("N2"));
    Console.WriteLine("Max price: " + lastPrices.Max().ToString("N2"));
    Console.WriteLine("Computation time: " + stopWatch.Elapsed.TotalSeconds.ToString("N2") + " sec");
}


The two most important lines are #20 and #23 (the two assignments to paths inside the switch): they differ only by the call to AsParallel(). The rest of the function basically measures the execution time and prints the results on the console. Running the simulation in both modes (sequential and parallel) gives me the following results:

| Execution mode | Computation time |
|---|---|
| Sequential | 32.33 sec |
| Parallel | 6.29 sec |

As you can see, running the paths in parallel greatly improved the computation time of the simulation; it’s about 5 times quicker. As expected, the speedup is lower than the number of logical cores, because we had to do the extra background work of splitting the work and synchronizing the results at the end of the computation.
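The “about 5 times” figure comes straight from the two timings above. As a small worked example in Python, the speedup and the parallel efficiency (speedup divided by the number of workers) can be computed as follows. The worker count of 8 is an assumption on my side, standing for the number of logical cores; substitute the value for your own machine.

```python
def speedup(sequential_seconds, parallel_seconds):
    """How many times faster the parallel run is."""
    return sequential_seconds / parallel_seconds

def parallel_efficiency(speedup_factor, workers):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup_factor / workers

sequential_seconds = 32.33  # measured timings reported above
parallel_seconds = 6.29
workers = 8                 # assumption: logical cores; use your machine's value

s = speedup(sequential_seconds, parallel_seconds)  # roughly 5.14
efficiency = parallel_efficiency(s, workers)        # below 1 because of overhead
```

An efficiency below 1 is the numerical trace of the organisational overhead described in the kitchen example.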

Summary

Given the extreme simplicity (only adding the AsParallel() call) of the code to get into parallel mode, using PLINQ seems to be a very good and very accessible solution for simple cases such as this one.
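The same “map a creator over independent indices” pattern exists in other languages. As a rough analogue, and not a translation of the C# project, here is a Python sketch using `concurrent.futures` from the standard library. The function `last_price`, its parameters and the seeds are hypothetical. One caveat: in CPython, threads do not speed up a CPU-bound loop like this one because of the global interpreter lock, so the sketch only demonstrates the structure; `ProcessPoolExecutor` offers the same `map` interface when real CPU parallelism is needed.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def last_price(seed, n_steps=252, s0=100.0, mu=0.0003, sigma=0.01):
    """Simulate one path and return its last price.
    Each path owns its random generator, so paths share no state."""
    rng = random.Random(seed)
    s = s0
    for _ in range(n_steps):
        # Same recursion as in Part I, floored at 0.0
        s = max(s + mu * s + sigma * s * rng.gauss(0.0, 1.0), 0.0)
    return s

seeds = range(200)  # one seed per path, playing the role of the index

# Sequential mode: an ordinary loop over the indices
sequential = [last_price(seed) for seed in seeds]

# Parallel mode: the same function mapped by a pool of workers.
# map() returns the results in the order of the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(last_price, seeds))
```

Because each path depends only on its own seed, both modes produce exactly the same prices, which is the independence property that makes the paths safe to compute in parallel in the first place.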

Of course, the performance will vary with the processor being used, but in general I would recommend this PLINQ solution. Before using parallelism, though, think about whether it is really useful:

• Are the computations really independent?
• Are there enough computations to compensate for the extra work required to synchronize the results?

The whole project is available on my GitHub account here: https://github.com/SRKX/ParallelStockPaths.

I hope you enjoyed the demo,

See you next time,

Jeremie