Exploring Python’s Software Engineering Capabilities with a Correlation Matrix Example

Python is often celebrated for its simplicity and accessibility, especially in scientific and mathematical applications. However, it’s also equipped with robust software engineering features that allow for the creation of well-structured and maintainable code.

In this post, I’ll illustrate some of these capabilities through the implementation of a helper class that allows for the easy definition of correlations between an arbitrary number of assets. This task, while seemingly straightforward, involves subtle complexities that make it a good example to showcase Python’s strengths.

Objective Definition

Let us define the way we would like this library to work, starting with the most important aspect: value assignment.

>>> rho = CorrelationMatrix()
>>> rho[ "A", "B" ] = 0.5
>>> rho[ "B", "A" ] == rho[ "A", "B" ] #Automatic Symmetry
True
>>> print( rho[ "A", "B" ] )
0.5

Here, we can see a couple of interesting things:

The matrix is designed to be subscriptable, meaning you can assign and retrieve elements as if it were a dictionary
- Furthermore, we use pairs for setting and retrieving values (e.g. rho[ "A", "B" ] rather than rho[ "A" ][ "B" ])
Symmetry is guaranteed after assignment (i.e. setting the correlation for a given pair $(A,B)$ ensures the value is also set for $(B,A)$)

Furthermore, we would like the matrix to raise errors when the provided values are incorrect, either in type or in value.

>>> rho[ "A", "C" ] = "aaa"
Raises TypeError
>>> rho[ "A", "C" ] = 1.25
Raises ValueError
>>> rho[ "A", "C" ] = -1.25
Raises ValueError
>>> rho[ "A", "A" ] = 0.8
Raises ValueError

Correlation values should be of type float or int.
Correlation values must lie in $[-1,1]$.
Correlation of any key with itself is always 1.
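To make these three rules concrete, here is a minimal sketch of how they could be checked in one place. The function name validate_correlation is hypothetical (it is not taken from the repository), and rejecting bool values is an extra assumption of mine, since bool is a subclass of int in Python.

```python
def validate_correlation(key_a, key_b, value):
    """Return `value` as a float if it is an acceptable correlation for the pair.

    Hypothetical helper for illustration only; not the repository's code.
    """
    # Rule 1: only int or float. bool is a subclass of int, so it is
    # rejected explicitly here (an assumption of this sketch).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"correlation must be int or float, got {type(value).__name__}"
        )
    # Rule 2: the value must lie in [-1, 1]. NaN fails this comparison too.
    if not -1 <= value <= 1:
        raise ValueError(f"correlation must lie in [-1, 1], got {value}")
    # Rule 3: a key is always perfectly correlated with itself.
    if key_a == key_b and value != 1:
        raise ValueError("correlation of a key with itself must be 1")
    return float(value)
```

Checking the type first matters: comparing a string such as "aaa" against -1 would itself raise a TypeError, but with a much less helpful message.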

These requirements seem reasonably simple, but they rely on some important programming concepts that Python handles well.

To be clear, I am not claiming that other languages such as C# or Java could not perform similar operations, or even support a very similar syntax; my point is simply that Python does a good job of supporting these concepts.

As you can see in the example above, we allow the user to add as many “entities” as desired (“A”, “B” or “C”). In practice, we might want to restrict this behavior, which is why I would like to introduce the notion of a frozen matrix. This allows the user to define a list of entities during instantiation, but prevents them from adding any more at a later stage.

Solution Suggestion

I have created a first iteration of code implementing this logic, which is available on GitHub, here (please note that the code shown below refers to version 1.0; I want to refine it at a later stage).

Below, we review some interesting aspects of my suggested solution.

Using Inheritance and Object-Oriented concepts

It is important to realize that correlation matrices are in fact a particular case of symmetric matrices, which are defined as follows:

$$A~\text{is symmetric} \Longleftrightarrow a_{ij}=a_{ji} \quad \forall i,j$$

Here is an example of a symmetric matrix:

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \\ \end{bmatrix}$$

Correlation matrices are hence symmetric matrices with extra requirements. Namely, in correlation matrices we need:

$$ a_{kl} \in [-1,1] \quad \forall k,l$$

$$ a_{ii} = 1 \quad \forall i$$
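
For comparison with the symmetric example above, here is a matrix that also satisfies both extra requirements (the values are invented purely for illustration):

$$\begin{bmatrix} 1 & 0.5 & -0.2 \\ 0.5 & 1 & 0.3 \\ -0.2 & 0.3 & 1 \\ \end{bmatrix}$$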

From a programming perspective, we can represent this kind of relationship using inheritance. Hence we will create two classes:

SymmetricMatrix class
- Subscriptable
- Implements all logic about symmetric assignments
CorrelationMatrix class
- Inherits from SymmetricMatrix
- Adds validation to ensure the extra requirements are respected during assignment

Modelling our code this way makes the SymmetricMatrix code reusable for other use cases. It also lets us, as programmers, separate responsibilities between different units of logic. This makes it easier to identify where problems originate, and makes the code easier to understand.

Concretely, the value assignment logic in CorrelationMatrix simply ensures the value is in $[-1,1]$ (or is equal to 1 if both entities are the same), and then trivially calls the assignment method from the parent SymmetricMatrix class.
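
The division of labour between the two classes can be sketched as follows. This is a simplified illustration rather than the code from the repository: in particular, the parent class here stores values under a frozenset key so that both orderings of a pair map to the same entry, which is a shortcut I am assuming for brevity.

```python
class SymmetricMatrix:
    """Simplified stand-in for the parent class (not the repository's code)."""

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        a, b = key
        # A frozenset ignores order, so ("A", "B") and ("B", "A") share one entry.
        self._values[frozenset((a, b))] = value

    def __getitem__(self, key):
        a, b = key
        return self._values[frozenset((a, b))]


class CorrelationMatrix(SymmetricMatrix):
    """Validates the value, then defers storage to SymmetricMatrix."""

    def __setitem__(self, key, value):
        a, b = key
        # bool is rejected explicitly; this is an assumption of the sketch.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("correlation must be int or float")
        if not -1 <= value <= 1:
            raise ValueError("correlation must lie in [-1, 1]")
        if a == b and value != 1:
            raise ValueError("correlation of a key with itself must be 1")
        # All checks passed: the parent class handles the symmetric storage.
        super().__setitem__(key, value)
```

Note that CorrelationMatrix does not override __getitem__ at all: retrieval is inherited unchanged, and SymmetricMatrix remains usable on its own for values outside $[-1,1]$.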

This type of Object-Oriented programming concept is very common in modern languages, and quite naturally supported in Python.

Making the class subscriptable

In order to maximize code clarity, we wanted to be able to access the matrices using subscripts such as rho[ "A", "B" ], both for assignment and retrieval.

This is easily doable in Python by defining the methods __getitem__( self, key ) and __setitem__( self, key, value ) (full specifications here).

Note that __getitem__ takes only one argument besides self, namely key. The expression rho[ "A", "B" ] is valid because "A", "B" is interpreted as a 2-tuple, so key is assigned the value ("A", "B").

Concretely, the code x = rho[ "A", "B" ] is “translated” to x = rho.__getitem__( ( "A", "B" ) ), whilst rho[ "A", "B" ] = y is “translated” to rho.__setitem__( ( "A", "B" ), y ).
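
This translation is easy to observe with a tiny throwaway class that simply hands back whatever key it receives. The class KeyEcho and its attribute last_assignment are hypothetical names used only for this demonstration.

```python
class KeyEcho:
    """Hypothetical helper: returns or records the key exactly as Python passes it."""

    def __getitem__(self, key):
        return key

    def __setitem__(self, key, value):
        self.last_assignment = (key, value)


echo = KeyEcho()
pair = echo["A", "B"]    # key arrives as the tuple ("A", "B")
single = echo["A"]       # key arrives as the plain string "A"
echo["A", "B"] = 0.5     # same as echo.__setitem__(("A", "B"), 0.5)
```

A consequence worth keeping in mind is that a class expecting pairs has to unpack the key itself, and decide what to do when a caller passes a single subscript instead.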

I will not go into the exact coding logic in this post (it is all available in the GitHub repository), but I want to highlight the underlying logic. The key concept to grasp here is that the programmer needs to actually store the values somewhere inside the instance. In this case, we have defined an internal dictionary in a private variable called __values.

The logic in rho.__getitem__( ( "A", "B" ) ) then simply consists of determining whether the pair ( "A", "B" ) or the pair ( "B", "A" ) is in __values, and returning the corresponding value. If neither pair is present in the keys of __values, then an IndexError is raised.

Similarly, the logic in rho.__setitem__( ( "A", "B" ), y ) consists of determining whether the pair ( "A", "B" ) or the pair ( "B", "A" ) is in __values and, if so, updating the corresponding value by setting it to y. If neither is present in the keys of __values, then either an IndexError is raised (if the matrix is frozen) or the value is initialized in __values with key ( "A", "B" ). I also included another, smaller layer of logic for the assignment, which ensures that both “A” and “B” have previously been defined as entities of the matrix. If not, an error is raised.
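
The lookup and assignment logic described above could look roughly like the sketch below. It is not the code from the repository: the constructor parameters keys and frozen, the helper _stored_pair, and the attribute __keys are names I am assuming for illustration. The sketch also reads "frozen" as "no new entities may be added", which is one possible interpretation of the behaviour described.

```python
class SymmetricMatrix:
    """Sketch of a dict-backed symmetric matrix (not the repository's code)."""

    def __init__(self, keys=(), frozen=False):
        # `keys` and `frozen` are assumed parameter names for this sketch.
        self.__keys = set(keys)
        self.__frozen = frozen
        self.__values = {}  # holds one ordering of each pair

    def _stored_pair(self, key):
        """Return whichever ordering of the pair is stored, or None."""
        a, b = key
        if (a, b) in self.__values:
            return (a, b)
        if (b, a) in self.__values:
            return (b, a)
        return None

    def __getitem__(self, key):
        pair = self._stored_pair(key)
        if pair is None:
            raise IndexError(f"no value set for {key!r}")
        return self.__values[pair]

    def __setitem__(self, key, value):
        a, b = key
        pair = self._stored_pair(key)
        if pair is not None:
            # The pair already exists under one ordering: update it in place.
            self.__values[pair] = value
            return
        unknown = {a, b} - self.__keys
        if unknown and self.__frozen:
            raise IndexError(f"unknown entities {unknown!r} in a frozen matrix")
        # Not frozen (or all entities known): register and store the new pair.
        self.__keys |= unknown
        self.__values[(a, b)] = value
```

One Python detail to be aware of: attributes with two leading underscores, such as __values, are name-mangled, so inside the class the dictionary is really stored as _SymmetricMatrix__values. A subclass therefore cannot reach it as self.__values and has to go through the parent's methods.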

Conclusion

In conclusion, this example not only underscores Python’s flexibility and powerful object-oriented features but also showcases its capability to handle complex software engineering challenges with elegance. By implementing a CorrelationMatrix through inheritance from a SymmetricMatrix, we have adhered to the DRY (Don’t Repeat Yourself) principle, thereby enhancing the modularity and clarity of our code. The subscriptability feature, demonstrated in our approach, adds a layer of intuitive interaction with the matrix, allowing users to access and modify elements using simple syntax akin to that used with dictionaries. This not only simplifies the management of the code but also provides a reusable framework that can be adapted to solve similar problems in various domains. In future articles, I plan to revisit this project to explore how the proposed code can be further enhanced using the latest Python features or integrating other libraries, pushing the boundaries of what can be achieved with thoughtful software design.