This post will introduce the first part (of multiple) where we build up a personal finance model to help simulate future time periods based on certain chosen input variables. We will input variables such as our current investable asset base, our annual salary, expected monthly inflows and outflows and a range of other relevant values. Firstly, after our necessary imports, we look to start on modelling inflows over time.

import pandas as pd import numpy as np import random import pandas_datareader.data as web import datetime import matplotlib as mpl import matplotlib.pyplot as plt %matplotlib inline plt.rcParams['figure.figsize'] = (12,6) # Set up model inflows inflows = {'active_annual_income':50_000} variables = {'start_date' : "01/01/2020", 'years': 10} income_gains_storage = [] months = variables['years'] * 12 for month in range(months): income = inflows['active_annual_income'] / 12 income_gains_storage.append(income) plt.plot(pd.Series(income_gains_storage).cumsum()) plt.show()

This looks like a pretty simplistic results, the cumulative income we have earned over 10 years, at 50k annual salary is unsurprisingly 10 x 50,000 = 500,000. One thing we might want to introduce is some logic to account for the fact the nasty tax man gets his hands on at least some (way too much) of it. For the moment, for simplicity’s sake, I will assume a flat tax rate of 25%. Of course most countries have a sliding scale of income bands which are subject to increasing rates of tax the more one earns. We can come back to this later and see if we can implement something more along those lines – but for now lets stick with our flat 25%.

Let’s add an entry to our ‘variables’ dictionary to represent the tax rate.

I have purposefully named it to reflect the fact this is tax being applied to one’s income from employment – what i have called “active_annual_income” in the “inflows” dictionary. This is as opposed to taxes that we might incur against other sources of income, such as investment income (which we shall introduce further down the line).

inflows = {'active_annual_income':50_000} variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25} # add our tax rate on active income in % terms income_gains_storage = [] months = variables['years'] * 12 for month in range(months): # add the effect of the tax rate being applied to our income income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) plt.plot(pd.Series(income_gains_storage).cumsum()) plt.xlabel('Month') plt.ylabel('Cumulative Income') plt.show()

So again, at this point, nothing too fantastically insightful or useful in terms of output; we just end up now after the 10 years with 375K, which is just the 75% left over after the taxman has taken his 25%. This logic is, of course, very over-simplistic as not only do we not account for any kind of sliding-scale tax bands at which different levels of income are taxed, but we also have no concept of the time-value-of-money here. A dollar of income taken by the tax man today hurts us more than a dollar taken in say 5 years from now. Not only would we be able to earn interest on the dollar if we were allowed to keep it, we could also invest it and put it to work in search of higher returns.

We will get around to modelling these investment returns shortly – in the real-world they are stochastic in nature and display a certain element of random variation from one period to the next. We can’t just assume if our calculated annual expected investment returns are 10% per year that we will realise those exact returns every year, nor receive them in a linear fashion of 12 nice, evenly sized monthly instalments. The fact is that it is not only the size of our returns that determine our ending value of wealth accumulated; the order of those returns matter, along with their volatility. I won’t spend too much time going through geometric means vs arithmetic means and such, but suffice to say the higher the volatility of a set of returns, the more damaging that is to our end wealth (on average). As an example if we were to see returns of -20% in period 1 and +20% in period 2 we would end up with 0.8 x 1.2 = 0.96 i.e. we would be down 4% from a starting capital of 1, vs if we saw more volatile returns of -50% and +50 where we would end up with 0.5 x 1.5 = 0.75 – i.e. we would be down 25%. This is even though both examples give us an arithmetic return of 0% (-0.2 + 0.2 = 0) and (-0.5 + 0.5 = 0). If interested in further reading, a more detailed explanation of the relationship between arithmetic and geometric means can be found here: https://brilliant.org/wiki/arithmetic-mean-geometric-mean/

Just before we move on to the investment logic, let’s quickly add some logic to allow for an annual salary increase, set as a percentage value of the current active annual salary. The way we shall implement it is to assume our boss will sit down with us once a year and offer us an increase that will take effect at that time. This is how we go about it:

inflows = {'active_annual_income':50_000} variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25, 'avg_ann_income_raise':0.05} # add our annual salary raise in % terms income_gains_storage = [] months = variables['years'] * 12 for month in range(months): income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) # every 12 months we increase our base salary by the average annual percentage increase if (month % 12 == 0) and (month > 0): # we dont want to apply the raise on the very first month inflows['active_annual_income'] *= (1 + (variables['avg_ann_income_raise'])) plt.plot(pd.Series(income_gains_storage).cumsum()) plt.xlabel('Month') plt.ylabel('Cumulative Income') plt.show()

Ok now let’s get to modelling those stochastic investment returns mentioned previously. There are numerous ways to go about this task; the exact approach may well differ based on individual circumstances or beliefs etc. To keep it relatively simple for illustration purposes, we shall use the S&P 500 Total Return index as a proxy for “the market” and use historic data to calculate our values regarding “average annual returns” and related levels of volatility. We will calculate these two values (average return and standard deviation) – we shall then use them as inputs to draw from a Standard Normal distribution and use the values received as our assumed “market returns” across the period in question.

So we are now going to:

- Download historic S&P 500 index prices
- Calculate the average monthly return and standard deviation of a chosen historic period
- Store these values to use later as inputs to our Standard Normal distribution

The values are coming out as a monthly return of 0.6% and a monthly volatility (standard deviation) of 4.15%.

Before we attack this logic, we need to create a new input in our “inflows” dictionary to represent our starting capital, or starting assets. Also we add the calculated values of average monthly return and volatility to our “variables” dictionary.

# download S&P historic prices data start, end = datetime.datetime(2000, 12, 31), datetime.datetime(2020, 1,1) tickers = ["^SP500TR"] sp = pd.DataFrame([web.DataReader(ticker, 'yahoo', start, end).loc[:, 'Adj Close'] for ticker in tickers], index=tickers).T.fillna(method='ffill') # calculate average monthly return & volatility sp_monthly_pct_return = sp.resample('M').last().pct_change().mean().values[0] sp_monthly_std_dev = sp.resample('M').last().pct_change().std().values[0] inflows = {'active_annual_income':50_000, 'starting_assets': 250_000} # add value of starting assets variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25, 'avg_ann_income_raise':0.05, 'avg_monthly_market_returns': sp_monthly_pct_return, # add market return data to our inputs 'avg_monthly_market_volatility': sp_monthly_std_dev} # add market volatility data to our inputs income_gains_storage = [] investment_gains_storage = [] # create list to store our investment returns # create a list to store each period's starting assets value assets_starting_list = [inflows['starting_assets']] # input the first value in the list as today's assets assets_ending_list = [] # create a list to store each period's ending assets value months = variables['years'] * 12 for month in range(months): # small bit of logic to see if this is our first time through the monthly loop # and if it is not, then start to use the ending assets from the previous # period as this period's starting assets. if assets_ending_list: assets_starting_list.append(assets_ending_list[-1]) income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) if (month % 12 == 0) and (month > 0): inflows['active_annual_income'] *= (1 + (variables['avg_ann_income_raise'])) # generate a monthly market return by drawing from a normal distribution market_return = np.random.normal(variables['avg_monthly_market_returns'], variables['avg_monthly_market_volatility'], 1)[0] # calculate the investment return investment_return = assets_starting_list[-1] * market_return # store the investment return value in a list investment_gains_storage.append(investment_return) # calculate the end of period assets value assets_ending = assets_starting_list[-1] + investment_return + income # store ending assets value assets_ending_list.append(assets_ending) plt.plot(pd.Series(investment_gains_storage).cumsum()) plt.xlabel('Month') plt.ylabel('Cumulative Investment Returns') plt.show()

Now when we plot our series of investment returns we can see that they are indeed stochastic in nature, with the inherent randomness of returns meaning some months we are up, some we are down even though our average monthly return used as an input to the Standard Normal function was positive.

As a slight point to note – please be aware that some of the following plots are showing the series of cumulative monthly investment returns, whereas some of them are showing the series of ending period asset value – they are different things! Apologies if that is obvious, just I appreciate the plots can often look very similar on first sight. Make sure you check the y-axis label on each plot.

If you run the above code a number of times, each outcome will be different due to that fact – this will lead us on nicely later to the world of Monte Carlo methods and creating distributions of ending values – I will leave the juicy details until we get around to that section.

Ok so now we have modelled our monthly active salary (with an annual percentage increase), along with our monthly investment returns which we have subjected to a stochastic element to mimic the real-world.

If we plot our series of ending asset values and note how much we have ended up with, you may assume we are being over-optimistic. You would, of course, be correct; firstly we haven’t applied any taxes to our investment gains, and also the elephant in the room is that we are yet to include any kind of “outflows” – they’re those pesky bills and living costs we have to shell out to do things like eat food and have somewhere to live. Once we start to apply the “debit” side of the equation, we will see the situation balance out somewhat (and here’s me thinking I’d be a multi-millionaire in a few years).

plt.plot(pd.Series(assets_ending_list)) plt.xlabel('Month') plt.ylabel('Ending Asset Value') plt.show()

So let’s first apply a tax rate to our investment returns – the process is similar to when we applied tax to our active income. First we add an investment tax rate to the variables dictionary – the code is shown below:

start, end = datetime.datetime(2000, 12, 31), datetime.datetime(2020, 1,1) tickers = ["^SP500TR"] sp = pd.DataFrame([web.DataReader(ticker, 'yahoo', start, end).loc[:, 'Adj Close'] for ticker in tickers], index=tickers).T.fillna(method='ffill') sp_monthly_pct_return = sp.resample('M').last().pct_change().mean().values[0] sp_monthly_std_dev = sp.resample('M').last().pct_change().std().values[0] inflows = {'active_annual_income':50_000, 'starting_assets': 250_000} variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25, 'avg_ann_income_raise':0.05, 'tax_on_investment_gains': 0.35, # add investment returns tax rate 'avg_monthly_market_returns': sp_monthly_pct_return, 'avg_monthly_market_volatility': sp_monthly_std_dev} income_gains_storage = [] investment_gains_storage = [] assets_starting_list = [inflows['starting_assets']] assets_ending_list = [] months = variables['years'] * 12 for month in range(months): if assets_ending_list: assets_starting_list.append(assets_ending_list[-1]) income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) if (month % 12 == 0) and (month > 0): inflows['active_annual_income'] *= (1 + (variables['avg_ann_income_raise'])) market_return = np.random.normal(variables['avg_monthly_market_returns'], variables['avg_monthly_market_volatility'], 1)[0] # apply the tax rate to investment gains investment_return = (assets_starting_list[-1] * market_return) * (1 - variables['tax_on_investment_gains']) # store the investment return value in a list investment_gains_storage.append(investment_return) # calculate the end of period assets value assets_ending = assets_starting_list[-1] + investment_return + income # store ending assets value assets_ending_list.append(assets_ending) plt.plot(pd.Series(investment_gains_storage).cumsum()) plt.xlabel('Month') plt.ylabel('Cumulative Investment Returns') plt.show()

Let’s now start to add some regular monthly outflows and living costs to even up the balance. I have gone about it by creating a new “outflows” dictionary as shown below. I have also introduced a variable to reflect the general level of inflation which we will apply to our costs on an annual basis (much like we hiked our salary by a certain percentage once a year). We could of course deal with the annual inflation level in a stochastic manner as we did the investment returns, but for simplicity here we shall just apply them deterministically.

start, end = datetime.datetime(2000, 12, 31), datetime.datetime(2020, 1,1) tickers = ["^SP500TR"] sp = pd.DataFrame([web.DataReader(ticker, 'yahoo', start, end).loc[:, 'Adj Close'] for ticker in tickers], index=tickers).T.fillna(method='ffill') sp_monthly_pct_return = sp.resample('M').last().pct_change().mean().values[0] sp_monthly_std_dev = sp.resample('M').last().pct_change().std().values[0] inflows = {'active_annual_income':50_000, 'starting_assets': 250_000} # add our outflows dictionary outflows = {'rent':1500, 'credit_card_payment':750, 'medical_insurance':250, 'pension_contribution':500, 'misc': 1500} variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25, 'avg_ann_income_raise':0.05, 'avg_ann_inflation': 0.02, # add annual inflation rate 'tax_on_investment_gains': 0.35, 'avg_monthly_market_returns': sp_monthly_pct_return, 'avg_monthly_market_volatility': sp_monthly_std_dev} income_gains_storage = [] investment_gains_storage = [] assets_starting_list = [inflows['starting_assets']] assets_ending_list = [] months = variables['years'] * 12 for month in range(months): if assets_ending_list: assets_starting_list.append(assets_ending_list[-1]) # create a variable to keep track of our assets value assets = assets_starting_list[-1] # first we subtract our monthly costs from our overall assets assets -= (outflows['rent'] + outflows['credit_card_payment'] + \ outflows['medical_insurance'] + outflows['pension_contribution'] + \ outflows['misc']) # move our calculation of investment returns above that for income # and use our new "assets" variable as the investment base market_return = np.random.normal(variables['avg_monthly_market_returns'], variables['avg_monthly_market_volatility'], 1)[0] investment_return = (assets * market_return) * (1 - variables['tax_on_investment_gains']) investment_gains_storage.append(investment_return) # add investment returns to our "assets" variable assets += investment_return income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) if (month % 12 == 0): inflows['active_annual_income'] *= (1 + (variables['avg_ann_income_raise'])) # increase our outflows by inflation each year # I have chosen to list each one individually to enable easy customisation # (rather than looping over the outflows dict to increase them) outflows['rent'] *= (1 + (variables['avg_ann_inflation'])) outflows['credit_card_payment'] *= (1 + (variables['avg_ann_inflation'])) outflows['medical_insurance'] *= (1 + (variables['avg_ann_inflation'])) outflows['pension_contribution'] *= (1 + (variables['avg_ann_inflation'])) outflows['misc'] *= (1 + (variables['avg_ann_inflation'])) # add income gain to our "assets" variable assets += income # calculate the end of period assets value assets_ending = assets # store ending assets value assets_ending_list.append(assets_ending) plt.plot(pd.Series(assets_ending_list)) plt.xlabel('Month') plt.ylabel('Ending Asset Value') plt.show()

Above I have moved the application of outflows to the asset base above the code that adds the income or calculates the investment returns. The thinking here is that if you have bills to pay, then you make sure you pay them first as soon as you can, or at least hold the money aside for the month so you are certain you will have it when needed – rather than invest it and risk not having enough when it comes time to pay. I also assume here that salaries will be paid at the end of the month – or at least not early enough to be sat in your bank account, waiting to be invested or able to be used to pay bills for that month (better safe than sorry).

There is one more bit of logic I would like to apply before we move forward; currently the code allows for a situation where we go “bust”, losing everything we have in terms of investable assets, and continue “investing” a negative amount. This is obviously nonsensical as you cant make positive returns from a negative asset base, just because the market returns are negative. We will add a flag that checks to make sure our asset base is in positive territory before we start calculating and applying investment returns; if our asset base is negative, there are a few things we can do to ac count for that – just for example we could:

- Consider that situation “unnaceptable” to the extent we class ourselves as “ruined/bankrupt” and end the simulation with a “failure” flaf or label.
- Allow the simulation to continue and collect salary income each month, but each month where assets are non-positive, we set the investment gains to zero by default.
- Apply some non-trivial logic that applies interest and penalty charges, allow us to borrow and re-invest etc.

The last option, in my view, is just not realistic nor really the point of this exercise, with the best options being between 1 and 2. Which ever you choose is up to you – I tend to go for option 1, considering an asset base of zero to be unacceptable and a sign of abject failure with regards to money management and such. You could even argue that even getting close to a zero value asset base is unacceptable and want to set the “failure” threshold somewhere above zero. We can revisit these options later perhaps if it is of benefit.

I deal with a failed simulation by recording a “failure flag” – mark that run as failed for later analysis. This reasoning will become a bit clearer later on when we start to apply Monte Carlo methods to our model, carrying out 100s, 1000s or more simulations per “experiment”. Don’t worry if what I am saying isn’t making sense – we will tackle the Monte Carlo element and explain in due course.

Let’s apply the logic now to stop the simulation if we lose our entire asset base and effectively become financially “ruined”.

start, end = datetime.datetime(2000, 12, 31), datetime.datetime(2020, 1,1) tickers = ["^SP500TR"] sp = pd.DataFrame([web.DataReader(ticker, 'yahoo', start, end).loc[:, 'Adj Close'] for ticker in tickers], index=tickers).T.fillna(method='ffill') sp_monthly_pct_return = sp.resample('M').last().pct_change().mean().values[0] sp_monthly_std_dev = sp.resample('M').last().pct_change().std().values[0] inflows = {'active_annual_income':50_000, 'starting_assets': 75_000} # add our outflows dictionary outflows = {'rent':1500, 'credit_card_payment':750, 'medical_insurance':1250, 'pension_contribution':500, 'misc': 1500} variables = {'start_date' : "01/01/2020", 'years': 10, 'tax_on_active_income_gains': 0.25, 'avg_ann_income_raise':0.05, 'avg_ann_inflation': 0.02, 'tax_on_investment_gains': 0.35, 'avg_monthly_market_returns': sp_monthly_pct_return, 'avg_monthly_market_volatility': sp_monthly_std_dev} income_gains_storage = [] investment_gains_storage = [] assets_starting_list = [inflows['starting_assets']] assets_ending_list = [] months = variables['years'] * 12 # create a variable to signal if we have become financially "ruined" ruined = False for month in range(months): if assets_ending_list: assets_starting_list.append(assets_ending_list[-1]) assets = assets_starting_list[-1] assets -= (outflows['rent'] + outflows['credit_card_payment'] + \ outflows['medical_insurance'] + outflows['pension_contribution'] + \ outflows['misc']) # check if our asset base has a positive value and, if # not then set the "ruined" flag to 1 and end the simulation if assets <= 0: inv_gain = 0 ruined = True break market_return = np.random.normal(variables['avg_monthly_market_returns'], variables['avg_monthly_market_volatility'], 1)[0] investment_return = (assets * market_return) * (1 - variables['tax_on_investment_gains']) investment_gains_storage.append(investment_return) assets += investment_return income = (inflows['active_annual_income'] * (1 - variables['tax_on_active_income_gains'])) / 12 income_gains_storage.append(income) if (month % 12 == 0): inflows['active_annual_income'] *= (1 + (variables['avg_ann_income_raise'])) outflows['rent'] *= (1 + (variables['avg_ann_inflation'])) outflows['credit_card_payment'] *= (1 + (variables['avg_ann_inflation'])) outflows['medical_insurance'] *= (1 + (variables['avg_ann_inflation'])) outflows['pension_contribution'] *= (1 + (variables['avg_ann_inflation'])) outflows['misc'] *= (1 + (variables['avg_ann_inflation'])) assets += income assets_ending = assets assets_ending_list.append(assets_ending) plt.plot(pd.Series(assets_ending_list)) plt.xlabel('Month') plt.ylabel('Ending Asset Value') plt.show()

I have changed the values in the inflow and outflow dictionaries above to show a simulation that ends in “ruin” before the whole 10 year period has been simulated (I reduced the starting_assets in the inflow dictionary, and increased the medical insurance payment in the outflows dictionary FYI). We can see the x-axis only reaches as far as 36 or so months rather than the full 120 month (10 year) simulation period stated in the variables dictionary.

We will record and store the “ruined” flag a bit later on, in fact we will revisit this in the next post as this is getting a bit long as it is. Part 2 will cover a variety of things, e.g. Monte Carlo methods and subsequent probability estimates, intriducing “scenarios” with various “shock factors” implements, e.g. a one off asset-base reduction/loss, unexpected job and income loss, increase in predicted inflation costs among others.

Until next time…