Home Basic Data Analysis Native Excel Functionality with Python and openpyxl Module

Native Excel Functionality with Python and openpyxl Module

by s666

In this article I shall be looking into using the Python module “openpyxl” to manipulate data within the Python ecosystem, while also being able to tap into excel functionality directly. I believe Python is a much better ecosystem within which to do any kind of data munging/analysis, however Excel is a much used platform, favoured by many as the means of final presentation once the munging/analysis has been completed.

Also, while using Python within a business office environment, users can often come across the situation whereby they need to present their findings/results to their colleagues who often times don’t even have Python or the necessary modules installed.

So how about this – being able to use Python to do the data analysis grunt work, AND being able to use Python to call native Excel methodology including conditional formatting, chart creation/insertion among other things, rather than having to save tabular data to excel/csv and THEN opening Excel and creating charts and including formatting that way.

Sounds good to me!

I am going to use the same sample data I used a couple of blog posts ago – the “Financial Sample.xlsx” file which can be downloaded below if you wish to follow along.

Firstly I shall read in the data using Pandas and then just save it again to a new Excel workbook, just to show you what the output looks like and our startung point in wanting to make things look…well, just..better!

import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#output data to a new excel file
df.to_excel('Simple Output.xlsx')

We can see immediately that the column widths do not adapt to the width of the data held within them, meaning that manyof our column headers are cut off, and the Date column (column 1) actually shows the data as “#s” instead of actual dates – this is again because the width of the data i s such that it doesn’t fit within the visible area of said column.

We could of course manually resize each column to the relevant size that wouldallow data to atcually be shown properly, but manual tobe means time consuming – and of course we can use the openpyxl module to automate this process anyway.

So instead of just using Pandas to output our data to a new “.xlsx” file, we are going to impot openpyxl and start using that.

We will define a quick helper function that will allow us to treat the data held within each column as text, and then set the column width to the width of this text value, plus 2 to make sure our columns are wide enough to show all the data held within each column without having to resize manually.

def as_text(value):
    if value is None:
        return ""
    return str(value)

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

We can see from the screen shot above that the data is now fully visible in each column and that the columns have indeed resized automatically. One issue that I notice is that the title of the “Date” column has disappeared as it is infact now an index rather than an offical “column” as classed by Pandas.

We can quickly just add a new line to the above code to fill in this value:

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

Let us now use some more openpyxl magic, along with some Pandas fucntionality to create a pivot table from our data, paste it into a new sheet (which we will name) and then create a bar chart (a native Excel bar chart – NOT a Python or Pandas chart) to visualise our data.

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.chart import BarChart, Reference
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#create and name new worksheet
ws = wb.create_sheet(title="Main Pivot Table")

#create pivoyt table using Pandas
data_piv = df.pivot_table(values='Units Sold',index='Segment',columns='Product',aggfunc='sum')

#append pivot table data to new worksheet
for r in dataframe_to_rows(data_piv, index=True, header=True):
        ws.append(r)

#set up and create new bar chart    
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Bar Chart"
chart1.y_axis.title = 'Units Sold'
chart1.x_axis.title = 'Segment'

#set data range for bar chart
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row, max_col=ws.max_column)
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(cats)
chart1.shape = 4

#place chart at specific cell reference - one row after the last currently populated row
ws.add_chart(chart1, "A"+str(ws.max_row +  1))

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

Lovely, we can see from the above that we have a lovely looking bar chart representaion of our pivot table data- and also note that this is a native Excel chart rather than a Pandas bar chart. A Pandas bar chart is shown below for comparison.

What we can do now, is use Python to iterate through the countries contained within the DataFrame “Countries” column, and create a worksheet for each country, containing data only relevant to that country. I have chosen to do this as a random example of what can be done – of course the logic could be applied to any categorical column within the DataFrame.

I am going to be including all the previous code in each example paste of code thatI do, so that at the end we have a fully executable code snippet that can be run in it’s entirety, and that was built up iteratively as we go along. I believe that is the best way to learn to build more and more complex routines…

So let’s get onto creating our “Country” worksheets!

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.chart import BarChart, Reference
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#create and name new worksheet
ws = wb.create_sheet(title="Main Pivot Table")

#create pivoyt table using Pandas
data_piv = df.pivot_table(values='Units Sold',index='Segment',columns='Product',aggfunc='sum')

#append pivot table data to new worksheet
for r in dataframe_to_rows(data_piv, index=True, header=True):
        ws.append(r)

#set up and create new bar chart    
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Bar Chart"
chart1.y_axis.title = 'Units Sold'
chart1.x_axis.title = 'Segment'

#set data range for bar chart
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row, max_col=ws.max_column)
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(cats)
chart1.shape = 4

#place chart at specific cell reference - one row after the last currently populated row
ws.add_chart(chart1, "A"+str(ws.max_row +  1))

#create list of unique countries held within DataFrame "Country" column
countries = list(set(df['Country'].tolist()))

#iterate through list of countries
for country in countries:
    #create new worksheet with name of specifc country in current iteration
    ws = wb.create_sheet(title=country)
    #filter data to hold only data for said specific country
    data = df[df['Country'] == country]
    
    #append data to new worksheet
    for r in dataframe_to_rows(data, index=True, header=True):
        ws.append(r)
        
    #set title of "Date" column
    ws['A1'] = "Date"
    
    #use code we ecrated previously to make sure column widths are sufficent to show all data
    for column_cells in ws.columns:
        length = max(len(as_text(cell.value)) for cell in column_cells)
        ws.column_dimensions[column_cells[0].column].width = length + 2

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

We can see from the screen shot above that we now have an individual worksheet for each of the unique countries within the DataFrame data, along with the relevant data pasted into each relevant sheet. Note that all the column widths are also such that we can clearly see all the data without having to manually adjust column widths.

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.chart import BarChart, Reference, LineChart
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#limit data to the first 100 rows of data for future ease of presentation
df = df.iloc[:100]

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#create and name new worksheet
ws = wb.create_sheet(title="Main Pivot Table")

#create pivoyt table using Pandas
data_piv = df.pivot_table(values='Units Sold',index='Segment',columns='Product',aggfunc='sum')

#append pivot table data to new worksheet
for r in dataframe_to_rows(data_piv, index=True, header=True):
        ws.append(r)

#set up and create new bar chart    
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Bar Chart"
chart1.y_axis.title = 'Units Sold'
chart1.x_axis.title = 'Segment'

#set data range for bar chart
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row, max_col=ws.max_column)
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(cats)
chart1.shape = 4

#place chart at specific cell reference - one row after the last currently populated row
ws.add_chart(chart1, "A"+str(ws.max_row +  1))

#create list of unique countries held within DataFrame "Country" column
countries = list(set(df['Country'].tolist()))

#iterate through list of countries
for country in countries:
    #create new worksheet with name of specifc country in current iteration
    ws = wb.create_sheet(title=country)
    #filter data to hold only data for said specific country
    data = df[df['Country'] == country]
    #add cumulative profit column that we can chart with a line chart
    data['Cum Profit'] = data['Profit'].cumsum()
    
    #append data to new worksheet
    for r in dataframe_to_rows(data, index=True, header=True):
        ws.append(r)
        
    #set title of "Date" column
    ws['A1'] = "Date"
    
    #use code we ecrated previously to make sure column widths are sufficent to show all data
    for column_cells in ws.columns:
        length = max(len(as_text(cell.value)) for cell in column_cells)
        ws.column_dimensions[column_cells[0].column].width = length + 2
        
        
    c1 = LineChart()
    c1.title = "Profit Chart"
    c1.style = 13
    c1.y_axis.title = 'Cumulative Profit'
    c1.x_axis.title = 'Time'

    data = Reference(ws, min_col=17, min_row=1, max_row=ws.max_row)
    c1.add_data(data, titles_from_data=True)

    # Style the lines
    s1 = c1.series[0]
    s1.smooth = True # Make the line smooth
    s1.marker.graphicalProperties.solidFill = "FF0000" # Marker filling
    s1.marker.graphicalProperties.line.solidFill = "FF0000" # Marker outline

    ws.add_chart(c1, "A10")

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

Great – we now have a line chart of our “Cum Profit” column! The x-axis currently however shows time points as incrementing integers rather than date-time points…let’s see if we can change this.

For this one, we are going to stop limiting our initiak DataFrame data to being only the first 100 rows – and instead we are going to use the full data available. This is to allow us to have enough monthky data points ti plot a relevant line chart.

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.chart import BarChart, Reference, LineChart
from openpyxl.chart.axis import DateAxis
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#create and name new worksheet
ws = wb.create_sheet(title="Main Pivot Table")

#create pivoyt table using Pandas
data_piv = df.pivot_table(values='Units Sold',index='Segment',columns='Product',aggfunc='sum')

#append pivot table data to new worksheet
for r in dataframe_to_rows(data_piv, index=True, header=True):
        ws.append(r)

#set up and create new bar chart    
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Bar Chart"
chart1.y_axis.title = 'Units Sold'
chart1.x_axis.title = 'Segment'

#set data range for bar chart
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row, max_col=ws.max_column)
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(cats)
chart1.shape = 4

#place chart at specific cell reference - one row after the last currently populated row
ws.add_chart(chart1, "A"+str(ws.max_row +  1))

#create list of unique countries held within DataFrame "Country" column
countries = list(set(df['Country'].tolist()))

#iterate through list of countries
for country in countries:
    #create new worksheet with name of specifc country in current iteration
    ws = wb.create_sheet(title=country)
    #filter data to hold only data for said specific country
    data = df[df['Country'] == country]
    #add cumulative profit column that we can chart with a line chart
    data['Cum Profit'] = data['Profit'].cumsum()
    
    #append data to new worksheet
    for r in dataframe_to_rows(data, index=True, header=True):
        ws.append(r)
        
    #set title of "Date" column
    ws['A1'] = "Date"
    
    #use code we ecrated previously to make sure column widths are sufficent to show all data
    for column_cells in ws.columns:
        length = max(len(as_text(cell.value)) for cell in column_cells)
        ws.column_dimensions[column_cells[0].column].width = length + 2
        
        
    c1 = LineChart()
    c1.title = "Profit Chart with Date Axis"
    c1.style = 13
    c1.y_axis.title = 'Cumulative Profit'
    c1.x_axis.title = 'Time'
    c1.y_axis.crossAx = 500
    c1.x_axis = DateAxis(crossAx=100)
    c1.x_axis.number_format = 'd-mmm'
    c1.x_axis.majorTimeUnit = "months"
    

    data = Reference(ws, min_col=17, min_row=1, max_row=ws.max_row)
    c1.add_data(data, titles_from_data=True)
    dates = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
    c1.set_categories(dates)

    # Style the lines
    s1 = c1.series[0]
    s1.smooth = True # Make the line smooth
    s1.marker.graphicalProperties.solidFill = "FF0000" # Marker filling
    s1.marker.graphicalProperties.line.solidFill = "FF0000" # Marker outline

    ws.add_chart(c1, "A10")

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

We can now see our line chart has a datetime axis, rather than an integer axis. Fantastic!

I think what I will do finally, for this blog post, is to insert some conditional formatting into our pivot table cells so that we can quickly, and easily visually identify high and low values – and well, it’s juts a good excuse to show an example of conditional formatting.

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.chart import BarChart, Reference, LineChart
from openpyxl.chart.axis import DateAxis
from openpyxl.utils import get_column_letter
from openpyxl.formatting.rule import ColorScaleRule
import pandas as pd

#read in data from relevant excel file
df = pd.read_excel('Financial Sample.xlsx',index_col='Date',parse_dates=True)

#convert pandas DataFrame index into a "datetime" index and sort chronologically
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

#create a new Workbook object in memory
wb = Workbook()

#set the worksheet we want to  work as, as the active worksheet within the new workbook object
ws = wb.active

#this fucntion is a native openpyxl function that allows us towork specifically with Pandas DataFrames
#it allows us to iterate through the rows and append each one to our active worksheet.
for r in dataframe_to_rows(df, index=True, header=True):
        ws.append(r)
        
#we could just save the file at this point and we would have a full working ".xlsx" version of the data
#however it would suffer fro, the same column width issues that we experinced when using Pandas to save.
#instead we iterate through each column in the worksheet and set the width variable to the maximum width
#of data held witihin that particular column, then set the width of that column to that value, plus 2
#(just to make sure the data shows fully)

for column_cells in ws.columns:
    length = max(len(as_text(cell.value)) for cell in column_cells)
    ws.column_dimensions[column_cells[0].column].width = length + 2
    
#set title of "Date" column
ws['A1'] = "Date"

#create and name new worksheet
ws = wb.create_sheet(title="Main Pivot Table")

#create pivoyt table using Pandas
data_piv = df.pivot_table(values='Units Sold',index='Segment',columns='Product',aggfunc='sum')

#append pivot table data to new worksheet
for r in dataframe_to_rows(data_piv, index=True, header=True):
        ws.append(r)
        
#add some conditional formatting to our pivit table data
# Add a three-color scale
#This scale is a "percentile" scale with the 50% percentile being represented by white, below this would be
#more and more red, while above this midpoint would be more and more green
ws.conditional_formatting.add('A1:'+get_column_letter(ws.max_column)+str(ws.max_row),
           ColorScaleRule(start_type='percentile', start_value=10, start_color='ff0000',
                        mid_type='percentile', mid_value=50, mid_color='ffffff',
                        end_type='percentile', end_value=90, end_color='00ff00')) 

#set up and create new bar chart    
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Bar Chart"
chart1.y_axis.title = 'Units Sold'
chart1.x_axis.title = 'Segment'

#set data range for bar chart
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row, max_col=ws.max_column)
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(cats)
chart1.shape = 4

#place chart at specific cell reference - one row after the last currently populated row
ws.add_chart(chart1, "A"+str(ws.max_row +  1))

#create list of unique countries held within DataFrame "Country" column
countries = list(set(df['Country'].tolist()))

#iterate through list of countries
for country in countries:
    #create new worksheet with name of specifc country in current iteration
    ws = wb.create_sheet(title=country)
    #filter data to hold only data for said specific country
    data = df[df['Country'] == country]
    #add cumulative profit column that we can chart with a line chart
    data['Cum Profit'] = data['Profit'].cumsum()
    
    #append data to new worksheet
    for r in dataframe_to_rows(data, index=True, header=True):
        ws.append(r)
        
    #set title of "Date" column
    ws['A1'] = "Date"
    
    #use code we ecrated previously to make sure column widths are sufficent to show all data
    for column_cells in ws.columns:
        length = max(len(as_text(cell.value)) for cell in column_cells)
        ws.column_dimensions[column_cells[0].column].width = length + 2
        
        
    c1 = LineChart()
    c1.title = "Profit Chart with Date Axis"
    c1.style = 13
    c1.y_axis.title = 'Cumulative Profit'
    c1.x_axis.title = 'Time'
    c1.y_axis.crossAx = 500
    c1.x_axis = DateAxis(crossAx=100)
    c1.x_axis.number_format = 'd-mmm'
    c1.x_axis.majorTimeUnit = "months"
    

    data = Reference(ws, min_col=17, min_row=1, max_row=ws.max_row)
    c1.add_data(data, titles_from_data=True)
    dates = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
    c1.set_categories(dates)

    # Style the lines
    s1 = c1.series[0]
    s1.smooth = True # Make the line smooth
    s1.marker.graphicalProperties.solidFill = "FF0000" # Marker filling
    s1.marker.graphicalProperties.line.solidFill = "FF0000" # Marker outline

    ws.add_chart(c1, "A10")

#save the Workbook currently held in memory, to a file location on your harddrive
wb.save('openpyxl_1.xlsx')

From the above screenshot we can see that conditional formatting has indeed been applied to the pivot table we created earlier- this is a three-colour scale which sets the mid percentile (i.e. the 50th percentile) as white, with anythiong above this being green and anything below being red – the intensity of the colour increases the further away the value is from the 50th percentile value.

I think I’ll leave it there for now – although we have just scratched teh surface of openpyxl, hopefully I have been able to show the power of the module, in it’s ability to leverage off of native Excel functionality while using Python and Pandas under the hood for the data munging/analysis.

Being able to use native Excel functionality to create charts and add formatting etc, is fundementally different from using Python funrionality and then just pasting in images of those charts.

If there is any point anyone would like expanded or any particular openpyxl functionality they are interested in, do let me know.

Anyway, until next time…

You may also like

2 comments

Weng Hoe Lee July 22, 2018 - 1:29 pm

typo ‘teh’ in the 4th-last paragraph

Reply
Daniel Kramer September 27, 2019 - 10:20 am

When running the script I am getting an error message: “NameError: name ‘value’ is not defined”.

Please help.

Thank you!

Daniel

Reply

Leave a Reply

%d bloggers like this: