DataFrame – Input & Output

In this article, you will discover how to load data as input from an Excel workbook and then create a new Excel workbook as output to save your work with Python.

There are several ways to load an Excel workbook with Python depending on the file format.

The best way with Python to manipulate data is to use the DataFrame object. A DataFrame is a data structure in two dimensions, a table of rows and named columns like an Excel spreadsheet or a SQL table. DataFrame is often used for data analytics. This object is common through different languages. To manipulate DataFrame with Python, you need to use the Pandas library. To install pandas, you can use the pip install pandas command in your terminal if the pip tool is installed on your machine and you can install the package in your condo environmement if you use anaconda which is a distribution for Python to simplify package management and deployment.

Republic of Malta

Input – Load an Excel File ‘.xlsx’

To load an Excel file with data in a worksheet, you can use the read_excel method by specifying several arguments. The read_excel function supports the following extensions : xls, xlsx, xls, xlsb, odf, ods, odt. Your Excel workbook will not be opened and data will be automatically stored in the pandas DataFrame object. Be careful, do not have an empty column in your data table. The example below demonstrates how you can use the read_excel function.

import pandas as pd

# input files - initialisation
path_file_database = "C:/Users/Name/INPUT_DATA.xlsx"

# loading dataframe
df_data = pd.read_excel(path_file_database, sheet_name="Data", index_col = None)

The df_data variable corresponds to the loading data frame containing data from your excel sheet named ‘Data’.

You can use the usecols argument to store one specified column in your DataFrame from your excel sheet.

# loading DataFrame
df_data = pd.read_excel(path_file_database, sheet_name="Data", index_col=None, uselcols="A:E")

You can consult the pandas documentation for more details on the read_excel function and his arguments.

Input – Load an Excel File ‘.csv

To load an Excel file with a csv format, you can use the read_csv. This function supports only the csv extension. Be careful to specify the separator of your data and the decimal separator.

import pandas as pd

# input files - initialisation
path_file_database = "C:/Users/Name/INPUT_DATA.xlsx"

# loading dataframe
df_data = pd.read_csv(path_file_database, sep=";", decimal=".", header=0)

Input – Load an Excel File ‘.csv’

Using pandas specific read functions you can also load many types of data such as : a SAS database with pd.read_sas, a json database with pd.read_json, a html database with pd.read_html, a xml database with pd.read_xml, a SQL database with pd.read_sql


Output – Load an Excel File ‘.csv’

As seen in the Run Script Python article, by using openpyxl library you can create an excel workbook. You will find below a function with two arguments: the name of the file to save and the path of the file to save to create a workbook with a sheet named “Data – Results”.

def create_workbook_sheets(name_file_to_save, path_file_to_save):
    """
    Procedure to create and save the final Excel workbook and sheets
    --------------
    : param name_file_to_save: name of the file to save for the workbook
    : param path_file_to_save: the path to save the file
    """

    # create a new excel workbook with one sheet
    wb = op.Workbook()
    wb.title = name_file_to_save
    sh_data = wb.active
    sh_data.title = "Data - Results"

   # zoom & gridlines
   sh_data.sheet_view.zoomScale = 90
   sh_data.sheet_view.showGridLines = True

    # save the workbook
    wb.save(filename = path_file_to_save)

This function above can be applied in your python script :

path_folder = "C:/Users/Name/"
name_file = "Data_Results"
create_workbook_sheets(name_file, path_folder)

Output – Write DataFrame in Excel workbook

Now you have a new workbook and you need to write the result of your analysis process in different Excel sheets. The example below demonstrates how you can display a DataFrame object in a workbook. This function uses two arguments: the path of the excel workbook and the DataFrame. You need to use the ExcelWriter function of pandas which allows you to write the DataFrame in an Excel sheet.

import openpyxl as op
import pandas as pd

def fill_data_workbook(path_file_to_save, df_data):
    """
    Procedure to fill workbook sheet of the final workbook with DataFrame.
    ----------
    :param path_file_to_save : name of the file to save
    :param df_data : the DataFrame to write
    """

    # write DataFrame on an Excel file
    wb_result = op.load_workbook(path_file_to_save)
    with pd.ExcelWriter(path_file_to_save, engine="openpyxl", mode="a") as writer:
        writer.book = wb_result
        # ExcelWriter uses writer.sheets to access sheets of the workbook
        writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
        df_data.to_excel(writer, "Data - Results)
        writer.save()
    
    # save workbook
    wb_result.save(filename = path_file_to_save)

Output – Layout sheet in Excel workbook

The example below demonstrates how you can do layout in your excel sheet using openpyxl. You need to load the workbook first and apply available functions on cells, ranges, rows, columns objects.

wb_result = op.load_workbook(path_file_to_save)

sh_data = wb_results["Data"]

# layout rows on sheet "Data"
last_row = sh_data.max_row
for row_range in range(1, last_row+1):
    sh_data.cell(row=row_range, column=1).fill = PatternFill(start_color="7030A0", end_color="7030A0", fill_type = "solid")
    sh_data.cell(row=row_range, column=1).font = Font(size=11, color="FFFFF", bold = True)

# layout named columns on sheet "Data"
for col_range in range(1, 4):
    sh_data.cell(row=1, column=col_range).fill = PatternFill(start_color="7030A0", end_color="7030A0", fill_type = "solid")
    sh_data.cell(row=1, column= col_range).font = Font(size=11, color="FFFFF", bold=True)
    sh_data.cell(row=1, column= col_range).alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)

sh_data.row_dimensions[1].height=27
sh_data.column_dimensions["B"].width=10
sh_data.column_dimensions["B"].width=10

# save workbook
wb_result.save(filename=path_file_to_save)