Use Jupyter Notebook to create cells automatically

Make no mistake about it — Python’s goal is to optimize readability. Computer programming is already a difficult endeavor. Python wants to make it a little bit easier.

One of the first choices in any Python project is which integrated development environment (IDE) to use. If you’re a data scientist, it would be a mistake not to at least consider Jupyter Notebook. One of the biggest reasons many people first fall in love with Jupyter is that it lets you execute a portion of your code without having to run everything at once.

Jupyter also has plenty of tips and tricks that make a project easier, and one of its [not-so] hidden abilities is using a module to affect Jupyter itself. That module is called IPython.

There’s little information online about how to create several cells at once, but it has its uses. Let’s say you want to load several CSV files at once, each in its own separate cell. There’s a way to do that! Before we get there, let’s first create the data. I’ll use three data frames. We could easily do this manually, without IPython, but as one can imagine, if we were working with dozens of files or more, it would be prudent to automate the process. Now let’s get started! First, we’ll do things the way that involves more grunt work.

import pandas as pd

df1 = pd.read_csv('class1.csv')

df1

df2 = pd.read_csv('class2.csv')

df3 = pd.read_csv('class3.csv')

This took several cells to create and type in. Sure, there are shortcuts. You can select a cell and press a to create a cell above it, or b to create one below. You can also copy and paste and make some adjustments, such as replacing class1.csv with class2.csv, but a big part of Python’s appeal is that repetitive work like this can be automated. After all, humans are inherently prone to mistakes. Now let’s try this again with a single cell that defines the parameters and tells Jupyter to create the other cells automatically.

from IPython.core.getipython import get_ipython  # returns the active IPython shell

file_list = ['class1.csv', 'class2.csv', 'class3.csv'] 
name_list = ['df1', 'df2', 'df3']

def create_new_cell(contents): 
    shell = get_ipython() 
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,
    )
    shell.payload_manager.write_payload(payload, single=False)

def get_df(file_name, df_name):
    content = '''{df} = pd.read_csv('{file}')'''\
               .format(df=df_name, file=file_name)
    create_new_cell(content)

for file, name in zip(file_list, name_list):
    get_df(file, name)

A lot happened in this cell, so we’ll break this down a bit. We created file_list, which is filled with the CSV files that we’ve already used before. Next, we created name_list, which is filled with the names of the data frames, entered in corresponding order to the file list.
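With dozens of files, typing the two lists by hand becomes the tedious part. Here is a minimal sketch of building them from a folder instead, assuming every CSV in that folder should be loaded and that df1, df2, and so on is an acceptable naming scheme. build_lists is a hypothetical helper written for this example, not part of pandas or IPython.

```python
from pathlib import Path


def build_lists(folder="."):
    """Return (file_list, name_list) for every CSV file found in folder.

    Hypothetical helper: files are sorted alphabetically and the data
    frames are simply numbered df1, df2, ... in that order.
    """
    csv_paths = sorted(Path(folder).glob("*.csv"))
    # as_posix() keeps forward slashes, which are safe inside the
    # generated pd.read_csv('...') string on every platform.
    file_list = [path.as_posix() for path in csv_paths]
    name_list = ["df{}".format(i) for i in range(1, len(csv_paths) + 1)]
    return file_list, name_list


file_list, name_list = build_lists(".")
print(file_list)
print(name_list)
```

Note that the sort is alphabetical, so class10.csv would come before class2.csv. If the order matters, it may be safer to keep writing the lists out explicitly.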

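Next, we created a function, create_new_cell, which uses IPython to build a dictionary-based payload and hand it to the notebook. The set_next_input source is what asks Jupyter for a new input cell, and text holds the code that cell will contain. The function that follows calls create_new_cell once per file.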
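To look at the payload on its own, here is a sketch that separates building the dictionary from sending it. build_payload is a hypothetical helper name, and this variant of create_new_cell takes the shell as an argument, which is an assumption made so the function can be tried without a running notebook. Inside Jupyter you would pass get_ipython(). The comments on replace and single reflect my reading of IPython's behavior, so treat them as a guide rather than a specification.

```python
def build_payload(contents):
    """Build the dictionary that asks the notebook for a new input cell."""
    return {
        "source": "set_next_input",  # the kind of payload being sent
        "text": contents,            # the code the new cell will contain
        "replace": False,            # add a new input instead of overwriting the current cell
    }


def create_new_cell(contents, shell):
    """Queue a new cell containing contents on the given IPython shell."""
    if shell is None:
        # get_ipython() returns None when no IPython shell is active,
        # for example in a plain Python script.
        raise RuntimeError("No active IPython shell; run this inside a notebook.")
    # single=False keeps earlier payloads with the same source, which is
    # what allows several cells to be queued from one run.
    shell.payload_manager.write_payload(build_payload(contents), single=False)


print(build_payload("df1 = pd.read_csv('class1.csv')"))
```

In a notebook cell, the call would look like create_new_cell("x = 1", get_ipython()).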

The next function, get_df, builds the read_csv line for each file. It takes a file name and a data frame name from the two lists we already created. It is important to recognize that the content string is exactly what will appear in the created cells, so getting the string formatting right is paramount here.
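Because the content string is exactly what lands in each new cell, it can help to preview it before creating anything. A small sketch using the same lists and format string as above, with build_cell_source as a hypothetical name for the formatting step on its own:

```python
file_list = ['class1.csv', 'class2.csv', 'class3.csv']
name_list = ['df1', 'df2', 'df3']


def build_cell_source(file_name, df_name):
    """Return the code for one generated cell, without creating the cell."""
    return "{df} = pd.read_csv('{file}')".format(df=df_name, file=file_name)


# Pair each file with its data frame name, in order.
cell_sources = [build_cell_source(file, name)
                for file, name in zip(file_list, name_list)]

for source in cell_sources:
    print(source)
```

If the printed lines look like code you would be happy to type yourself, they are ready to be passed to create_new_cell.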

Running this cell automatically generates the next three cells. You can run them one at a time, or you can select all of them and press Ctrl + Enter to run them at once. Then check the output.

We can even take things one step further. Let’s say you want to do the same thing to all of these data frames beyond merely importing the files. You can do that by editing the content variable. Since these data frames all have the same columns, I’ll add a visualization.

from IPython.core.getipython import get_ipython
import seaborn as sns

file_list = ['class1.csv', 'class2.csv', 'class3.csv']
name_list = ['df1', 'df2', 'df3']

def create_new_cell(contents):
    shell = get_ipython()
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,
    )
    shell.payload_manager.write_payload(payload, single=False)

def get_df(file_name, df_name):
    content = '''{df} = pd.read_csv('{file}')\nsns.barplot(data={df}, x='Name', y='Test Score')'''\
               .format(df=df_name, file=file_name)
    create_new_cell(content)

for file, name in zip(file_list, name_list):
    get_df(file, name)

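If you run this, each generated cell will contain a read_csv line followed by a sns.barplot line, and running those cells draws one bar plot of Test Score by Name per data frame.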

It’s important to note that when a generated cell needs more than one line of code, you’ll need the newline escape sequence \n inside the content string, which starts a new line in the created cell.
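A quick way to confirm that \n does what you expect is to split the formatted string into lines before sending it to a cell. The sketch below reuses the two-line template from above. The ast.parse check is an extra step I am adding as a suggestion: it only checks syntax and never runs the code.

```python
import ast

template = ("{df} = pd.read_csv('{file}')\n"
            "sns.barplot(data={df}, x='Name', y='Test Score')")
source = template.format(df="df1", file="class1.csv")

# Each \n in the string becomes a line break in the generated cell.
lines = source.splitlines()
for line in lines:
    print(line)

# Raises SyntaxError if the generated code is malformed; nothing is executed.
ast.parse(source)

# In a raw string the backslash stays literal, so no line break is produced.
raw_lines = r"first\nsecond".splitlines()
print(len(raw_lines))
```

This is why the content string should be a regular string rather than a raw one: in a raw string, the cell would receive a literal backslash and n instead of a line break.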

This technique is just a tool, but one that can make your next project easier and less tedious. It takes away hassle and lets Python do the work for you, so that you can save time, reduce the chance of error, and focus on what you do best: programming!