Coding in Notebooks

From Sustainability Methods

Introduction

There are two main types of Python files: scripts and notebooks. Scripts have the extension .py, while notebooks have the extension .ipynb. A coding notebook is a kind of document in which you can write and run code in separate cells. You can also add text cells for annotations, explanations of the code, or any other notes. When you run a code cell, an output area appears below it containing either the output of your code or an error message. Jupyter Notebook is very popular among Python programmers who work with data. Alternatives include Google Colab and Kaggle notebooks. If you are coming from coding in R, the notebook format may seem a bit strange at first (but you also have the option to work with notebooks in RStudio).
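To make the difference between the two file types more concrete, the following sketch builds a tiny notebook by hand and reads it back. It assumes the common version 4 notebook layout, in which an .ipynb file is a JSON document containing a list of cells. The file name example.ipynb and the cell contents are made up for illustration.

```python
import json
import os
import tempfile

# A minimal notebook in the version 4 layout (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some explanation."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello World')"],
        },
    ],
}

# Write it to a temporary folder; "example.ipynb" is a hypothetical name.
path = os.path.join(tempfile.mkdtemp(), "example.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Read it back and list the cell types.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

A script, by contrast, is just a plain text file containing Python code, which is why it can be opened in any text editor.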

The concept of coding in notebooks revolves around an interactive environment that blends code execution, text explanations, and visualizations into a single narrative. This approach fosters an iterative workflow where you can experiment, analyze results, and document your progress all within one place.

Logic of Notebook Coding

Here's what makes notebooks a powerful tool:

  1. Interactive Execution: Notebooks allow you to write and execute code in discrete sections called cells. This lets you test small code snippets and see the immediate output, facilitating a more exploratory coding style.
  2. Modularization: Code is organized into cells, which can contain code, text explanations, visualizations, or a combination of these. This modularity makes the code easier to read, understand, and modify.
  3. Iteration: Cells can be executed individually or in sequence, allowing you to experiment with small code snippets, see the results immediately, and refine your approach as you go. This promotes a more exploratory and dynamic coding style.
  4. Documentation and Explanation: You can interweave code cells with text cells containing explanations, markdown formatting, and even images. This creates a documented record of your thought process and analysis steps, not only for you, but also for your colleagues.
  5. Visualization Integration: Embedding visualizations directly within the notebook allows you to see how your code affects the data and results visually. This promotes a more intuitive understanding of the analysis.

In essence, coding notebooks are crucial for data analysis and data science because they streamline tasks such as data inspection and exploration, data cleaning and wrangling, data visualization, and the development of machine learning models.
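The cell-by-cell workflow described above can be sketched in plain Python. The "# %%" comments below mark where cell boundaries would be. Several editors treat such comments as cell separators, but here they only serve as labels. The temperature readings are made-up sample values.

```python
import statistics

# %% Cell 1: load the data (hypothetical sample values)
temperatures = [18.2, 19.5, 21.1, 35.0, 20.3, 19.8]

# %% Cell 2: inspect the data
print("n =", len(temperatures))
print("mean =", round(statistics.mean(temperatures), 2))

# %% Cell 3: refine after spotting the suspicious value 35.0
cleaned = [t for t in temperatures if t < 30]
cleaned_mean = round(statistics.mean(cleaned), 2)
print("mean without outlier =", cleaned_mean)
```

In a notebook, each block would live in its own cell. Variables such as temperatures remain available to later cells because all cells share one running Python session, so the order in which you execute cells matters.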


Jupyter Notebook: The Pioneering Tool

When people talk about coding in notebooks, they usually mean Jupyter Notebook, the original and most widely used notebook environment. It features a web-based interface where:

  1. Code and text cells are arranged sequentially.
  2. Different cell types allow for code execution (code cells), explanations (markdown cells), and output display (including images).
  3. Markdown formatting enables rich text elements like headings, bullet points, and embedded images within text cells.
  4. Jupyter Notebook supports various kernels, allowing code execution in different languages like Python, R, Julia, etc.

JupyterLab: More than a Jupyter Notebook

JupyterLab is a more recent and advanced environment that builds upon Jupyter Notebook. Compared with the classic notebook interface, JupyterLab offers several enhancements:

  • Modular Interface: A customizable user interface with a sidebar for file management, tabs for multiple notebooks, and a separate console area.
  • Coding Formats: Besides notebooks, regular Python scripts as well as plain Markdown and text files can be created and edited in JupyterLab.
  • Richer Extensions: A wider range of extensions that can add functionalities like code completion, version control integration, and custom visualization tools.
  • Integrated File System: A more robust file system view, allowing for easier management of notebooks, data files, and project directories.

If you are getting serious about data analysis and data science, it is better to opt for JupyterLab directly.

Getting started with Jupyter Notebooks

(1) Text cells

Text cells allow you to write text in Markdown format. To change the appearance of words and phrases, you add Markdown syntax instead of clicking buttons. For example, you can create bold text by surrounding the text with double asterisks.

Markdown                                    Rendered output
**bold text**                               bold text
_italicized text_                           italicized text
<font color='green'>colored text</font>     colored text
<mark>highlighted text</mark>               highlighted text

You can find an overview of Markdown syntax on https://www.markdownguide.org/basic-syntax/ or https://medium.com/analytics-vidhya/the-ultimate-markdown-guide-for-jupyter-notebook-d5e5abf728fd. Unfortunately, Jupyter Notebooks do not support all features of the Markdown format. Therefore, you may come across some things in those sources that don't work in notebooks.

(2) Code cells

The more interesting part of Jupyter Notebooks is code cells. Code cells look different from text cells, and if you work in JupyterHub you can also see in the top navigation bar that the cell type is displayed as "Code" when you click on a code cell. What you write in these cells will be executed according to the rules of the Python programming language - not the rules of the Markdown language.

If you attempted to write plain text in the code cell and execute it, you most likely received an error message... that was intentional ;)

The reason for this is that the text you write in code cells must follow the "grammar" of the Python programming language - which is somewhat different from the grammar of natural language. The following example uses the correct "grammar" and therefore runs smoothly.

# code following the "grammar" of the Python language runs smoothly
print("Hello World")

Output:
Hello World

Before we dive into the rules of Python, let's first answer a very basic question: What actually happens when we execute a code cell?

Here's a highly simplified explanation: The code you've written is first translated by Python into a series of lower-level instructions that the computer can process. These instructions are then executed, either on your own computer (for example, if you installed Jupyter via Anaconda) or in the cloud (if you're using JupyterHub or Colab). If the instructions produce a result, it is returned and displayed below the code cell. In other words, Python code is translated, executed, and the result is returned to you.
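The translate-then-execute idea can be made visible with Python's built-in compile function. This is a sketch for the standard CPython interpreter, where the translation step produces so-called bytecode for Python's own virtual machine rather than raw zeros and ones. The variable names and the label "<cell>" are chosen for illustration only.

```python
import dis

source = "result = 2 + 3"

# Step 1: translate the text into a code object (bytecode).
code_object = compile(source, "<cell>", "exec")

# Step 2: execute the translated instructions in a fresh namespace.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])

# Optional: show the low-level instructions the interpreter runs.
dis.dis(code_object)
```

When you run a code cell, both steps happen automatically, so you normally never need to call compile or exec yourself.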

With this knowledge, you can probably guess why you received an error message earlier. The "translation rules" that Python uses to convert to machine language could not be applied to our input. Therefore, the computer couldn't understand the input and returned an error message.
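The error case can be reproduced safely with the ast module from the standard library, which applies Python's grammar to a piece of text without running it. The helper name follows_python_grammar is hypothetical and only used for this illustration.

```python
import ast


def follows_python_grammar(text):
    """Return True if the text can be parsed as Python code."""
    try:
        ast.parse(text)
        return True
    except SyntaxError:
        return False


print(follows_python_grammar('print("Hello World")'))
print(follows_python_grammar("this is just plain text"))
```

Note that a single word such as hello is grammatically valid Python. It would pass this check and fail only when executed, with a NameError instead of a SyntaxError.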

Editing Jupyter Notebooks

Next, you will learn how to edit Jupyter Notebooks. The buttons depend on how you are using Jupyter Notebooks (e.g. JupyterHub, Google Colab, Anaconda). We will provide the details for JupyterHub and Colab here. You do not need to remember all of this straight away, but can look things up here when needed.

  • adding a new code cell: click the + symbol (JupyterHub) or +Code symbol (Colab) in the upper left corner of your notebook
  • adding a new Markdown text cell: click the + symbol in the upper left corner of your notebook and change the cell type of the new cell from Code to Markdown using the dropdown menu (JupyterHub). If you are using Google Colab, you can simply click the +Text symbol in the upper left corner.
  • changing the cell type: click on the cell and use the dropdown menu to change the cell type from Code to Markdown or vice versa (JupyterHub)
  • writing in cells: click on the area of the cell or directly on the spot where you want to make a change (JupyterHub and Colab)
  • executing code cells: click the "Run" button (in JupyterHub: at the top of the navigation bar; in Google Colab: on the left side of the cell) or press Shift+Enter when you have selected the code cell.
  • executing all cells: if you have a long notebook and do not want to execute each cell manually, you can execute all cells in the notebook with one button: click on the two arrows that look like a fast forward button (JupyterHub) or go to Runtime > Run all (Colab)
  • getting a nicely rendered output from Markdown cells: if you edited a Markdown cell or accidentally clicked on it, it looks "ugly". To get a nicely rendered output you can click the "Run" button or press Shift+Enter when you have selected the cell.
  • editing cells: under the Edit tab (JupyterHub and Colab) you can copy, paste, delete and undo cells.
  • downloading a notebook: if you want to download a Jupyter notebook, go to File > Download > .ipynb (JupyterHub) or to File > Download as > Notebook (.ipynb) (Colab)

Clicking all these buttons can become tedious. Keyboard shortcuts can save you a lot of time. In JupyterHub the respective shortcuts are indicated in the menu. Note that you need to be in the correct mode for some of the shortcuts to work: click to the left of the cell you want to edit, so that the left border of the cell turns blue. Now you can use the shortcuts. In Colab you can get an overview of the shortcuts under Tools > Keyboard shortcuts.

Summary

  1. JupyterLab is an integrated development environment that can combine text, images, and executable code when working with notebooks.
  2. Text contained in text cells conforms to Markdown format and can be formatted accordingly.
  3. Code contained in code cells needs to be written according to the rules of the Python programming language.
  4. Keyboard shortcuts save a lot of time when editing Jupyter Notebooks.

References

[1] https://jupyter.org/
[2] https://www.dataquest.io/blog/jupyter-notebook-tutorial/

The authors of this entry are Wanja Tolksdorf and Gustavo Rodriguez.