Testing Pandoc and Jupyter Notebooks

Jupyter Notebooks to markdown and html with Pandoc

For several months now, the universal document converter Pandoc has supported Jupyter Notebooks. This means that with a single call, you can convert .ipynb files to any of the output formats that Pandoc supports (and vice versa!). This post is a quick exploration of what this looks like.

Note that for this post, we're using Pandoc version 2.7.3. Also, some of what's below is hard to interpret without actually opening the files that Pandoc creates. For the sake of this blog post, I'm going to stick with the raw text output, though you can expand the outputs if you wish. I recommend copy/pasting some of these commands and running them yourself if you'd like to try.

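import shlex
from subprocess import run as sbrun
from subprocess import PIPE, CalledProcessError
from pathlib import Path
from IPython.display import HTML, Markdown

# A helper function to capture errors and outputs
def run(cmd, *args, **kwargs):
    """Run a command string; print its stdout, or its stderr if it fails."""
    try:
        # shlex.split keeps quoted arguments together, unlike str.split
        out = sbrun(shlex.split(cmd), *args, stderr=PIPE, stdout=PIPE, check=True, **kwargs)
        out = out.stdout.decode()
        # Skip empty output (or a lone newline)
        if len(out) > 1:
            print(out)
    except CalledProcessError as e:
        # check=True raises on a non-zero exit code; show what pandoc reported
        print(e.stderr.decode())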
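
Since the results below depend on the Pandoc release, it can help to confirm which version is on your path before running anything. Here is a minimal sketch, assuming the first line of `pandoc --version` has the form `pandoc 2.7.3`. The helper names `parse_pandoc_version` and `installed_pandoc_version` are made up for illustration.

```python
import re
import shutil
import subprocess

def parse_pandoc_version(version_text):
    """Return the version found in `pandoc --version` output as a tuple of ints."""
    match = re.search(r"^pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_text, re.MULTILINE)
    if match is None:
        raise ValueError("could not find a version number")
    return tuple(int(part) for part in match.group(1).split("."))

def installed_pandoc_version():
    """Return the installed Pandoc version, or None if pandoc is not on the PATH."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(["pandoc", "--version"], stdout=subprocess.PIPE, check=True)
    return parse_pandoc_version(result.stdout.decode())

# Tuples compare element by element, so version checks read naturally
print(parse_pandoc_version("pandoc 2.7.3") >= (2, 7, 3))
```

Calling `installed_pandoc_version()` runs the real binary, so its result will differ from machine to machine.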

Our base notebook

First off, let's take a look at our base notebook. We'll convert this document to both Markdown and HTML using Pandoc.

The notebook will be fairly minimal in order to make it easier to inspect its contents. It has a collection of markdown with mixed content, as well as code cells with various outputs.

See this link for the notebook we'll use.
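
An .ipynb file is JSON under the hood, so the pieces mentioned above (cell types, tags, outputs) can be inspected with the standard library before any conversion. This is a small sketch, assuming the nbformat 4 layout with a top-level `cells` list. The inline notebook is a made-up stand-in rather than the actual file used in this post, and `summarize_cells` is a hypothetical helper.

```python
import json

def summarize_cells(notebook):
    """Return one (cell_type, tags, n_outputs) tuple per cell of a v4 notebook dict."""
    summary = []
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        # Only code cells carry an "outputs" list; other cells count as zero
        n_outputs = len(cell.get("outputs", []))
        summary.append((cell["cell_type"], tags, n_outputs))
    return summary

# A tiny hypothetical notebook, written inline so the example is self-contained
example = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 2, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {"tags": ["heresatag"]},
     "source": ["# Code cells"]},
    {"cell_type": "code", "metadata": {"tags": ["mytag", "parameters"]},
     "execution_count": 7, "source": ["1 + 1"],
     "outputs": [{"output_type": "execute_result", "execution_count": 7,
                  "data": {"text/plain": ["2"]}, "metadata": {}}]}
  ]
}
""")

for cell_type, tags, n_outputs in summarize_cells(example):
    print(cell_type, tags, n_outputs)
```

To try it on a real notebook, load the file with `json.loads(Path(...).read_text())` and pass the resulting dict to the same function.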

.ipynb to markdown

Let's try converting this notebook to markdown. The conversion should preserve as much information as possible about the input Jupyter notebook, including all markdown cells, cell metadata, and code-cell outputs.

A few pandoc options

Here are a few pandoc options that are relevant to our use-case:

  • --resource-path defines the path where Pandoc will look for resources that are linked in the notebook. This allows us to discover images etc. that are in a different folder from where we are invoking pandoc.
  • --extract-media is a path where images and other media will be extracted at conversion time. Any links to images etc. should point to files at this path in the output format.
  • -s (or --standalone) tells Pandoc that the output should be a "standalone" document. This does different things depending on the output, such as adding a header if converting to HTML.
  • -o specifies the output file, and implicitly the output file type (e.g., markdown).
  • -t specifies the output format if we want to override the default (e.g., GitHub-flavored markdown vs. Pandoc markdown).
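
To keep these options straight, the command can be assembled from named arguments instead of typed out as one long string. This is only a sketch: `build_pandoc_cmd` is a hypothetical helper, and it covers just the flags listed above.

```python
import shlex

def build_pandoc_cmd(input_path, to=None, output=None, resource_path=None,
                     extract_media=None, standalone=True):
    """Assemble a pandoc argument list from the options discussed above."""
    cmd = ["pandoc", str(input_path)]
    if resource_path is not None:
        cmd.append(f"--resource-path={resource_path}")
    if standalone:
        cmd.append("-s")
    if extract_media is not None:
        cmd.append(f"--extract-media={extract_media}")
    if to is not None:
        cmd += ["-t", to]          # explicit output format
    if output is not None:
        cmd += ["-o", str(output)]  # output file; omitted means stdout
    return cmd

cmd = build_pandoc_cmd("inputs/notebooks.ipynb", to="gfm",
                       resource_path="inputs", extract_media="outputs/images")
# shlex.join (Python 3.8+) quotes any argument that needs it
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run`, which avoids splitting a string on whitespace.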

Converting to GitHub-flavored markdown

Let's start by converting to GitHub-flavored markdown. By not specifying an output file with -o, we'll cause Pandoc to print the result to the screen, which we'll display here.

# ipynb -> gfm
run(f'pandoc inputs/notebooks.ipynb --resource-path=inputs -s --extract-media=outputs/images -t gfm')
<div class="cell markdown">

# Here's a demo notebook

This is a demo notebook to play around with the pandoc ipynb support

## Markdown

As it is markdown, you can embed images, HTML, etc into your posts\!

![](outputs/images/ca17e56d65946db885db7f8f50a9605a6a94e6a7.jpg)

Here's one \(inline_{math}\) and

\[
math^{blocks}
\]

``` python
def my_functino():
    mystring = "you can also include python cells"
    return mystring
```

</div>

<div class="cell markdown" data-tags="[&quot;heresatag&quot;]">

# Code cells

## Matplotlib output with metadata

The below code cell has some metadata attached to it. It also outputs a
figure. Both should be included in the output format.

</div>

<div class="cell code" data-execution_count="7" data-slideshow="{&quot;slide_type&quot;:&quot;subslide&quot;}" data-tags="[&quot;mytag&quot;,&quot;parameters&quot;]">

``` python
from matplotlib import rcParams, cycler
import matplotlib.pyplot as plt
import numpy as np
plt.ion()

data = np.random.rand(2, 1000) * 100
fig, ax = plt.subplots()
ax.scatter(*data, s=data[1], c=data[0])
```

<div class="output execute_result" data-execution_count="7">

    <matplotlib.collections.PathCollection at 0x7f6e8d6269e8>

</div>

<div class="output display_data">

![](outputs/images/e843a737607d119ec5b2750a2bb737c915f1b6e8.png)

</div>

</div>

<div class="cell markdown">

## DataFrames

</div>

<div class="cell code" data-execution_count="8">

``` python
import pandas as pd
pd.DataFrame([['hi', 'there'], ['this', 'is'], ['a', 'DataFrame']], columns=['Word A', 'Word B'])
```

<div class="output execute_result" data-execution_count="8">

``` 
  Word A     Word B
0     hi      there
1   this         is
2      a  DataFrame
```

</div>

</div>

<div class="cell markdown">

# Bibliography

Let's test the bibliography here

Testing this \[bibliography @holdgraf\_rapid\_2016\]

@holdgraf\_evidence\_2014

</div>

<div class="cell markdown">

### The actual bibliography

The bibliography will be placed at the end of the file

</div>

Note that cells are divided by hard-coded <div>s, and cell-level metadata (such as tags) is encoded within the HTML attributes (e.g. data-tags). Also note that the bibliography hasn't rendered, probably because we didn't enable the citeproc processor in Pandoc (we'll try that later). Finally, note that there's no notebook-level metadata in this output because GFM doesn't support a YAML header.
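
Because the metadata ends up in ordinary `data-*` attributes, it can be read back with a standard HTML parser. The sketch below assumes the attribute layout seen in the output above (`class="cell ..."`, with JSON-encoded values in `data-tags` and friends). `CellMetadataParser` is a made-up name, and the sample string is a trimmed copy of that output.

```python
import json
from html.parser import HTMLParser

class CellMetadataParser(HTMLParser):
    """Collect the type and data-* attributes of every <div class="cell ..."> element."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "div" or "cell" not in classes:
            return
        metadata = {}
        for name, value in attrs.items():
            if not name.startswith("data-"):
                continue
            key = name[len("data-"):]
            try:
                # The parser has already unescaped &quot;, leaving plain JSON
                metadata[key] = json.loads(value)
            except (TypeError, ValueError):
                metadata[key] = value
        kinds = [c for c in classes if c != "cell"]
        self.cells.append({"type": kinds[0] if kinds else None, "metadata": metadata})

sample = '''
<div class="cell markdown" data-tags="[&quot;heresatag&quot;]">

# Code cells

</div>

<div class="cell code" data-execution_count="7" data-tags="[&quot;mytag&quot;,&quot;parameters&quot;]">
</div>
'''

parser = CellMetadataParser()
parser.feed(sample)
for cell in parser.cells:
    print(cell)
```

Nested output divs (class `output ...`) are skipped here on purpose, since they lack the `cell` class.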

Converting to Pandoc-flavored markdown

Next, Pandoc's own markdown flavor. When printing to the screen, Pandoc defaults to HTML output, so we request markdown explicitly with -t markdown.

# ipynb -> pandoc md
run(f'pandoc inputs/notebooks.ipynb --resource-path=inputs -s --extract-media=outputs/images -t markdown')

Now we've got something a little bit cleaner without all the hard-coded HTML. The ::: fences are how Pandoc-flavored markdown denotes divs, and cell-level metadata is encoded much as it was in the GFM output.

.ipynb to HTML

Next let's try converting .ipynb to HTML. This should let us view the notebook as a web page, with all of the extra metadata carried inside the HTML elements. We'll start with a vanilla HTML conversion. Note that we don't pass -t here: when no output format is given and none can be inferred from an output file name, Pandoc falls back to HTML. (Writing to a file with -o and an .html extension would likewise let Pandoc infer the output type.)

# ipynb -> HTML
run(f'pandoc inputs/notebooks.ipynb --resource-path=inputs -s --extract-media=outputs/images')
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>notebooks</title>
  <style>
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
  <style>
code.sourceCode > span { display: inline-block; line-height: 1.25; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
  { counter-reset: source-line 0; }
pre.numberSource code > span
  { position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
  { content: counter(source-line);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {   }
@media screen {
code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<div class="cell markdown">
<h1 id="heres-a-demo-notebook">Here's a demo notebook</h1>
<p>This is a demo notebook to play around with the pandoc ipynb support</p>
<h2 id="markdown">Markdown</h2>
<p>As it is markdown, you can embed images, HTML, etc into your posts!</p>
<p><img src="outputs/images/ca17e56d65946db885db7f8f50a9605a6a94e6a7.jpg" /></p>
<p>Here's one <span class="math inline"><em>i</em><em>n</em><em>l</em><em>i</em><em>n</em><em>e</em><sub><em>m</em><em>a</em><em>t</em><em>h</em></sub></span> and</p>
<p><br /><span class="math display"><em>m</em><em>a</em><em>t</em><em>h</em><sup><em>b</em><em>l</em><em>o</em><em>c</em><em>k</em><em>s</em></sup></span><br /></p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">def</span> my_functino():</span>
<span id="cb1-2"><a href="#cb1-2"></a>    mystring <span class="op">=</span> <span class="st">&quot;you can also include python cells&quot;</span></span>
<span id="cb1-3"><a href="#cb1-3"></a>    <span class="cf">return</span> mystring</span></code></pre></div>
</div>
<div class="cell markdown" data-tags="[&quot;heresatag&quot;]">
<h1 id="code-cells">Code cells</h1>
<h2 id="matplotlib-output-with-metadata">Matplotlib output with metadata</h2>
<p>The below code cell has some metadata attached to it. It also outputs a figure. Both should be included in the output format.</p>
</div>
<div class="cell code" data-execution_count="7" data-slideshow="{&quot;slide_type&quot;:&quot;subslide&quot;}" data-tags="[&quot;mytag&quot;,&quot;parameters&quot;]">
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="im">from</span> matplotlib <span class="im">import</span> rcParams, cycler</span>
<span id="cb2-2"><a href="#cb2-2"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb2-3"><a href="#cb2-3"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb2-4"><a href="#cb2-4"></a>plt.ion()</span>
<span id="cb2-5"><a href="#cb2-5"></a></span>
<span id="cb2-6"><a href="#cb2-6"></a>data <span class="op">=</span> np.random.rand(<span class="dv">2</span>, <span class="dv">1000</span>) <span class="op">*</span> <span class="dv">100</span></span>
<span id="cb2-7"><a href="#cb2-7"></a>fig, ax <span class="op">=</span> plt.subplots()</span>
<span id="cb2-8"><a href="#cb2-8"></a>ax.scatter(<span class="op">*</span>data, s<span class="op">=</span>data[<span class="dv">1</span>], c<span class="op">=</span>data[<span class="dv">0</span>])</span></code></pre></div>
<div class="output execute_result" data-execution_count="7">
<pre><code>&lt;matplotlib.collections.PathCollection at 0x7f6e8d6269e8&gt;</code></pre>
</div>
<div class="output display_data">
<p><img src="outputs/images/e843a737607d119ec5b2750a2bb737c915f1b6e8.png" /></p>
</div>
</div>
<div class="cell markdown">
<h2 id="dataframes">DataFrames</h2>
</div>
<div class="cell code" data-execution_count="8">
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb4-2"><a href="#cb4-2"></a>pd.DataFrame([[<span class="st">&#39;hi&#39;</span>, <span class="st">&#39;there&#39;</span>], [<span class="st">&#39;this&#39;</span>, <span class="st">&#39;is&#39;</span>], [<span class="st">&#39;a&#39;</span>, <span class="st">&#39;DataFrame&#39;</span>]], columns<span class="op">=</span>[<span class="st">&#39;Word A&#39;</span>, <span class="st">&#39;Word B&#39;</span>])</span></code></pre></div>
<div class="output execute_result" data-execution_count="8">
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Word A</th>
      <th>Word B</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>hi</td>
      <td>there</td>
    </tr>
    <tr>
      <th>1</th>
      <td>this</td>
      <td>is</td>
    </tr>
    <tr>
      <th>2</th>
      <td>a</td>
      <td>DataFrame</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell markdown">
<h1 id="bibliography">Bibliography</h1>
<p>Let's test the bibliography here</p>
<p>Testing this [bibliography @holdgraf_rapid_2016]</p>
<p>@holdgraf_evidence_2014</p>
</div>
<div class="cell markdown">
<h3 id="the-actual-bibliography">The actual bibliography</h3>
<p>The bibliography will be placed at the end of the file</p>
</div>
</body>
</html>

This time our math rendered properly, along with everything else except for the bibliography. Let's get that working now.

We've included a bibliography with our input file. With this (and using the citeproc citation style), we can use pandoc-citeproc to automatically render a bibliography within each page. To do so, we've used the following extra options:

  • --bibliography specifies the path to a BibTeX file.
  • -f ipynb+citations tells Pandoc that our input format has citations in it. Before, the ipynb was inferred from the input extension. Now we've made it explicit as well.

# ipynb -> HTML with citations
run(f'pandoc inputs/notebooks.ipynb -f ipynb+citations --bibliography inputs/references.bib --resource-path=inputs -s --extract-media=outputs/images')
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>notebooks</title>
  <style>
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
  <style>
code.sourceCode > span { display: inline-block; line-height: 1.25; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
  { counter-reset: source-line 0; }
pre.numberSource code > span
  { position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
  { content: counter(source-line);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {   }
@media screen {
code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<div class="cell markdown">
<h1 id="heres-a-demo-notebook">Here's a demo notebook</h1>
<p>This is a demo notebook to play around with the pandoc ipynb support</p>
<h2 id="markdown">Markdown</h2>
<p>As it is markdown, you can embed images, HTML, etc into your posts!</p>
<p><img src="outputs/images/ca17e56d65946db885db7f8f50a9605a6a94e6a7.jpg" /></p>
<p>Here's one <span class="math inline"><em>i</em><em>n</em><em>l</em><em>i</em><em>n</em><em>e</em><sub><em>m</em><em>a</em><em>t</em><em>h</em></sub></span> and</p>
<p><br /><span class="math display"><em>m</em><em>a</em><em>t</em><em>h</em><sup><em>b</em><em>l</em><em>o</em><em>c</em><em>k</em><em>s</em></sup></span><br /></p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">def</span> my_functino():</span>
<span id="cb1-2"><a href="#cb1-2"></a>    mystring <span class="op">=</span> <span class="st">&quot;you can also include python cells&quot;</span></span>
<span id="cb1-3"><a href="#cb1-3"></a>    <span class="cf">return</span> mystring</span></code></pre></div>
</div>
<div class="cell markdown" data-tags="[&quot;heresatag&quot;]">
<h1 id="code-cells">Code cells</h1>
<h2 id="matplotlib-output-with-metadata">Matplotlib output with metadata</h2>
<p>The below code cell has some metadata attached to it. It also outputs a figure. Both should be included in the output format.</p>
</div>
<div class="cell code" data-execution_count="7" data-slideshow="{&quot;slide_type&quot;:&quot;subslide&quot;}" data-tags="[&quot;mytag&quot;,&quot;parameters&quot;]">
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="im">from</span> matplotlib <span class="im">import</span> rcParams, cycler</span>
<span id="cb2-2"><a href="#cb2-2"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb2-3"><a href="#cb2-3"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb2-4"><a href="#cb2-4"></a>plt.ion()</span>
<span id="cb2-5"><a href="#cb2-5"></a></span>
<span id="cb2-6"><a href="#cb2-6"></a>data <span class="op">=</span> np.random.rand(<span class="dv">2</span>, <span class="dv">1000</span>) <span class="op">*</span> <span class="dv">100</span></span>
<span id="cb2-7"><a href="#cb2-7"></a>fig, ax <span class="op">=</span> plt.subplots()</span>
<span id="cb2-8"><a href="#cb2-8"></a>ax.scatter(<span class="op">*</span>data, s<span class="op">=</span>data[<span class="dv">1</span>], c<span class="op">=</span>data[<span class="dv">0</span>])</span></code></pre></div>
<div class="output execute_result" data-execution_count="7">
<pre><code>&lt;matplotlib.collections.PathCollection at 0x7f6e8d6269e8&gt;</code></pre>
</div>
<div class="output display_data">
<p><img src="outputs/images/e843a737607d119ec5b2750a2bb737c915f1b6e8.png" /></p>
</div>
</div>
<div class="cell markdown">
<h2 id="dataframes">DataFrames</h2>
</div>
<div class="cell code" data-execution_count="8">
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb4-2"><a href="#cb4-2"></a>pd.DataFrame([[<span class="st">&#39;hi&#39;</span>, <span class="st">&#39;there&#39;</span>], [<span class="st">&#39;this&#39;</span>, <span class="st">&#39;is&#39;</span>], [<span class="st">&#39;a&#39;</span>, <span class="st">&#39;DataFrame&#39;</span>]], columns<span class="op">=</span>[<span class="st">&#39;Word A&#39;</span>, <span class="st">&#39;Word B&#39;</span>])</span></code></pre></div>
<div class="output execute_result" data-execution_count="8">
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Word A</th>
      <th>Word B</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>hi</td>
      <td>there</td>
    </tr>
    <tr>
      <th>1</th>
      <td>this</td>
      <td>is</td>
    </tr>
    <tr>
      <th>2</th>
      <td>a</td>
      <td>DataFrame</td>
    </tr>
  </tbody>
</table>
</div>
</div>
</div>
<div class="cell markdown">
<h1 id="bibliography">Bibliography</h1>
<p>Let's test the bibliography here</p>
<p>Testing this <span class="citation" data-cites="holdgraf_rapid_2016">(bibliography Holdgraf et al. 2016)</span></p>
<p><span class="citation" data-cites="holdgraf_evidence_2014">Holdgraf et al. (2014)</span></p>
</div>
<div class="cell markdown">
<h3 id="the-actual-bibliography">The actual bibliography</h3>
<p>The bibliography will be placed at the end of the file</p>
</div>
<div id="refs" class="references" role="doc-bibliography">
<div id="ref-holdgraf_evidence_2014">
<p>Holdgraf, Christopher Ramsay, Wendy de Heer, Brian N. Pasley, and Robert T. Knight. 2014. “Evidence for Predictive Coding in Human Auditory Cortex.” In <em>International Conference on Cognitive Neuroscience</em>. Brisbane, Australia, Australia: Frontiers in Neuroscience.</p>
</div>
<div id="ref-holdgraf_rapid_2016">
<p>Holdgraf, Christopher Ramsay, Wendy de Heer, Brian N. Pasley, Jochem W. Rieger, Nathan Crone, Jack J. Lin, Robert T. Knight, and Frédéric E. Theunissen. 2016. “Rapid Tuning Shifts in Human Auditory Cortex Enhance Speech Intelligibility.” <em>Nature Communications</em> 7 (May): 13654. <a href="https://doi.org/10.1038/ncomms13654">https://doi.org/10.1038/ncomms13654</a>.</p>
</div>
</div>
</body>
</html>

Now we've got citations at the bottom of the page, and in-line references interspersed in the text. Pretty cool!
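
One way to sanity-check the result is to confirm that every in-text citation has a matching bibliography entry. The sketch below relies on two details taken from the output above, which may differ in other Pandoc versions: citations are `<span class="citation">` elements with a `data-cites` attribute, and entries are divs whose id starts with `ref-`. `CitationParser` is a hypothetical helper and the sample is abbreviated from that output.

```python
from html.parser import HTMLParser

class CitationParser(HTMLParser):
    """Collect cited keys and bibliography entry keys from Pandoc's HTML output."""

    def __init__(self):
        super().__init__()
        self.cited = set()
        self.listed = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "citation" in classes:
            # data-cites holds one or more space-separated citation keys
            self.cited.update((attrs.get("data-cites") or "").split())
        elif tag == "div" and (attrs.get("id") or "").startswith("ref-"):
            self.listed.add(attrs["id"][len("ref-"):])

sample = '''
<p>Testing this <span class="citation" data-cites="holdgraf_rapid_2016">(bibliography Holdgraf et al. 2016)</span></p>
<p><span class="citation" data-cites="holdgraf_evidence_2014">Holdgraf et al. (2014)</span></p>
<div id="refs" class="references" role="doc-bibliography">
<div id="ref-holdgraf_evidence_2014"><p>Holdgraf et al. 2014.</p></div>
<div id="ref-holdgraf_rapid_2016"><p>Holdgraf et al. 2016.</p></div>
</div>
'''

parser = CitationParser()
parser.feed(sample)
missing = parser.cited - parser.listed
print("cited without an entry:", sorted(missing))
```

An empty `missing` set means every cited key was resolved against the bibliography file.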

Wrapping up

It seems like we can get pretty far with converting .ipynb files into various flavors of markdown or HTML. My guess is that things will get a bit trickier if we try to do this with more complex cell outputs or metadata, but it's a good start. Using Pandoc also means that it would be relatively straightforward to convert notebooks into LaTeX, PDF, or even Microsoft Word format. I'll try to dig into this more in the future.
