Chris Holdgraf - about me
I work at the intersection of technical development, open communities, scientific research, and education.
I work with teams to create and improve open-source technology for scientists, educators, and data analysts. My goal is to help people do their work more effectively, openly, inclusively, and reproducibly.
You can learn a bit more about what I do on this website.
More about me
See the blog archives for a more complete list.
2020-01-22 - What do people think about rST?
Publishing computational narratives has always been a dream of the Jupyter Project, and there is still a lot of work to be done in improving these use-cases. We’ve made a lot of progress in providing open infrastructure for reproducible science with JupyterHub and the Binder Project, but what about the documents themselves? We’ve recently been working on tools like Jupyter Book, which aim to improve the writing and publishing process with the Jupyter ecosystem. This is hopefully the first of a few posts that ask how we can best improve the state of publishing with Jupyter.
Many of the ideas in this post have now made their way into a new flavor of markdown called Markedly Structured Text, or MyST. It brings all of the features of rST into Markdown. Check it out!
2019-11-11 - Testing Pandoc and Jupyter Notebooks
For several months now, the universal document converter pandoc has had support for Jupyter Notebooks. This means that with a single call, you can convert .ipynb files to any of the output formats that Pandoc supports (and vice-versa!). This post is a quick exploration of what this looks like.
Note that for this post, we’re using Pandoc version 2.7.3. Also, some of what’s below is hard to interpret without actually opening the files that are created by Pandoc. For the sake of this blog post, I’m going to stick with the raw text output here, though you can expand the outputs if you wish. I recommend copy/pasting some of these commands on your own if you’d like to try.
This is the second in a series of blog posts that explores what it’d look like to directly port the governance model of other communities into the Jupyter project. You can find the first post about Rust here.
Note: These posts are meant as a thought experiment rather than a proposal. Moreover, all the usual caveats come with it, such as the fact that I don’t know the Python governance structure that well, and I might totally botch my characterization of it.
Over the last few years, it has been exciting to see the xarray project evolve, add new functionality, and mature. This post is an attempt at giving xarray another visit to see how it could integrate into electrophysiology workflows.
It is common in neuroscience to ask individuals to perform a task over and over again. You record the activity in the brain each time they perform the task (called an “epoch” or a “trial”). Time is recorded relative to some onset when the task begins. That is t==0. The result is usually a matrix of epochs x channels x time. You can do a lot of stuff with this data, but our task in this paper is to detect changes in neural activity at trial onset (
As I’ve written about before, I like Rust’s governance structure. I mean, who can’t get behind a community that lists governance as a top-level page on its website?
Jupyter is currently in the middle of figuring out the next phase of its governance structure, and so I have been thinking about what this might look like. This post is a quick thought-experiment to explore what it’d mean to port over Rust’s governance directly into the Jupyter community.
2019-10-11 - Automating Jupyter Book deployments with CI/CD
Lately I’ve spent a lot of time trying to reduce the friction involved in deploying Jupyter Book as well as contributing to the project. Features are a great carrot, but ultimately getting engagement is also about lowering barriers to entry and showing people a path forward. Jupyter Book is a relatively straightforward project, but it involves a few technical pieces that can be painful to use (thanks Jekyll).
Recently I experimented with whether we can automate deploying a Jupyter Book online. Using continuous integration / deployment services seems like a natural place to try this out. One can upload a barebones set of code to a GitHub repository, then configure a build system to create a book and deploy it online from there. This blog post is a place to keep track of the current state of affairs for this workflow.
2019-06-25 - A few recent talks
Lately I’ve given quite a number of talks about the Jupyter and Binder ecosystems for various purposes. Before each of the talks, I make the slides available at a public address in case others are interested in following up with the material. For those who missed the talks (or the subsequent tweets about them), here are a few of the more recent ones.
A word of warning: there’s a lot of overlap between these talks - I’m not crazy enough to re-invent the wheel each time I have to speak. However, maybe folks will find some value in the different angles taken in each case.
2019-03-30 - Thoughts from the Jupyter team meeting 2019
I just got back from a week-long Jupyter team meeting that was somehow both very tiring and energizing at the same time. In the spirit of openness, I’d like to share some of my experience. While it’s still fresh in my mind, here are a few takeaways that occurred to me throughout the week.
Note that these are my personal (rough) impressions, but they shouldn’t be taken as a statement from the project/community itself.
2019-01-29 - Three things I love about CircleCI
I recently had to beef up the continuous deployment of Jupyter Book, and used it as an opportunity to learn a bit more about CircleCI’s features. It turns out, they’re pretty cool! Here are a few of the things that I learned this time around.
For those who aren’t familiar with CircleCI, it is a service that runs Continuous Integration and Continuous Deployment (CI/CD) workflows for projects. This basically means that they manage many kinds of infrastructure that can launch jobs that run test suites, deploy applications, and test on many different environments.
It should go without saying, but you should never do the stuff that you’re about to read about here. Data is meant to speak for itself, and our visualizations should accurately reflect the data above all else.
Sometimes you want to do two things:
Plot a timeseries that handles datetimes in a clever way (e.g., with Pandas or Matplotlib)
2017-03-16 - Dates in python
As a part of setting up the website for the Docathon I’ve had to re-learn all of my date string formatting rules. It’s one of those little problems you don’t really think about - turning an arbitrary string into something structured like a date - until you’ve actually got to do it.
There are a bunch of tools in python for using date-like objects, but it’s not always easy to figure out how these work. This post is just a couple of pieces of information I’ve picked up along the process.
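As a small illustration of turning a string into a date and back, here is a minimal sketch using only the standard library's datetime module. The date string and both layouts are made up for the example; the format codes have to match whatever layout your own strings use.

```python
from datetime import datetime

# Hypothetical input string; the format codes below must mirror its layout.
raw = "2017-03-16 14:30"

# String -> structured datetime object.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M")

# Structured datetime object -> string in a different layout.
reformatted = parsed.strftime("%d/%m/%Y")

print(parsed.year, parsed.month, parsed.day)
print(reformatted)
```

If the string does not match the format codes, strptime raises a ValueError, which is usually the first sign that a format string needs another look.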
2017-01-04 - Matplotlib Cyclers are Great
Every now and then I come across a nifty feature in Matplotlib that I wish I’d known about earlier. The MPL documentation can be a beast to get through, and as a result you miss some cool stuff sometimes.
This is a quick demo of one such feature: the cycler.
2016-12-23 - Brainy Jingle Bells
This is a quick demo of how I created this video. Check it out below, or read on to see the code that made it!
Here’s a quick viz to show off some brainy holiday spirit.
Per a recent request somebody posted on Twitter, I thought it’d be fun to write a quick scraper for the biorxiv, an excellent new tool for posting pre-prints of articles before they’re locked down with a publisher embargo.
A big benefit of open science is the ability to use modern technologies (like web scraping) to make new use of data that would originally be unavailable to the public. One simple example of this is information and metadata about published articles. While we’re not going to dive too deeply here, maybe this will serve as inspiration for somebody else interested in scraping the web.
2016-11-30 - Visualizing publication bias
This article is now interactive! Check out a live Binder instance here
In the next few months, I’ll try to take some time to talk about the things I learn as I make my way through this literature. While it’s easy to make one-off complaints to one another about how “science is broken” without really diving into the details, it’s important to learn about how it’s broken, or at least how we could assess something like this.
2016-11-01 - 5 things I learned at SciPy
I’ve finally decompressed after my first go-around with SciPy. For those who haven’t heard of this conference before, SciPy is an annual meeting where members of the scientific community get together to discuss their love of Python, scientific programming, and open science. It spans both academics and people from industry, making it a unique place in terms of how software interfaces with scientific research. (If you’re interested in the full set of SciPy conferences, check out here.)
It was an eye-opening experience that I learned a lot from, so here’s a quick recap of some things that I learned during my first rodeo.
2016-07-08 - Could Brexit have happened by chance?
As a scientist, watching the Brexit vote was a little bit painful. Though probably not for the reason you’re thinking. No, it wasn’t the politics that bothered me, but the method for making such an incredibly important decision. Let me explain…
Scientists are a bit obsessed with the concept of error. In the context of collecting data and analyzing it, this takes the form of our “confidence” in the results. If all the data say the same thing, then we are usually pretty confident in the overall message. If the data is more complicated than this (and it always is), then we need to define how confident we are in our conclusions.
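One common way to put a number on that kind of confidence is an interval around an observed proportion. The sketch below is only an illustration: it assumes a simple random sample and uses the normal approximation, the function name is hypothetical, and the numbers are made up rather than taken from any real vote.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    # Standard error of a sample proportion under simple random sampling.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Made-up example: 52% "yes" among 1,000 sampled responses.
low, high = proportion_interval(0.52, 1000)
print(f"{low:.3f} to {high:.3f}")
```

With these made-up numbers the interval straddles 50%, so a 52% result from a sample of this size would not, on its own, rule out an even split.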
2016-07-02 - The beauty of computational efficiency
When we discuss “computational efficiency”, you often hear people throw around phrases like \(O(n^2)\) or \(O(n \log n)\). We talk about them in the abstract, and it can be hard to appreciate what these distinctions mean and how important they are. So let’s take a quick look at what computational efficiency looks like in the context of a very famous algorithm: The Fourier Transform.
Briefly, a Fourier Transform is used for uncovering the spectral information that is present in a signal. AKA, it tells us about oscillatory components in the signal, and has a wide range of uses in communications, signal processing, and even neuroscience analysis.
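To make the two complexity classes concrete, here is a minimal sketch in plain Python that computes the same discrete Fourier transform two ways: a direct double loop, which does on the order of n^2 multiplications, and a recursive radix-2 Cooley-Tukey FFT, which does on the order of n log n. The function names and the test signal are illustrative choices, and the FFT here assumes the input length is a power of two.

```python
import cmath

def naive_dft(x):
    """Direct discrete Fourier transform: O(n^2) operations."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(n log n) operations.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Two full cycles of a sine wave over 8 samples.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
slow = naive_dft(signal)
fast = fft(signal)

# Both approaches should agree up to floating point error.
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff < 1e-9)
```

Both functions return the same spectrum, with the energy concentrated in the bin for two cycles per window (and its mirror). The difference only shows up in how the running time grows as the signal gets longer.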
2015-10-29 - NIH grant analysis
As I’m entering the final years of graduate school, I’ve been applying for a few typical “pre-doc” fellowships. One of these is the NRSA, which is notorious for requiring you to wade through forests of bureaucratic documents (seriously, their “guidelines” for writing an NRSA are over 100 pages!). Doing so ends up taking a LOT of time.
This got me wondering what kind of success rates these grants have in the first place. For those who haven’t gone through the process before, it’s a bit opaque:
2015-09-27 - Craigslist data analysis
In the last post I showed how to use a simple python bot to scrape data from Craigslist. This is a quick follow-up to take a peek at the data.
Note - data that you scrape from Craigslist is pretty limited. They tend to clear out old posts, and you can only scrape from recent posts anyway to avoid them blocking you.
2015-08-30 - Scraping craigslist
In this notebook, I’ll show you how to make a simple query on Craigslist using some nifty python modules. You can take advantage of all the structured data that exists on webpages to collect interesting datasets.
First we need to figure out how to submit a query to Craigslist. As with many websites, one way you can do this is simply by constructing the proper URL and sending it to Craigslist. Here’s a sample URL that is returned after manually typing in a search to Craigslist:
2015-05-27 - Coherence correlation
Note - you can find the nbviewer of this post here
A big question that I’ve always wrestled with is the difference between correlation and coherence. Intuitively, I think of these two things as very similar to one another. Correlation is a way to determine the extent to which two variables covary (normalized to be between -1 and 1). Coherence is similar, but instead assesses “similarity” by looking at the similarity for two variables in frequency space, rather than time space.
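The time-domain half of that comparison is easy to sketch with the standard library. The example below computes a Pearson correlation by hand (covariance normalized by both standard deviations) for two made-up sine waves; the helper name, the 5-cycle frequency, and the quarter-cycle shift are all illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance normalized to lie between -1 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n = 1000
t = [i / n for i in range(n)]

# Two sine waves at the same frequency (5 cycles), the second shifted by a quarter cycle.
a = [math.sin(2 * math.pi * 5 * ti) for ti in t]
b = [math.sin(2 * math.pi * 5 * ti + math.pi / 2) for ti in t]

r_same = pearson(a, a)   # a signal against itself
r_shift = pearson(a, b)  # same frequency, shifted in time

print(round(r_same, 6), round(r_shift, 6))
```

The shifted pair has a correlation near zero even though both signals oscillate at exactly the same frequency. That is the kind of relationship a frequency-domain measure such as coherence is designed to pick up, since a constant phase lag does not reduce it.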