All Posts

What do people think about rST?

Published:

Publishing computational narratives has always been a dream of the Jupyter Project, and there is still a lot of work to be done in improving these use-cases. We've made a lot of progress in providing open infrastructure for reproducible science with JupyterHub and the Binder Project, but what about the documents themselves? We've recently been working on tools like Jupyter Book, which aim to improve the writing and publishing process with the Jupyter ecosystem. This is hopefully the first post of a few that ask how we can best improve the state of publishing with Jupyter. Python has a fairly sophisticated publishing tool in its stack. Sphinx has been a staple

Testing Pandoc and Jupyter Notebooks

Published:

Jupyter Notebooks to markdown and HTML with Pandoc: For several months now, the universal document converter pandoc has had support for Jupyter Notebooks. This means that with a single call, you can convert .ipynb files to any of the output formats that Pandoc supports (and vice versa!). This post is a quick exploration of what this looks like. Note that for this post, we're using Pandoc version 2.7.3. Also, some of what's below is hard to interpret without actually opening the files that are created by Pandoc. For the sake of this blog post, I'm going to stick with the raw text output here, though you can expand
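
To make that "single call" concrete, here is a minimal sketch (the filenames are hypothetical) that shells out to Pandoc from Python; the equivalent command line is simply pandoc notebook.ipynb -o notebook.md, with the formats inferred from the file extensions.

```python
import subprocess

# Hypothetical filenames: Pandoc infers the input/output formats from the
# extensions, so a single call converts a notebook to Markdown
# (swap in notebook.html, notebook.docx, etc. for other output formats).
subprocess.run(
    ["pandoc", "notebook.ipynb", "-o", "notebook.md"],
    check=True,
)
```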

What would Python-style governance look like in Jupyter?

Published:

What would Python-style governance look like in Jupyter? This is the second in a series of blog posts that explores what it'd look like to directly port the governance model of other communities into the Jupyter project. You can find the first post about Rust here. Note: These posts are meant as a thought experiment rather than a proposal. Moreover, all the usual caveats come with it, such as the fact that I don't know the Python governance structure that well, and I might totally botch my characterization of it. Background on Python's governance: Recently, the Python community underwent a refactoring of their governance model. This was in large

Exploring xarray - Analyzing iEEG data

Published:

Analyzing intracranial electrophysiology data with xarray: Over the last few years, it has been exciting to see the xarray project evolve, add new functionality, and mature. This post is an attempt at giving xarray another visit to see how it could integrate into electrophysiology workflows. A quick background on our data: It is common in neuroscience to ask individuals to perform a task over and over again. You record the activity in the brain each time they perform the task (called an epoch or a trial). Time is recorded relative to some onset when the task begins; that is, t==0. The result is usually a matrix of
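
As a rough illustration of the data structure involved (the shapes, channel names, and time window below are made up for the sketch, not taken from the post), a trials x channels x time array maps naturally onto a labeled xarray object:

```python
import numpy as np
import xarray as xr

# Illustrative shapes: 50 trials, 32 electrodes, 1 second sampled at 1 kHz.
n_trials, n_channels, n_times = 50, 32, 1000
data = np.random.randn(n_trials, n_channels, n_times)

epochs = xr.DataArray(
    data,
    dims=("trial", "channel", "time"),
    coords={
        "trial": np.arange(n_trials),
        "channel": [f"ch{ii}" for ii in range(n_channels)],
        "time": np.linspace(-0.2, 0.8, n_times),  # seconds relative to onset (t==0)
    },
    name="voltage",
)

# Label-based operations: average across trials, then keep the post-onset window.
evoked = epochs.mean(dim="trial")
post_onset = evoked.sel(time=slice(0, None))
```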

What would Rust-style governance look like in Jupyter?

Published:

What would Rust-style governance look like in Jupyter? As I've written about before, I like Rust's governance structure. I mean, who can't get behind a community that lists governance as a top-level page on its website? Jupyter is currently in the middle of figuring out the next phase of its governance structure, and so I have been thinking about what this might look like. This post is a quick thought experiment to explore what it'd mean to port over Rust's governance directly into the Jupyter community. Note: I'm not an expert in Rust governance, so there are some assumptions made about its model based on my outside perspective. Apologies if

Automating Jupyter Book deployments with CI/CD

Published:

Lately I've spent a lot of time trying to reduce the friction involved in deploying Jupyter Book, as well as contributing to the project. Features are a great carrot, but ultimately getting engagement is also about lowering barriers to entry and showing people a path forward. Jupyter Book is a relatively straightforward project, but it involves a few technical pieces that can be painful to use (thanks, Jekyll). Recently I experimented with whether we can automate deploying a Jupyter Book online. Using continuous integration/deployment services seems like a natural place to try this out. One can upload a barebones set of code to a GitHub repository, then

A few recent talks

Published:

Lately I've given quite a number of talks about the Jupyter and Binder ecosystems for various purposes. Before each of the talks, I make the slides available at a public address in case others are interested in following up with the material. For those who missed the talks (or the subsequent tweets about them), here are a few of the more recent ones. A word of warning: there's a lot of overlap between these talks - I'm not crazy enough to re-invent the wheel each time I have to speak. However, maybe folks will find some value in the different angles taken in each case. The Berkeley

Thoughts from the Jupyter team meeting 2019

Published:

I just got back from a week-long Jupyter team meeting that was somehow both very tiring and energizing at the same time. In the spirit of openness, I'd like to share some of my experience. While it's still fresh in my mind, here are a few takeaways that occurred to me throughout the week. Note that these are my personal (rough) impressions, and they shouldn't be taken as a statement from the project/community itself. Jupyter has a huge and diverse set of users: The first thing, which is probably unsurprising to many people but was really driven home at this meeting, is that there are so many Jupyter

Three things I love about CircleCI

Published:

I recently had to beef up the continuous deployment of Jupyter Book, and used it as an opportunity to learn a bit more about CircleCI's features. It turns out, they're pretty cool! Here are a few of the things that I learned this time around. For those who aren't familiar with CircleCI, it is a service that runs Continuous Integration and Continuous Deployment (CI/CD) workflows for projects. This basically means that they manage many kinds of infrastructure that can launch jobs to run test suites, deploy applications, and test on many different environments. Here are some cool things that I now have a much better appreciation for

Automatically mirror a github repository with CircleCI

Published:

tl;dr: you can automatically mirror the contents of one repository to another by using CI/CD services like CircleCI. This post shows you one way to do it, using secrets that let you push to a GitHub repository from a CircleCI process. We recently ran into an issue with the Data 8 course where we needed to mirror one GitHub site to another. In short, the textbook is built with a tool called jupyter-book, and we use github-pages to host the content at inferentialthinking.com. For weird URL-naming reasons, we had to create a second organization to host the actual site. This introduced the complexity that

Open communities need to be partners, not sources of free labor

Published:

In the last couple of years, we've seen an increasing number of organizations start to spawn products that take a largely open stack (e.g., the SciPy ecosystem) and wrap it in a thin layer of proprietary/custom interface + infrastructure. On the face of it, this isn't a problem - I really want people to be able to make money using the open source stack - however, there is a big caveat. When you look at the work that those organizations have done over time, you often see a pretty thin trail of contributions back to those open source projects. I'd argue that using an open community's software

My weekly workflow

Published:

I've had a bunch of conversations with friends who were interested in how to keep track of the various projects they're working on, and how to prioritize their time over the course of a week. I thought it might be helpful to post my own approach to planning time throughout the week in case it's useful for others to riff off of. General principles: First off, a few general principles that I use to guide my thinking on planning out the week. 1. Be intentional. This seems obvious, but I find that if I don't explicitly define what I want to work on, I have more

Signaling openness

Published:

How do open projects signal their openness to the outside community? This is a really hard question, particularly because nowadays open has become a buzzword that doesn't just signal a project's position to the community, but is also used as a marketing term to increase support, users, or resources. I was thinking about this the other day, so decided to take to Twitter (https://twitter.com/choldgraf/status/1054478362209480704). I was surprised at how much this question resonated with people. Here are a few highlights from the (very interesting) conversation that came out of that question. Some discussion threads: Wishes vs. reality: Tal immediately brought up a really important point

I like Rust's governance structure

Published:

Recently I've been reading up on governance models for several large-ish open source projects. This is partially because I'm involved in a bunch of these projects myself, and partially because it's fascinating to see distributed groups of people organizing themselves in effective (or not) ways on the internet. Why is governance in open projects important? Governance is tricky, because there is an inherent tension between: * being able to make important, complex, or sensitive decisions quickly, and * being transparent and inclusive in the decision-making process. For most companies and organizations, the above is (sort of) solved with a relatively hierarchical decision-making structure. The Chief Executive Officer can decide high-level directions

Using CircleCI to preview documentation in Pull Requests

Published:

Writing documentation is important - it's the first point of contact between many users and your project, and can be a pivotal moment in whether they decide to adopt your tech or become a contributor. However, it can be a pain to iterate on documentation, as it often involves a lot of rapid iteration locally, followed by a push to GitHub where you just trust that the author has done a good job of writing content, design, etc. A really helpful tip here is to use Continuous Integration to build and preview your documentation. This allows you to generate a link to the built docs,

Summer conference report back

Published:

This is a short update on several of the conferences and workshops over the summer of this year. There are all kinds of exciting things going on in open source and open communities, so this is a quick way for me to collect my thoughts on some things I've learned this summer. SciPy: The Pangeo project demoed their JupyterHub for big-data geoscience. Pangeo is a project that provides access to a gigantic geosciences dataset. They use lots of tools in the open-source community, including Dask for efficient numerical computation, the SciPy stack for a bunch of data analytics, and JupyterHub on Kubernetes for managing user instances and deploying on

Adding copy buttons to code blocks in Sphinx

Published:

NOTE: This is now a Sphinx extension! Thanks to some friendly suggestions, I've written this up as a super tiny Sphinx extension. Check it out here: https://github.com/choldgraf/sphinx-copybutton. Sphinx is a fantastic way to build documentation for your Python package. On the Jupyter project, we use it for almost all of our repositories. A common use for Sphinx is to step people through a chunk of code. For example, in the Zero to JupyterHub for Kubernetes guide we step users through a number of installation and configuration steps. A common annoyance is that there is a lot of copy/pasting involved. Sometimes you accidentally miss a character or some
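
For reference, enabling the extension is a one-line change to a Sphinx conf.py (this assumes sphinx-copybutton has been pip-installed; the prompt-stripping option is just one example of its settings):

```python
# conf.py -- minimal sketch of turning on the copy buttons
extensions = [
    "sphinx_copybutton",
]

# Optional: strip ">>> " prompts on copy so pasted code runs as-is.
copybutton_prompt_text = ">>> "
```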

Introducing _makeitpop_, a tool to perceptually warp your data!

Published:

Note: It should go without saying, but you should never do the stuff that you're about to read about here. Data is meant to speak for itself, and our visualizations should accurately reflect the data above all else. When I was in graduate school, I tended to get on my soapbox and tell everybody why they should stop using Jet and adopt a perceptually-flat colormap like viridis, magma, or inferno. Surprisingly (ok, maybe not so surprisingly) I got a lot of pushback from people. Folks would say, "But I like jet, it really highlights my data, it makes the images pop more effectively

Blogging with Jupyter Notebooks and Jekyll using nbconvert templates

Published:

Here's a quick (and hopefully helpful) post for those wishing to blog in Jekyll using Jupyter notebooks. As some of you may know, nbconvert can easily convert your .ipynb files to markdown, which Jekyll can easily turn into blog posts for you: jupyter nbconvert --to markdown myfile.ipynb. However, an annoying part of this is that Markdown doesn't include classes for inputs and outputs, which means they each get treated the same in the output. Not ideal. Fortunately, you can customize nbconvert extensively. First, it's possible to create your own exporter class, but this is a bit heavy for what we want to do. In our case, we'd simply like to extend
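
For readers who prefer the Python API over the command line, the same conversion looks roughly like this (the filename is hypothetical); the custom-template route the post goes on to describe builds on this exporter:

```python
from nbconvert import MarkdownExporter

# Python-API equivalent of `jupyter nbconvert --to markdown myfile.ipynb`.
exporter = MarkdownExporter()
body, resources = exporter.from_filename("myfile.ipynb")

with open("myfile.md", "w") as f:
    f.write(body)
```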

An academic scientist goes to DevOps Days

Published:

Last week I took a few days to attend DevOpsDays Silicon Valley. My goal was to learn a bit about how the DevOps culture works and what things people in this community are excited about and discuss. I'm also interested in learning a thing or two that could be brought back into the scientific/academic world. Here are a couple of thoughts from the experience. > tl;dr: DevOps is more about culture and team process than it is about technology; maybe science should be too… What is DevOps anyway? This one is going to be hard to define (though here's one definition), as I'm new

Combining dates with analysis visualization in python

Published:

Sometimes you want to do two things: 1. Plot a timeseries that handles datetimes in a clever way (e.g., with Pandas or Matplotlib). 2. Plot some kind of analysis on top of that timeseries. Sounds simple, right? It's not. The reason for this is that plotting libraries don't really plot human-readable dates; they convert dates to numbers, then change the xtick labels so that they're human-readable. This means that if you want to plot something on top of dates, it's quite confusing. To demonstrate this, let's grab the latest stock market prices for a couple of companies and fit regression lines to them… Let's say we want
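
A minimal sketch of the date-to-number dance (with synthetic prices standing in for real stock data) looks something like this: convert the dates explicitly with date2num so the regression fit lives in the same coordinates matplotlib actually plots in.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic "prices" on a daily index (a stand-in for real stock data).
dates = pd.date_range("2017-01-01", periods=100, freq="D")
prices = np.linspace(100, 120, 100) + np.random.randn(100)

# Matplotlib plots dates as numbers under the hood; convert explicitly so the
# fitted line can be evaluated in the same coordinate system.
x = mdates.date2num(dates.to_pydatetime())
slope, intercept = np.polyfit(x, prices, 1)

fig, ax = plt.subplots()
ax.plot(dates, prices, ".", label="prices")
ax.plot(dates, slope * x + intercept, label="linear fit")
ax.legend()
```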

Dates in python

Published:

As a part of setting up the website for the Docathon I've had to re-learn all of my date string formatting rules. It's one of those little problems you don't really think about - turning an arbitrary string into something structured like a date - until you've actually got to do it. There are a bunch of tools in python for using date-like objects, but it's not always easy to figure out how these work. This post is just a couple of pieces of information I've picked up along the way. Useful links: Here's a list of useful links I've picked up,
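
The core of it is the strptime/strftime pair from the standard library; a tiny example:

```python
from datetime import datetime

# Parse an arbitrary string into a structured datetime object...
date = datetime.strptime("2017-02-01", "%Y-%m-%d")

# ...and format it back out however you like.
print(date.strftime("%A, %B %d %Y"))  # "Wednesday, February 01 2017"
```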

Matplotlib Cyclers are Great

Published:

Every now and then I come across a nifty feature in Matplotlib that I wish I'd known about earlier. The MPL documentation can be a beast to get through, and as a result you miss some cool stuff sometimes. This is a quick demo of one such feature: the cycler. Have you ever had to loop through a number of plotting parameters in matplotlib? Say you have two datasets and you'd like to compare them to one another. Maybe something like this… There's really a lot of unnecessary code going on above. We're defining objects that share the same name as the kwarg
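
A minimal sketch of the idea (toy data, not the post's example): define the property cycle once and let each plot call pick up the next combination automatically.

```python
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# Compose one property cycle instead of hand-writing per-line kwargs.
prop_cycle = cycler(color=["#1f77b4", "#ff7f0e"]) + cycler(linestyle=["-", "--"])
plt.rc("axes", prop_cycle=prop_cycle)

# Two toy datasets: each plot call picks up the next (color, linestyle) pair.
x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="dataset A")
ax.plot(x, np.cos(x), label="dataset B")
ax.legend()
```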

Brainy Jingle Bells

Published:

This is a quick demo of how I created this video. Check it out below, or read on to see the code that made it! Jingle Bells! Here's a quick viz to show off some brainy holiday spirit. We'll use matplotlib and MoviePy to read in an audio file and generate a scatterplot that responds to the audio qualities. We'll use the spectral content in the audio to drive activity in the electrodes. Here's what I'm talking about by spectral content… We'll extract this information again below so we can make the viz… Now, we'll assign each electrode to a particular point on the y-axis
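
The gist of "spectral content driving the electrodes" might look roughly like this sketch (the filename and frequency band are made up, and the real video layers MoviePy on top of this): compute a spectrogram, then turn band power per frame into a 0-1 activity value.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

# Hypothetical audio file; assumes a WAV on disk.
sfreq, audio = wavfile.read("jingle_bells.wav")
if audio.ndim > 1:            # collapse stereo to mono for the sketch
    audio = audio.mean(axis=1)

# Spectral content over time: power for each (frequency, time bin).
freqs, times, power = spectrogram(audio.astype(float), fs=sfreq)

# One illustrative mapping: power in a low-frequency band, scaled to [0, 1],
# which could then drive scatter sizes frame by frame.
band = (freqs > 100) & (freqs < 1000)
activity = power[band].mean(axis=0)
activity = (activity - activity.min()) / (activity.max() - activity.min())
```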

The bleeding edge of publishing: Scraping publication amounts at biorxiv

Published:

Per a recent request somebody posted on Twitter, I thought it'd be fun to write a quick scraper for the biorxiv, an excellent new tool for posting pre-prints of articles before they're locked down with a publisher embargo. A big benefit of open science is the ability to use modern technologies (like web scraping) to make new use of data that would originally be unavailable to the public. One simple example of this is information and metadata about published articles. While we're not going to dive too deeply here, maybe this will serve as inspiration for somebody else interested in scraping
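
The scraping itself doesn't need much machinery; a generic sketch looks like the following, with the caveat that the listing URL and CSS selector here are assumptions for illustration, not bioRxiv's documented markup (you would inspect the live page before trusting them).

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical listing page and CSS class -- check the real page structure
# in a browser before relying on the selector below.
url = "https://www.biorxiv.org/collection/neuroscience"
html = requests.get(url).text

soup = BeautifulSoup(html, "html.parser")
titles = [node.get_text(strip=True) for node in soup.select(".highwire-cite-title")]
print(len(titles), "article titles found on this page")
```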

Visualizing publication bias

Published:

This article is now interactive! Check out a live Binder instance here. In the next few months, I'll try to take some time to talk about the things I learn as I make my way through this literature. While it's easy to make one-off complaints to one another about how science is broken without really diving into the details, it's important to learn about how it's broken, or at least how we could assess something like this. Fortunately, there are a lot of great researchers out there who are studying these very issues. Whether they dedicate all of their research to these meta

5 things I learned at SciPy

Published:

I've finally decompressed after my first go-around with SciPy. For those who haven't heard of this conference before, SciPy is an annual meeting where members of the scientific community get together to discuss their love of Python, scientific programming, and open science. It spans both academics and people from industry, making it a unique place in terms of how software interfaces with scientific research. (If you're interested in the full set of SciPy conferences, check them out here.) It was an eye-opening experience that I learned a lot from, so here's a quick recap of some things that I learned during my first rodeo.

Could Brexit have happened by chance?

Published:

As a scientist, watching the Brexit vote was a little bit painful. Though probably not for the reason you're thinking. No, it wasn't the politics that bothered me, but the method for making such an incredibly important decision. Let me explain… Scientists are a bit obsessed with the concept of error. In the context of collecting data and analyzing it, this takes the form of our confidence in the results. If all the data say the same thing, then we are usually pretty confident in the overall message. If the data is more complicated than this (and it always is), then
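
As a toy way of framing that question (not the post's actual analysis, and the vote totals below are rough, assumed figures): treat the observed Leave share as the underlying preference and simulate how much the margin would wobble from sampling noise alone.

```python
import numpy as np

# Rough, assumed referendum figures: ~33.5 million valid votes, ~51.9% Leave.
n_votes = 33_500_000
p_leave = 0.519

# Re-run the "referendum" many times with the same underlying preference and
# see how much the Leave share varies purely from sampling noise.
rng = np.random.default_rng(0)
leave_share = rng.binomial(n_votes, p_leave, size=10_000) / n_votes
print(f"std of simulated Leave share: {leave_share.std():.6f}")
```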

The beauty of computational efficiency

Published:

When we discuss computational efficiency, you often hear people throw around phrases like $O(n^2)$ or $O(n \log n)$. We talk about them in the abstract, and it can be hard to appreciate what these distinctions mean and how important they are. So let's take a quick look at what computational efficiency looks like in the context of a very famous algorithm: the Fourier Transform. A short primer on the Fourier Transform: Briefly, a Fourier transform is used for uncovering the spectral information that is present in a signal. AKA, it tells us about oscillatory components in the signal, and has a wide range
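
A quick way to feel the difference is to time a direct $O(n^2)$ DFT against numpy's FFT on the same signal; a minimal sketch:

```python
import time
import numpy as np

def naive_dft(x):
    """Direct O(n^2) DFT: every output frequency sums over every input sample."""
    n = len(x)
    k = np.arange(n)
    basis = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return basis @ x

x = np.random.randn(2048)

t0 = time.perf_counter()
slow = naive_dft(x)
t1 = time.perf_counter()
fast = np.fft.fft(x)  # O(n log n)
t2 = time.perf_counter()

print("same result:", np.allclose(slow, fast))
print(f"naive DFT: {t1 - t0:.4f}s   np.fft.fft: {t2 - t1:.6f}s")
```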

NIH grant analysis

Published:

NIH Fellowship Success Rates: As I'm entering the final years of graduate school, I've been applying for a few typical pre-doc fellowships. One of these is the NRSA, which is notorious for requiring you to wade through forests of bureaucratic documents (seriously, their guidelines for writing an NRSA are over 100 pages!). Doing so ends up taking a LOT of time. This got me wondering what kind of success rates these grants have in the first place. For those who haven't gone through the process before, it's a bit opaque. How the NRSA works: Basically, each NRSA grant is reviewed by a

Craigslist data analysis

Published:

Using Craigslist to compare prices in the Bay Area: In the last post I showed how to use a simple python bot to scrape data from Craigslist. This is a quick follow-up to take a peek at the data. > Note - data that you scrape from Craigslist is pretty limited. They tend to clear out old posts, and you can only scrape from recent posts anyway to avoid them blocking you. Now that we've got some Craigslist data, what questions can we ask? Well, a good start would be to see where we want (or don't want) to rent our

Scraping craigslist

Published:

Overview: In this notebook, I'll show you how to make a simple query on Craigslist using some nifty python modules. You can take advantage of all the structured data that exists on webpages to collect interesting datasets. First we need to figure out how to submit a query to Craigslist. As with many websites, one way you can do this is simply by constructing the proper URL and sending it to Craigslist. Here's a sample URL that is returned after manually typing in a search to Craigslist: > http://sfbay.craigslist.org/search/eby/apa?bedrooms=1&pets_cat=1&pets_dog=1&is_furnished=1 This is actually two separate things. The first tells Craigslist what kind of thing
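
In Python, the same query can be built by handing the search parameters to requests, which constructs the URL for you (the base URL below simply mirrors the sample above):

```python
import requests

# The base URL says *where* to search (SF bay area, east bay apartments);
# the params say *what* to search for.
base_url = "http://sfbay.craigslist.org/search/eby/apa"
params = {"bedrooms": 1, "pets_cat": 1, "pets_dog": 1, "is_furnished": 1}

response = requests.get(base_url, params=params)
print(response.url)          # the fully constructed query URL
print(response.status_code)  # 200 if Craigslist returned a results page
```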

Coherence correlation

Published:

Note - you can find the nbviewer version of this post here. Coherence vs. correlation - a simple simulation: A big question that I've always wrestled with is the difference between correlation and coherence. Intuitively, I think of these two things as very similar to one another. Correlation is a way to determine the extent to which two variables covary (normalized to be between -1 and 1). Coherence is similar, but instead assesses the relationship between two variables in frequency space rather than in time. There was a nice paper that came out a while back that basically compared
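
A small simulation makes the contrast concrete (toy signals, not the paper's data): two noisy signals sharing a 10 Hz component get a single correlation number in the time domain, but a coherence value at every frequency.

```python
import numpy as np
from scipy.signal import coherence

# Two noisy signals that share a 10 Hz component.
sfreq = 1000
times = np.arange(0, 10, 1 / sfreq)
shared = np.sin(2 * np.pi * 10 * times)
sig_a = shared + 0.5 * np.random.randn(times.size)
sig_b = shared + 0.5 * np.random.randn(times.size)

# Correlation: one number between -1 and 1, computed in the time domain.
corr = np.corrcoef(sig_a, sig_b)[0, 1]

# Coherence: a value between 0 and 1 at *each* frequency.
freqs, coh = coherence(sig_a, sig_b, fs=sfreq)
idx_10hz = np.argmin(np.abs(freqs - 10))
print(f"correlation: {corr:.2f}   coherence near 10 Hz: {coh[idx_10hz]:.2f}")
```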