2017-02-22

Exploring data with Visual Studio Code

Fig 1: Visual presentation of Radix sort steps
Every now and then I need to visualize some data, be it the spectral spread of stars in a cluster, a spurious correlation or a stream of measurements. I've always used whatever happens to be handy and available, often kludging data from one format to another and using Excel or Python+matplotlib to create the visuals. Both have repeatedly left me wanting in either appearance or ease of updating.

Enter Visual Studio Code. It is a small, nimble editor with interesting extensions: Python with support for Jupyter notebooks and the Bokeh chart library. And the best part is that Python and Jupyter notebooks are first-class citizens in the big data cloud services. So I should be able to take my existing code, update the data reading parts to read my data points from cloud storage, and be ready to marvel at a beautiful visualization of said data in my browser.

Installing tools

To start on the journey, you need to install a few things. First, install Visual Studio Code. Then install Python (I recommend 3.6.0) and the libraries for number crunching and visualization. You can use pip for that on the command line, i.e. use either pip install <package_name> or python -m pip install <package_name>

Fig 2: Visual Studio Code extension for Python
The Python packages you need are:
  • numpy
  • scipy
  • bokeh
  • jupyter
  • ipython
  • pylint
Optional but nice-to-have libraries are:
  • matplotlib
  • pandas
Some libraries might be included already as dependencies of the others. Now you are almost ready to go, so start up VSCode and go to the extensions tab and search for "Python" (see Fig 2). Install and reload the VSCode window.
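If you are unsure which of the packages above are already present, a quick check from Python can tell you. This is a minimal sketch, not part of the original setup: the helper name missing_packages is made up for illustration, and note that the ipython package is imported as IPython.

```python
from importlib.util import find_spec

# Import names of the packages listed above (ipython is imported as "IPython").
REQUIRED = ["numpy", "scipy", "bokeh", "jupyter", "IPython", "pylint"]
OPTIONAL = ["matplotlib", "pandas"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_packages(REQUIRED))
print("Missing optional:", missing_packages(OPTIONAL))
```

Anything reported as missing can then be installed with python -m pip install <package_name>.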

First chart

To test that everything works, you need a simple test case. How about a simple parabolic curve?
Fig 3: Code and output for a simple chart

So, what's going on there? Line 2 contains an IPython notebook cell separator. You can use these to mark executable cells for your notebook, and click on the "Run cell" button above the marker to execute the following block of code. Read more on Jupyter notebooks in VSCode from the author of the Python extension, Don Jayamanne.

Lines 3-6 import several Python modules, namely Bokeh features for outputting to a notebook, plotting a figure and displaying inline HTML text.

Line 8 defines a dictionary of common parameters for our plots; these are passed using the kwargs syntax when calling plot().

Line 9 defines that our output should be a notebook and it should use inline resources instead of network resources.

Line 10 creates a sample text output to our notebook.

Lines 11-13 create a data set and a plot for it.

Line 14 shows our plot on the notebook.

Line 15 ensures that the notebook is pushed into the VSCode output window.
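Since the code itself only appears in Fig 3, here is a rough sketch of what such a cell could look like. It is a reconstruction under assumptions, not the original listing, so its line numbers do not match the walkthrough above. The names parabola_points and show_parabola, the "# %%" form of the cell marker and the plot options are made up for illustration, and the inline HTML text output and the final step that pushes the result to the VSCode window are left out.

```python
# %%  (cell separator; the exact marker the extension expects is an assumption)
def parabola_points(half_width=10):
    """Return x and y lists for y = x**2 on the integers -half_width..half_width."""
    xs = list(range(-half_width, half_width + 1))
    ys = [x * x for x in xs]
    return xs, ys


def show_parabola():
    """Plot the parabola inline in a notebook using Bokeh."""
    # Bokeh is imported here so the data helper above works without it.
    from bokeh.io import output_notebook, show
    from bokeh.plotting import figure
    from bokeh.resources import INLINE

    # Common plot parameters, passed to figure() with the ** syntax.
    opts = dict(title="Parabola", x_axis_label="x", y_axis_label="y")

    # Send output to the notebook and embed Bokeh's resources inline
    # instead of loading them from the network.
    output_notebook(resources=INLINE)

    xs, ys = parabola_points()
    plot = figure(**opts)
    plot.line(xs, ys)
    show(plot)
```

Calling show_parabola() from a cell should then render the curve, provided Bokeh and a Jupyter kernel are available.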

More charts

Let's try a bigger sample, showing how a radix sort changes the array in each iteration.


The output can be seen in Fig 1. This code uses NumPy to create an array of random numbers, then sorts them using an implementation of radix sort in base 16. The interim arrays from each sorting round are plotted to a single figure using the multi_line function.
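The original listing is not reproduced here, so the following is only a sketch of the idea under some assumptions. It uses the standard library random module instead of NumPy to stay dependency-free, it implements a least-significant-digit radix sort (the post does not say which variant was used), and the function names radix_sort_passes and plot_passes are invented for illustration.

```python
import random


def radix_sort_passes(values, base=16):
    """LSD radix sort for non-negative integers.

    Returns a list of snapshots: the unsorted input first, then the
    array after each digit pass. The last snapshot is fully sorted.
    """
    current = list(values)
    passes = [list(current)]
    if not current:
        return passes
    largest = max(current)
    place = 1
    while largest // place > 0:
        # Distribute values into buckets by the current digit, keeping order.
        buckets = [[] for _ in range(base)]
        for value in current:
            buckets[(value // place) % base].append(value)
        # Concatenate the buckets to form the array for the next round.
        current = [value for bucket in buckets for value in bucket]
        passes.append(list(current))
        place *= base
    return passes


def plot_passes(passes):
    """Draw every snapshot as its own line in a single Bokeh figure."""
    # Bokeh is imported here so the sort above works without it.
    from bokeh.plotting import figure, show

    xs = [list(range(len(snapshot))) for snapshot in passes]
    fig = figure(title="Radix sort passes (base 16)")
    fig.multi_line(xs, passes)
    show(fig)


data = [random.randrange(0, 4096) for _ in range(50)]
snapshots = radix_sort_passes(data)
```

With values below 4096 there are at most three hexadecimal digits, so the sort needs at most three passes. Calling plot_passes(snapshots) then draws the input and each interim array in one figure.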

If you want even more, you can take a look at the Bokeh examples, such as Heatmap.py, which creates an HTML file with charts and opens it in your browser. A good exercise is to update that code to output into the VSCode window instead.

Issues

At least currently, there are a few glitches. The Python kernel isn't keen to change the output method of Jupyter notebooks, so if you have multiple windows open that mix HTML file output and VSCode window output, your HTML files might get corrupted when you execute a cell from another file. To get around this, point your mouse at the status bar at the bottom, click on "Python 3 kernel" and select "Restart Python 3 kernel" whenever you switch between files. I haven't found a direct Shift-Ctrl-P command for this yet.
