In this entry, we will do a quick review of the possibilities available for plotting and charting in Python. It is not a complete review, but just an introduction to get started. Data visualization plays a key role in the processes of data science; it is ultimately an interface between the data and the data scientist.
Graphs help us understand large amounts of data in a simple manner. Data manipulation is often an iterative process: a transformation is applied to the original data, the results are observed through graphs, and then new transformations are applied or the original ones are modified according to the observations. Graphs also help us understand the models extracted from data, and are generally a very good tool for disseminating data analysis results.
There are many types of graphic representations (line plots, bar plots, pie charts, matrix charts, etc.). The choice of graph type depends heavily on the nature of the results we want to show. Time series are usually best represented with a line plot. Proportions are usually represented using pie charts, but can also be shown with bar charts if there is an additional dimension (for instance, proportions that change over time). To represent individual data points, we may want to choose scatter plots. The list goes on. A nice reference on how to choose the best graph can be found here.
Back to Python; there are many libraries for graphing in Python. To choose among them we must first know where we want to use the graphs, either to represent them on screen, save them to an image file, produce publication-grade graphs or use them in a web or desktop application.
Matplotlib (part of the SciPy ecosystem, although it is a separate library from SciPy itself) is the mega-library that does everything. It is the most complete of all the available libraries, and it can produce graphs for all purposes. The problem is that, by default, the graphs produced by Matplotlib are too basic (i.e. ugly), so a lot of tweaking is required in order to decorate them properly. Matplotlib provides backends for GUI toolkits such as Gtk and Qt, so its plots can be embedded in applications.
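To give an idea of the tweaking involved, here is a minimal sketch of a Matplotlib line plot saved to an image file. The data, labels and output file name are made up for illustration, and the `Agg` backend is selected only so that the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: one period of sine and cosine
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")

# The "decoration" step: title, axis labels, grid and legend are all explicit
ax.set_title("Sine and cosine")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.grid(True, alpha=0.3)
ax.legend()

# Save to a PNG in the temporary directory (hypothetical file name)
out_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.tight_layout()
fig.savefig(out_path, dpi=150)
```

Everything beyond the two `ax.plot()` calls is decoration, which is roughly the proportion you can expect in real Matplotlib code.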
Pandas has some plotting capabilities. The Series and DataFrame classes have a plot() method to draw graphics of the data. The documentation has more details on the plot types and options available. The implementation is based on Matplotlib, and the resulting plots can be manipulated using Matplotlib functions. Again, the problem is that the graphics need some tweaking in order to look good. Nevertheless, this capability comes in very handy when doing exploratory data analysis.
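As a quick sketch of that workflow, the snippet below plots a small DataFrame and then adjusts the result with ordinary Matplotlib calls. The column names and the random-walk data are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering; drop this line in an interactive session
import numpy as np
import pandas as pd

# Illustrative data: two random walks indexed by date
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame(
    {
        "a": rng.normal(size=60).cumsum(),
        "b": rng.normal(size=60).cumsum(),
    },
    index=dates,
)

# DataFrame.plot() draws one line per column and returns a Matplotlib Axes
ax = df.plot(title="Two random walks")

# Because the result is a regular Axes, Matplotlib methods apply directly
ax.set_ylabel("value")
ax.figure.tight_layout()
```

A single `df.plot()` call is usually enough while exploring; the Matplotlib calls only become necessary once the plot has to be presented to someone else.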
Based on Matplotlib, Seaborn provides a simplified graphing interface geared towards statistical visualization. It is designed to produce attractive graphics out of the box (yay!). On the other hand, its customization possibilities and graph types are more limited than Matplotlib's.
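For comparison, here is a minimal Seaborn sketch that draws a box plot per group from a DataFrame. The data is synthetic, the column names `group` and `value` are arbitrary, and `set_theme()` assumes a reasonably recent Seaborn release (0.11 or later).

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative data: three groups with different means
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {
        "group": np.repeat(["A", "B", "C"], 50),
        "value": np.concatenate(
            [rng.normal(loc, 1.0, 50) for loc in (0.0, 1.0, 2.5)]
        ),
    }
)

# One call applies a consistent style to all subsequent plots
sns.set_theme(style="whitegrid")

# Seaborn computes the statistics and labels the axes from the column names
ax = sns.boxplot(data=df, x="group", y="value")
ax.set_title("Value distribution per group")
```

The returned object is again a Matplotlib Axes, so anything Seaborn does not cover can still be adjusted with Matplotlib.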
Bokeh is a library for producing interactive graphics through web browsers. This means that the output is HTML/JS code that can be used in a web page, although static images can also be generated. Bokeh also has a server that can be used to display live-updating graphs. Bokeh is mostly used in websites or web apps that present data to end users. Another good place to use Bokeh is in Jupyter notebooks, especially if they are intended for distribution.
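The following sketch shows the HTML output in practice: it builds a small line plot and renders it to a standalone HTML string that could be written to a file or returned by a web app. The data points and titles are placeholders.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Illustrative data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# A figure comes with pan/zoom tools enabled by default
p = figure(title="Simple line", x_axis_label="x", y_axis_label="y")
p.line(x, y, line_width=2, legend_label="series")

# Render a complete HTML document; BokehJS is loaded from the CDN
html = file_html(p, CDN, "Simple line")
```

In a notebook the usual route is `output_notebook()` followed by `show(p)` instead of generating the HTML by hand.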
What Bokeh is to websites, PyQtGraph is to Qt desktop applications: it is a plotting library for Qt, which can be used from Python through bindings such as PySide or PyQt. Qt is a multi-platform (Windows, Linux, OS X, Android…) C++ library for writing desktop applications, with bindings for other languages (Python, Java…).
We have seen only a subset of the many graphing libraries available for Python. Matplotlib is the de facto standard in Python, but it may not be the most user-friendly library. To simplify things (as long as the graphic we are creating is not of a very uncommon type), we can use Seaborn or other alternatives that we did not mention explicitly here (Chaco, Visvis… a full list can be found here). To use graphics in web apps, Bokeh is the way to go, since it provides interactive graphics. For desktop apps, Matplotlib provides bindings for the major GUI toolkits, but PyQtGraph simplifies the process for Qt. In future entries we will see examples of each of the libraries described here.