Django
Python is not a new language. Nevertheless, over the last couple of years its popularity has risen alongside the growth of data science, and it is increasingly the language of choice for data scientists. So why choose it, when there are languages and environments designed specifically for this purpose, such as R, MATLAB or Octave?
There are many reasons, of course, but one we can point to is the flexibility of the Python workflow. Python has libraries for almost every purpose; they may not always be the best choice, but there is a Python option for almost any task that can be done with a computer. This is especially useful in multidisciplinary fields, a description that happens to fit data science. Python has decent data science libraries (Pandas, which we have seen previously in this blog, and scikit-learn, which we will see in the future). R may be a better choice for the analysis itself, but R does not have a good platform for building GUIs, whereas Python has GUI support for most platforms.
Today we are going to review one of Python's killer apps: Django. Django is a framework, written in Python, for programming web applications. In fact, many users get into Python because of Django.
So, how is this related to data science? At some point we may want to collect data from users or, more likely, provide them with the results of our analysis. Another use is serving data remotely. Tasks like these are much harder to do in environments built purely for analysis.
Before getting into Django, we need a very simplified picture of how a client-server application works in practice. Normally, in web applications, the server has access either to a lot of data or to a lot of computing power, whereas the client has access to the main asset: the users. Therefore, the server controls what gets sent to or done for the users and is focused on the shared resource (a database, a processing cloud, etc.), whereas the client controls the presentation and the interaction with the user.
Data flows between the client and the server as an exchange of messages following a certain protocol. Normally, the client sends a message called a request to the server (for instance, asking for some data from a database), and the server sends back a response with the result (the data requested by the client). From a programming point of view, these messages are exchanged through sockets: objects or function calls that send a message to a certain address or listen on a port for incoming messages. With raw sockets, we would have to design our own protocol. Fortunately, there is a widely used protocol that saves us all that work: HTTP.
HTTP is the protocol used by websites, and the resources on a server are addressed using URLs. When we open a website, we are using a client (the browser) that requests the page from the server at the address (URL) we give it. The server responds with the page (or an error message). The most common kinds of request are GET (we ask for the document at a given URL) and POST (we send the server some data, such as the text of an email), among others.
Again, as data scientists, why should we care? Because HTTP can carry much more than websites (HTML documents): it can also carry XML or JSON documents. Does that start to sound more interesting? Serving data this way, with each resource available at its own URL, is the basis of what we call a REST API.
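Before Django enters the picture, the request/response cycle can be sketched with nothing but Python's standard library (`http.server` and `json`). This is a minimal illustration, not how Django works internally: the `/measurements/` URL and the in-memory `MEASUREMENTS` list are hypothetical stand-ins for a real shared resource such as a database. A GET request returns the list as a JSON document; a POST request adds an item sent as JSON in the request body.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for a shared resource (e.g. a database).
MEASUREMENTS = [{"id": 1, "value": 3.5}, {"id": 2, "value": 4.25}]


class DataHandler(BaseHTTPRequestHandler):
    """Answers GET and POST requests on /measurements/ with JSON documents."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET: the client asks for the document identified by the URL.
        if self.path == "/measurements/":
            self._send_json(200, MEASUREMENTS)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # POST: the client sends data (here, a JSON object) in the request body.
        if self.path != "/measurements/":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length))
        item["id"] = len(MEASUREMENTS) + 1
        MEASUREMENTS.append(item)
        self._send_json(201, item)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def run(port=8000):
    # Listen for requests until interrupted (Ctrl+C).
    HTTPServer(("127.0.0.1", port), DataHandler).serve_forever()
```

Calling `run()` starts the server on port 8000, after which `curl http://127.0.0.1:8000/measurements/` would return the JSON list. Writing every handler by hand like this quickly becomes tedious, and that is exactly the work a framework takes off our hands.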
How can we do all this with Python? One answer is Django, which provides a platform for building server applications. In Django, an app is a package of classes and functions; some of these functions are associated with URLs that a client can call through a request. Each such function processes the request and returns a response.
Apps in Django follow a Model-View-Controller scheme. Models represent the data the application uses (for instance, users, friendships or pictures in a social network); controllers (views, in Django jargon) are the functions that process user requests and return responses; and views (templates, in Django jargon) define the format of the replies those functions return. That was not a mistake: in Django, controllers are called views and views are called templates, which is why Django's own documentation describes the arrangement as Model-Template-View. Django stores the data in a database backend. By default it uses SQLite, but it also supports other databases such as PostgreSQL, MySQL and Oracle.
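The database backend is chosen in the project's `settings.py` through the `DATABASES` setting. The excerpt below is a simplified sketch of the SQLite default: the file generated by `django-admin startproject` actually builds the path from a `BASE_DIR` variable, reduced here to a plain file name, and the PostgreSQL credentials in the comment are placeholders.

```python
# Excerpt from a project's settings.py (simplified).
# SQLite: the whole database lives in a single local file.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",
    }
}

# Using another supported backend, e.g. PostgreSQL, means changing this
# dictionary (placeholder credentials; a PostgreSQL driver must be installed):
# DATABASES = {
#     "default": {
#         "ENGINE": "django.db.backends.postgresql",
#         "NAME": "mydb",
#         "USER": "myuser",
#         "PASSWORD": "change-me",
#         "HOST": "localhost",
#         "PORT": "5432",
#     }
# }
```

The models, views and templates do not change when the backend does; only this dictionary does.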
The workflow in Django usually starts by creating models for the data we want to send to the clients. Models are Python classes that represent the data stored in the backend. Usually we start with a blank database managed solely by the app, although there are ways to use existing databases. We will see this in a future entry; first, we must cover the basics.
Once the models are created, we work on the views, deciding what we want to serve and how we will process client requests. Visibility policies (which clients have access to which data) are usually decided here, although there are ways of enforcing them in the models. Views are Python functions that take the request as input, optionally perform an action on the database, and return a response.
Usually, we want the response from a view to have a specific format (HTML, XML, JSON, etc.). Building that format by hand inside the view can complicate things considerably, so Django uses templates: text files with blanks that are filled in before being returned.
Finally, we configure the Django project that hosts the applications, setting up its URLs and launching the server.
On top of Django, there are many libraries that provide pre-made models, views and templates, among other things, for specific tasks. Especially interesting for us will be the Django REST Framework, which provides the building blocks for a REST API.
In future entries, we will see how to build simple Django apps and increasingly complex ones until we can serve data from a database in JSON format using a REST API.