In a prior entry we described how to load data from a CSV file. In this entry, we will do a quick introduction on how to load the contents of an SQL table into a DataFrame.
With that purpose, we will use the DataFrame.read_sql() method.
The first thing that we need is an active MySQL (or similar) database. We need to establish a connection to the database using SQL Alchemy. SQL Alchemy is a very powerful library for managing databases; definitely get to know it better if you have the time. You will also need to install a mysql driver; in this case, we will use PyMySQL. You can install it using pip or the package manager of your choice.
>>> from sqlalchemy import create_engine >>> engine = create_engine('mysql+pymysql://username:password@hostname:3306/dbname')
Next, with the established connection, we open a table or execute an SQL query with the DataFrame.read_sql() method.
>>> with engine.connect() as conn, conn.begin(): >>> data = pd.read_sql('mytable', conn) >>> print(data) User Age Email 0 Alice 21 alice@foo.com 1 Bob 11 bob@sponge.com 2 Charlie 32 charles@baz.com
We have used ‘mytable’ to select a whole table, but a specific SQL query can be executed as well:
>>> with engine.connect() as conn, conn.begin(): >>> data = pd.read_sql('select * from mytable where Age>18', conn) >>> print(data) User Age Email 0 Alice 21 alice@foo.com 1 Charlie 32 charles@baz.com
Once the data is in the DataFrame, it can be manipulated using all the available methods in Pandas.
>>> data['Domain'] = data.apply(lambda x: x['Email'].split('@')[1],axis=1) >>> print(data) User Age Email Domain 0 Alice 21 alice@foo.com foo.com 1 Bob 11 bob@sponge.com sponge.com 2 Charlie 32 charles@baz.com baz.com
The results can be saved again in a database table with DataFrame.to_sql(). There is a limitation in this functionality (and please correct me if I am wrong): it cannot update existing entries a table. It can only save on a new table, override an existing one or append data.
>>> with engine.connect() as conn, conn.begin(): >>> data.to_sql('mytable_new',conn,index=False)
We have seen the basic functionality for reading and writing SQL databases. With this, we can implement all that SQL queries provide us. Remember that once the results of a query are returned, we won’t have access to the data that is in the table but not in the results (unless we call DataFrame.read_sql() again).
To see all the options, have a look at the documentation of read_sql() and to_sql().