In a prior entry we described how to load data from a CSV file. In this entry, we will do a quick introduction on how to load the contents of an SQL table into a DataFrame.
With that purpose, we will use the DataFrame.read_sql() method.
The first thing that we need is an active MySQL (or similar) database. We need to establish a connection to the database using SQL Alchemy. SQL Alchemy is a very powerful library for managing databases; definitely get to know it better if you have the time. You will also need to install a mysql driver; in this case, we will use PyMySQL. You can install it using pip or the package manager of your choice.
>>> from sqlalchemy import create_engine >>> engine = create_engine('mysql+pymysql://username:password@hostname:3306/dbname')
Next, with the established connection, we open a table or execute an SQL query with the DataFrame.read_sql() method.
>>> with engine.connect() as conn, conn.begin(): >>> data = pd.read_sql('mytable', conn) >>> print(data) User Age Email 0 Alice 21 firstname.lastname@example.org 1 Bob 11 email@example.com 2 Charlie 32 firstname.lastname@example.org
We have used ‘mytable’ to select a whole table, but a specific SQL query can be executed as well:
>>> with engine.connect() as conn, conn.begin(): >>> data = pd.read_sql('select * from mytable where Age>18', conn) >>> print(data) User Age Email 0 Alice 21 email@example.com 1 Charlie 32 firstname.lastname@example.org
Once the data is in the DataFrame, it can be manipulated using all the available methods in Pandas.
>>> data['Domain'] = data.apply(lambda x: x['Email'].split('@'),axis=1) >>> print(data) User Age Email Domain 0 Alice 21 email@example.com foo.com 1 Bob 11 firstname.lastname@example.org sponge.com 2 Charlie 32 email@example.com baz.com
The results can be saved again in a database table with DataFrame.to_sql(). There is a limitation in this functionality (and please correct me if I am wrong): it cannot update existing entries a table. It can only save on a new table, override an existing one or append data.
>>> with engine.connect() as conn, conn.begin(): >>> data.to_sql('mytable_new',conn,index=False)
We have seen the basic functionality for reading and writing SQL databases. With this, we can implement all that SQL queries provide us. Remember that once the results of a query are returned, we won’t have access to the data that is in the table but not in the results (unless we call DataFrame.read_sql() again).