In this entry, I will briefly introduce the Series. It is the simplest data structure provided by Pandas, and it can be thought of as a column in an Excel spreadsheet.
A Series object has two main components: an index and the data. Both components are one-dimensional arrays of the same length. The index is used to access individual data values, so its elements should normally be unique (Pandas does not enforce this, but duplicate labels make lookups ambiguous).
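A minimal sketch, with made-up labels and values, of how the two components can be inspected separately (the constructor call itself is explained just below):

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
s_demo = pd.Series(index=['x', 'y', 'z'], data=[10, 20, 30])

index_labels = s_demo.index.tolist()   # the index component
data_values = s_demo.values.tolist()   # the data component

print(index_labels)               # ['x', 'y', 'z']
print(data_values)                # [10, 20, 30]
print(s_demo.index.is_unique)     # True: every label appears only once
print(s_demo['y'])                # 20, looked up through its label
```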
To create a Series, the constructor is called:
>>> import pandas as pd
>>> idx = range(5)  # one label per value: index and data must have the same length
>>> vals = ['a','b','c','d','e']
>>> s = pd.Series(index=idx, data=vals)
>>> print(s[1])
b
In this example, a Series containing characters and indexed by integers has been created, but the opposite (numbers indexed by characters) works just as well. Series can also be used to represent mathematical functions:
>>> import numpy as np
>>> x = np.arange(-10,10,1)  # Create a range from -10 to 9 counting one by one
>>> s = pd.Series(index=x, data=x**2)  # x**2 squares each element of x
>>> print(s)
-10    100
-9      81
-8      64
-7      49
-6      36
-5      25
-4      16
-3       9
-2       4
-1       1
 0       0
 1       1
 2       4
 3       9
 4      16
 5      25
 6      36
 7      49
 8      64
 9      81
dtype: int32
In this example, the NumPy library has been used, but I have tried to keep it as simple as possible. If you are starting with Python for science or data analysis, be sure to check it out!
We can access individual elements, or a range of them, in the indexed array using Series.loc[]:
>>> s.loc[6]
36
>>> s.loc[-3:3]
-3    9
-2    4
-1    1
 0    0
 1    1
 2    4
 3    9
dtype: int32
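One detail worth noting in the output above is that the slice includes both of its endpoints, unlike ordinary Python slicing. The following sketch rebuilds the same Series and checks this; the list-of-labels selection at the end is an extra illustration that is not part of the original example.

```python
import numpy as np
import pandas as pd

# Same construction as in the entry: the squares of -10..9
x = np.arange(-10, 10, 1)
s = pd.Series(index=x, data=x**2)

# .loc slices by label and includes BOTH endpoints
window = s.loc[-3:3]
print(len(window))        # 7 labels: -3, -2, -1, 0, 1, 2, 3

# A list of labels selects exactly those entries, in the given order
picked = s.loc[[-2, 0, 5]]
print(picked.tolist())    # [4, 0, 25]
```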
There are many other ways of accessing data, but I will introduce only Series.loc[] for simplicity. Additional ways are described in the Pandas indexing documentation.
Now we will explore the really fun part. Data can be selected with conditional indexing, which keeps only the values that meet a certain condition:
>>> sBigger50 = s.loc[s>50]
>>> print(sBigger50)
-10    100
-9      81
-8      64
 8      64
 9      81
dtype: int32
We can also check which of the original values meet the condition:
>>> s>50
-10     True
-9      True
-8      True
-7     False
-6     False
-5     False
-4     False
-3     False
-2     False
-1     False
 0     False
 1     False
 2     False
 3     False
 4     False
 5     False
 6     False
 7     False
 8      True
 9      True
dtype: bool
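The boolean Series returned by a comparison can be used on its own, too. As a small sketch (the second condition is a made-up example, not taken from the entry), it can be summed to count the matches, or combined with another comparison using & or |:

```python
import numpy as np
import pandas as pd

# Same construction as in the entry: the squares of -10..9
x = np.arange(-10, 10, 1)
s = pd.Series(index=x, data=x**2)

mask = s > 50                  # boolean Series with the same index as s
n_matches = int(mask.sum())    # True counts as 1, so the sum is the number of matches
print(n_matches)               # 5

# Combine conditions with & (and) or | (or); the parentheses are required
between = s.loc[(s > 10) & (s < 50)]
print(between.index.tolist())  # [-7, -6, -5, -4, 4, 5, 6, 7]
```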
Finally, we will see how to apply a function over a Series. This functionality is key for data analysis, and a whole entry could be dedicated to it. First of all, we need a function to apply to the data. Lambda functions may come in handy here, but for simplicity I will use a predefined one: numpy.sqrt(), which calculates the square root.
>>> s2 = s.apply(np.sqrt)
>>> print(s2)
-10    10.0
-9      9.0
-8      8.0
-7      7.0
-6      6.0
-5      5.0
-4      4.0
-3      3.0
-2      2.0
-1      1.0
 0      0.0
 1      1.0
 2      2.0
 3      3.0
 4      4.0
 5      5.0
 6      6.0
 7      7.0
 8      8.0
 9      9.0
dtype: float64
Series.apply() takes the function passed as a parameter (in this case numpy.sqrt()), applies it element by element, and returns the result as a new Series with an identical index.
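For completeness, here is a minimal sketch of the lambda variant mentioned earlier. The halving function is a hypothetical transformation chosen only for illustration; the checks confirm that the result keeps the same index and that the original Series is left untouched.

```python
import numpy as np
import pandas as pd

# Same construction as in the entry: the squares of -10..9
x = np.arange(-10, 10, 1)
s = pd.Series(index=x, data=x**2)

# Hypothetical transformation, for illustration only: halve each value
halved = s.apply(lambda v: v / 2)

print(halved.loc[4])                   # 8.0
print(halved.index.equals(s.index))    # True: the index is preserved
print(s.loc[4])                        # 16: s itself is not modified
```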
In the next entry, I will introduce DataFrames. Of course, this is a very light introduction based on what I use on a daily basis, and there is much more to read in the Pandas tutorial and the Series documentation.