ggplot in a loop (IPython notebook)




Tuesday, February 3, 2015
I enjoy using ggplot for visualization in the IPython notebook environment. I find it more flexible than other plotting modules in Python. Moreover, it is fairly similar to ggplot in R, so I can reuse what I already know from there.

One issue with using ggplot in the IPython notebook, though, is that the plot needs to be the last statement in the cell. So what do you do if you want to plot in a loop? Unfortunately, my quick search online turned up nothing. In R, we just call the print function inside the loop to draw a ggplot, but this does not work in IPython. The following is my solution to this problem.

Let's go through an example. Assume we have the following data about personal income in the United States, which is available in the "Census Income" dataset.
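import pandas as pd
# The file has no header row (header=None) and 15 raw columns; usecols keeps the 9 used here.
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None, usecols=[0, 3, 5, 7, 8, 9, 12, 13, 14], skipinitialspace=True)
# Name the selected columns by hand, in file order.
df.columns = ['age', 'education', 'marital status', 'relationship', 'race', 'gender', 'hours per week', 'native country', 'income']
df.head(11)

The column names are assigned by hand because the CSV does not have a header row. The first 11 entries are shown below: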


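As a side note on the missing header, here is a minimal sketch of what pandas does with a headerless file. The two records and the three column names are made up for illustration (they are not rows copied from the dataset); the only assumption carried over from the real file is that fields are separated by a comma followed by a space.

```python
import io
import pandas as pd

# Two made-up records in a headerless, ", "-separated layout (illustrative only).
raw = "39, Bachelors, Male\n50, Masters, Female\n"

# Default behaviour: the first record is consumed as the header row.
df_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data; skipinitialspace strips the blank
# that follows each comma so values read 'Male' rather than ' Male'.
df_fixed = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True,
                       names=['age', 'education', 'gender'])

print(len(df_default))            # one data row survives
print(len(df_fixed))              # both records survive
print(list(df_fixed['gender']))
```

Without header=None the first record silently disappears into the column labels, so it is worth checking the row count after loading.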
Now assume that we want to plot the age density for each gender. To do that, we put the ggplot objects in a list and make that list the last line in the cell.
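from ggplot import ggplot, aes, geom_density
gender = df['gender'].unique()
p = []  # collects one plot per gender
for g in gender:
    # keep only the rows for the current gender
    df_slice = df[df['gender'] == g]
    p.append(ggplot(aes(x='age'), data=df_slice) + geom_density())
p  # the list must be the last line in the cell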

The code above generates the following two plots:


P.S. Another way of plotting multiple figures is to use facet_grid. However, I prefer the approach above because it creates normal-size plots.
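A likely explanation for why the list trick works, offered as an assumption rather than a confirmed detail of the library: the notebook shows the repr of a cell's last expression, the repr of a list calls repr on each of its elements, and the ggplot object appears to draw itself as a side effect of its repr. The sketch below imitates this with a toy class. FakePlot and the rendered list are hypothetical stand-ins and are not part of ggplot.

```python
# Toy stand-in for a plot object; FakePlot is hypothetical, not part of ggplot.
rendered = []  # records which "plots" have been drawn, in order


class FakePlot:
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        # Assumed behaviour: drawing happens as a side effect of repr().
        rendered.append(self.label)
        return "<FakePlot: %s>" % self.label


plots = []
for label in ["Male", "Female"]:
    plots.append(FakePlot(label))  # building the list does not draw anything

drawn_in_loop = list(rendered)     # still empty at this point

# Roughly what happens when the list is the last expression in a cell.
text = repr(plots)
print(text)
print(rendered)
```

Nothing is drawn while the loop runs; every element is drawn once when the list itself is displayed, which matches what the plotting cell above does.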



Copyright © 2015 • Ensemble Blogging