Book review: Python for Data Analysis

Python for Data Analysis (disclaimer) is written by Wes McKinney, the original author of the excellent Pandas library. I highly recommend this book for anyone who interacts with data. The scope of the book goes well beyond Pandas and covers other essential python data tools such as IPython, Numpy and Matplotlib. Also included are recommendations and best practices for data workflows and interactive analysis. The examples in the book are well thought out and illustrate the point in question without unnecessary complication. As a bonus a diverse group of data sets are used in the examples, which makes for a more interesting read.

One useful function that I had previously overlooked is the apply method of pandas groupby objects. The apply method applies a function to each group in the groupby object, then glues the results together row wise. I like apply because it’s an elegant way to do arbitrary operations to each group of data, replacing cases where I might otherwise have used a loop like the one below:

result_dict = {}
for group_name, group in groupby_object:
    result = some_function(group)
    result_dict[group_name] = result

There are some useful examples of using apply in the pandas documentation. There are even more examples in the Python for Data Analysis book, including applying a regression model to the data in each group.