Effective Python (disclaimer) contains 59 mini-lessons on Python best practices. This book is not for programming novices, but it is great for advancing your Python skills or as a reference if you're moving to Python from another language.
I see Effective Python as a complement to Python for Data Analysis: the latter covers scientific and statistical uses where the alternatives include MATLAB or R, and the former covers general software problems where Python takes the place of Java or Ruby. Of course, those two fields are not mutually exclusive; after reading Effective Python I have a better understanding of what's going on (or should be going on) behind the scenes of the Python libraries I use.
The Linux Command Line: A Complete Introduction lives up to its name and covers a wide range of command line topics (disclaimer). Many subtleties of how the bash shell behaves are clearer to me after reading this book. I didn't learn as much as I did from How Linux Works (review), but I wish I'd read The Linux Command Line when I first started working with Linux systems.
The last part of the book deals with writing shell scripts and covers more about shell scripting than I ever wanted to know. Reading and writing nontrivial shell scripts is sometimes unavoidable, so overall it was good to pick up a few useful tips and tricks along with a better general understanding of bash programming.
Black Hat Python: Python Programming for Hackers and Pentesters is worth a quick read to learn a bit about avenues of attack on networks and web services (disclaimer). The second half of the book is largely devoted to attacks on Windows machines, which is less useful to me professionally but still interesting.
As a data scientist I sometimes expose data through web interfaces, and it's important to understand how a malicious user might try to exploit a system to access its data or take down the service.
I’ve been working for a few weeks on a new site focused on becoming a better data scientist through tutorials, tips and news. Check it out here: Data Science Bytes. The recommended books and recommended videos pages may be especially interesting.
Think Bayes is a rare book that isn't specifically about Python or a Python library but still contains excellent, well-organized Python code (disclaimer). Think Bayes covers using Bayesian statistics to calculate probabilities and better understand real problems, drawing on a range of real data sets. I definitely recommend that any scientist (and especially any data scientist) read this book; it's available for free here.
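Bayes' theorem is the engine behind everything in the book. As a minimal sketch in plain Python (a hand-rolled helper named `bayes_update`, not the book's own classes), here is an update for the classic cookie problem: bowl 1 holds 30 vanilla and 10 chocolate cookies, bowl 2 holds 20 of each, and a vanilla cookie is drawn from a bowl chosen at random. Exact fractions keep the arithmetic easy to check.

```python
from fractions import Fraction


def bayes_update(priors, likelihoods):
    """Return posterior probabilities for each hypothesis.

    priors: dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(data | hypothesis)
    """
    # Multiply each prior by the likelihood of the observed data.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # The normalizing constant is the total probability of the data.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Cookie problem: each bowl is equally likely a priori,
# and we observe a vanilla cookie.
priors = {"bowl 1": Fraction(1, 2), "bowl 2": Fraction(1, 2)}
likelihoods = {"bowl 1": Fraction(30, 40), "bowl 2": Fraction(20, 40)}

posterior = bayes_update(priors, likelihoods)
print(posterior["bowl 1"])  # 3/5
```

Seeing a vanilla cookie raises the probability of bowl 1 from 1/2 to 3/5, because vanilla is more common there.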
Last May, when I had a week between graduation and starting work, I spent some time creating a LaTeX version of my resume. I couldn't find a template that suited my needs right out of the box, but I found a two-column layout that came close. I had to make a fair number of changes to the template to be happy with the result, including more customization options for the personal info box and an option to put bulleted lists under each job. The underlying LaTeX template and example document are up on GitHub, so you can clone the project and fill in your own details.
You can also edit the LaTeX directly at writelatex.com, a great website that provides online LaTeX editing and compilation. Here’s a link to the template as an interactive example.
The IPython Notebook is an incredible tool that I use almost daily. Notebooks can be exported to HTML for easy sharing using nbconvert; however, I've never been happy with the look of the exported notebooks. In the past I've used nbviewer to render .ipynb files and display notebooks online, but I'd prefer to host the notebooks myself, so I set out to make some changes that let me export HTML notebooks that look good. The results can be seen in my Introduction to Singular Value Decomposition and NYC School Data Exploration notebooks. See below for how to do it.
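There are several ways to restyle nbconvert's HTML output. One simple option, sketched here as an assumption rather than as how the notebooks linked above were produced, is to post-process each exported file and link a custom stylesheet just before `</head>`. The function names and the `custom.css` filename are hypothetical placeholders.

```python
from pathlib import Path


def add_stylesheet(html, css_href):
    """Return html with a <link> to css_href inserted just before </head>."""
    link = f'<link rel="stylesheet" href="{css_href}">\n'
    idx = html.find("</head>")
    if idx == -1:
        # No </head> tag: leave the page unchanged rather than guess
        # where the link belongs.
        return html
    return html[:idx] + link + html[idx:]


def restyle_file(path, css_href="custom.css"):
    """Rewrite an exported notebook HTML file in place with the extra stylesheet."""
    page = Path(path)
    html = page.read_text(encoding="utf-8")
    page.write_text(add_stylesheet(html, css_href), encoding="utf-8")
```

This would run after an export such as `jupyter nbconvert --to html notebook.ipynb` (older IPython releases used `ipython nbconvert`), for example `restyle_file("notebook.html")`. Because the stylesheet is linked last in the head, its rules can override the defaults that ship with the export.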
Along with the interactive histogram for the MPG tracking page, I also added some interactivity to the miles over time graph. Below are the code and data that produce the above plot. The code for this example and the entire MPG project is available on GitHub.
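The plot is drawn in the browser, but the series behind a miles over time graph is just a running total. As a hedged Python sketch of that data step (the `(date, trip_miles)` record format and the sample values are assumptions, not the MPG project's actual data), the points could be computed like this:

```python
from datetime import date
from itertools import accumulate


def cumulative_miles(fillups):
    """Convert (date, trip_miles) records into (date, total_miles) points.

    Records are sorted by date first, so out-of-order entries still
    produce a chronological running total.
    """
    ordered = sorted(fillups, key=lambda record: record[0])
    dates = [day for day, _ in ordered]
    totals = accumulate(miles for _, miles in ordered)
    return list(zip(dates, totals))


# Hypothetical fill-up records, for illustration only.
records = [(date(2014, 1, 20), 250.0), (date(2014, 1, 5), 300.5)]
for day, total in cumulative_miles(records):
    print(day.isoformat(), total)
```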
I recently worked on some updates to the MPG tracking page I set up in January. One of my goals was to make the graphs on the page respond to mouseover events by displaying more data. Here's some sample code for a simplified version of the histogram that now appears on the MPG page; the data is below the code. The code for this example and the entire MPG project is available on GitHub. See also the interactive miles over time graph.
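The mouseover itself is handled in the browser; this Python sketch covers only the data side of such a histogram. It bins MPG values into fixed-width bars and builds the text a tooltip might show for each bar. The 2 MPG bin width, the tooltip wording, and the sample values are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def histogram_bins(values, bin_width=2.0):
    """Group values into fixed-width bins for a histogram.

    Returns (bin_start, bin_end, count) tuples in ascending order, including
    empty bins between the smallest and largest occupied bins.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    if not counts:
        return []
    return [
        (k * bin_width, (k + 1) * bin_width, counts[k])
        for k in range(min(counts), max(counts) + 1)
    ]


def tooltip(bin_start, bin_end, count):
    """Hypothetical text to show when the mouse is over one bar."""
    return f"{bin_start:.0f}-{bin_end:.0f} MPG: {count} fill-up(s)"


# Hypothetical MPG readings, for illustration only.
for bar in histogram_bins([28.1, 29.9, 30.2, 33.5]):
    print(tooltip(*bar))
```

Keeping empty bins in the output means every bar position exists, so a zero-count bar can still report its range on mouseover.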
I've been working heavily with Amazon's Elastic MapReduce (EMR) lately to run analysis jobs on Hadoop. During development I often have to ssh into the master node of the cluster, and constantly copying and pasting DNS names or job IDs was starting to get annoying. I wrote this function to automatically log me in to an ssh session on the most recently created active master node and put it in my
A bit of configuration of the elastic-mapreduce CLI is required (see here).
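The function above relies on the elastic-mapreduce CLI. As an alternative sketch that swaps in the boto3 library instead of that CLI, the same lookup could be done in Python: find the newest cluster in the RUNNING or WAITING state and fetch its master node's public DNS name. Both function names are hypothetical, and AWS credentials are assumed to be configured already.

```python
def latest_active_cluster(clusters):
    """Return the most recently created cluster summary, or None if empty.

    Each item is expected to look like an entry in the "Clusters" list
    returned by boto3's EMR list_clusters call.
    """
    if not clusters:
        return None
    return max(clusters, key=lambda c: c["Status"]["Timeline"]["CreationDateTime"])


def latest_master_dns(region_name=None):
    """Look up the public DNS name of the newest active cluster's master node."""
    import boto3  # imported here so latest_active_cluster works without boto3

    emr = boto3.client("emr", region_name=region_name)
    # Walk every page of results, since list_clusters returns clusters in batches.
    paginator = emr.get_paginator("list_clusters")
    clusters = []
    for page in paginator.paginate(ClusterStates=["RUNNING", "WAITING"]):
        clusters.extend(page["Clusters"])

    cluster = latest_active_cluster(clusters)
    if cluster is None:
        return None
    detail = emr.describe_cluster(ClusterId=cluster["Id"])
    return detail["Cluster"].get("MasterPublicDnsName")
```

The returned name can then be passed to ssh, e.g. `ssh hadoop@<master DNS name>`. EMR master nodes accept logins as the `hadoop` user, and the key-file option depends on your setup. Keeping the selection logic in a separate pure function makes it testable without touching AWS.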