I’ve been working for a few weeks on a new site focused on becoming a better data scientist through tutorials, tips and news. Check it out here: Data Science Bytes. The recommended books and recommended videos pages may be especially interesting.
The IPython Notebook is an incredible tool that I use almost daily. Notebooks can be exported to HTML for easy sharing using nbconvert; however, I’ve never been happy with the look of the exported notebooks. In the past I’ve used nbviewer to render the .ipynb files and display notebooks online, but I’d prefer to host the notebooks myself, so I set out to make some changes that would let me export good-looking HTML notebooks. The results can be seen in my Introduction to Singular Value Decomposition and NYC School Data Exploration notebooks. See below for how to do it.
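For reference, the basic export step can be sketched in Python as below. This is only a minimal sketch, not the customization described in the post: it assumes the `jupyter nbconvert` command-line entry point is on your PATH (older IPython releases exposed it as `ipython nbconvert`), and the helper names `build_nbconvert_command` and `export_notebook`, as well as any template name you pass in, are hypothetical.

```python
import subprocess


def build_nbconvert_command(notebook_path, template=None, output_dir=None):
    """Return the argument list for exporting one notebook to HTML.

    Only builds the command; nothing is executed here.
    """
    command = ["jupyter", "nbconvert", "--to", "html"]
    if template is not None:
        # Custom templates are how the look of the exported HTML is changed.
        command += ["--template", template]
    if output_dir is not None:
        command += ["--output-dir", output_dir]
    command.append(notebook_path)
    return command


def export_notebook(notebook_path, template=None, output_dir=None):
    """Run the export; raises CalledProcessError if nbconvert fails."""
    command = build_nbconvert_command(notebook_path, template, output_dir)
    subprocess.run(command, check=True)
```

Calling `export_notebook("analysis.ipynb", output_dir="site")` would write the HTML file into `site`, assuming that notebook exists. Keeping the command construction separate from the call that runs it makes it easy to inspect exactly what will be executed.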
I’ve been working heavily with Amazon’s Elastic MapReduce (EMR) lately to run analysis jobs on Hadoop. During development I often have to SSH into the master node of the cluster, and the constant copying and pasting of DNS names or job IDs was starting to get annoying. I wrote this function to automatically log me in to an SSH session with the most recently created active master node and put it in my
A bit of configuration of the elastic-mapreduce CLI is required (see here).
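The same idea can be sketched in Python. Note that this is not the function from the post: it swaps the elastic-mapreduce CLI for the boto3 library, and it assumes boto3 is installed and AWS credentials are already configured. The names `newest_active_cluster` and `ssh_to_newest_master`, and the `key_path` argument, are hypothetical.

```python
import os

# Cluster states treated as "active" (not terminating or terminated).
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def newest_active_cluster(clusters):
    """Pick the most recently created active cluster, or None.

    `clusters` is a list of cluster summaries shaped like the entries
    returned by boto3's EMR `list_clusters` call.
    """
    active = [c for c in clusters if c["Status"]["State"] in ACTIVE_STATES]
    if not active:
        return None
    return max(active, key=lambda c: c["Status"]["Timeline"]["CreationDateTime"])


def ssh_to_newest_master(key_path, user="hadoop"):
    """Replace the current process with an SSH session to the master node."""
    import boto3  # imported lazily so the helper above works without it

    emr = boto3.client("emr")
    # Only the first page of results is read; enough for a handful of clusters.
    clusters = emr.list_clusters(ClusterStates=ACTIVE_STATES)["Clusters"]
    cluster = newest_active_cluster(clusters)
    if cluster is None:
        raise SystemExit("No active EMR clusters found")

    details = emr.describe_cluster(ClusterId=cluster["Id"])["Cluster"]
    dns_name = details.get("MasterPublicDnsName")
    if not dns_name:
        # A cluster that is still starting may not have a public DNS name yet.
        raise SystemExit("Master node has no public DNS name yet")

    os.execvp("ssh", ["ssh", "-i", key_path, f"{user}@{dns_name}"])
```

Separating the selection logic from the AWS calls means the "most recent active cluster" rule can be checked against plain dictionaries without touching a real cluster.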