23 Free Data Science Books
Last updated January 17, 2017
As a data scientist at Quora, I often get asked for my advice about becoming a data scientist. To help people exploring the data science career track, I've taken some time to compile my top recommendations of quality data science books that are either available for free (legally, of course) or are Pay What You Want (PWYW) with $0 minimum.
Please bookmark this page and refer to it often! Click on a book cover to go to the free version of that book. I've also provided Amazon links (when applicable) in my descriptions in case you want to check the price of the physical book.
The authors of these books have put in much effort to produce these free resources - please consider supporting them through avenues that the authors provide, such as contributing via PWYW or buying a hard copy [Disclosure: I get a small commission via the Amazon links, and I am co-author of one of these books].
Python and R
The start of your journey is where the resources are the most plentiful. I've listed three books that I recommend: Think Python [Check price on Amazon], R Programming for Data Science, and R for Data Science [Check price on Amazon]. I would highly suggest learning both Python and R to become an effective data scientist, but if you're forcing yourself to choose between Python and R, check out: Which is better for data analysis: R or Python? - Quora
Probability and Statistics
With Think Stats [Check price on Amazon], you'll start off plotting and understanding distributions, and learning about hypothesis testing and regression. Then, you'll move on to Think Bayes [Check price on Amazon], where you'll play with conditional probabilities and priors. Finally, you'll graduate with [ADVANCED] Bayesian Methods for Hackers, where you'll play with more advanced Bayesian algorithms such as multi-armed bandits and MCMC. These three have a heavy emphasis on Python applications.
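To give a taste of what "playing with conditional probabilities and priors" looks like, here is a minimal sketch of a single Bayesian update in plain Python. The two-bowl cookie setup and its numbers are an illustrative assumption (a common textbook-style example), and the helper name `bayes_update` is hypothetical, not taken from any of the books' code.

```python
def bayes_update(priors, likelihoods):
    """Apply Bayes' rule: posterior is proportional to prior * likelihood."""
    # Multiply each hypothesis's prior by the likelihood of the observed data.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Divide by the total so the posterior probabilities sum to 1.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumed setup: bowl 1 holds 30 vanilla and 10 chocolate cookies,
# bowl 2 holds 20 of each. A bowl is picked at random (equal priors).
priors = {"bowl_1": 0.5, "bowl_2": 0.5}

# Observation: the cookie drawn is vanilla.
likelihoods = {"bowl_1": 30 / 40, "bowl_2": 20 / 40}

posterior = bayes_update(priors, likelihoods)
print(posterior)
```

Feeding the posterior back in as the next prior is how the update extends to a sequence of observations.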
Data Analysis Process
What distinguishes a data scientist from a statistician is the ability to deal with all the practical considerations involving datasets. This includes cleaning data, exploring it for insights, and presenting it in a way that's clear and understandable. These three books (available on Leanpub) will help you develop these practical skills. What you learn from them is how to carry out the data analysis process.
The Art of Data Science provides guidance on best practices for handling and analyzing data, which helps you produce useful, interesting, and valid results. The Elements of Data Analytic Style covers practical skills like tidying and checking your data, and presenting and sharing your findings. Exploratory Data Analysis with R gives an overview of tools and best practices in R for accomplishing the steps of the data analysis process.
The first two chapters of Design and Analysis of Experiments [Check price on Amazon] cover most of what you need to know about A/B Testing. The rest is more advanced.
For a survey of the nuances of applying experimental design in practice, check out the 42-page paper Controlled experiments on the web: survey and practical guide, written by practitioners currently on the Microsoft Analysis and Experimentation team.
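For a concrete sense of the kind of calculation behind a simple A/B test, here is a minimal sketch of a two-sided, pooled two-proportion z-test using only the Python standard library. This is one common approach, not necessarily the method used in the book or the paper above; the function name and the conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 200 of 10,000 users convert on A, 250 of 10,000 on B.
z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The normal approximation is reasonable only when both groups are large, and real experiments also need attention to sample-size planning and repeated peeking at results.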
Data Visualization with d3.js
At the end, you'll create visualizations worthy of Mike Bostock himself (maybe)! Check out D3 Tips and Tricks and Interactive Data Visualization [Check price on Amazon].
Statistical Machine Learning
An Introduction to Statistical Learning [Check price on Amazon] is a more approachable and accessible version of the original "The Elements of Statistical Learning". Play around with its applications in R, and check out the richness of the accompanying MOOC.
[ADVANCED] The Elements of Statistical Learning [Check price on Amazon] was the original Statistical Learning textbook, and is highly regarded in the statistics and machine learning community. It should give you a thorough background in statistical learning, although it is noticeably more advanced.
Practical Machine Learning / Data Mining
These three books are by highly respected academics and practitioners, and cover some of the most popular techniques in data mining and machine learning today. The previous section, Statistical Machine Learning, covers machine learning from the perspective of statisticians: creating statistically valid models of the data that can be used for predictions. This section, Practical Machine Learning / Data Mining, deals more with the need to extract information and make predictions from large datasets.
The first book, [ADVANCED] Mining of Massive Datasets [Check price on Amazon], is based on Stanford's eponymous class, and covers popular problems such as recommendation systems, PageRank, and social network analysis. The second book, [ADVANCED] Deep Learning [Check price on Amazon], has draft chapters available for free. The book is written by some of the most well-respected deep learning researchers and is set up to be the canonical reference for deep learning when the book is released. The third book, Machine Learning Yearning, by Andrew Ng (co-founder of Coursera, Chief Scientist at Baidu, and creator of the popular Coursera ML class), is aimed at practical considerations for people developing ML systems. Draft chapters are available for free if you sign up for his mailing list. The book isn't too technical but is best read after you've played around with some ML projects of your own.
Interviews with Real Data Scientists
The Data Science Handbook [Check price on Amazon] and the Data Analytics Handbook are both books that interview leading data scientists, who share stories about their careers, insights from their jobs, and advice for aspiring data scientists. There's almost no overlap between the data scientists interviewed, so check out both! Disclosure: I am a co-author of The Data Science Handbook!
Build Data Science Teams
These books are appropriate for those starting their own data science team, or executives who are investing in building out a data organization. All three of these books have digital versions available for free. Data Driven: Creating a Data Culture [Check price on Amazon] was written by two of the highest-profile data scientists in the US: Hilary Mason and DJ Patil. Understanding the Chief Data Officer is a survey of how large corporations have adopted data science. Building Data Science Teams [Check price on Amazon] was written by DJ Patil, and was one of the earliest books on data science teams (published September 2011).
Is there a free data science book that you really like, but isn't on here? Is there a book here that you really didn't like? I will occasionally update this list and add new books to make sure that this page represents the best free data science books available!
I welcome all comments, feedback, and suggestions! Contact me via the form here.