22 Free Data Science Books

Last updated April 13, 2021

To help people exploring the data science career track, I've compiled my top recommendations of quality data science books that are available for free courtesy of the author(s). I include the year each book was last updated in parentheses.

If you enjoy a book, there are usually options to purchase a hard copy to support the authors.

During COVID-19, Springer has also released a large number of its textbooks for free.


Python and R

The start of your journey is where the resources are the most plentiful. I’d recommend these books to get started with Python for data science. The Python books in particular are closer to dynamic web resources than “books”, and I think this format actually works better than a traditional ebook for learning Python.

  • Python for Everybody (2016) is a gentle introduction to Python for beginners, complete with videos and course materials. Available under a Creative Commons license.

  • Python 101 (2019) is an online book that starts with Python’s basics but ramps up to more advanced topics.

  • Python Data Science Handbook (2016) is available on GitHub for free, and includes both the text and accompanying Jupyter notebooks. The textbook walks you through the standard data science operations in Python, including using a notebook, manipulating data, visualizing data, and building some common models. Available under a Creative Commons license.

  • Pandas Cookbook (2020) is a hands-on introduction to Pandas that focuses on common tasks in manipulating, exploring, and cleaning data. Available under a Creative Commons license.

I’d recommend these books to get started with R:

If you had to pick one between Python and R, Python is generally more suitable for environments that involve eventual productionization into software, whereas R is more streamlined and has better packages and tooling for pure data analysis and exploration. Many comparisons between the two are available on Quora under “Which is better for data analysis: R or Python?”


Probability, Statistics, and Bayesian Methods

  • Introduction to Probability, 2nd Edition (2019). This is the official textbook of Harvard’s Stat 110, which I had the honor of being a teaching fellow for. The second edition is available for free at probabilitybook.net. Also check out the companion Probability Cheatsheet.

  • OpenIntro Statistics, 4th Edition (2019). This is a high-quality, full-length textbook available on a pay-what-you-want (PWYW) basis. It covers statistics from the basics through more advanced topics (like power calculations).

  • Think Stats, 2nd Edition (2014). Here, you'll start off plotting and understanding distributions, and learning about hypothesis testing and regression. This book contains Python applications.

  • Think Bayes (2012). Here, you'll play with conditional probabilities and priors. This book contains Python applications.

  • Bayesian Methods for Hackers (2020). You will play with more advanced Bayesian algorithms such as multi-armed bandits and MCMC. This book contains Python applications.
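To give a flavor of the "conditional probabilities and priors" these Bayesian books start with, here is a minimal sketch of a single Bayes' rule update in plain Python. The function name `bayes_update`, the two-bowl scenario, and all the numbers are illustrative assumptions, not taken from any of the books above.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses via Bayes' rule.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(data | hypothesis)
    """
    # Multiply prior by likelihood for each hypothesis
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Normalize so the posterior sums to 1
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical setup: two bowls of cookies, picked with equal probability.
# Bowl 1 holds 30 vanilla out of 40 cookies; bowl 2 holds 20 out of 40.
prior = {"bowl_1": 0.5, "bowl_2": 0.5}
likelihood_vanilla = {"bowl_1": 30 / 40, "bowl_2": 20 / 40}

# We drew a vanilla cookie: how likely is it that it came from bowl 1?
posterior = bayes_update(prior, likelihood_vanilla)
print(posterior)
```

The same function can be applied repeatedly, feeding each posterior back in as the next prior, which is the basic loop that the more advanced methods build on.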


Experimental Design

The first two chapters of Design and Analysis of Experiments (2010) cover most of what you need to know about A/B Testing. The rest is more advanced.

For a survey of the nuances of applying experimental design in practice, check out the 42-page paper Controlled experiments on the web: survey and practical guide (2008), written by practitioners formerly/currently on the Microsoft Analysis and Experimentation team. It’s an old paper but still a great overview.
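As a taste of the kind of analysis behind a basic A/B test, here is a minimal sketch of a two-proportion z-test using only the Python standard library. It is one common way to compare conversion rates, not necessarily the approach either reference recommends; the function name and the sample counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in groups A and B
    n_a, n_b:       number of users in groups A and B
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 200/1000 conversions in control, 250/1000 in treatment
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

This relies on the normal approximation, so it is only reasonable when both groups are fairly large; the practical pitfalls beyond the arithmetic are what the paper above discusses.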


Statistical Inference and Learning

These three books are written by some of the most respected academics in the statistical learning space.

  • Computer-age Statistical Inference (2016) is by reputable Statistics professors Bradley Efron and Trevor Hastie. It covers various topics in statistical inference that are relevant in this data-science era, with scalable techniques applicable to large datasets.

  • An Introduction to Statistical Learning (2017) is a more approachable and accessible version of the original "The Elements of Statistical Learning". Play around with its applications in R, and check out the accompanying MOOC.

  • The Elements of Statistical Learning (2017) was the original Statistical Learning textbook, and is highly regarded in the statistics and machine learning community. It should give you a thorough background in statistical learning, although it is noticeably more advanced.


Machine Learning, Deep Learning, and Data Mining

These three books are by highly respected academics and practitioners, and cover some of the most popular techniques in data mining and machine learning today. The previous section, Statistical Inference and Learning, covers machine learning from the perspective of statisticians: creating statistically valid models of the data that can be used for predictions. This section, on practical machine learning and data mining, deals more with the need to extract information and make predictions from large datasets.

  • Mining of Massive Datasets, 3rd Edition (2020) is based on Stanford's class of the same name, and covers popular problems such as recommendation systems, PageRank, and social network analysis. Learn more about the book and the class at http://www.mmds.org/.

  • Machine Learning Yearning (2018), by Andrew Ng, is aimed at practical considerations for people developing ML systems. The book isn't too technical but is best read after you've played around with some ML projects of your own.

  • Deep Learning (2016) is written by some of the pioneers of the field, but will get quite heavy on the math. An HTML version of the book is available for free from their website.
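For a sense of the kind of problem mentioned above, PageRank can be sketched in a few lines of plain Python using power iteration. This is a toy version for small in-memory graphs, not the scalable formulation a massive-datasets course would teach. The function name, the damping factor of 0.85, and the three-page graph are illustrative assumptions, and every link target is assumed to also appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
           Every target is assumed to also be a key of this dict.
    """
    nodes = list(links)
    n = len(nodes)
    # Start from a uniform distribution over pages
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the "random jump" share up front
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly across its outlinks
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank over all pages
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)
```

In this toy graph, page c collects links from both a and b, so it ends up ranked above b, which only receives half of a's rank.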


Practicing Data Science


BONUS: Data Science for Children