Machine learning
I selected the resources that are most suitable for beginners, together with the parts of machine learning that I like the most.
- You can start with this introduction to data mining by Saed Sayad (University of Toronto). I found the first diagram particularly interesting.
- This glossary of machine learning terms is the best that I’ve found so far.
- An introduction to machine learning in 10 pictures is a short but great article to start with.
- Xavier Amatriain, one of the minds behind Netflix’s famous recommendation system, explains the advantages of different classification algorithms.
- Don’t miss this list of machine learning podcasts.
- Introduction to Recommender Systems is a 4-hour lecture from the 2014 Machine Learning Summer School at CMU. You can find other interesting machine learning lectures from the same summer school and other programs on Alex Smola’s YouTube channel.
- The Elements of Statistical Learning is a classic book, ideal for understanding the foundations of many machine learning methods.
- Text Mining with WEKA cookbook, for those who prefer Java.
- “Machine Learning Gremlins” is a presentation on common machine learning mistakes by Ben Hamner (Kaggle).
- Because we don’t always need exact answers, this introduction to stream mining by Mikio Braun can be very useful to you.
- If you want a wider vision of artificial intelligence, watch these lectures from the AI course taught at MIT by Patrick Winston.
- The lectures of the course “CS273a: Introduction to Machine Learning” by Prof. Alex Ihler (UCI) are available on YouTube.
- Choosing a machine learning model can be a cumbersome task. That’s why we have automatic machine learning to assist model selection. These slides are a good entry point to it.
- For a good picture of the state of the art in neural networks and deep learning, you can find tutorials and workshops from the NIPS 2014 conference on this YouTube channel. You can also read this summary of the conference by John Platt (Microsoft Research).
Statistics
- A pretty handy resource to explain statistical significance: How to Assess Statistical Significance.
- Top 10 big ideas covered in the Probability course at Harvard by Joe Blitzstein. You can also watch the lectures of this course on YouTube.
- Learn more about errors in hypothesis testing (statistical significance and power) from this lecture on Data Collection and Statistical Inference by Aaron Gullickson.
- What to do when data is missing? Learn what statisticians working in the clinical trials field do.
- Introduction to Time Series Analysis from the book Engineering Statistics.
- This article talks about how to optimize decisions beyond A/B testing, including an introduction to the multi-armed bandit problem and the epsilon-greedy strategy.
- Jeff Rajeck has a series of posts titled “using data science with A/B tests”. I particularly enjoyed the one covering Bayesian analysis.
- Brian Caffo is one of the lecturers of the Data Science specialization on Coursera and his YouTube channel is full of resources to learn statistics.
- Some statistical concepts that data scientists usually overlook, presented by Chris Fonnesbeck at SciPy 2015.
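The epsilon-greedy strategy mentioned in the item on optimizing decisions beyond A/B testing can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used in the linked article: the function name `epsilon_greedy`, the simulated conversion rates and the default `epsilon=0.1` are all hypothetical choices made for the example. With probability `epsilon` the strategy explores a random variant; otherwise it exploits the variant with the best estimated rate so far.

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, n_rounds=10_000, seed=0):
    """Simulate an epsilon-greedy bandit over Bernoulli arms.

    true_rates are made-up conversion rates used only to simulate rewards;
    in a real experiment they are unknown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms        # how many times each arm was shown
    estimates = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated rate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated Bernoulli reward (1 = conversion, 0 = no conversion).
        reward = 1 if rng.random() < true_rates[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


# Three hypothetical page variants with assumed conversion rates.
pulls, estimates, total_reward = epsilon_greedy([0.04, 0.05, 0.10])
print(pulls, [round(e, 3) for e in estimates], total_reward)
```

Unlike a classic A/B test, which splits traffic evenly until the test ends, this approach shifts traffic toward the better-performing variant while the experiment is still running, at the cost of noisier estimates for the variants it shows less often.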
Python
Once you are familiar with Python, the following resources for machine learning and data analysis can take your skills to the next level:
- Video tutorials to learn how to use Python’s scikit-learn library to perform machine learning by Kevin Markham.
- 3h+ in-depth introduction to machine learning with scikit-learn by Kyle Kastner (Université de Montréal) and Andreas Mueller (NYU Center for Data Science).
- Machine learning cheat sheet for scikit-learn by Andreas Mueller.
- If you are interested in using neural networks in Python, Daniel Nouri explains how to solve the Facial Keypoint Detection Kaggle challenge using L....
- If you don’t have a technical background, you’ll find the scripts in Practical Business Python very useful.
- Notebook Gallery: links to the best IPython and Jupyter notebooks submitted by users.
- Recipes of the IPython Cookbook include excellent examples of how to use NumPy, scikit-learn and many other packages.
- Code snippets of some of the most common operations with Pandas.
- Make your first machine learning predictions using Python with this Kaggle tutorial.
- NLTK is the most popular library for natural language processing in Python. This presentation gives a good overview of what you can do with it, and this 1-hour tutorial walks you through using it.
- PyDataTV is the YouTube channel of the PyData conferences. You can find keynotes, talks and workshops on how to use the PyData stack.
R
I’ve been trying hard to like R. In fact, I’ve been trying for more than 5 years, and I simply prefer Python. In any case, I still frequently launch an R prompt to use some of the fantastic packages that R has.
- Intro to R is a playlist by Google Developers that explains all the basics of the language.
- Kaggle top ranker Xavier Conort listed 10 R Packages to win Kaggle competitions. That’s a good way to discover some very prominent R packages.
- An Introduction to Statistical Learning with Applications in R is a terrific free book full of examples.
- “R: the good parts” is an article by Jose Quesada (Data Science Retreat) that lists the main advantages of R with links to other good resources.
- Archetypal analysis is not usually taught in introductory machine learning courses. This post explains how to apply it and shows that it outperforms k-means in a number of cases. Plus, archetypal analysis is easier to interpret.
- AnomalyDetection and BreakoutDetection: open source R packages for time-series analysis by Twitter.
- qdap is not only one of the best packages for natural language processing in R, but also one of the best documented. Use the vignette to get started and move on to the manual later.
- I know from my own experience that R’s memory limitations can give you a headache. These tricks are sometimes an effective painkiller. The slides “Taking R to the Limit: Large Datasets” might also help.
- statsTeachR is a repository of lessons for teaching statistics using R.
- Make your first machine learning predictions using R with any of these four tutorials.
Applying data science to your organization
To finish, some examples of how data science and machine learning can be used to add value to your organization:
- Jeff Leek (Johns Hopkins University) shared some interesting lessons in his post 10 things statistics taught us about big data analysis.
- What can data science do for entrepreneurs? Growth, retention, product customization and marketing optimization.
- How to start data science initiatives in a lean and cost-effective way
- This paper explains how Booking.com uses crowdsourced data and machine learning to suggest the destination of the next trip.
- Someone asked how Quora uses machine learning and answers are very representative of how a website can benefit from using it.
- Airbnb guest requests are 4% more likely to be accepted since the company started using collaborative filtering to predict hosts’ behavior. They also used data to understand what their users want and show more relevant results to ....
- This paper describes how to do customer segmentation for customer retention using decision trees.
- How Spotify uses deep learning to recommend music is well documented in this post.
- How Google transcribed house numbers from Street View using neural networks.
- Predicting consumer credit-risk performance at the beginning of the... (Paper).