Speed Reading On Machine Learning

Manav Sehgal
7 min read · Apr 6, 2017


Mastering Machine Learning on your own requires a lot of hard work. However, if you are an impatient learner like me, you want to dive in head-first, do some objective-based learning, get fundamental concepts explained quickly but comprehensively, explore complex topics interactively, try things on your own, and then go deep into topics of interest.

With this objective, I am compiling Speed Reading On Machine Learning: a list of introductory, intermediate, and advanced reading (and viewing) materials arranged along a roughly linear path for developing your understanding of the subject. The reading sources have made it to the list based on their credentials, which I mention alongside each one.

Here is the suggested learning path.

  1. Get Excited. Dive head-first to visually explore typical machine learning problems and solutions using interactive demos.
  2. Build Curiosity. Establish your machine learning objectives for the topics you want to learn. Learn by watching how the best data scientists and experts solve ML competitions.
  3. Formal Fundamentals. These are like Newton’s Three Laws: the heuristics and taxonomy of Machine Learning.
  4. Maths and Stats. Learn the prerequisites for an under-the-hood understanding of how the algorithms really work.
  5. Popular Algorithms. There are many algorithms used in ML. Deep-dive into a select few that are most popular among data scientists. Also, learn how to select the right algorithm for the problem at hand.
  6. Programming ML. Learn about programming languages, frameworks, and APIs specific to ML.
  7. Tools Aiding Active Learning. Many modern data science and ML tools actually aid active learning by doing.

Visual explorations in machine learning

These are some of the best interactive, browser-based introductions to machine learning, sequenced in suggested order of learning and increasing complexity.

Classifying or predicting based on a dataset comprising house prices (and other features like area and location) is one of the typical “Hello World” machine learning problems. Here is one of the best animated visualizations explaining how machine learning can automatically classify houses based on certain features.
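The same “Hello World” problem can be reproduced in a few lines of code. This is a toy sketch, not the demo's actual data: the feature values and city labels below are made up for illustration, with elevation and price per square foot standing in for the demo's house features.

```python
from sklearn.tree import DecisionTreeClassifier

# features: [elevation_m, price_per_sqft]; labels are the city a house is in
X = [[10, 1500], [15, 1400], [60, 1100], [70, 1000],
     [5, 1600], [80, 900], [12, 1550], [65, 950]]
y = ["NY", "NY", "SF", "SF", "NY", "SF", "NY", "SF"]

# A shallow decision tree is enough to separate this toy data.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

predictions = model.predict([[8, 1580], [75, 980]])
print(predictions)
```

The fitted tree learns thresholds on the features, which is exactly what the animated demo illustrates step by step.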

The MNIST dataset is a fundamental machine learning training dataset of handwritten digits, used for image recognition problems. The ConvNetJS demo is a browser-based visual simulation of how the algorithm processes the MNIST dataset.
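To try digit recognition yourself without downloading the full MNIST files, scikit-learn ships a smaller 8x8 digits dataset that works as a stand-in. The sketch below trains a simple linear classifier on it; this is not the convolutional network from the ConvNetJS demo, just the simplest baseline for the same task.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # 1,797 8x8 grayscale digit images, a small MNIST stand-in
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# A linear model as a baseline; a ConvNet would do better on full MNIST.
clf = LogisticRegression(max_iter=2000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```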

Image recognition, and more specifically face recognition, is one of the “magical” aspects of machine learning that has made its way into day-to-day products (your smartphone camera) and tools (Facebook and Google images).

The tutorial on Face Recognition with Deep Learning is a visual introduction explaining the workings of algorithms used with animated diagrams.

Competitions in data science and machine learning

Kaggle is the leading community for data science and machine learning competitions. Data scientists and ML experts compete with each other to solve challenges including disaster management, detecting cancer, helping the environment, and predicting house prices.

I have authored a popular (26,000+ views, 1,000+ forks as of April 2017) interactive tutorial and Python notebook walking through the data science workflow typically followed when solving Kaggle competitions. The comprehensive tutorial covers most of the seven workflow stages.

  1. Question or problem definition.
  2. Acquire training and testing data.
  3. Wrangle, prepare, cleanse the data.
  4. Analyze, identify patterns, and explore the data.
  5. Model, predict and solve the problem.
  6. Visualize, report, and present the problem-solving steps and final solution.
  7. Supply or submit the results.
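The seven stages above can be sketched end-to-end in a few lines. This is a minimal illustration, not the actual tutorial notebook: the dataset below is synthetic, with a deliberately simple hidden rule so the whole pipeline runs anywhere.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Problem definition: predict a binary `survived` label from two features.
rng = np.random.default_rng(0)

# 2. Acquire data (a synthetic stand-in for the competition's CSV files).
df = pd.DataFrame({
    "age": rng.integers(1, 80, 300),
    "fare": rng.uniform(5, 500, 300),
})
df["survived"] = (df["fare"] > 200).astype(int)  # hidden rule for the toy data

# 3. Wrangle/cleanse: nothing to fix here, but this is where you would impute
#    missing values and encode categorical features.

# 4. Analyze: a quick look at how the label relates to a feature.
print(df.groupby("survived")["fare"].mean())

# 5. Model and predict.
X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "fare"]], df["survived"], random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 6. Report the validation result.
score = model.score(X_test, y_test)
print(f"validation accuracy: {score:.2f}")

# 7. Submit: write predictions to a file, e.g.
#    pd.DataFrame({"id": ..., "survived": ...}).to_csv("submission.csv")
```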

Andrew Ng on machine learning

Andrew Ng is VP & Chief Scientist of Baidu, Co-Chairman and Co-Founder of Coursera, and an Adjunct Professor at Stanford University. He founded and led the “Google Brain” project, which developed massive-scale deep learning algorithms. Andrew Ng’s Stanford course lectures are available on YouTube.

The YouTube playlist contains more than 22 hours of recorded Machine Learning classes as taught to Stanford students. The content is also available as a Coursera course on machine learning. Lecture notes from the Stanford CS229 class are available as PDF files on Stanford’s website.

Maths and Stats

Before jumping into algorithms, or even prior to learning the fundamentals of Machine Learning (the prior section), you may need to brush up on some Maths and Stats skills.

The Graphical Algebra is an interesting presentation that interactively and visually explains mathematical concepts including quadratic equations, log functions, affine transformations, and Fourier transforms.

For instance you can see Fourier transforms of your voice visualized in real-time as you “speak” to your laptop’s mic! Or, listen to some cool music while it is visualized as a Fourier Transform.
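What the demo does with your voice can be sketched with NumPy's FFT routines. The example below is an assumption-laden stand-in for live mic input: a pure 440 Hz tone replaces the microphone signal, and the Fourier transform recovers its dominant frequency.

```python
import numpy as np

# One second of a 440 Hz sine tone sampled at 8 kHz (a stand-in for mic input).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

# The real-input FFT gives the magnitude spectrum; its largest bin is
# the dominant frequency, which is what the demo visualizes in real time.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
peak = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {peak:.0f} Hz")
```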

You can watch an interesting talk by the creator Steven Wittens.

The setosa project explains several Maths concepts visually including regression, PCA, Eigenvectors, Markov chains, and conditional probability.

The visual guide to statistics is an awesome introduction to key statistics concepts.

You can learn about probability, regression, distributions, and statistical inference.
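The regression concept those guides visualize reduces to ordinary least squares, which you can try in one line of NumPy. The toy data below is noiseless so the fit recovers the true line exactly; real data would add scatter around it.

```python
import numpy as np

# Fit y = slope * x + intercept by least squares on toy data
# generated from a known line (true slope 2, intercept 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```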

Visualizing algorithms

Not all of these algorithms are directly applicable to ML; however, many can be extended into specific domains like digital signal processing (IoT, voice recognition) and complex information visualization.

Mike Bostock is the creator of D3, one of the best JavaScript visualization libraries.

He has developed a set of algorithm visualizations, which are available as an interactive demo and a video presentation.

David Mimno from Cornell presents visualizations of ML algorithms in his 2014 presentation.

If you are interested in applying Digital Signal Processing in your ML project then Seeing Circles, Sines, and Signals will be an excellent start to the topic.

XGBoost algorithm

XGBoost is one of the most popular and successful algorithms used within winning solutions for Kaggle competitions. The introduction to boosted trees tutorial on official website of XGBoost starts with fundamentals of Machine Learning algorithms and then eases into relatively complex aspects of decision trees and gradient boosting.
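XGBoost itself is a separate library (its `xgboost.XGBClassifier` exposes a scikit-learn-style API). To keep this sketch dependency-free, the same boosted-trees idea is shown below with scikit-learn's own gradient boosting implementation on a bundled dataset; swap in XGBoost for the performance and features that make it a competition favorite.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient boosting fits many shallow trees in sequence, each one
# correcting the residual errors of the ensemble built so far.
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
accuracy = gbm.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```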

Gradient Descent algorithm

Andrew Ng’s Machine Learning lectures explain the Gradient Descent algorithm as one of the first topics. The Google Brain team, along with Y Combinator Research, has created a publishing platform for interactive research papers that explain complex topics relatively easily.

The recently published paper on momentum, as a property of ML algorithms like Gradient Descent, is an intermediate-to-advanced discussion of the topic.
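The momentum update that paper studies fits in a dozen lines. The sketch below minimizes a simple one-dimensional quadratic, chosen so the behavior is easy to verify; the learning rate and momentum coefficient are illustrative values, not recommendations.

```python
# Minimize f(w) = (w - 3)^2 with gradient descent plus momentum:
#   v <- beta * v + grad(w)
#   w <- w - lr * v
def gradient(w):
    return 2 * (w - 3)  # derivative of (w - 3)^2

w, v = 0.0, 0.0
lr, beta = 0.1, 0.9  # illustrative step size and momentum coefficient
for _ in range(200):
    v = beta * v + gradient(w)  # velocity accumulates past gradients
    w = w - lr * v

print(f"w converged to {w:.4f}")  # approaches the minimum at w = 3
```

With `beta = 0` this reduces to plain gradient descent; the accumulated velocity is what lets momentum accelerate along consistent gradient directions.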

Neural Networks

Colah’s blog is one of the best sources of visual explanations of Neural Networks research.

The Stanford CS231n course notes on CNNs for Visual Recognition include explanations of several algorithms, including Stochastic Gradient Descent, Support Vector Machines and softmax, and of course backpropagation and neural nets.
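As a taste of those notes, here is the softmax function in NumPy, including the numerical-stability trick the CS231n notes emphasize: subtracting the maximum score before exponentiating so large scores do not overflow.

```python
import numpy as np

def softmax(scores):
    # Shifting by the max leaves the result unchanged mathematically
    # but keeps np.exp from overflowing on large scores.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)           # a probability distribution over the classes
print(probs.argmax())  # the index of the highest score
```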

Deep learning

Deep learning is one of the hottest areas of interest within ML. RNNs, or Recurrent Neural Nets, are among the most widely used deep learning algorithms.

This visual interactive simulation explains how RNNs work.
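The core of what the simulation shows is a single recurrence. Below is one forward pass of a vanilla RNN cell in NumPy; the weights are random stand-ins (a real network would learn them), and the inputs are random vectors in place of real sequence data.

```python
import numpy as np

# One forward pass of a vanilla RNN cell over a short sequence:
#   h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))  # input weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size)) # recurrent weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # initial hidden state
for t in range(seq_len):
    x_t = rng.normal(size=input_size)       # stand-in for a real input vector
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)  # the state carries memory forward

print("final hidden state:", h)
```

The recurrent weight matrix `W_hh` is what lets the hidden state at step `t` depend on everything seen before it, which is the property the interactive demo visualizes.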

The Stanford tutorial on Unsupervised Feature Learning and Deep Learning counts Andrew Ng among its notable contributors.

MIT Press book on Deep Learning starts with mathematical and machine learning foundations and dives into Deep learning research.

A guide to deep learning by Yerevann is very comprehensive and covers some of the resources highlighted in Speed Reading On Machine Learning.

Stanford has just released CS224N/Ling284 Natural Language Processing with Deep Learning: video recordings of around 19 classes taught by Chris Manning and Richard Socher.

Programming ML

One of the best ways to start programming ML is to work with interactive notebooks, such as the Jupyter notebooks provided with the Anaconda distribution.

I am authoring a series of reusable starter notebooks for ML and data science available for download from GitHub.

You can then guide your solutions by using algorithm maps and flow-charts created by ML frameworks, libraries and API providers.

Python’s Scikit-learn is one of the leading ML libraries; it provides an interactive machine learning map to guide your selection of which algorithm to use.
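As an example of using that map: for a small labeled dataset where you are predicting a category, the map's classification branch points first to a linear support vector classifier. The sketch below follows that path on scikit-learn's bundled iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Iris: 150 labeled samples, predicting a category -> the map's
# classification branch suggests trying LinearSVC first.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

If the linear classifier underperforms, the map directs you onward to kernel SVMs or ensemble methods, which is exactly the iterative selection process it is designed to guide.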

Microsoft has created a similar cheat sheet for their ML APIs available as a PDF download.

SciPy Lectures is a compilation of tutorials on using Python for scientific computing.

Tools Aiding Active Learning

There are several tools available in the ML workflow which aid active learning by doing.

RapidMiner is a data science and ML platform. It enables you to “draw” an ML workflow visually, starting with datasets and dragging and dropping ML algorithms to work on them. The tool documents more than 60 algorithms and encourages interactive, iterative development of ML solutions.

Work in progress

This compilation is always a work in progress; it will be updated as new reading sources become available or as readers suggest them in comments and annotations.

Please like, share, and comment if you want me to further develop the Speed Reading On Machine Learning resource.
