Category: Regression
A Python script to generate random synthetic mixtures from pure spectra from a database. This post can be used as a tutorial to generate datasets for Multivariate Curve Resolution.
A short survey of diagnostic plots, as tools to dig deeper into the assumptions behind a regression model.
An implementation of a genetic algorithm for wavelength selection using basic NumPy functions.
Wavelength selection methods aim at choosing the spectral bands that produce the best regression or classification model. Here we introduce a genetic algorithm for wavelength selection.
Updated code and additional utility scripts for PLS regression. Will keep it updated as we go.
Multivariate Curve Resolution deals with spectra, or other signals, from samples containing multiple components, and aims at recovering the pure components.
An introductory tutorial on optimisers for deep learning, including Python code for training a regression model on NIR spectroscopy data.
The process of developing and optimising a regression model almost invariably requires a sequence of steps. These steps can be combined in a single predictor using the Pipeline function …
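As a rough illustration of chaining steps into one predictor, here is a minimal sketch that assumes the Pipeline in question is scikit-learn's `sklearn.pipeline.Pipeline`. The synthetic data, the step names and the choice of scaling, PCA and linear regression are assumptions made for this example, not the steps used in the post.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for spectra: 60 samples, 20 "wavelengths" (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)

# Each step is a (name, estimator) pair; the last step is the regressor
pcr = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("ols", LinearRegression()),
])

# fit() runs every step in order; predict() applies the same transforms
pcr.fit(X, y)
y_pred = pcr.predict(X)
```

Because the whole chain behaves like a single estimator, it can be passed as-is to cross-validation or grid-search utilities.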
Using parallel computation to speed up cross-validation analysis for large data sets.
We discuss the meaning of an activation function in neural networks, look at a few examples, and compare neural network training with different activation functions.
This post introduces basic Python code to build fully-connected deep neural networks with TensorFlow for regression analysis of spectral data.
The Akaike Information Criterion (AIC) is another tool to compare prediction models. AIC combines model accuracy and parsimony in a single metric and can be used to evaluate data …
What is the minimum amount of information required to export and re-use a linear regression model? The answer is surprisingly simple. Here's a step by step example using PLS …
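To make the idea concrete: a linear model predicts with y = X·b + b0, so the coefficients and the intercept are enough to reproduce its predictions. The sketch below is only an illustration under stated assumptions. It fits an ordinary least-squares model with NumPy as a stand-in for PLS, uses noise-free synthetic data, and stores the model as JSON through a hypothetical `predict` helper.

```python
import json

import numpy as np

# Noise-free synthetic data (assumption, for illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + 3.0

# Least-squares fit with an intercept column (stand-in for a PLS fit)
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:-1], solution[-1]

# Export: coefficients and intercept are all a linear model needs
exported = json.dumps({"coef": coef.tolist(), "intercept": float(intercept)})

# Re-use: reload the numbers and predict without the original fitting code
loaded = json.loads(exported)


def predict(X_new, model):
    """Hypothetical helper: apply an exported linear model to new data."""
    return np.asarray(X_new) @ np.asarray(model["coef"]) + model["intercept"]


y_reloaded = predict(X, loaded)
```

Any preprocessing applied before the fit (centring, scaling, derivatives) has to be repeated on new data, or folded into the exported coefficients.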
Backward Variable Selection for PLS regression is a method to discard variables that contribute poorly to the regression model. Here's a Python implementation of the method.
The Concordance Correlation Coefficient (CCC) can be useful to quantify the quality of a linear regression model. In this tutorial we explain the CCC and describe its relation with …
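For reference, a minimal NumPy sketch of the CCC follows, assuming Lin's usual definition with population (1/N) variances. The function name `concordance_cc` and the toy data are made up for this example. Unlike a plain correlation coefficient, the CCC drops below 1 when predictions are offset from the reference values.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient (population variances assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population covariance between reference and predicted values
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # The denominator penalises differences in both spread and mean
    return 2 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ccc_perfect = concordance_cc(y, y)        # identical values: 1.0
ccc_shifted = concordance_cc(y, y + 1.0)  # constant offset of 1: 0.8
```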
Bias-Variance trade-off refers to the optimal choice of parameters in a model in order to avoid both overfitting and underfitting. Let's look at a worked example using PLS regression.
Improve the performance of a PLS method by wavelength band selection using Simulated Annealing optimisation.
Simulated annealing helps overcome some of the shortcomings of greedy algorithms. Here's a tutorial on simulated annealing for principal components selection in regression.
Greedy algorithms are commonly used to optimise a function over a parameter space. Here's an implementation of a greedy algorithm for principal components selection in regression.
Not all wavelengths are created equal. A moving window PLS algorithm optimises the regression by discarding bands that are not useful for prediction.
Cross-validation is a standard procedure to quantify the robustness of a regression model. Compare K-Fold, Monte Carlo and Bootstrap methods and learn some neat tricks in the process.
Want to get more out of your principal components regression? Here's a simple hack that will give you a stunning improvement on the performance of PCR.
Principal components regression is a staple of NIR analysis. Ridge regression is widely used in machine learning. How do they relate? Find out in this post.
Not every data point is created equal. In this post we'll show how to perform outlier detection with PLS regression for NIR spectroscopy in Python.
Improve the quality of your PLS regression using variable selection. This tutorial will work through a variable selection method for PLS in Python.
Step by step tutorial on how to build a NIR calibration model using Partial Least Squares Regression in Python.
An in-depth introduction to Principal Component Regression in Python using NIR data. PCR is the combination of PCA with linear regression. Check it out.