Qualitative analysis of ground coffee with NIR spectroscopy

Coffee flavour is a complicated mix of many different elements: variety, roasting, grinding and brewing all play a part in making your perfect cup of coffee. In this post, I’ll discuss how NIR spectroscopy can be used to qualitatively distinguish between different types of ground coffee.

The gist of this post is to apply a Principal Components Analysis (PCA) decomposition to NIR spectra of roasted, ground coffee and visualise the results to evaluate a rough classification. We won’t discuss or compare different classification methods, as the results would be of little significance given the small sample size. Rough as it may be, however, the exercise shows that even with little prior knowledge of the coffee samples one can obtain a fairly good classification, which can have many uses. For instance, it may provide a way to quickly detect adulteration, or to assess composition, degree of roasting and sensory perception of coffees. Sensory analysis is especially attractive: predicting the taste of the coffee by measuring NIR spectra of beans or ground coffee blends would be an interesting application.

Well, let’s come back to the ground and set ourselves up for a simpler first step: qualitative classification of ground coffee based on NIR spectra. For that I decided to use ground coffee of consistent quality (especially milling size) which has already been classified based on flavour strength. I chose Aldi Expressi coffee capsules (I know, I know), which come in several varieties, each defined by an ‘intensity’ scale. Differences across the scale are due to different degrees of roasting and different coffee varieties (Arabica and/or Robusta), as well as country of origin.

Setting up the questions and methodology

We used NIR analysis to try and answer three simple questions:

1. Can we distinguish the content of different capsules from one another?
2. Can we separate coffees based on caffeine content?
3. Can we distinguish coffees based on chlorogenic acids content?

Question number 3 is especially interesting, as chlorogenic acids play a role during roasting and influence the bitterness and the acidity of the coffee beverage.

OK, here’s the list of all capsule types we used, along with some information we gathered from the Aldi website:

Name	Intensity	Roast	Variety
Tauro	5	Medium	Arabica
Renzo	8	Medium-dark	Arabica
Reggio	9	Medium-dark	NA
La Spezia	11	Dark	Arabica + Robusta
Torino	11	Dark	Arabica
Abruzzo	12	Dark	Arabica + Robusta
Calabrese	13	Dark	Arabica + Robusta

NIR spectra were acquired in the 1100-2300 nm wavelength range, in steps of 2 nm. For each sample we took 10 readings, each reading being the average of 20 scans. The data is available at our GitHub repository.
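As a quick sanity check on the numbers above, here is what the acquisition settings imply for the size of the dataset. This is only a sketch: the variable names are mine, and it assumes that both ends of the range are scanned and that there is one sample per capsule type, neither of which is stated explicitly.

```python
import numpy as np

# Wavelength axis implied by the text: 1100 to 2300 nm in 2 nm steps.
# Assumption: both end points are part of the scan.
wl = np.arange(1100, 2300 + 2, 2)

n_capsule_types = 7        # rows in the table above
readings_per_sample = 10   # each reading is the average of 20 scans

# Assumption: one sample per capsule type, one row per reading
n_spectra = n_capsule_types * readings_per_sample

print(f"{wl.size} wavelength points, from {wl[0]} to {wl[-1]} nm")
print(f"{n_spectra} spectra expected")
```

If the shape of the downloaded table differs from these numbers, the assumptions above are the first thing to revisit.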

The data collected were reduced by Principal Components Analysis in Python. We kept the first 3 principal components (which we called PC1, PC2, PC3), and plotted the reduced data as a 3D scatter plot. As anticipated, rather than comparing different classification algorithms (the data is definitely not sufficient for that), we just colour-code the decomposed data and look for trends.
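Before touching the real spectra, it may help to see the reduction step in isolation. The sketch below runs the same kind of pipeline (standardise, then PCA with 3 components) on synthetic data that I made up for the purpose, so the numbers it prints say nothing about coffee. The point is only to show how to check how much variance the first three components retain.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic stand-in for the spectra (NOT the coffee data):
# 70 "spectra" of 601 points built from 3 hidden factors plus noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(70, 3)) * np.array([5.0, 2.0, 1.0])
loadings = rng.normal(size=(3, 601))
X_demo = scores @ loadings + 0.1 * rng.normal(size=(70, 601))

# Standardise each wavelength, then keep the first 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(StandardScaler().fit_transform(X_demo))

print(X_reduced.shape)                      # one row per spectrum, 3 columns
print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # share retained by PC1 to PC3
```

On real data, a low total here would be a hint that three components are not enough to describe the spectra.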

Data and code examples

Let’s begin with the basic imports and the exploratory analysis of the data.

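import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib

# Load the data: 'Coffee Type' holds the labels, all columns after the first hold the readings
url = 'https://raw.githubusercontent.com/nevernervous78/nirpyresearch/master/data/coffee_classification.csv'
data = pd.read_csv(url)
labels = data['Coffee Type']
y = LabelEncoder().fit_transform(labels)  # one integer code per coffee type

# Negative log of the raw readings, used as absorbance
X = -np.log(data.values[:,1:].astype('float32'))
Xc = X - X.mean(axis=0)  # mean-centred spectra (not used below)

# First derivative with a Savitzky-Golay filter (11-point window, 2nd-order polynomial)
X1 = savgol_filter(X, 11, polyorder = 2, deriv=1)

# Wavelength axis, in nm
wl = np.linspace(1100,2300, X.shape[1])

# One colour per spectrum, according to its coffee type
colors = [plt.cm.jet(float(i)/max(y)) for i in y]

# Note: on matplotlib 3.6 or later this style is named 'seaborn-v0_8-whitegrid'
with plt.style.context(('seaborn-whitegrid')):
    for i,j in enumerate(colors):
        plt.plot(wl, X1[i,:], c=j, alpha=0.5)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('First derivative - NIR absorbance')
    plt.show()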

The code above plots the first derivative of the absorbance spectra, with one colour per coffee type.
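A small aside on the savgol_filter call above. By default the filter assumes a spacing of 1 between points, so the derivative comes out per data point rather than per nm. That makes no difference to the PCA (the spectra are standardised anyway), but if you want physical units you can pass the spacing through the delta parameter. The sketch below checks this on a synthetic quadratic curve of my own choosing, picked because its derivative is known exactly; it is not coffee data.

```python
import numpy as np
from scipy.signal import savgol_filter

step = 2.0                                 # nm between points, as in the post
wl = np.arange(1100, 2300 + step, step)    # wavelength axis in nm

# Synthetic test curve (NOT coffee data): a quadratic,
# chosen because its derivative is known exactly
spectrum = 1e-6 * wl**2
true_derivative = 2e-6 * wl

# Same window and polynomial order as above. delta tells the filter the
# sample spacing, so the result is per nm rather than per data point.
d_per_point = savgol_filter(spectrum, 11, polyorder=2, deriv=1)
d_per_nm = savgol_filter(spectrum, 11, polyorder=2, deriv=1, delta=step)

print(np.allclose(d_per_nm, true_derivative))
print(np.allclose(d_per_point, step * d_per_nm))
```

A second-order polynomial fit reproduces a quadratic exactly, which is why the comparison with the analytical derivative is a fair test here. Real spectra will of course be smoothed as well as differentiated.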

Sample code to generate the 3D scatter plot of the PCA coefficients is below:

# PCA decomposition
pca = PCA(n_components=3)
Xpca = pca.fit_transform(StandardScaler().fit_transform(X1))

# 3D scatter plot
unique = np.unique(y)       # encoded classes, sorted
names = np.unique(labels)   # coffee names, sorted the same way LabelEncoder sorts them
colors = [plt.cm.jet(float(i+1)/(max(unique)+1)) for i in unique]

with plt.style.context(('seaborn-whitegrid')):
    fig = plt.figure(figsize=(10,9))
    ax = fig.add_subplot(111, projection="3d")
    for i, u in enumerate(unique):
        # Scores of all spectra belonging to class u
        xi = Xpca[y == u, 0]
        yi = Xpca[y == u, 1]
        zi = Xpca[y == u, 2]
        ax.scatter(xi, yi, zi, color=colors[i], s=80, label=names[u])
    ax.view_init(10, 40)
    ax.set_xlabel('PC1')
    ax.set_ylabel('PC2')
    ax.set_zlabel('PC3')
    # Build the legend from the scatter labels so names and colours stay matched
    ax.legend(loc='upper left')
    plt.show()

The 3D scatter plot colour-codes the data points according to the coffee type that is passed with the labels. The plots will look similar to the one reproduced below.

NIR analysis of ground coffee

Question 1: Can we distinguish between capsules?

To answer this question we used the full wavelength range for the PCA analysis, and here’s the result. As you can see, the different coffee types tend to cluster apart: the medium roasts sit at the right-hand side of the chart and the dark roasts towards the left-hand side. Differences between individual intensity values are also very clear, reflecting the underlying differences in composition.

Question 2: Can we distinguish between caffeine content?

Here we used the 1650-1800 nm band, which according to [1] and [2] contains a strong overtone from caffeine. The measurements from the different capsules spread out nicely following increasing caffeine content. The direction of the arrow comes from the well-known fact that darker roasts contain less caffeine than lighter roasts.

Question 3: Can we distinguish between chlorogenic acids content?

Here we used the 1400-1600 nm band which, according to the same references cited above, should contain a strong signal coming from chlorogenic acids. Sure enough, we can see that the darker roasts contain an increasing amount of chlorogenic acids, giving the coffee a more bitter and pungent taste.
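The band-restricted analyses of Questions 2 and 3 only differ from the full-range one in the columns fed to the PCA. A minimal way to do that is sketched below. The helper pca_on_band is a hypothetical name of mine, not something from the original analysis, and the demo runs on random numbers rather than the coffee spectra, so only the shapes it prints are meaningful.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def pca_on_band(spectra, wl, lo, hi, n_components=3):
    """Run PCA using only the columns with lo <= wavelength <= hi (nm).

    Hypothetical helper, not taken from the original analysis.
    Returns the PCA scores and the boolean mask of selected columns.
    """
    mask = (wl >= lo) & (wl <= hi)
    band = StandardScaler().fit_transform(spectra[:, mask])
    return PCA(n_components=n_components).fit_transform(band), mask

# Random stand-in for the derivative spectra (NOT the coffee data)
rng = np.random.default_rng(0)
wl = np.arange(1100, 2300 + 2, 2)
X_demo = rng.normal(size=(70, wl.size))

scores_caffeine, mask_caffeine = pca_on_band(X_demo, wl, 1650, 1800)
scores_cga, mask_cga = pca_on_band(X_demo, wl, 1400, 1600)

print(mask_caffeine.sum(), scores_caffeine.shape)
print(mask_cga.sum(), scores_cga.shape)
```

With the variables defined earlier in the post, the equivalent call would be pca_on_band(X1, wl, 1650, 1800), and the returned scores can be passed to the same 3D scatter plot code.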

That’s it for our brief foray into coffee type classification. As anticipated, I avoided any quantitative analysis given the small size of the dataset. The main idea is that NIR analysis is well-placed to provide a qualitative classification of ground coffee types, even with a relatively simple approach. And, of course, the same approach can be used for other classification problems involving NIR spectroscopy.

I hope you found this post useful. Please don’t be shy and share it with friends and colleagues!

Until next time.
Daniel

References

[1] D. F. Barbin et al. Application of infrared spectral techniques on quality and compositional attributes of coffee: An overview.
[2] J. F. Ribeiro et al. Chemometric models for the quantitative descriptive sensory analysis of Arabica coffee beverages using near infrared spectroscopy.

About The Author

Daniel Pelliccia
