Generate synthetic mixtures from pure spectra

In a previous post on Multivariate Curve Resolution (MCR), we looked at techniques to extract the contribution of individual components from the spectra of mixtures. This is a companion post, showing how to generate a synthetic mixture of Raman spectra of minerals.

The pure spectra are taken from the RRUFF Project, containing Raman spectra of a large number of minerals.

The script presented here takes three compounds (directly from the file saved from the database) and mixes their Raman spectra in a random way, with the only constraint that the individual concentrations add up to 1.

Mixtures from pure spectra

The script assumes you have the spectra from three compounds, saved as txt files, but it can be generalised to an arbitrary number of compounds.

For this example, we chose three minerals:

Albite – Download data
Quartz – Download data
Orthoclase – Download data

The scripts generates 300 random mixes and add the pure spectra of the three compounds at the end (i.e. 303 spectra in total). The data is then saved as csv file.

We start with the imports, and then load the files

import numpy as np

import matplotlib.pyplot as plt

from scipy import interpolate

import pandas as pd

fn1 = "Albite__R040129__Broad_Scan__785__0__unoriented__Raman_Data_RAW__14316.txt"

fn2 = "Quartz__R040031__Broad_Scan__780__0__unoriented__Raman_Data_RAW__16501.txt"

fn3 = "Orthoclase__R040055__Broad_Scan__785__0__unoriented__Raman_Data_RAW__18458.txt"

fn = [fn1, fn2, fn3]

names = []

lines = []

for i,j in enumerate(fn):

with open(j) as f:

lines.append(f.readlines())

names.append(j.split("__")[0])

Next, we create lists of three elements, containing spectra and wavenumber arrays for the three minerals.

wavenumb = []

raman = []

for i,j in enumerate(lines):

wn = []

rm = []

for l, line in enumerate(j[10:]):

if line == '##END=\n':

break

else:

wn.append(float(line.split(',')[0]))

rm.append(float(line.split(',')[1]))

wavenumb.append(wn)

raman.append(rm)

You will notice that, in general, the spectra don’t span the same wavenumber range, and don’t have the same resolution. To generate synthetic spectra from mixtures of these pure spectra, we need to make sure that the spectra are interpolated to the same wavenumber range.

# Get minimum and maximum wavenumber for each pure spectrum

mins = []

maxs = []

for i,j in enumerate(wavenumb):

mins.append(np.array(j).min())

maxs.append(np.array(j).max())

# Select a wavenumber range that is covered by all scans

wn_min = np.max(mins)

wn_max = np.min(maxs)

n_points = 1500

wn_interp = np.linspace(wn_min,wn_max,n_points)

# Interpolate spectra to the same wavenumber bins

raman_interp = []

for i,j in enumerate(wavenumb):

tck = interpolate.splrep(j, raman[i], s=None)

raman_interp.append(interpolate.splev(wn_interp, tck, der=0))

Now we are finally read to calculate n = 300 random synthetic mixtures. For each sample, we set the first concentration via a random number in the range (0-1). Then we set the second concentration by extracting another random number in the range from 0 up to the previous value. The concentration of the third component will then be fixed by imposing that the sum of the three concentrations is unity.

n = 300

concentration = []

cumulative_conc = np.zeros(n)

pure_conc = np.zeros((len(raman_interp), len(raman_interp)))

for i,j in enumerate(raman_interp):

if i == 0:

conc = np.random.rand(n)

concentration.append(conc)

cumulative_conc += conc

elif (i > 0) and (i < len(raman_interp)-1):

max_conc = 1.0 - cumulative_conc

conc = max_conc*np.random.rand(n)

concentration.append(conc)

cumulative_conc += conc

else:

conc = 1.0 - cumulative_conc

concentration.append(conc)

cumulative_conc = cumulative_conc + conc

pure_conc[i,i] = 1

concentration = np.hstack((np.array(concentration), pure_conc))

Finally, we convert the array into a data frame and export a CSV file

df = pd.DataFrame()

mixed_spectra = np.zeros((n+len(raman_interp), wn_interp.shape[0]))

for i,j in enumerate(names):

df[j] = concentration[i,:]

for k in range(mixed_spectra.shape[0]):

mixed_spectra[k,:] += concentration[i,k]*raman_interp[i]

df[wn_interp] = np.around(mixed_spectra,4)

df.to_csv("mixed_spectra.csv")

An example of data generated with this script is available at our Github repository. And, as mentioned, head to our previous post on Multivariate Curve Resolution (MCR) to learn how to invert this process, i.e. how to extract the spectra of the pure components from the spectra of the mixtures.

As always, thanks for reading and until next time.

Daniel

Feature image from msvr on Pexels.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Mixtures from pure spectra

About The Author

Daniel Pelliccia

Mixtures from pure spectra

About The Author

Daniel Pelliccia

Related Posts