In a previous post on Multivariate Curve Resolution (MCR), we looked at techniques to extract the contribution of individual components from the spectra of mixtures. This is a companion post, showing how to generate a synthetic mixture of Raman spectra of minerals.
The pure spectra are taken from the RRUFF Project, containing Raman spectra of a large number of minerals.
The script presented here takes three compounds (directly from the file saved from the database) and mixes their Raman spectra in a random way, with the only constraint that the individual concentrations add up to 1.
Mixtures from pure spectra
The script assumes you have the spectra from three compounds, saved as txt files, but it can be generalised to an arbitrary number of compounds.
For this example, we chose three minerals:
- Albite – Download data
- Quartz – Download data
- Orthoclase – Download data
The scripts generates 300 random mixes and add the pure spectra of the three compounds at the end (i.e. 303 spectra in total). The data is then saved as csv file.
We start with the imports, and then load the files
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
import numpy as np import matplotlib.pyplot as plt from scipy import interpolate import pandas as pd fn1 = "Albite__R040129__Broad_Scan__785__0__unoriented__Raman_Data_RAW__14316.txt" fn2 = "Quartz__R040031__Broad_Scan__780__0__unoriented__Raman_Data_RAW__16501.txt" fn3 = "Orthoclase__R040055__Broad_Scan__785__0__unoriented__Raman_Data_RAW__18458.txt" fn = [fn1, fn2, fn3] names = [] lines = [] for i,j in enumerate(fn): with open(j) as f: lines.append(f.readlines()) names.append(j.split("__")[0]) |
Next, we create lists of three elements, containing spectra and wavenumber arrays for the three minerals.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
wavenumb = [] raman = [] for i,j in enumerate(lines): wn = [] rm = [] for l, line in enumerate(j[10:]): if line == '##END=\n': break else: wn.append(float(line.split(',')[0])) rm.append(float(line.split(',')[1])) wavenumb.append(wn) raman.append(rm) |
You will notice that, in general, the spectra don’t span the same wavenumber range, and don’t have the same resolution. To generate synthetic spectra from mixtures of these pure spectra, we need to make sure that the spectra are interpolated to the same wavenumber range.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
# Get minimum and maximum wavenumber for each pure spectrum mins = [] maxs = [] for i,j in enumerate(wavenumb): mins.append(np.array(j).min()) maxs.append(np.array(j).max()) # Select a wavenumber range that is covered by all scans wn_min = np.max(mins) wn_max = np.min(maxs) n_points = 1500 wn_interp = np.linspace(wn_min,wn_max,n_points) # Interpolate spectra to the same wavenumber bins raman_interp = [] for i,j in enumerate(wavenumb): tck = interpolate.splrep(j, raman[i], s=None) raman_interp.append(interpolate.splev(wn_interp, tck, der=0)) |
Now we are finally read to calculate n = 300 random synthetic mixtures. For each sample, we set the first concentration via a random number in the range (0-1). Then we set the second concentration by extracting another random number in the range from 0 up to the previous value. The concentration of the third component will then be fixed by imposing that the sum of the three concentrations is unity.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
n = 300 concentration = [] cumulative_conc = np.zeros(n) pure_conc = np.zeros((len(raman_interp), len(raman_interp))) for i,j in enumerate(raman_interp): if i == 0: conc = np.random.rand(n) concentration.append(conc) cumulative_conc += conc elif (i > 0) and (i < len(raman_interp)-1): max_conc = 1.0 - cumulative_conc conc = max_conc*np.random.rand(n) concentration.append(conc) cumulative_conc += conc else: conc = 1.0 - cumulative_conc concentration.append(conc) cumulative_conc = cumulative_conc + conc pure_conc[i,i] = 1 concentration = np.hstack((np.array(concentration), pure_conc)) |
Finally, we convert the array into a data frame and export a CSV file
1 2 3 4 5 6 7 8 9 10 11 |
df = pd.DataFrame() mixed_spectra = np.zeros((n+len(raman_interp), wn_interp.shape[0])) for i,j in enumerate(names): df[j] = concentration[i,:] for k in range(mixed_spectra.shape[0]): mixed_spectra[k,:] += concentration[i,k]*raman_interp[i] df[wn_interp] = np.around(mixed_spectra,4) df.to_csv("mixed_spectra.csv") |
An example of data generated with this script is available at our Github repository. And, as mentioned, head to our previous post on Multivariate Curve Resolution (MCR) to learn how to invert this process, i.e. how to extract the spectra of the pure components from the spectra of the mixtures.
As always, thanks for reading and until next time.
Daniel
Feature image from msvr on Pexels.