I am trying to use spectral data from a fluorometer to quantify a certain substance in my samples.
I have created a large table with the substance concentration [C] in one column, and each of the other columns is one measured wavelength [w1, w2, ..., w3000].
Each row is a sample with a known concentration and a measured spectrum.
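For anyone reproducing this outside JMP, the table described above maps onto a response vector and a predictor matrix. This is a minimal sketch: the name `table`, the helper `split_table`, and all the numbers are hypothetical, and only three wavelength columns are shown instead of 3,000.

```python
import numpy as np

# Hypothetical table: one row per sample, column 0 = known concentration C,
# columns 1..n = fluorescence intensity at wavelengths w1..wn (made-up values).
table = np.array([
    [0.01, 12.0, 15.5, 9.8],
    [0.10, 118.0, 152.0, 97.5],
    [1.00, 1190.0, 1530.0, 981.0],
])

def split_table(table):
    """Return (y, X): the concentration column and the spectral matrix."""
    y = table[:, 0]   # response: one concentration per sample
    X = table[:, 1:]  # predictors: one column per wavelength
    return y, X

y, X = split_table(table)
print(y.shape, X.shape)  # (3,) (3, 3)
```

With the real data, X would have shape (number of samples, 3000).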
I used PLS to build a prediction model, but because the concentration of my substance spans several orders of magnitude, I used log10[C] instead of C. I got good results, but my supervisor pointed out that this doesn't make sense because fluorescence is LINEARLY CORRELATED WITH CONCENTRATION (Beer's law). When I built a model using the raw C rather than log10[C], I got worse results.
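To make the raw-C versus log10[C] comparison concrete, here is a minimal sketch of single-response PLS (PLS1, NIPALS algorithm, column-centered, no scaling) in plain NumPy. It is a generic textbook version, not JMP's implementation, and the data are synthetic: a hypothetical Gaussian emission profile whose intensity is exactly proportional to C. Note that a model fitted to log10[C] predicts log10[C], so its output must be back-transformed with 10**pred before comparing it with C.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Minimal NIPALS PLS1: one response, column-centred X and y (no scaling)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q * t               # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

# Hypothetical, noise-free data: intensity = C * fixed emission profile.
wavelengths = np.linspace(400, 700, 300)
profile = np.exp(-((wavelengths - 550) / 40) ** 2)
C = np.logspace(-3, 1, 20)                 # spans four orders of magnitude
X = np.outer(C, profile)                   # rank one, so one component is all there is

pred_raw = pls1_predict(pls1_fit(X, C, 1), X)
pred_log = 10 ** pls1_predict(pls1_fit(X, np.log10(C), 1), X)  # back-transform
```

On this idealized, perfectly linear data the raw-C model recovers C exactly, which is what Beer's-law linearity predicts; it says nothing about why real spectra behaved differently.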
Furthermore, and this is really my question: when I built a PLS model of log10[W] against log10[C], I got almost exactly the same predictions (R^2 = 0.997 between the predictions based on raw W and on log10[W]). I suspect PLS is somehow doing the log transformation by itself, perhaps in the scaling or centering. Does anyone know if this is correct? How can a PLS model on semi-log data give the same results as one on log-log data?
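Centering and scaling cannot be doing the log for you: both are per-column affine operations (subtract the column mean, divide by the column standard deviation). What the log does do, under the idealized assumption that every spectrum is the concentration times a fixed emission shape, X[i, j] = C[i] * s[j], is turn the product into a sum, log10 X = log10 C + log10 s. Column centering then removes the log10 s term, leaving every column equal to log10 C minus its mean, so a log-log model is exactly linear. This sketch checks that on synthetic data with a hypothetical profile; real spectra have noise, background and nonlinearity, so it is not offered as the full explanation for the R^2 of 0.997.

```python
import numpy as np

wavelengths = np.linspace(400, 700, 50)             # hypothetical wavelength grid
profile = np.exp(-((wavelengths - 550) / 60) ** 2)  # hypothetical shape, all values > 0
C = np.logspace(-3, 1, 10)

X = np.outer(C, profile)                  # idealised model: X[i, j] = C[i] * s[j]
logX = np.log10(X)                        # = log10(C[i]) + log10(s[j])
logX_centred = logX - logX.mean(axis=0)   # ordinary column centring

# Every centred column is the same vector: log10(C) minus its mean.
expected = np.log10(C) - np.log10(C).mean()
print(np.allclose(logX_centred, expected[:, None]))  # True
```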
Professor had too much Beer. (really trying to work a pun here)
Absorbance is on a log scale and transmittance is on a linear scale. The log transform makes sense.
Fluorescence might be linearly correlated with concentration, but transmittance is log-linearly related to concentration: it is absorbance, A = -log10(T), that Beer's law makes linear in concentration.
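The Beer-Lambert relationship in numbers: absorbance A = epsilon * l * c is linear in concentration, while transmittance T = 10^(-A) falls off exponentially, so taking -log10(T) restores the linear scale. A minimal sketch; the absorptivity value is hypothetical.

```python
import numpy as np

epsilon = 0.5                           # hypothetical molar absorptivity, L/(mol*cm)
path = 1.0                              # path length, cm
conc = np.array([0.0, 1.0, 2.0, 4.0])   # concentration, mol/L

absorbance = epsilon * path * conc      # Beer-Lambert: A = eps * l * c (linear in c)
transmittance = 10 ** (-absorbance)     # T = 10^(-A): exponential in c
recovered_A = -np.log10(transmittance)  # the log transform brings back linearity
```

Doubling the concentration doubles A but squares T (here T goes from 0.1 at c = 2 to 0.01 at c = 4).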
I hope you are setting up the model in the Partial Least Squares platform; that alone will save you a few steps.
In PLS, the response is the concentration you want to predict, and each wavelength you scan is a column; all of those go in as X factors. (One row for each observation, and hopefully you've taken more than two or three measurements.)