BIOMEDICAL JOURNAL OF PIROGOV RNRMU (MOSCOW, RUSSIA)
Identification of microorganisms by Fourier-transform infrared spectroscopy
Identification of microbial species is a routine task for clinical microbiology laboratories. Rapid identification of pathogens in patients with infections or sepsis is essential in prescribing an adequate antibiotic treatment. Efficient therapy for these aggressively progressing conditions is important since they are a common cause of postoperative morbidity and mortality in cardiac surgery and of maternal and neonatal death after childbirth.
Pathogens can be identified by both traditional microbiological tests and modern techniques now available worldwide, such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-ToF MS). The most popular spectrometers are MALDI BioTyper (Bruker; Germany) and Vitek MS (Biomerieux; France). They deliver fast and reliable results but are quite expensive and cannot be afforded by many hospitals. Besides, these devices are too bulky and, therefore, unsuitable for field use by biosafety agencies.
An alternative technique for the identification of microorganisms is Fourier-transform infrared (FTIR) spectroscopy. It can determine the chemical composition of a studied culture and identify any type of macromolecule or low molecular weight compound. As with MALDI-ToF MS, sample preparation for FTIR spectroscopy is easy: the culture sample simply needs to be mounted on a surface transparent to infrared light and left there to dry. Although FTIR spectroscopy ensures a quick diagnosis, its application in clinical microbiology is limited because the FTIR spectra of a studied culture vary depending on the composition of the growth medium and the culture growth phase. The aim of this work was to develop an algorithm, based on the analysis of FTIR spectra, for the reliable identification of microorganisms in pure cultures regardless of the growth medium or growth phase.
Strains of human pathogens
In this work, we used the strains of the most common causative agents of infections and sepsis in humans, including S. aureus (20 MRSA and 47 MSSA isolates), E. faecalis (n = 10), E. faecium (n = 10), K. pneumoniae (n = 10), E. coli (n = 10), S. marcescens (n = 10), E. cloacae (n = 10), A. baumannii (n = 10), P. aeruginosa (n = 10), S. epidermidis (n = 10), and C. albicans (n = 10).
The pathogens were isolated from patients of Bakulev National Medical Research Center of Cardiovascular Surgery and Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology. Isolation and identification were carried out according to the standard technique. Differential diagnostic media included mannitol salt agar selective for staphylococci, Enterococcus agar, Endo agar with fish hydrolysate for culturing gram-negative rods (Enterobacterales, A. baumannii, and P. aeruginosa), Sabouraud agar and meat-peptone broth supplemented with 1% glucose for C. albicans. Confirmatory identification of isolates was performed on the MALDI BioTyper mass spectrometer (Bruker; Germany).
Isolates deposited in the Cryobank were plated onto blood agar plates under aerobic conditions at 37 °C and cultured overnight.
Each isolate was grown in 4 different types of culture media: agarized or liquid, with or without blood. The bloodless media included egg-yolk salt, meat-peptone and Endo broths and Sabouraud agar. For FTIR spectroscopy, the samples were harvested 12, 24 and 48 hours after plating.
To protect the operator of the FTIR spectrometer from the risk of infection and to prepare the cultures for short-term storage, they were inactivated in 70% alcohol before spectroscopy. Thirty-microliter aliquots of fresh liquid cultures were collected into polypropylene microtubes, supplemented with 70 µl of 96% ethanol and carefully mixed by pipetting. Samples of the cultures grown on solid agar were collected using the inoculation loop and then suspended in 70% aqueous ethanol (30 mg of the biomass per 100 µl of the alcohol solution).
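The dilution of the liquid cultures can be checked with a short calculation. The Python sketch below is illustrative only; it assumes that volumes are additive (ethanol-water mixtures contract slightly on mixing, so the true value differs a little) and that the culture itself contains no ethanol. The function name is hypothetical.

```python
def final_ethanol_fraction(culture_ul, ethanol_ul, stock_fraction):
    """Nominal v/v ethanol fraction after mixing, assuming additive volumes."""
    total_ul = culture_ul + ethanol_ul
    return ethanol_ul * stock_fraction / total_ul


# 30 µl of culture plus 70 µl of 96% ethanol, as described in the text.
fraction = final_ethanol_fraction(culture_ul=30.0, ethanol_ul=70.0, stock_fraction=0.96)
print(f"{fraction:.1%}")  # prints 67.2%
```

Under these assumptions the nominal concentration is about 67%, which is close to the 70% stated for inactivation.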
Recording IR spectra of isolates
IR spectra were recorded from the suspensions of pathogen cultures fixed in 70% aqueous ethanol. To prepare individual samples for transmission IR spectroscopy, 5 to 19 µl of the suspension were micropipetted onto standard (12.5 mm long and 2 mm thick) ZnSe surfaces (Elektrosteklo; Moscow) and left to dry until complete evaporation of the ethanol (5–15 min). The spectra were recorded by the FTIR spectrometer Spectrum Two (Perkin-Elmer; USA) over the wavenumber range of 4000–600 cm⁻¹ at 4 cm⁻¹ optical resolution and 1 cm⁻¹ digital resolution. The ZnSe surface with an applied pathogen sample was positioned vertically in the Microfocus holder (Perkin-Elmer; USA) and placed in the path of a horizontal probe beam generated by the IR source; 16 individual scans were accumulated and averaged over about 2 min. The background spectrum of the spectrometer was recorded under the same conditions but with the clean ZnSe surface and was updated before recording the IR spectrum of every new incoming sample.
After discarding the abnormal spectra, the number of spectra ready for further analysis totalled 347, including 188 spectra obtained from S. aureus (39 of these had the MRSA phenotype; 48 had the MSSA phenotype; and 101 were not characterized in terms of their drug resistance) and 159 spectra obtained from other pathogens (14 from A. baumannii, 32 from C. albicans, 8 from E. cloacae, 21 from E. faecalis, 20 from E. faecium, 11 from E. coli, 17 from K. pneumoniae, 18 from P. aeruginosa, 8 from S. marcescens, 10 from S. epidermidis); 10 spectra represented mixed cultures: 2 were obtained from S. aureus (MRSA) + E. coli; 2 from S. aureus (MSSA) + E. coli; 2 from S. aureus (MRSA) + K. pneumoniae; 2 from S. aureus (MSSA) + K. pneumoniae; and another 2 from S. aureus (MSSA) + P. aeruginosa. Fig. 1 illustrates the initial spectra used in the analysis.
Using routine spectroscopy algorithms, the initial spectra were preprocessed for unification; artifacts caused by drifts in the baseline or by fluctuations in atmospheric carbon dioxide and water vapor were eliminated. Briefly, the initial spectra were normalized to the average transmission, and the first derivative of the envelope was calculated; the relevant wavenumber ranges were narrowed down to 600–1800 cm⁻¹ and 2800–3000 cm⁻¹. Because the initial spectra were of good quality and did not require any extra smoothing, the derivative was calculated using the two-point symmetric difference formula for numerical differentiation. Fig. 2 shows the preprocessed spectra.
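The preprocessing steps can be sketched as follows. This is a minimal Python illustration, not the authors' R code. It assumes that normalization means dividing by the mean transmission of the spectrum and that cropping is applied after differentiation; neither detail is specified in the text. The function name and the synthetic linear spectrum are hypothetical.

```python
import numpy as np


def preprocess(wavenumbers, transmission):
    """Normalize, differentiate, and crop one spectrum.

    wavenumbers: 1-D array in cm^-1 (ascending or descending).
    transmission: 1-D array of the same length.
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    transmission = np.asarray(transmission, dtype=float)

    # Normalize to the average transmission (assumed here to be a division).
    normalized = transmission / transmission.mean()

    # Two-point symmetric difference: f'(x_i) ~ (f[i+1] - f[i-1]) / (x[i+1] - x[i-1]).
    derivative = (normalized[2:] - normalized[:-2]) / (wavenumbers[2:] - wavenumbers[:-2])
    inner = wavenumbers[1:-1]  # the two end points have no symmetric neighbours

    # Keep only the two wavenumber ranges named in the text.
    keep = ((inner >= 600) & (inner <= 1800)) | ((inner >= 2800) & (inner <= 3000))
    return inner[keep], derivative[keep]


# Synthetic spectrum on a 1 cm^-1 grid from 4000 down to 600 cm^-1.
wn = np.arange(4000, 599, -1.0)
kept_wn, kept_derivative = preprocess(wn, 0.5 + 1e-4 * wn)
```

Taking the derivative removes a constant baseline offset, which is one reason it is a common choice when baseline drift is a concern.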
The preprocessed spectra were used to build a mathematical model for the identification of S. aureus in a culture sample. Another model capable of discriminating between MRSA and MSSA strains was constructed based on the spectra of MRSA and MSSA phenotypes of S. aureus. Both models exploited the spectra of pure cultures. The spectra of culture mixes were used for model validation.
Both models were built by applying principal component analysis (PCA) followed by linear discriminant analysis (LDA). LDA selects a line or a hyperplane that effectively separates two or more classes of data; the ratio of between-class to within-class variance indicates the reliability of the classification. However, the matrix calculations in LDA do not allow it to be applied to spectral data directly, because such datasets contain a large number of correlated features as well as regions of poor informative value. LDA should therefore be preceded by PCA, which extracts the most informative and uncorrelated features from the dataset. The informative value is assessed by variance: if the variance at a given wavenumber is low, almost all spectra behave identically there, so such regions cannot provide any valuable information. In PCA, informative and uncorrelated data are extracted by projecting the spectra onto the corresponding vectors (principal components). In effect, a model constructed with PCA-LDA is a projection of the spectral data onto a new vector; in practice, this amounts to calculating a linear combination with certain coefficients.
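The way PCA and LDA combine into a single coefficient vector can be illustrated with a small two-class sketch. It is written in Python with NumPy rather than with the R packages used in the study, assumes equal class priors and a midpoint decision threshold, and runs on synthetic data; the function names, the number of components, and the data are all hypothetical.

```python
import numpy as np


def fit_pca_lda(X, y, n_components):
    """Fit a two-class PCA-LDA model; return (mean, coef, threshold).

    X: (n_spectra, n_features) preprocessed spectra; y: array of 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean

    # PCA via SVD: the rows of Vt are the principal components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]      # (k, n_features)
    scores = Xc @ components.T          # (n_spectra, k), mutually uncorrelated

    # Fisher LDA on the scores: w = Sw^-1 (m1 - m0).
    s0, s1 = scores[y == 0], scores[y == 1]
    m0, m1 = s0.mean(axis=0), s1.mean(axis=0)
    Sw = (s0 - m0).T @ (s0 - m0) + (s1 - m1).T @ (s1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)

    # Collapse both steps into one coefficient per wavenumber.
    coef = components.T @ w             # (n_features,)
    threshold = 0.5 * (m0 + m1) @ w     # midpoint between projected class means
    return mean, coef, threshold


def predict(X, mean, coef, threshold):
    """Project spectra onto the discriminant and compare with the threshold."""
    return ((np.asarray(X, dtype=float) - mean) @ coef > threshold).astype(int)


# Synthetic data: 40 "spectra" with 200 features; class 1 differs in one band.
rng = np.random.default_rng(0)
X_demo = rng.normal(0.0, 0.2, size=(40, 200))
y_demo = np.repeat([0, 1], 20)
X_demo[y_demo == 1, 50:60] += 2.0

mean, coef, threshold = fit_pca_lda(X_demo, y_demo, n_components=5)
train_accuracy = (predict(X_demo, mean, coef, threshold) == y_demo).mean()
```

Here `coef` plays the role of the coefficient vector described in the text: a new spectrum is classified from a single linear combination of its values.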
The built models were cross-validated. Cross-validation is a series of blind tests: the initial dataset is randomly partitioned into k subsets; one of the resulting subsets (the test dataset) is set aside, and the other k – 1 subsets (the training data) are analyzed by PCA-LDA. The obtained model predicts the values for the test dataset as if they were initially unknown. This procedure is repeated for each of the k subsets. Averaging the results over all test subsets gives an estimate of how well the model will predict new unknown spectra. When splitting the spectra into subsets, all spectra from the same isolate must fall within one subset only; otherwise, the estimated accuracy will be higher than the actual one. Cross-validation of our model for S. aureus identification was performed at k = 20, i.e. the total set of spectra was divided into 20 subsets so that the spectra of one and the same isolate always fell into one subset only. For the model discriminating between MSSA and MRSA strains, each phenotype was represented by an equal number of isolates (k = 40). Thus, each subset represented only one isolate.
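The requirement that all spectra of one isolate stay in the same subset can be sketched as a grouped k-fold split. The Python code below is an illustration under stated assumptions, not the caret-based procedure used in the study: isolates are shuffled and dealt round-robin into folds, and `fit` and `score` are hypothetical callbacks supplied by the user.

```python
import random
from collections import defaultdict


def grouped_folds(isolate_ids, k, seed=0):
    """Split spectrum indices into k folds so that no isolate spans two folds."""
    by_isolate = defaultdict(list)
    for index, isolate in enumerate(isolate_ids):
        by_isolate[isolate].append(index)
    if k > len(by_isolate):
        raise ValueError("k must not exceed the number of isolates")

    isolates = sorted(by_isolate)
    random.Random(seed).shuffle(isolates)

    folds = [[] for _ in range(k)]
    for position, isolate in enumerate(isolates):
        # All spectra of one isolate go to the same fold.
        folds[position % k].extend(by_isolate[isolate])
    return folds


def cross_validate(isolate_ids, k, fit, score):
    """Average the held-out score over k grouped folds.

    fit(train_indices) -> model; score(model, test_indices) -> float.
    """
    results = []
    for test in grouped_folds(isolate_ids, k):
        held_out = set(test)
        train = [i for i in range(len(isolate_ids)) if i not in held_out]
        results.append(score(fit(train), test))
    return sum(results) / len(results)


# Eight spectra from four isolates, split into two folds.
ids = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = grouped_folds(ids, k=2)
```

When k equals the number of isolates, each fold holds exactly one isolate, which corresponds to the leave-one-isolate-out setting described for the MRSA/MSSA model.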
Preprocessing and the analysis of the obtained spectral data were done in R and the RStudio environment. Spectral data were handled by hyperSpec; the models were built and validated using caret and MASS. Images were created in ggplot2.
Identification of S. aureus
Based on the spectra obtained from 11 species that are the most common causative agents of infections and sepsis in humans (S. aureus, S. epidermidis, E. faecalis, E. faecium, K. pneumoniae, E. coli, S. marcescens, E. cloacae, A. baumannii, P. aeruginosa, and C. albicans), we built a mathematical model for S. aureus identification. The accuracy of the model was assessed by cross-validation on 20 subsets (the spectra of one and the same isolate fell into one subset only) and reached 98.4% (± 4%). However, further in-depth analysis of the cross-validation results revealed that almost all errors arose when the S. epidermidis isolate fell into the test sample. This means that at the genus level the model performs well considering the size and composition of the training dataset. The spectra of other staphylococci (two S. epidermidis isolates) were too poorly represented in the training dataset to let the model make accurate predictions at the species level. The worst accuracy (81%) observed during cross-validation corresponded to the case when the test dataset included the spectra of S. epidermidis, leaving the training set with insufficient data to establish a reliable difference between S. epidermidis and S. aureus.
From that, one might conclude that these two related species cannot be discriminated using our approach. But the final model that aggregated all obtained data and was built without cross-validation demonstrated an ability to discriminate between these 2 species with 100% accuracy. This ability is visually represented as the projection of the spectral data onto the linear discriminant (the separating axis in the PCA-LDA method; fig. 3).
This projection is basically a result of multiplication of each spectrum by a coefficient vector: if a preprocessed spectrum is a vector (a set of values), then the linear discriminant is a result of a linear combination. Model tuning is all about the choice of optimal coefficients. Their values for the obtained model are presented as a graph in fig. 4.
The visual representation of the coefficients serves to roughly interpret the obtained model: the larger the absolute value of a coefficient, the more significant the corresponding spectral range. Large absolute values (with due account of preprocessing and computation of the derivative) in the zone of negative coefficients indicate that the spectrum is not generated by S. aureus, and vice versa.
To estimate the feasibility of the proposed approach for clinical microbiology, we studied the ability of our model to identify a target pathogen in mixed cultures. For the analysis, we used 2 methicillin-resistant and 3 methicillin-sensitive S. aureus strains. The microorganisms were cultured on blood agar plates for 24 or 48 hours. Then equal volumes of S. aureus and gram-negative bacteria (E. coli, K. pneumoniae or P. aeruginosa) cultured on Endo agar were combined. The concentration of bacterial cells per unit volume was not measured. All mixed samples were fixed in alcohol and fed to the final model for identification (see the table).
In all spectra except 2 representing one and the same sample, the presence of a target pathogen was predicted with high probability, which indicates that the model is reliable and can be used in clinical practice.
Prediction of methicillin-resistance phenotype in S. aureus isolates
Prediction of a methicillin-resistance phenotype in staphylococci is a serious challenge faced by clinical microbiology. In this work we attempted to predict the MRSA/MSSA phenotypes in S. aureus isolates based on their FTIR spectra. The classification model was constructed in the same fashion, i.e. using PCA and LDA in succession followed by cross-validation.
We failed to achieve the same quality of predictions as with S. aureus identification. The accuracy of the model evaluated by cross-validation was 73%. The projection onto the linear discriminant is shown in fig. 5. Discrimination here was much worse than for S. aureus. Still, 80% of the spectra were identified accurately. This observation leads us to hypothesize that a larger training sample could raise the reliability of the identification to an acceptable level.
The first reports of FTIR application for the identification of microorganisms were published in 1991. The research works that followed were dedicated to the identification of bacteria, such as lactobacilli and agents of foodborne infections, in the environment [13, 14]. A few studies demonstrated that FTIR can be used to identify Mycobacteria and Listeria [15–17]. In 2011, with the arrival of commercially available spectrometers by Bruker (Germany) and Perkin-Elmer (USA) that reliably identified microorganisms from their FTIR spectra, the number of publications on the use of FTIR in microbiology started to grow [18–20]. Research groups were formed outside Germany in Poland, the UK [22, 23] and the Netherlands. The Dutch researchers were the first to attempt to identify the causative agents of sepsis in humans and to compare spectral resolutions of different vibrational spectroscopy techniques, including FTIR spectroscopy, Raman spectroscopy, and surface-enhanced Raman spectroscopy (SERS). The authors concluded that FTIR and Raman spectroscopies produced reliable results but were not as sensitive as SERS. In turn, although SERS proved to be a very sensitive technique, its reproducibility was poor.
Recently, a lot of research works have been published on the use of FTIR in clinical microbiology [25–28]. The first work listed here compares spectral resolutions of vibrational spectroscopy techniques, including SERS (accuracy of 74.9%), Raman spectroscopy (accuracy of 97.8%) and FTIR spectroscopy (accuracy of 96.2%), using a number of pathogenic and nonpathogenic bacteria: P. aeruginosa, P. putida, E. coli, E. faecium, Streptomyces lividans, B. subtilis, B. cereus, as well as baker’s yeast Saccharomyces cerevisiae. The last work from the list describes a method for the rapid identification of bacterial microcolonies of 50 to 300 μm in diameter using the state-of-the-art IR-BioTyper spectrometer (Bruker): the colonies are automatically transferred from the agarized culture medium to the CaF2 surface; the principal component analysis applied to the obtained spectral data is performed by an artificial neural network (ANN) accessible via the Bruker server.
The findings of those studies suggest that FTIR spectra comprehensively describe the chemical composition of cells, including biopolymers that are building blocks for cell walls and membranes, intracellular DNA, phospholipids, sugars, etc. and therefore ensure a) the reliable discrimination between pathogenic bacterial species; b) the accurate identification of microorganisms at the species level; c) the identification of the phylum the studied isolate belongs to using digital libraries of microbial spectra. Platforms for rapid testing based solely on IR spectroscopy data could provide a quick solution to these three tasks and assist in optimizing the treatment strategy and adapting it to an individual patient in order to avoid prescription of antibiotics ineffective against the causative pathogen. However, the transition of this approach from the research lab to the clinical setting is obstructed by the absence of an algorithm for automated analysis of microbial FTIR spectra. Such an algorithm is expected to identify those components of the spectrum that are determined by the genotype of the strain and not by culturing conditions, such as the growth medium composition, the growth phase, the degree of culture degradation, etc. In all works referred to above the authors sought to standardize culturing conditions, which is quite difficult to achieve in the real clinical setting and is also time-consuming.
Such an algorithm is proposed in the present work. It allows identification of bacterial species regardless of the growth phase and growth medium composition. We cultured a number of bacterial isolates of S. aureus, E. faecalis, E. faecium, K. pneumoniae, E. coli, S. marcescens, E. cloacae, A. baumannii, P. aeruginosa, S. epidermidis and C. albicans in different media and for different time periods. Using PCA, we identified the most informative regions of microbial FTIR spectra. The result of the analysis was represented as a system of coefficients that facilitated quick identification of new isolates from their FTIR spectra. The accuracy of the proposed method was assessed by the blind test using pure cultures of S. aureus isolates and their paired mixes with P. aeruginosa, E. coli and K. pneumoniae.
The obtained results demonstrate that the proposed algorithm for the analysis of microbial FTIR spectra reliably identifies the presence of S. aureus in the culture regardless of the duration of culturing (24 or 48 hours) after being trained on the sample of 11 pathogens representing different phyla (bacteria and ascomycete yeasts). All samples were inactivated in 70% ethanol before their spectra were recorded. This makes manipulations with virulent pathogens safer and stabilizes the samples until further analysis. The presence of whole blood and admixtures of other microorganisms (gram-negative E. coli, K. pneumoniae and P. aeruginosa) in the sample at concentrations more or less equal to the concentration of S. aureus does not affect the ability of the proposed algorithm to identify the pathogen of interest. The model predicts the presence of a methicillin-resistant phenotype (MRSA/MSSA) with 80% accuracy. We hope that our algorithm will be capable of identifying any other pathogen cultured in any media after expanding the training set.
We have described a method for creating a database of microbial FTIR spectra and a comparison algorithm suitable for the identification of pathogenic microorganisms that discriminates between the species regardless of the culture growth phase or medium composition. This algorithm can be used in combination with the standard and affordable spectrometer Spectrum Two (Perkin-Elmer; USA). We have tested our algorithm on the clinical isolates of S. aureus, which were reliably discriminated from other causative agents of infections, including E. faecalis, E. faecium, K. pneumoniae, E. coli, S. marcescens, E. cloacae, A. baumannii, P. aeruginosa, and C. albicans, taken as pure cultures and paired mixes.