Extracellular vesicles (EVs) have great potential as a source of clinically relevant biomarkers, since they can be readily isolated from biofluids and carry microRNA (miRNA), mRNA, and proteins that can reflect disease status. However, the biological and technical variability of EV content is unknown, which makes comparisons between healthy subjects and patients difficult to interpret. In this study, AbbVie Bioresearch Center researchers sought to establish a laboratory and bioinformatics pipeline to analyse the small RNA content of EVs from patient serum that could serve as biomarkers, and to assess the biological and technical variability of EV RNA content in healthy individuals. They sequenced EV small RNA from multiple individuals (biological replicates) and sequenced multiple replicates per individual (technical replicates) using the Illumina Truseq protocol. The replicates clustered by subject, indicating that the biological variability (~95%) was greater than the technical variability (~0.50%). About 30% of the sequencing reads were miRNAs. The researchers evaluated the technical parameters of sequencing by spiking the EV RNA preparation with a mix of synthetic small RNAs, and found a disconnect between the input concentration of the spike-in RNA and the sequencing read frequencies, indicating that bias was introduced during library preparation. To determine whether library preparation platforms differ, they compared the Truseq protocol with the Nextflex protocol, which had been designed to reduce bias in library preparation. While both methods were technically robust, the Nextflex protocol reduced the bias and exhibited a linear range across input concentrations of the synthetic spike-ins. Altogether, these results indicate that technical variability is much smaller than biological variability, supporting the use of EV small RNAs as potential biomarkers.
These findings also indicate that the choice of library preparation method introduces artificial differences into the resulting datasets, invalidating the comparability of sequencing data across library preparation platforms.
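The comparison of biological and technical variability described above can be illustrated with a small variance decomposition. The following is a minimal sketch, not the authors' pipeline: it assumes a balanced design (the same number of technical replicates per donor) and uses a standard method-of-moments estimate for a one-way random-effects model. The function name `variance_components` and the example values are hypothetical.

```python
from statistics import mean


def variance_components(replicates_by_subject):
    """Split variance into biological (between-subject) and technical
    (within-subject) fractions for one miRNA.

    Assumes a balanced design: every subject has the same number of
    technical replicates. Method-of-moments estimate, one-way random effects.
    """
    sizes = {len(values) for values in replicates_by_subject.values()}
    if len(sizes) != 1:
        raise ValueError("every subject needs the same number of replicates")
    n = sizes.pop()                  # technical replicates per subject
    k = len(replicates_by_subject)   # number of subjects
    if k < 2 or n < 2:
        raise ValueError("need at least 2 subjects and 2 replicates each")

    subject_means = [mean(v) for v in replicates_by_subject.values()]
    grand_mean = mean(subject_means)  # valid because the design is balanced

    # Mean squares between subjects and within subjects.
    ms_between = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    ms_within = sum(
        (x - mean(values)) ** 2
        for values in replicates_by_subject.values()
        for x in values
    ) / (k * (n - 1))

    technical = ms_within
    biological = max(0.0, (ms_between - ms_within) / n)  # clamp at zero
    total = biological + technical
    if total == 0:
        raise ValueError("all values are identical; no variance to partition")
    return {"biological": biological / total, "technical": technical / total}


# Hypothetical log2 counts-per-million values for a single miRNA,
# three donors with three technical replicates each.
example = {
    "donor_A": [10.1, 10.0, 10.2],
    "donor_B": [12.0, 12.1, 11.9],
    "donor_C": [8.0, 8.1, 7.9],
}
fractions = variance_components(example)
print(fractions)
```

When replicates from the same donor sit close together and donors differ from each other, as in the clustering reported above, the biological fraction dominates.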
Choice of library preparation impacts the sequencing results obtained by small RNA-Seq
(a) Principal component analysis shows the biological and technical variability between samples after sequencing with the Truseq and Nextflex kits. The samples were batch corrected by ComBat to account for batch-specific differences, and the plot is coloured by donor and kit. (b) The number of unique miRNAs detected is higher with the Nextflex kit than with the Truseq kit (p-value = 0.01). (c) Volcano plots of differential gene expression between the Truseq and Nextflex kits are shown. The plots illustrate the log10 Benjamini–Hochberg corrected p-value vs. the log2 change of transcript abundance. Red indicates miRNAs that differ between kits. (d) Heatmap of the top differentially expressed genes using the Nextflex and Truseq kits is shown. The horizontal bar on top depicts the samples from each kit; the colour that represents each kit is given in the legend on the left of the figure. (e) Representative box plots are shown of two differentially expressed miRNAs that were detected either by the Nextflex kit alone or by both the Nextflex and Truseq kits. The y-axis shows the counts per million reads of the two miRNAs as detected by each library preparation kit. (f) Scatter plot measuring miRNA expression by sequencing (measured in log counts per million, coloured by kit) versus quantitative PCR shows that only Nextflex detected all 10 of the genes detected by qPCR. The Pearson correlation for the Nextflex kit was 0.1 against qPCR, while the correlation for Truseq versus qPCR was 0.05.
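The quantities named in panels (c) and (e), counts per million, Benjamini–Hochberg corrected p-values, and log2 change in abundance, can be computed as in the sketch below. This is an illustration under stated assumptions rather than the analysis used for the figure: the function names, the pseudocount of 1, and the use of the conventional negative log10 on the volcano plot's y-axis are all choices made here for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts for one sample so they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_point(mean_cpm_a, mean_cpm_b, adjusted_p, pseudocount=1.0):
    """Return (log2 fold change of B over A, -log10 adjusted p-value).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for miRNAs that one kit does not detect.
    """
    log2_fc = math.log2((mean_cpm_b + pseudocount) / (mean_cpm_a + pseudocount))
    return log2_fc, -math.log10(adjusted_p)


# Hypothetical inputs, for illustration only.
cpm = counts_per_million([1, 1, 2])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
point = volcano_point(3.0, 7.0, adjusted[0])
print(cpm, adjusted, point)
```

Established packages provide tested versions of these steps; the hand-written functions here only make the arithmetic behind the caption explicit.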