Remove chimera

Context

Chimeras are artifact sequences formed by two or more biological sequences incorrectly joined together. This often occurs during PCR reactions using mixed templates (i.e., uncultured environmental samples). Incomplete extensions during PCR allow subsequent PCR cycles to use a partially extended strand to bind to the template of a different, but similar, sequence. This partially extended strand then acts as a primer to extend and form a chimeric sequence. Once created, the chimeric sequence is then further amplified in subsequent cycles. The end result is a PCR artifact that does not represent a sequence that exists in nature. This phenomena is particularly common in amplicon sequencing where closely related sequences are amplified.

How it does

This tool removes chimeric sequences by sample.

De novo detection: In this algorithm, a chimera-free reference database is automatically generated for each NGS data. Initially, the reference database is empty. Then, NGS reads are considered in the order of decreasing abundance. If a sequence is classified as chimeric, it is discarded; otherwise, it is added to the reference database (so the size of the reference database grows). Candidate parents (PCR templates, strains A and B in the previous figure) are required to have more abundance than that of the query sequence, on the assumption that a chimera has undergone fewer rounds of amplification and will, therefore, be less abundant than its parents (Edgar et al., 2011).

The chimera detection is performed with VSEARCH combined with a homemade strategy. The chimera detection is performed sample by sample and a cross-validation is performed to remove only chimera identified in all samples where they are present.

Configuration: Short reads (16S V3V4 use cases)

The chimera detection is performed with vsearch.

mkdir FROGS/SWARM/
sbatch -J chimera -o LOGS/chimera.out -e LOGS/chimera.err -c 8 --export=ALL --wrap="module load devel/Miniforge/Miniforge3 && module load bioinfo/FROGS/FROGS-v5.0.2 && remove_chimera.py --input-fasta FROGS/denoising_swarm.fasta --input-biom FROGS/denoising_swarm.biom --output-fasta FROGS/SWARM/remove_chimera.fasta --nb-cpus 8 --log-file FROGS/SWARM/remove_chimera.log --output-biom FROGS/SWARM/remove_chimera.biom --html FROGS/SWARM/remove_chimera.html && module unload bioinfo/FROGS/FROGS-v5.0.2"
(to see all settings: remove_chimera.py --help)



Interpretation: Short reads (16S V3V4 use cases)

Let look at the HTML file to see the result of remove chimera.
You have three panels: Summary, Cluster distribution, and Sample distribution. Here, we will focus primarily on the Summary panel.

Since the Clusters distribution and Samples distribution panels are common to multiple tools, a more detailed interpretation with vizualisation of these panels can be found in the Cluster Stat section .


These graphs illustrate the number of chimeric clusters and sequences.
  • ~10% of sequences (26.3% of clusters) are chimeric
  • Chimeric clusters are made of few sequences

Chimera detection by sample


You can see details on chimeric clusters and sequences within each sample. You can sort the columns by clicking on result cluster statistics.

Configuration: Short reads (ITS use cases)

The chimera detection is performed with vsearch.

mkdir FROGS/ITS/
sbatch -J chimera -o LOGS/chimera.out -e LOGS/chimera.err -c 8 --export=ALL --wrap="module load devel/Miniforge/Miniforge3 && module load bioinfo/FROGS/FROGS-v5.0.2 && remove_chimera.py --input-fasta FROGS/denoising_its.fasta --input-biom FROGS/denoising_its.biom --output-fasta FROGS/ITS/remove_chimera.fasta --nb-cpus 8 --log-file FROGS/ITS/remove_chimera.log --output-biom FROGS/ITS/remove_chimera.biom --html FROGS/ITS/remove_chimera.html && module unload bioinfo/FROGS/FROGS-v5.0.2"
(to see all settings: remove_chimera.py --help)



Interpretation: Short reads (ITS use cases)

Let look at the HTML file to see the result of remove chimera.
You have three panels: Summary, Cluster distribution, and Sample distribution. Here, we will focus primarily on the Summary panel.

Since the Clusters distribution and Samples distribution panels are common to multiple tools, a more detailed interpretation with vizualisation of these panels can be found in the Cluster Stat section .


These graphs illustrate the number of chimeric clusters and sequences.
  • ~1% of sequences (1.5% of clusters) are chimeric

Chimera detection by sample


You can see details on chimeric clusters and sequences within each sample. You can sort the columns by clicking on result cluster statistics.