Affiliation postprocessing

Context

This is a post-processing tool used in microbial community analysis pipelines (e.g., after ASV detection and taxonomic affiliation). Its main purpose is to refine taxonomic affiliations by:
  • Detecting and resolving nested amplicons (ASVs fully contained within longer sequences).
  • Aggregating redundant ASVs that have identical or very similar affiliations, avoiding overrepresentation of biologically similar sequences.
This is especially important when analyzing 16S rRNA gene data, where overlapping amplicons or highly similar ASVs can artificially inflate diversity.

How it works

It compares ASVs based on:
  • Sequence identity
  • Alignment coverage
If two or more ASVs meet the minimum identity and coverage thresholds (set with --identity and --coverage) and share the same taxonomic information, they are aggregated into a single representative ASV.
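
The aggregation rule described above can be sketched in Python. This is only an illustration of the decision, not the FROGS implementation. The `PairwiseComparison` record, the `should_aggregate` function, and the inclusive (>=) comparison are assumptions made for this example. The default thresholds of 98 mirror the values passed in the command in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PairwiseComparison:
    """Hypothetical record for one ASV-vs-ASV comparison (not a FROGS type)."""
    identity: float              # percent identity of the alignment
    coverage: float              # percent alignment coverage
    taxonomy_a: Tuple[str, ...]  # taxonomic ranks of the first ASV
    taxonomy_b: Tuple[str, ...]  # taxonomic ranks of the second ASV


def should_aggregate(cmp: PairwiseComparison,
                     min_identity: float = 98.0,
                     min_coverage: float = 98.0) -> bool:
    """Return True when both thresholds are met and the taxonomies match.

    Assumption: thresholds are inclusive (>=).
    """
    return (
        cmp.taxonomy_a == cmp.taxonomy_b
        and cmp.identity >= min_identity
        and cmp.coverage >= min_coverage
    )
```

Under these assumptions, two ASVs with the same taxonomy, 99% identity and 100% coverage would be aggregated. A pair with 97.5% identity, or with differing taxonomies, would be kept separate.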

Configuration: 16S V3V4 Swarm

Refines affiliations to handle amplicons contained within other sequences and to deal with supernumerary ASVs (ASVs with the same affiliation).

sbatch -J a_postprocess \
  -o LOGS/affiliation_postprocess.out \
  -e LOGS/affiliation_postprocess.err \
  -c 8 --export=ALL \
  --wrap="module load devel/Miniforge/Miniforge3 && \
    module load bioinfo/FROGS/FROGS-v5.0.2 && \
    affiliation_postprocess.py \
      --input-fasta FROGS/SWARM/filters.fasta \
      --input-biom FROGS/SWARM/affiliation.biom \
      --log-file FROGS/SWARM/affiliation_postprocess.log \
      --output-biom FROGS/SWARM/affiliation_postprocess.biom \
      --output-fasta FROGS/SWARM/affiliation_postprocess.fasta \
      --identity 98 --coverage 98 && \
    module unload bioinfo/FROGS/FROGS-v5.0.2"
(to see all settings: affiliation_postprocess.py --help)
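
Once the job has finished, one quick sanity check is to compare the number of sequences in the input and output FASTA files. The following is a minimal sketch and not part of FROGS. The function names are hypothetical, and it only assumes the standard FASTA convention that each record starts with a ">" header line.

```python
def count_fasta_records(lines):
    """Count FASTA records in an iterable of lines (one '>' header per record)."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count FASTA records in the file at `path`."""
    with open(path) as handle:
        return count_fasta_records(handle)


def report_aggregation(before_path, after_path):
    """Return (records before, records after, difference) for two FASTA files."""
    before = count_fasta_file(before_path)
    after = count_fasta_file(after_path)
    return before, after, before - after


# Example call, using the paths from the command above:
# report_aggregation("FROGS/SWARM/filters.fasta",
#                    "FROGS/SWARM/affiliation_postprocess.fasta")
```

The third value is the drop in the number of sequences between the two files. A value of zero suggests that no ASVs met the thresholds in this run.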