PICRUSt2 estimation of function abundances

Context

frogsfunc_functions.py is a functional profiling tool that predicts per-sample microbial community functions based on ASV sequences, marker genes, and phylogenetic placement. It supports multiple marker types (16S, ITS, 18S) and can use different functional databases (EC, KO, COG, PFAM, TIGRFAM, PHENO). The tool estimates the abundance of functions, normalizes ASV abundances by marker copy numbers, and stratifies contributions for detailed community-wide functional analysis.

How it works

The program takes as input a BIOM file of ASV abundances, a FASTA file of ASV sequences, a phylogenetic tree containing both ASVs and reference sequences, and a marker table of predicted marker gene copy numbers. For 16S, functions are predicted from the selected databases (e.g., EC, KO). For ITS and 18S, a table of directly observed functions can be supplied with --input-function-table. A hidden-state prediction (HSP) method (e.g., maximum parsimony or subtree averaging) infers the functions of each ASV from its position in the tree. ASVs whose NSTI (Nearest Sequenced Taxon Index) exceeds --max-nsti, or that fail the optional BLAST identity and coverage thresholds, are excluded; ASVs below the --min-reads or --min-samples cut-offs are grouped into a "RARE" category in the stratified output. Outputs include function abundances, normalized ASV abundances, weighted NSTI summaries, stratified contributions, BIOM and FASTA files of the retained ASVs, the list of excluded ASVs, a log, and an HTML report.
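
The normalization and abundance steps described above can be illustrated with a small conceptual sketch. It is not the tool's actual code: plain dictionaries stand in for the BIOM file and the marker table, and the function names, ASV identifiers, and function identifiers are hypothetical. It assumes that each ASV abundance is divided by its predicted marker copy number and then multiplied by the predicted copy number of each function.

```python
# Conceptual sketch only: plain dicts stand in for the BIOM and marker tables.

def normalize_by_marker(abundances, marker_copies):
    """Divide each ASV's per-sample counts by its predicted marker copy number."""
    return {
        asv: {sample: count / marker_copies[asv] for sample, count in samples.items()}
        for asv, samples in abundances.items()
    }


def function_abundances(normalized, function_copies):
    """Sum (normalized ASV abundance x predicted function copies) per function and sample."""
    totals = {}
    for asv, samples in normalized.items():
        for function, copies in function_copies.get(asv, {}).items():
            per_sample = totals.setdefault(function, {})
            for sample, abundance in samples.items():
                per_sample[sample] = per_sample.get(sample, 0.0) + abundance * copies
    return totals


# Hypothetical toy data: two ASVs, two samples, two functions.
abundances = {"ASV_1": {"S1": 10, "S2": 4}, "ASV_2": {"S1": 6, "S2": 0}}
marker_copies = {"ASV_1": 2, "ASV_2": 3}
function_copies = {
    "ASV_1": {"EC:1.1.1.1": 1},
    "ASV_2": {"EC:1.1.1.1": 2, "EC:2.7.7.7": 1},
}

normalized = normalize_by_marker(abundances, marker_copies)
predicted = function_abundances(normalized, function_copies)
print(normalized)
print(predicted)
```

In this toy case ASV_1 contributes 5 normalized units to sample S1 and ASV_2 contributes 2, so EC:1.1.1.1 reaches 5 x 1 + 2 x 2 = 9 in S1.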

Command lines


      usage: frogsfunc_functions.py [-h] [--version] [--debug] [--nb-cpus NB_CPUS]
                                    [--strat-out] --input-biom INPUT_BIOM
                                    --input-fasta INPUT_FASTA --input-tree
                                    INPUT_TREE --input-marker INPUT_MARKER
                                    --marker-type {16S,ITS,18S}
                                    [--functions FUNCTIONS]
                                    [--input-function-table INPUT_FUNCTION_TABLE]
                                    [--hsp-method {mp,emp_prob,pic,scp,subtree_average}]
                                    [--max-nsti MAX_NSTI]
                                    [--min-blast-ident MIN_BLAST_IDENT]
                                    [--min-blast-cov MIN_BLAST_COV]
                                    [--min-reads INT] [--min-samples INT]
                                    [--output-function-abund OUTPUT_FUNCTION_ABUND]
                                    [--output-asv-norm OUTPUT_ASV_NORM]
                                    [--output-weighted OUTPUT_WEIGHTED]
                                    [--output-contrib OUTPUT_CONTRIB]
                                    [--output-biom OUTPUT_BIOM]
                                    [--output-fasta OUTPUT_FASTA]
                                    [--output-excluded OUTPUT_EXCLUDED]
                                    [--log-file LOG_FILE] [--html HTML]

      Per-sample functional profiles prediction.

      optional arguments:
        -h, --help            show this help message and exit
        --version             show program's version number and exit
        --debug               Keep temporary files to debug program. [Default:
                              False]
        --nb-cpus NB_CPUS     The maximum number of CPUs used. [Default: 1]
        --strat-out           If activated, a new table is built. It will contain
                              the abundances of each function of each ASV in each
                              sample. [Default: False]

      Inputs:
        --input-biom INPUT_BIOM
                              frogsfunc_placeseqs Biom output file
                              (frogsfunc_placeseqs.biom).
        --input-fasta INPUT_FASTA
                              frogsfunc_placeseqs Fasta output file
                              (frogsfunc_placeseqs.fasta).
        --input-tree INPUT_TREE
                              frogsfunc_placeseqs output tree in newick format
                              containing both studied sequences (i.e. ASVs) and
                              reference sequences.
        --input-marker INPUT_MARKER
                              Table of predicted marker gene copy numbers
                              (frogsfunc_placeseqs output : frogsfunc_marker.tsv).
        --marker-type {16S,ITS,18S}
                              Marker gene to be analyzed.
        --hsp-method {mp,emp_prob,pic,scp,subtree_average}
                              HSP method to use. mp: predict discrete traits using
                              max parsimony. emp_prob: predict discrete traits based
                              on empirical state probabilities across tips.
                              subtree_average: predict continuous traits using
                              subtree averaging. pic: predict continuous traits with
                              phylogenetic independent contrast. scp: reconstruct
                              continuous traits using squared-change parsimony
                              [Default: mp].
        --max-nsti MAX_NSTI   Sequences with NSTI values above this value will be
                              excluded [Default: 2.0].
        --min-blast-ident MIN_BLAST_IDENT
                              Sequences with blast percentage identity against the
                              PICRUSt2 closest ref below this value will be excluded
                              (between 0 and 1). [Default: None]
        --min-blast-cov MIN_BLAST_COV
                              Sequences with blast percentage coverage against the
                              PICRUSt2 closest ref below this value will be excluded
                              (between 0 and 1). [Default: None]
        --min-reads INT       Minimum number of reads across all samples for each
                              input ASV. ASVs below this cut-off will be counted as
                              part of the "RARE" category in the stratified output.
                              If you choose 1, no ASV will be grouped in the "RARE"
                              category. [Default: 1].
        --min-samples INT     Minimum number of samples that an ASV needs to be
                              identified within. ASVs below this cut-off will be
                              counted as part of the "RARE" category in the
                              stratified output. If you choose 1, no ASV will be
                              grouped in the "RARE" category. [Default: 1].

      16S :
        --functions FUNCTIONS
                              Specifies which function databases should be used
                              (EC). Available indices : 'EC', 'KO', 'COG', 'PFAM',
                              'TIGRFAM', 'PHENO'. EC is used by default because
                              necessary for frogsfunc_pathways. At least EC or KO is
                              required. To run the command with several functions,
                              separate the functions with commas (ex: --functions EC,PFAM).
                              [Default: EC]

      ITS and 18S :
        --input-function-table INPUT_FUNCTION_TABLE
                              The path to input functions table describing directly
                              observed functions, in tab-delimited format (ex:
                              $PICRUSt2_PATH/default_files/fungi/ec_ITS_counts.txt.gz).

      Outputs:
        --output-function-abund OUTPUT_FUNCTION_ABUND
                              Output file for function prediction abundances.
                              [Default: frogsfunc_functions_unstrat.tsv].
        --output-asv-norm OUTPUT_ASV_NORM
                              Output file with asv abundances normalized by marker
                              copies number. [Default:
                              frogsfunc_functions_marker_norm.tsv]
        --output-weighted OUTPUT_WEIGHTED
                              Output file with the mean of nsti value per sample
                              (format: TSV). [Default:
                              frogsfunc_functions_weighted_nsti.tsv]
        --output-contrib OUTPUT_CONTRIB
                              Stratified output that reports asv contributions to
                              community-wide function abundances (ex
                              pred_function_asv_contrib.tsv). [Default: None]
        --output-biom OUTPUT_BIOM
                              Biom file without excluded ASVs (NSTI, blast perc
                              identity or blast perc coverage thresholds). (format:
                              BIOM) [Default: frogsfunc_function.biom]
        --output-fasta OUTPUT_FASTA
                              Fasta file without excluded ASVs (NSTI, blast perc
                              identity or blast perc coverage thresholds). (format:
                              FASTA). [Default: frogsfunc_function.fasta]
        --output-excluded OUTPUT_EXCLUDED
                              List of ASVs with NSTI values above NSTI threshold
                              (--max-nsti MAX_NSTI). [Default:
                              frogsfunc_functions_excluded.txt]
        --log-file LOG_FILE   List of commands executed. [Default: stdout]
        --html HTML           Path to store resulting html file. [Default:
                              frogsfunc_functions_summary.html]
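
The --min-reads and --min-samples cut-offs described in the help above can be sketched as follows. This is an illustration of the documented rule, not the tool's implementation: the function name and the toy abundance table are hypothetical, and "below this cut-off" is read as a strict comparison, which matches the statement that a value of 1 groups no ASV.

```python
def rare_asvs(abundances, min_reads=1, min_samples=1):
    """Return the ASVs that would be grouped into the "RARE" category."""
    rare = set()
    for asv, samples in abundances.items():
        total_reads = sum(samples.values())
        # An ASV counts as present in a sample when it has at least one read there.
        n_samples = sum(1 for count in samples.values() if count > 0)
        if total_reads < min_reads or n_samples < min_samples:
            rare.add(asv)
    return rare


# Hypothetical toy table: ASV -> sample -> read count.
abundances = {
    "ASV_1": {"S1": 10, "S2": 4},
    "ASV_2": {"S1": 1, "S2": 0},
    "ASV_3": {"S1": 3, "S2": 2},
}

default_rare = rare_asvs(abundances)
strict_rare = rare_asvs(abundances, min_reads=5, min_samples=2)
print(default_rare)
print(strict_rare)
```

With the default values nothing is grouped. With min_reads=5 and min_samples=2, only ASV_2 (one read, one sample) falls below the cut-offs.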

        

Example of command line:

frogsfunc_functions.py \
--input-biom frogsfunc_placeseqs.biom --input-fasta frogsfunc_placeseqs.fasta \
--input-tree frogsfunc_tree.nwk --input-marker frogsfunc_marker.tsv \
--marker-type 16S --functions EC,KO \
--output-function-abund frogsfunc_functions_unstrat.tsv --output-asv-norm frogsfunc_functions_marker_norm.tsv \
--output-weighted frogsfunc_functions_weighted_nsti.tsv --output-contrib pred_function_asv_contrib.tsv \
--output-biom frogsfunc_function.biom --output-fasta frogsfunc_function.fasta \
--output-excluded frogsfunc_functions_excluded.txt --html frogsfunc_functions_summary.html
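
When the tool is called from a script, the marker-specific options can be selected programmatically. The helper below is a hypothetical convenience wrapper, not part of FROGS. It only uses flags listed in the help above, and it assumes that --functions applies to 16S while --input-function-table applies to ITS and 18S. The function table path is a placeholder.

```python
import shlex


def build_command(biom, fasta, tree, marker, marker_type,
                  functions=None, function_table=None):
    """Assemble the argument list for frogsfunc_functions.py (hypothetical helper)."""
    if marker_type not in ("16S", "ITS", "18S"):
        raise ValueError(f"unsupported marker type: {marker_type}")
    cmd = [
        "frogsfunc_functions.py",
        "--input-biom", biom,
        "--input-fasta", fasta,
        "--input-tree", tree,
        "--input-marker", marker,
        "--marker-type", marker_type,
    ]
    if marker_type == "16S":
        # Several databases are passed as a single comma-separated value.
        cmd += ["--functions", ",".join(functions or ["EC"])]
    elif function_table is not None:
        cmd += ["--input-function-table", function_table]
    return cmd


cmd_16s = build_command(
    "frogsfunc_placeseqs.biom", "frogsfunc_placeseqs.fasta",
    "frogsfunc_tree.nwk", "frogsfunc_marker.tsv", "16S",
    functions=["EC", "KO"],
)
cmd_its = build_command(
    "frogsfunc_placeseqs.biom", "frogsfunc_placeseqs.fasta",
    "frogsfunc_tree.nwk", "frogsfunc_marker.tsv", "ITS",
    function_table="path/to/ec_ITS_counts.txt.gz",  # placeholder path
)
print(shlex.join(cmd_16s))
print(shlex.join(cmd_its))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with file names.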
        

Outputs

Function abundance file (--output-function-abund): predicted functional abundances per sample.
Normalized ASV file (--output-asv-norm): ASV abundances normalized by marker copy number.
Weighted NSTI file (--output-weighted): mean NSTI value per sample, weighted by ASV abundance.
Contribution file (--output-contrib): ASV contributions to community-wide functions.
BIOM file (--output-biom): filtered ASVs in BIOM format.
FASTA file (--output-fasta): filtered ASV sequences.
Excluded ASVs (--output-excluded): list of ASVs removed due to NSTI or BLAST thresholds.
HTML report (--html): summary of functional profiling and metrics.
Log file (--log-file): records all commands executed and processing steps.
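
To make the excluded-ASV list and the weighted NSTI file more concrete, here is a minimal sketch of the two calculations. It is an illustration under stated assumptions, not the tool's code: the function names and toy values are hypothetical, ASVs are excluded when their NSTI is strictly above the threshold, and the per-sample mean is weighted by the abundances of the retained ASVs.

```python
def filter_by_nsti(nsti, max_nsti=2.0):
    """Split ASVs into kept and excluded sets according to the NSTI threshold."""
    kept = {asv for asv, value in nsti.items() if value <= max_nsti}
    excluded = set(nsti) - kept
    return kept, excluded


def weighted_nsti(abundances, nsti, kept):
    """Abundance-weighted mean NSTI per sample, over the retained ASVs only."""
    totals, weighted = {}, {}
    for asv in kept:
        for sample, count in abundances[asv].items():
            totals[sample] = totals.get(sample, 0) + count
            weighted[sample] = weighted.get(sample, 0.0) + count * nsti[asv]
    # Samples left with no reads after filtering are skipped.
    return {s: weighted[s] / totals[s] for s in totals if totals[s] > 0}


# Hypothetical toy data.
nsti = {"ASV_1": 0.5, "ASV_2": 0.25, "ASV_3": 2.5}
abundances = {
    "ASV_1": {"S1": 3, "S2": 1},
    "ASV_2": {"S1": 1, "S2": 1},
    "ASV_3": {"S1": 100, "S2": 100},
}

kept, excluded = filter_by_nsti(nsti, max_nsti=2.0)
per_sample = weighted_nsti(abundances, nsti, kept)
print(sorted(excluded))
print(per_sample)
```

Here ASV_3 is excluded despite its high abundance, and sample S1 gets (3 x 0.5 + 1 x 0.25) / 4 = 0.4375.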