PICRUSt2 estimation of function abundances
Context
frogsfunc_functions.py is a functional profiling tool that predicts per-sample microbial community functions from ASV sequences, marker genes and phylogenetic placement. It supports three marker types (16S, ITS, 18S). For 16S, it can use several function databases (EC, KO, COG, PFAM, TIGRFAM, PHENO); for ITS and 18S, it uses a function table supplied by the user. The tool normalizes ASV abundances by marker copy number, estimates the abundance of each function in each sample, and can optionally report per-ASV (stratified) contributions to these community-wide function abundances.
How it works
The program takes as input a BIOM file of ASV abundances, a FASTA file of ASV sequences, a phylogenetic tree containing both the ASVs and the reference sequences, and a table of predicted marker gene copy numbers (all produced by frogsfunc_placeseqs). For 16S, functions are predicted from the selected databases (e.g. EC, KO); for ITS and 18S, the reference function table is supplied with --input-function-table. A hidden-state prediction (HSP) method (e.g. maximum parsimony or subtree averaging) then infers the function content of each ASV from its position in the tree. ASVs whose NSTI exceeds --max-nsti, or that fail the optional BLAST identity or coverage thresholds, are excluded; ASVs below the --min-reads or --min-samples cut-offs are not removed but are grouped into a "RARE" category in the stratified output. Outputs include function abundances, marker-normalized ASV abundances, per-sample weighted NSTI, optional stratified contributions, BIOM and FASTA files of the retained ASVs, a log file and an HTML report.
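The core arithmetic can be illustrated in Python, the language the tool itself is written in. This is a minimal sketch, not the tool's code: it assumes PICRUSt2-style logic in which each ASV's count is divided by its predicted marker copy number, multiplied by the predicted copy number of each function in that ASV, and summed over ASVs in each sample. The per-ASV terms are what a stratified table reports. The function names, the nested-dictionary layout and the EC identifiers are hypothetical; the real tool reads BIOM/TSV files and relies on PICRUSt2 for this step.

```python
from collections import defaultdict


def normalize_by_marker(counts, marker_copies):
    """Divide each ASV's per-sample count by its predicted marker copy number.

    counts: {asv: {sample: reads}}; marker_copies: {asv: predicted copies}.
    """
    return {
        asv: {sample: reads / marker_copies[asv] for sample, reads in per_sample.items()}
        for asv, per_sample in counts.items()
    }


def predict_functions(norm_counts, function_copies):
    """Return (unstratified, stratified) function abundances.

    function_copies: {asv: {function: predicted copies in that ASV}}.
    stratified[(sample, function, asv)] holds one ASV's non-zero contribution;
    unstratified[sample][function] is the sum of contributions over all ASVs.
    """
    unstratified = defaultdict(lambda: defaultdict(float))
    stratified = {}
    for asv, per_sample in norm_counts.items():
        for function, copies in function_copies.get(asv, {}).items():
            for sample, abundance in per_sample.items():
                contribution = abundance * copies
                unstratified[sample][function] += contribution
                if contribution:
                    stratified[(sample, function, asv)] = contribution
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {s: dict(f) for s, f in unstratified.items()}, stratified


# Hypothetical toy data: two ASVs, two samples, two EC numbers.
counts = {"ASV_1": {"S1": 10, "S2": 0}, "ASV_2": {"S1": 4, "S2": 8}}
marker_copies = {"ASV_1": 2, "ASV_2": 4}
function_copies = {"ASV_1": {"EC:1.1.1.1": 1},
                   "ASV_2": {"EC:1.1.1.1": 2, "EC:2.7.7.7": 1}}

norm = normalize_by_marker(counts, marker_copies)
unstrat, strat = predict_functions(norm, function_copies)
print(unstrat)
# {'S1': {'EC:1.1.1.1': 7.0, 'EC:2.7.7.7': 1.0}, 'S2': {'EC:1.1.1.1': 4.0, 'EC:2.7.7.7': 2.0}}
```

Normalizing first matters: without it, an ASV from a genome carrying four marker copies would be counted as four organisms and its functions inflated accordingly.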
Command line
usage: frogsfunc_functions.py [-h] [--version] [--debug] [--nb-cpus NB_CPUS]
[--strat-out] --input-biom INPUT_BIOM
--input-fasta INPUT_FASTA --input-tree
INPUT_TREE --input-marker INPUT_MARKER
--marker-type {16S,ITS,18S}
[--functions FUNCTIONS]
[--input-function-table INPUT_FUNCTION_TABLE]
[--hsp-method {mp,emp_prob,pic,scp,subtree_average}]
[--max-nsti MAX_NSTI]
[--min-blast-ident MIN_BLAST_IDENT]
[--min-blast-cov MIN_BLAST_COV]
[--min-reads INT] [--min-samples INT]
[--output-function-abund OUTPUT_FUNCTION_ABUND]
[--output-asv-norm OUTPUT_ASV_NORM]
[--output-weighted OUTPUT_WEIGHTED]
[--output-contrib OUTPUT_CONTRIB]
[--output-biom OUTPUT_BIOM]
[--output-fasta OUTPUT_FASTA]
[--output-excluded OUTPUT_EXCLUDED]
[--log-file LOG_FILE] [--html HTML]
Per-sample functional profiles prediction.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Keep temporary files to debug program. [Default:
False]
--nb-cpus NB_CPUS The maximum number of CPUs used. [Default: 1]
--strat-out If activated, a new table is built. It will contain
the abundances of each function of each ASV in each
sample. [Default: False]
Inputs:
--input-biom INPUT_BIOM
frogsfunc_placeseqs Biom output file
(frogsfunc_placeseqs.biom).
--input-fasta INPUT_FASTA
frogsfunc_placeseqs Fasta output file
(frogsfunc_placeseqs.fasta).
--input-tree INPUT_TREE
frogsfunc_placeseqs output tree in newick format
containing both studied sequences (i.e. ASVs) and
reference sequences.
--input-marker INPUT_MARKER
Table of predicted marker gene copy numbers
(frogsfunc_placeseqs output : frogsfunc_marker.tsv).
--marker-type {16S,ITS,18S}
Marker gene to be analyzed.
--hsp-method {mp,emp_prob,pic,scp,subtree_average}
HSP method to use. mp: predict discrete traits using
max parsimony. emp_prob: predict discrete traits based
on empirical state probabilities across tips.
subtree_average: predict continuous traits using
subtree averaging. pic: predict continuous traits with
phylogenetic independent contrasts. scp: reconstruct
continuous traits using squared-change parsimony.
[Default: mp].
--max-nsti MAX_NSTI Sequences with NSTI values above this value will be
excluded [Default: 2.0].
--min-blast-ident MIN_BLAST_IDENT
Sequences whose BLAST identity against the closest
PICRUSt2 reference is below this value will be
excluded (between 0 and 1). [Default: None]
--min-blast-cov MIN_BLAST_COV
Sequences whose BLAST coverage against the closest
PICRUSt2 reference is below this value will be
excluded (between 0 and 1). [Default: None]
--min-reads INT Minimum number of reads across all samples for each
input ASV. ASVs below this cut-off will be counted as
part of the "RARE" category in the stratified output.
With a value of 1, no ASV is grouped into the "RARE"
category. [Default: 1].
--min-samples INT Minimum number of samples in which an ASV must be
detected. ASVs below this cut-off will be
counted as part of the "RARE" category in the
stratified output. With a value of 1, no ASV is
grouped into the "RARE" category. [Default: 1].
16S :
--functions FUNCTIONS
Specifies which function databases should be used.
Available values: 'EC', 'KO', 'COG', 'PFAM',
'TIGRFAM', 'PHENO'. EC is used by default because it
is required by frogsfunc_pathways. At least EC or KO is
required. To use several databases, separate them
with commas (e.g. --functions EC,PFAM).
[Default: EC]
ITS and 18S :
--input-function-table INPUT_FUNCTION_TABLE
The path to the input function table describing directly
observed functions, in tab-delimited format (e.g.
$PICRUSt2_PATH/default_files/fungi/ec_ITS_counts.txt.gz).
Outputs:
--output-function-abund OUTPUT_FUNCTION_ABUND
Output file for function prediction abundances.
[Default: frogsfunc_functions_unstrat.tsv].
--output-asv-norm OUTPUT_ASV_NORM
Output file with ASV abundances normalized by marker
gene copy number. [Default:
frogsfunc_functions_marker_norm.tsv]
--output-weighted OUTPUT_WEIGHTED
Output file with the mean NSTI value per sample,
weighted by ASV abundance (format: TSV). [Default:
frogsfunc_functions_weighted_nsti.tsv]
--output-contrib OUTPUT_CONTRIB
Stratified output that reports ASV contributions to
community-wide function abundances (e.g.
pred_function_asv_contrib.tsv). [Default: None]
--output-biom OUTPUT_BIOM
BIOM file without the ASVs excluded by the NSTI, BLAST
identity or BLAST coverage thresholds. (format:
BIOM) [Default: frogsfunc_function.biom]
--output-fasta OUTPUT_FASTA
FASTA file without the ASVs excluded by the NSTI, BLAST
identity or BLAST coverage thresholds. (format:
FASTA) [Default: frogsfunc_function.fasta]
--output-excluded OUTPUT_EXCLUDED
List of ASVs with NSTI values above the NSTI threshold
(--max-nsti). [Default:
frogsfunc_functions_excluded.txt]
--log-file LOG_FILE List of commands executed. [Default: stdout]
--html HTML Path to store resulting html file. [Default:
frogsfunc_functions_summary.html]
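The filtering options fall into two groups. --max-nsti, --min-blast-ident and --min-blast-cov remove ASVs (they are absent from the output BIOM and FASTA files), whereas --min-reads and --min-samples only relabel low-abundance ASVs as "RARE" in the stratified output. The Python sketch below illustrates both behaviours under stated assumptions: the helper names and dictionary layout are hypothetical, the BLAST thresholds are assumed to exclude values below the minimum, and an ASV is assumed to be rare when it fails either cut-off. It is not the tool's implementation.

```python
from collections import defaultdict


def excluded_asvs(nsti, max_nsti=2.0, blast_ident=None, min_blast_ident=None,
                  blast_cov=None, min_blast_cov=None):
    """Return the set of ASVs removed before function prediction.

    nsti: {asv: NSTI}; blast_ident / blast_cov: {asv: fraction in [0, 1]}.
    Assumption: an ASV fails a BLAST threshold when its value is *below* the
    minimum, as the option names --min-blast-ident / --min-blast-cov suggest.
    """
    excluded = {asv for asv, value in nsti.items() if value > max_nsti}
    if blast_ident is not None and min_blast_ident is not None:
        excluded |= {asv for asv, v in blast_ident.items() if v < min_blast_ident}
    if blast_cov is not None and min_blast_cov is not None:
        excluded |= {asv for asv, v in blast_cov.items() if v < min_blast_cov}
    return excluded


def rare_asvs(counts, min_reads=1, min_samples=1):
    """Return the ASVs reported as "RARE" in the stratified output.

    counts: {asv: {sample: reads}}. Assumption: an ASV is rare when it fails
    either cut-off (too few total reads OR present in too few samples).
    """
    rare = set()
    for asv, per_sample in counts.items():
        total_reads = sum(per_sample.values())
        n_samples = sum(1 for reads in per_sample.values() if reads > 0)
        if total_reads < min_reads or n_samples < min_samples:
            rare.add(asv)
    return rare


def collapse_rare(stratified, rare):
    """Merge the contributions of rare ASVs into a single "RARE" pseudo-ASV.

    stratified: {(sample, function, asv): contribution}.
    """
    collapsed = defaultdict(float)
    for (sample, function, asv), value in stratified.items():
        label = "RARE" if asv in rare else asv
        collapsed[(sample, function, label)] += value
    return dict(collapsed)


# Hypothetical toy data.
nsti = {"ASV_1": 0.1, "ASV_2": 2.5, "ASV_3": 0.4}
ident = {"ASV_1": 0.99, "ASV_2": 0.97, "ASV_3": 0.80}
print(sorted(excluded_asvs(nsti, blast_ident=ident, min_blast_ident=0.9)))

counts = {"ASV_1": {"S1": 50, "S2": 30}, "ASV_4": {"S1": 2, "S2": 0}}
rare = rare_asvs(counts, min_reads=5, min_samples=2)
print(rare)
```

With the default cut-offs of 1, rare_asvs returns an empty set for any table in which every ASV has at least one read, which matches the help text.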
Example command line:
frogsfunc_functions.py \
--input-biom frogsfunc_placeseqs.biom --input-fasta frogsfunc_placeseqs.fasta \
--input-tree frogsfunc_tree.nwk --input-marker frogsfunc_marker.tsv \
--marker-type 16S --functions EC,KO \
--output-function-abund frogsfunc_functions_unstrat.tsv --output-asv-norm frogsfunc_functions_marker_norm.tsv \
--output-weighted frogsfunc_functions_weighted_nsti.tsv --output-contrib pred_function_asv_contrib.tsv \
--output-biom frogsfunc_function.biom --output-fasta frogsfunc_function.fasta \
--output-excluded frogsfunc_functions_excluded.txt --html frogsfunc_functions_summary.html
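When the tool is run from a workflow script, the argument list can be assembled programmatically. The sketch below is a hypothetical helper, not part of FROGS: it uses only options shown in the usage above, encodes the two constraints stated in the help (at least EC or KO for 16S; a function table for ITS and 18S, assumed here to be mandatory), and returns a list suitable for subprocess.run.

```python
from typing import List, Optional, Sequence

ALLOWED_FUNCTIONS = {"EC", "KO", "COG", "PFAM", "TIGRFAM", "PHENO"}


def build_functions_cmd(biom: str, fasta: str, tree: str, marker_table: str,
                        marker_type: str = "16S",
                        functions: Sequence[str] = ("EC",),
                        function_table: Optional[str] = None,
                        max_nsti: float = 2.0,
                        nb_cpus: int = 1,
                        strat_out: bool = False) -> List[str]:
    """Assemble an argument list for frogsfunc_functions.py (hypothetical helper)."""
    if marker_type not in {"16S", "ITS", "18S"}:
        raise ValueError(f"unsupported marker type: {marker_type}")
    cmd = ["frogsfunc_functions.py",
           "--input-biom", biom, "--input-fasta", fasta,
           "--input-tree", tree, "--input-marker", marker_table,
           "--marker-type", marker_type,
           "--max-nsti", str(max_nsti), "--nb-cpus", str(nb_cpus)]
    if marker_type == "16S":
        unknown = set(functions) - ALLOWED_FUNCTIONS
        if unknown:
            raise ValueError(f"unknown function database(s): {sorted(unknown)}")
        if not {"EC", "KO"} & set(functions):
            raise ValueError("16S runs need at least EC or KO")
        cmd += ["--functions", ",".join(functions)]
    else:
        # Assumption: ITS/18S runs require a directly observed function table.
        if function_table is None:
            raise ValueError("ITS/18S runs need --input-function-table")
        cmd += ["--input-function-table", function_table]
    if strat_out:
        cmd.append("--strat-out")
    return cmd


cmd = build_functions_cmd("frogsfunc_placeseqs.biom", "frogsfunc_placeseqs.fasta",
                          "frogsfunc_tree.nwk", "frogsfunc_marker.tsv",
                          functions=("EC", "KO"))
print(" ".join(cmd))
# To launch it (requires FROGS and PICRUSt2 to be installed and on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.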
Outputs
Function abundance file (--output-function-abund): predicted functional abundances per sample.
Normalized ASV file (--output-asv-norm): ASV abundances normalized by marker copy number.
Weighted NSTI file (--output-weighted): mean NSTI value per sample, weighted by ASV abundance.
Contribution file (--output-contrib): ASV contributions to community-wide functions.
BIOM file (--output-biom): filtered ASVs in BIOM format.
FASTA file (--output-fasta): filtered ASV sequences.
Excluded ASVs (--output-excluded): list of ASVs removed due to NSTI or BLAST thresholds.
HTML report (--html): summary of functional profiling and metrics.
Log file (--log-file): records all commands executed and processing steps.
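The weighted NSTI reported per sample summarizes how far, on average, a sample's ASVs lie from their nearest reference genome. A minimal Python sketch, assuming the weights are the per-sample ASV abundances (whether the tool uses raw or copy-number-normalized abundances is not specified here) and using hypothetical function and variable names:

```python
from collections import defaultdict


def weighted_nsti(counts, nsti):
    """Mean NSTI per sample, weighted by ASV abundance in that sample.

    counts: {asv: {sample: abundance}}; nsti: {asv: NSTI}.
    Samples with a total abundance of zero are skipped.
    """
    total = defaultdict(float)
    weighted_sum = defaultdict(float)
    for asv, per_sample in counts.items():
        for sample, abundance in per_sample.items():
            total[sample] += abundance
            weighted_sum[sample] += abundance * nsti[asv]
    return {s: weighted_sum[s] / total[s] for s in total if total[s] > 0}


# Hypothetical toy data: S1 is dominated by the ASV closest to a reference.
counts = {"ASV_1": {"S1": 30, "S2": 0}, "ASV_2": {"S1": 10, "S2": 20}}
nsti = {"ASV_1": 0.1, "ASV_2": 0.5}
print(weighted_nsti(counts, nsti))  # S1 is about 0.2, S2 is 0.5
```

Samples with a high weighted NSTI are dominated by ASVs distant from any reference, so their predicted function profiles should be interpreted with more caution.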