PICRUSt2 estimation of pathway abundances
Context
frogsfunc_pathways.py is a pathway inference tool that predicts the presence and abundance of metabolic pathways in microbial communities based on gene family abundances. It uses input function abundance tables from FROGS functional predictions and maps gene families to pathways. The tool can also compute stratified contributions per sequence or per sample and supports multiple hierarchy levels for pathway classification.
How it works
The program takes as input the unstratified function abundance table from frogsfunc_functions.py (--input-file), along with a mapping file that associates gene families with pathways (--map). If the --per-sequence-contrib option is used, it additionally requires the table of sequence abundances (--per-sequence-abun) and the table of function assignments per sequence (--per-sequence-function). The tool aggregates gene family abundances into pathway abundances, optionally normalizes them to CPM values (--normalisation: each value is divided by its column sum, then multiplied by 10^6), and can generate stratified outputs that show the contributions of individual sequences or predicted genomes. Hierarchy ranks (--hierarchy-ranks) organize pathways from broad to specific categories.
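The CPM normalisation described above can be illustrated with a minimal sketch. This is not the tool's own code: the function name cpm_normalise, the nested-dictionary table layout, and the pathway and sample names are assumptions made for illustration, and samples whose column sum is zero are simply left at zero here.

```python
def cpm_normalise(table):
    """Return a CPM-normalised copy of a pathway abundance table.

    `table` is a hypothetical layout: {pathway: {sample: abundance}}.
    """
    # Sum each sample column across all pathways.
    totals = {}
    for row in table.values():
        for sample, value in row.items():
            totals[sample] = totals.get(sample, 0.0) + value

    # Divide each value by its column sum, then multiply by 10^6.
    # Assumption: a sample with a zero total is left at 0.0.
    return {
        pathway: {
            sample: (value / totals[sample] * 1e6) if totals[sample] else 0.0
            for sample, value in row.items()
        }
        for pathway, row in table.items()
    }


# Illustrative data only (invented pathway and sample names).
example = {
    "PWY-A": {"S1": 30.0, "S2": 0.0},
    "PWY-B": {"S1": 10.0, "S2": 5.0},
}
normalised = cpm_normalise(example)
```

After normalisation, every sample column with a non-zero total sums to 10^6, which makes pathway abundances comparable between samples of different depth.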
Command lines
usage: frogsfunc_pathways.py [-h] [--version] [--debug]
[--per-sequence-contrib] --input-file INPUT_FILE
[--map MAP]
[--per-sequence-abun PER_SEQUENCE_ABUN]
[--per-sequence-function PER_SEQUENCE_FUNCTION]
[--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]]
[--normalisation]
[--output-pathways-abund OUTPUT_PATHWAYS_ABUND]
[--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB]
[--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS]
[--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ]
[--log-file LOG_FILE] [--html HTML]
Infer the presence and abundances of pathways based on gene family abundances
in a sample.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--debug Keep temporary files to debug program. [Default:
False]
--per-sequence-contrib
If stratified option is activated, a new table is
built. It will contain the abundances of each function
of each ASV in each sample. (in contrast to the
default stratified output, which is the contribution
to the community-wide pathway abundances.) Options
--per-sequence-abun and --per-sequence-function need
to be set when this option is used. [Default: False]
Inputs:
--input-file INPUT_FILE
Input TSV function abundances table from
FROGSFUNC_step3_function (unstratified table :
frogsfunc_functions_unstrat.tsv).
--map MAP File required if you are not analyzing 16S sequences
with the Metacyc ("EC" function in the previous step)
database. IF MARKER STUDYED STILL 16S: it must
indicate the path to the PICRUSt2 KEGG pathways
mapfile, if you chose "KO" in the previous step (the
mapfile is available here : $PICRUSt2_PATH/default_fil
es/pathway_mapfiles/KEGG_pathways_to_KO.tsv) IF MARKER
STUDYED IS ITS OR 18S: Path to mapping file of
pathways to fungi reactions (the mapfile is available
here : $PICRUSt2_PATH/default_files/pathway_mapfiles/m
etacyc_path2rxn_struc_filt_fungi.txt ).
--per-sequence-abun PER_SEQUENCE_ABUN
Path to table of sequence abundances across samples
normalized by marker copy number (typically the
normalized sequence abundance table output at the
metagenome pipeline step:
frogsfunc_functions_marker_norm.tsv by default). This
input is required when the --per-sequence-contrib
option is set. [Default: None]
--per-sequence-function PER_SEQUENCE_FUNCTION
Path to table of function abundances per sequence,
which was outputted at the hidden-state prediction
step (frogsfunc_copynumbers_predicted_functions.tsv by
default). This input is required when the --per-
sequence-contrib option is set. Note that this file
should be the same input table as used for the
metagenome pipeline step [Default: None]
--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]
The ordered ranks levels used in the metadata
hierarchy pathways. [Default: ['Level1', 'Level2',
'Level3', 'Pathway']]
--normalisation To normalise data after analysis. Values are divided
by sum of columns , then multiplied by 10^6 (CPM
values). [Default: False]
Outputs:
--output-pathways-abund OUTPUT_PATHWAYS_ABUND
Pathway abundance file output. [Default:
frogsfunc_pathways_unstrat.tsv]
--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB
Stratified output corresponding to contribution of
predicted gene family abundances within each predicted
genome. [Default: None]
--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS
Stratified output corresponding to contribution of
predicted gene family abundances within each predicted
genome. [Default: None]
--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ
Pathway abundance file output per sequences (if --per-
sequence-contrib set). [Default: None]
--log-file LOG_FILE This output file will contain several information on
executed commands. [Default: stdout]
--html HTML Path to store resulting html file. [Default:
frogsfunc_pathways_summary.html]
Example of command line:
frogsfunc_pathways.py \
--input-file frogsfunc_functions_unstrat.tsv \
--map KEGG_pathways_to_KO.tsv \
--per-sequence-contrib \
--per-sequence-abun frogsfunc_functions_marker_norm.tsv \
--per-sequence-function frogsfunc_copynumbers_predicted_functions.tsv \
--hierarchy-ranks Level1 Level2 Level3 Pathway \
--normalisation \
--output-pathways-abund frogsfunc_pathways_unstrat.tsv \
--output-pathways-contrib frogsfunc_pathways_contrib.tsv \
--output-pathways-predictions frogsfunc_pathways_predictions.tsv \
--output-pathways-abund-per-seq frogsfunc_pathways_abund_per_seq.tsv \
--html frogsfunc_pathways_summary.html --log-file frogsfunc_pathways.log
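When the tool is called from a script, the dependency between options stated in the help text (--per-sequence-abun and --per-sequence-function must be set when --per-sequence-contrib is used) can be checked before launching it. The sketch below is a hypothetical helper, not part of FROGS: the function name build_pathways_command and its parameters are assumptions, and only flags listed in the usage above are emitted. Because the usage declares --hierarchy-ranks as accepting several values, each rank is passed as a separate argument.

```python
import shlex


def build_pathways_command(input_file, per_sequence_contrib=False,
                           per_sequence_abun=None, per_sequence_function=None,
                           hierarchy_ranks=None, normalisation=False):
    """Build an argument list for frogsfunc_pathways.py (hypothetical helper)."""
    # The help text requires both per-sequence inputs with --per-sequence-contrib.
    if per_sequence_contrib and not (per_sequence_abun and per_sequence_function):
        raise ValueError(
            "--per-sequence-contrib requires --per-sequence-abun "
            "and --per-sequence-function"
        )

    cmd = ["frogsfunc_pathways.py", "--input-file", input_file]
    if per_sequence_contrib:
        cmd += [
            "--per-sequence-contrib",
            "--per-sequence-abun", per_sequence_abun,
            "--per-sequence-function", per_sequence_function,
        ]
    if hierarchy_ranks:
        # One argument per rank, in order from broad to specific.
        cmd += ["--hierarchy-ranks", *hierarchy_ranks]
    if normalisation:
        cmd.append("--normalisation")
    return cmd


cmd = build_pathways_command(
    "frogsfunc_functions_unstrat.tsv",
    per_sequence_contrib=True,
    per_sequence_abun="frogsfunc_functions_marker_norm.tsv",
    per_sequence_function="frogsfunc_copynumbers_predicted_functions.tsv",
    hierarchy_ranks=["Level1", "Level2", "Level3", "Pathway"],
    normalisation=True,
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a system where frogsfunc_pathways.py is on the PATH; the sketch itself only builds and prints the command.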
Outputs
Pathway abundance file (--output-pathways-abund): predicted pathway abundances per sample.
Contribution file (--output-pathways-contrib): stratified contributions of gene family abundances within predicted genomes.
Predictions file (--output-pathways-predictions): alternative stratified output of gene family contributions.
Abundance per sequence (--output-pathways-abund-per-seq): pathway abundances for each sequence when per-sequence contributions are calculated.
HTML report (--html): summary of predicted pathways and metrics.
Log file (--log-file): records executed commands and processing steps.