PICRUSt2 estimation of pathway abundances

Context

frogsfunc_pathways.py infers the presence and abundance of metabolic pathways in microbial communities from gene family abundances. It takes the function abundance table produced by the FROGS functional prediction step and maps gene families to pathways. It can also compute stratified contributions per sequence, and it supports several hierarchy levels for pathway classification.

How it works

The program takes as input the unstratified function abundance table from frogsfunc_functions.py, along with a mapping file that associates gene families with pathways. If the --per-sequence-contrib option is used, it additionally requires the table of sequence abundances (--per-sequence-abun) and the table of function abundances per sequence (--per-sequence-function). The tool aggregates gene family abundances into pathway abundances and, with --normalisation, converts them to counts per million (CPM): each value is divided by the sum of its column, then multiplied by 10^6. It can also generate stratified outputs that show the contributions of individual sequences or predicted genomes. Hierarchy ranks organize pathways from broad to specific categories.
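The CPM normalisation described above can be illustrated with a minimal Python sketch. It is not the tool's own code: the function name cpm_normalise, the in-memory table layout and the pathway and sample names are all hypothetical, chosen only to show the arithmetic (divide by the column sum, multiply by 10^6).

```python
def cpm_normalise(table):
    """Scale each sample column so that its values sum to 10^6.

    `table` maps a pathway name to a {sample: abundance} dict.
    This layout is an assumption made for the example only.
    """
    samples = sorted({s for row in table.values() for s in row})
    # Column sums: total abundance of all pathways in each sample.
    totals = {s: sum(row.get(s, 0.0) for row in table.values()) for s in samples}
    return {
        pathway: {
            # Guard against empty samples to avoid dividing by zero.
            s: (row.get(s, 0.0) / totals[s] * 1e6) if totals[s] else 0.0
            for s in samples
        }
        for pathway, row in table.items()
    }


# Hypothetical pathway abundances for two samples.
abundances = {
    "PWY-A": {"sampleA": 30.0, "sampleB": 5.0},
    "PWY-B": {"sampleA": 10.0, "sampleB": 15.0},
}
normalised = cpm_normalise(abundances)
print(normalised["PWY-A"]["sampleA"])  # 30 / 40 * 10^6 = 750000.0
```

After normalisation every sample column sums to 10^6, so values are comparable between samples with different total abundances.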

Command lines


      usage: frogsfunc_pathways.py [-h] [--version] [--debug]
                                  [--per-sequence-contrib] --input-file INPUT_FILE
                                  [--map MAP]
                                  [--per-sequence-abun PER_SEQUENCE_ABUN]
                                  [--per-sequence-function PER_SEQUENCE_FUNCTION]
                                  [--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]]
                                  [--normalisation]
                                  [--output-pathways-abund OUTPUT_PATHWAYS_ABUND]
                                  [--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB]
                                  [--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS]
                                  [--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ]
                                  [--log-file LOG_FILE] [--html HTML]

      Infer the presence and abundances of pathways based on gene family abundances
      in a sample.

      optional arguments:
        -h, --help            show this help message and exit
        --version             show program's version number and exit
        --debug               Keep temporary files to debug program. [Default:
                              False]
        --per-sequence-contrib
                              If stratified option is activated, a new table is
                              built. It will contain the abundances of each function
                              of each ASV in each sample. (in contrast to the
                              default stratified output, which is the contribution
                              to the community-wide pathway abundances.) Options
                              --per-sequence-abun and --per-sequence-function need
                              to be set when this option is used. [Default: False]

      Inputs:
        --input-file INPUT_FILE
                              Input TSV function abundances table from
                              FROGSFUNC_step3_function (unstratified table :
                              frogsfunc_functions_unstrat.tsv).
        --map MAP             File required if you are not analyzing 16S sequences
                              with the Metacyc ("EC" function in the previous step)
                              database. IF MARKER STUDYED STILL 16S: it must
                              indicate the path to the PICRUSt2 KEGG pathways
                              mapfile, if you chose "KO" in the previous step (the
                              mapfile is available here : $PICRUSt2_PATH/default_fil
                              es/pathway_mapfiles/KEGG_pathways_to_KO.tsv) IF MARKER
                              STUDYED IS ITS OR 18S: Path to mapping file of
                              pathways to fungi reactions (the mapfile is available
                              here : $PICRUSt2_PATH/default_files/pathway_mapfiles/m
                              etacyc_path2rxn_struc_filt_fungi.txt ).
        --per-sequence-abun PER_SEQUENCE_ABUN
                              Path to table of sequence abundances across samples
                              normalized by marker copy number (typically the
                              normalized sequence abundance table output at the
                              metagenome pipeline step:
                              frogsfunc_functions_marker_norm.tsv by default). This
                              input is required when the --per-sequence-contrib
                              option is set. [Default: None]
        --per-sequence-function PER_SEQUENCE_FUNCTION
                              Path to table of function abundances per sequence,
                              which was outputted at the hidden-state prediction
                              step (frogsfunc_copynumbers_predicted_functions.tsv by
                              default). This input is required when the --per-
                              sequence-contrib option is set. Note that this file
                              should be the same input table as used for the
                              metagenome pipeline step [Default: None]
        --hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]
                              The ordered ranks levels used in the metadata
                              hierarchy pathways. [Default: ['Level1', 'Level2',
                              'Level3', 'Pathway']]
        --normalisation       To normalise data after analysis. Values are divided
                              by sum of columns , then multiplied by 10^6 (CPM
                              values). [Default: False]

      Outputs:
        --output-pathways-abund OUTPUT_PATHWAYS_ABUND
                              Pathway abundance file output. [Default:
                              frogsfunc_pathways_unstrat.tsv]
        --output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB
                              Stratified output corresponding to contribution of
                              predicted gene family abundances within each predicted
                              genome. [Default: None]
        --output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS
                              Stratified output corresponding to contribution of
                              predicted gene family abundances within each predicted
                              genome. [Default: None]
        --output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ
                              Pathway abundance file output per sequences (if --per-
                              sequence-contrib set). [Default: None]
        --log-file LOG_FILE   This output file will contain several information on
                              executed commands. [Default: stdout]
        --html HTML           Path to store resulting html file. [Default:
                              frogsfunc_pathways_summary.html]
        

Example of command line:

frogsfunc_pathways.py \
--input-file frogsfunc_functions_unstrat.tsv \
--map KEGG_pathways_to_KO.tsv \
--per-sequence-contrib \
--per-sequence-abun frogsfunc_functions_marker_norm.tsv \
--per-sequence-function frogsfunc_copynumbers_predicted_functions.tsv \
--hierarchy-ranks Level1 Level2 Level3 Pathway \
--normalisation \
--output-pathways-abund frogsfunc_pathways_unstrat.tsv \
--output-pathways-contrib frogsfunc_pathways_contrib.tsv \
--output-pathways-predictions frogsfunc_pathways_predictions.tsv \
--output-pathways-abund-per-seq frogsfunc_pathways_abund_per_seq.tsv \
--html frogsfunc_pathways_summary.html --log-file frogsfunc_pathways.log
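
When the command is assembled from a script, the dependency between --per-sequence-contrib and its two input tables can be checked before the tool is launched. The following is a minimal sketch under that assumption. The helper build_pathways_command is hypothetical and not part of FROGS; it only uses flags listed in the help above and returns an argument list without running anything.

```python
import shlex


def build_pathways_command(input_file, map_file=None, per_sequence_contrib=False,
                           per_sequence_abun=None, per_sequence_function=None,
                           hierarchy_ranks=None, normalisation=False):
    """Return the argument list for a frogsfunc_pathways.py call (hypothetical helper)."""
    cmd = ["frogsfunc_pathways.py", "--input-file", input_file]
    if map_file:
        cmd += ["--map", map_file]
    if per_sequence_contrib:
        # The help states that both tables are required with this option.
        if not (per_sequence_abun and per_sequence_function):
            raise ValueError(
                "--per-sequence-contrib requires --per-sequence-abun "
                "and --per-sequence-function"
            )
        cmd += [
            "--per-sequence-contrib",
            "--per-sequence-abun", per_sequence_abun,
            "--per-sequence-function", per_sequence_function,
        ]
    if hierarchy_ranks:
        # Ranks are passed as separate, space-delimited arguments.
        cmd += ["--hierarchy-ranks", *hierarchy_ranks]
    if normalisation:
        cmd.append("--normalisation")
    return cmd


command = build_pathways_command(
    "frogsfunc_functions_unstrat.tsv",
    map_file="KEGG_pathways_to_KO.tsv",
    per_sequence_contrib=True,
    per_sequence_abun="frogsfunc_functions_marker_norm.tsv",
    per_sequence_function="frogsfunc_copynumbers_predicted_functions.tsv",
    hierarchy_ranks=["Level1", "Level2", "Level3", "Pathway"],
    normalisation=True,
)
print(shlex.join(command))  # shell-quoted form, ready to inspect or log
```

The returned list can be passed to subprocess.run once frogsfunc_pathways.py is available on the PATH. Output options are left out here to keep the sketch short.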
        

Outputs

Pathway abundance file (--output-pathways-abund): predicted pathway abundances per sample.
Contribution file (--output-pathways-contrib): stratified contributions of gene family abundances within predicted genomes.
Predictions file (--output-pathways-predictions): alternative stratified output of gene family contributions.
Abundance per sequence (--output-pathways-abund-per-seq): pathway abundances for each sequence when per-sequence contributions are calculated.
HTML report (--html): summary of predicted pathways and metrics.
Log file (--log-file): records executed commands and processing steps.
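
A quick way to sanity-check a normalised pathway abundance table is to sum each sample column, which should give about 10^6 when --normalisation was used. The sketch below assumes a tab-separated table with one column per sample. The column names, the pathway identifiers and the helper column_sums are hypothetical, and the actual layout of frogsfunc_pathways_unstrat.tsv should be checked before adapting it.

```python
import csv
import io


def column_sums(tsv_text, sample_columns):
    """Sum the given sample columns of a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sums = {name: 0.0 for name in sample_columns}
    for row in reader:
        for name in sample_columns:
            sums[name] += float(row[name])
    return sums


# Hypothetical table content; the real file may carry other columns.
example = (
    "pathway\tsampleA\tsampleB\n"
    "PWY-A\t750000.0\t250000.0\n"
    "PWY-B\t250000.0\t750000.0\n"
)
sums = column_sums(example, ["sampleA", "sampleB"])
for sample, total in sums.items():
    # Allow a small tolerance for floating-point rounding.
    status = "ok" if abs(total - 1e6) < 1e-3 else "not normalised"
    print(sample, total, status)
```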