The abundance of poorly aligned input sequences is too high. #414

@WaterDrop-EarthDivision

Hello,

Thank you for your contribution to the open-source community.

When running PICRUSt2 on 16S data from mouse feces, I received the message "This is the set of poorly aligned input sequences to be excluded" for 136 of my 858 input sequences. I asked ChatGPT and searched earlier issues, including #122, for the reason, but could not find the cause. Would you have time to help me work out why this happens?

#!/bin/bash
# Stop at the first failing command.
set -e

# Make `conda activate` usable in a non-interactive shell.
CONDA_BASE=$(conda info --base)
source "${CONDA_BASE}/etc/profile.d/conda.sh"
conda activate picrust2

# Inputs: tab-separated ASV table and the matching ASV sequences.
INPUT_TABLE=~/M6A/16s/Picrust2/data/ASV_table.txt
INPUT_FA=~/M6A/16s/Picrust2/data/ASV_final.fa

OUT_DIR=~/M6A/16s/Picrust2
mkdir -p "${OUT_DIR}"

echo ">>> Running PICRUSt2 pipeline..."

picrust2_pipeline.py \
    -s "${INPUT_FA}" \
    -i "${INPUT_TABLE}" \
    -o "${OUT_DIR}/picrust2_out" \
    -p 4

echo ">>> PICRUSt2 finished!"
Image

What I did before:

  1. Used DADA2 (denoising, paired-end merging, sequence table construction) and SILVA (NR99, v138.2) for taxonomic annotation.
  2. Ran QC on samples and ASVs and removed the samples and ASVs that failed QC.
  3. Converted the ASV table into a tab-separated txt file (at this point the ASV IDs are no longer consecutive, but each one still corresponds uniquely to a sequence in the fa file).
Image Image
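
The one-to-one correspondence between the table and the fa file mentioned in step 3 can be checked mechanically. The following is only a minimal sketch: the function name `check_asv_ids` is hypothetical, and it assumes the table has a single header line, the ASV ID in the first tab-separated column, and FASTA headers of the form `>ID` (anything after the first whitespace is ignored).

```shell
#!/bin/bash
# Hypothetical helper: report ASV IDs present in only one of the two inputs.
# Assumptions: one header line in the table, ASV ID in column 1,
# FASTA headers look like ">ID optional description".
check_asv_ids() {
    local table="$1" fasta="$2" tmp
    tmp=$(mktemp -d)
    # IDs from the table: skip the header and any comment lines.
    awk -F'\t' 'NR > 1 && !/^#/ && $1 != "" { print $1 }' "$table" \
        | LC_ALL=C sort -u > "$tmp/table_ids"
    # IDs from the FASTA: drop ">" and keep the first word of the header.
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$fasta" \
        | LC_ALL=C sort -u > "$tmp/fasta_ids"
    # comm -23: lines only in the first file; comm -13: only in the second.
    LC_ALL=C comm -23 "$tmp/table_ids" "$tmp/fasta_ids" \
        | awk '{ print "only_in_table\t" $0 }'
    LC_ALL=C comm -13 "$tmp/table_ids" "$tmp/fasta_ids" \
        | awk '{ print "only_in_fasta\t" $0 }'
    rm -rf "$tmp"
}
```

Calling `check_asv_ids ASV_table.txt ASV_final.fa` prints nothing when the two ID sets match exactly; any output line names an ID that is missing from one side.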

What I did afterwards:
  1. Plotted the length distributions of the excluded ASVs (orange) and the retained ASVs (blue).

Image Image
  2. Plotted the abundance distributions of the excluded and the retained ASVs.
Image
  3. Tabulated the taxonomy (phylum and genus) of the excluded and the retained ASVs.
> table(tax$Excluded, tax$Phylum)
       
        Acidobacteriota Actinomycetota Bacillota Bacteroidota Campylobacterota Cyanobacteriota Patescibacteria
  FALSE               3             35       589           58                3               2               3
  TRUE                0              0         3          130                0               0               0
       
        Pseudomonadota Spirochaetota Thermodesulfobacteriota Verrucomicrobiota
  FALSE             25             1                       2                 1
  TRUE               2             0                       0                 0
> table(tax$Excluded, tax$Genus)
       
        [Acetivibrio] ethanolgignens group [Clostridium] innocuum group [Clostridium] methylpentosum group
  FALSE                                  1                            1                                  2
  TRUE                                   0                            0                                  0
       
        [Eubacterium] brachy group [Eubacterium] ventriosum group [Eubacterium] xylanophilum group
  FALSE                          4                              1                                1
  TRUE                           0                              0                                0
       
        [Ruminococcus] gauvreauii group A2 Acetatifactor Acutalibacter Adlercreutzia Aeromicrobium Agathobaculum
  FALSE                               1  8             3             5            15             1             4
  TRUE                                0  0             0             0             0             0             0
       
        Agrobacterium Akkermansia Alistipes Allobaculum Anaeroplasma Anaerotignum Anaerotruncus Anaerovorax Angelakisella
  FALSE             1           1        20          12            1            1             5           1             2
  TRUE              0           0         0           0            0            0             0           0             0
       
        Asaccharobacter ASF356 Bacteroides Bifidobacterium Blautia Brachyspira Brevundimonas Brucella Bryobacter
  FALSE               3      5           0               3       3           1             4        2          1
  TRUE                0      0           6               0       0           0             0        0          0
       
        Butyribacter Butyricimonas Candidatus Saccharimonas Candidatus Soleaferrea Cellulomonas Cellulosimicrobium
  FALSE            1             2                        3                      1            1                  1
  TRUE             0             0                        0                      0            0                  0
       
        Christensenellaceae R-7 group Chryseobacterium Citricoccus Cloacibacterium Clostridium Colidextribacter
  FALSE                             1                1           1               1           4               17
  TRUE                              0                0           0               0           0                0
       
        Coriobacteriaceae UCG-002 Cutibacterium Defluviitaleaceae UCG-011 Demequina Devosia Dubosiella Enterocloster
  FALSE                         2             1                         1         1       1         50             2
  TRUE                          0             0                         0         0       0          0             0
       
        Extibacter Faecalibaculum Faecalimonas Falsochrobactrum Family XIII AD3011 group Flavobacterium Frisingicoccus
  FALSE          1              6            3                2                        1              1              1
  TRUE           0              2            0                0                        0              0              0
       
        GCA-900066575 Gordonia Harryflintia Helicobacter Hirschia Holdemania Ileibacterium Intestinimonas
  FALSE            16        1           10            3        1          1            10             23
  TRUE              0        0            0            0        0          0             0              0
       
        Lachnoclostridium Lachnospiraceae AC2044 group Lachnospiraceae FCS020 group Lachnospiraceae NK4A136 group
  FALSE                23                            1                           14                            55
  TRUE                  0                            0                            0                             0
       
        Lachnospiraceae UCG-006 Lachnospiraceae UCG-008 Leucobacter Ligilactobacillus Marvinbryantia Massiliomicrobiota
  FALSE                       7                       1           2                 1              1                  1
  TRUE                        0                       0           0                 0              0                  0
       
        Mediterraneibacter Methylorubrum Monoglobus Muribaculum Nakamurella Negativibacillus Neorhizobium NK4A214 group
  FALSE                  2             1          1           0           1                1            1             1
  TRUE                   0             0          0           4           0                0            0             0
       
        Nocardioides Odoribacter Oscillibacter Paenibacillus Paenochrobactrum Paludicola Parabacteroides Pedobacter
  FALSE            1           2             8             1                1          4               0          1
  TRUE             0           4             0             0                0          0               5          0
       
        Prevotellaceae Ga6A1 group Prevotellaceae NK3B31 group Prevotellaceae UCG-001 Quinella Rikenella
  FALSE                          0                           0                      1        1         1
  TRUE                           1                           1                      4        0         0
       
        Rikenellaceae RC9 gut group Romboutsia Roseburia Ruminococcus Ruthenibacterium Shinella Sphingomonas Sphingopyxis
  FALSE                           0          2        21            4                2        1            2            1
  TRUE                            1          0         0            0                0        0            0            0
       
        Thomasclavelia Turicimonas UCG-005 UCG-009 Zag_111
  FALSE              1           1       3       1       2
  TRUE               0           0       0       0       0
  4. Checked three of the excluded ASVs with BLASTN; a corresponding bacterial species could be identified for each of them.
Image Image Image
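
The length comparison above can also be reproduced as numbers rather than plots. This is a rough sketch, not part of PICRUSt2: the function names `fasta_lengths` and `length_summary` are hypothetical, and the second function assumes a plain text file with one excluded ASV ID per line, which would have to be prepared separately.

```shell
#!/bin/bash
# Hypothetical helper: print "ID<TAB>length" for every FASTA record.
# Sequences wrapped over several lines are summed correctly.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len
            sub(/^>/, ""); id = $1; len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Hypothetical helper: count and mean length per group.
# $1 = file with one excluded ASV ID per line (assumed format), $2 = FASTA.
length_summary() {
    fasta_lengths "$2" | awk -F'\t' '
        FILENAME == ARGV[1] { excluded[$1] = 1; next }
        {
            g = ($1 in excluded) ? "excluded" : "retained"
            n[g]++; sum[g] += $2
        }
        END { for (g in n) printf "%s\t%d\t%.1f\n", g, n[g], sum[g] / n[g] }
    ' "$1" - | sort
}
```

For example, `length_summary excluded_ids.txt ASV_final.fa` (with `excluded_ids.txt` as a placeholder name) prints one line per group with the number of ASVs and their mean length, which makes a systematic length difference between the groups easy to spot.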

Some of my thoughts:

  1. Only some of the ASV sequences are affected, and the excluded ASVs can be found in NCBI, so the problem cannot be attributed to complementary (reverse-strand) sequences, can it?
  2. Almost all of the excluded ASVs are Bacteroidota (130 in the phylum table above). Could it be due to insufficient support for the Bacteroidota branch?
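
One way to test the orientation question in point 1 directly would be to reverse-complement the excluded ASVs and run them again. The sketch below only produces the reverse-complemented FASTA; the function name `revcomp_fasta` is hypothetical, it assumes each sequence sits on a single line, and whether the result aligns any better is something only a re-run can show.

```shell
#!/bin/bash
# Hypothetical helper: reverse-complement every sequence in a FASTA file.
# Assumes one sequence per line; header lines are passed through unchanged.
# Only A, C, G, T (either case) are complemented; other symbols such as N
# are kept as they are.
revcomp_fasta() {
    awk '
        BEGIN {
            comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
            comp["a"] = "t"; comp["t"] = "a"; comp["c"] = "g"; comp["g"] = "c"
        }
        /^>/ { print; next }
        {
            sub(/\r$/, "")            # tolerate Windows line endings
            out = ""
            # Walk the sequence from the last base to the first.
            for (i = length($0); i >= 1; i--) {
                base = substr($0, i, 1)
                out = out ((base in comp) ? comp[base] : base)
            }
            print out
        }
    ' "$1"
}
```

Usage would look like `revcomp_fasta excluded.fa > excluded_rc.fa`, where both file names are placeholders.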

Looking forward to your reply.

Best wishes,
Chenglin.
