Hello,
Thank you for your contribution to the open-source community.
When running PICRUSt2 (16S, mouse feces), I received the message "This is the set of poorly aligned input sequences to be excluded" (136 of 858 ASVs). I asked ChatGPT and searched through issue #122 for earlier reports of this situation, but I was unable to find the cause. Would you have time to help me figure out why this happens?
The script I ran:

```bash
#!/bin/bash
set -e

# Activate the PICRUSt2 conda environment from a non-interactive shell
CONDA_BASE=$(conda info --base)
source "${CONDA_BASE}/etc/profile.d/conda.sh"
conda activate picrust2

# Inputs: tab-separated ASV table and the matching ASV sequences
INPUT_TABLE=~/M6A/16s/Picrust2/data/ASV_table.txt
INPUT_FA=~/M6A/16s/Picrust2/data/ASV_final.fa
OUT_DIR=~/M6A/16s/Picrust2

mkdir -p "${OUT_DIR}"

echo ">>> Running PICRUSt2 pipeline..."
picrust2_pipeline.py \
    -s "${INPUT_FA}" \
    -i "${INPUT_TABLE}" \
    -o "${OUT_DIR}/picrust2_out" \
    -p 4
echo ">>> PICRUSt2 finished!"
```
What I did before:
- Used DADA2 (denoising, merging of paired-end reads, and constructing the sequence table) and SILVA (nr99, v138.2) for taxonomic annotation.
- Performed QC on samples and ASVs, and removed the samples and ASVs that failed QC.
- Converted the ASV table into a tab-separated .txt file. At this point the ASV IDs are no longer consecutive, but each one corresponds uniquely to a sequence in the FASTA file.
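Because the ASV IDs in the table are no longer consecutive, one quick sanity check is that the table and the FASTA file contain exactly the same set of IDs. The sketch below is only an illustration, not part of PICRUSt2: it assumes the IDs sit in the first column of the table, that line 1 is a header row, and that lines starting with `#` are comments; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that ASV IDs match one-to-one between a tab-separated feature
# table and a FASTA file. Assumed layout (hypothetical): IDs in column 1 of
# the table, line 1 is a header, lines starting with "#" are comments.
# Requires bash (process substitution).

# Sorted, unique IDs from FASTA headers (text after ">" up to first whitespace).
# A trailing carriage return (Windows line endings) is stripped first.
fasta_ids() {
  awk '{ sub(/\r$/, "") } /^>/ { print substr($1, 2) }' "$1" | sort -u
}

# Sorted, unique IDs from column 1 of the table, skipping header and comments.
table_ids() {
  awk -F'\t' '{ sub(/\r$/, "") } NR > 1 && !/^#/ { print $1 }' "$1" | sort -u
}

# Print IDs found in only one file; return 0 only if the two sets are identical.
check_ids() {
  local table=$1 fasta=$2
  local only_table only_fasta
  only_table=$(comm -23 <(table_ids "$table") <(fasta_ids "$fasta"))
  only_fasta=$(comm -13 <(table_ids "$table") <(fasta_ids "$fasta"))
  [ -n "$only_table" ] && printf 'Only in table:\n%s\n' "$only_table"
  [ -n "$only_fasta" ] && printf 'Only in FASTA:\n%s\n' "$only_fasta"
  [ -z "$only_table" ] && [ -z "$only_fasta" ]
}

# Example (placeholder paths):
#   check_ids ASV_table.txt ASV_final.fa && echo "IDs match"
```
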
What I did afterwards:
- The length distributions of the excluded ASVs (orange) and the retained ASVs (blue) were plotted.
- The abundance distributions of the excluded and retained ASVs were plotted.
- The taxonomic distribution of the excluded and retained ASVs was tabulated at the phylum and genus levels:
`table(tax$Excluded, tax$Phylum)`:

| Phylum | Excluded = FALSE | Excluded = TRUE |
|---|---|---|
| Acidobacteriota | 3 | 0 |
| Actinomycetota | 35 | 0 |
| Bacillota | 589 | 3 |
| Bacteroidota | 58 | 130 |
| Campylobacterota | 3 | 0 |
| Cyanobacteriota | 2 | 0 |
| Patescibacteria | 3 | 0 |
| Pseudomonadota | 25 | 2 |
| Spirochaetota | 1 | 0 |
| Thermodesulfobacteriota | 2 | 0 |
| Verrucomicrobiota | 1 | 0 |

`table(tax$Excluded, tax$Genus)`:

| Genus | Excluded = FALSE | Excluded = TRUE |
|---|---|---|
| [Acetivibrio] ethanolgignens group | 1 | 0 |
| [Clostridium] innocuum group | 1 | 0 |
| [Clostridium] methylpentosum group | 2 | 0 |
| [Eubacterium] brachy group | 4 | 0 |
| [Eubacterium] ventriosum group | 1 | 0 |
| [Eubacterium] xylanophilum group | 1 | 0 |
| [Ruminococcus] gauvreauii group | 1 | 0 |
| A2 | 8 | 0 |
| Acetatifactor | 3 | 0 |
| Acutalibacter | 5 | 0 |
| Adlercreutzia | 15 | 0 |
| Aeromicrobium | 1 | 0 |
| Agathobaculum | 4 | 0 |
| Agrobacterium | 1 | 0 |
| Akkermansia | 1 | 0 |
| Alistipes | 20 | 0 |
| Allobaculum | 12 | 0 |
| Anaeroplasma | 1 | 0 |
| Anaerotignum | 1 | 0 |
| Anaerotruncus | 5 | 0 |
| Anaerovorax | 1 | 0 |
| Angelakisella | 2 | 0 |
| Asaccharobacter | 3 | 0 |
| ASF356 | 5 | 0 |
| Bacteroides | 0 | 6 |
| Bifidobacterium | 3 | 0 |
| Blautia | 3 | 0 |
| Brachyspira | 1 | 0 |
| Brevundimonas | 4 | 0 |
| Brucella | 2 | 0 |
| Bryobacter | 1 | 0 |
| Butyribacter | 1 | 0 |
| Butyricimonas | 2 | 0 |
| Candidatus Saccharimonas | 3 | 0 |
| Candidatus Soleaferrea | 1 | 0 |
| Cellulomonas | 1 | 0 |
| Cellulosimicrobium | 1 | 0 |
| Christensenellaceae R-7 group | 1 | 0 |
| Chryseobacterium | 1 | 0 |
| Citricoccus | 1 | 0 |
| Cloacibacterium | 1 | 0 |
| Clostridium | 4 | 0 |
| Colidextribacter | 17 | 0 |
| Coriobacteriaceae UCG-002 | 2 | 0 |
| Cutibacterium | 1 | 0 |
| Defluviitaleaceae UCG-011 | 1 | 0 |
| Demequina | 1 | 0 |
| Devosia | 1 | 0 |
| Dubosiella | 50 | 0 |
| Enterocloster | 2 | 0 |
| Extibacter | 1 | 0 |
| Faecalibaculum | 6 | 2 |
| Faecalimonas | 3 | 0 |
| Falsochrobactrum | 2 | 0 |
| Family XIII AD3011 group | 1 | 0 |
| Flavobacterium | 1 | 0 |
| Frisingicoccus | 1 | 0 |
| GCA-900066575 | 16 | 0 |
| Gordonia | 1 | 0 |
| Harryflintia | 10 | 0 |
| Helicobacter | 3 | 0 |
| Hirschia | 1 | 0 |
| Holdemania | 1 | 0 |
| Ileibacterium | 10 | 0 |
| Intestinimonas | 23 | 0 |
| Lachnoclostridium | 23 | 0 |
| Lachnospiraceae AC2044 group | 1 | 0 |
| Lachnospiraceae FCS020 group | 14 | 0 |
| Lachnospiraceae NK4A136 group | 55 | 0 |
| Lachnospiraceae UCG-006 | 7 | 0 |
| Lachnospiraceae UCG-008 | 1 | 0 |
| Leucobacter | 2 | 0 |
| Ligilactobacillus | 1 | 0 |
| Marvinbryantia | 1 | 0 |
| Massiliomicrobiota | 1 | 0 |
| Mediterraneibacter | 2 | 0 |
| Methylorubrum | 1 | 0 |
| Monoglobus | 1 | 0 |
| Muribaculum | 0 | 4 |
| Nakamurella | 1 | 0 |
| Negativibacillus | 1 | 0 |
| Neorhizobium | 1 | 0 |
| NK4A214 group | 1 | 0 |
| Nocardioides | 1 | 0 |
| Odoribacter | 2 | 4 |
| Oscillibacter | 8 | 0 |
| Paenibacillus | 1 | 0 |
| Paenochrobactrum | 1 | 0 |
| Paludicola | 4 | 0 |
| Parabacteroides | 0 | 5 |
| Pedobacter | 1 | 0 |
| Prevotellaceae Ga6A1 group | 0 | 1 |
| Prevotellaceae NK3B31 group | 0 | 1 |
| Prevotellaceae UCG-001 | 1 | 4 |
| Quinella | 1 | 0 |
| Rikenella | 1 | 0 |
| Rikenellaceae RC9 gut group | 0 | 1 |
| Romboutsia | 2 | 0 |
| Roseburia | 21 | 0 |
| Ruminococcus | 4 | 0 |
| Ruthenibacterium | 2 | 0 |
| Shinella | 1 | 0 |
| Sphingomonas | 2 | 0 |
| Sphingopyxis | 1 | 0 |
| Thomasclavelia | 1 | 0 |
| Turicimonas | 1 | 0 |
| UCG-005 | 3 | 0 |
| UCG-009 | 1 | 0 |
| Zag_111 | 2 | 0 |

- Using BLASTN to check the three removed ASVs, I found a matching bacterial species for each of them.
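The excluded-vs-retained comparison above can also be checked numerically from the shell. The sketch below computes per-ASV sequence lengths from the FASTA, summarises them (count, min, mean, max) for the two groups, and writes the excluded sequences to a separate FASTA that could serve as a BLASTN query. It assumes the excluded ASV IDs have been saved one per line in a plain-text file (the name `excluded_ids.txt` is hypothetical); all function names are illustrative and not part of PICRUSt2.

```bash
#!/usr/bin/env bash
# Sketch only. Assumption: excluded ASV IDs are stored one per line in a
# plain-text file (hypothetical name, e.g. excluded_ids.txt).

# Emit "ID<TAB>length" for each FASTA record, joining wrapped sequence lines.
seq_lengths() {
  awk '{ sub(/\r$/, "") }
       /^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
       { gsub(/[ \t]/, ""); len += length($0) }
       END { if (id != "") print id "\t" len }' "$1"
}

# Print "group<TAB>n<TAB>min<TAB>mean<TAB>max" for the excluded and retained
# groups. Usage: length_summary FASTA ID_FILE (group order is not fixed).
length_summary() {
  seq_lengths "$1" | awk -F'\t' '
    FILENAME == ARGV[1] { gsub(/[ \t\r]/, ""); if ($0 != "") ex[$0] = 1; next }
    {
      g = ($1 in ex) ? "excluded" : "retained"
      len = $2 + 0
      n[g]++; sum[g] += len
      if (!(g in mn) || len < mn[g]) mn[g] = len
      if (!(g in mx) || len > mx[g]) mx[g] = len
    }
    END { for (g in n) printf "%s\t%d\t%d\t%.1f\t%d\n", g, n[g], mn[g], sum[g] / n[g], mx[g] }
  ' "$2" -
}

# Write only the records whose IDs are listed in ID_FILE.
# Usage: extract_excluded FASTA ID_FILE > excluded.fa
extract_excluded() {
  awk 'FILENAME == ARGV[1] { gsub(/[ \t\r]/, ""); if ($0 != "") ex[$0] = 1; next }
       { sub(/\r$/, "") }
       /^>/ { id = substr($1, 2); keep = (id in ex) }
       keep' "$2" "$1"
}

# Example (placeholder file names):
#   length_summary ASV_final.fa excluded_ids.txt
#   extract_excluded ASV_final.fa excluded_ids.txt > excluded.fa
```
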
Some of my thoughts:
- Since only some of the ASV sequences are affected, and the affected ASVs can still be found in NCBI, the problem is not caused by reverse-complemented sequences, is it?
- Could it be due to insufficient support for the Bacteroidota branch?
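One way to probe the orientation question more directly: BLASTN searches both strands by default, so a BLASTN hit on its own may not rule out reverse-complemented ASVs. A minimal sketch, assuming the excluded ASVs are available as their own FASTA file (hypothetical name `excluded.fa`), is to reverse-complement them and then check whether the flipped versions align better. The function name is illustrative; IUPAC ambiguity codes are complemented, and S, W and unknown characters are left unchanged.

```bash
#!/usr/bin/env bash
# Sketch: reverse-complement every sequence in a FASTA file, keeping headers.
# Wrapped sequence lines are joined, so each output sequence is one line.
# IUPAC codes are complemented; S and W (self-complementary) and any other
# character are passed through unchanged.
revcomp_fasta() {
  awk '
    function revcomp(s,   out, i, c) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    BEGIN {
      from = "ACGTURYKMBDHVNacgturykmbdhvn"
      to   = "TGCAAYRMKVHDBNtgcaayrmkvhdbn"
      for (i = 1; i <= length(from); i++) comp[substr(from, i, 1)] = substr(to, i, 1)
    }
    { sub(/\r$/, "") }
    /^>/ { if (seq != "") print revcomp(seq); print; seq = ""; next }
    { gsub(/[ \t]/, ""); seq = seq $0 }
    END { if (seq != "") print revcomp(seq) }
  ' "$1"
}

# Example (placeholder file names):
#   revcomp_fasta excluded.fa > excluded_rc.fa
```
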
Looking forward to your reply.
Best wishes,
Chenglin.