A comprehensive web interface for BioPython - providing easy access to all major BioPython modules through a modern, user-friendly interface.
NuGenBioPython is a web application that makes BioPython's extensive bioinformatics toolkit accessible through an intuitive web interface. Whether you're analyzing DNA sequences, working with protein structures, or exploring phylogenetic trees, this tool provides a seamless experience for researchers and students alike.
- Sequence Analysis - DNA/RNA/Protein sequence manipulation and analysis
- Sequence I/O - Parse and convert between multiple file formats
- Sequence Alignment - Pairwise and multiple sequence alignments
- Phylogenetics - Tree parsing, manipulation, and visualization
- Protein Structure - PDB file parsing and structural analysis
- Database Access - NCBI Entrez, PubMed, and GenBank searches
- Motif Analysis - Create and search sequence motifs with PWM
- Restriction Enzymes - Restriction site analysis
- Clustering Analysis - K-means, hierarchical, and DBSCAN clustering
- BLAST Search - Sequence similarity searches
- KEGG Database - Access pathways and metabolic information
- Genome Diagrams - Create visual genomic representations
- Search I/O - Parse BLAST, HMMER search results
- SwissProt/UniProt - Parse protein database records
- Biological Data - Codon tables and reference data
- Population Genetics - GenePop format with statistical analysis
- Pathway Analysis - Build biochemical pathways
- UniGene Analysis - Gene expression clustering
- Hidden Markov Models - Build and train HMMs
- Protein Parameters - Advanced protein analysis with ProtParam
- Graphics & Visualization - Chromosome and comparative graphics
- 60+ Analysis Functions - Comprehensive bioinformatics toolkit
- 15+ File Formats - Support for common sequence, alignment, tree, and structure formats
- Real-time Processing - Fast analysis with immediate results
- Interactive Visualizations - Dynamic charts and diagrams
- Example Data - Built-in examples for quick testing
- Export Capabilities - Download results in various formats
- Performance Optimized - GPU-accelerated rendering with smooth scrolling
- Professional UI - Bootstrap 5 with consistent design patterns
- DNA, RNA, and protein sequence analysis
- Composition analysis and GC content calculation
- Molecular weight calculation
- Sequence complement and reverse complement
- Translation (DNA/RNA to protein)
- Interactive examples with real-time results
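The calculations in this group are easy to state in code. What follows is a minimal plain-Python sketch of base composition, GC content, and reverse complement, written without Biopython so it runs anywhere. It is an illustration rather than the application's own implementation, and the function names are hypothetical. Biopython offers equivalents such as `Seq.reverse_complement()` and `Bio.SeqUtils.gc_fraction`.

```python
from collections import Counter

# Complement map for unambiguous DNA bases only (assumption: no IUPAC
# ambiguity codes; any other character is passed through unchanged).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def base_composition(seq: str) -> dict:
    """Count how often each symbol occurs in the sequence (case-insensitive)."""
    return dict(Counter(seq.upper()))


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases, or 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCC"
    print(base_composition(dna))
    print(gc_content(dna))
    print(reverse_complement(dna))  # GGCAT
```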
- Parse multiple sequence file formats (FASTA, GenBank, EMBL, Swiss-Prot, etc.)
- Convert between different sequence formats
- File upload and download functionality
- Support for alignment formats (Clustal, PHYLIP, NEXUS, Stockholm)
- 15+ supported biological data formats
- Pairwise sequence alignment with customizable parameters
- Configurable scoring matrices (match, mismatch, gap penalties)
- Visual alignment display with scoring information
- Real-time parameter adjustment
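To show how the match, mismatch, and gap parameters shape an alignment score, here is a small dependency-free sketch of global (Needleman-Wunsch) scoring with a linear gap penalty. The default values and the function name are assumptions chosen for illustration; the application's own defaults are not documented here. Biopython exposes the same knobs on `Bio.Align.PairwiseAligner` (`match_score`, `mismatch_score`, and the gap score attributes).

```python
def global_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Return the best global alignment score of a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns an empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            gap_in_b = prev[j] + gap      # char_a aligned to a gap
            gap_in_a = curr[j - 1] + gap  # char_b aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))   # 7.0
    print(global_alignment_score("ACGT", "ACT"))           # 1.0
    print(global_alignment_score("ACGT", "ACT", gap=-0.5)) # 2.5
```

Softening the gap penalty from -2 to -0.5 raises the score of the second pair from 1.0 to 2.5, which is the kind of effect the parameter controls in the interface let you explore.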
- Parse phylogenetic trees (Newick, NEXUS, PhyloXML, NeXML)
- Tree visualization with matplotlib
- Tree statistics (terminal count, branch lengths)
- Support for tree manipulation and analysis
- File upload and processing
- Parse PDB and mmCIF structure files
- Analyze protein chains, residues, and atoms
- Structure statistics and information display
- Support for both traditional PDB and modern mmCIF formats
- Structure composition analysis
- Search NCBI databases (PubMed, GenBank, Protein, etc.)
- Customizable search parameters and result limits
- Support for complex Boolean search queries
- Access to literature, sequences, structures, and more
- Email requirement handling
- Search KEGG pathways, genes, enzymes, and compounds
- Access metabolic pathway information for various organisms
- Real-time connection to KEGG REST API with fallback to sample data
- Pathway visualization and detailed entry information
- Support for multiple organisms (human, mouse, rat, yeast, etc.)
- Motif creation from aligned sequences
- Position Weight Matrix (PWM) generation
- Sequence logo visualization with logomaker
- Consensus sequence identification
- Motif searching with threshold scoring
- Interactive threshold adjustment
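The motif workflow above (aligned instances, count matrix, consensus, PWM, threshold scoring) can be sketched in plain Python. This is an illustration under stated assumptions, not the application's code: a four-letter DNA alphabet, a uniform background of 0.25, a pseudocount of 1, and hypothetical function names. In Biopython the corresponding functionality lives in `Bio.motifs`.

```python
import math


def count_matrix(instances, alphabet="ACGT"):
    """Count each letter at each position of equal-length aligned instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    columns = zip(*(s.upper() for s in instances))
    return [{letter: col.count(letter) for letter in alphabet} for col in columns]


def consensus(counts):
    """Pick the most frequent letter in every column."""
    return "".join(max(col, key=col.get) for col in counts)


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(col)
        pwm.append({base: math.log2((n + pseudocount) / total / background)
                    for base, n in col.items()})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window as long as the motif."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


if __name__ == "__main__":
    instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
    counts = count_matrix(instances)
    pwm = log_odds(counts)
    print(consensus(counts))  # TACGC
    print(score(pwm, "TACGC") > score(pwm, "GGGGG"))  # True
```

A search then slides a motif-length window along the target sequence and reports positions whose score exceeds the chosen threshold.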
- 20+ common restriction enzymes
- Cut site identification and positioning
- Fragment size calculation
- Recognition site display
- Fragment size distribution visualization
- Multiple enzyme analysis
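Cut site identification and fragment size calculation reduce to a substring search plus some arithmetic. The sketch below illustrates this for EcoRI (recognition site GAATTC, cutting after the first G) on a linear molecule, scanning the given strand only. It is a simplified stand-in with hypothetical function names, not the application's implementation; Biopython's `Bio.Restriction` module covers the general case.

```python
def find_cut_positions(seq, site="GAATTC", cut_offset=1):
    """Return cut positions as the number of bases left of each cut.

    cut_offset is how many bases into the recognition site the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # step by one to allow overlaps
    return positions


def fragment_sizes(seq_length, cut_positions):
    """Return fragment lengths for a linear molecule cut at the positions."""
    bounds = [0, *sorted(cut_positions), seq_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAAGAATTCTTTTGAATTCGG"
    cuts = find_cut_positions(dna)
    print(cuts)                            # [4, 14]
    print(fragment_sizes(len(dna), cuts))  # [4, 10, 7]
```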
- K-means clustering with configurable cluster count
- Hierarchical clustering with dendrogram visualization
- DBSCAN density-based clustering
- Cluster assignment display
- Interactive parameter adjustment
- CSV data matrix input
- Multiple BLAST programs (blastn, blastp, blastx, tblastn, tblastx)
- Database selection (nt, nr, RefSeq, SwissProt, PDB)
- Configurable parameters (E-value, word size, max hits)
- Mock result display with realistic data structure
- Query sequence validation
- Interactive feature addition interface
- Configurable genome length
- Color-coded feature types
- Feature positioning and sizing
- Visual genome representation
- Export functionality with fallback visualization
- Parse BLAST XML and text results
- Parse HMMER search results
- Extract hit information and statistics
- Support for multiple search formats
- Query and hit analysis
- Parse SwissProt and UniProt database files
- Extract protein annotations and features
- Accession number and description parsing
- Feature location and qualifier extraction
- Organism and gene name information
- Access codon tables for different organisms
- Translation with specific genetic codes
- IUPAC data and reference information
- Support for 24+ genetic code tables
- Custom translation parameters
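To illustrate why the choice of genetic code table matters, the following plain-Python sketch builds the standard code (NCBI table 1) and derives the vertebrate mitochondrial code (NCBI table 2) from it, then translates the same DNA with both. The function and variable names are hypothetical, and this is not the application's code; in Biopython the same tables are available through `Bio.Data.CodonTable` and `Seq.translate(table=...)`.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI table 1, ordered with the first base varying slowest
# and the third base varying fastest over T, C, A, G.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Table 2 differs from table 1 in four codons.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna, table=STANDARD):
    """Translate in frame 0; unknown codons become X, '*' marks a stop."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("ATGGCCTGA"))                   # MA*
    print(translate("ATGGCCTGA", VERTEBRATE_MITO))  # MAW
```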
- Parse GenePop format files
- Hardy-Weinberg equilibrium testing
- F-statistics calculation
- Allele frequency analysis
- Population structure analysis
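As a worked illustration of Hardy-Weinberg equilibrium testing and allele frequency analysis, the sketch below handles one biallelic locus: it estimates the allele frequency from genotype counts, derives the expected counts, and computes the chi-square statistic. The function name is hypothetical and this is a textbook calculation, not necessarily the test the application runs on GenePop data.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Return (p, expected_counts, chi_square) for genotype counts AA, AB, BB."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * total)  # frequency of allele A
    q = 1 - p                            # frequency of allele B
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation to avoid division by zero.
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi_square


if __name__ == "__main__":
    print(hardy_weinberg_chi_square(25, 50, 25))  # chi-square 0.0: in equilibrium
    print(hardy_weinberg_chi_square(50, 0, 50))   # chi-square 100.0: no heterozygotes
```

With one degree of freedom, a chi-square value above roughly 3.84 indicates a departure from equilibrium at the 0.05 level.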
- Build biochemical pathway systems
- Reaction network analysis
- Species source/sink identification
- Pathway topology analysis
- Metabolic network visualization
- Parse UniGene cluster files
- Gene expression data extraction
- Tissue-specific expression analysis
- Protein similarity information
- STS marker integration
🎯 Hidden Markov Models (Bio.HMM) - Advanced Tools
- Build HMM Models - MarkovModelBuilder with multiple model types
- Baum-Welch Training - Unsupervised parameter estimation
- Viterbi Decoding - Optimal state path prediction
- Literature Mining - PubMed/Entrez integration for research articles
- Nexus Format Support - Complete parser with character matrix extraction
- SCOP Classification - Structural Classification of Proteins lookup
- Codon Alignment - Codon-aware sequence alignment with Bio.codonalign
- Model Types - Sequence, Emission, Transition, and Profile HMMs
- Clone the repository:
git clone https://github.com/AnthonyNystrom/NuGenBioPython.git
cd NuGenBioPython
- Create virtual environment (recommended):
# Using venv
python -m venv nugenbio-env
source nugenbio-env/bin/activate # On Windows: nugenbio-env\Scripts\activate
# Using conda
conda create -n nugenbio python=3.9
conda activate nugenbio
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
- Open your browser and navigate to http://localhost:9000
- Python 3.8 or higher
- 2GB RAM minimum (4GB recommended)
- Modern web browser (Chrome, Firefox, Safari, Edge)
- Flask >= 3.0.0
- BioPython >= 1.85
- NumPy >= 1.24.0
- Matplotlib >= 3.6.0
- Scikit-learn >= 1.2.0
- Pillow >= 9.5.0
- Pandas >= 2.0.0
- SciPy >= 1.10.0
- Seaborn >= 0.12.0
- NetworkX >= 2.8
- ReportLab >= 4.0.0
- Logomaker >= 0.8
- Requests >= 2.28.0
- All dependencies listed in requirements.txt
- Start the application and navigate to the dashboard
- Choose the analysis type you want to perform
- Each module has its own dedicated interface with examples and help
- Enter DNA, RNA, or protein sequences
- Select the appropriate sequence type
- Click "Analyze" to get comprehensive statistics
- Use "Load Example" to try with sample data
- Upload sequence files in various formats
- Convert between different file formats
- Download converted files
- View parsed sequence information
- Input two sequences for pairwise alignment
- Adjust scoring parameters as needed
- View alignment results with visual representation
- Upload tree files or enter tree strings
- Visualize trees with matplotlib
- Get tree statistics and information
- Upload PDB or mmCIF files
- View structure composition and chain information
- Analyze protein architecture
- Search NCBI databases with custom queries
- Use Boolean operators for complex searches
- Browse and analyze search results
- Fetch full records and sequence data
- Create and search sequence motifs with PWM
- Analyze restriction enzyme cut sites
- Perform clustering analysis with multiple algorithms
- Build and train Hidden Markov Models with Baum-Welch algorithm
- Decode sequences with Viterbi algorithm
- Search scientific literature via PubMed integration
- Parse Nexus phylogenetic files with matrix extraction
- Lookup SCOP protein structural classifications
- Perform codon-aware sequence alignments
- Analyze population genetics data
- Create genome diagrams and visualizations
- FASTA (.fasta, .fas, .fa)
- GenBank (.gb, .gbk)
- EMBL (.embl)
- Swiss-Prot (.swiss)
- And many more...
- Clustal (.clustal)
- PHYLIP (.phylip)
- NEXUS (.nexus)
- Stockholm (.stockholm)
- Newick (.nwk, .newick)
- NEXUS (.nex, .nexus)
- PhyloXML (.xml, .phyloxml)
- NeXML (.xml, .nexml)
- PDB (.pdb, .ent)
- mmCIF (.cif, .mmcif)
The application provides RESTful API endpoints for programmatic access:
- POST /api/sequence/analyze - Sequence analysis
- POST /api/sequence/protparam - Protein parameter analysis
- POST /api/sequence/six_frame - Six-frame translation
- POST /api/seqio/parse - Parse sequence files
- POST /api/seqio/convert - Convert sequence formats
- GET /api/seqio/sample/<file_type> - Download sample files
- POST /api/alignment/pairwise - Pairwise alignment
- POST /api/phylo/parse - Parse phylogenetic trees
- POST /api/structure/parse - Parse protein structures
- POST /api/structure/advanced_analysis - Advanced structure analysis
- GET /api/structure/sample - Download sample PDB file
- POST /api/database/entrez_search - Search NCBI databases
- POST /api/database/fetch_record - Fetch full database records
- POST /api/database/fetch_sequence - Fetch sequence data
- POST /api/kegg/search - KEGG database search
- GET /api/kegg/get/<entry_id> - Get KEGG entry details
- GET /api/kegg/pathway/<pathway_id> - Get pathway information
- POST /api/motifs/create - Motif creation and PWM generation
- POST /api/motifs/search - Motif searching with threshold scoring
- POST /api/restriction/analyze - Restriction enzyme analysis
- GET /api/restriction/list_enzymes - Available restriction enzymes
- POST /api/clustering/analyze - Clustering analysis
- POST /api/genomediagram/create - Genome diagram creation
- POST /api/graphics/chromosome - Chromosome visualization
- POST /api/advanced/hmm/build - HMM model construction
- POST /api/advanced/hmm/train - Baum-Welch training
- POST /api/advanced/hmm/decode - Viterbi state path decoding
- POST /api/advanced/literature/search - PubMed literature search
- POST /api/advanced/nexus/parse - Nexus format parsing
- POST /api/advanced/scop/lookup - SCOP classification lookup
- POST /api/advanced/codon/align - Codon-aware alignment
- POST /api/searchio/parse - Parse BLAST/HMMER search results
- POST /api/swissprot/parse - Parse SwissProt/UniProt files
- GET /api/biodata/codon_tables - Get codon table information
- POST /api/biodata/translate_with_table - Translate with specific codon table
- POST /api/blast/search_real - Real BLAST search capability
- POST /api/popgen/parse - Population genetics analysis
- POST /api/pathway/analyze - Pathway analysis
- POST /api/unigene/parse - UniGene analysis
- POST /api/advanced/hmm/build - Hidden Markov Model building
- POST /api/advanced/hmm/train - HMM training with Baum-Welch
- POST /api/advanced/hmm/decode - Viterbi decoding
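As a rough sketch of programmatic access, the snippet below builds a POST request for the sequence analysis endpoint using only the standard library. The base URL follows the default port given in the installation steps. The JSON field names `sequence` and `sequence_type` are assumptions made for illustration, since the request schema is not documented here; check the route handlers in app.py for the real parameter names.

```python
import json
from urllib import request

BASE_URL = "http://localhost:9000"  # default address from the install steps


def build_analyze_request(sequence, sequence_type="DNA"):
    """Build (but do not send) a JSON POST request for /api/sequence/analyze."""
    # Hypothetical payload: these field names are assumed, not documented.
    payload = {"sequence": sequence, "sequence_type": sequence_type}
    return request.Request(
        f"{BASE_URL}/api/sequence/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_request("ATGGCCATTGTAATGGGCCGC")
print(req.get_method(), req.full_url)

# With the application running, the request could be sent like this:
# with request.urlopen(req) as response:
#     print(json.load(response))
```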
- File uploads are limited to 16MB
- Email addresses are required for NCBI database access (as per NCBI guidelines)
- Uploaded files are automatically cleaned up after processing
- Input validation is performed on all user data
- Backend: Flask 3.0 with RESTful API architecture
- Frontend: Bootstrap 5 responsive design with professional gradient styling
- JavaScript: Interactive components with real-time validation
- Visualization: Matplotlib integration with PNG export and base64 encoding
- Performance: GPU-accelerated rendering with hardware compositing
- Scroll Optimization: Debounced scroll detection with transition disabling
- Accessibility: Respects user's motion preferences (prefers-reduced-motion)
- File Handling: Secure upload (16MB limit) with format validation
- Error Handling: Comprehensive try/except blocks with graceful degradation
- Fallbacks: Alternative visualizations when optional libraries fail
- Validation: Input sanitization and user-friendly error messages
- Sequence Formats: FASTA, GenBank, EMBL, Swiss-Prot, Clustal, PHYLIP, NEXUS, Stockholm, Tab-separated
- Structure Formats: PDB, mmCIF
- Tree Formats: Newick, NEXUS, PhyloXML, NeXML
- Data Formats: CSV for clustering analysis
- Core: Flask 3.0.0, BioPython 1.85, NumPy, Matplotlib
- Analysis: scikit-learn, scipy, pandas, networkx
- Visualization: logomaker, reportlab
- Web: Bootstrap 5, Font Awesome, jQuery
NuGenBioPython includes comprehensive validation and testing:
- ✅ Import Validation: All BioPython modules and dependencies
- ✅ Core Functionality: Sequence analysis, file I/O, alignment
- ✅ Advanced Features: Motifs, restriction enzymes, clustering
- ✅ Web Interface: All Flask routes and API endpoints
- ✅ Error Handling: Graceful fallbacks and user feedback
- ✅ Environment: Cross-platform compatibility verification
- ✅ Specialized Modules: SearchIO, SwissProt, BioData, PopGen, Pathway, UniGene
- ✅ HMM Advanced Tools: Model building, Baum-Welch training, Viterbi decoding, Literature mining, Nexus parsing, SCOP lookup, Codon alignment
- ✅ Clustering Algorithms: K-means, DBSCAN, Hierarchical clustering
- ✅ Restriction Enzymes: 20+ common enzymes with cut site analysis
- ✅ Performance: UI scroll optimization and GPU acceleration
- Code Review: BioPython best practices implementation
- Error Handling: Comprehensive try/except blocks and fallbacks
- User Experience: Interactive examples and clear instructions
- Cross-platform: Linux, macOS, and Windows compatibility
- Documentation: Complete with usage examples and help text
- Clean Codebase: Production-ready with no test files or archived templates
- Performance Tested: Smooth UI with GPU acceleration and scroll optimization
- Total Modules: 21 BioPython modules (100% coverage of key modules)
- Total Templates: 23 HTML templates (production-ready, no backups)
- Total API Endpoints: 50+ RESTful endpoints
- Supported File Formats: 15+ biological data formats
- Supported Organisms: 8+ model organisms (KEGG)
- Restriction Enzymes: 20+ common enzymes
- Clustering Algorithms: 3 methods (K-means, Hierarchical, DBSCAN)
- HMM Features: Model building, Baum-Welch training, Viterbi decoding, Literature mining, Nexus parsing, SCOP lookup, Codon alignment
- Web Routes: 21 main interface routes
- Sample Files: 10+ built-in sample data files
- Code Quality: Professional structure with zero test files in production
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add AmazingFeature')
- Push to branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add unit tests for new features
- Update documentation for API changes
- Ensure backward compatibility
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright © 2025 Anthony Nystrom
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
For issues, questions, or suggestions:
- Open an issue on GitHub
- Refer to the BioPython documentation for module-specific questions
- Check application logs for debugging
For production deployments, you may need to create a .env file with environment-specific variables such as NCBI API keys or custom port configurations.
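One way such settings might be read is sketched below, assuming the values from the .env file have already been exported into the process environment. The variable names `PORT`, `NCBI_EMAIL`, and `NCBI_API_KEY` and the `load_settings` helper are hypothetical; the names the application actually reads, if any, should be taken from app.py.

```python
import os


def load_settings(env=os.environ):
    """Collect deployment settings from environment variables with defaults."""
    return {
        # 9000 matches the default address given in the installation steps.
        "port": int(env.get("PORT", "9000")),
        # NCBI asks Entrez users to identify themselves with an email address.
        "ncbi_email": env.get("NCBI_EMAIL", ""),
        # An unset or empty key is reported as None.
        "ncbi_api_key": env.get("NCBI_API_KEY") or None,
    }


if __name__ == "__main__":
    print(load_settings({}))
    print(load_settings({"PORT": "8080", "NCBI_API_KEY": "example-key"}))
```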
If you use NuGenBioPython in your research, please cite:
NuGenBioPython: A Web Interface for BioPython
Anthony Nystrom (2024)
GitHub: https://github.com/AnthonyNystrom/NuGenBioPython
Made with ❤️ for the Bioinformatics Community
