This repository contains additional datasets and the analysis workflow for the paper Takaramoto et al. (2024) "HulaCCR1, a pump-like cation channelrhodopsin discovered in a lake microbiome", J Mol Biol 436:168844. If you are here for the sequence of HulaCCR1, here it is:
MRRKLALTSLCGMPGKLAAACILATVVVSFAAPSIPSIQSVAALDQTALAGHLAPAPAAPDAITVSRKVGEADATTSSPTYIGGNPTKCWNYYYVAGAYAFGIVFQTAFAALMYYYTNRGTGWYGHPFDEKNRRYEYNDIGIYVQIATIVNYCLQFVYNIQNGHGSFNPGNFRYFEYCFTCPFIVLDVCYSVELPHKGLNFALTFFTLFLGGVMALSNKSTTDVYLLFMLSAIAMVVLYSLMLYGVALKWDAIDDSAKPTLKMGLGIFFGIWPIFPIFYALYRDAGFSCELDVSLHLILDIACKGSFGWLMLRYRLTMEDIEWDDMQAELNSLEVASRDGSMPMTPMTPNKRSFRNRRQSLVDHARMEKMMNLNSGIVPKLTYQVHSLSTGTTPRPLSRVGGGFTPQTEDPKRGGINDKRAFADRVAAGD
Files in the directory output:
- Figure_2b_rhodopsin_phylogeny.newick - phylogenetic tree of the mgCCR1-3 clade, with fragments placed on a RAxML tree of representative sequences using pplacer
- Figure_2c_clade_distrbution.csv - distribution of the BCCRs in the different datasets containing ChRs from the mgCCR1-3 clade. CSV file with three columns: JGI GOLD analysis accession (or HulaCCR1 for our Hula Lake dataset), BCCR (sub)clade, number of genes
- Figure_S1a_transposon_phylogeny.newick - phylogenetic tree of transposon sequences found in reference genomes and in the flanks of the contigs containing genes from the mgCCR1-3 clade. Fragments were placed on the tree with pplacer
- Figure_S1a_transposon_subclades.csv - summary of the transposon subclades identified in contigs with genes from the mgCCR1-3 clade. For each representative sequence, the file lists the number of fragments from each mgCCR1-3 subclade that cluster together with that representative
- Figure_S1b_cryptist_phylogeny.newick - 18S phylogeny of the Cryptista (the basalmost OTU is Palpitomonas bilix)
- Figure_S2_sintax.svg - visualization of the distribution of the different taxa across the datasets containing genes from the mgCCR1-3 clade, based on 18S rRNA analysis with sintax
- Figure_S2_sintax.tsv - aggregated results of the sintax analysis
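As an illustration of how the three-column layout of Figure_2c_clade_distrbution.csv can be consumed, here is a minimal Python sketch that sums the number of genes per BCCR (sub)clade. It assumes the column order given above and that the file starts with a header row (pass `has_header=False` if it does not). The function name `genes_per_clade` and the rows in the example are hypothetical and are not taken from the real file.

```python
import csv
import io
from collections import defaultdict


def genes_per_clade(handle, has_header=True):
    """Sum gene counts per BCCR (sub)clade across all datasets.

    Assumes three columns per row: dataset accession, (sub)clade, gene count.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row (its presence is an assumption)
    totals = defaultdict(int)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        _dataset, clade, n_genes = row[0], row[1], int(row[2])
        totals[clade] += n_genes
    return dict(totals)


# Made-up rows for illustration only; they do not come from the real file.
example = io.StringIO(
    "dataset,clade,genes\n"
    "DATASET_A,subclade_X,2\n"
    "HulaCCR1,subclade_X,1\n"
    "DATASET_B,subclade_Y,4\n"
)
totals = genes_per_clade(example)
print(totals)
```

To apply this to the real file, open it with `open("output/Figure_2c_clade_distrbution.csv", newline="")` and pass the handle to `genes_per_clade`.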
Files in the directory metadata (the input to the workflow):
- BCCRs.faa - representative bacteriorhodopsin-like cation channelrhodopsins (BCCRs)
- datasets.xslx - metadata for the datasets in which rhodopsins from the mgCCR1-3 clade (including HulaCCR1) were found
- flanks.fna - regions flanking genes coding for rhodopsins from the mgCCR1-3 clade
- ingroup.faa - all protein sequences of the collected rhodopsins from the mgCCR1-3 clade, including fragments and redundant sequences
- selected_ingroup.faa - non-redundant set of representative proteins from the mgCCR1-3 clade
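Since ingroup.faa mixes full-length proteins with fragments, a quick look at sequence lengths can help tell them apart. The following is a minimal sketch of a dependency-free FASTA reader, assuming the .faa files are standard FASTA. The function name `read_fasta` and the toy records are hypothetical; this is not part of the workflow itself.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID is the first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Toy records for illustration only; not real sequences from the repository.
toy = io.StringIO(">seq1 toy description\nMKT\nAAL\n>seq2\nMG\n")
records = read_fasta(toy)
lengths = {name: len(seq) for name, seq in records.items()}
print(lengths)
```

For the real data, pass an open handle to metadata/ingroup.faa instead of the toy input. Any length cut-off used to flag fragments would be a choice of the reader, not something defined in this repository.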
The workflow itself is written in Snakemake and the dependencies are taken care of with conda. The code and the environment definitions can be found in the workflow directory.