MalariaSED (malaria parasite sequence decoder) is a sequence-based deep learning (DL) framework for understanding how noncoding variants in malaria parasites contribute to epigenetic profiles. The current version predicts the chromatin impact of variants on open chromatin accessibility, H3K9ac, and the binding of six transcription factors (TFs): PfAP2-G, PfAP2-I, PfBDP1, PfAP2-G5, PbAP2-O, and PbAP2-G2. These profiles cover different parasite living environments, including the mosquito host, the human liver, and human blood cells.

We provide two input formats for users to compute the chromatin profile changes:

The VCF format requires that the first four columns of the input file contain the chromosome ID, the genomic variant location, the reference nucleotide (or the nucleotide sequence, for an insertion or deletion), and the alternative nucleotide ('*' for a deletion). Each nucleotide sequence should be shorter than 1 kb, and at most 200 rows are supported per submission.
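
The constraints above can be checked before uploading. The following is a minimal sketch, not part of MalariaSED itself: the function name `validate_variant_rows` is hypothetical, and it assumes the columns are tab-separated, that lines starting with `#` are comments, and that sequences contain only A, C, G, and T.

```python
MAX_ROWS = 200        # row limit stated above
MAX_SEQ_LEN = 1000    # sequences must be shorter than 1 kb
VALID_BASES = set("ACGT")  # assumption: no ambiguity codes


def validate_variant_rows(lines):
    """Return a list of (chrom, pos, ref, alt) tuples or raise ValueError."""
    rows = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"line {n}: expected at least 4 tab-separated columns")
        chrom, pos, ref, alt = fields[:4]
        if not pos.isdigit() or int(pos) < 1:
            raise ValueError(f"line {n}: location must be a positive integer")
        for name, seq in (("reference", ref), ("alternative", alt)):
            if name == "alternative" and seq == "*":
                continue  # '*' marks a deletion
            if not seq or set(seq.upper()) - VALID_BASES:
                raise ValueError(f"line {n}: invalid {name} sequence")
            if len(seq) >= MAX_SEQ_LEN:
                raise ValueError(f"line {n}: {name} sequence must be shorter than 1 kb")
        rows.append((chrom, int(pos), ref, alt))
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows given; at most {MAX_ROWS} are supported")
    return rows
```

For example, `validate_variant_rows(open("variants.vcf"))` would check a local file (the file name is illustrative) and raise `ValueError` on the first row that breaks one of the stated limits.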

Users can upload two FASTA files, one with reference sequences and one with alternative sequences. Multiple sequences are allowed; MalariaSED calculates the chromatin effect between the two sequences that share the same row ID in the reference and alternative files. Each sequence in both files should be exactly 1 kb long, and at most 200 FASTA sequences are allowed per submission.
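
The pairing rule for the two FASTA files can be sketched as follows. This is an illustration under stated assumptions rather than MalariaSED's own code: the helper names `read_fasta` and `pair_sequences` are hypothetical, the "row ID" is taken to be the first word of each FASTA header, and both files are assumed to contain exactly the same set of IDs.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {record ID: sequence}, in file order."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current = header[0]  # assumption: ID is the first word of the header
            if current in records:
                raise ValueError(f"duplicate ID: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current] += line
    return records


def pair_sequences(ref_text, alt_text, length=1000, max_records=200):
    """Return (ID, reference, alternative) triples matched by record ID."""
    ref = read_fasta(ref_text)
    alt = read_fasta(alt_text)
    if set(ref) != set(alt):
        raise ValueError("reference and alternative files contain different IDs")
    if len(ref) > max_records:
        raise ValueError(f"at most {max_records} sequences are allowed")
    pairs = []
    for record_id, ref_seq in ref.items():
        alt_seq = alt[record_id]
        if len(ref_seq) != length or len(alt_seq) != length:
            raise ValueError(f"{record_id}: both sequences must be {length} bp long")
        pairs.append((record_id, ref_seq, alt_seq))
    return pairs
```

Matching by ID rather than by position means the two files do not need to list their records in the same order.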

Output formats from VCF prediction:

MalariaSED outputs a tab-separated file.

Column 1: Information about the input locus, in the format "chromosome location", "reference sequence", "alternative sequence"::"1 kb extended chromosome location surrounding the locus".

Columns 2 and 3: The closest gene ID and its distance to the locus, based on PlasmoDB version 26.

Columns 4 and 5: The closest gene ID and its distance to the locus, based on PlasmoDB version 55.

Columns 6 to 31: The probability of each chromatin profile predicted by MalariaSED.

Columns 32 to 44: The absolute chromatin effect, calculated as the log2 fold change of odds.

Columns 45 to 57: The E-value of a chromatin effect, defined as the expected fraction of noncoding variants reported in the MalariaGEN database that would show a higher effect than this variant.
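
To make the last two quantities concrete, here is a minimal sketch of how a log2 fold change of odds and an empirical E-value can be computed. It is not taken from the MalariaSED source: the function names, the clipping constant `eps`, and the `background_effects` list (standing in for effects of MalariaGEN noncoding variants) are assumptions, as is the reading of "absolute" as the magnitude of the change.

```python
import math


def log2_odds_fold_change(p_ref, p_alt, eps=1e-7):
    """Magnitude of the log2 fold change of odds between two probabilities."""
    # Clip to avoid division by zero or log of zero (eps is an assumption).
    p_ref = min(max(p_ref, eps), 1 - eps)
    p_alt = min(max(p_alt, eps), 1 - eps)
    odds_ref = p_ref / (1 - p_ref)
    odds_alt = p_alt / (1 - p_alt)
    return abs(math.log2(odds_alt / odds_ref))


def empirical_e_value(effect, background_effects):
    """Fraction of background variants whose effect exceeds the given effect."""
    if not background_effects:
        raise ValueError("background_effects must not be empty")
    higher = sum(1 for b in background_effects if b > effect)
    return higher / len(background_effects)
```

With a reference probability of 0.5 and an alternative probability of 0.8, the odds change from 1 to 4, so the effect is about 2. A small E-value then means that few background variants show a stronger effect.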

Output formats from FASTA prediction:

Column 1: The input FASTA header.

Columns 2 to 27: The probability of each chromatin profile predicted by MalariaSED.

Columns 28 to 40: The absolute chromatin effect, calculated as the log2 fold change of odds.

Columns 41 to 53: The E-value of a chromatin effect, defined as the expected fraction of noncoding variants reported in the MalariaGEN database that would show a higher effect than this variant.
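
The column ranges listed for the two output files can be turned into a small reader. The sketch below is an illustration only: the names `LAYOUTS`, `split_row`, and `read_output` are hypothetical, values are kept as strings, and the file is assumed to have no header row (skip the first row if yours has one).

```python
import csv
import io

# 1-based inclusive column ranges copied from the descriptions above.
LAYOUTS = {
    "vcf": {
        "locus": (1, 1),
        "gene_plasmodb26": (2, 3),
        "gene_plasmodb55": (4, 5),
        "probabilities": (6, 31),
        "effects": (32, 44),
        "e_values": (45, 57),
    },
    "fasta": {
        "header": (1, 1),
        "probabilities": (2, 27),
        "effects": (28, 40),
        "e_values": (41, 53),
    },
}


def split_row(fields, layout):
    """Group one row's fields by the documented column ranges."""
    ranges = LAYOUTS[layout]
    expected = max(end for _, end in ranges.values())
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")
    # Convert 1-based inclusive ranges to Python slices.
    return {name: fields[start - 1:end] for name, (start, end) in ranges.items()}


def read_output(text, layout):
    """Parse tab-separated output text into a list of grouped rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [split_row(row, layout) for row in reader if row]
```

For example, `read_output(open("result.tsv").read(), "fasta")` would return one dictionary per input sequence (the file name is illustrative), with the 13 effect values under the `"effects"` key.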

A command-line version is available on GitHub (https://github.com/CharleyWang/MalariaSED).



Quick guide:

User upload files: