Express is a database of transcriptome profiles encompassing known and novel transcripts across multiple developmental stages in mouse eye tissues. Express contains transcript-level expression data obtained from 21 lens and 35 retina RNA-Seq mouse samples. We downloaded the raw datasets, aligned them to the reference genome and quantified transcript-level expression for known and novel transcripts. We then downloaded the reference gene and transcript information and organized it, along with the expression data, in a MySQL database. Finally, we developed a PHP backend to interact with the database and a frontend to interact with the user and visualize the query results.
We downloaded 21 mouse lens and 35 mouse retina samples across different developmental stages varying from E15 to P90. Please see Table 1 and Table 2 in our publication for more details.
The downloaded raw datasets (in FASTQ format) were aligned to the mouse reference genome (mm10) using HISAT. The alignment files (in SAM format) were converted to sorted BAM files and indexed using SAMtools. The sorted BAM files were then passed to StringTie, together with reference mouse transcripts obtained from Ensembl, for transcript quantification and discovery. The GTF files produced by StringTie, which store expression levels for known and novel transcripts, were then combined using StringTie "merge" mode into a reference annotation file that includes the novel transcripts. With this merged annotation file, we reran StringTie on the sorted BAM files to collect GTF files containing expression levels for both known and novel transcripts. We then applied quantile normalization to the lens and retina samples separately. The final tables of normalized expression levels were organized into an SQL table.
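The quantile normalization step can be illustrated with a minimal Python sketch. This is not the code used to build Express; it is a generic version of the technique, assuming a matrix with one row per transcript and one column per sample, and it breaks ties by input order rather than averaging tied ranks. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(rows):
    """Quantile-normalize a matrix given as a list of rows (transcripts x samples).

    Each column (sample) is forced onto the same distribution: the k-th smallest
    value of every column is replaced by the mean of the k-th smallest values
    across all columns. Ties are broken by input order (an assumption of this
    sketch; other implementations average tied ranks).
    """
    n_rows = len(rows)
    if n_rows == 0:
        return []
    n_cols = len(rows[0])

    # Extract each sample as its own column.
    columns = [[row[j] for row in rows] for j in range(n_cols)]

    # For each column, the row indices in ascending order of value (stable sort).
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]

    # Mean of the k-th smallest values across all columns.
    rank_means = [
        sum(columns[j][orders[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]

    # Write each rank mean back to the position that held that rank.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(orders[j]):
            result[i][j] = rank_means[k]
    return result
```

Because lens and retina samples were normalized separately, a function like this would be applied once to the lens matrix and once to the retina matrix rather than to a combined table.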
We also downloaded gene information from Ensembl BioMart and HGNC to capture the relationships between gene alias, gene name, gene ID and transcript ID for all known transcripts. In addition, we downloaded transcript information from Ensembl, including gene ID and transcript ID. These two tables were then converted into SQL tables and, together with the expression data, stored in a MySQL database.
A dump of the MySQL database can be downloaded using this link (express.sql.gz, compressed 242 MB). A complete guide to setting up a local Express server, along with the source code, is available on its GitHub page.
Go to the Home page, select a tissue type/subtype, enter a query (or pick one of the sample queries), and then click the search button. The results are shown as a heatmap by default, and the raw expression data are filtered using a cutoff of > 5 TPM. The heatmap shows transcripts in its rows and developmental stage:cell subtype in its columns. When no cell subtype is given for a developmental stage, the column represents the whole tissue rather than a particular cell subtype (for lens, E: epithelium and F: fiber; for retina, C: cones and R: rods). You can later change the TPM cutoff setting to filter the expression data at a different threshold, and switch from raw expression data to expression data quantile normalized across samples per tissue type.
The browser view can be toggled using the button on the right-hand side of the navigation. Similarly, you can toggle the heatmap view. Both views and the heatmap data can be exported using the Export button on the right-hand side of the navigation. The views are exported in SVG (scalable vector graphics) format and the data are exported as TSV (tab-separated values). The exported data include gene name, transcript ID, developmental stage, NCBI BioProject ID, PubMed ID, study reference, novelty flag, averaged raw TPM value across samples and averaged normalized TPM value across samples.
Novelty flags can be 0, 1 or 2. 0 means it is a known (annotated) transcript (shown as ENSMUSTXXXXXXXXXXX); 1 means it is an unannotated transcript (shown as MSTRG.XXXX.XXXXX.X); 2 means it is a completely novel transcript (shown as MSTRG.XXXX.XXXXX.X).
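To post-process an exported TSV file outside the browser, a sketch along the following lines may help. The column names used here (`transcript_id`, `novelty`, `raw_tpm`) are assumptions for illustration only; check the header line of your own export and adjust them. The function name `filter_export` is likewise hypothetical.

```python
import csv
import io

# Meaning of the novelty flags as described above.
NOVELTY_LABELS = {
    0: "known (annotated)",
    1: "unannotated",
    2: "completely novel",
}


def filter_export(tsv_text, cutoff=5.0, tpm_column="raw_tpm", novelty_column="novelty"):
    """Keep rows whose TPM is strictly above `cutoff` and add a readable novelty label.

    `tpm_column` and `novelty_column` are assumed header names, not confirmed
    names from the Express export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        if float(row[tpm_column]) > cutoff:
            row["novelty_label"] = NOVELTY_LABELS[int(row[novelty_column])]
            kept.append(row)
    return kept
```

The default cutoff of 5.0 mirrors the default > 5 TPM filter applied by the heatmap; pass a different `cutoff` to reproduce other settings.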
The backend PHP API lets you query the MySQL database for transcript expression levels given a tissue type, a TPM cutoff and a query (e.g. gene synonym/name, Ensembl gene ID, MGI gene ID, Ensembl transcript ID or chromosomal location). The API URL is https://sysbio.sitehost.iu.edu/express/app/api.php and it accepts five GET parameters: expression, query, tissue, cutoff and value (e.g. https://sysbio.sitehost.iu.edu/express/app/api.php?expression=transcript&query=Cryb2&tissue=lens&cutoff=1&value=raw).
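A request to the API can be sketched in Python using only the standard library. The helper names `build_url` and `fetch_expression` are hypothetical, the default values are taken from the example URL above, and no assumption is made here about the properties of the returned JSON objects.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://sysbio.sitehost.iu.edu/express/app/api.php"


def build_url(query, tissue="lens", cutoff=1, value="raw", expression="transcript"):
    """Build the API URL from the five GET parameters described above."""
    params = {
        "expression": expression,
        "query": query,
        "tissue": tissue,
        "cutoff": cutoff,
        "value": value,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_expression(query, **kwargs):
    """Call the API and return the decoded JSON response (requires network access)."""
    url = build_url(query, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `build_url("Cryb2")` reproduces the example URL given above, and `fetch_expression("Cryb2")` would retrieve the corresponding records if the server is reachable.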
The expression parameter can be one of the following:
The query parameter can be one of the following:
The tissue parameter can be one of the following:
The cutoff parameter can be one of the following:
The value parameter can be one of the following:
The output will be a JSON array of objects with the following properties:
Please cite Express in your publications as:
Budak, G., Dash, S., Srivastava, R., Lachke, S. A., & Janga, S. C. (2018). Express: a database of transcriptome profiles encompassing known and novel transcripts across multiple development stages in eye tissues. Experimental eye research, 168, 57-68.