Bioinformatics (WP6)

Aims and Overview

We will build on the informatics structures developed under the EUMORPHIA project to track both mouse generation and data generation, and to store and disseminate phenotype data. We will maintain the EMPReSS SOP database that underpins EMPReSSslim. We will continue to enhance the EuroPhenome database to provide ready access to phenome data integrated with the SOPs used to generate them. In addition, we will develop and institute a tracking system as a component of EuroPhenome that will allow the user to view the progress of any mutant in the EUMODIC programme from mutant generation through primary phenotyping to secondary phenotyping. The workpackage will bring together results from all the phenotype screens in a single database and provide a common interface for accessing and searching the data. In doing so, it will integrate the work of the phenotyping workpackages and add value to their individual efforts.


The core activity of the workpackage will be to develop:

         a system for the acquisition of phenotype data from mice phenotyped at the different centres participating in the project

         a tracking system to allow partners to identify which mice are being phenotyped, the centre(s) working on them, and the current status of experimental work on them.

At a technical level, this will require us to implement a system for the input of phenotype data, a database to hold the data, and a browser and analysis tools with which investigators can subsequently examine the data.
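The tracking requirement described above can be illustrated with a minimal sketch. It assumes three stages taken from the progression named in the overview (mutant generation, primary phenotyping, secondary phenotyping); the class names, the `line_id` field and the centre names are hypothetical and do not describe the actual EuroPhenome implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names following the progression described in the text.
    GENERATION = 1
    PRIMARY_PHENOTYPING = 2
    SECONDARY_PHENOTYPING = 3


@dataclass
class MouseLine:
    line_id: str                                  # hypothetical identifier
    stage: Stage = Stage.GENERATION
    centres: set = field(default_factory=set)     # centres that have worked on the line
    history: list = field(default_factory=list)   # (stage, centre) pairs in order

    def advance(self, centre: str) -> None:
        """Move the line to the next stage and record the centre taking it on."""
        if self.stage is Stage.SECONDARY_PHENOTYPING:
            raise ValueError("line is already at the final stage")
        self.stage = Stage(self.stage.value + 1)
        self.centres.add(centre)
        self.history.append((self.stage, centre))


line = MouseLine("LINE-0001")
line.advance("Centre A")   # generation -> primary phenotyping
line.advance("Centre B")   # primary -> secondary phenotyping
print(line.stage.name, sorted(line.centres))
```

Keeping the history as an ordered list of stage and centre pairs is one simple way to answer both questions posed for the tracking system: which centres are working on a line, and what its current status is.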

EuroPhenome database

The database will be based on the structure of the EuroPhenome database developed under EUMORPHIA. That database already holds much of the information that EUMODIC needs to store, but it will need to be extended to include genotype information and the information needed for tracking mouse lines. The major area for development in the first instance will be the data acquisition module. Again, we plan to take advantage of advances made during EUMORPHIA, specifically the ontological schema developed for the description of mouse phenotypes. In this schema, the assay employed to determine phenotype information can be used as a convenient point of entry for data capture by annotating it with ontology terms describing the nature of the data being captured (i.e. what phenotypic attribute is being measured) as well as with information about units of measurement, acceptable maximum and minimum values and so on. We plan to make use of this concept to provide a user-friendly and quick means of data capture. It will be necessary to have a working, if minimal, system of this kind in place early in the project to allow data to be entered from the time phenotyping starts. All software developed during the project will be made available to the community on an open source basis.
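As a rough illustration of the annotated-assay idea, the sketch below attaches an ontology term, a unit and an acceptable range to a measured parameter and uses them to check values at entry. The `Parameter` class, the term label `EXAMPLE:0000001` and the numeric limits are all assumptions made for illustration; they are not taken from EMPReSS or EuroPhenome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    name: str
    ontology_term: str   # placeholder label, not a real ontology identifier
    unit: str
    minimum: float       # illustrative limit only
    maximum: float       # illustrative limit only

    def check(self, value: float) -> Optional[str]:
        """Return an error message if the value is out of range, else None."""
        if not (self.minimum <= value <= self.maximum):
            return (f"{self.name}: {value} {self.unit} outside "
                    f"[{self.minimum}, {self.maximum}]")
        return None


body_weight = Parameter("body weight", "EXAMPLE:0000001", "g", 5.0, 80.0)

print(body_weight.check(24.3))    # value inside the acceptable range
print(body_weight.check(240.0))   # value flagged at the point of entry
```

Because the limits and units travel with the parameter definition, a data entry form could in principle be generated from the assay description itself rather than written by hand for each screen.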

EMPReSS SOP database

Many of the SOPs used in the project already reside in the SOP database of EUMORPHIA (http://empress.har.mrc.ac.uk). Any new SOPs implemented and validated as part of EUMODIC, particularly secondary SOPs, will also be entered in this database.

Data dissemination

An important secondary aim will be to have a mechanism in place within the data capture system that will enable rapid publication of phenotype data onto the web. We also aim to make the database (or a frequently refreshed copy of it) accessible directly to other, external software systems, for example via direct SOAP queries, in the interests of improved data accessibility and integration. Our longer-term aim in this respect, in collaboration with other phenotype data repositories, is to develop an integrated system for accessing different flavours of mouse phenotype information from a single site. A critical feature of our efforts in data dissemination will be to support links to EUCOMM, EurExpress, EMAGE and EMMA. The close links established between these programmes through PRIME will ensure the development of an integrated data system. The primary data integration module within EuroPhenome will be at the level of the mouse gene that has been knocked out in EUMODIC. This gene will link, via the ENSEMBL/VEGA ID, to numerous data sources, including the expression data for that gene in EURexpress. A secondary data integration module will be at the level of the phenotype descriptions, using ontologies such as PATO. In EuroPhenome, a number of phenotypes will be annotated with phenotype descriptions. These will be linked to numerous other data sources including, for example, expression data in EURexpress that share common descriptions of anatomical structures.
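Gene-level integration of this kind amounts to joining records from different sources on a shared gene identifier. The sketch below shows the idea with in-memory lists; the record fields, the `integrate_by_gene` helper and the identifiers `GENE_A` and `GENE_B` are placeholders invented for illustration, and no real ENSEMBL/VEGA IDs or service interfaces are implied.

```python
from collections import defaultdict

# Made-up records; identifiers are placeholders, not real ENSEMBL/VEGA IDs.
phenotype_records = [
    {"gene_id": "GENE_A", "phenotype": "placeholder phenotype 1"},
    {"gene_id": "GENE_B", "phenotype": "placeholder phenotype 2"},
]
expression_records = [
    {"gene_id": "GENE_A", "structure": "placeholder structure"},
]


def integrate_by_gene(**sources):
    """Group records from each named source under their shared gene ID."""
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            gene_id = record["gene_id"]
            payload = {k: v for k, v in record.items() if k != "gene_id"}
            merged[gene_id].setdefault(source_name, []).append(payload)
    return dict(merged)


view = integrate_by_gene(phenotypes=phenotype_records,
                         expression=expression_records)
for gene_id, linked in view.items():
    print(gene_id, sorted(linked))
```

A gene with no record in a given source simply lacks that key in the merged view, which lets a browser show the available links without treating missing data as an error.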

Links with other phenome databases

It is important that the existing and new databases for data on mouse functional genomics are linked either physically or by search terms, so that data can be mined from one database to another. This will make data more accessible and also add value. Scientists in EUMORPHIA have been working closely with other mouse phenotyping efforts and have formed close links with scientists at JAX, Oak Ridge National Laboratory and RIKEN, and in Australia. They have met twice at an International Phenome meeting organised by the PRIME project (EC-funded Coordination Action). It is envisaged that these links will be continued through the Casimir project (EC-funded Coordination Action, under contract negotiation).

Further development of mouse phenotype ontologies

Capturing data from the phenotyping experiments will provide a test of our ontology schema, and it is likely that we will be faced with unforeseen challenges in representing the data. In particular, there remains an unresolved issue concerning the interpretation of raw data from, for example, behavioural tests: data such as "time at periphery in an open field test" require further interpretation to convert them into phenotype descriptions immediately useful to scientists. We are also likely to discover gaps in the existing ontologies, which will need to be remedied. Furthermore, in order to provide the maximum utility, we will investigate ways of relating mouse phenotype data to human (clinical) phenotype data to aid inference about the possible utility of particular mouse lines as models of human disease. An area of particular emphasis in this respect is the description of the full range of pathology data (as opposed to histopathology, which is currently covered by the MPATH ontology), which is likely to be best addressed using a combinatorial schema similar to that described for phenotypes in general.
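To make the interpretation problem concrete, the sketch below turns raw measurements into a simple combinatorial description (an entity plus a quality) by comparing a mutant group with controls. This is only one crude possibility, not a method proposed by the project: the two-standard-deviation rule, the `interpret` function, the quality labels and all the numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass(frozen=True)
class PhenotypeDescription:
    entity: str    # what is being described (free-text placeholder)
    quality: str   # how it differs from controls (placeholder label)


def interpret(mutant: Sequence[float], control: Sequence[float],
              entity: str, threshold: float = 2.0) -> Optional[PhenotypeDescription]:
    """Describe the mutant group only if its mean lies more than `threshold`
    control standard deviations away from the control mean."""
    spread = stdev(control)
    if spread == 0:
        raise ValueError("control values show no variation")
    shift = (mean(mutant) - mean(control)) / spread
    if shift > threshold:
        return PhenotypeDescription(entity, "increased")
    if shift < -threshold:
        return PhenotypeDescription(entity, "decreased")
    return None   # no description assigned under this rule


# Invented values (seconds); not real experimental data.
controls = [100.0, 110.0, 90.0, 105.0, 95.0]
mutants = [150.0, 160.0, 155.0]

print(interpret(mutants, controls, "time at periphery in open field test"))
```

Separating the entity from the quality means the same small vocabulary of qualities can be combined with any measured attribute, which is the appeal of a combinatorial schema over a list of precomposed terms.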