I am a Research Fellow at the Hellenic Center for Marine Research (http://www.hcmr.gr/en), Crete outstation, Greece. I work on MARBIGEN (http://www.marbigen.org/), a European Union project that supports marine biodiversity and genomics research in the Eastern Mediterranean. (Home page: http://epafilis.info)
My prior training is in biomedical informatics, including techniques such as:
- the recognition of mentions of biological entities (like genes, proteins, small chemical molecules, organism species) in text
- novel web-based interface development and data integration, e.g. browser extensions (Reflect, http://reflect.ws), that provide researchers, as well as other users, with quick and easy access to concise summaries of biological knowledge.
I am currently applying these skills in the field of biodiversity informatics.
The EOL-Rubenstein 2013 "ENVIRONMENTS" proposal is one such example. It focuses on identifying environment descriptive terms, such as "coral reef", "cultivated land", "glacier", "pelagic", "forest", and "lagoon", in the textual descriptions of EOL taxa.
ENVIRONMENTS (http://envo.her.hcmr.gr/environments.html) is an open source tool designed for this purpose that follows a dictionary-based approach. In practice, ENVIRONMENTS scans the contents of text files, looks up their words against a list of known environment descriptors, and reports those found.
The Environment Ontology (EnvO, http://environmentontology.org/), a community resource offering a controlled, structured vocabulary for biomes (ecosystem types), environmental features, and environmental materials, serves as the source of names and synonyms for this identification process.
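As a rough illustration of the dictionary-based look-up described above, here is a minimal Python sketch. It is not the ENVIRONMENTS code itself: the function name `find_terms`, the tiny dictionary, and the `TERM:n` identifiers are all hypothetical placeholders (they are not real EnvO accession numbers), and the longest-match-first strategy is an assumption made for the example.

```python
import re

# Hypothetical mini-dictionary: names map to placeholder identifiers,
# not to real EnvO accession numbers.
DICTIONARY = {
    "coral reef": "TERM:1",
    "cultivated land": "TERM:2",
    "glacier": "TERM:3",
    "lagoon": "TERM:4",
    "forest": "TERM:5",
}

# Longest dictionary name, counted in words.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def find_terms(text):
    """Return (name, identifier, start word index) for each dictionary hit."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, so multi-word names
        # are preferred over any shorter name starting at the same word.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in DICTIONARY:
                hits.append((candidate, DICTIONARY[candidate], i))
                i += n  # continue scanning after the matched phrase
                break
        else:
            i += 1  # no name starts at this word
    return hits


print(find_terms("Found on a coral reef near a shallow lagoon."))
```

In this sketch the text is reduced to lowercase words and every window of up to `MAX_WORDS` words is checked against the dictionary, which is one simple way to handle multi-word names such as "coral reef".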
To improve detection, ENVIRONMENTS allows for orthographic variation in the way environment descriptive terms are written when looking words up (e.g. plural forms, and spacing or hyphenation as in "freshwater", "fresh-water", and "fresh water").
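One way to picture this tolerance for orthographic variation is to reduce both the dictionary names and the candidate text to a common look-up key. The Python sketch below is only an assumption about how such matching could work, not a description of the ENVIRONMENTS implementation; `normalize`, `lookup`, and the sample names are hypothetical, and the plural rule is deliberately naive.

```python
import re


def normalize(term):
    """Collapse spacing, hyphenation and a naive plural 's' into one key."""
    key = re.sub(r"[\s\-]+", "", term.lower())
    # Naive plural handling (an assumption): drop one trailing "s",
    # but leave words ending in "ss" untouched.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


# Hypothetical dictionary names, indexed by their normalized key.
NAMES = ["fresh water", "coral reef", "lagoon"]
INDEX = {normalize(name): name for name in NAMES}


def lookup(candidate):
    """Return the dictionary name matching the candidate, or None."""
    return INDEX.get(normalize(candidate))


for variant in ["freshwater", "fresh-water", "Fresh water", "coral reefs", "lagoons"]:
    print(variant, "->", lookup(variant))
```

Because the same normalization is applied to both sides, "freshwater", "fresh-water", and "fresh water" all resolve to the same dictionary entry. A real system would need more careful plural rules than stripping a final "s".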
Besides the software and the dictionary, major parts of the proposal comprise the tasks required to customize ENVIRONMENTS to the biodiversity literature and to evaluate its accuracy. To this end, the project includes the creation of a gold standard corpus (i.e. a set of documents in which environment descriptive terms have been manually annotated).
Due to the complexity of these tasks, a team of researchers with diverse backgrounds (molecular biology, microbial ecology, data analysis, text mining, and more) is supporting me and contributing to this project:
- Hellenic Center for Marine Research (HCMR, http://www.hcmr.gr/en/): Lucia Fanini (http://www.marbigen.org/users/lucia-fanini), Sarah Faulwetter (http://www.marbigen.org/users/sarah-faulwetter), Christina Pavloudi (http://www.marbigen.org/users/christina-pavloudi), Aikaterini Vasileiadou (http://www.marbigen.org/users/katerina-vasileiadou)
- NNFCPR (http://www.cpr.ku.dk/): Sune Frankild (http://dk.linkedin.com/pub/sune-frankild/1/35B/5A3), Lars Juhl Jensen (http://larsjuhljensen.wordpress.com/about/)
- MPI_MM, Bremen (http://www.mpi-bremen.de/en/): Julia Schnetzer (http://www.mpi-bremen.de/en/Page6679.html)
- Uni. of Glasgow (http://userweb.eng.gla.ac.uk/christopher.quince/): Umer Ijaz (http://userweb.eng.gla.ac.uk/umer.ijaz/)
Collaborators also include: Christos Arvanitidis (HCMR, http://www.marbigen.org/users/christos-arvanitidis), Christopher Quince (Uni. of Glasgow, http://userweb.eng.gla.ac.uk/christopher.quince/)
- Full name: Evangelos Pafilis
- I am: an educator, a professional scientist
- Curator level: Assistant Curator
- Bioinformatics, Biodiversity Informatics Research Fellow, Hellenic Centre for Marine Research (HCMR), Crete, Greece. Publications available at: http://epafilis.info/
- Curation scope: named entity recognition, linking biodiversity data, automated enrichment of web content (augmented browsing), literature mining