LANE CENTER TECHNICAL REPORT ABSTRACTS

CMU-CB-09-101
Lane Center for Computational Biology
School of Computer Science, Carnegie Mellon University

CMU-CB-09-101

Structured Literature Image Finder: Open Source Software
for Extracting and Disseminating Information from
Text and Figures in Biomedical Literature

Abdul-Saboor Sheikh¹, Amr Ahmed²,³, Andrew Arnold²,
Luis Pedro Coelho¹,⁴,⁵, Joshua Kangas¹,⁴,⁵, Eric P. Xing¹,²,³,⁴,⁵,⁶,
William Cohen¹,²,³,⁴,⁵, Robert F. Murphy¹,²,⁴,⁵,⁶,⁷

October 2009

CMU-CB-09-101.pdf

Keywords: Automated Image Analysis, Biomedical Literature, Data and Image Mining, Figure and Caption Modeling, Information Retrieval, Machine Learning, Natural Language Processing

The SLIF project combines text mining and image processing to extract structured information from biomedical literature.

SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., micrograph or gel). Fluorescence microscopy images are further processed and classified according to the subcellular localization they depict. The results of this process can be queried online through either a web interface or an XML-based web service. As an alternative to targeted queries, SLIF also supports browsing the collection using latent topic models that are derived from both the annotated text and the image data.
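
The flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow (every caption is parsed, every image is typed, and only fluorescence micrographs go on to localization classification), not SLIF's actual code. The record layout, the function names, the label strings, and the toy classifiers are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical label string; the abstract does not give SLIF's real class names.
FLUORESCENCE = "fluorescence_micrograph"


@dataclass
class FigureRecord:
    """One extracted figure plus its annotations (hypothetical schema)."""
    caption: str
    entities: List[str] = field(default_factory=list)  # protein and cell type names
    image_type: Optional[str] = None                    # e.g. a micrograph or gel label
    localization: Optional[str] = None                  # set only for fluorescence micrographs


def annotate(record: FigureRecord,
             find_entities: Callable[[str], List[str]],
             classify_type: Callable[[FigureRecord], str],
             classify_localization: Callable[[FigureRecord], str]) -> FigureRecord:
    """Run the three annotation steps in the order the abstract describes."""
    record.entities = find_entities(record.caption)
    record.image_type = classify_type(record)
    # Localization is attempted only for fluorescence microscopy images.
    if record.image_type == FLUORESCENCE:
        record.localization = classify_localization(record)
    return record


# Toy stand-ins for the trained components, for demonstration only.
def toy_entities(caption: str) -> List[str]:
    known = ["actin", "HeLa"]
    return [name for name in known if name.lower() in caption.lower()]


def toy_type(record: FigureRecord) -> str:
    return "gel" if "blot" in record.caption.lower() else FLUORESCENCE


def toy_localization(record: FigureRecord) -> str:
    return "cytoskeleton"


fig = annotate(FigureRecord("Actin staining in HeLa cells"),
               toy_entities, toy_type, toy_localization)
gel = annotate(FigureRecord("Western blot of actin"),
               toy_entities, toy_type, toy_localization)
```

Passing the three components in as functions keeps the sketch independent of any particular tagger or classifier; in this toy run, `gel.localization` stays `None` because the localization step is skipped for non-fluorescence images.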

In addition to a description of the SLIF system, this technical report describes the hand-labeled datasets used for training SLIF components. These datasets, and the SLIF web application, are publicly available at http://slif.cbi.cmu.edu.

52 pages

¹Center for Bioimage Informatics, Carnegie Mellon University
²Machine Learning Department, Carnegie Mellon University
³Language Technologies Institute, Carnegie Mellon University
⁴Joint Carnegie Mellon University–University of Pittsburgh Ph.D. Program in Computational Biology
⁵Lane Center for Computational Biology, Carnegie Mellon University
⁶Department of Biological Sciences, Carnegie Mellon University
⁷Department of Biomedical Engineering, Carnegie Mellon University
