  Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts

Göke, J., Schulz, M. H., Lasserre, J., & Vingron, M. (2012). Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts. Bioinformatics, 28(5), 656-63. doi:10.1093/bioinformatics/bts028.

Files

Göke.pdf (Publisher version), 674KB
Name: Göke.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: © 2012 Oxford University Press
License: -

Creators

Creators:
Göke, Jonathan (1), Author
Schulz, Marcel H. (2), Author
Lasserre, Julia (1), Author
Vingron, Martin (3), Author
Affiliations:
(1) Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, Berlin, Germany
(2) Ray and Stephanie Lane Center for Computational Biology, Carnegie Mellon University, 15213 Pittsburgh, USA
(3) Gene regulation (Martin Vingron), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society

Content

Free keywords: Algorithms; Animals; Cluster Analysis; Enhancer Elements, Genetic; Genome-Wide Association Study; Mice/embryology/genetics; Organ Specificity; Software
Abstract:
MOTIVATION: The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets.
RESULTS: We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2.
CONCLUSION: N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences.
AVAILABILITY: The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Details

Language(s): eng - English
 Dates: 2012-01-12; 2012
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1093/bioinformatics/bts028
 Degree: -

Source 1

Title: Bioinformatics
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Oxford : Oxford University Press
Pages: -
Volume / Issue: 28 (5)
Sequence Number: -
Start / End Page: 656 - 63
Identifier: ISSN: 1367-4803
CoNE: https://pure.mpg.de/cone/journals/resource/954926969991