Estimating mutual information using B-spline functions - an improved similarity 
measure for analysing gene expression data

Daub, C. O.; Steuer, R.; Selbig, J.; Kloska, S.

doi:10.1186/1471-2105-5-118

Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004). Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5, 118. doi:10.1186/1471-2105-5-118.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0014-2D09-2

Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0014-2D0A-F

Genre: Journal Article

Files

Daub-2004-Estimating mutual in.pdf (Any fulltext), 438KB

File Permalink:
https://hdl.handle.net/11858/00-001M-0000-0014-2D0B-D

Name:
Daub-2004-Estimating mutual in.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Daub, C. O.¹, Author
Steuer, R.², Author
Selbig, J.¹, Author
Kloska, S.³, Author

Affiliations:
¹ BioinformaticsCRG, Cooperative Research Groups, Max Planck Institute of Molecular Plant Physiology, Max Planck Society, ou_1753315
² External Organizations, ou_persistent22
³ Binf, Department Willmitzer, Max Planck Institute of Molecular Plant Physiology, Max Planck Society, ou_1753343

Content

Free keywords: microarray sequences patterns entropy protein

Abstract:

Background: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size.

Results: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request.

Conclusion: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
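
Illustrative note (not part of the original record): the abstract refers to binning procedures for estimating mutual information from continuous data. The Python sketch below shows one plain equal-width binning estimator of that kind, which is the baseline approach and not the B-spline method proposed in the paper (the authors' implementation is in C++ and is not reproduced here). The function names `bin_index` and `binned_mutual_information`, the choice of equal-width bins, and the default of four bins are all assumptions made for illustration.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi == lo:
        return 0  # constant data: everything falls into a single bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value goes into the last bin


def binned_mutual_information(x, y, n_bins=4):
    """Naive histogram estimate of I(X;Y) in bits (hypothetical helper)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    bx = [bin_index(v, x_lo, x_hi, n_bins) for v in x]
    by = [bin_index(v, y_lo, y_hi, n_bins) for v in y]
    count_x = Counter(bx)            # marginal bin counts for x
    count_y = Counter(by)            # marginal bin counts for y
    count_xy = Counter(zip(bx, by))  # joint bin counts
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p_xy * log2(p_xy / (p_x * p_y)), written in terms of raw counts
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi


if __name__ == "__main__":
    x = list(range(8))
    y_nonlinear = [(v - 3.5) ** 2 for v in x]  # symmetric, non-linear in x
    print(binned_mutual_information(x, y_nonlinear))
```

In this toy example `y_nonlinear` is a symmetric function of `x`, so a linear correlation coefficient would be zero while the binned estimate is positive. With so few samples per bin the estimate depends strongly on the number and placement of bins, which is the kind of numerical sensitivity the abstract mentions for small or moderate datasets.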

Details

Language(s): eng - English

Dates: Published Online: 2004-08-31
Date issued: 2004

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: ISI: ISI:000223993000001
DOI: 10.1186/1471-2105-5-118
ISSN: 1471-2105 (Electronic) 1471-2105 (Linking)
URI: ://000223993000001
URI: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC516800/pdf/1471-2105-5-118.pdf

Degree: -

Source 1

Title: BMC Bioinformatics

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: -

Pages: -

Volume / Issue: 5

Sequence Number: -

Start / End Page: 118

Identifier: -