  The SYSTERS protein family database: taxon-related protein family size distributions and singleton frequencies

Meinel, T., Vingron, M., & Krause, A. (2003). The SYSTERS protein family database: taxon-related protein family size distributions and singleton frequencies. In H.-W. Mewes, D. Frishman, V. Heun, & S. Kramer (Eds.), Proceedings of the German Conference on Bioinformatics (GCB '03) (pp. 103-108).


Files

gcb2003_meinel.pdf (Any fulltext), 162KB
Name: gcb2003_meinel.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: eDoc_access: PUBLIC
License: -

Creators

 Creators:
Meinel, Thomas (1), Author
Vingron, Martin (2), Author
Krause, Antje (1), Author
Affiliations:
(1) Max Planck Society, ou_persistent13
(2) Gene regulation (Martin Vingron), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479639

Content

Free keywords: protein family; large scale clustering; taxonomy; taxon-related; cluster size distribution
 Abstract: Based on the SYSTERS protein family database, we present taxon-related protein family frequencies and distributions. A set of taxon-related protein families is the subset of the whole family set that belongs to one taxon, where a taxon is not restricted to the species level but may be of any rank in the taxonomy. We examine eight ranks in the lineages of seven organisms. A strong linear correlation is observed between the total number of different families and the number of sequences in the data set under consideration. We fitted the generalised power-law function to the protein family distributions by least squares, excluding singleton frequencies. Taxon-related family distributions tend to share the same shape, with a negative slope no larger than -2.1 for large data sets. For smaller data sets, the slope decreases to as low as -3.7. Slopes of family distributions are found to increase slowly towards higher taxonomic ranks. Our observations lead to a new estimation of single sequence cluster frequencies. Data sets of various species are examined with respect to their completeness.
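
The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain power law, frequency = a * size^slope, fitted by ordinary least squares in log-log space, whereas the abstract refers to a generalised power-law function whose exact form is not given in this record. The function name `fit_power_law` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_power_law(sizes, freqs):
    """Fit log(freq) = log(a) + slope * log(size) by least squares.

    Families of size 1 (singletons) are excluded from the fit,
    as are size classes with zero frequency (log undefined).
    """
    sizes = np.asarray(sizes, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    keep = (sizes > 1) & (freqs > 0)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(np.log(sizes[keep]), np.log(freqs[keep]), 1)
    return np.exp(intercept), slope

# Synthetic, made-up family size distribution (not data from the paper).
sizes = np.arange(1, 51, dtype=float)
freqs = 1000.0 * sizes ** (-2.1)
freqs[0] = 5000.0  # inflated singleton count; it is ignored by the fit

a, slope = fit_power_law(sizes, freqs)
print(f"prefactor a = {a:.1f}, slope = {slope:.2f}")
```

Because the singleton class is masked out before fitting, changing `freqs[0]` leaves the fitted slope unchanged, which is the point of excluding singleton frequencies from the regression.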

Details

Language(s): eng - English
 Dates: 2003
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 175889
 Degree: -

Event

Title: German Conference on Bioinformatics
Place of Event: Neuherberg/Garching near Munich
Start-/End Date: 2003-10-12 - 2003-10-14

Source 1

Title: Proceedings of the German Conference on Bioinformatics (GCB '03)
Source Genre: Proceedings
 Creator(s):
Mewes, H.-W., Editor
Frishman, D., Editor
Heun, V., Editor
Kramer, S., Editor
Affiliations: -
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 103 - 108
Identifier: -