The Space Complexity of Pass-Efficient Algorithms for Clustering

Chang, Kevin L.; Kannan, Ravi

Local TagsRelease HistoryDetailsSummary

The Space Complexity of Pass-Efficient Algorithms for Clustering

Chang, K. L., & Kannan, R. (2006). The Space Complexity of Pass-Efficient Algorithms for Clustering. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'06 (pp. 1157-1166). New York, USA: ACM / Siam.

Item is Released

show all hide all

Basic

show hide

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-242C-A Version Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-242D-8

Genre: Conference Paper

Files

show Files

hide Files

:

census-soda.ps (Any fulltext), 424KB

File Permalink:
-

Name:
census-soda.ps

Description:
-

OA-Status:

Visibility:
Private

MIME-Type / Checksum:
application/postscript

Technical Metadata:

Copyright Date:
-

Copyright Info:
Copyright 2006 ACM. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

License:
-

Locators

show

Creators

show

hide

Creators:
Chang, Kevin L.¹, Author
Kannan, Ravi, Author

Affiliations:
1Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

Content

show

hide

Free keywords: -

Abstract: We present multiple pass streaming algorithms for a basic clustering problem for massive data sets. If our algorithm is allotted 2l passes, it will produce an approximation with error at most ε using Õ(k3/ε2/l) bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any l pass randomized algorithm that are nearly matched by our upper bounds. To the best of our knowledge, this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation.In this problem, we are given a set of n points drawn randomly according to a mixture of k uniform distributions and wish to approximate the density function of the mixture. The points are placed in a datastream (possibly in adversarial order), which may only be read sequentially by the algorithm. We argue that this models, among others, the datastream produced by a national census of the incomes of all citizens.

Details

show

hide

Language(s): eng - English

Dates: Modified: 2007-04-16Date issued: 2006

Publication Status: Issued

Pages: -

Publishing info: New York, USA : ACM / Siam

Table of Contents: -

Rev. Type: -

Identifiers: eDoc: 314430
Other: Local-ID: C1256428004B93B8-42E27E9B111F2537C1257289003D2F0F-Chang2006

Degree: -

Event

show

hide

Title: Untitled Event

Place of Event: Miami, USA

Start-/End Date: 2007-02-21

Legal Case

show

Project information

show

Source 1

show

hide

Title: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'06

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: New York, USA : ACM / Siam

Pages: - Volume / Issue: - Sequence Number: - Start / End Page: 1157 - 1166 Identifier: ISBN: 0-89871-605-5