Generating Realistic Synthetic Population Datasets

Wu, Hao; Ning, Yue; Chakraborty, Prithwish; Vreeken, Jilles; Tatti, Nikolaj; Ramakrishnan, Naren

Wu, H., Ning, Y., Chakraborty, P., Vreeken, J., Tatti, N., & Ramakrishnan, N. (2016). Generating Realistic Synthetic Population Datasets. Retrieved from http://arxiv.org/abs/1602.06844.

Item is Released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-002B-08F9-B
Version permalink: https://hdl.handle.net/11858/00-001M-0000-002B-08FA-9

Genre: Research paper

Files

arXiv:1602.06844.pdf (Preprint), 2MB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-002B-08FB-7

Name:
arXiv:1602.06844.pdf

Description:
File downloaded from arXiv at 2016-07-19 13:49

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
-

Copyright info:
-

License:
http://arxiv.org/help/license

External references

Creators

Creators:
Wu, Hao¹, Author
Ning, Yue¹, Author
Chakraborty, Prithwish¹, Author
Vreeken, Jilles², Author
Tatti, Nikolaj¹, Author
Ramakrishnan, Naren¹, Author

Affiliations:
¹ External Organizations, ou_persistent22
² Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: Computer Science, Databases, cs.DB

Abstract: Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating the confidentiality of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that, in a statistically well-founded way, we can optimally utilize given prior information about the data and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach on both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application.
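As a rough illustration of the maximum entropy idea named in the abstract, the sketch below fits a joint distribution over a few categorical variables to given marginal constraints and then samples synthetic individuals from it. It is not the paper's inference algorithm: it swaps in plain iterative proportional fitting over a fully enumerated table, which only works for a handful of variables. The function names, the variables, and all probabilities are hypothetical and chosen only for this example.

```python
import random
from itertools import product


def fit_maxent(domains, constraints, iters=50):
    """Iterative proportional fitting (IPF) on an enumerated joint table.

    domains:     list of category lists, one per variable.
    constraints: dict mapping a tuple of variable indices to a dict
                 {tuple of categories: target probability}.
    Starting from the uniform distribution, IPF converges to the maximum
    entropy distribution that satisfies consistent marginal constraints.
    """
    cells = list(product(*domains))
    p = {c: 1.0 / len(cells) for c in cells}  # uniform: no prior information
    for _ in range(iters):
        for var_idx, target in constraints.items():
            # Current marginal over the constrained variables.
            current = {}
            for c, pc in p.items():
                key = tuple(c[i] for i in var_idx)
                current[key] = current.get(key, 0.0) + pc
            # Rescale each cell so that this marginal matches its target.
            for c in cells:
                key = tuple(c[i] for i in var_idx)
                if current[key] > 0:
                    p[c] *= target.get(key, 0.0) / current[key]
    return p


def sample_population(p, n, seed=0):
    """Draw n synthetic individuals from the fitted joint distribution."""
    rng = random.Random(seed)
    cells = list(p)
    return rng.choices(cells, weights=[p[c] for c in cells], k=n)


# Hypothetical variables and marginals (illustrative values, not census data).
domains = [["child", "adult"], ["employed", "unemployed"], ["urban", "rural"]]
constraints = {
    (0, 1): {
        ("child", "employed"): 0.0,
        ("child", "unemployed"): 0.25,
        ("adult", "employed"): 0.6,
        ("adult", "unemployed"): 0.15,
    },
    (2,): {("urban",): 0.8, ("rural",): 0.2},
}

p = fit_maxent(domains, constraints)
people = sample_population(p, 100)
```

Because nothing in the constraints links residence to age or employment, the fitted model treats them as independent, which is the "unbiased otherwise" behaviour the abstract describes.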

Details

Language(s): eng - English

Date: Created: 2016-02-22; Modified: 2016-02-25; Published online: 2016

Publication status: Published online

Pages: 16 p.

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: arXiv: 1602.06844
URI: http://arxiv.org/abs/1602.06844
BibTex Citekey: Wu_arXiv2016

Degree type: -
