Journal Article

#### On the distribution of the number of missing words in Random texts

##### Citation

Rahmann, S. (2003). On the distribution of the number of missing words in Random texts. *Combinatorics, Probability and Computing*, *12*(1), 72-87. doi:10.1017/S0963548302005473.

Cite as: http://hdl.handle.net/11858/00-001M-0000-0010-8B01-6

##### Abstract

Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well understood problem. We study a generalization: Given a finite alphabet of size $\sigma$ and a word length $q$, what is the distribution of the number $X$ of words (of length $q$) that do not occur in a random text of length $n+q-1$ over the given alphabet? For $q=1$, $X$ is the number $Y$ of empty urns with $\sigma$ urns and $n$ balls. For $q \ge 2$, $X$ is related to the number $Y$ of empty urns with $\sigma^q$ urns and $n$ balls, but the law of $X$ is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of $X$ and $Y$ are not as different as one might expect, but some problems remain currently open.
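
The two quantities compared in the abstract can be explored with a small simulation. The following is a minimal sketch, not the paper's method: it assumes letters are drawn independently and uniformly from the alphabet (the abstract only says "random text"), and the function names `missing_words`, `empty_urns`, `expected_empty_urns` and `simulated_mean` are hypothetical, chosen here for illustration.

```python
import random


def missing_words(sigma, q, n, rng):
    """Count the q-words that do not occur in a random text of length n+q-1.

    Assumption: letters are i.i.d. uniform over an alphabet {0, ..., sigma-1}.
    """
    text = [rng.randrange(sigma) for _ in range(n + q - 1)]
    # The text contains exactly n overlapping windows of length q.
    seen = {tuple(text[i:i + q]) for i in range(n)}
    return sigma ** q - len(seen)


def empty_urns(sigma, q, n, rng):
    """Count empty urns after throwing n balls uniformly into sigma**q urns."""
    urns = sigma ** q
    hit = {rng.randrange(urns) for _ in range(n)}
    return urns - len(hit)


def expected_empty_urns(sigma, q, n):
    """Classical expectation E[Y] = m * (1 - 1/m)**n with m = sigma**q urns."""
    m = sigma ** q
    return m * (1.0 - 1.0 / m) ** n


def simulated_mean(count, sigma, q, n, trials=2000, seed=0):
    """Average count(sigma, q, n, rng) over independent trials."""
    rng = random.Random(seed)
    return sum(count(sigma, q, n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    sigma, q, n = 2, 4, 30
    print("mean X (missing words):", simulated_mean(missing_words, sigma, q, n))
    print("mean Y (empty urns):   ", simulated_mean(empty_urns, sigma, q, n))
    print("exact E[Y]:            ", expected_empty_urns(sigma, q, n))
```

For $q=1$ the windows do not overlap, so `missing_words` and `empty_urns` describe the same experiment. For $q \ge 2$ the sketch only lets one compare empirical averages; it says nothing about the exact law of $X$, which is the subject of the paper.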