The following are examples of raw data from the Academia Sinica Balanced Corpus (5 million words) illustrating the Good-Turing (P) measure as applied to morphological productivity.

In each of the following cases you will find a raw list of possible instances of the indicated formation. You will need to edit the list to remove spurious cases, then calculate the P ratio as n1/N, where n1 is the number of types that occur exactly once (the hapax legomena), and N is the total number of tokens of the formation:
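As a rough illustration of the calculation, the sketch below computes P from a cleaned list of tokens. It assumes the edited data can be read as one word token per list item; the function name `productivity` and the toy tokens are invented for illustration and are not taken from the corpus.

```python
from collections import Counter

def productivity(tokens):
    """Return (n1, N, P) for the tokens of a single formation.

    n1 = number of types occurring exactly once (hapax legomena)
    N  = total number of tokens
    P  = n1 / N
    """
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)
    N = sum(counts.values())
    return n1, N, (n1 / N if N else 0.0)

# Invented toy tokens, NOT real corpus counts.
tokens = ["學生們", "學生們", "學生們", "老師們", "老師們", "記者們", "旅客們"]
n1, N, P = productivity(tokens)
print(n1, N, round(P, 4))
```

If the raw list is instead given as word-frequency pairs, the same two quantities can be read directly from the frequencies: count the entries with frequency 1 and sum all frequencies.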

For 們 it is interesting to compute P both for the case where (plural) pronouns are included, and for the case where they are not. You should see a huge difference in the estimates for P. Which value seems like a "fairer" estimate of the productivity of 們? Do you notice any other interesting examples in looking over the list of words affixed with 們?
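One possible way to set up this comparison is sketched below. It assumes the cleaned data is held as a mapping from word to frequency. The set `PRONOUNS` is an assumed list of plural pronouns that you may need to adjust, and the frequencies are invented placeholders rather than corpus figures.

```python
# Assumed set of plural pronouns formed with 們; extend or trim as needed.
PRONOUNS = {"我們", "你們", "他們", "她們", "它們", "咱們"}

def p_from_counts(freq):
    """P = n1 / N, computed from a {word: frequency} mapping."""
    n1 = sum(1 for c in freq.values() if c == 1)
    N = sum(freq.values())
    return n1 / N if N else 0.0

# Invented placeholder frequencies, NOT real corpus counts.
freq = {"我們": 500, "他們": 400, "學生們": 8, "記者們": 1, "旅客們": 1}

p_with_pronouns = p_from_counts(freq)
nouns_only = {w: c for w, c in freq.items() if w not in PRONOUNS}
p_without_pronouns = p_from_counts(nouns_only)

print(p_with_pronouns, p_without_pronouns)
```

Because the pronouns are few in type but very frequent in token, they add heavily to N without adding to n1, which is why the two estimates can diverge so sharply.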

Li and Thompson (1981, page 40) claim that 們 is restricted to human nouns, and to polysyllabic bases. Is this broadly correct?
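The polysyllabic half of the claim can be screened mechanically, as in the sketch below. It assumes that each character of the base corresponds to one syllable, which holds for most written Mandarin words but not all. The helper name `monosyllabic_bases` and the sample words are illustrative only. Whether a base denotes a human still has to be judged by reading the list.

```python
def monosyllabic_bases(words, suffix="們"):
    """Return the bases of suffixed words whose base is a single character.

    Assumes one character = one syllable.
    """
    bases = [w[:-len(suffix)] for w in words
             if w.endswith(suffix) and len(w) > len(suffix)]
    return [b for b in bases if len(b) == 1]

# Illustrative sample words, not an extract from the corpus list.
words = ["學生們", "人們", "我們", "記者們"]
flagged = monosyllabic_bases(words)
print(flagged)
```

Any non-pronoun base that is flagged is a potential counterexample to the polysyllabic restriction and is worth checking by hand.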

Reference

Li, Charles and Sandra Thompson. 1981. Mandarin Chinese: A Functional Reference Grammar. University of California Press, Berkeley, CA.