Feng (1997) claims that the number of disyllabic compounds increased dramatically between pre-Han and Han Chinese. He compares the text of 孟子 Mengzi (Mencius, c. 372--289 BC) with that of his main Han commentator 趙岐 Zhao Qi (c. 107--201 AD), noting many cases where Zhao Qi uses two-character words to gloss single-character words in Mengzi: e.g. 過 guo `mistake' glossed as 謬誤 miuwu `mistake'. Feng argues that this commonly observed increase in disyllabic words had prosodic causes, related to syllable-structure simplification and the minimal prosodic word.

Is this a claim about types or tokens? Feng certainly talks about tokens in the corpus of Mencius and Zhao Qi. But the theoretical claim seems to relate to the structure of the vocabulary, and hence to types. We'll assume he means types.
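To make the distinction concrete, here is a minimal sketch that counts adjacent character pairs in a toy string. The function name pair_counts and the sample string are illustrative assumptions, and every adjacent pair is treated as a candidate, whether or not it is actually a word.

```python
from collections import Counter

def pair_counts(text):
    """Count every adjacent character pair in a string (candidates, not words)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

# Toy string chosen for illustration only; it is not drawn from the samples.
counts = pair_counts("天下之天下")

n_tokens = sum(counts.values())  # every occurrence counts: 天下, 下之, 之天, 天下
n_types = len(counts)            # distinct pairs only: 天下, 下之, 之天

print(n_tokens, n_types)
```

A text can therefore show more disyllabic tokens simply because a few pairs are repeated often, without the vocabulary containing more disyllabic types.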

Was there in fact an increase in disyllabicity? To answer this, we can compare the distribution of disyllabic terms in texts from the pre-Han, Han and later (Jin/Song/Ming) periods to see if there is a general increase of disyllabic forms over time. The question is how to estimate the number of disyllabic types. One way would be to use the Good-Turing estimate, but a robust estimate of P would require a large sample. The approach we'll adopt here is to use an association measure to generate lists of highly associated forms and compare the "yield" across different periods. Note that under either of these approaches one would want to use equivalent-sized samples.
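For comparison, the Good-Turing route can be sketched as follows. This assumes that P denotes the Good-Turing estimate of the probability that the next token belongs to an unseen type, namely the number of types seen exactly once divided by the number of tokens; the text does not define P, so that reading is an assumption. The function names and the toy string are hypothetical.

```python
from collections import Counter

def adjacent_pairs(text):
    """Return all adjacent character pairs (pair tokens) in a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def unseen_mass(tokens):
    """Good-Turing estimate of the unseen probability mass: N1 / N.

    N1 is the number of types observed exactly once and N is the
    number of tokens. Returns 0.0 for an empty sample.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(tokens)

# Toy sample for illustration only: pairs are ab, ba, ab, ba, ab, bc, cd.
pairs = adjacent_pairs("abababcd")
p = unseen_mass(pairs)
print(p)
```

Because the estimate depends on the count of once-seen types, it shifts with sample size, which is one reason to compare samples of equal length.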

As samples we used the following texts from the pre-Han period (about 1.4 million characters) and equivalent-sized samples of texts from the subsequent Han and Jin/Song/Ming (JSM) periods:

Pre-Han:
尚書,周易,儀禮,周禮,論語,孟子,墨子,莊子,荀子,韓非子,呂氏春秋,老子,商君書,管子,晏子春秋,孫子,尉繚子,六韜,司馬法,公孫龍子,孝經,爾雅,黃帝內經素問,黃帝內經靈樞,難經,古本竹書紀年輯證,逸周書,穆天子傳,山海經 (原文)

Han:
禮記,春秋公羊傳,春秋穀梁傳,春秋左傳,國語,戰國策,大戴禮記,韓詩外傳,吳越春秋,越絕書,史記,漢書,新書,春秋繁露,淮南子,新序,說苑,列女傳,鹽鐵論,法言,傷寒論,風俗通義,新校本史記三家注,新校本漢書,新校本後漢書

JSM:
警世通言,喻世明言,醒世恆言,新校本晉書,新校本宋書,初刻拍案驚奇,二刻拍案驚奇

We computed the 500 most highly associated character pairs, using Likelihood Ratios, from each of those periods and judged whether each pair constituted a word. (Judgments due to Chilin Shih.) The following are the resulting tables:
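As an aside on method, the ranking step described in this paragraph can be sketched as below. This is only an illustration under stated assumptions, not the procedure actually used: it takes the likelihood ratio to be the log-likelihood ratio (G-squared) statistic over a 2x2 contingency table for each adjacent character pair, and it keeps only pairs that occur more often than chance would predict. The names llr and top_pairs and the toy input are hypothetical.

```python
import math
from collections import Counter

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of observed counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    obs = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total

def top_pairs(text, k):
    """Return the k most strongly associated adjacent pairs with their scores."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    n = len(pairs)
    pair_c = Counter(pairs)
    first_c = Counter(p[0] for p in pairs)   # counts as left member of a pair
    second_c = Counter(p[1] for p in pairs)  # counts as right member of a pair
    scored = []
    for p, k11 in pair_c.items():
        k12 = first_c[p[0]] - k11
        k21 = second_c[p[1]] - k11
        k22 = n - k11 - k12 - k21
        # Assumption: keep only positively associated pairs, since G^2
        # is also large for pairs that co-occur less often than chance.
        if k11 * n > first_c[p[0]] * second_c[p[1]]:
            scored.append((p, llr(k11, k12, k21, k22)))
    scored.sort(key=lambda item: -item[1])
    return scored[:k]

# Toy input for illustration only; a real run would use one period's sample.
best = top_pairs("abcdabefabgh", 1)
print(best)
```

With k set to 500 and one call per period, the resulting lists would then be judged by hand, and the proportion of pairs accepted as words would be the "yield" for that period.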

Are these tables consistent with an increase of disyllabic forms? With a dramatic increase between the Pre-Han and Han periods?

Reference

Feng, Shengli. 1997. "Prosodic structure and compound words in Classical Chinese." In Jerome Packard, editor, New Approaches to Chinese Word Formation: Morphology, Phonology and the Lexicon in Modern and Ancient Chinese, number 105 in Trends in Linguistics: Studies and Monographs. Mouton de Gruyter, Berlin, pages 197--260.

Acknowledgment

We gratefully acknowledge Academia Sinica for allowing us access to the historical data.