Does the theory of suoxie propounded by Huang and colleagues for Taiwan county names work on other corpora?

Consider the following table, generated from the 10-million-character ROCLING corpus, which is distinct from the Academia Sinica corpus used by Huang and colleagues. As in Huang et al.'s table, it shows, for each character of the two-character county name, the mutual information between that character and the character 縣 'county'.

The table was produced by a script that eliminates from consideration any example where one of the two characters in a county name occurs directly before the character 縣. This has the desirable effect of eliminating the actual suoxie examples, as Huang et al. did, but it is likely to err somewhat on the side of eliminating too many examples. Is the resulting table a reasonable rendition of what is in the Huang et al. papers, and if not, what are the important differences?
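The script itself is not reproduced here, so the following is only a minimal sketch of how such a computation might look. It assumes the standard pointwise definition of mutual information, log2 of P(x,y) / (P(x) P(y)), and it makes up several details that the text does not specify: the function name `county_mi`, the co-occurrence window of five characters to the right, the use of raw character counts as probabilities, and the choice to discard every 縣 token that is directly preceded by any county-name character.

```python
import math
from collections import Counter

COUNTY = "縣"

def county_mi(sentences, names, window=5):
    """Hypothetical helper: PMI between each county-name character and 縣.

    Assumptions (not from the original script): co-occurrence means a kept
    縣 token appears within `window` characters to the right; a 縣 token is
    dropped when the character directly before it is any county-name
    character; P(縣) is estimated from all 縣 tokens, dropped or not.
    """
    name_chars = {ch for name in names for ch in name}
    char_count = Counter()  # how often each character occurs
    joint = Counter()       # name character followed by a kept 縣 in the window
    total = 0
    for sent in sentences:
        total += len(sent)
        char_count.update(sent)
        # Positions of 縣 tokens that survive the elimination step.
        kept = [i for i, ch in enumerate(sent)
                if ch == COUNTY and not (i > 0 and sent[i - 1] in name_chars)]
        for i, ch in enumerate(sent):
            if ch in name_chars and any(i < k <= i + window for k in kept):
                joint[ch] += 1

    def pmi(ch):
        if joint[ch] == 0 or char_count[COUNTY] == 0:
            return None  # no surviving co-occurrence, PMI undefined
        p_xy = joint[ch] / total
        p_x = char_count[ch] / total
        p_y = char_count[COUNTY] / total
        return math.log2(p_xy / (p_x * p_y))

    return {name: tuple(pmi(ch) for ch in name) for name in names}

# Toy usage on an invented three-sentence corpus.
toy = ["台東縣", "台灣有縣", "東方"]
result = county_mi(toy, ["台東"])
```

In the toy corpus the 縣 in 台東縣 is discarded because 東 directly precedes it, which also removes the co-occurrence of 台 with that token. This illustrates how a filter of this kind can eliminate full county names along with true suoxie forms.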

1st 2nd MI(1st,縣) MI(2nd,縣)
台 東 0.285 1.402
台 北 0.285 1.173
花 蓮 0.623 1.242
彰 化 1.483 1.129
苗 栗 2.307 2.985
台 中 0.285 1.004
澎 湖 1.490 0.991
雲 林 1.718 1.608
台 南 0.285 1.573
桃 園 0.986 0.911
高 雄 0.464 1.018
南 投 1.573 -0.127
屏 東 3.646 1.402
嘉 義 2.194 0.926
新 竹 0.405 0.765
宜 蘭 0.673 1.469
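As one possible starting point for the comparison, the table can be read into a small program that reports, for each county, which of the two characters has the higher mutual information with 縣. This is a sketch only: the names `parse_rows` and `higher_mi_component` are invented here, and the code says nothing about what Huang et al.'s tables contain; those figures would have to be entered separately from the papers.

```python
TABLE = """\
台 東 0.285 1.402
台 北 0.285 1.173
花 蓮 0.623 1.242
彰 化 1.483 1.129
苗 栗 2.307 2.985
台 中 0.285 1.004
澎 湖 1.490 0.991
雲 林 1.718 1.608
台 南 0.285 1.573
桃 園 0.986 0.911
高 雄 0.464 1.018
南 投 1.573 -0.127
屏 東 3.646 1.402
嘉 義 2.194 0.926
新 竹 0.405 0.765
宜 蘭 0.673 1.469
"""

def parse_rows(text):
    """Parse 'char1 char2 mi1 mi2' lines into (char1, char2, mi1, mi2) tuples."""
    rows = []
    for line in text.strip().splitlines():
        first, second, mi_first, mi_second = line.split()
        rows.append((first, second, float(mi_first), float(mi_second)))
    return rows

def higher_mi_component(rows):
    """Map each county name to the character with the larger MI value.

    Ties (none occur in this table) fall to the second character.
    """
    return {
        first + second: (first if mi_first > mi_second else second)
        for first, second, mi_first, mi_second in rows
    }

rows = parse_rows(TABLE)
winners = higher_mi_component(rows)
for name, ch in winners.items():
    print(name, ch)
```

The same function can be applied to figures transcribed from the Huang et al. papers, so that the two sets of results can be compared county by county.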

References

Huang, Chu-Ren, Kathleen Ahrens, and Keh-Jiann Chen. 1994. A data-driven approach to psychological reality of the mental lexicon: Two studies on Chinese corpus linguistics. In Language and its Psychobiological Bases, Taipei.

Huang, Chu-Ren, Wei-Mei Hong, and Keh-Jiann Chen. 1994. Suoxie: An information based lexical rule of abbreviation. In Proceedings of the Second Pacific Asia Conference on Formal and Computational Linguistics II, pages 49-52, Japan.