No Country for Old Men

Some thoughts on the current state of Computational Linguistics

Richard Sproat

June 2009

The hardest thing about being a pioneer is not the pioneering. It is fitting in once the frontier is no longer the frontier but part of a metropolis. For some of us who have been doing computational linguistics since the 1980's, when many areas were sparsely populated, a trip to a conference such as NAACL now feels like a trip to the big city.

So that you don't get me wrong: in (implicitly) claiming to be a pioneer, I am not engaging in conceit. One can be a pioneer without being Daniel Boone. There were thousands of pioneers in the American West, most of whom we have never heard of. I claim to be one of those. The claim is not without justification: when I started working on Chinese word segmentation back in the late 80's there was not a lot of prior work. In a similar vein, when I worked on computational morphology in the early 1990's, there had of course been a couple of decades of prior work, but the field was still sparsely populated. When I worked on finite-state methods for text normalization along with other colleagues at AT&T, I was one of the people involved in pushing the use of finite-statery beyond its relatively limited prior use in such areas as morphology. I remember being told at the time by some NLP people at a major software company in the Redmond, Washington area that finite-state methods were passé and had died out in the 1970's. Time has shown them to be stunningly wrong. And more to the point of the present discussion, I remember an ACL program committee meeting in Rochester in 1993 at which the entire committee fit around a single large table. Clearly what was once mostly woodland is now urban sprawl.

My immediate motivation for writing this short essay relates specifically to reviews I received on a couple of ACL/EMNLP submissions, as well as to my recent attendance at NAACL in Boulder. This could be taken as an opportunity merely to vent spleen, but I am hoping it will be taken as more than that, because I think it says something about where the field is, where it might be going, and whether or not that is such a good idea.

There were two papers involved.

One was a single-authored paper that proposed a technique for dealing with a complex problem in text normalization, namely the expansion of digit strings into number names in Russian. The technique involved using a finite-state grammar to overgenerate candidate mappings between digit strings and number names, mining the web for actual valid number-name expressions, then using the result to train an n-gram language model and a couple of discriminative models. For those who care, the (EMNLP version of the) paper is here.
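To make the overgenerate-and-filter idea concrete, here is a minimal sketch of that kind of pipeline. It is not the paper's code: the toy paradigm, the mock web counts, and the function names are all invented for illustration, and a real system would use a finite-state transducer over full Russian case, gender, and number paradigms rather than a hand-listed dictionary.

```python
from itertools import product

# Hypothetical, heavily simplified inflectional variants for two Russian
# number words; a real grammar would be a finite-state transducer over the
# full case/gender/number paradigms.
PARADIGM = {
    "2": ["два", "две", "двух", "двум", "двумя"],
    "20": ["двадцать", "двадцати", "двадцатью"],
}

def overgenerate(digits):
    """Propose every combination of variant forms for a (toy) digit string."""
    # "22" decomposes as 20 + 2; a real system would do this with an FST.
    parts = ["20", "2"] if digits == "22" else [digits]
    return [" ".join(combo) for combo in product(*(PARADIGM[p] for p in parts))]

# Stand-in for counts of candidate expansions actually attested on the web;
# in the real pipeline these attested strings would feed a language model
# and discriminative classifiers rather than a simple lookup.
WEB_COUNTS = {
    "двадцать два": 950,
    "двадцать две": 410,
    "двадцати двух": 120,
}

def best_expansion(digits):
    """Pick the candidate with the highest (mock) web frequency."""
    return max(overgenerate(digits), key=lambda c: WEB_COUNTS.get(c, 0))

print(best_expansion("22"))  # -> "двадцать два" under these toy counts
```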

The second was a short paper for ACL, written with an undergraduate and a set of graduate students, on a public toolkit for performing transliteration that was developed as part of a Johns Hopkins CLSP summer workshop. Again, for those who care, the paper is here.

The comments I got on the ACL submission of the first paper were various, but a couple stick out in my mind as being rather telling of the state of the field. One comment was that as far as the reviewer could see, the problem only arose in languages like Russian and maybe other Slavic languages. So what is the point here? If a phenomenon only occurs in one language family, no matter how complicated it is, it is not interesting? Even if it were true that Slavic languages are the only languages to exhibit the morphological complexity found in Russian number names, does that make it less important? Word segmentation is only a problem in languages - among modern languages, Chinese, Japanese, Thai, a few languages of South Asia - that use writing systems that fail to mark word boundaries: yet this has spawned a little cottage industry, and (to date) four international bakeoffs, sponsored by SIGHAN. I suspect that the real problem here is that number-name expansion is often viewed (when it is viewed at all) as being a trivial problem, and thus it simply is not one of the "sanctioned" problems of the field.

Another comment (from a different reviewer) was even more baffling: as far as this reviewer could see, the problem only arose in text-to-speech synthesis; what other application of this work was there? Again, even if it were true (and the application of text normalization in speech recognition systems suggests it is not), what kind of comment is this? Can you imagine someone arguing that a particular technique is only useful for MT? It wouldn't happen: if there were a technique whose only clear application were in MT, nobody would make that objection, because MT is (now) considered to be one of the key areas of research in the field.

So I tried to address the comments that could be addressed, and resubmitted to EMNLP, and of course I got a different set of responses. The one that sticks in my mind from that round claimed that the application of discriminative methods was naive, for a couple of reasons. One was that the models were clearly overtrained due to a lack of sufficient training examples; the fact that some of the training sets had millions of examples (stated in the paper) presumably just escaped the reviewer's attention. But the real point the reviewer wanted to make was that the discriminative method chosen (perceptrons - actually just one of the methods) was the wrong method, and that something like a max-margin classifier should have been used. Perhaps, but there is an interesting subtext here, one that I have seen before, namely the implication that a simple machine learning method that is not performing quite so well should be replaced with such-and-such a method, which will perform massively better. The operative term here is "massively": for what evidence is there that one should expect this? How often does a switch of ML methods for a task (keeping the features constant, of course) result in a massive improvement in accuracy? An improvement, surely? A statistically significant improvement, even? But a massive improvement? Sure, sometimes it happens. But how often? Does it happen in the majority of cases, so that the prior expectation should be that more often than not a switch of ML techniques will yield almost categorically different results? I doubt it. Yet this implication is often the subtext in comments of this kind. I am not sure of the source of this implicit belief (beyond the expected "I know better than you" attitude so common in academic circles). But I do think it at least relates to the general shift in the tenor of the field, a point to which I return below.
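For what it is worth, the expectation is easy to probe. The sketch below (my own illustration, not anything from the paper or its reviews) holds the feature vectors fixed and swaps a perceptron for a max-margin linear SVM using scikit-learn; the synthetic data is only a stand-in, so the numbers it prints make no claim about any particular task, but a controlled swap of this sort on real features is exactly the experiment the "massively better" expectation would have to survive.

```python
# A small experiment sketch: hold the features fixed and swap the learner,
# then see how much the accuracy actually moves. Uses scikit-learn; the
# synthetic data below merely stands in for a real task's feature extraction.
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("perceptron", Perceptron(max_iter=1000)),
                  ("max-margin (LinearSVC)", LinearSVC(C=1.0, max_iter=5000))]:
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```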

For the second, short ACL submission, the feedback was puzzling in a different way. Actually, for the most part the reviewers quite liked it. There was a reasonable complaint that the paper was thin on some details - a direct result of the 4-page limit. But on the whole no one complained about the content - except one reviewer, that is, who felt this was a "tools" paper and did not present any novel research. Well, er, yes: it was indeed a tools paper, and it was not purporting to present any novel research. But what is wrong with that? Is there not a place at the ACL for such papers? When papers on resources are published (whether in the conferences or the journals), they often end up being the most cited papers in the field: consider the paper describing the Penn Treebank. But more to the point, is not the purpose of conferences to keep the community informed about the current state of the art, which surely includes resources as well as new theoretical breakthroughs?

Which brings me to my assessment of the current state of the field, supported also by my admittedly jaded view of the recent NAACL conference: namely, that the field has devolved in large measure into a group of technicians who are more interested in tweaking the techniques than in the problems they are applied to; who are far more impressed by a clever new ML approach to an old problem than by the application of known techniques to a new problem. My impression of the younger generation of NAACL attendees is that the majority, possibly even the vast majority, of them are consummately trained in computational learning methods, completely on top of the latest techniques, and able to discuss the merits of one or another of these at the drop of a hat; but they are far more interested in those issues than in the actual problems to which the techniques are being applied.

Not that the old ACL was better: there was much merit in sweeping away all of those tiresome papers on parsing that proposed new parsing algorithms that nobody had bothered to scale to work on more than a few sentences. But in switching tracks to the new field of computational linguistics, we have surely gone too far in the other direction. There should be room for papers that explore new problems with familiar techniques. There should be room for papers that present useful tools that merely implement methods that are already known. The field should not be so monolithically focused on clever technical advances.

In the meantime, perhaps it is time for some of us who have been around for a while to move on to new territories.

Addendum: January 2010

On a whim I resubmitted the Russian number names paper to NAACL 2010: I figured I'd go for three for three. I was not disappointed: the paper was rejected, just as it had been at ACL and EMNLP. To be fair, the comments were less bizarre than some of those from the previous rounds. But one reviewer did find the problem I was addressing to be "peripheral".

I suppose that is a pretty good summary of the situation: the work is "peripheral" to the interests of the field, and, apparently, it is no longer considered either interesting or innovative. Oh well.

New addendum: the paper finally got accepted, at SLT 2010. Yay!


© 2009-2010 Richard Sproat