Maruhakayama Kofun (丸墓山古墳), Sakitama Kofuns (さきたま古墳群), Gyōda (行田), Saitama Prefecture (埼玉県), April 2021
|史伯樂 ರಿಚರ್ಡ್ ಸ್ಪ್ರೋಟ್|
|Tokyo 150-0002 Japan|
I am a computational linguist (which means that I have some things in common with grapefruit).
I am a Research Scientist at Google, formerly in New York, now in Tokyo.
From January 2009 through October 2012, I was a professor at the Center for Spoken Language Understanding at the Oregon Health and Science University.
Prior to going to OHSU, I was a professor in the departments of Linguistics and Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. I was also a full-time faculty member at the Beckman Institute. I still hold adjunct positions in Linguistics and ECE at UIUC.
I continue to maintain some "side-bar interests", including computational models of the early evolution of writing, the statistical properties of non-linguistic symbol systems, and a collaborative translation of Wolfgang von Kempelen's Mechanismus der menschlichen Sprache, which was published in 2017.
Prior to coming to Google I was working on several projects, some of which are still current in the sense that my collaborators are still working on them.
I am very interested in writing systems; see some work I was doing on approximate string matching in the Easter Island rongorongo script. I also ran (with Jerry Packard) a reading group centered around Hannas' controversial thesis relating Asia's supposed technological creativity gap to the Chinese writing system.
Before joining Ken's department I worked in the Human/Computer Interaction Research Department headed by Candy Kamm. My most recent project in that department was WordsEye, an automatic text-to-scene conversion system. The WordsEye technology is now being developed at Semantic Light, LLC. WordsEye is particularly good for creating surrealistic images that I can easily conceive of but that are well beyond my artistic ability to execute. All of the following images were generated from text descriptions of the scene. Click on the images to see the text that generated the scene:
|A brief summary of the results of the recent Kaggle Text Normalization Challenge.|
|Jürgen Trouvain's page with a link to the PDF of our new bilingual edition of Kempelen's Mechanismus der menschlichen Sprache.|
|What the more fortunate of us might do if we get a tax rebate courtesy of the Koch Brothers.|
|A blog on the Biology & Chemistry Department at Liberty University.|
|An Op-Ed-style essay on the 2014 Association for Computational Linguistics Lifetime Achievement Award winner Robert Mercer, Trump, and why it's a pity the ACL does not have a Code of Ethics (and likely wouldn't apply it if it did).|
|A new piece, "Defending Democracy in an Illiberal Age", by Shalom Lappin and me.|
|A short essay about the recent US elections, mostly to get this off my chest.|
|A new paper on our ongoing work on applying Recurrent Neural Networks to the problem of text normalization for speech synthesis.|
|Slides for a presentation on the computational simulation of the early evolution of writing, presented at the second conference on Signs of Writing: The Cultural, Social, and Linguistic Contexts of the World’s First Writing Systems, Beijing, June 2015. Also my paper from the first conference in Chicago, November 2014.|
|A press release from the Linguistic Society of America on my new paper in Language that shows that previously published statistical claims about Indus Valley Symbols and Pictish Symbols were wrong.|
|My opinion on Wikipedia entries for living persons in my field (and beyond).|
|Interview on WNYC's New Tech City here. I was trying to explain why no graphical form of communication will ever replace writing. Also a piece that focuses on the discussion of Blissymbolics here.|
|I am starting a collection of data on Indus Weights here.|
|As of June 1, 2013, I am the new Editor in Chief of the ACM Transactions on Asian Language Information Processing.|
|See here for a new monograph on statistical analyses of a number of corpora of linguistic and non-linguistic symbol systems.|
|I was General Chair of InterSpeech 2012.|
|My copy of Wolfgang von Kempelen's book is now online here.|
|A response to the recent paper in Science by Quentin Atkinson, which has (predictably) received a lot of press. Science does it again, choosing to publish a nice-sounding story that does not stand up to even mildly serious scrutiny.|
|Update to this page with scans done by the OHSU library of the first 100 pages of my 1791 first edition of Wolfgang von Kempelen's Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine.|
|My response to Rao et al.'s reply to my Computational Linguistics piece (below).|
|A new paper on the reviewing practices of the general science journals: "Ancient symbols, computational linguistics, and the reviewing practices of the general science journals." Computational Linguistics, 36:3, 2010.|
|A shocking finding on the relation between literacy and the ratio of male/female births in the population.|
|A particularly stupid article in the Abu Dhabi-based The National on the continuing saga of the statistical "evidence" for the Indus Script thesis prompted me to update this page.|
|Some photos of my recently acquired 1791 first edition of Wolfgang von Kempelen's Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine. This is the very first serious study of articulatory phonetics, and the first description of a mechanical speech synthesizer.|
|My two cents' worth on the continuing credulous reports on the Rao et al. work on the Indus symbols.|
|Some musings on the current state of computational linguistics.|
|Slides from Kevin Knight's and my tutorial at NAACL in Boulder.|
|A refutation of the supposed evidence from Rao and colleagues that the Indus Valley symbol system was a script. Also: a simple Python program that does a pretty good simulation of Rao et al.'s results, assuming a Zipfian distribution of characters for a 400-character vocabulary, and conditional independence of the characters. In other words, you get the same behavior as Rao shows for the Indus corpus even with a model that has no syntactic dependence between the glyphs whatsoever. See also Mark Liberman's entry on the Language Log, and Fernando Pereira's skewering of this paper and of Science. Here is the letter that we sent to Science, but which they refused to publish. Of course they cited lack of space. But it's very hard to see what kind of letter would be more worthy of space than one that points out a set of fatal flaws in a "peer-reviewed" publication that appeared in Science. Finally, here's a plot that demonstrates, using Rao et al.'s technique, that European heraldry is a linguistic symbol system. It also seems to show that Amharic (a Semitic language) is closely related to the Dravidian language Tamil.|
|General talk about computational linguistics at ACM Reflections/Projections 2008.|
|2008 Johns Hopkins CLSP Summer Workshop on Multilingual Spoken Term Detection. Final slides.|
|Talk on evolutionary modeling of morphology in UIUC Linguistics Seminar, May 1, 2008. Same talk at QITL-3 (Helsinki), and the Max Planck Institute for Evolutionary Anthropology (Leipzig) here.|
|My recent visit to the Creation Museum in Kentucky.|
|Talk on the Phaistos Disk in the September 6, 2007, Linguistics Seminar at UIUC.|
|Co-organizer (with Steve Farmer) of a workshop on Scripts, Non-scripts and (Pseudo)-decipherment, July 11 2007, to be held in conjunction with the 2007 LSA Summer Institute at Stanford University.|
|Guest lecture on WordsEye in LING 588, Spring 2007.|
|In Spring 2007 I am teaching a new, and I believe unique, course entitled Language, Technology and Society (LING270). The course covers language-related technology from the earliest writing systems all the way up to modern speech and language processing. It also explores the social implications of some of these technologies.|
|I was technical co-chair (with Yuji Matsumoto) of the 21st International Conference on Computer Processing of Oriental Languages, December 17-19, 2006.|
|I was co-chair (with Dan Roth) of the Third Midwest Computational Linguistics Colloquium.|
|Presentation at SALA 25.|
|Guidance for researchers contemplating doing joint research with colleagues in India.|
|Keynote address at Second Midwest Computational Linguistics Colloquium.|
|Slides for my talk for the April 22, 2005, Beckman Institute Director's seminar.|
|See Shalom Lappin's and my challenge to the Minimalist Program.|
|Slides from my tutorial, with Tim Buckwalter, at the Arabic Linguistics Symposium, April 3, 2005, UIUC.|
|A travelog from India.|
An article with Steve Farmer and Michael Witzel in the Electronic Journal of Vedic Studies, 11(2), 2004, argues that the so-called Indus Valley script was not a writing system at all.
The December 17, 2004 issue of Science ran a feature on our work.
Evidently this article has caused a bit of a stir in some circles. See the related challenge (worth $10,000!!) to prove that I'm an idiot.
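For readers curious about the simulation mentioned earlier on this page (Zipfian character frequencies over a 400-character vocabulary, with each character drawn independently of its neighbors), here is a minimal sketch of that kind of experiment. It is not the original program. The Zipf exponent of 1, the sequence length, the function names, and the choice of bigram conditional entropy over the N most frequent symbols as the quantity to report are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_sequence(vocab_size, length, rng):
    """Draw symbols independently from a Zipfian distribution (weight 1/rank).

    The exponent of 1 is an assumption; the page only says "Zipfian".
    """
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=length)


def conditional_entropy(seq, top_n):
    """Estimate H(next | current) in bits from bigram counts.

    Only bigrams whose two symbols are both among the top_n most
    frequent symbols in seq are counted.
    """
    unigrams = Counter(seq)
    keep = {sym for sym, _ in unigrams.most_common(top_n)}
    bigrams = Counter(
        (a, b) for a, b in zip(seq, seq[1:]) if a in keep and b in keep
    )
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    # Count how often each symbol appears as the first member of a kept bigram.
    first = Counter()
    for (a, _), count in bigrams.items():
        first[a] += count
    entropy = 0.0
    for (a, _), count in bigrams.items():
        # -sum p(a, b) * log2 p(b | a)
        entropy -= (count / total) * math.log2(count / first[a])
    return entropy


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    seq = sample_sequence(vocab_size=400, length=20000, rng=rng)
    for n in (20, 40, 60, 80, 100):
        print(n, round(conditional_entropy(seq, n), 3))
```

Because the symbols are sampled independently, whatever shape the resulting curve has cannot come from any dependence between adjacent symbols; that is the only point the sketch is meant to illustrate.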