
Cheomseongdae (첨성대/瞻星臺), Gyeongju (경주/慶州), Korea, the 7th century observatory built during the reign of Queen Seondeok (선덕여왕/善德女王), September, 2014.

Richard Sproat     리차드 스프로트
史伯樂     ರಿಚರ್ಡ್ ಸ್ಪ್ರೋಟ್
My name in Nastaliq script     My name in the obviously linguistic Indus script
Research Scientist
Google, Inc.
76 Ninth Ave, 4th Floor
New York, NY 10011
are double you ess at ex oh bee ay dot com
Click here for other stuff.
A press release from the Linguistic Society of America on my new paper in Language that shows that previously published statistical claims about Indus Valley Symbols and Pictish Symbols were wrong. Data associated with this project can be found here. That page also lists some omissions and errata from the published paper.
 

Background Research Interests Courses Publications Patents Professional Activities Curriculum Vitae

I am a computational linguist (which means that I have some things in common with grapefruit).

I am a Research Scientist at Google in New York.

From January 2009 through October 2012, I was a professor at the Center for Spoken Language Understanding at the Oregon Health and Science University.

Prior to going to OHSU, I was a professor in the departments of Linguistics and Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. I was also a full-time faculty member at the Beckman Institute. I still hold adjunct positions in Linguistics and ECE at UIUC.

Research Interests

I have worked recently on several projects, some of which are still current in the sense that my collaborators are still working on them.

Some Previous Research Interests

I am very interested in writing systems; see some work I was doing on approximate string matching in the Easter Island rongorongo script. I also ran (with Jerry Packard) a reading group centered around Hannas' controversial thesis linking Asia's supposed technological creativity gap to the Chinese writing system.

Prior to Joining Academia

Before joining the faculty at UIUC, I worked in the Information Systems and Analysis Research Department, headed by Ken Church, at AT&T Labs --- Research. There I worked on speech and text data mining: extracting potentially useful information from large speech or text databases using a combination of speech/NLP technology and data mining techniques.

Before joining Ken's department I worked in the Human/Computer Interaction Research Department headed by Candy Kamm. My most recent project in that department was WordsEye, an automatic text-to-scene conversion system. The WordsEye technology is now being developed at Semantic Light, LLC. WordsEye is particularly good for creating surrealistic images that I can easily conceive of but that are well beyond my artistic ability to execute. All of the following images were generated from text descriptions of the scene. Click on the images to see the text that generated the scene:

WordsEye Picture: Carpe diem.
WordsEye Picture: Rigor mortis.
WordsEye Picture: Umbrella world.
WordsEye Picture: Manic depression.

Prior to joining AT&T Labs in 1999 I worked on Text-to-Speech Synthesis at Bell Labs, Lucent Technologies. Among other things, I was responsible for the multilingual text processing module of the Bell Labs Multilingual TTS System.

My Old Home Page at Bell Labs

Here is my old home page at Bell Labs, which I left in 1999, and which for some reason has never gone away.

More and Sometimes Less Recent Stuff

My opinion on Wikipedia entries for living persons in my field (and beyond).
Interview on WNYC's New Tech City here. I was trying to explain why no graphical form of communication will ever replace writing. Also a piece that focuses on the discussion of Blissymbolics here.
I am starting a collection of data on Indus Weights here.
As of June 1, 2013, I am the new Editor in Chief of the ACM Transactions on Asian Language Information Processing.
See here for a new monograph on statistical analyses of a number of corpora of linguistic and non-linguistic symbol systems.
I was General Chair of InterSpeech 2012.
My copy of Wolfgang von Kempelen's book is now online here.
A response to the recent paper in Science by Quentin Atkinson, which has (predictably) received a lot of press. Science does it again, choosing to publish a nice-sounding story that does not stand up to even mildly serious scrutiny.
Update to this page with scans done by the OHSU library of the first 100 pages of my 1791 first edition of Wolfgang von Kempelen's Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine.
My response to Rao et al.'s reply to my Computational Linguistics piece (below).
A new paper on the reviewing practices of the general science journals: "Ancient symbols, computational linguistics, and the reviewing practices of the general science journals." Computational Linguistics, 36:3, 2010.
A shocking finding on the relation between literacy and the ratio of male/female births in the population.
A particularly stupid article in Abu Dhabi-based The National on the continuing saga of the statistical "evidence" for the Indus Script thesis prompted me to update this page.
Some photos of my recently acquired 1791 first edition of Wolfgang von Kempelen's Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine. This is the very first serious study of articulatory phonetics, and the first description of a mechanical speech synthesizer.
My two cents' worth on the continuing credulous reports on the Rao et al. work on the Indus symbols.
Some musings on the current state of computational linguistics.
Slides from Kevin Knight's and my tutorial at NAACL in Boulder.
A refutation of the supposed evidence from Rao and colleagues that the Indus Valley symbol system was a script. Also: a simple Python program that does a pretty good simulation of Rao et al.'s results, assuming a Zipfian distribution of characters for a 400-character vocabulary, and conditional independence of the characters. In other words, you get the same behavior as Rao shows for the Indus corpus even with a model that has no syntactic dependence between the glyphs whatsoever. See also Mark Liberman's entry on the Language Log, and Fernando Pereira's skewering of this paper, as well as Science. Here is the letter that we sent to Science, but which they refused to publish. Of course they cited lack of space. But it's very hard to see what kind of letter would be more worthy of space than one that points out a set of fatal flaws in a "peer-reviewed" publication that appeared in Science. Finally, here's a plot that demonstrates, using Rao et al.'s technique, that European heraldry is a linguistic symbol system. It also seems to show that Amharic (a Semitic language) is closely related to the Dravidian language Tamil.
General talk about computational linguistics at ACM Reflections/Projections 2008.
2008 Johns Hopkins CLSP Summer Workshop on Multilingual Spoken Term Detection. Final slides.
Talk on evolutionary modeling of morphology in UIUC Linguistics Seminar, May 1, 2008. Same talk at QITL-3 (Helsinki), and the Max Planck Institute for Evolutionary Anthropology (Leipzig) here.
My recent visit to the Creation Museum in Kentucky.
Talk on the Phaistos Disk in the September 6, 2007, Linguistics Seminar at UIUC.
Co-organizer (with Steve Farmer) of a workshop on Scripts, Non-scripts and (Pseudo)-decipherment, July 11, 2007, to be held in conjunction with the 2007 LSA Summer Institute at Stanford University.
Guest lecture on WordsEye in LING 588, Spring 2007.
In Spring 2007 I am teaching a new, and I believe unique, course entitled Language, Technology and Society (LING270). The course covers language-related technology from the earliest writing systems all the way up to modern speech and language processing. It also explores the social implications of some of these technologies.
I was technical co-chair (with Yuji Matsumoto) of the 21st International Conference on Computer Processing of Oriental Languages, December 17-19, 2006.
I was co-chair (with Dan Roth) of the Third Midwest Computational Linguistics Colloquium.
Presentation at SALA 25.
Guidance for researchers contemplating doing joint research with colleagues in India.
Keynote address at Second Midwest Computational Linguistics Colloquium.
Slides for my talk for the April 22, 2005, Beckman Institute Director's seminar.
See Shalom Lappin's and my challenge to the Minimalist Program.
Slides from my tutorial, with Tim Buckwalter, at the Arabic Linguistics Symposium, April 3, 2005, UIUC.
A travelog from India.
A new article with Steve Farmer and Michael Witzel in the Electronic Journal of Vedic Studies, 11(2), 2004, argues that the so-called Indus Valley script was not a writing system at all. The December 17, 2004 issue of Science ran a feature on our work.

Evidently this article has caused a bit of a stir in some circles. See the related challenge (worth $10,000!!) to prove that I'm an idiot.
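
For readers curious about the idea behind the Zipfian simulation mentioned in the list above, here is a minimal sketch. It is not the program linked there, and it makes several assumptions that are not taken from this page: the statistic (bigram conditional entropy), the corpus length, the random seed, and all of the function names are illustrative choices only. It draws symbols independently from a Zipfian distribution over a 400-symbol vocabulary, so there is no dependence between adjacent symbols by construction, and then estimates entropies from the sample.

```python
import math
import random
from collections import Counter


def zipf_weights(vocab_size):
    """Unnormalized Zipfian weights: the symbol of rank r gets weight 1/r."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def sample_independent(vocab_size, length, seed=0):
    """Draw `length` symbols independently of one another (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(range(vocab_size), weights=zipf_weights(vocab_size), k=length)


def unigram_entropy(seq):
    """Plug-in estimate of H(symbol) in bits."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq):
    """Plug-in estimate of H(next symbol | previous symbol) in bits."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    total_pairs = len(seq) - 1
    h = 0.0
    for (prev, _nxt), c in pair_counts.items():
        # p(prev, next) * log2 p(next | prev), both estimated from counts
        h -= (c / total_pairs) * math.log2(c / prev_counts[prev])
    return h


if __name__ == "__main__":
    # 400 symbols as in the description above; the length 2000 is an assumption.
    corpus = sample_independent(vocab_size=400, length=2000, seed=1)
    print("unigram entropy (bits):    ", round(unigram_entropy(corpus), 3))
    print("conditional entropy (bits):", round(conditional_entropy(corpus), 3))
```

On a small sample the estimated conditional entropy comes out below the unigram entropy even though the symbols were generated independently. That gap is a sampling effect of the count-based estimate, not evidence of any structure in the sequence.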

