Next: TAGS AND ATTRIBUTES Up: SABLE: A STANDARD FOR Previous: SABLE: A STANDARD FOR

   
INTRODUCTION

There is an ever-increasing demand for text-to-speech (TTS) synthesis technology in applications including e-mail reading, information access over the web, tutorial and language-teaching applications, and assistive technology for users with various handicaps. Invariably, an application developed with a particular TTS system A cannot be ported to a new TTS system B without a fair amount of additional work, for the simple reason that the tag set used to control system A is completely different from the one used to control system B. The large variety of tag sets used by TTS systems is thus an obstacle to the expanded use of this technology, since developers are often unwilling to expend effort porting their applications to a new TTS system, even if the new system is of demonstrably higher quality than the one they are currently using.

SABLE is an XML (Extensible Markup Language)/SGML (Standard Generalized Markup Language)-based [2,1] markup scheme for text-to-speech synthesis, developed to address the need for a common TTS control paradigm. SABLE is based in part on two previous proposals by a subset of the present authors: the Spoken Text Markup Language (STML [4]; see also [6] for an even earlier proposal, SSML) and the Java Speech Markup Language (JSML [5]).

The SABLE markup language is being developed with the following goals in mind:

SABLE, like its predecessors, supports two kinds of markup. The first, termed text description in STML and structural elements in JSML, marks properties of the text structure that are relevant for rendering a document in speech. In the current version of SABLE, text description is handled by two tags: DIV, whose attribute TYPE may be set to such values as sentence, paragraph or even stanza; and SAYAS, which marks the function of the contained region (e.g. as a date, an e-mail address or a mathematical expression) and thereby gives hints on how to pronounce it. The second kind of markup, STML's speaker directives or JSML's production elements, controls various aspects of how the speech is to be produced. Falling into this latter category are tags such as EMPH (marks levels of emphasis), PITCH (sets intonational properties), RATE (sets speech rate) and PRON (provides pronunciations as phonemic strings).
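The distinction between the two kinds of markup can be illustrated with a short Python sketch that parses a small SABLE-like fragment and sorts the tags it finds into the two categories. The tag names and the TYPE attribute of DIV are taken from the description above. The fragment itself, the SABLE root element, the MODE, LEVEL and SPEED attribute names and their values, and the helper classify_tags are assumptions made for this example and are not taken from the specification.

```python
import xml.etree.ElementTree as ET

# Tag names from the text above, grouped by the kind of markup they express.
TEXT_DESCRIPTION = {"DIV", "SAYAS"}
SPEAKER_DIRECTIVES = {"EMPH", "PITCH", "RATE", "PRON"}

# Hypothetical fragment: the root element and every attribute other than
# DIV's TYPE are illustrative guesses, not quoted from the SABLE definition.
FRAGMENT = """
<SABLE>
  <DIV TYPE="paragraph">
    <DIV TYPE="sentence">
      The meeting is on <SAYAS MODE="date">11/16/1998</SAYAS>.
    </DIV>
    <DIV TYPE="sentence">
      Please be <EMPH LEVEL="strong">on time</EMPH>,
      <RATE SPEED="-20%">and bring the report</RATE>.
    </DIV>
  </DIV>
</SABLE>
"""


def classify_tags(markup):
    """Return (text_description, speaker_directives) tag lists in document order."""
    root = ET.fromstring(markup)
    description, directives = [], []
    for element in root.iter():
        if element.tag in TEXT_DESCRIPTION:
            description.append(element.tag)
        elif element.tag in SPEAKER_DIRECTIVES:
            directives.append(element.tag)
    return description, directives


description, directives = classify_tags(FRAGMENT)
print("text description:", description)
print("speaker directives:", directives)
```

In this fragment the DIV and SAYAS elements say what the text is, while EMPH and RATE say how it should sound; a synthesizer that ignored the second group could still produce a reasonable reading from the first.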

In both its generality and its coverage, SABLE has many advantages over existing markups such as Microsoft's SAPI [3] or Apple's Speech Manager control set. First, whereas the syntax of other schemes is typically ad hoc, SABLE's is based on XML/SGML, a widely used standard. Second, SAPI and other markup schemes provide tags only for speaker directives, not for text description. Text-description information, such as the fact that a particular boundary in a text corresponds to the end of a line in a table (e.g., <DIV TYPE="x-tl">), can in principle be used by a TTS system to produce reasonable speech output that marks the presence of that boundary auditorily. One does not necessarily want to instruct the synthesizer to use a particular intonation pattern, or to implement the break in a particular fashion: one might prefer simply to mark the presence of the boundary in an abstract way, and assume that the system will do something reasonable with that information. Text description is explicitly designed to allow that kind of abstract specification.
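To make the idea of abstract specification concrete, the following Python sketch shows one way a synthesizer front end might turn DIV boundaries into its own rendering decisions, so that the document author never has to state pause lengths. It is a sketch under stated assumptions, not a description of any actual system: the pause values, the names BOUNDARY_PAUSE_MS, DEFAULT_PAUSE_MS and boundary_plan, the SABLE root element and the sample table are all invented for illustration. Only the DIV tag, its TYPE attribute and the values paragraph, stanza and x-tl come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering policy of one TTS system: pause length in
# milliseconds per boundary type. The numbers are invented for illustration.
BOUNDARY_PAUSE_MS = {
    "sentence": 300,
    "paragraph": 700,
    "x-tl": 450,  # end of a line in a table
}
DEFAULT_PAUSE_MS = 200  # fallback for DIV types this system does not know


def boundary_plan(element, plan=None):
    """Collect (TYPE, pause_ms) for each DIV in the order the DIVs close."""
    if plan is None:
        plan = []
    # Visit children first, so an inner boundary is recorded before the
    # boundary of the element that encloses it.
    for child in element:
        boundary_plan(child, plan)
    if element.tag == "DIV":
        div_type = element.get("TYPE", "")
        pause = BOUNDARY_PAUSE_MS.get(div_type, DEFAULT_PAUSE_MS)
        plan.append((div_type, pause))
    return plan


# Hypothetical document: the author marks table-line boundaries abstractly.
TABLE = """
<SABLE>
  <DIV TYPE="paragraph">
    <DIV TYPE="x-tl">Boston 72 sunny</DIV>
    <DIV TYPE="x-tl">Chicago 65 rain</DIV>
    <DIV TYPE="stanza">a type this system has no rule for</DIV>
  </DIV>
</SABLE>
"""

plan = boundary_plan(ET.fromstring(TABLE))
for div_type, pause in plan:
    print(div_type, pause)
```

A second system could substitute a different table, or realize the same boundaries with intonation rather than pauses, without any change to the marked-up document.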


Richard Sproat
1998-11-16