
INTRODUCTION

There is an ever-increasing demand for text-to-speech (TTS) synthesis technology in a variety of applications, including e-mail reading, information access over the web, tutorial and language-teaching applications, and assistive technology for users with various disabilities. Invariably, an application that was developed with a particular TTS system A cannot be ported to a new TTS system B without a fair amount of additional work, for the simple reason that the tag set used to control system A is completely different from the one used to control system B. The large variety of tag sets used by TTS systems is thus an obstacle to the wider use of this technology, since developers are often unwilling to expend the effort of porting their applications to a new TTS system, even if that system is of demonstrably higher quality than the one they are currently using.

SABLE is a markup scheme for text-to-speech synthesis based on XML (Extensible Markup Language) and SGML (Standard Generalized Markup Language) [2,1], developed to address the need for a common TTS control paradigm. SABLE is based in part on two previous proposals by a subset of the present authors: the Spoken Text Markup Language (STML [4]; see also [6] for an even earlier proposal, SSML) and the Java Speech Markup Language (JSML [5]).

The SABLE markup language is being developed with the following goals in mind:

Synthesizer control: SABLE enables markup of the text input to a TTS system in order to improve the quality and appropriateness of the speech output.
Multilinguality: the tagset should be appropriate for any language.
Ease of use: specialized knowledge of TTS or linguistics should not be required, though users with such experience should be able to apply their knowledge.
Portability: SABLE provides application developers with a consistent mechanism for controlling synthesizers from different companies and on different computing platforms.
Extensibility: SABLE includes a mechanism for non-standard extensions, so it can evolve to support new features in future releases. To encourage research, SABLE allows individual synthesizers to support enhanced features without compromising the portability of SABLE text.

SABLE, like its predecessors, supports two kinds of markup. The first, termed text description in STML and structural elements in JSML, marks properties of the text structure that are relevant for rendering a document in speech. In the current version of SABLE, text description is handled by two tags: DIV, whose attribute TYPE may be set to values such as sentence, paragraph, or even stanza; and SAYAS, which marks the function of the contained region (e.g., as a date, an e-mail address, or a mathematical expression) and thereby gives hints on how to pronounce it. The second kind of markup, STML's speaker directives or JSML's production elements, controls various aspects of how the speech is to be produced. This latter category includes tags such as EMPH (marks levels of emphasis), PITCH (sets intonational properties), RATE (sets speech rate), and PRON (provides pronunciations as phonemic strings).
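To make the two categories concrete, the following Python sketch parses a small SABLE fragment with the standard-library ElementTree parser and sorts its tags into text description (DIV, SAYAS) and speaker directives (EMPH, PITCH, RATE, PRON), following the grouping given above. The enclosing SABLE root element and the MODE attribute on SAYAS are assumptions made for illustration; only DIV's TYPE attribute is described in this section.

```python
import xml.etree.ElementTree as ET

# Tag groupings taken from the description above; this lookup table is an
# illustrative aid, not something defined by the SABLE standard itself.
TEXT_DESCRIPTION = {"DIV", "SAYAS"}
SPEAKER_DIRECTIVES = {"EMPH", "PITCH", "RATE", "PRON"}

def classify_tags(sable_text):
    """Return (text_description, speaker_directives, other): three lists of
    tag names, in document order, found inside a SABLE fragment."""
    root = ET.fromstring(sable_text)
    text_desc, directives, other = [], [], []
    for elem in root.iter():  # depth-first, document order, root included
        if elem is root:
            continue  # skip the (assumed) <SABLE> wrapper element
        if elem.tag in TEXT_DESCRIPTION:
            text_desc.append(elem.tag)
        elif elem.tag in SPEAKER_DIRECTIVES:
            directives.append(elem.tag)
        else:
            other.append(elem.tag)
    return text_desc, directives, other

# Assumption: SAYAS takes a MODE attribute naming the function of its content.
example = """<SABLE>
  <DIV TYPE="paragraph">
    The meeting is on <SAYAS MODE="date">11/16/98</SAYAS>.
    Please <EMPH>do not</EMPH> be late.
  </DIV>
</SABLE>"""

print(classify_tags(example))  # (['DIV', 'SAYAS'], ['EMPH'], [])
```

The DIV and SAYAS elements say what the text is, while EMPH says how to speak part of it; a synthesizer can honor both without the author having to know its internals.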

In both its generality and its coverage, SABLE has many advantages over existing markups such as Microsoft's SAPI [3] or Apple's Speech Manager control set. First, whereas the syntax of other schemes is typically ad hoc, SABLE's is based on XML/SGML, a widely used standard. Second, SAPI and other markup schemes provide tags only for speaker directives, not for text description. Text-description information, for example the fact that a particular boundary in a text corresponds to the end of a line in a table (e.g., <DIV TYPE="x-tl">), can in principle be used by a TTS system to produce reasonable speech output that audibly marks the presence of that boundary. One does not necessarily want to instruct the synthesizer to use a particular intonation pattern, or to realize the break in a particular way: one might prefer simply to mark the presence of the boundary abstractly and assume that the system will do something reasonable with that information. Text description is explicitly designed to allow that kind of abstract specification.
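The point about abstract specification can be illustrated with a small Python sketch of the synthesizer side. The same marked-up table is handed to two hypothetical systems, each of which decides for itself how long a pause to insert after each DIV, keyed on the DIV's TYPE value. The policy tables, millisecond values, and fallback pause are invented for illustration and are not part of SABLE; a real system might also vary pitch or final lengthening rather than pause length alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical realization policies for two different synthesizers. The markup
# only says *that* a boundary exists; each system chooses how to render it.
# All millisecond values here are illustrative assumptions.
SYSTEM_A_PAUSES_MS = {"sentence": 300, "paragraph": 700, "x-tl": 500}
SYSTEM_B_PAUSES_MS = {"sentence": 250, "paragraph": 900, "x-tl": 400}
DEFAULT_PAUSE_MS = 200  # fallback for DIV types a system does not recognize

def render_boundaries(sable_text, pauses_ms):
    """Return a list of (text, pause_ms) pairs, one per DIV, where the pause
    is looked up from this system's policy using the DIV's TYPE attribute.
    Assumes DIVs are not nested (nested text would be counted twice)."""
    root = ET.fromstring(sable_text)
    plan = []
    for div in root.iter("DIV"):
        div_type = div.get("TYPE", "")
        # Collapse runs of whitespace in the DIV's text content.
        text = " ".join("".join(div.itertext()).split())
        plan.append((text, pauses_ms.get(div_type, DEFAULT_PAUSE_MS)))
    return plan

# Two table lines marked with the abstract "x-tl" boundary from the text.
table = """<SABLE>
  <DIV TYPE="x-tl">Monday  9:00  Staff meeting</DIV>
  <DIV TYPE="x-tl">Tuesday  14:00  Project review</DIV>
</SABLE>"""

print(render_boundaries(table, SYSTEM_A_PAUSES_MS))
print(render_boundaries(table, SYSTEM_B_PAUSES_MS))
```

Both systems receive identical input and both mark the table-line boundary audibly, but neither the document author nor the markup dictates how; that choice stays with each synthesizer.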


Richard Sproat
1998-11-16