DHVANI: Technical Description
The Phonetics-to-Speech Engine works by diphone concatenation. It uses a database
of about 800 basic sounds, which are pitch-marked. The engine simply reads the
phonetic description, identifies the appropriate diphones, concatenates them at
their pitch marks, and plays out the resulting signal. To reduce the database
size, we use an open-source implementation of the GSM 06.10 RPE-LTP compression
standard. This reduces the database size to about 1 MB (even though it appears to
be 2 MB due to fragmentation). All basic sounds are recorded at 16000 Hz as
16-bit samples. The phonetic description mentioned above is documented in the
README files, which are part of the distribution.
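The concatenation step can be illustrated with a minimal sketch. This is not the Dhvani source code: the Diphone class, the dictionary keyed by phoneme pairs, and the rule of trimming each unit to the span between its first and last pitch mark are all assumptions made for illustration. The real phonetic format and database layout are the ones documented in the README files.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Diphone:
    """Hypothetical database entry: PCM samples plus pitch-mark positions."""
    samples: List[int]       # 16-bit samples recorded at 16000 Hz
    pitch_marks: List[int]   # ascending sample indices of the pitch marks


def phonemes_to_diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Pair each phoneme with its successor: k a r -> (k, a), (a, r)."""
    return list(zip(phonemes, phonemes[1:]))


def concatenate(units: List[Diphone]) -> List[int]:
    """Join units, cutting each one at its first and last pitch mark.

    Cutting at pitch marks (an assumption about how the joins are made)
    keeps every join aligned with the start of a pitch period.
    """
    signal: List[int] = []
    for unit in units:
        start, end = unit.pitch_marks[0], unit.pitch_marks[-1]
        signal.extend(unit.samples[start:end])
    return signal


def synthesize(phonemes: List[str],
               database: Dict[Tuple[str, str], Diphone]) -> List[int]:
    """Look up the diphones for a phoneme sequence and concatenate them."""
    units = [database[pair] for pair in phonemes_to_diphones(phonemes)]
    return concatenate(units)


# Toy database with made-up sample values, only to show the data flow.
toy_db = {
    ("k", "a"): Diphone(samples=list(range(10)), pitch_marks=[2, 5, 8]),
    ("a", "r"): Diphone(samples=list(range(100, 110)), pitch_marks=[1, 4, 9]),
}
signal = synthesize(["k", "a", "r"], toy_db)
```

In a real engine the stored units would first be decoded from their GSM 06.10 frames, and the resulting signal would be written to the audio device rather than kept in a list.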
The Text-to-Phonetics routine for
Kannada/Hindi reads in a text in UTF-8 format and converts this text
into the above-mentioned phonetic description. Indian
languages are, by and large, phonetic in nature, so this task does not
present major challenges. Hindi, however, turns out to be an exception, because
the implicit vowel carried by each consonant is sometimes dropped in speech (why
is karna, i.e. to do, pronounced karna and not karana?). The Hindi routine uses a
new algorithm (described in the distribution) to determine where this
implicit vowel occurs; this algorithm works correctly for all basic words but
goes wrong on compound words, like DevNagar (which it would pronounce as
Devangar). It is not clear whether there is any algorithm at all which
will handle compound words without recourse to a lexicon. Kannada, on the other
hand, poses no such problems and is relatively straightforward.
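To make the implicit-vowel problem concrete, here is a small sketch of one simple rule of this kind. It is not the algorithm shipped with Dhvani, which is described in the distribution. The rule used here is an assumption chosen for illustration: drop a word-final implicit vowel, then scan right to left and drop an implicit vowel that sits between a vowel-consonant pair and a consonant-vowel pair. The romanized phoneme names are likewise hypothetical.

```python
from typing import List

SCHWA = "a"  # the implicit vowel attached to every bare consonant
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}


def is_vowel(phoneme: str) -> bool:
    return phoneme in VOWELS


def delete_schwas(phonemes: List[str]) -> List[str]:
    """Apply the two illustrative rules to one word."""
    out = list(phonemes)

    # Rule 1: a word-final implicit vowel is not pronounced.
    if len(out) > 1 and out[-1] == SCHWA:
        out.pop()

    # Rule 2: scanning right to left, drop an implicit vowel found in the
    # context vowel-consonant _ consonant-vowel.
    i = len(out) - 1
    while i >= 0:
        if (out[i] == SCHWA
                and i >= 2
                and not is_vowel(out[i - 1]) and is_vowel(out[i - 2])
                and i + 2 < len(out)
                and not is_vowel(out[i + 1]) and is_vowel(out[i + 2])):
            del out[i]
        i -= 1
    return out


# A basic word: ka-ra-naa is reduced to kar-naa.
karna = delete_schwas(["k", "a", "r", "a", "n", "aa"])
print("".join(karna))      # karnaa

# A compound word: de-va-na-ga-ra should become dev-na-gar, but the rule
# has no notion of the boundary between the two parts.
devnagar = delete_schwas(["d", "e", "v", "a", "n", "a", "g", "a", "r", "a"])
print("".join(devnagar))   # devangar
```

The second result shows the kind of failure described above: without knowing that the word splits into dev and nagar, a purely local rule removes the wrong vowel.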
Sample UTF-8 files for Hindi and Kannada, and phonetic
demo files for Hindi, Kannada, and Tamil, are included with the
distribution.
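As a rough illustration of what reading such a UTF-8 Hindi file involves, the sketch below decodes the bytes and expands each Devanagari consonant into a consonant plus its implicit vowel, unless a vowel sign or a virama follows. The romanized phoneme names and the tiny character table are assumptions made for this example; they are not the phonetic description defined in the README files.

```python
from typing import List

# Hypothetical romanization table covering only the characters used below.
CONSONANTS = {
    "\u0915": "k",  # DEVANAGARI LETTER KA
    "\u0917": "g",  # DEVANAGARI LETTER GA
    "\u0926": "d",  # DEVANAGARI LETTER DA
    "\u0928": "n",  # DEVANAGARI LETTER NA
    "\u0930": "r",  # DEVANAGARI LETTER RA
    "\u0935": "v",  # DEVANAGARI LETTER VA
}
VOWEL_SIGNS = {
    "\u093e": "aa",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
    "\u0947": "e",   # DEVANAGARI VOWEL SIGN E
}
VIRAMA = "\u094d"    # suppresses the implicit vowel


def expand(text: str) -> List[str]:
    """Turn Devanagari text into phonemes, writing out implicit vowels."""
    phonemes: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            phonemes.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                phonemes.append(VOWEL_SIGNS[nxt])  # explicit vowel sign
                i += 1
            elif nxt == VIRAMA:
                i += 1                             # no vowel at all
            else:
                phonemes.append("a")               # implicit vowel
        # Characters missing from the toy table are skipped silently.
        i += 1
    return phonemes


def read_utf8(data: bytes) -> List[str]:
    """Decode UTF-8 bytes, as read from a file, and expand them."""
    return expand(data.decode("utf-8"))


word = "\u0915\u0930\u0928\u093e"          # the Hindi word for "to do"
print(read_utf8(word.encode("utf-8")))     # ['k', 'a', 'r', 'a', 'n', 'aa']
```

The output still contains every implicit vowel; deciding which of them are actually pronounced is the separate step discussed above.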
For further information,
contact Ramesh Hariharan (ramesh@csa.iisc.ernet.in)