DHVANI: Technical Description

The Phonetics-to-Speech Engine works by diphone concatenation. It uses a database of about 800 basic sounds, all of which are pitch-marked. The engine reads the phonetic description, identifies the appropriate diphones, concatenates them at the pitch marks, and plays out the resulting signal. To reduce the database size, we use an open-source implementation of the GSM 06.10 RPE-LTP compression standard. This reduces the database to about 1 MB (although it appears to take up 2 MB because of fragmentation). All basic sounds are recorded at 16000 Hz as 16-bit samples. The format of the phonetic description is documented in the README files that are part of the distribution.
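The concatenation step can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory database `db` that maps diphone names such as `"k-a"` to a `DiphoneUnit` holding already-decompressed 16-bit samples and the sample indices of its pitch marks. The phone labels, the `_` silence symbol, and all function names are illustrative assumptions, not Dhvani's actual phonetic notation or code (that notation is documented in the README files). Each interior unit is cut at its first and last pitch marks so that every join falls on a pitch mark; the GSM decompression step is omitted, and real engines usually also smooth the joins (for example with overlap-add), which this sketch does not do.

```python
import io
import struct
import wave
from dataclasses import dataclass

SAMPLE_RATE = 16000  # Hz, matching the recording rate described above
SILENCE = "_"        # hypothetical label for silence at the word edges


@dataclass
class DiphoneUnit:
    samples: list[int]      # 16-bit PCM samples (already decompressed)
    pitch_marks: list[int]  # ascending sample indices of the pitch marks


def diphones(phones):
    """Split a phone sequence into diphone names, padding with silence."""
    padded = [SILENCE] + list(phones) + [SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(units):
    """Join units so that every boundary falls on a pitch mark."""
    out = []
    for k, unit in enumerate(units):
        # Keep the leading edge of the first unit and the trailing edge of
        # the last one; elsewhere cut from the first to the last pitch mark.
        start = 0 if k == 0 else unit.pitch_marks[0]
        end = len(unit.samples) if k == len(units) - 1 else unit.pitch_marks[-1]
        out.extend(unit.samples[start:end])
    return out


def synthesize(phones, db):
    """Look up the diphones for `phones` in `db` and concatenate them."""
    names = diphones(phones)
    missing = [n for n in names if n not in db]
    if missing:
        raise KeyError(f"diphones not in database: {missing}")
    return concatenate([db[n] for n in names])


def to_wav_bytes(samples, rate=SAMPLE_RATE):
    """Encode mono 16-bit little-endian PCM samples as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


if __name__ == "__main__":
    # Toy database with made-up samples, only to show where the cuts fall.
    db = {
        "_-k": DiphoneUnit([0, 1, 2, 3], [1, 3]),
        "k-a": DiphoneUnit([10, 11, 12, 13, 14], [1, 4]),
        "a-_": DiphoneUnit([20, 21, 22], [1, 2]),
    }
    print(synthesize(["k", "a"], db))  # [0, 1, 2, 11, 12, 13, 21, 22]
```

Cutting interior units at pitch marks is the simplest way to keep the waveform periods aligned across a join; a fuller engine would also adjust durations and pitch before playing the signal out.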

The Text-to-Phonetics routine for Kannada and Hindi reads text in UTF-8 format and converts it into the phonetic description mentioned above. Indian languages are, by and large, phonetic in nature, so this task does not present major challenges. Hindi, however, turns out to be an exception. Each consonant written in Devanagari carries an implicit short vowel "a" unless marked otherwise, and in speech this vowel is often dropped: why is करना (karna, "to do") pronounced karna and not karana, as a letter-by-letter reading would suggest? The Hindi routine uses a new algorithm (described in the distribution) to determine where this implicit vowel occurs. The algorithm works correctly for all basic words but goes wrong on compound words such as DevNagar, which it would pronounce as Devangar. It is not clear whether any algorithm can handle compound words without recourse to a lexicon. Kannada, on the other hand, poses no such problems and is relatively straightforward.
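The implicit-vowel problem can be made concrete with a small Python sketch. It does not reproduce the Hindi routine's algorithm, which is described in the distribution; instead it applies a simple context-based heuristic chosen for illustration: drop a word-final implicit "a", then scan the word right to left and drop an implicit "a" that sits between a vowel-consonant pair and a consonant-vowel pair (the context V C _ C V). The romanization tables cover only a handful of letters, long vowels are written doubled ("aa"), and the names `to_phones`, `delete_schwas`, and `pronounce` are hypothetical. Run on देवनगर, this heuristic also produces "devangar", which shows how a rule that looks only at local letter context cannot see the boundary inside a compound word.

```python
# Minimal sketch: Devanagari word -> phones, then a simple schwa-deletion
# heuristic. This is NOT the Hindi routine's algorithm; the letter tables
# below are a tiny illustrative subset.

CONSONANTS = {"क": "k", "ग": "g", "द": "d", "न": "n", "म": "m",
              "र": "r", "ल": "l", "व": "v", "स": "s"}
VOWEL_SIGNS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "ू": "uu",
               "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
VIRAMA = "\u094d"  # explicitly suppresses the implicit vowel
SCHWA = "a"        # the implicit vowel
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}


def to_phones(word):
    """Letter-by-letter reading: every consonant keeps its implicit vowel."""
    phones = []
    for ch in word:
        if ch in CONSONANTS:
            phones += [CONSONANTS[ch], SCHWA]
        elif ch in VOWEL_SIGNS:
            phones[-1] = VOWEL_SIGNS[ch]  # a vowel sign replaces the schwa
        elif ch == VIRAMA:
            phones.pop()                  # virama removes the schwa
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return phones


def delete_schwas(phones):
    """Drop implicit vowels using two local rules (illustrative only)."""
    out = list(phones)
    # Rule 1: a word-final implicit vowel is silent (kamala -> kamal).
    if len(out) > 2 and out[-1] == SCHWA:
        out.pop()
    # Rule 2: scanning right to left, drop a schwa in the context V C _ C V.
    # Deletions to the right are already applied when earlier positions are
    # checked, which prevents deleting two schwas in a row.
    i = len(out) - 3
    while i >= 2:
        if (out[i] == SCHWA
                and out[i - 1] not in VOWELS and out[i - 2] in VOWELS
                and out[i + 1] not in VOWELS and out[i + 2] in VOWELS):
            del out[i]
        i -= 1
    return out


def pronounce(word):
    return "".join(delete_schwas(to_phones(word)))


if __name__ == "__main__":
    for w in ["करना", "कमल", "देवनगर"]:
        print(w, pronounce(w))  # karnaa, kamal, devangar
```

The right-to-left scan is what yields "devangar" for देवनगर: the schwa after न is deleted first, and once it is gone the schwa after व no longer has a vowel on its right, so it survives. A lexicon entry splitting the compound into देव + नगर would avoid the error.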

Sample UTF-8 text files for Hindi and Kannada, as well as phonetic demo files for Hindi, Kannada, and Tamil, are included with the distribution.

For further information, contact Ramesh Hariharan (

