Working With Speech Using the MSAgent in C# - Introducing the Microsoft Agent
The Microsoft Agent API provides services that support the display and animation of interactive characters. Microsoft Agent includes optional support for speech recognition, so applications can respond to voice commands. Characters can respond using synthesized speech, recorded audio, or text in a cartoon word balloon.
Requirements
To use the MSAgent technology, we must have:
- The Microsoft Agent Core components.
- The Microsoft Agent Characters Genie, Merlin, Robby, and Peedy.
- The Microsoft Speech API 4.0a runtime.
- The Microsoft Speech Recognition Engine.
- The Lernout and Hauspie Text-to-Speech engines for at least US English.
All of these components are available from http://microsoft.com/products/msagent/downloads.htm.
Speech Technologies Explained
Text-to-speech is the capability of a computer to translate text information into synthetic speech output. Speech recognition is the capability of a computer to recognize the spoken word for the purpose of receiving a command or data input from the speaker.
Speech recognition and text-to-speech make use of engines, which are the programs that do the real work of recognizing speech or speaking text. Nearly all speech-recognition engines translate incoming audio data to engine-specific phonemes, which are then interpreted into text that an application can use. (A phoneme is the smallest structural unit of sound that can be used to distinguish one utterance from another in a spoken language.)
There are two types of text-to-speech:
- Synthesized text-to-speech
- Concatenated text-to-speech
Synthesized Text-to-Speech
In synthesized text-to-speech, the engine examines the words and produces their phonetic pronunciations. The resulting phonemes are then fed into a complex algorithm that imitates the human vocal tract and produces the sound.
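To make the first stage of synthesized text-to-speech more concrete, here is a minimal C# sketch of turning words into phonemes. It is only an illustration under stated assumptions: the PhonemeFrontEnd class, its tiny lexicon, and the ARPAbet-style phoneme symbols are hypothetical and are not part of Microsoft Agent or SAPI. A real engine uses far richer pronunciation rules and then passes the phonemes to a vocal-tract model, which is not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class PhonemeFrontEnd
{
    // Hypothetical pronunciation lexicon (ARPAbet-style symbols, illustration only).
    private static readonly Dictionary<string, string[]> Lexicon =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", new[] { "HH", "AH", "L", "OW" } },
            { "world", new[] { "W", "ER", "L", "D" } },
        };

    // Converts text into a flat phoneme sequence. A real engine would hand this
    // sequence to a vocal-tract model to generate the audio.
    public static List<string> ToPhonemes(string text)
    {
        var phonemes = new List<string>();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] pronunciation;
            if (!Lexicon.TryGetValue(word, out pronunciation))
                throw new KeyNotFoundException("No pronunciation for: " + word);
            phonemes.AddRange(pronunciation);
        }
        return phonemes;
    }

    public static void Main()
    {
        // Prints the phonemes for both words, separated by spaces.
        Console.WriteLine(string.Join(" ", ToPhonemes("Hello world")));
    }
}
```

The lookup is case-insensitive, and an unknown word raises an exception rather than being skipped silently, so gaps in the lexicon are easy to spot.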
Concatenated Text-to-Speech
In concatenated text-to-speech, the engine analyzes the text and pulls recordings of words and phrases out of a prerecorded library. The digital audio recordings are then concatenated (joined together) to form the final result.
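The concatenation idea can be sketched in a few lines of C#. This is a toy model, not how any particular engine is implemented: the ConcatenativeTts class and its in-memory library of sample arrays are hypothetical stand-ins for a real library of digital audio recordings.

```csharp
using System;
using System.Collections.Generic;

public static class ConcatenativeTts
{
    // Hypothetical prerecorded library: each word maps to its audio samples.
    private static readonly Dictionary<string, short[]> Library =
        new Dictionary<string, short[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", new short[] { 10, 20, 30 } },
            { "world", new short[] { 40, 50 } },
        };

    // Looks up the recording for each word and joins the clips end to end.
    public static short[] Synthesize(string text)
    {
        var output = new List<short>();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            short[] clip;
            if (!Library.TryGetValue(word, out clip))
                throw new KeyNotFoundException("No recording for: " + word);
            output.AddRange(clip);
        }
        return output.ToArray();
    }

    public static void Main()
    {
        short[] audio = Synthesize("Hello world");
        Console.WriteLine(string.Join(",", audio)); // prints 10,20,30,40,50
    }
}
```

The sketch also shows the main limitation of the approach: only text covered by the prerecorded library can be spoken.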
Speech Application Programming Interface (API)
The Microsoft Speech Application Programming Interface (SAPI) uses the OLE Component Object Model (COM) architecture under Win32 (Windows 95 and Windows NT). Microsoft Agent uses SAPI to support speech input (speech recognition, or SR) and speech output (text-to-speech, or TTS). Microsoft Agent also defines interfaces that let applications access its services, enabling an application to control the animation of a character, support user input events, and specify output.
The Character Window
In Microsoft Agent applications, each animated character is displayed in its own window, which always appears at the top of the window z-order. A user can move a character's window by dragging the character with the left mouse button. The character image moves with the pointer.
The Word Balloon
In addition to spoken audio output, the character also supports textual captioning in the form of cartoon-style word balloons. Words appear in the balloon as they are spoken. The balloon is hidden when spoken output is completed.
Using the Microsoft Agent in Web Pages
To use the Microsoft Agent services in a web page, use the HTML <OBJECT> tag within the <HEAD> or <BODY> element of the page, specifying the Microsoft CLSID (class identifier) for the control. In addition, use a CODEBASE parameter to specify the location of the Microsoft Agent installation file and its version number.
We can use VBScript, JavaScript and JScript to program the Microsoft Agent in web pages.
Using the Microsoft Agent With the .NET Framework
Microsoft Agent is available as an ActiveX control DLL. To use it from .NET, we can run the AxImp.exe utility provided with the .NET Framework SDK:
AxImp: ActiveX Control to Windows Forms Assembly Generator.
Syntax: AxImp [/? | [[/source] OCXName]]
Example: AxImp agentctl.dll
The above command creates two files, AxAgentObjects.dll and AgentObjects.dll. With these two assemblies, we are ready to use the Microsoft Agent control from .NET.
More By Gnana Arun Ganesh