Connecting to a Speech Server

The Speech Kit library is a client to a network service and requires some basic setup before you can use either the recognition or text-to-speech classes.

This setup performs two primary operations:

  • First, it identifies and authorizes your application.

  • Second, it optionally establishes a connection to the speech server immediately, allowing for fast initial speech requests and thus enhancing the user experience.

    Note

    This network connection requires authorization credentials and server details set by the developer. The necessary credentials are provided through the Dragon Mobile SDK portal at http://nuancemobiledeveloper.com.

Speech Kit Setup

The application key SpeechKitApplicationKey is required by the Speech Kit library and must be set by the developer. This key is effectively your application’s password for the speech server and should be kept secret to prevent misuse.

Your unique credentials, provided through the developer portal, include the necessary line of code to set this value, so the process is as simple as copying and pasting that line into your source file. You must set this key before you initialize the Speech Kit system. For example, you configure the application key as follows:

static final byte[] SpeechKitApplicationKey = {(byte)0x12, (byte)0x34, ..., (byte)0x89};

The setup method, SpeechKit.initialize(), takes eight parameters:

  • An application Context (android.content.Context)
  • An application identifier
  • A server address
  • A port
  • The SSL settings (use SSL, certificate summary, certificate data)
  • The application key defined above

The appContext parameter is used to determine application-level information, such as the state of the network. It can be defined with code such as:

Context context = getApplication().getApplicationContext();

The ID parameter identifies your application and is used in conjunction with your key to provide authorization to the speech server.

The host and port parameters define the speech server, which may differ from application to application. Therefore, you should always use the values provided with your authentication parameters.
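
For illustration, these values might be declared as constants alongside the application key. The names below match the initialization example later in this section, but the values themselves are placeholders; substitute the actual application ID, server address, and port issued with your credentials:

// Placeholder values for illustration only; use the actual application ID,
// server, and port issued with your credentials from the developer portal.
static final String speechKitAppId = "ExampleAppId";
static final String speechKitServer = "speech.example.com";
static final int speechKitPort = 443;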

The ssl parameter indicates whether to connect to the speech server using SSL. The specified server and port must support the given SSL setting, or a connection error will occur. When using an SSL connection, you must ensure that host-name verification is performed by setting the certSummary parameter; this prevents man-in-the-middle (MITM) attacks. The value of certSummary (if not null) must exactly match the Common Name (CN) of the SSL certificate. To retrieve Nuance’s SSL certificate Common Name (certSummary) and the actual certificate data (certData), refer to the Nuance developer portal (http://dragonmobile.nuancemobiledeveloper.com).
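
The SSL settings can be declared the same way. Again, the values below are placeholders for illustration; the real certificate summary and certificate data come from the developer portal:

// Placeholder values for illustration only; obtain the real certificate
// summary and certificate data from the developer portal.
static final boolean speechKitSsl = true;
// If SSL is used, this must exactly match the CN of the server certificate.
static final String speechKitCertSummary = "speech.example.com";
static final byte[] speechKitCertData = {(byte)0xAB, (byte)0xCD};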

The applicationKey stores the key that identifies your application to the server.

The library is configured in the following example, using the credential constants described above:

SpeechKit sk = SpeechKit.initialize(context,
                                    speechKitAppId,
                                    speechKitServer,
                                    speechKitPort,
                                    speechKitSsl,
                                    speechKitCertSummary,
                                    speechKitCertData,
                                    SpeechKitApplicationKey);

Note

This method is meant to be called once per application execution to configure the underlying network connection; it does not attempt to establish the connection to the server.
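
One way to honor this once-per-execution rule is to guard initialization behind a null check, as in the following sketch. The field and method names here are illustrative and not part of the Speech Kit API:

// Illustrative one-time initialization guard (names are not part of the SDK).
private SpeechKit sk;

private void ensureSpeechKitInitialized(Context context) {
    if (sk == null) {
        sk = SpeechKit.initialize(context,
                                  speechKitAppId,
                                  speechKitServer,
                                  speechKitPort,
                                  speechKitSsl,
                                  speechKitCertSummary,
                                  speechKitCertData,
                                  SpeechKitApplicationKey);
    }
}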

At this point the speech server is fully configured. The connection to the server will be established automatically when needed. To make sure the next recognition or vocalization is as fast as possible, connect to the server in advance using the optional connect method.

sk.connect();

Note

The connect method does not report failure directly. Instead, the success or failure of the setup becomes apparent when the Recognizer and Vocalizer classes are used.

When the connection is opened, it will remain open for some period of time, ensuring that subsequent speech requests are served quickly as long as the user is actively making use of speech. If the connection times out and closes, it will be re-opened automatically on the next speech request or call to connect.

The application is now configured and ready to recognize and synthesize speech.