Your organization can use AutoTranscribe to transcribe voice interactions between contact center agents and their customers, supporting various use cases including analysis, coaching, and quality management.

ASAPP AutoTranscribe is a streaming speech-to-text transcription service that works with both live streams and audio recordings of completed calls. Integrating your voice system with GenerativeAgent through the AutoTranscribe WebSocket enables real-time communication between your voice platform and GenerativeAgent’s services.

AutoTranscribe is powered by a speech recognition model that converts speech to written text in real time, including punctuation and capitalization. The model can be customized for domain-specific needs by training on historical call audio and adding custom vocabulary to further improve recognition accuracy.

How it works

  1. Create SSE Stream: The Event Handler (which may exist on the IVR or be a dedicated service) creates a Server-Sent Events (SSE) stream with GenerativeAgent.

  2. Audio Stream: The IVR sends the audio stream from the end user to AutoTranscribe.

  3. Create Conversation: The IVR creates a conversation and adds messages to the Conversation Data.

  4. Request Analysis: The IVR requests GenerativeAgent to analyze the conversation.

The Event Handler then handles events sent via SSE, including GenerativeAgent’s reply, which is sent back to the user through the IVR.

Benefits of using WebSocket to Stream events

  • Persistent connection between your voice system and the GenerativeAgent server
  • API streaming for audio, call signaling, and returned transcripts
  • Real-time data exchange for quick responses and efficient handling of user queries
  • Bi-directional communication for smooth and responsive interaction

Before you Begin

Before you start integrating with GenerativeAgent, you need to:

Implementation Steps

  1. Authenticate with ASAPP
  2. Listen and Handle GenerativeAgent Events
  3. Open a Connection
  4. Start an Audio Stream
  5. Send the Audio Stream
  6. Analyze the conversation with GenerativeAgent
  7. Stop the Audio Stream

Step 1: Authenticate with ASAPP

All requests to ASAPP sandbox and production APIs must use the HTTPS protocol. Traffic using HTTP will not be redirected to HTTPS.

Use the following HTTPS REST API to authenticate with the ASAPP API Gateway:

POST /autotranscribe/v1/streaming-url

Headers:

{
    "asapp-api-id": "<asapp provided api id>",
    "asapp-api-secret": "<asapp provided api secret>"
}

Request body:

{
    "externalId": "<unique conversation id>"
}

If authentication succeeds, you’ll receive a short-lived, secure WebSocket access URL (TTL: 5 minutes):

{
    "streamingUrl": "<short-lived access URL>"
}
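The authentication call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the base URL is an assumption borrowed from the sandbox host used in the analyze examples later on this page, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the sandbox host shown in the analyze examples also serves this endpoint.
ASAPP_BASE_URL = "https://api.sandbox.asapp.com"


def build_streaming_url_request(api_id, api_secret, external_id, base_url=ASAPP_BASE_URL):
    """Build (but do not send) the POST request for a short-lived streaming URL."""
    body = json.dumps({"externalId": external_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/autotranscribe/v1/streaming-url",
        data=body,
        headers={
            "asapp-api-id": api_id,
            "asapp-api-secret": api_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_streaming_url(response_body):
    """Pull the streamingUrl field out of the JSON response body."""
    return json.loads(response_body)["streamingUrl"]
```

To send the request, pass it to `urllib.request.urlopen` and feed the bytes you read from the response to `extract_streaming_url`.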

Step 2: Listen and Handle GenerativeAgent Events

GenerativeAgent sends events for all conversations through a single Server-Sent Events (SSE) stream. Listen for and handle these events to enable GenerativeAgent to interact with your users.
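Because one stream carries events for every conversation, the Event Handler has to parse each event and route it by conversation. The sketch below parses the standard SSE wire format (`event:` and `data:` fields, with a blank line ending each event). The payload fields in the sample (`conversationId`, `text`) are hypothetical, since this page does not describe the event schema.

```python
import json


def iter_sse_events(lines):
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # The SSE format strips one leading space.
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)


# Hypothetical payload: the real event schema is not shown on this page.
sample = [
    ": keep-alive\n",
    "event: message\n",
    'data: {"conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM", "text": "Sure, one moment."}\n',
    "\n",
]
for event in iter_sse_events(sample):
    payload = json.loads(event["data"])
    print(payload["conversationId"], payload["text"])
```

In a real handler, `lines` would come from the open HTTP response of the SSE endpoint, and each payload would be routed to the matching call on the IVR.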

Step 3: Open a Connection

Create the WebSocket connection using the access URL:

wss://<internal-voice-gateway-ingress>?token=<short_lived_access_token>
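Since the access URL expires after 5 minutes, it can help to check it before opening the socket. The helper below is a minimal sketch with a hypothetical name. It assumes the TTL is counted from the moment your code received the URL, and that the URL carries a `token` query parameter as in the pattern above.

```python
import time
from urllib.parse import parse_qs, urlparse

STREAMING_URL_TTL_SECONDS = 5 * 60  # TTL stated in Step 1.


def is_usable_streaming_url(streaming_url, issued_at, now=None):
    """Return True if the URL is a wss:// URL with a token and is still within its TTL.

    issued_at and now are Unix timestamps in seconds; now defaults to the current time.
    """
    now = time.time() if now is None else now
    parsed = urlparse(streaming_url)
    has_token = "token" in parse_qs(parsed.query)
    fresh = (now - issued_at) < STREAMING_URL_TTL_SECONDS
    return parsed.scheme == "wss" and has_token and fresh
```

If the check fails, request a fresh URL as in Step 1 rather than retrying the old one. The connection itself can then be opened with the WebSocket client library of your choice.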

Step 4: Start an Audio Stream

Start streaming audio into the AutoTranscribe WebSocket using this message sequence:

Your Stream Request        ASAPP Response
startStream message        startResponse message
Stream audio - audio-in    transcript message
finishStream message       finalResponse message

Format WebSocket protocol request messages as text (UTF-8 encoded string data); only the audio stream should be in binary format. All response messages will be formatted as text.

Send a startStream message:

{
   "message":"startStream",
   "sender": {
          "role": "customer",
          "externalId": "JD232442"
   }
}

You’ll receive a startResponse:

{
   "message": "startResponse",
   "streamID": "128342213",
   "status": {
          "code": "1000",
          "description": "OK"
   }
}
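The handshake can be wrapped in two small helpers: one that serializes the startStream message as text, and one that validates the startResponse. This is a sketch with hypothetical function names. Treating "1000" as the only success code is an assumption based on the example above.

```python
import json


def build_start_stream(external_id, role="customer"):
    """Serialize a startStream message as a JSON text frame."""
    return json.dumps({
        "message": "startStream",
        "sender": {"role": role, "externalId": external_id},
    })


def parse_start_response(text):
    """Return the stream ID from a startResponse, or raise if the status is not OK."""
    msg = json.loads(text)
    if msg.get("message") != "startResponse":
        raise ValueError(f"unexpected message type: {msg.get('message')!r}")
    status = msg.get("status", {})
    # Assumption: "1000" is the only success code, as in the example response.
    if status.get("code") != "1000":
        raise RuntimeError(f"stream rejected: {status}")
    return msg["streamID"]
```

Send the string returned by `build_start_stream` as a text frame, and wait for a valid startResponse before sending any audio.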

Step 5: Send the audio stream

Stream audio as binary data:

ws.send(<binary_blob>)

You’ll receive transcript messages:

{
   "message": "transcript",
   "start": 0,
   "end": 1000,
   "utterance":
   [
      {"text": "Hi, my ID is 123."}
   ]
}
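A sender typically splits the audio into binary frames and reads the utterance text out of each transcript message. The sketch below shows both halves. The default chunk size is an arbitrary assumption, because this page does not specify a frame size or audio encoding.

```python
import json


def iter_audio_chunks(audio, chunk_size=3200):
    """Split raw audio bytes into fixed-size binary frames (the last may be shorter)."""
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def transcript_text(text_message):
    """Join the utterance texts of a transcript message; return None for other messages."""
    msg = json.loads(text_message)
    if msg.get("message") != "transcript":
        return None
    return " ".join(part["text"] for part in msg.get("utterance", []))
```

Each chunk from `iter_audio_chunks` would be sent as one binary WebSocket frame, while `transcript_text` is applied to every text frame received.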

Step 6: Analyze conversations with GenerativeAgent

Call the /analyze endpoint to evaluate the conversation:

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM"
}'

You can also include a message when calling analyze:

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM",
    "message": {
        "text": "hello, can I see my bill?",
        "sender": {
            "externalId": "321",
            "role": "customer"
        },
        "timestamp": "2024-01-23T11:50:50Z"
    }
}'
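The same two request bodies can be built in Python, for example to forward transcribed text from Step 5 as the message. This is a sketch with a hypothetical function name. It covers only the fields shown in the curl examples and expects a timezone-aware `timestamp` when one is passed.

```python
import json
from datetime import datetime, timezone


def build_analyze_payload(conversation_id, text=None, sender_external_id=None,
                          role="customer", timestamp=None):
    """Return the JSON body for /analyze, adding the optional message when text is given."""
    payload = {"conversationId": conversation_id}
    if text is not None:
        # Normalize to UTC so the trailing "Z" is accurate.
        when = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
        payload["message"] = {
            "text": text,
            "sender": {"externalId": sender_external_id, "role": role},
            "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    return json.dumps(payload)
```

The returned string is the request body. Send it with the same `asapp-api-id`, `asapp-api-secret`, and `Content-Type` headers as in the curl examples.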

Step 7: Stop the Audio Stream

Send a finishStream message:

{
   "message": "finishStream"
}

You’ll receive a finalResponse:

{
   "message": "finalResponse",
   "streamId": "128342213",
   "status": {
       "code": "1000",
       "description": "OK"
   },
   "summary": {
       "totalAudioBytes": 300,
       "audioDurationMs": 6000,
       "streamingSeconds": 6,
       "transcripts": 10
   }
}
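Since every response arrives as a text frame with a `message` field, a receive loop can dispatch on that field. The function below is a minimal sketch with a hypothetical name and state layout. It handles the three response types shown on this page and records anything else as unhandled.

```python
import json


def handle_server_message(text, state):
    """Update state (a plain dict) from one text message received on the WebSocket."""
    msg = json.loads(text)
    kind = msg.get("message")
    if kind == "startResponse":
        state["stream_id"] = msg.get("streamID")
    elif kind == "transcript":
        state.setdefault("transcripts", []).extend(
            part["text"] for part in msg.get("utterance", [])
        )
    elif kind == "finalResponse":
        # The summary reports audio and transcript totals for the stream.
        state["summary"] = msg.get("summary", {})
        state["finished"] = True
    else:
        state.setdefault("unhandled", []).append(kind)
    return state
```

Once `state["finished"]` is set, the caller can close the WebSocket and log the summary totals.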

Next Steps

With your system integrated with GenerativeAgent, you’re ready to use it.