Live Captions tutorial#

Source code

This tutorial is a simple variation of the JavaScript client tutorial, adding live captions through the OpenVidu Live Captions service.

Running this tutorial#

1. Run OpenVidu Server#

  1. Download OpenVidu

    git clone https://github.com/OpenVidu/openvidu-local-deployment -b 3.3.0
    
  2. Configure the local deployment, running the script that matches your operating system

    cd openvidu-local-deployment/community

    # Windows
    .\configure_lan_private_ip_windows.bat

    # macOS
    ./configure_lan_private_ip_macos.sh

    # Linux
    ./configure_lan_private_ip_linux.sh

  3. Enable the Speech Processing agent

    Modify the file openvidu-local-deployment/community/agent-speech-processing.yaml to enable the Speech Processing agent. At a minimum, you need to set the following properties:

    enabled: true

    live_captions:
        processing: automatic
        provider: YOUR_SPEECH_PROVIDER
        # Followed by your provider-specific configuration


    Info

    Visit Supported AI providers for more information about the available providers and their specific configuration. Many of them provide a free tier, so you can quickly test them without any cost!

  4. Run OpenVidu

    docker compose up
    

To use a production-ready OpenVidu deployment, visit the official deployment guide.

Enable the Live Captions service

Once your deployment is up and running, enable the Live Captions service following the official instructions.

2. Download the tutorial code#

git clone https://github.com/OpenVidu/openvidu-livekit-tutorials.git -b 3.3.0

3. Run a server application#

To run this server application, you need Node.js installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/node
    
  2. Install dependencies
    npm install
    
  3. Run the application
    npm start
    

For more information, check the Node.js tutorial.

To run this server application, you need Go installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/go
    
  2. Run the application
    go run main.go
    

For more information, check the Go tutorial.

To run this server application, you need Ruby installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/ruby
    
  2. Install dependencies
    bundle install
    
  3. Run the application
    ruby app.rb
    

For more information, check the Ruby tutorial.

To run this server application, you need Java and Maven installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/java
    
  2. Run the application
    mvn spring-boot:run
    

For more information, check the Java tutorial.

To run this server application, you need Python 3 installed on your device.

  1. Navigate into the server directory

    cd openvidu-livekit-tutorials/application-server/python
    
  2. Create a Python virtual environment

    python -m venv venv
    
  3. Activate the virtual environment, running the command that matches your operating system

    # Windows
    .\venv\Scripts\activate

    # macOS and Linux
    . ./venv/bin/activate

  4. Install dependencies

    pip install -r requirements.txt
    
  5. Run the application

    python app.py
    

For more information, check the Python tutorial.

To run this server application, you need Rust installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/rust
    
  2. Run the application
    cargo run
    

For more information, check the Rust tutorial.

To run this server application, you need PHP and Composer installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/php
    
  2. Install dependencies
    composer install
    
  3. Run the application
    composer start
    

Warning

The LiveKit PHP SDK requires the BCMath library. It is available out-of-the-box in PHP for Windows, but a manual installation might be necessary on other operating systems. Run sudo apt install php-bcmath (Debian/Ubuntu) or sudo yum install php-bcmath (CentOS/RHEL).

For more information, check the PHP tutorial.

To run this server application, you need .NET installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/dotnet
    
  2. Run the application
    dotnet run
    

Warning

This .NET server application needs the LIVEKIT_API_SECRET env variable to be at least 32 characters long. Make sure to update it here and in your OpenVidu Server.

For more information, check the .NET tutorial.

4. Run the client application#

To run the client application tutorial, you need an HTTP web server installed on your development computer. A great option is http-server. You can install it via NPM:

npm install -g http-server

  1. Navigate into the application client directory:

    cd openvidu-livekit-tutorials/ai-services/openvidu-live-captions
    
  2. Serve the application:

    http-server -p 5080 ./src
    

Once the server is up and running, you can test the application by visiting http://localhost:5080.

Accessing your application client from other devices in your local network

One advantage of running OpenVidu locally is that you can easily test your application client with other devices on your local network without worrying about SSL certificates.

Access your application client through https://xxx-yyy-zzz-www.openvidu-local.dev:5443, where xxx-yyy-zzz-www part of the domain is your LAN private IP address with dashes (-) instead of dots (.). For more information, see section Accessing your local deployment from other devices on your network.
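The dots-to-dashes transformation described above can be sketched as a small helper. This is a hypothetical function for illustration only (the name openviduLocalUrl is not part of the tutorial); it assumes the local deployment's default HTTPS port 5443:

```javascript
// Build the openvidu-local.dev URL for a given LAN private IP address,
// replacing dots with dashes as described above.
// Hypothetical helper, not part of the tutorial's code.
function openviduLocalUrl(lanIp) {
  const dashed = lanIp.split(".").join("-");
  return `https://${dashed}.openvidu-local.dev:5443`;
}

// Example: a deployment on 192.168.1.10 would be reachable at
// https://192-168-1-10.openvidu-local.dev:5443
```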

Understanding the code#

You can first take a look at the JavaScript client tutorial, as this application shares the same codebase. The only thing added by this tutorial is a new handler for the Room object to receive transcription messages and display them as live captions in the HTML:

app.js
room.registerTextStreamHandler("lk.transcription", async (reader, participantInfo) => { // (1)!
    const message = await reader.readAll(); // (2)!
    const isFinal = reader.info.attributes["lk.transcription_final"] === "true"; // (3)!
    const trackId = reader.info.attributes["lk.transcribed_track_id"]; // (4)!

    if (isFinal) {
      // Due to a bug in LiveKit Server the participantInfo object may be empty.
      // You can still find the participant owning the audio track as below:
      let participant;
      if (room.localParticipant.audioTrackPublications.has(trackId)) {
        participant = room.localParticipant;
      } else {
        participant = [...room.remoteParticipants.values()].find((p) =>
          p.audioTrackPublications.has(trackId)
        );
      }

      const captionsTextarea = document.getElementById("captions"); // (5)!
      const timestamp = new Date().toLocaleTimeString();
      const participantIdentity =
        participant === room.localParticipant ? "You" : participant.identity;
      captionsTextarea.value += `[${timestamp}] ${participantIdentity}: ${message}\n`;
      captionsTextarea.scrollTop = captionsTextarea.scrollHeight;
    }
});
  1. Use the method Room.registerTextStreamHandler to register a handler on topic lk.transcription. Transcription messages will arrive at this handler.
  2. Await the complete transcription message.
  3. Read the attribute lk.transcription_final to determine whether the transcription message is final or interim. See Final vs Interim transcriptions.
  4. Read the attribute lk.transcribed_track_id to know which specific audio track has been transcribed.
  5. Build your live caption message as desired and append it to the HTML.

Using the method Room.registerTextStreamHandler we subscribe to topic lk.transcription. All transcription messages will arrive at this handler.

Apart from the message itself (which you get by awaiting reader.readAll()), there are two main attributes in the transcription message, accessible via reader.info.attributes:

  • lk.transcription_final: Indicates whether the transcription message is final or interim. See Final vs Interim transcriptions for more details.
  • lk.transcribed_track_id: The ID of the audio track that has been transcribed. This is useful to know which specific participant's audio track has been transcribed, if necessary.

Once you have all the information about the transcription message, you can build your live caption text as desired and display it in the HTML (in this case, using a simple <textarea> element).
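The tutorial's handler only displays final transcriptions, but the lk.transcription_final attribute also lets you show interim ones. A minimal sketch of that idea follows: interim messages for a track overwrite each other until the final text arrives, which clears the interim entry. The names onTranscription and interimCaptions are illustrative, not part of the tutorial's code:

```javascript
// Track the latest interim caption per transcribed audio track.
// Interim messages replace each other; a final message clears the
// interim entry and is the text to append permanently.
const interimCaptions = new Map(); // trackId -> latest interim text

function onTranscription(trackId, text, isFinal) {
  if (isFinal) {
    interimCaptions.delete(trackId);
    return text; // caller appends this to the captions <textarea>
  }
  interimCaptions.set(trackId, text); // overwrite the previous interim text
  return null; // nothing to append yet; re-render the interim line instead
}
```

Inside the handler above, you would call this function with trackId, message and isFinal, appending the returned text when it is not null and re-rendering the in-progress line (e.g. in a separate element) otherwise.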