Live Captions tutorial#

Source code

This tutorial is a simple variation of the JavaScript client tutorial, adding live captions through the OpenVidu Live Captions service.

Running this tutorial#

1. Run OpenVidu Server#

  1. Download OpenVidu

    git clone https://github.com/OpenVidu/openvidu-local-deployment -b 3.3.0
    
  2. Configure the local deployment, running the script that matches your operating system

    cd openvidu-local-deployment/community

    # Windows
    .\configure_lan_private_ip_windows.bat

    # macOS
    ./configure_lan_private_ip_macos.sh

    # Linux
    ./configure_lan_private_ip_linux.sh

  3. Enable the Speech Processing agent

    Modify the file openvidu-local-deployment/community/agent-speech-processing.yaml to enable the Speech Processing agent. At a minimum, you need to set the following properties:

    enabled: true

    live_captions:
        processing: automatic
        provider: YOUR_SPEECH_PROVIDER
        # Followed by your provider-specific configuration


    Info

    Visit Supported AI providers for more information about the available providers and their specific configuration. Many of them provide a free tier, so you can quickly test them without any cost!

  4. Run OpenVidu

    docker compose up
    

To use a production-ready OpenVidu deployment, visit the official deployment guide.

Enable the Live Captions service

Once your deployment is up and running, enable the Live Captions service following the official instructions.

2. Download the tutorial code#

git clone https://github.com/OpenVidu/openvidu-livekit-tutorials.git -b 3.3.0

3. Run a server application#

To run this server application, you need Node.js installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/node
    
  2. Install dependencies
    npm install
    
  3. Run the application
    npm start
    

For more information, check the Node.js tutorial.

To run this server application, you need Go installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/go
    
  2. Run the application
    go run main.go
    

For more information, check the Go tutorial.

To run this server application, you need Ruby installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/ruby
    
  2. Install dependencies
    bundle install
    
  3. Run the application
    ruby app.rb
    

For more information, check the Ruby tutorial.

To run this server application, you need Java and Maven installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/java
    
  2. Run the application
    mvn spring-boot:run
    

For more information, check the Java tutorial.

To run this server application, you need Python 3 installed on your device.

  1. Navigate into the server directory

    cd openvidu-livekit-tutorials/application-server/python
    
  2. Create a Python virtual environment

    python -m venv venv
    
  3. Activate the virtual environment, running the command that matches your operating system

    # Windows
    .\venv\Scripts\activate

    # macOS and Linux
    . ./venv/bin/activate

  4. Install dependencies

    pip install -r requirements.txt
    
  5. Run the application

    python app.py
    

For more information, check the Python tutorial.

To run this server application, you need Rust installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/rust
    
  2. Run the application
    cargo run
    

For more information, check the Rust tutorial.

To run this server application, you need PHP and Composer installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/php
    
  2. Install dependencies
    composer install
    
  3. Run the application
    composer start
    

Warning

The LiveKit PHP SDK requires the BCMath library. It is available out-of-the-box in PHP for Windows, but a manual installation might be necessary on other operating systems. Run sudo apt install php-bcmath (Debian/Ubuntu) or sudo yum install php-bcmath (CentOS/RHEL).

For more information, check the PHP tutorial.

To run this server application, you need .NET installed on your device.

  1. Navigate into the server directory
    cd openvidu-livekit-tutorials/application-server/dotnet
    
  2. Run the application
    dotnet run
    

Warning

This .NET server application needs the LIVEKIT_API_SECRET env variable to be at least 32 characters long. Make sure to update it here and in your OpenVidu Server.

For more information, check the .NET tutorial.

4. Run the client application#

To run the client application tutorial, you need an HTTP web server installed on your development computer. A great option is http-server. You can install it via NPM:

npm install -g http-server

  1. Navigate into the application client directory:

    cd openvidu-livekit-tutorials/ai-services/openvidu-live-captions
    
  2. Serve the application:

    http-server -p 5080 ./src
    

Once the server is up and running, you can test the application by visiting http://localhost:5080.

Accessing your application client from other devices in your local network

One advantage of running OpenVidu locally is that you can easily test your application client with other devices on your local network without worrying about SSL certificates.

Access your application client through https://xxx-yyy-zzz-www.openvidu-local.dev:5443, where xxx-yyy-zzz-www part of the domain is your LAN private IP address with dashes (-) instead of dots (.). For more information, see section Accessing your local deployment from other devices on your network.
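The dots-to-dashes transformation described above can be sketched as a small helper. This is a hypothetical function for illustration only (the name openviduLocalUrl is not part of the tutorial); it assumes the local deployment's default HTTPS port 5443:

```javascript
// Build the openvidu-local.dev URL for a given LAN private IP address,
// replacing dots with dashes as described above.
// Hypothetical helper, not part of the tutorial's code.
function openviduLocalUrl(lanIp) {
  const dashed = lanIp.split(".").join("-");
  return `https://${dashed}.openvidu-local.dev:5443`;
}

// Example: a deployment on 192.168.1.10 would be reachable at
// https://192-168-1-10.openvidu-local.dev:5443
```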

Understanding the code#

You can first take a look at the JavaScript client tutorial, as this application shares the same codebase. The only thing added by this tutorial is a new handler for the Room object to receive transcription messages and display them as live captions in the HTML:

app.js
room.registerTextStreamHandler("lk.transcription", async (reader, participantInfo) => { // (1)!
    const message = await reader.readAll(); // (2)!
    const isFinal = reader.info.attributes["lk.transcription_final"] === "true"; // (3)!
    const trackId = reader.info.attributes["lk.transcribed_track_id"]; // (4)!

    if (isFinal) {
      // Due to a bug in LiveKit Server the participantInfo object may be empty.
      // You can still find the participant owning the audio track as below:
      let participant;
      if (room.localParticipant.audioTrackPublications.has(trackId)) {
        participant = room.localParticipant;
      } else {
        participant = [...room.remoteParticipants.values()].find((p) =>
          p.audioTrackPublications.has(trackId)
        );
      }

      const captionsTextarea = document.getElementById("captions"); // (5)!
      const timestamp = new Date().toLocaleTimeString();
      const participantIdentity =
        participant === room.localParticipant ? "You" : participant.identity;
      captionsTextarea.value += `[${timestamp}] ${participantIdentity}: ${message}\n`;
      captionsTextarea.scrollTop = captionsTextarea.scrollHeight;
    }
});
  1. Use the method Room.registerTextStreamHandler to register a handler on topic lk.transcription. Transcription messages will arrive at this handler.
  2. Await the complete transcription message.
  3. Read the attribute lk.transcription_final to determine whether the transcription message is final or interim. See Final vs Interim transcriptions.
  4. Read the attribute lk.transcribed_track_id to know which specific audio track has been transcribed.
  5. Build your live caption message as desired and append it to the HTML.

Using the method Room.registerTextStreamHandler we subscribe to topic lk.transcription. All transcription messages will arrive at this handler.

Apart from the message itself (which you get by awaiting reader.readAll()), there are two main attributes in the transcription message, accessible via reader.info.attributes:

  • lk.transcription_final: Indicates whether the transcription message is final or interim. See Final vs Interim transcriptions for more details.
  • lk.transcribed_track_id: The ID of the audio track that has been transcribed. This is useful to know which specific participant's audio track has been transcribed, if necessary.

Once you have all the information about the transcription message, you can build your live caption text as desired and display it in the HTML (in this case, using a simple <textarea> element).
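The tutorial's handler only displays final transcriptions, but the lk.transcription_final attribute also lets you show interim ones. A minimal sketch of that idea follows: interim messages for a track overwrite each other until the final text arrives, which clears the interim entry. The names onTranscription and interimCaptions are illustrative, not part of the tutorial's code:

```javascript
// Track the latest interim caption per transcribed audio track.
// Interim messages replace each other; a final message clears the
// interim entry and is the text to append permanently.
const interimCaptions = new Map(); // trackId -> latest interim text

function onTranscription(trackId, text, isFinal) {
  if (isFinal) {
    interimCaptions.delete(trackId);
    return text; // caller appends this to the captions <textarea>
  }
  interimCaptions.set(trackId, text); // overwrite the previous interim text
  return null; // nothing to append yet; re-render the interim line instead
}
```

Inside the handler above, you would call this function with trackId, message and isFinal, appending the returned text when it is not null and re-rendering the in-progress line (e.g. in a separate element) otherwise.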