Developing a PWA with support for face and voice recognition

This article focuses on advanced capabilities of Progressive Web Applications (PWAs) that rely on several modern APIs. Specifically, we will talk about developing a web project that supports face and voice recognition. What was previously available only in native applications can now be used in a PWA as well. This opens up many new opportunities for web developers.

The application in question builds on an existing PWA, the development of which is described in detail in a separate article. Here we will focus on the following two APIs:

  • Face Detection API, which is intended for implementing facial recognition capabilities in the browser.
  • Web Speech API, which allows you to convert speech to text and turn ordinary text into speech.

We will add support for these APIs to the existing PWA and equip it with selfie-taking functionality. Thanks to the face recognition capabilities, the app will be able to determine the emotional state, gender, and age of the person taking the selfie. A caption can also be added to the image using the Web Speech API.

About working with experimental features of the web platform

The above APIs will only work if you enable the Experimental Web Platform features flag in Google Chrome. You can find it at chrome://flags.

Enabling the experimental Web Platform features flag

Preparing the project

Let’s start by cloning the following repository:

git clone https://github.com/petereijgermans11/progressive-web-app

After cloning is complete, go to the project directory:

cd pwa-article/pwa-app-native-features-rendezvous-init

Next, install the dependencies and launch the project:

npm i && npm start

You can open the app by navigating to http://localhost:8080.

The app in the browser

Public URL for the app that can be accessed from a mobile device

There are many ways to make localhost:8080 accessible from mobile devices. For example, you can use ngrok.

Install ngrok:

npm install -g ngrok

Run the following command in the terminal:

ngrok http 8080

It will return a public URL for the project. Now you can open it on a regular mobile phone using the Google Chrome browser.

Face recognition using JavaScript

Face recognition is one of the most common applications of artificial intelligence, and its use has grown noticeably in recent years.

Here, we will extend the existing PWA by equipping it with face recognition capabilities that work right in the browser. We will determine the emotional state, gender, and age of a person based on their selfie. To solve these problems, we will use the face-api.js library.

This library provides an API for face recognition in the browser. At its heart is the tensorflow.js library.

The results of the application may look something like the following figure.

The results of the application

Here’s a step-by-step plan for working on the app’s facial recognition capabilities.

Step 1: the face-api.js library

As already mentioned, the face-api.js library provides the application with an API for face recognition in the browser. The library is already included in the project; it is located in the public/src/lib folder.

Step 2: models

Models are pre-trained data that we will use to analyze selfies and determine the features we are interested in. The models are located in the public/src/models folder.

Step 3: the index.html file

The following content is imported into the index.html file:

  • The facedetection.css file, which already exists in the project and is used to style the app.
  • The face-api.min.js file, which provides the Face Detection API used to process the model data and extract the features of interest from snapshots.
  • The facedetection.js file, in which we will write the code that implements the application logic.

First, import the styles into index.html:

<link rel="stylesheet" href="src/css/facedetection.css">

Then, just below the <div id="create-post"> tag, put the following code in the file:

<video id="player" autoplay></video>
<div class="container-faceDetection">
</div>
<canvas id="canvas" width="320px" height="240px"></canvas>
<div class="result-container">
   <div id="emotion">Emotion</div>
   <div id="gender">Gender</div>
   <div id="age">Age</div>
</div>

Here we use the existing <video> tag to take the selfie. In the element with the result-container class, we display the results of determining the person’s emotional state, gender, and age.

Next, place the following code snippet at the bottom of index.html. This will allow us to use the face recognition API:

<script src="src/lib/face-api.min.js"></script>
<script src="src/js/facedetection.js"></script>

Step 4: importing the models into the PWA

Here we create a separate function in the existing feed.js file that starts the video stream. Namely, we move the following code from the initializeMedia() function into a new startVideo() function responsible for video streaming:

const startVideo = () => {
   navigator.mediaDevices.getUserMedia({video: {facingMode: 'user'}, audio: false})
       .then(stream => {
           videoPlayer.srcObject = stream;
           videoPlayer.style.display = 'block';
           videoPlayer.setAttribute('autoplay', '');
           videoPlayer.setAttribute('muted', '');
           videoPlayer.setAttribute('playsinline', '');
       })
       .catch(error => {
           console.log(error);
       });
}

In feed.js, we use Promise.all to asynchronously load the models used by the face recognition API. Once the models are loaded, we call the newly created startVideo() function:

Promise.all([
   faceapi.nets.tinyFaceDetector.loadFromUri("/src/models"),
   faceapi.nets.faceLandmark68Net.loadFromUri("/src/models"),
   faceapi.nets.faceRecognitionNet.loadFromUri("/src/models"),
   faceapi.nets.faceExpressionNet.loadFromUri("/src/models"),
   faceapi.nets.ageGenderNet.loadFromUri("/src/models")
]).then(startVideo);

Step 5: implementing the project logic in facedetection.js

Let’s talk about the features of the Face Detection API that we will use in the app:

  • faceapi.detectSingleFace — by default, this function uses the SSD Mobilenet V1 face detector. It is passed the videoPlayer object and an options object. To recognize multiple faces, replace detectSingleFace with detectAllFaces (a short sketch of the multi-face variant follows this list).
  • withFaceLandmarks — this function is used to find 68 key points (landmarks) of the face.
  • withFaceExpressions — this function finds all faces in the image and detects facial expressions, returning the results as an array.
  • withAgeAndGender — this function also finds all faces in the image, determines the age and gender of people, and returns an array.
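
The article itself only uses single-face detection. For reference, a minimal sketch (not taken from the project code) of the multi-face variant mentioned in the first bullet might look like the following; detectAllFaces() returns an array of results, so you iterate over it. The detectAllFacesOnce name is just an illustrative wrapper:

// Minimal sketch of the multi-face variant: detectAllFaces() returns an array,
// so the results are iterated instead of being read from a single object.
const detectAllFacesOnce = async () => {
   const detections = await faceapi
     .detectAllFaces(videoPlayer, new faceapi.TinyFaceDetectorOptions())
     .withFaceLandmarks()
     .withFaceExpressions()
     .withAgeAndGender();

   detections.forEach(detection => {
     const { age, gender, expressions } = detection;
     // Pick the expression with the highest probability, as in the single-face code.
     const maxValue = Math.max(...Object.values(expressions));
     const emotion = Object.keys(expressions).find(key => expressions[key] === maxValue);
     console.log(`age: ${Math.round(age)}, gender: ${gender}, emotion: ${emotion}`);
   });
};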

The following code should be placed in facedetection.js, below the code that is already there.

videoPlayer.addEventListener("playing", () => {
   const canvasForFaceDetection = faceapi.createCanvasFromMedia(videoPlayer);
   const containerForFaceDetection = document.querySelector(".container-faceDetection");
   containerForFaceDetection.append(canvasForFaceDetection);
   const displaySize = { width: 500, height: 500 };
   faceapi.matchDimensions(canvasForFaceDetection, displaySize);
   setInterval(async () => {
     const detections = await faceapi
       .detectSingleFace(videoPlayer, new faceapi.TinyFaceDetectorOptions())
       .withFaceLandmarks()
       .withFaceExpressions()
       .withAgeAndGender();
     const resizedDetections = faceapi.resizeResults(detections, displaySize);
     canvasForFaceDetection.getContext("2d").clearRect(0, 0, 500, 500);
     faceapi.draw.drawDetections(canvasForFaceDetection, resizedDetections);
     faceapi.draw.drawFaceLandmarks(canvasForFaceDetection, resizedDetections);
     if (resizedDetections && Object.keys(resizedDetections).length > 0) {
       const age = resizedDetections.age;
       const interpolatedAge = interpolateAgePredictions(age);
       const gender = resizedDetections.gender;
       const expressions = resizedDetections.expressions;
       const maxValue = Math.max(...Object.values(expressions));
       const emotion = Object.keys(expressions).filter(
         item => expressions[item] === maxValue
       );
       document.getElementById("age").innerText = `Age - ${interpolatedAge}`;
       document.getElementById("gender").innerText = `Gender - ${gender}`;
       document.getElementById("emotion").innerText = `Emotion - ${emotion[0]}`;
     }
   }, 100);
});

This code uses the functions described above to perform face recognition.

First of all, we attach a playing event handler to videoPlayer. It is triggered when the video camera is active.

The videoPlayer variable gives access to the HTML <video> element in which the video is displayed.

Then a canvas element is created, represented by the canvasForFaceDetection constant. It is used for drawing the face recognition results and is placed in the container-faceDetection container.

The setInterval() function calls faceapi.detectSingleFace every 100 milliseconds. The call is made asynchronously using the async/await construct. The recognition results are then displayed in the fields with the IDs emotion, gender, and age.
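
The code above also calls interpolateAgePredictions(), whose implementation is not shown in this article. Its job is to smooth the predicted age across frames so the displayed value does not jump around. A minimal sketch of one possible implementation (the buffer size of 30 frames is an arbitrary choice, not taken from the project):

// Possible implementation of interpolateAgePredictions(): keep the last few
// age predictions and return their average, so the displayed age is stable.
let predictedAges = [];

const interpolateAgePredictions = (age) => {
   predictedAges = [age].concat(predictedAges).slice(0, 30);
   const avgPredictedAge =
     predictedAges.reduce((total, a) => total + a, 0) / predictedAges.length;
   return Math.round(avgPredictedAge);
};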

Speech recognition using JavaScript

The interface that we are going to create for working with the Web Speech API is shown below. As you can see, there is a field on the screen where you can enter text. However, you can also dictate this text to the app by using the microphone icon.

Below the input field, there are controls that allow you to select a language.

Interface used for solving speech recognition problems

We will analyze, as before, a step-by-step plan for implementing the relevant features.

Step 1: the index.html file

Import the following materials into index.html:

  • The existing project styles from the speech.css file.
  • The speech.js file, where we implement the logic required for speech recognition.

First, import the styles:

<link rel="stylesheet" href="src/css/speech.css">

Then we’ll place the following code right after the <form> tag:

<div id="info">
   <p id="info_start">Click on the microphone icon and begin speaking.</p>
   <p id="info_speak_now">Speak now.</p>
   <p id="info_no_speech">No speech was detected. You may need to adjust your
       <a href="//support.google.com/chrome/bin/answer.py?hl=en&answer=1407892">
           microphone settings</a>.</p>
   <p id="info_no_microfoon" style="display:none">
       No microphone was found. Ensure that a microphone is installed and that
       <a href="//support.google.com/chrome/bin/answer.py?hl=en&answer=1407892">
           microphone settings</a> are configured correctly.</p>
   <p id="info_allow">Click the "Allow" button above to enable your microphone.</p>
   <p id="info_denied">Permission to use microphone was denied.</p>
   <p id="info_blocked">Permission to use microphone is blocked. To change,
       go to chrome://settings/contentExceptions#media-stream</p>
   <p id="info_upgrade">Web Speech API is not supported by this browser.
       Upgrade to <a href="//www.google.com/chrome">Chrome</a>
       version 25 or later.</p>
</div>
<div class="right">
   <button id="start_button" onclick="startButton(event)">
       <img id="start_img" src="./src/images/mic.gif" alt="Start"></button>
</div>
<div class="input-section mdl-textfield mdl-js-textfield mdl-textfield--floating-label div_speech_to_text">
   <span id="title" contenteditable="true" class="final"></span>
   <span id="interim_span" class="interim"></span>
   <p>
</div>
<div class="center">
   <p>
   <div id="div_language">
       <select id="select_language" onchange="updateCountry()"></select>
       <select id="select_dialect"></select>
   </div>
</div>

The <div id="info"> section displays informational messages related to the use of the Web Speech API.

The onclick handler of the button with the ID start_button is used to start the speech recognition system.

The onchange handler of the select_language field lets you select a language.
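
The select_language and select_dialect lists from the markup above have to be filled with languages and their regional variants, and updateCountry() has to keep the dialect list in sync with the chosen language. The article does not show this code, so here is a minimal sketch; the langs array below is just an illustrative subset, and the exact structure is an assumption:

// Illustrative subset of languages; each entry is [label, BCP 47 codes...].
const langs = [
   ['English', 'en-US', 'en-GB'],
   ['Nederlands', 'nl-NL'],
   ['Deutsch', 'de-DE']
];

// Fill the language list once on startup.
for (let i = 0; i < langs.length; i++) {
   select_language.options[i] = new Option(langs[i][0], i);
}

// Called from onchange="updateCountry()": rebuild the dialect list
// for the currently selected language.
const updateCountry = () => {
   select_dialect.options.length = 0;
   const list = langs[select_language.selectedIndex];
   for (let i = 1; i < list.length; i++) {
     select_dialect.options.add(new Option(list[i], list[i]));
   }
};

// Initialize the dialect list for the default language.
updateCountry();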

The following code should be placed at the bottom of index.html. It will allow us to use the features of the Web Speech API.

<script src="src/js/speech.js"></script>

Step 2: implementing speech recognition capabilities

The code shown below should be placed in speech.js. It is responsible for initializing the speech recognition part of the Web Speech API:

if ('webkitSpeechRecognition' in window) {
    start_button.style.display = 'inline-block';
    recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.onstart = () => {
       recognizing = true;
       showInfo('info_speak_now');
       start_img.src = './src/images/mic-animate.gif';
    };
    recognition.onresult = (event) => {
       let interim_transcript = '';
       for (let i = event.resultIndex; i < event.results.length; ++i) {
          if (event.results[i].isFinal) {
             final_transcript += event.results[i][0].transcript;
          } else {
             interim_transcript += event.results[i][0].transcript;
          }
       }
       final_transcript = capitalize(final_transcript);
       title.innerHTML = linebreak(final_transcript);
       interim_span.innerHTML = linebreak(interim_transcript);
     };
    
    recognition.onerror = (event) => {
     // error handling code
    };
    recognition.onend = () => {
      // code executed when speech recognition ends
    };
}
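
The onresult handler above also relies on two small string helpers, capitalize() and linebreak(), which the article does not show. A minimal sketch of what they presumably do (the implementations below are an assumption):

// linebreak() turns newlines in the recognized text into HTML line breaks
// for display in the contenteditable span; capitalize() upper-cases the
// first non-whitespace character of the transcript.
const linebreak = (s) => s.replace(/\n\n/g, '<p></p>').replace(/\n/g, '<br>');

const capitalize = (s) => s.replace(/\S/, (m) => m.toUpperCase());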

In the listing above, the first thing we do is check whether webkitSpeechRecognition is available in the window object, which represents the browser window and serves as the global JavaScript object.

If webkitSpeechRecognition is present in window, a recognition object is created using the construct recognition = new webkitSpeechRecognition();.

Then the following API properties are configured:

  • recognition.continuous = true — this property makes each recognition session return results continuously instead of stopping after the first result.
  • recognition.interimResults = true — this property indicates that intermediate (interim) recognition results should be returned.

We use the following event handlers:

  • recognition.onstart — this handler runs when the speech recognition system starts. It displays a prompt asking the user to start talking (Speak now) and shows an animated microphone icon (mic-animate.gif).
  • recognition.onresult — this handler is triggered when speech recognition results are returned. The results are presented as a SpeechRecognitionResultList, which behaves like a two-dimensional array. The isFinal property checked in the loop indicates whether a result is final or interim, and the transcript property gives access to the string representation of the result.
  • recognition.onend — this handler is executed when the speech recognition operation completes. It does not display any text; it only replaces the microphone icon with the standard one.
  • recognition.onerror — this handler is called when an error occurs and displays a corresponding message.
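
The bodies of onerror and onend are omitted in the listing above. Based on the behavior described in this list and the messages defined in the <div id="info"> markup, they might look roughly like the sketch below; showInfo() and the recognizing, ignore_onend, and start_timestamp variables come from the surrounding code, and the exact error handling is an assumption:

recognition.onerror = (event) => {
   // Show a message matching the error reported by the Web Speech API.
   if (event.error === 'no-speech') {
     start_img.src = './src/images/mic.gif';
     showInfo('info_no_speech');
     ignore_onend = true;
   } else if (event.error === 'audio-capture') {
     start_img.src = './src/images/mic.gif';
     showInfo('info_no_microfoon');
     ignore_onend = true;
   } else if (event.error === 'not-allowed') {
     showInfo(event.timeStamp - start_timestamp < 100 ? 'info_blocked' : 'info_denied');
     ignore_onend = true;
   }
};

recognition.onend = () => {
   recognizing = false;
   if (ignore_onend) {
     return;
   }
   // Restore the standard microphone icon; no text is displayed.
   start_img.src = './src/images/mic.gif';
   showInfo('');
};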

Starting the speech recognition process

Add the following code, which starts speech recognition when the button is clicked, to the top of speech.js:

const startButton = (event) => {
   if (recognizing) {
       recognition.stop();
       return;
   }
   final_transcript = '';
   recognition.lang = select_dialect.value;
   recognition.start();
   ignore_onend = false;
   title.innerHTML = '';
   interim_span.innerHTML = '';
   start_img.src = './src/images/mic-slash.gif';
   showInfo('info_allow');
   start_timestamp = event.timeStamp;
};

Speech recognition is started by calling recognition.start(). This call fires the start event, which is processed by the recognition.onstart handler discussed above. In addition, the language selected by the user is passed to the speech recognition system via recognition.lang, and the microphone icon is updated.
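
The startButton() function and the handlers above call showInfo(), which is not defined in the article. Judging by the <div id="info"> markup, it presumably reveals one informational paragraph and hides the others; a minimal sketch under that assumption:

// Show the <p> element with the given id inside <div id="info"> and hide the rest.
// Passing an empty id hides the whole info block.
const showInfo = (id) => {
   const info = document.getElementById('info');
   if (id) {
     for (const child of info.children) {
       child.style.display = child.id === id ? 'inline' : 'none';
     }
     info.style.visibility = 'visible';
   } else {
     info.style.visibility = 'hidden';
   }
};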

Results

The web is becoming more capable every day. Developers of projects designed to run in the browser are getting more and more features that were previously available only to developers of native applications. One reason for this is that the number of web users is far larger than the number of people who use native applications. Native-style features available in web projects create a familiar environment for users who are used to them, so these users no longer need to switch back to native apps.

In this repository, you can find the full code of the project that we were working on. If you have mastered this material and want to practice working with various PWA features, take a look here.

Valery Radokhleb
Web developer, designer
