Add Voice to Your Node.js Application with Amazon Polly

This week has certainly been an exciting one, especially in the cloud IT market! Amazon Web Services (AWS) just recently announced a really cool, new service called Amazon Polly! AWS Polly is a speech synthesis (text-to-speech) service, aimed at software developers and architects, that enables you to generate speech from text, and play it directly inside your application. Does that sound really awesome? If so, then you’re in the right place!

Amazon Polly can be used in many different ways. For example, you might be writing an automated call management system, a monitoring application that periodically announces status information, or an announcement system for an airport. Who knows, maybe you’re even a hobby developer who’s excited about home automation and wants an easy way to get information without looking at a screen! All of these scenarios can benefit from integrating with speech synthesis services, like Amazon Polly, to dynamically generate rich, life-like content.

Speaking of “life-like,” one of the really cool features about Amazon Polly is that you can select from a variety of different voices, including languages such as: English, German, Polish, Italian, French, and others. This helps give some variety and flexibility to the applications that you integrate with Polly. In fact, during my testing, I noticed that if I provided English text to Polly, the non-English voices would still pronounce the English words, just with a heavy accent. This helps to give an element of “realism” to the results that Amazon Polly gives.

Jeff Barr, a Developer Evangelist at AWS, has already published an article that describes how to use the AWS CLI tool to generate an audio file with Polly. Using the AWS CLI is a great way for system administrators and junior developers to integrate Polly with build scripts and systems automation scripts, but it’s not the right interface for building a scalable, enterprise application. In this article, we’re going to take a look at how to integrate AWS Polly with a Node.js application, using the AWS SDK. You can access the Node.js SDK documentation and the Developer Guide / API Reference documentation for Amazon Polly.

NOTE: As of this writing, the latest version of the AWS PowerShell module does not seem to support Amazon Polly. To verify this, I updated the AWSPowerShell module to the latest version and ran Get-Command -Module AWSPowerShell -Name polly. Hopefully support for this is added in the future.

Node.js Project Setup

For the rest of this article, we’re going to use Microsoft’s Visual Studio Code (aka. VSCode) editor, a powerful, cross-platform alternative to GitHub’s Atom Editor. Some of the top reasons I love VSCode are as follows:

  • It has a Command Palette, that simplifies nearly every operation you perform
  • It’s extensible, and there are tons of extensions available on the Visual Studio Marketplace
  • You can customize the color theme, and download new ones
  • You can open any folder as a “project”
  • Keyboard shortcuts – essential for productivity, efficiency, flexibility, and user satisfaction
  • The UI scales very nicely, especially useful for high-DPI displays
  • It integrates nicely with version control software (Git only, at present)

I love VSCode for many other reasons, but basically it works awesomely for cross-platform software development teams, and with projects that leverage multiple application languages and frameworks.

Go ahead and run through these boilerplate steps to create a new folder-based project. We’ll build on this base as we go along, in the rest of the article.

  1. Install Visual Studio Code and Node.js
  2. Create an empty folder on your system
  3. In the empty folder, use the npm init command to generate a new package.json file for your Node module
  4. Open the folder / project in Visual Studio Code
  5. In your package.json file, add the AWS SDK to your dependencies: "aws-sdk": "^2.7.10"
  6. Run npm install to download / install the dependencies

Create AWS IAM User Account + Access Credentials

One of the first things we need to do, in any cloud project, is make sure that we’re authenticated to the cloud platform. In Amazon Web Services, the authentication and authorization service is called Identity and Access Management (IAM). We’ll create a new user account in IAM, and IAM will automatically generate an Access Key ID and Secret Access Key. These are the username/password-like values that we use from various client tools, to access AWS features, including Polly.

  1. Log in to the AWS Console
  2. Navigate to the IAM feature
  3. Create a new user account for programmatic access
  4. Attach the AmazonPollyFullAccess built-in policy
  5. Finish creating the user account

IMPORTANT: Make sure that you copy and paste the Access Key ID and Secret Access Key values after creating the new user account. You will need these in order to authenticate from your Node.js application. If you lose these, you’ll have to generate a new access key. I recommend using a password manager, such as LastPass, to keep track of your various AWS IAM credentials.

Authenticate to AWS from Node

Now that we’ve generated our access credentials, we need to add them to our project. To do this, we’ll create a simple JSON file that the AWS SDK knows how to read. There are other options for configuring AWS authentication, but I prefer this method, since it keeps everything in the same logical area on the filesystem.

WARNING: These AWS credentials will be included in a separate JSON file that must not be checked into your project’s source control. You should probably add the file to .gitignore, if you’re using Git, to make sure that you don’t accidentally commit the file to a Git repository. If someone obtains your AWS credentials, they may be able to cause a significant amount of harm by running up your AWS account bill, as has happened in the past.

If you practice the least privilege model, as we did in the previous step, by granting limited permissions to user accounts, you can help limit the impact that leaked credentials might otherwise have. Our IAM user account only has access to Amazon Polly, so compromised credentials for this account would not impact any of our other AWS cloud resources. I can’t stress enough how important this is, especially for production environments.

  1. In Visual Studio Code, add a new file to the root of your project, called awscreds.json. The actual name you give the file is irrelevant, but I will be referring to it by this name, in my example.
  2. Inside the file, define your AWS credentials, that were generated in the previous section of this article. Use the example template below, and fill in your own details.
  3. For the region JSON property, define one of the supported AWS Polly regions
{
 "accessKeyId": "<YourAccessKeyID>",
 "secretAccessKey": "<YourSecretAccessKey>",
 "region": "us-east-2"
}

After creating the JSON file with your AWS IAM credentials, go ahead and create a new Node.js module to handle authentication. We’ll reference this module inside of our main Node.js application file later on.

  1. In the root of your project, create a new file called awsauth.js
  2. Place the contents below into the file. If you customized the name of your JSON credentials file, make sure you change it in this code snippet.
var AWS = require('aws-sdk');

// Load AWS credentials
AWS.config.loadFromPath('./awscreds.json');
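
Before calling AWS.config.loadFromPath(), it can help to confirm that the credentials file actually contains the three properties from the template above. The following is a minimal sketch rather than part of the AWS SDK: the function name findMissingCredentialKeys is hypothetical, and it relies only on Node's built-in fs module.

```javascript
var fs = require('fs');

// The three properties used in the awscreds.json template
var REQUIRED_KEYS = ['accessKeyId', 'secretAccessKey', 'region'];

// Hypothetical helper (not part of the AWS SDK): returns the names of any
// required properties that are missing or empty in the given JSON file.
function findMissingCredentialKeys(filePath) {
  var creds = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  return REQUIRED_KEYS.filter(function (key) {
    return typeof creds[key] !== 'string' || creds[key].length === 0;
  });
}

// Example usage: check the file only if it exists in the current directory
if (fs.existsSync('./awscreds.json')) {
  var missing = findMissingCredentialKeys('./awscreds.json');
  if (missing.length > 0) {
    console.log('awscreds.json is missing: ' + missing.join(', '));
  }
}
```

A check like this could sit at the top of awsauth.js, so that a malformed file is reported before any AWS call is attempted.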

List the Supported AWS Polly Voices

As we discussed towards the beginning of the article, one of the cool features in AWS Polly is that you can choose from a variety of voices. While the AWS Polly documentation enumerates a list of them, you can also dynamically obtain a list of voice metadata from the Polly REST API. The SDK method you want to call is named describeVoices().

When you call this API, you’ll receive a JSON object whose Voices property is an array of the supported voices. Each voice has a name / ID, which is how you’ll actually identify the desired voice later on (a very easy, nice, and personal interface, in my opinion), and it also specifies the voice’s gender (male or female), and the language name and code (e.g. en-US, de-DE).

  1. In the root of your project folder, create a new file called app.js
  2. Add the following text to app.js
var AWS = require('aws-sdk');
require('./awsauth.js');

// Create a new AWS Polly object
var polly = new AWS.Polly();

polly.describeVoices(function (err, data) {
 if (err) console.log(err, err.stack); // an error occurred
 else console.log(data); // successful response
})

After saving the file, go ahead and run node app.js, and you should see some JSON results printed out to your terminal.

node app.js
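
The object printed by this command contains a Voices array, and each entry carries properties such as Id, Gender, and LanguageCode. As a rough sketch of how that data might be summarized, the hypothetical helper below groups voice IDs by language code. The sample object is hand-written to mimic the response shape; it is not real output from the service.

```javascript
// Hypothetical helper: groups voice IDs by their language code.
function groupVoicesByLanguage(data) {
  var groups = {};
  data.Voices.forEach(function (voice) {
    if (!groups[voice.LanguageCode]) {
      groups[voice.LanguageCode] = [];
    }
    groups[voice.LanguageCode].push(voice.Id);
  });
  return groups;
}

// Hand-written sample in the shape of a describeVoices() response
var sampleData = {
  Voices: [
    { Id: 'Joanna', Name: 'Joanna', Gender: 'Female', LanguageCode: 'en-US', LanguageName: 'US English' },
    { Id: 'Hans', Name: 'Hans', Gender: 'Male', LanguageCode: 'de-DE', LanguageName: 'German' },
    { Id: 'Marlene', Name: 'Marlene', Gender: 'Female', LanguageCode: 'de-DE', LanguageName: 'German' }
  ]
};

console.log(groupVoicesByLanguage(sampleData));
```

Inside the describeVoices() callback, you could pass data to a function like this instead of logging the raw response.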

As you can see from the code above, the Amazon Polly API is pretty straightforward. All we need to do is authenticate to AWS, create a new Polly object, and then start calling the Polly APIs. While the describeVoices() API is not exactly network intensive, as it doesn’t return a large result set, you can filter the results by passing in a LanguageCode that you want to target. To do that, before specifying the callback function, pass in a JSON object with a LanguageCode property. See the example below, and refer to the Amazon Polly Node.js Reference, for a comprehensive list of supported language codes.

polly.describeVoices({ "LanguageCode": "de-DE" }, function (err, data) {
 if (err) console.log(err, err.stack); // an error occurred
 else console.log(data); // successful response
})
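
If you only need the voice IDs for one language, the filtered call can be wrapped in a small function. This is a sketch under a few assumptions: listVoiceIds is a hypothetical name, and the Polly client is passed in as an argument so that the function can be tried out with a stand-in object that needs no network access or credentials.

```javascript
// Hypothetical helper: calls describeVoices() on the supplied client and
// hands an array of voice IDs for the given language code to the callback.
function listVoiceIds(polly, languageCode, callback) {
  polly.describeVoices({ LanguageCode: languageCode }, function (err, data) {
    if (err) {
      callback(err);
      return;
    }
    var ids = data.Voices.map(function (voice) {
      return voice.Id;
    });
    callback(null, ids);
  });
}

// Stand-in client for offline experimentation; it is NOT the real service.
// It returns a fixed, hand-written list filtered by the requested language.
var fakePolly = {
  describeVoices: function (params, callback) {
    var all = [
      { Id: 'Hans', LanguageCode: 'de-DE' },
      { Id: 'Marlene', LanguageCode: 'de-DE' },
      { Id: 'Joanna', LanguageCode: 'en-US' }
    ];
    callback(null, {
      Voices: all.filter(function (voice) {
        return voice.LanguageCode === params.LanguageCode;
      })
    });
  }
};

listVoiceIds(fakePolly, 'de-DE', function (err, ids) {
  if (err) console.log(err);
  else console.log(ids);
});
```

In app.js, you would pass the real polly object in place of fakePolly.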

Let’s move on to the next step, which is to generate our first audio file using Amazon Polly!

Generate an Audio File with Polly

Now that we’ve chosen an AWS Polly voice to use, let’s actually go out and generate some audio files with it. Generating text-to-speech is done using a method called synthesizeSpeech(). This API allows us to specify the desired voice name / ID, the text that we want to be converted to speech audio, and the output format. There are a few key input parameters that you need to specify, as a JSON object, when you call synthesizeSpeech(). We’ll define those as a JSON object below. Go ahead and add this to app.js.

var params = {
 OutputFormat: 'mp3',               // You can also specify pcm or ogg_vorbis formats.
 Text: 'Good morning, Trevor.',     // This is where you'll specify whatever text you want to render.
 VoiceId: 'Carla'                   // Specify the voice ID / name from the previous step.
};
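
It can be handy to check these parameters locally before making the call. The sketch below assumes the three output formats mentioned in the comment above (mp3, ogg_vorbis, and pcm); buildSpeechParams is a hypothetical helper, not an SDK function.

```javascript
// Output formats named in this article
var SUPPORTED_FORMATS = ['mp3', 'ogg_vorbis', 'pcm'];

// Hypothetical helper: builds the parameter object for synthesizeSpeech()
// and throws early if the text is empty or the format is not recognized.
// The format defaults to mp3 when it is omitted.
function buildSpeechParams(text, voiceId, outputFormat) {
  var format = outputFormat || 'mp3';
  if (typeof text !== 'string' || text.trim().length === 0) {
    throw new Error('Text must be a non-empty string.');
  }
  if (SUPPORTED_FORMATS.indexOf(format) === -1) {
    throw new Error('Unsupported output format: ' + format);
  }
  return { OutputFormat: format, Text: text, VoiceId: voiceId };
}

console.log(buildSpeechParams('Good morning, Trevor.', 'Carla'));
```

The returned object has the same shape as the params variable, so it could be passed to synthesizeSpeech() in its place.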

After defining the input parameters, we’ll define a callback function. This callback will be invoked after the synthesizeSpeech() API completes. The callback function will write the generated data to a file, otherwise the result would simply reside in a Node buffer. The buffer isn’t really useful to us during development and testing.

var synthCallback = function (err, data) {
  if (err) {
    console.log(err, err.stack); // an error occurred
    return; // data is null on failure, so there is nothing to write
  }
  console.log(data); // successful response

  // data.AudioStream is a Node buffer holding the encoded audio
  fs.writeFile('testing.mp3', data.AudioStream, function (err) {
    if (err) {
      console.log('An error occurred while writing the file.');
      console.log(err);
      return; // don't report success after a failed write
    }
    console.log('Finished writing the file to the filesystem');
  });
};

As you can see, the callback above references the built-in Node.js filesystem module, aliased as fs. You’ll need to make sure you add a require statement to app.js, at the top, so that we can properly reference this module. Go ahead and add that now.

// Import the built-in Node.js filesystem module
var fs = require('fs');

Now that we’ve defined the input parameters and the callback function, all you need to do is call the synthesizeSpeech() API, passing in the JSON parameter object and the callback function.

// Call the synthesizeSpeech() API, with the user-defined parameters, and write the result to a file
polly.synthesizeSpeech(params, synthCallback);
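
The callback in this walkthrough always writes to testing.mp3, which would be misleading if you switched OutputFormat to ogg_vorbis or pcm. As a sketch, the hypothetical helper below derives the file extension from the request parameters and checks that AudioStream is a Buffer before writing. The extension mapping is an assumption made for illustration, and the synchronous fs.writeFileSync() call is used only to keep the example short.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Assumed mapping from OutputFormat to a file extension (illustrative only)
var EXTENSIONS = { mp3: '.mp3', ogg_vorbis: '.ogg', pcm: '.pcm' };

// Hypothetical helper: writes data.AudioStream into the given directory,
// naming the file after the requested output format. Returns the file path.
function saveAudio(params, data, directory) {
  if (!Object.prototype.hasOwnProperty.call(EXTENSIONS, params.OutputFormat)) {
    throw new Error('Unsupported output format: ' + params.OutputFormat);
  }
  if (!Buffer.isBuffer(data.AudioStream)) {
    throw new Error('Expected AudioStream to be a Buffer.');
  }
  var filePath = path.join(directory, 'testing' + EXTENSIONS[params.OutputFormat]);
  fs.writeFileSync(filePath, data.AudioStream);
  return filePath;
}

// Offline demonstration with placeholder bytes instead of real audio
var demoPath = saveAudio(
  { OutputFormat: 'ogg_vorbis' },
  { AudioStream: Buffer.from('placeholder bytes') },
  os.tmpdir()
);
console.log('Wrote ' + demoPath);
```

Inside synthCallback, the equivalent call would be saveAudio(params, data, '.'), which writes next to app.js.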

At this point, save the app.js file, and in your terminal run this command to invoke your application.

node app.js

You should now have an MP3 audio file in the root directory of your project named testing.mp3. Use your favorite MP3 player to listen to this, and make sure it sounds alright. To be honest, the quality of the Amazon Polly results is pretty impressive. During my own testing, I tried out quite a few different phrases, both English and foreign, along with a few different voices, and was pleased with nearly all of the results.

Congratulations, you’ve just used Amazon Polly for the first time!

Play the Audio File with PowerShell

While you can certainly use your operating system’s built-in media player, or an alternative like VLC, to test out the audio file locally, you can actually automate the process of testing out your media files using PowerShell. This simple PowerShell script will play back a media file.

Feel free to customize this script / function for your own needs; however, keep in mind that this currently will only work on the Windows platform. The script relies on the System.Windows.Media.MediaPlayer class, which belongs to the Windows-only WPF libraries in the .NET Framework, so it is highly unlikely to work on the PowerShell Core Edition, which works cross-platform on Mac OS X and Linux systems. If you’re using Mac OS X or Linux as your development workstation operating system, then you’ll probably want to substitute this PowerShell script with a different media playback utility, until the .NET Core media APIs mature.

function Invoke-MediaFile {
    [CmdletBinding()]
    param (
        [ValidateScript({ 
            if (!(Test-Path -Path $PSItem)) { throw 'Invalid file path.' }
            else { $true }
        })]
        [string] $Path
    )
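    # MediaPlayer lives in the WPF PresentationCore assembly, which may not be loaded outside the ISE
    Add-Type -AssemblyName PresentationCore
    $MediaPlayer = New-Object -TypeName System.Windows.Media.MediaPlayer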
    $MediaPlayer.Volume = 1
    $MediaPlayer.Open($Path)
    $MediaPlayer.Play()
    Start-Sleep -Milliseconds 100
    while ($MediaPlayer.Position -lt $MediaPlayer.NaturalDuration.TimeSpan) {
        Start-Sleep -Milliseconds 200
    }
    $MediaPlayer.Close()
    $MediaPlayer = $null
}

Invoke-MediaFile -Path C:\ArtofShell\polly\testing2.mp3

To run this script, simply fire up the PowerShell Integrated Scripting Environment (ISE) on your Windows system, paste the code in, and run it. Of course, you’ll need to modify the fully-qualified filesystem path to the audio file that was generated, unless you’re using the exact same directory structure that I am. Chances are you’re not. 🙂 If PowerShell ISE isn’t your thing, you can also use Visual Studio Code (hey, it’s cross-platform and works with any language!) in conjunction with the open source PowerShell Extension for VSCode.

Scaling Content Generation with Docker

At this point, we have successfully built an application that is capable of dynamic audio content generation, based on textual input. If your application requires a significant amount of text-to-speech conversion, then Amazon Polly is ready and able to satisfy your application’s demands. As a managed cloud service, Amazon Polly scales along with your application’s needs. If a single instance of your application becomes a bottleneck, Docker can help you scale out even further. By packaging your application code into a custom Docker container image, you can run as many instances of your client code as you need across Amazon’s global infrastructure.

You can use clustering and orchestration tools like Docker Swarm or Kubernetes to build clusters of container hosts on Amazon EC2 cloud infrastructure, or you can use AWS EC2 Container Service (ECS) to further simplify the setup process. Whatever route you choose, I strongly encourage you to investigate Docker containers for all of your application scaling, portability, and consistent, rapid deployment needs.

Are you brand new to Docker? Still not entirely clear on what Docker is, and how it works, and why you’d want to use it? You can read more about Docker container fundamental concepts on my Docker documentation page! If you have further questions, please reach out to me, and I’ll do my best to help.

Conclusion

As its track record shows, Amazon is continually adding new services to serve the needs of its customers. In fact, you can see some of its incredible accomplishments on the AWS 10 year anniversary website. Amazon Polly is the newest of a vast array of application services that software developers around the world can take advantage of and innovate with! AWS is a world leader in cloud technologies, and its global data centers are trusted by everyone from small start-ups to large enterprises for their reliability, cost savings, and feature-rich offerings.

Thanks for taking the time to read this article about Amazon’s exciting, new Polly service, in conjunction with Node.js. Please feel free to send feedback directly to me at trevor@trevorsullivan.net, and follow me on Twitter @pcgeek86.