Amazon to the rescue with TTS

Two weeks ago I published a blog post about Microsoft shutting down their DataMarket with a deadline of March 25th 2017, leaving many smart home owners with almost no options for reasonably good Text-to-Speech (TTS) event notifications and announcements in their smart homes.

To make things worse, Microsoft is now redirecting all customers from http://datamarket.azure.com to their standard Azure website. As an existing user of their Translation to Text engine, you will look for your existing service and your authorization keys without success. All the links on the Azure website try to make you sign up for a new Azure account with a $200 credit… unless you have the old URL for your service, which is https://datamarket.azure.com/dataset/bing/microsofttranslator. Alternatively, you can use https://datamarket.azure.com/account/.

As an end user I have to say that this kind of customer handling is unacceptable, especially after Microsoft emailed every customer that their access would be available until March 25th 2017. This was even stated in a top banner on their old DataMarket website. I posted screenshots of both in my previous blog post here: http://homeautomation.expert/azure-datamarket-shutdown.

With all this uncertainty about the future of Text-to-Speech (TTS) for smart home owners, Amazon yesterday announced the release of their new service called “Amazon Polly”: https://aws.amazon.com/polly/.

“Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. Polly includes 47 lifelike voices spread across 24 languages, so you can select the ideal voice and build speech-enabled applications that work in many different countries.”

Here is an example of the quality of Amazon Polly.

Amazon Polly is offered under the Amazon Free Tier, free of charge for 12 months from the day an end user creates their AWS account. Under the Free Tier an end user can submit up to 5.000.000 characters per month. After the Free Tier period has ended, the end user pays $4.00 per month for 1.000.000 characters per month.

Let’s compare the currently active Microsoft TTS and the Amazon Polly service, even though Microsoft is shutting down their DataMarket and moving this feature under their “Cognitive Services accounts” category in Azure, which is currently available in preview only, with no pricing information unless you sign up for an Azure account:

Amazon also provides example use cases that let end users estimate how many characters certain voice tasks will consume. The examples range from a number of requests with a number of characters per request to emails, books and news articles. For this exercise, I examined typical smart home usage using the following formula:

~50 characters per request x 14 requests per hour x 24 hours per day x 30 days per month = 504.000 characters / month

These numbers are averages normalized over one year. A smart home owner would have to double the number of requests or the length of the announcements to cross the 1.000.000 character barrier into the next price range of Amazon Polly.
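The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the constant are my own, and the input values are the averages used in this post.

```python
# Sketch of the usage estimate from this post; names are illustrative only.

FIRST_PRICE_TIER = 1_000_000  # characters per month, as quoted above


def monthly_characters(chars_per_request, requests_per_hour,
                       hours_per_day=24, days_per_month=30):
    """Return the estimated number of characters synthesized per month."""
    return chars_per_request * requests_per_hour * hours_per_day * days_per_month


usage = monthly_characters(chars_per_request=50, requests_per_hour=14)
headroom = FIRST_PRICE_TIER / usage  # growth factor before crossing the barrier

print(usage)               # 504000
print(round(headroom, 2))  # 1.98
```

A headroom factor just below 2 is where the "double the requests or the length" statement comes from.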

Efficiency

The other important aspect of comparing those two Text-to-Speech (TTS) services is their efficiency. By efficiency I mean the file size of the generated audio and the resulting transfer time.

The example voice output above consumes 48 KB using Amazon Polly. The same text synthesized using the Microsoft TTS engine consumes 142 KB. The total response time consists of uploading the text to be synthesized, synthesizing it into a voice output and pushing the result back to the end user, so it is affected by both the number of characters and the file size.
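To put the two file sizes into perspective, here is a rough sketch of the size ratio and the pure download time. The 1000 kbit/s link speed is a made-up value for illustration, not a measurement, and the calculation ignores latency and synthesis time.

```python
# Rough comparison of the two file sizes quoted in this post.
# ASSUMPTION: the link speed below is hypothetical; latency is ignored.

POLLY_KB = 48       # size of the Amazon Polly output
MICROSOFT_KB = 142  # size of the Microsoft TTS output


def transfer_seconds(size_kb, link_kbit_per_s):
    """Time to move size_kb kilobytes over a link of the given speed."""
    return size_kb * 8 / link_kbit_per_s


size_ratio = MICROSOFT_KB / POLLY_KB
print(round(size_ratio, 2))  # 2.96

for name, size in (("Polly", POLLY_KB), ("Microsoft", MICROSOFT_KB)):
    print(name, transfer_seconds(size, link_kbit_per_s=1000), "seconds")
```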

Both engines let you choose the output file format. For smart home usage the most commonly used format is, and will continue to be, .mp3 for compatibility reasons.

Amazon offers a comprehensive tutorial about Polly and code examples using Python, iOS and Android. Microsoft offers examples for AJAX, SOAP and HTTP. For both TTS services the end user has to create credentials to use the actual service. For Microsoft the end user creates a client ID and a client secret, which are used to authenticate the application/end user.

With Amazon the security model is much more sophisticated. Amazon uses Identity and Access Management (IAM), where the end user has a root account, which can and should be protected with multi-factor authentication. From there the end user can create various users and groups, which can then use the Amazon Polly service.

The Polly service offers two group policies by default: Full Access and Read Only access. These can be assigned to user accounts to use the Amazon Polly service, utilizing the Signature Version 4 Test Suite from Amazon for the signing process.
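If the two default policies are broader than you need, IAM also accepts custom policies. The following is a minimal sketch of a policy document that only allows speech synthesis, built as a Python dictionary. The function name is my own; `polly:SynthesizeSpeech` is the IAM action for the synthesis call, and `"Resource": "*"` is a simplification for this example.

```python
import json


def polly_synthesize_policy():
    """Build a minimal IAM policy document allowing only speech synthesis."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                "Resource": "*",  # simplification for this sketch
            }
        ],
    }


# Serialize to JSON, ready to be pasted into the IAM console.
policy_json = json.dumps(polly_synthesize_policy(), indent=2)
print(policy_json)
```

Attaching such a policy to a dedicated smart home user keeps the root account out of day-to-day use.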

One more important item to mention is that Amazon Polly supports Speech Synthesis Markup Language (SSML). Amazon Polly generates speech from both plain text input and SSML documents that conform to SSML version 1.1. Using SSML tags, you can customize and control aspects of speech such as pronunciation, volume, and speech rate as defined in the W3C recommendation https://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/.
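As a hedged sketch of what this could look like for a smart home announcement: the helper below wraps a text in an SSML `prosody` element to control rate and volume. `build_ssml` and `synthesize_mp3` are names I made up for this example. The second function uses the boto3 `synthesize_speech` call and assumes that boto3 is installed and that AWS credentials and a region are already configured; the voice "Joanna" is just one possible choice.

```python
from xml.sax.saxutils import escape

# Named rate and volume values from the SSML 1.1 prosody element.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}
VOLUMES = {"silent", "x-soft", "soft", "medium", "loud", "x-loud"}


def build_ssml(text, rate="medium", volume="medium"):
    """Wrap plain text in a minimal SSML document with rate and volume."""
    if rate not in RATES or volume not in VOLUMES:
        raise ValueError("unsupported rate or volume")
    return '<speak><prosody rate="{}" volume="{}">{}</prosody></speak>'.format(
        rate, volume, escape(text))  # escape &, < and > in the announcement


def synthesize_mp3(ssml, voice_id="Joanna"):
    """Send SSML to Amazon Polly and return MP3 bytes (needs AWS credentials)."""
    import boto3  # imported here so build_ssml works without boto3 installed
    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id)
    return response["AudioStream"].read()


announcement = build_ssml("Garage door & gate are open", rate="slow", volume="loud")
print(announcement)
```

Calling `synthesize_mp3(announcement)` would then return the audio data, which a smart home hub could write to an .mp3 file and play.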

In summary… Amazon released their Text-to-Speech (TTS) service Amazon Polly at the right time, offering superior efficiency in terms of response time and file size, while being 2.5x more cost efficient than Microsoft’s Text-to-Speech (TTS) service today.

First integration attempts into smart home hubs are already in progress, e.g. Lua code was being shared within 48 hours of Amazon releasing Polly.

HomeAutomation.Expert

Disclaimer: This blog and tweets represent my own view points and not of my employer, Amazon Web Services.

Azure DataMarket shutdown

Website screenshot
Shutdown Email

Microsoft announced the shutdown of their DataMarket, as you can see in that email. One of their services used in smart home deployments is their TTS (Text-to-Speech) service, which allows smart homes to announce events by voice in different languages and genders.

This service was free of charge for up to 2.000.000 characters per month, which was more than enough for most smart homes. Anything beyond 2M was reasonably priced, if needed.

This Microsoft TTS service became very popular when Google implemented CAPTCHA (a program or system intended to distinguish human from machine input, typically as a way of thwarting spam and automated extraction of data from websites). As a result, smart homes could no longer use Google to announce events by voice.

There are other options like Mary TTS, FreeTTS, Acapela, etc., where you can install a local TTS server at your home to replace a cloud based TTS service. However, not everybody has the skills and knowledge to install and maintain a local TTS server. The benefit of having a local TTS server is independence: even if your internet connectivity is down, you still get voice announcements for your smart home events.

VoiceRSS is another cloud based option offering up to 350 requests per day at no cost. With an average of ~45 characters per request x 350 requests per day x 30 days per month, that comes to ~500.000 characters per month, compared to the 2.000.000 characters per month of Microsoft’s service.

However, quality of voice is another aspect to consider. There are plenty of TTS services out there and THE biggest complaint about them is the robotic sound of their voices or, even worse, that whole sentences are hard to understand even when single words are clear. This is a huge challenge, as you want a smart home to sound like a smart home and not like a robot from the 70s.

This will be an interesting market to watch and more options will arise in the future, but for now people are looking for alternatives to Microsoft’s TTS service given that it is being shut down March 31st 2017.
