Apollo 11 Audio on Alexa

When Neil, Buzz, and Michael headed to the moon 50 years ago, it was an almost unimaginable feat: humankind was walking on a different celestial body. Back on Earth, more than 360,000 km (224,000 mi) away, families were following the events on the moon on the TVs in their living rooms.

In time for the 50th anniversary, I wanted to bring some of this experience back into the living room. The Apollo 11 mission is largely remembered for a few very famous quotes, so that is what I decided to focus on when making my skill, “Moon Landing: The Original Audio”.

Control Center during Demonstration

Using SSML (Speech Synthesis Markup Language) and its <audio> tag, I was able to combine original Apollo 11 audio clips from the Internet Archive with Alexa’s voice to create an enjoyable listening experience.

The <audio> tag lets us combine up to 240 seconds of audio with text-to-speech output. As long as the outputSpeech type is set to SSML, we’re good to go:

"outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Whatever you want Alexa to say. You can combine this with an audio clip, like this one: <audio src='https://location-of-your-audio-file.com/yourfile.mp3'/></speak>"
}
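If you build the response in code rather than by hand, you have to escape the SSML as XML before serialising it as JSON. A stray & in the spoken text would otherwise make the SSML invalid. Below is a minimal Python sketch. It assumes you return the raw response JSON yourself rather than going through an SDK, and the function names are hypothetical. The envelope fields (version, response, shouldEndSession) follow the standard Alexa response format.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_output_speech(text: str, audio_url: str) -> dict:
    """Return an SSML outputSpeech object: spoken text followed by one clip."""
    # escape() turns &, < and > into entities so the text stays valid SSML;
    # quoteattr() quotes (and escapes) the URL for use as an attribute value.
    ssml = f"<speak>{escape(text)} <audio src={quoteattr(audio_url)}/></speak>"
    return {"type": "SSML", "ssml": ssml}


def build_response(text: str, audio_url: str) -> dict:
    """Wrap the outputSpeech in a minimal Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": build_output_speech(text, audio_url),
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    # Placeholder URL; it must point at an MP3 that meets Alexa's requirements.
    print(json.dumps(build_response(
        "Here is the moment of landing.",
        "https://location-of-your-audio-file.com/yourfile.mp3",
    ), indent=2))
```

json.dumps takes care of escaping the double quotes that quoteattr adds, so the SSML string can be pasted straight into the JSON.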

There are a few requirements for the MP3 file you use, and if you don’t get them right, your skill will fail relatively silently. (A good way to debug audio file or codec errors is to use the Test Simulator in the Developer Portal. If a problem with the audio file is what’s causing the failure, the simulator will say so.)

The key criteria the MP3 file must meet are:

  • Must be hosted on an HTTPS endpoint. Self-signed certificates cannot be used. (S3 is usually a good place to host the files.)
  • Can’t be longer than 240 seconds
  • Bit rate must be 48 kbps
  • Sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz
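Since a bad file fails quietly, it can help to check files before uploading them. The Python sketch below skips a leading ID3v2 tag, decodes the first MPEG Layer III frame header, and compares it against the criteria listed above. All function names are hypothetical. The sketch makes several simplifying assumptions:

  • Only the first frame is inspected.
  • The duration estimate assumes a constant bit rate.
  • The HTTPS check only looks at the URL scheme, so it cannot detect a self-signed certificate.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

# Layer III bit rates in kbps, indexed by the 4-bit bit-rate field of the
# frame header. None marks the "free" (0) and "bad" (15) indices.
BITRATES_MPEG1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
BITRATES_MPEG2_L3 = [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None]

# Sample rates in Hz per MPEG version, indexed by the 2-bit sample-rate field.
SAMPLE_RATES = {
    "1": [44100, 48000, 32000],
    "2": [22050, 24000, 16000],
    "2.5": [11025, 12000, 8000],
}

# Criteria from the list above.
ALEXA_BITRATE_KBPS = 48
ALEXA_SAMPLE_RATES = {22050, 24000, 16000}
ALEXA_MAX_SECONDS = 240


@dataclass
class FrameInfo:
    offset: int  # byte offset of the first audio frame
    mpeg_version: str
    bitrate_kbps: int
    sample_rate_hz: int


def skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for b in data[6:10]:  # 28-bit "syncsafe" integer, 7 bits per byte
        size = (size << 7) | (b & 0x7F)
    footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer flag
    return 10 + size + footer


def parse_first_frame(data: bytes) -> Optional[FrameInfo]:
    """Scan for the first valid MPEG Layer III frame header and decode it."""
    i = skip_id3v2(data)
    while i + 4 <= len(data):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:  # 11-bit frame sync
            version = {0: "2.5", 2: "2", 3: "1"}.get((data[i + 1] >> 3) & 0x03)
            layer_bits = (data[i + 1] >> 1) & 0x03  # 0b01 means Layer III
            bitrate_idx = data[i + 2] >> 4
            rate_idx = (data[i + 2] >> 2) & 0x03  # 3 is reserved
            if version and layer_bits == 1 and rate_idx != 3:
                table = BITRATES_MPEG1_L3 if version == "1" else BITRATES_MPEG2_L3
                if table[bitrate_idx] is not None:
                    return FrameInfo(i, version, table[bitrate_idx],
                                     SAMPLE_RATES[version][rate_idx])
        i += 1
    return None


def check_alexa_audio(url: str, data: bytes) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if urlparse(url).scheme != "https":
        problems.append("file must be served over HTTPS")
    info = parse_first_frame(data)
    if info is None:
        problems.append("no MPEG Layer III frame header found")
        return problems
    if info.bitrate_kbps != ALEXA_BITRATE_KBPS:
        problems.append(f"bit rate is {info.bitrate_kbps} kbps, expected 48 kbps")
    if info.sample_rate_hz not in ALEXA_SAMPLE_RATES:
        problems.append(f"sample rate is {info.sample_rate_hz} Hz")
    # Duration estimate assumes a constant bit rate (CBR) file.
    seconds = (len(data) - info.offset) * 8 / (info.bitrate_kbps * 1000)
    if seconds > ALEXA_MAX_SECONDS:
        problems.append(f"estimated duration {seconds:.0f} s exceeds 240 s")
    return problems
```

Usage would look something like check_alexa_audio(url, open("yourfile.mp3", "rb").read()). Incidentally, all three allowed sample rates are MPEG-2 rates, so a compliant file should decode as MPEG version "2" at 48 kbps.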

The last two are easy to satisfy if you create your files with, for example, Audacity and the LAME MP3 encoder, but other tools such as FFmpeg can be used as well. Amazon even provides a step-by-step guide to help you meet the bit rate and sample rate criteria.
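For batch conversion, FFmpeg can be driven from a script. The Python sketch below makes a few assumptions:

  • ffmpeg is on your PATH and was built with the libmp3lame encoder.
  • The helper names are hypothetical.
  • The command uses standard FFmpeg options (-codec:a, -b:a, -ar, -t) rather than reproducing Amazon's guide verbatim. That guide may set additional options, such as the channel count.

```python
import subprocess
from typing import List

# Sample rates accepted by Alexa, per the list above.
ALLOWED_SAMPLE_RATES = (16000, 22050, 24000)


def ffmpeg_alexa_args(src: str, dst: str, sample_rate: int = 24000) -> List[str]:
    """Build an FFmpeg command line that re-encodes src as an Alexa-friendly MP3."""
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sample rate must be one of {ALLOWED_SAMPLE_RATES}")
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file (any format FFmpeg can decode)
        "-t", "240",               # stop writing output after 240 seconds
        "-codec:a", "libmp3lame",  # encode with the LAME MP3 encoder
        "-b:a", "48k",             # 48 kbps bit rate
        "-ar", str(sample_rate),   # resample to an allowed rate
        dst,
    ]


def convert_for_alexa(src: str, dst: str, sample_rate: int = 24000) -> None:
    """Run FFmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_alexa_args(src, dst, sample_rate), check=True)
```

Usage would be along the lines of convert_for_alexa("eagle-has-landed.wav", "eagle-has-landed.mp3"). The command is built as a list rather than a shell string, so file names with spaces need no quoting.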

You don’t have to combine text-to-speech and audio output; you could rely on audio output alone. This could take you a long way towards creating a custom voice for your Alexa skill.
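One way to approach such a custom voice is to keep a table of pre-recorded clips and build SSML that contains nothing but <audio> tags. The Python sketch below assumes the following:

  • CLIPS is a hypothetical phrase-to-clip table; its URLs and durations are placeholders.
  • You already know each clip's duration.
  • The 240-second total from the <audio> tag limit applies to the combined clips.

```python
from typing import Dict, List, Sequence, Tuple
from xml.sax.saxutils import quoteattr

MAX_AUDIO_SECONDS = 240

# Hypothetical pre-recorded "voice": phrase -> (HTTPS URL, duration in seconds).
CLIPS: Dict[str, Tuple[str, float]] = {
    "welcome": ("https://example.com/clips/welcome.mp3", 3.5),
    "eagle": ("https://example.com/clips/the-eagle-has-landed.mp3", 4.0),
}


def audio_only_ssml(phrases: Sequence[str],
                    clips: Dict[str, Tuple[str, float]] = CLIPS) -> str:
    """Build SSML made only of <audio> tags, one per requested phrase."""
    total = 0.0
    tags: List[str] = []
    for phrase in phrases:
        url, seconds = clips[phrase]  # KeyError if no recording exists
        total += seconds
        tags.append(f"<audio src={quoteattr(url)}/>")
    if total > MAX_AUDIO_SECONDS:
        raise ValueError(f"{total:.1f} s of audio exceeds {MAX_AUDIO_SECONDS} s")
    return "<speak>" + "".join(tags) + "</speak>"
```

The resulting string goes into the "ssml" field of an SSML outputSpeech exactly like the mixed example above. Failing loudly on a missing phrase seemed preferable to silently falling back to text-to-speech in the middle of a custom voice.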

If you want to hear what the audio tag sounds like when used with Apollo 11 audio, give “Moon Landing” a go:
UK: https://www.amazon.co.uk/dp/B07VBZHV7P/
US: https://www.amazon.com/dp/B07VBZHV7P/
AU: https://www.amazon.co.au/dp/B07VBZHV7P/
CA: https://www.amazon.ca/dp/B07VBZHV7P/
IN: https://www.amazon.in/dp/B07VBZHV7P/

What about Actions on Google?
Google also uses SSML and also supports the <audio> tag. However, its requirements for the file differ slightly (e.g. a maximum length of 120 seconds, and different bit rates), while other aspects are the same (an HTTPS endpoint is also required).

Oscar Schafer

Using the SSML <audio> tag to play audio from the Apollo 11 mission on Alexa, 50 years later.