Saturday, September 13, 2025

Amazon Polly Lexicon & SSML | Overview & Hands-On.

Scope:

  • Amazon Polly Basics,
  • Amazon Polly Lexicons,
  • Key Features, 
  • Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries,
  • How to Create & upload via AWS CLI,
  • Amazon Polly SSML (Speech Synthesis Markup Language),
  • Lexicons vs SSML,
  • Best practice,
  • Advanced Tips,
  • When to Use What,
  • Insights,
  • Project: Hands-On.

Intro:

    • Amazon Polly uses both pronunciation lexicons (lexicons) and the Speech Synthesis Markup Language (SSML) to customize and control how text is converted into lifelike speech.

 1. Amazon Polly Basics

    • Amazon Polly is AWS’s text-to-speech (TTS) service.
    • By default, it reads plain text, but twtech can enrich and control pronunciation, prosody, and style using Lexicons and SSML (Speech Synthesis Markup Language).

 2. Amazon Polly Lexicons

    • Lexicons are pronunciation dictionaries that twtech may create and store in Amazon Polly.

Key Features

    • Define custom pronunciations using IPA (International Phonetic Alphabet) or X-SAMPA (Extended Speech Assessment Methods Phonetic Alphabet).
    • Store them in XML-based PLS (Pronunciation Lexicon Specification) format.
    • Useful for:
      • Brand names: e.g., "X Æ A-12" → “Ex Ash A Twelve”
      • Acronyms: "SQL" → “Sequel”
      • Industry jargon or regional terms

# Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries:

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

                             http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"

         alphabet="ipa"

         xml:lang="en-US">

  <lexeme>

    <grapheme>SQL</grapheme>

    <alias>sequel</alias>

  </lexeme>

  <lexeme>

    <grapheme>X Æ A-12</grapheme>

    <phoneme>ɛks æʃ eɪ twɛlv</phoneme>

  </lexeme>

</lexicon>

    • <grapheme>: the word as written (its spelling)
    • <alias>: text substitution (simple replacement)
    • <phoneme>: explicit pronunciation, written in IPA (International Phonetic Alphabet) or X-SAMPA
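
The structure above can also be generated instead of typed by hand. The following is a minimal sketch, assuming only the Python standard library; the helper name build_lexicon and its (grapheme, kind, value) tuple format are hypothetical and not part of Polly or Boto3.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="en-US", alphabet="ipa"):
    """Return a PLS lexicon document as a string.

    entries is a list of (grapheme, kind, value) tuples, where kind is
    either "alias" or "phoneme" (hypothetical format for this sketch).
    """
    # Emit the PLS namespace as the default xmlns on the root element.
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, kind, value in entries:
        if kind not in ("alias", "phoneme"):
            raise ValueError(f"unknown entry kind: {kind!r}")
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}{kind}").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_lexicon([
        ("SQL", "alias", "sequel"),
        ("X Æ A-12", "phoneme", "ɛks æʃ eɪ twɛlv"),
    ]))
```

The returned string can be saved as a .pls file for the CLI upload, or passed as the lexicon content in an API call. Special characters in graphemes and aliases are escaped by ElementTree.
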
# How to Create & upload via AWS CLI:

aws polly put-lexicon --name twtechLexicon --content file://twtechlexicon.pls


# Reference in speech synthesis:

# bash

aws polly synthesize-speech \

--output-format mp3 \

--voice-id Joanna \

 --text "I love SQL." \

 --lexicon-names twtechLexicon \

 output.mp3

 3. Amazon Polly SSML (Speech Synthesis Markup Language)

    • SSML (Speech Synthesis Markup Language) is an XML dialect that lets twtech control how Polly speaks text.

# Common SSML Tags in Polly

    • <speak>: root element
    • <break time="5s"/>: pause
    • <prosody rate="slow" pitch="+10%">: adjust speed, pitch, volume
    • <phoneme alphabet="ipa" ph="dʒoʊˈænə">Joanna</phoneme>: custom pronunciation
    • <say-as interpret-as="digits">1234</say-as>: control reading style (digits, date, time, currency, etc.)
    • <amazon:effect name="drc">...</amazon:effect>: Dynamic Range Compression
    • <amazon:auto-breaths>: more natural breathing (Standard voices only)
    • <amazon:domain name="conversational">...</amazon:domain>: Neural TTS styles (news, conversational, etc.)
    • <emphasis level="strong">important</emphasis>: add stress
    • <p> / <s>: paragraphs / sentences
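
When SSML is assembled from ordinary strings, characters such as & and < in the spoken text must be escaped, or the markup is no longer valid XML. The following is a minimal sketch using only the Python standard library; the helper names (plain, pause, say_as, phoneme, speak) are hypothetical and cover only a few of the tags listed above.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape ordinary text so it is safe inside an SSML document."""
    return escape(text)


def pause(milliseconds):
    """Return a <break> tag for a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def say_as(text, interpret_as):
    """Wrap text in <say-as>, e.g. interpret_as="characters" or "digits"."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def phoneme(text, ipa):
    """Attach an explicit IPA pronunciation to a word."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def speak(*fragments):
    """Join already-built fragments and wrap them in the <speak> root."""
    return "<speak>" + " ".join(fragments) + "</speak>"


if __name__ == "__main__":
    print(speak(
        plain("Q&A time."),
        pause(300),
        plain("The acronym"),
        say_as("SQL", "characters"),
        plain("is often pronounced"),
        phoneme("SQL", "ˈsiːkwəl"),
    ))
```

Only text passed through plain() or a tag helper is escaped; speak() trusts its fragments, so raw user input should never be handed to it directly.
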

# Sample

<speak>

  Hello  twtech team ! 

  <break time="300ms"/> 

  Today we’ll learn <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>. 

  The acronym <say-as interpret-as="characters">SQL</say-as> 

  is often pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>.

</speak>

 4. Lexicons vs SSML

Feature     | Lexicons                                                           | SSML
Scope       | Reusable across requests                                           | Inline, per request
Storage     | Persists in the Polly service                                      | Inside the text of the request
Use case    | Consistent brand/product name pronunciation                        | Fine-grained speech control (intonation, pauses, styles)
Format      | PLS (Pronunciation Lexicon Specification) XML, in a separate file  | XML tags inside the text
Flexibility | Mostly for individual words and phrases                            | Full control over speech

Best practice:

    • Use Lexicons for words/phrases that twtech reuses across multiple syntheses.
    • Use SSML for contextual control (tone, pacing, style).

 5. Advanced Tips

    • Combine Lexicons + SSML
      • twtech can load a lexicon (for consistent pronunciation) and still use SSML for pauses, pitch, or emphasis.
    • Multiple Lexicons
      • twtech can pass up to 5 lexicons per request.
    • Neural TTS Enhancements
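      • With Neural voices, <amazon:domain> and <amazon:effect name="drc"> can make speech sound more human-like (e.g., the newscaster style via name="news" on supported voices). Some tags, such as <emphasis> and <amazon:auto-breaths>, are documented as unavailable for Neural voices, so check the supported-tags table for the engine in use.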
    • Testing
      • Always test IPA vs alias: IPA gives precision, but alias may sound more natural in casual text.
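
A request that names too many lexicons, or a lexicon with an invalid name, only fails once it reaches the service. The following is a minimal sketch of a local pre-check; the helper name check_lexicon_names is hypothetical, the limit of 5 comes from the tip above, and the name pattern (alphanumeric, up to 20 characters) is an assumption taken from the Polly API reference that is worth confirming there.

```python
import re

# Limit stated in the tips above.
MAX_LEXICONS_PER_REQUEST = 5

# Assumption: lexicon names are case-sensitive alphanumeric strings of
# 1 to 20 characters, as described in the Polly API reference.
LEXICON_NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")


def check_lexicon_names(names):
    """Return the names as a list, or raise ValueError if any rule is broken."""
    names = list(names)
    if len(names) > MAX_LEXICONS_PER_REQUEST:
        raise ValueError(
            f"{len(names)} lexicons requested; at most "
            f"{MAX_LEXICONS_PER_REQUEST} are allowed per request"
        )
    for name in names:
        if not LEXICON_NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid lexicon name: {name!r}")
    return names


if __name__ == "__main__":
    print(check_lexicon_names(["twtechLexicon"]))
```

Under this pattern a name such as twtechLexicon passes, while a hyphenated name such as twtech-lexicon is rejected before any request is sent.
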

 6. When to Use What

    • Lexicon: company names, product names, acronyms (SQL → sequel), unusual words.
    • SSML: adjusting how words are said (slower, louder, emphasized, with a pause).

Insights:

A hands-on end-to-end Amazon Polly project that uses both Lexicons and SSML.

Scope:

    • The lexicon XML file,
    • The SSML (Speech Synthesis Markup Language) script,
    • AWS CLI commands.

 Step 1. Create a Lexicon File

# Save this as:  twtechlexicon.pls

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

                             http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"

         alphabet="ipa"

         xml:lang="en-US">

  <!-- Example: SQL normally spelled out, we want "sequel" -->

  <lexeme>

    <grapheme>SQL</grapheme>

    <phoneme>ˈsiːkwəl</phoneme>

  </lexeme>

  <!-- Example: Company brand name -->

  <lexeme>

    <grapheme>AcmeX</grapheme>

    <alias>Acme Ex</alias>

  </lexeme>

</lexicon>

# Uploading a Lexicon File to Polly:

# bash

aws polly put-lexicon \

  --name twtechLexicon \

  --content file://twtechlexicon.pls

# How to check whether the lexicon file was successfully uploaded:

aws polly list-lexicons

 Step 2. Write SSML (Speech Synthesis Markup Language) Script

# Save this as script.ssml:

<speak>

  Hello  team! 

  <break time="300ms"/> 

  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>. 

  Did you know that <say-as interpret-as="characters">SQL</say-as> 

  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>? 

  Our brand name is <emphasis level="moderate">AcmeX</emphasis>, 

  and thanks to the lexicon, it is always pronounced correctly! 

  <amazon:domain name="conversational">

    That’s pretty cool, right?

  </amazon:domain>

</speak>

 Step 3. Synthesize Speech with Lexicon + SSML

# Run:

aws polly synthesize-speech \

  --output-format mp3 \

  --voice-id Joanna \

  --text-type ssml \

  --text file://script.ssml \

  --lexicon-names twtechLexicon \

  output.mp3

 Step 4. Play the Audio

  • Once the command finishes, twtech will have output.mp3 in the current directory.
  • Open it in an audio player. twtech should hear:

    • A natural pause
    • Slower, slightly higher pitch for “Amazon Polly”
    • “SQL” spelled out once (via <say-as>), then pronounced sequel (via the inline <phoneme> tag; the lexicon entry gives the same pronunciation to any untagged “SQL”)
    • “AcmeX” expanded to “Acme Ex” (via lexicon)
    • Conversational tone for the last line

Step 5. Clean Up (If twtech wants to remove unwanted lexicon):

aws polly delete-lexicon --name twtechLexicon

# The setup above shows Lexicon + SSML working together in Amazon Polly.

  • The same Lexicon + SSML setup, this time in Python with Boto3.

Scope:

    • Uploading a lexicon
    • Listing lexicons
    • Synthesizing SSML text with that lexicon
    • Saving the result to an MP3

 Step 1. Install Dependencies

pip install boto3

# twtech makes sure the AWS credentials have been configured (aws configure) with permissions for Polly.

 Step 2. Full Python Script

# twtech-python-polly_setup.py

import boto3

# Initialize Polly client

polly = boto3.client("polly")

#  Uploading Lexicon file (twtech-python-polly_setup.py)

lexicon_name = "twtechLexicon"

lexicon_content = """<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

                             http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"

         alphabet="ipa"

         xml:lang="en-US">

  <lexeme>

    <grapheme>SQL</grapheme>

    <phoneme>ˈsiːkwəl</phoneme>

  </lexeme>

  <lexeme>

    <grapheme>AcmeX</grapheme>

    <alias>Acme Ex</alias>

  </lexeme>

</lexicon>

"""

print("Uploading Lexicon...")

polly.put_lexicon(Name=lexicon_name, Content=lexicon_content)

# List Lexicons 

print("Available lexicons:")

response = polly.list_lexicons()

for lex in response["Lexicons"]:

    print(" -", lex["Name"])

#  SSML Text 

ssml_text = """<speak>

  Hello team!

  <break time="300ms"/>

  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.

  Did you know that <say-as interpret-as="characters">SQL</say-as>

  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>?

  Our brand name is <emphasis level="moderate">AcmeX</emphasis>,

  and thanks to the lexicon, it is always pronounced correctly!

  <amazon:domain name="conversational">

    That’s pretty cool, right?

  </amazon:domain>

</speak>

"""

#  Synthesize Speech

print("Synthesizing speech with SSML + Lexicon...")

response = polly.synthesize_speech(

    Engine="neural",              # use neural TTS if available

    OutputFormat="mp3",

    VoiceId="Joanna",

    TextType="ssml",

    Text=ssml_text,

    LexiconNames=[lexicon_name]

)

# Save audio stream to file

with open("output.mp3", "wb") as f:

    f.write(response["AudioStream"].read()) 

print("Saved output.mp3 (play it to hear the result).")

 Step 3. Run It (python twtech-python-polly_setup.py)

    • It will upload the lexicon.
    • Print the available lexicons.
    • Generate output.mp3.
    • Play it: twtech should hear the same controlled pronunciation and SSML effects as in the CLI version.

 Step 4. Clean Up unwanted lexicon (Optional)

polly.delete_lexicon(Name="twtechLexicon")

NB:

    • The above script is self-contained.
    • The Script (twtech-python-polly_setup.py) uploads the lexicon, synthesizes, and saves the MP3 in one run.
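
Before synthesizing, it can be useful to confirm that the lexicon really exists in the client's Region. The following is a minimal sketch, assuming the object passed in behaves like a Boto3 Polly client: it relies on the list_lexicons call used in the script above and on its NextToken paging field. The helper name lexicon_names is hypothetical.

```python
def lexicon_names(polly):
    """Yield the name of every lexicon visible to the given Polly client.

    polly is expected to behave like boto3.client("polly"); each page
    of results is requested with the NextToken of the previous page.
    """
    kwargs = {}
    while True:
        page = polly.list_lexicons(**kwargs)
        for lexicon in page.get("Lexicons", []):
            yield lexicon["Name"]
        token = page.get("NextToken")
        if not token:
            return
        kwargs = {"NextToken": token}
```

A typical check is: "twtechLexicon" in set(lexicon_names(boto3.client("polly"))). Taking the client as an argument keeps the helper easy to exercise with a stand-in object, without AWS credentials.
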


Project: Hands-On

  • How twtech creates and uses Amazon Polly (UI)

Search for the aws service: Polly

  • How it works

  • Use cases

  • Try Polly:

Input text: 

Hi! My name is Pat. I am an IT engineer with Dominion Systems Ontario Canada.

How twtech enables SSML (Speech Synthesis Markup Language) for the voice:

  • Input text with SSML: 

<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer with Dominion Systems Ontario Canada.</speak>

How twtech enables SSML (Speech Synthesis Markup Language) and customizes pronunciation with an uploaded lexicon file: lexicon



Amazon Polly Deep Dive on How to:


       Create a lexicon file with VSCode,
       Save the file in .xml format,
       Upload the lexicon file to Polly,
       Customize the lexicon to use special references for words or sentences.

Here’s a twtech lexicon file sample (twtechLexicon.xml):

# Link to file:

https://github.com/Devopspat35/twtech-public-codes/blob/master/twtech-lexicon.xml.txt

# twtechLexicon.xml

<?xml version="1.0" encoding="UTF-8"?>

 <lexicon version="1.0"

     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"

     alphabet="x-sampa" xml:lang="en-US">

<lexeme>

<grapheme>Pat</grapheme>

<alias>Patrick</alias>

</lexeme>

<lexeme>

    <grapheme>Dominion</grapheme>

        <!-- Tell Polly to say " I am an IT engineer with Dominion Systems Ontario Canada " -->

    <alias>I am an IT engineer with Dominion Systems Ontario Canada</alias>

</lexeme>

 </lexicon> 

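How twtechLexicon.xml works:

  • grapheme = the written word Polly will encounter in SSML/text.
  • alias = the replacement text Polly speaks instead of the grapheme (both entries in this file use it).
  • phoneme = an explicit pronunciation, an alternative to alias (not used in this file).
  • xml:lang = language of the lexicon (important).
  • alphabet="x-sampa" is set in this file; twtech can also use ipa.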
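
To double-check what a lexicon file maps before uploading it, the entries can be read back locally. The following is a minimal sketch using only the Python standard library; the helper name lexicon_entries is hypothetical, and SAMPLE is a shortened copy of the file above.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Pat</grapheme>
    <alias>Patrick</alias>
  </lexeme>
  <lexeme>
    <grapheme>Dominion</grapheme>
    <alias>I am an IT engineer with Dominion Systems Ontario Canada</alias>
  </lexeme>
</lexicon>
"""


def lexicon_entries(xml_text):
    """Return (grapheme, kind, value) tuples for every lexeme in a PLS file."""
    # strip() guards against stray whitespace before the XML declaration,
    # which would otherwise make the document unparseable.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    entries = []
    for lexeme in root.findall("pls:lexeme", NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=NS)
        alias = lexeme.findtext("pls:alias", namespaces=NS)
        if alias is not None:
            entries.append((grapheme, "alias", alias))
        else:
            phoneme = lexeme.findtext("pls:phoneme", namespaces=NS)
            entries.append((grapheme, "phoneme", phoneme))
    return entries


if __name__ == "__main__":
    for grapheme, kind, value in lexicon_entries(SAMPLE):
        print(f"{grapheme} -> {value} ({kind})")
```

A parse error here usually means the file is not well-formed XML, which is worth fixing before the upload step.
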

The file can be developed from VS-Code and saved in: .xml format

  • Choose a location and save the file: save as


  • How twtech uploads the lexicon file from the saved location: Download folder

  • Upload lexicon file: twtechLexicon.xml

  • Lexicon content can be viewed, deleted, or downloaded to modify and then re-uploaded.

  • How to test if the uploaded lexicon file is working: twtechLexicon.xml

  • Choose the lexicon file created for customized pronunciation: twtechLexicon.xml

  • When listening to the text audio with the customized lexicon, Pat will be pronounced as: Patrick.

  • Dominion is replaced by a whole sentence: I am an IT engineer with Dominion Systems Ontario Canada


#CLI:

How twtech can also upload the file (twtechLexicon.xml), to Polly: CLI

aws polly put-lexicon --name twtechLex --content file://twtechLexicon.xml

twtech may choose to use: SSML

    • Speech Synthesis Markup Language (SSML) tags allow twtech to modify speech output, for example by selecting a Newscaster voice, changing the phonetic pronunciation of a word, or adding a pause.

Link to Official documentation on generating speech with SSML.


Input Text example :

<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer with Dominion Systems Ontario Canada.</speak>

How twtech saves the audio from text to s3 bucket: twtechs3

  • Save to s3 bucket: twtechs3


Save to S3: S3 output bucket

    • Make sure that the S3 bucket (twtechs3) is created in the same Region as the synthesis task request and that the twtech IAM user can write to it.
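
The same save-to-S3 flow can be started from code. The following is a minimal sketch, assuming the object passed in behaves like a Boto3 Polly client and using its start_speech_synthesis_task call; the helper name start_s3_synthesis and its defaults (Joanna, mp3, SSML input) are hypothetical choices for illustration.

```python
def start_s3_synthesis(polly, text, bucket, *, voice_id="Joanna",
                       text_type="ssml", lexicon_names=(), key_prefix=None):
    """Start an asynchronous synthesis task that writes its audio to S3.

    polly is expected to behave like boto3.client("polly").
    Returns (task_id, output_uri) taken from the service response.
    """
    params = {
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    # Optional parameters are only sent when they are actually set.
    if lexicon_names:
        params["LexiconNames"] = list(lexicon_names)
    if key_prefix:
        params["OutputS3KeyPrefix"] = key_prefix
    task = polly.start_speech_synthesis_task(**params)["SynthesisTask"]
    return task["TaskId"], task.get("OutputUri")
```

For example: start_s3_synthesis(boto3.client("polly"), ssml_text, "twtechs3", lexicon_names=["twtechLex"]). The task runs asynchronously, so the MP3 appears in the bucket only after the task completes; the returned task ID is what identifies the audio file in the bucket.
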


  • twtech needs to Verify from the s3 bucket if the audio mp3 is saved: twtechs3

Yes: successfully saved


  • How to listen to or download the audio from the URL created: search for the audio in the bucket using its ID and click Open.

  • How to download the text audio created from the bucket: twtechs3

  • How to listen to the text audio created from the bucket: twtechs3

  • Copy the url and paste on the browser:

  • Text Audio can also be downloaded from browser as it plays:






