Saturday, September 13, 2025

Amazon Polly Lexicon & SSML | Overview & Hands-On.

Scope:

Amazon Polly Basics,
Amazon Polly Lexicons,
Key Features,
Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries,
How to Create & upload via AWS CLI,
Amazon Polly SSML (Speech Synthesis Markup Language),
Lexicons vs SSML,
Best practice,
Advanced Tips,
When to Use What,
Insights,
Project: Hands-On.

Intro:

Amazon Polly uses both pronunciation lexicons (lexicons) and the Speech Synthesis Markup Language (SSML) to customize and control how text is converted into lifelike speech.

1. Amazon Polly Basics

Amazon Polly is AWS’s text-to-speech (TTS) service.
By default, it reads plain text, but twtech can enrich and control pronunciation, prosody, and style using Lexicons and SSML (Speech Synthesis Markup Language).

2. Amazon Polly Lexicons

Lexicons are pronunciation dictionaries that twtech may create and store in Amazon Polly.

Key Features

Define custom pronunciations using IPA (International Phonetic Alphabet) or X-SAMPA phonetic notation.
Store them in XML-based PLS (Pronunciation Lexicon Specification) format.
Useful for:

Brand names: e.g., "X Æ A-12" → “Ex Ash A Twelve”
Acronyms: "SQL" → “Sequel”
Industry jargon or regional terms

# Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <alias>sequel</alias>
  </lexeme>
  <lexeme>
    <grapheme>X Æ A-12</grapheme>
    <phoneme>ɛks æʃ eɪ twɛlv</phoneme>
  </lexeme>
</lexicon>

<grapheme> → word spelling
<alias> → text substitution (simple replacement)
<phoneme> → explicit pronunciation (written in IPA, the International Phonetic Alphabet, or in X-SAMPA, as declared by the alphabet attribute)
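The lexicon structure above can also be generated from data instead of being typed by hand. The following is a minimal sketch using only Python's standard library; the function name build_lexicon and the (grapheme, kind, value) tuple layout are illustrative assumptions, not part of Polly or of the PLS specification.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, alphabet="ipa", lang="en-US"):
    """Return a PLS document as a string (hypothetical helper).

    entries: list of (grapheme, kind, value) tuples, where kind is
    either "alias" or "phoneme".
    """
    # Serialize PLS elements in the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, kind, value in entries:
        if kind not in ("alias", "phoneme"):
            raise ValueError(f"unknown entry kind: {kind}")
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}{kind}").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# The two entries from the lexicon shown above.
lexicon_xml = build_lexicon(
    [
        ("SQL", "alias", "sequel"),
        ("X Æ A-12", "phoneme", "ɛks æʃ eɪ twɛlv"),
    ]
)
print(lexicon_xml)
```

The returned string can be written to a .pls file, or passed as the Content argument of put_lexicon in Boto3.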

# How to Create & upload via AWS CLI:

aws polly put-lexicon --name twtechLexicon --content file://twtechlexicon.pls

# Reference in speech synthesis:

# bash

aws polly synthesize-speech \
  --output-format mp3 \
  --voice-id Joanna \
  --text "I love SQL." \
  --lexicon-names twtechLexicon \
  output.mp3

3. Amazon Polly SSML (Speech Synthesis Markup Language)

SSML (Speech Synthesis Markup Language) is an XML dialect that lets twtech control how Polly speaks text.

# Common SSML Tags in Polly

<speak> → Root element

<break time="5s"/> → Pause

<prosody rate="slow" pitch="+10%"> → Adjust speed, pitch, volume

<phoneme alphabet="ipa" ph="ˈdʒɒ.nə">Joanna</phoneme> → Custom pronunciation

<say-as interpret-as="digits">1234</say-as> → Control reading style (digits, date, time, telephone, etc.)

<amazon:effect name="drc">...</amazon:effect> → Dynamic Range Compression

<amazon:auto-breaths> → More natural breathing

<amazon:domain name="conversational">...</amazon:domain> → Neural TTS styles (news, conversational, etc.)

<emphasis level="strong">important</emphasis> → Add stress

<p> / <s> → Paragraphs / sentences

# Sample

<speak>
  Hello twtech team!
  <break time="300ms"/>
  Today we’ll learn <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  The acronym <say-as interpret-as="characters">SQL</say-as>
  is often pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>.
</speak>
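When SSML like the sample above is assembled from variables, the text has to be XML-escaped, otherwise a stray "&" or "<" makes the request invalid. Below is a small sketch using only the standard library; the helper names element, pause and speak are hypothetical, and the final parse is just a well-formedness check, not a check against Polly's supported tag list.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def element(name, text, **attrs):
    """Wrap escaped text in an SSML element; underscores in keys become hyphens."""
    attr_text = "".join(
        f" {key.replace('_', '-')}={quoteattr(value)}" for key, value in attrs.items()
    )
    return f"<{name}{attr_text}>{escape(text)}</{name}>"


def pause(time):
    """Return a <break> tag, for example pause("300ms")."""
    return f"<break time={quoteattr(time)}/>"


def speak(*parts):
    """Join already-escaped fragments under the <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    escape("Hello twtech team!"),
    pause("300ms"),
    escape("Today we'll learn"),
    element("prosody", "Amazon Polly", rate="slow", pitch="+5%"),
    escape("The acronym"),
    element("say-as", "SQL", interpret_as="characters"),
    escape("is often pronounced"),
    element("phoneme", "SQL", alphabet="ipa", ph="ˈsiːkwəl"),
)

ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```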

4. Lexicons vs SSML

Feature	Lexicons	SSML
Scope	Global, reusable	Inline, per request
Storage	Persisted in the Polly service	Inside the text request
Use case	Consistent brand/product name pronunciation	Fine-grained speech control (intonation, pauses, styles)
Format	PLS (Pronunciation Lexicon Specification) XML, in a separate file	XML tags inside the text
Flexibility	Mostly for words	Full control over speech

Best practice:

Use Lexicons for words/phrases that twtech reuses across multiple syntheses.
Use SSML for contextual control (tone, pacing, style).

5. Advanced Tips

Combine Lexicons + SSML

twtech can load a lexicon (for consistent pronunciation) and still use SSML for pauses, pitch, or emphasis.

Multiple Lexicons

twtech can pass up to 5 lexicons per request.
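Checking the lexicon list before calling Polly gives a clearer error than a rejected request. The sketch below builds the keyword arguments for Boto3's synthesize_speech and enforces the five-lexicon limit mentioned above. The helper name build_synthesis_request is hypothetical, and the name pattern (1 to 20 letters or digits) is an assumption based on the PutLexicon naming rules, so it is worth confirming against the current API reference.

```python
import re

MAX_LEXICONS = 5  # per-request limit stated above
# Assumed naming rule for lexicons: 1-20 alphanumeric characters.
NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")


def build_synthesis_request(text, lexicon_names, voice_id="Joanna", text_type="ssml"):
    """Return keyword arguments for polly.synthesize_speech (hypothetical helper)."""
    if len(lexicon_names) > MAX_LEXICONS:
        raise ValueError(f"at most {MAX_LEXICONS} lexicons per request")
    for name in lexicon_names:
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid lexicon name: {name!r}")
    return {
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
        "TextType": text_type,
        "Text": text,
        "LexiconNames": list(lexicon_names),
    }


request = build_synthesis_request("<speak>I love SQL.</speak>", ["twtechLexicon"])
print(request["LexiconNames"])
```

With a Boto3 client, the result would be used as polly.synthesize_speech(**request).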

Neural TTS Enhancements

With Neural voices, <amazon:domain> and <amazon:effect> make speech sound more human-like (e.g., “newscaster” style).

Testing

Always test IPA vs alias — IPA gives precision, but alias may sound more natural in casual text.

6. When to Use What

Lexicon → Company name, product name, acronyms (SQL → sequel), unusual words.
SSML → Adjust how words are said (slower, louder, emphasized, with a pause).

Insights:

A hands-on end-to-end Amazon Polly project that uses both Lexicons and SSML.

Scope:

The lexicon XML file,
The SSML (Speech Synthesis Markup Language) script,
AWS CLI commands.

Step 1. Create a Lexicon File

# Save this as: twtechlexicon.pls

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <phoneme>ˈsiːkwəl</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>AcmeX</grapheme>
    <alias>Acme Ex</alias>
  </lexeme>
</lexicon>

# Uploading a Lexicon File to Polly:

# bash

aws polly put-lexicon \
--name twtechLexicon \
--content file://twtechlexicon.pls

# How to check that the lexicon file was uploaded successfully:

aws polly list-lexicons
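The same check can be scripted. The sketch below takes an already-created Boto3 Polly client as a parameter and follows NextToken in case the listing is paginated; the function names list_lexicon_names and lexicon_exists are hypothetical.

```python
def list_lexicon_names(polly):
    """Collect every lexicon name, following NextToken across pages."""
    names = []
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = polly.list_lexicons(**kwargs)
        names.extend(item["Name"] for item in page.get("Lexicons", []))
        token = page.get("NextToken")
        if not token:
            return names


def lexicon_exists(polly, name):
    """Return True if a lexicon with this name is stored in the current region."""
    return name in list_lexicon_names(polly)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   polly = boto3.client("polly")
#   print(lexicon_exists(polly, "twtechLexicon"))
```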

Step 2. Write the SSML (Speech Synthesis Markup Language) Script

# Save this as script.ssml:

<speak>
  Hello team!
  <break time="300ms"/>
  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  Did you know that <say-as interpret-as="characters">SQL</say-as>
  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>?
  Our brand name is <emphasis level="moderate">AcmeX</emphasis>,
  and thanks to the lexicon, it is always pronounced correctly!
  <amazon:domain name="conversational">
    That’s pretty cool, right?
  </amazon:domain>
</speak>

Step 3. Synthesize Speech with Lexicon + SSML

# Run:

aws polly synthesize-speech \
  --output-format mp3 \
  --voice-id Joanna \
  --text-type ssml \
  --text file://script.ssml \
  --lexicon-names twtechLexicon \
  output.mp3

Step 4. Play the Audio

Once the command finishes, twtech will have output.mp3 in the current directory.
Open it in an audio player; twtech should hear:

A natural pause
Slower, slightly higher pitch for “Amazon Polly”
“SQL” spelled out once, then pronounced sequel (via the inline <phoneme> tag; the lexicon entry covers untagged occurrences)
“AcmeX” expanded to “Acme Ex” (via lexicon)
Conversational tone for the last line

Step 5. Clean Up (if twtech wants to remove the lexicon):

aws polly delete-lexicon --name twtechLexicon

# The above setup shows: Lexicon + SSML working together in Amazon Polly.

Lexicon + SSML in Amazon Polly: Setup with Python and Boto3.

Scope:

Uploading a lexicon
Listing lexicons
Synthesizing SSML text with that lexicon
Saving the result to an MP3

Step 1. Install Dependencies

pip install boto3

# twtech makes sure the AWS credentials have been configured (aws configure) with permissions for Polly.

Step 2. Full Python Script

# twtech-python-polly_setup.py

import boto3

# Initialize Polly client
polly = boto3.client("polly")

# Lexicon name and PLS content to upload
lexicon_name = "twtechLexicon"
lexicon_content = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <phoneme>ˈsiːkwəl</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>AcmeX</grapheme>
    <alias>Acme Ex</alias>
  </lexeme>
</lexicon>
"""

print("Uploading Lexicon...")
polly.put_lexicon(Name=lexicon_name, Content=lexicon_content)

# List Lexicons
print("Available lexicons:")
response = polly.list_lexicons()
for lex in response["Lexicons"]:
    print(" -", lex["Name"])

# SSML Text
ssml_text = """<speak>
  Hello team!
  <break time="300ms"/>
  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  Did you know that <say-as interpret-as="characters">SQL</say-as>
  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>?
  Our brand name is <emphasis level="moderate">AcmeX</emphasis>,
  and thanks to the lexicon, it is always pronounced correctly!
  <amazon:domain name="conversational">
    That’s pretty cool, right?
  </amazon:domain>
</speak>
"""

# Synthesize Speech
print("Synthesizing speech with SSML + Lexicon...")
response = polly.synthesize_speech(
    Engine="neural",  # request the neural engine (the voice must support it)
    OutputFormat="mp3",
    VoiceId="Joanna",
    TextType="ssml",
    Text=ssml_text,
    LexiconNames=[lexicon_name],
)

# Save audio stream to file
with open("output.mp3", "wb") as f:
    f.write(response["AudioStream"].read())

print("Saved output.mp3 (play it to hear the result).")

Step 3. Run It: python twtech-python-polly_setup.py

It will upload the lexicon.
Print available lexicons.
Generate output.mp3.
Play it; twtech should hear the same controlled pronunciation and SSML effects as in the CLI version.

Step 4. Clean Up unwanted lexicon (Optional)

polly.delete_lexicon(Name="twtechLexicon")

NB:

The above script is self-contained.
The Script (twtech-python-polly_setup.py) uploads the lexicon, synthesizes, and saves the MP3 in one run.
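For reuse across several texts, the synthesize-and-save step of the script above could be wrapped in a function. This is only a sketch: the function name synthesize_to_file is hypothetical, the client is passed in rather than created inside, and the Engine argument is left out so Polly picks its default engine.

```python
from contextlib import closing


def synthesize_to_file(polly, ssml_text, path, lexicon_names=(), voice_id="Joanna"):
    """Synthesize SSML with the given lexicons and write the MP3 to path.

    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(
        OutputFormat="mp3",
        VoiceId=voice_id,
        TextType="ssml",
        Text=ssml_text,
        LexiconNames=list(lexicon_names),
    )
    # Close the streaming body once the audio has been read.
    with closing(response["AudioStream"]) as stream, open(path, "wb") as f:
        data = stream.read()
        f.write(data)
    return len(data)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   polly = boto3.client("polly")
#   synthesize_to_file(polly, "<speak>I love SQL.</speak>", "output.mp3",
#                      lexicon_names=["twtechLexicon"])
```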

Project: Hands-On

How twtech uses Amazon Polly in the console (UI)

Search for the aws service: Polly

How it works

Use cases

Try Polly:

Input text:

Hi! My name is Pat. I am an IT engineer with Dominion Systems Ontario Canada.

How twtech enables SSML (Speech Synthesis Markup Language) for the voice:

Input text with SSML:

<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer with Dominion Systems Ontario Canada.</speak>

How twtech enables SSML (Speech Synthesis Markup Language) and customizes pronunciation for the voice with an uploaded lexicon file: lexicon

Amazon Polly Deep Dive on How to:

Create a lexicon file with VS Code,
Save the file in .xml format,
Upload the lexicon file to Polly,
Customize the lexicon to use special references for words or sentences.

Here’s a twtech lexicon file sample (twtechLexicon.xml):

# Link to file:

https://github.com/Devopspat35/twtech-public-codes/blob/master/twtech-lexicon.xml.txt

# twtechLexicon.xml

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Pat</grapheme>
    <alias>Patrick</alias>
  </lexeme>
  <lexeme>
    <grapheme>Dominion</grapheme>
    <alias>I am an IT engineer with Dominion Systems Ontario Canada</alias>
  </lexeme>
</lexicon>

How twtechLexicon.xml works:

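grapheme = the written word Polly will encounter in SSML/text.
alias = the replacement text Polly speaks in place of the grapheme (used for both entries in this file).
phoneme = an explicit pronunciation, written in the alphabet declared on <lexicon> (not used in this file).
xml:lang = language of the lexicon (important: the lexicon is applied only to voices of the same language).
alphabet="x-sampa" → twtech can also use ipa.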
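To confirm what a lexicon file contains before uploading it, the entries can be read back with the standard library. The sketch below is an illustration, not a PLS validator: the function name read_lexicon is hypothetical, and the inline sample is a shortened copy of twtechLexicon.xml without the schema attributes.

```python
import xml.etree.ElementTree as ET

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lexicon(xml_text):
    """Return (alphabet, language, entries); entries map grapheme -> (kind, value)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for lexeme in root.iter(PLS + "lexeme"):
        kind = "alias"
        replacement = lexeme.find(PLS + "alias")
        if replacement is None:
            kind = "phoneme"
            replacement = lexeme.find(PLS + "phoneme")
        if replacement is None:
            continue  # lexeme without a pronunciation; skipped in this sketch
        for grapheme in lexeme.findall(PLS + "grapheme"):
            entries[grapheme.text] = (kind, replacement.text)
    return root.get("alphabet"), root.get(XML_LANG), entries


sample = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Pat</grapheme>
    <alias>Patrick</alias>
  </lexeme>
</lexicon>
"""

alphabet, language, entries = read_lexicon(sample)
print(alphabet, language, entries)
```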

The file can be developed in VS Code and saved in .xml format.

Choose a location and save the file: Save As

How twtech uploads the lexicon file from the saved location: Downloads folder

Upload lexicon file: twtechLexicon.xml

Lexicon content can be viewed, deleted, or downloaded, modified, and then re-uploaded.

How to test whether the uploaded lexicon file is working: twtechLexicon.xml

Choose the lexicon file created for customized pronunciation: twtechLexicon.xml

When listening to the text audio with the customized lexicon, Pat will be pronounced as: Patrick.

Dominion is replaced by a whole sentence: I am an IT engineer with Dominion Systems Ontario Canada
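Before listening, it can help to preview roughly what text the aliases will produce. The sketch below is only a local approximation and not Polly's actual matching logic: it assumes case-sensitive, whole-word matching and a single substitution pass, and the function name preview_aliases is hypothetical.

```python
import re


def preview_aliases(text, aliases):
    """Replace each whole-word grapheme with its alias in one pass (approximation)."""
    if not aliases:
        return text
    # Longest graphemes first, so a longer entry wins over a shorter prefix.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(g) for g in ordered) + r")\b")
    return pattern.sub(lambda match: aliases[match.group(1)], text)


aliases = {
    "Pat": "Patrick",
    "Dominion": "I am an IT engineer with Dominion Systems Ontario Canada",
}
text = "Hi! My name is Pat. I am an IT engineer with Dominion Systems Ontario Canada."
print(preview_aliases(text, aliases))
```

Under these assumptions, the input text from the console example ends up with the engineer sentence spoken twice around "Dominion", which is a reason to keep an alias limited to the words it is meant to replace.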

#CLI:

How twtech can also upload the file (twtechLexicon.xml) to Polly: CLI

aws polly put-lexicon --name twtechLex --content file://twtechLexicon.xml

twtech may choose to use: SSML

Speech Synthesis Markup Language (SSML) tags allow twtech to modify speech output, for example by selecting a Newscaster voice, changing the phonetic pronunciation of a word, or adding a pause.

Link to Official documentation on generating speech with SSML.