Amazon Polly Lexicon & SSML - Overview & Hands-On.
Scope:
- Amazon Polly Basics,
- Amazon Polly Lexicons,
- Key Features,
- Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries,
- How to Create & upload via AWS CLI,
- Amazon Polly SSML (Speech Synthesis Markup Language),
- Lexicons vs SSML,
- Best practice,
- Advanced Tips,
- When to Use What,
- Insights,
- Project: Hands-On.
Intro:
- Amazon Polly uses both pronunciation lexicons (lexicons) and the Speech Synthesis Markup Language (SSML) to customize and control how text is converted into lifelike speech.
1. Amazon Polly Basics
- Amazon Polly is AWS’s text-to-speech (TTS) service.
- By default, it reads plain text, but twtech can enrich and control pronunciation, prosody, and style using Lexicons and SSML (Speech Synthesis Markup Language).
2. Amazon Polly Lexicons
- Lexicons are pronunciation dictionaries that twtech may create and store in Amazon Polly.
Key Features
- Define custom pronunciations using IPA (International Phonetic Alphabet) or X-SAMPA phonetic notation.
- Store them in XML-based PLS (Pronunciation Lexicon Specification) format.
- Useful for:
- Brand names: e.g., "X Æ A-12" → “Ex Ash A Twelve”
- Acronyms: "SQL" → “Sequel”
- Industry jargon or regional terms
# Structure of Lexicon written in XML with <lexicon> root and <lexeme> entries:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <alias>sequel</alias>
  </lexeme>
  <lexeme>
    <grapheme>X Æ A-12</grapheme>
    <phoneme>ɛks æʃ eɪ twɛlv</phoneme>
  </lexeme>
</lexicon>
- <grapheme> → word spelling
- <alias> → text substitution (simple replacement)
- <phoneme> → explicit pronunciation (IPA or X-SAMPA, as declared by the alphabet attribute)
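Before uploading, it can help to check locally that a lexicon file is well-formed and to see which words it remaps. The sketch below is only an illustration: the helper name summarize_lexicon and the inline sample are assumptions made for this example (the sample reuses entries from this post), and it checks XML structure only, not whether Polly will accept the file.

```python
import xml.etree.ElementTree as ET

# Namespace used by PLS lexicons (same URI as the xmlns attribute above).
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative sample built from entries used in this post.
SAMPLE_PLS = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <phoneme>ˈsiːkwəl</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>AcmeX</grapheme>
    <alias>Acme Ex</alias>
  </lexeme>
</lexicon>
"""


def _q(tag):
    """Qualify a tag name with the PLS namespace for ElementTree lookups."""
    return "{%s}%s" % (PLS_NS, tag)


def summarize_lexicon(pls_text):
    """Return (grapheme, kind, value) tuples; kind is 'alias' or 'phoneme'."""
    root = ET.fromstring(pls_text.encode("utf-8"))  # raises ParseError if malformed
    if root.tag != _q("lexicon"):
        raise ValueError("root element must be <lexicon> in the PLS namespace")
    entries = []
    for lexeme in root.findall(_q("lexeme")):
        graphemes = [g.text for g in lexeme.findall(_q("grapheme"))]
        for kind in ("alias", "phoneme"):
            node = lexeme.find(_q(kind))
            if node is not None:
                entries.extend((g, kind, node.text) for g in graphemes)
                break  # report one replacement per lexeme
    return entries


if __name__ == "__main__":
    for grapheme, kind, value in summarize_lexicon(SAMPLE_PLS):
        print(grapheme, "->", kind, value)
```

To check a saved file, read it as text (UTF-8) and pass the content to summarize_lexicon.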
# Create & upload via AWS CLI:
# bash
aws polly put-lexicon --name twtechLexicon --content file://twtechlexicon.pls
# Reference in speech synthesis:
# bash
aws polly synthesize-speech \
  --output-format mp3 \
  --voice-id Joanna \
  --text "I love SQL." \
  --lexicon-names twtechLexicon \
  output.mp3
3. Amazon Polly SSML (Speech Synthesis Markup Language)
- SSML (Speech Synthesis Markup Language) is an XML dialect that lets twtech control how Polly speaks text.
# Common SSML Tags in Polly
<speak> → Root element
<break time="5s"/> → Pause
<prosody rate="slow" pitch="+10%"> → Adjust speed, pitch, volume
<phoneme alphabet="ipa" ph="ˈdʒɒ.nə">Joanna</phoneme> → Custom pronunciation
<say-as interpret-as="digits">1234</say-as> → Control reading style (digits, date, time, currency, etc.)
<amazon:effect name="drc">...</amazon:effect> → Dynamic Range Compression
<amazon:auto-breaths> → More natural breathing
<amazon:domain name="conversational">...</amazon:domain> → Neural TTS styles (news, conversational, etc.)
<emphasis level="strong">important</emphasis> → Add stress
<p> / <s> → Paragraphs / sentences
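Because SSML is XML, characters such as & and < in the spoken text have to be escaped, or the request is not valid markup. The sketch below shows one way to assemble a small SSML string in Python with the standard library. The helper names (plain, pause, prosody, speak) are made up for this example and are not part of any Polly SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape &, < and > so ordinary text is safe inside SSML."""
    return escape(text)


def pause(milliseconds):
    """Build a <break> tag for a pause of the given length."""
    return '<break time="%dms"/>' % int(milliseconds)


def prosody(text, rate="medium"):
    """Wrap escaped text in a <prosody> tag with the given rate."""
    return "<prosody rate=%s>%s</prosody>" % (quoteattr(rate), escape(text))


def speak(*parts):
    """Join already-built SSML fragments under the <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(speak(plain("R&D update:"), pause(300), prosody("Amazon Polly", rate="slow")))
```

The resulting string can be passed as the SSML text of a synthesis request; keeping the escaping in one place avoids hand-editing markup for every sentence.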
# Sample
<speak>
  Hello twtech team!
  <break time="300ms"/>
  Today we’ll learn <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  The acronym <say-as interpret-as="characters">SQL</say-as>
  is often pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>.
</speak>
4. Lexicons vs SSML
| Feature | Lexicons | SSML |
|---|---|---|
| Scope | Global, reusable | Inline, per request |
| Storage | Persist in Polly service | Inside text request |
| Use Case | Consistent brand/product name pronunciation | Fine-grained speech control (intonation, pauses, styles) |
| Format | PLS (Pronunciation Lexicon Specification) XML (separate file) | XML tags inside text |
| Flexibility | Mostly for words | Full control over speech |
Best practice:
- Use Lexicons for words/phrases that twtech reuses across multiple syntheses.
- Use SSML for contextual control (tone, pacing, style).
5. Advanced Tips
- Combine Lexicons + SSML
  - twtech can load a lexicon (for consistent pronunciation) and still use SSML for pauses, pitch, or emphasis.
- Multiple Lexicons
  - twtech can pass up to 5 lexicons per request.
- Neural TTS Enhancements
  - With Neural voices, <amazon:domain> and <amazon:effect> make speech sound more human-like (e.g., “newscaster” style).
- Testing
  - Always test IPA vs alias: IPA gives precision, but alias may sound more natural in casual text.
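The five-lexicon limit above can be checked before a request is sent. The sketch below is a local pre-check only. The name pattern (letters and digits, at most 20 characters) is an assumption taken from the Polly API reference as best recalled, so verify it against the current documentation. The function name check_lexicon_names is made up for this example.

```python
import re

# Limit stated in the tips above.
MAX_LEXICONS_PER_REQUEST = 5

# Assumption: lexicon names are alphanumeric, 1 to 20 characters.
LEXICON_NAME_RE = re.compile(r"[0-9A-Za-z]{1,20}")


def check_lexicon_names(names):
    """Return the names as a list, or raise ValueError if the request looks invalid."""
    names = list(names)
    if len(names) > MAX_LEXICONS_PER_REQUEST:
        raise ValueError(
            "at most %d lexicons per request, got %d"
            % (MAX_LEXICONS_PER_REQUEST, len(names))
        )
    for name in names:
        if not LEXICON_NAME_RE.fullmatch(name):
            raise ValueError("invalid lexicon name: %r" % name)
    return names


if __name__ == "__main__":
    print(check_lexicon_names(["twtechLexicon"]))
```

Under this assumed pattern a name containing a hyphen, such as twtech-lexicon, would be rejected, which is why the examples in this post use twtechLexicon.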
6. When to Use What
- Lexicon → Company name, product name, acronyms (SQL → sequel), unusual words.
- SSML → Adjust how words are said (slower, louder, emphasized, with a pause).
Insights:
A hands-on end-to-end Amazon Polly project that uses both Lexicons and SSML.
Scope:
- The lexicon XML file,
- The SSML (Speech Synthesis Markup Language) script,
- AWS CLI commands.
Step 1. Create a Lexicon File
# Save this as: twtechlexicon.pls
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <!-- Example: SQL normally spelled out, we want "sequel" -->
  <lexeme>
    <grapheme>SQL</grapheme>
    <phoneme>ˈsiːkwəl</phoneme>
  </lexeme>
  <!-- Example: Company brand name -->
  <lexeme>
    <grapheme>AcmeX</grapheme>
    <alias>Acme Ex</alias>
  </lexeme>
</lexicon>
# Uploading a Lexicon File to Polly:
# bash
aws polly put-lexicon \
--name twtechLexicon \
--content file://twtechlexicon.pls
# How to check that the lexicon file was uploaded successfully:
aws polly list-lexicons
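The same check can be scripted with Boto3. The sketch below assumes a Polly client is passed in (for example boto3.client("polly")) and follows the NextToken field in case the list is returned in pages. The function names are made up for this example.

```python
def list_lexicon_names(polly_client):
    """Collect the names of all stored lexicons, following NextToken pages."""
    names = []
    kwargs = {}
    while True:
        page = polly_client.list_lexicons(**kwargs)
        names.extend(item["Name"] for item in page.get("Lexicons", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs = {"NextToken": token}


def lexicon_exists(polly_client, name):
    """Return True if a lexicon with this exact name is stored."""
    return name in list_lexicon_names(polly_client)


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   print(lexicon_exists(boto3.client("polly"), "twtechLexicon"))
```

Taking the client as a parameter keeps the helper easy to test with a stub object and lets the caller choose the region.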
Step 2. Write SSML (Speech Synthesis Markup Language) Script
# Save this as script.ssml:
<speak>
  Hello team!
  <break time="300ms"/>
  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  Did you know that <say-as interpret-as="characters">SQL</say-as>
  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>?
  Our brand name is <emphasis level="moderate">AcmeX</emphasis>,
  and thanks to the lexicon, it is always pronounced correctly!
  <amazon:domain name="conversational">
    That’s pretty cool, right?
  </amazon:domain>
</speak>
Step 3. Synthesize Speech with Lexicon + SSML
# Run:
aws polly synthesize-speech \
  --output-format mp3 \
  --voice-id Joanna \
  --text-type ssml \
  --text file://script.ssml \
  --lexicon-names twtechLexicon \
  output.mp3
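Step 4. Play the Audio
- Once the command finishes, twtech will have output.mp3 in its current directory.
- Open it in an audio player; twtech should hear:
  - A natural pause
  - Slower, slightly higher pitch for “Amazon Polly”
  - “SQL” spelled out once (via <say-as>), then pronounced “sequel” (via the <phoneme> tag; the lexicon entry gives untagged “SQL” the same pronunciation)
  - “AcmeX” expanded to “Acme Ex” (via lexicon)
  - Conversational tone for the last line (<amazon:domain> styles are a neural-voice feature, so this needs --engine neural, while <emphasis> is a standard-voice tag; Polly may reject or ignore whichever tag the chosen engine does not support)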
Step 5. Clean Up (If twtech wants to remove unwanted lexicon):
aws polly delete-lexicon --name twtechLexicon
# The above setup shows: Lexicon + SSML working together in Amazon Polly.
- Lexicon + SSML in Amazon Polly: Setup with Python and Boto3.
Scope:
- Uploading a lexicon
- Listing lexicons
- Synthesizing SSML text with that lexicon
- Saving the result to an MP3
Step 1. Install Dependencies
pip install boto3
# twtech makes sure the AWS credentials have been configured (aws configure) with permissions for Polly.
Step 2. Full Python Script
# twtech-python-polly_setup.py
import boto3

# Initialize Polly client
polly = boto3.client("polly")

# Upload the lexicon
lexicon_name = "twtechLexicon"
lexicon_content = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/pronunciation-lexicon/lexicon.xsd"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>SQL</grapheme>
    <phoneme>ˈsiːkwəl</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>AcmeX</grapheme>
    <alias>Acme Ex</alias>
  </lexeme>
</lexicon>
"""
print("Uploading Lexicon...")
polly.put_lexicon(Name=lexicon_name, Content=lexicon_content)

# List Lexicons
print("Available lexicons:")
response = polly.list_lexicons()
for lex in response["Lexicons"]:
    print(" -", lex["Name"])

# SSML Text
ssml_text = """<speak>
  Hello team!
  <break time="300ms"/>
  Today we’ll learn about <prosody rate="slow" pitch="+5%">Amazon Polly</prosody>.
  Did you know that <say-as interpret-as="characters">SQL</say-as>
  is usually pronounced <phoneme alphabet="ipa" ph="ˈsiːkwəl">SQL</phoneme>?
  Our brand name is <emphasis level="moderate">AcmeX</emphasis>,
  and thanks to the lexicon, it is always pronounced correctly!
  <amazon:domain name="conversational">
    That’s pretty cool, right?
  </amazon:domain>
</speak>
"""

# Synthesize Speech
# Note: neural voices do not support every SSML tag (for example <emphasis>);
# if Polly rejects the request, remove the tags the chosen engine lacks.
print("Synthesizing speech with SSML + Lexicon...")
response = polly.synthesize_speech(
    Engine="neural",  # use neural TTS if available
    OutputFormat="mp3",
    VoiceId="Joanna",
    TextType="ssml",
    Text=ssml_text,
    LexiconNames=[lexicon_name],
)

# Save audio stream to file
with open("output.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
print("Saved output.mp3 (play it to hear the result).")
Step 3. Run It: python twtech-python-polly_setup.py
- It will upload the lexicon.
- Print available lexicons.
- Generate output.mp3.
- Play it: twtech should hear the same controlled pronunciation and SSML effects as in the CLI version.
Step 4. Clean Up unwanted lexicon (Optional)
polly.delete_lexicon(Name="twtechLexicon")
NB:
- The above script is self-contained.
- The Script (twtech-python-polly_setup.py) uploads the lexicon, synthesizes, and saves the MP3 in one run.
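The script reads the whole audio stream into memory before writing it. For longer texts, a chunked copy that also closes the stream may be preferable. The sketch below is one possible variant; the function name save_audio and the chunk size are assumptions made for this example. It expects the response dictionary returned by synthesize_speech.

```python
from contextlib import closing


def save_audio(response, path, chunk_size=8192):
    """Write response["AudioStream"] to path in chunks; return bytes written."""
    stream = response.get("AudioStream")
    if stream is None:
        raise RuntimeError("response contains no AudioStream")
    written = 0
    # closing() makes sure the stream is closed even if writing fails.
    with closing(stream), open(path, "wb") as audio_file:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            audio_file.write(chunk)
            written += len(chunk)
    return written


# Example usage with the script above:
#   size = save_audio(response, "output.mp3")
#   print("Saved", size, "bytes")
```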
Project: Hands-On
- How twtech creates and uses Amazon Polly (console UI)
Search for the AWS service: Polly
- How it works
- Use cases
- Try Polly:
Input text:
Hi! My name is Pat. I am an IT engineer with Dominion Systems Ontario Canada.
How twtech enables SSML (Speech Synthesis Markup Language) for the voice:
- Input text with SSML:
<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer with Dominion Systems Ontario Canada.</speak>
How twtech enables SSML (Speech Synthesis Markup Language) and customizes pronunciation with an uploaded lexicon file:
Amazon Polly Deep Dive on How to:
Save the file in .xml format,
Upload the lexicon file to Polly,
Customize the lexicon to use special references for words or sentences.
Here’s a twtech lexicon file sample (twtechLexicon.xml):
# Link to file:
https://github.com/Devopspat35/twtech-public-codes/blob/master/twtech-lexicon.xml.txt
# twtechLexicon.xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="x-sampa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>Pat</grapheme>
    <alias>Patrick</alias>
  </lexeme>
  <lexeme>
    <grapheme>Dominion</grapheme>
    <!-- Tell Polly to say "I am an IT engineer with Dominion Systems Ontario Canada" -->
    <alias>I am an IT engineer with Dominion Systems Ontario Canada</alias>
  </lexeme>
</lexicon>
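How twtechLexicon.xml works:
- grapheme = the written word Polly will encounter in SSML/text.
- alias = the replacement text Polly speaks instead of the grapheme (this file uses aliases only).
- phoneme = an explicit pronunciation written in the declared alphabet (not used in this file).
- xml:lang = language of the lexicon (important).
- alphabet="x-sampa" here → twtech can also use ipa.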
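To get a rough idea of what an alias-only lexicon will do to a sentence before uploading it, the aliases can be applied locally. The sketch below is only an approximation: it does a case-sensitive whole-word substitution, which is an assumption and not Polly's actual matching logic. The function names and the inline sample are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Illustrative sample with one entry from twtechLexicon.xml.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Pat</grapheme>
    <alias>Patrick</alias>
  </lexeme>
</lexicon>
"""


def load_aliases(pls_text):
    """Map each grapheme to its alias text (whitespace collapsed)."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    aliases = {}
    for lexeme in root.findall("pls:lexeme", NS):
        alias = lexeme.find("pls:alias", NS)
        if alias is None or alias.text is None:
            continue  # phoneme-only entries are skipped in this preview
        for grapheme in lexeme.findall("pls:grapheme", NS):
            aliases[grapheme.text] = " ".join(alias.text.split())
    return aliases


def preview_aliases(text, aliases):
    """Replace whole-word graphemes with their aliases in a single pass."""
    if not aliases:
        return text
    # Longest graphemes first so longer entries win over shorter prefixes.
    words = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: aliases[match.group(1)], text)


if __name__ == "__main__":
    print(preview_aliases("Hi! My name is Pat.", load_aliases(SAMPLE)))
```

Under this approximation, an alias whose text contains its own grapheme (like the Dominion entry) is inserted wherever the word appears, so a sentence that already holds the full phrase would repeat it. That is worth listening for when testing.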
The file can be written in VS Code and saved in .xml format.
- Choose a location and save the file: Save As
- How twtech uploads the lexicon file from the saved location: Download folder
- Upload lexicon file: twtechLexicon.xml
- Lexicon content can be viewed, deleted, or downloaded to modify and then re-uploaded.
- How to test that the uploaded lexicon file is working: twtechLexicon.xml
- Choose the lexicon file created for customized pronunciation: twtechLexicon.xml
- Listening to the text audio with the customized lexicon, Pat will be pronounced as: Patrick.
- Dominion is replaced by a whole sentence: I am an IT engineer with Dominion Systems Ontario Canada
# CLI:
How twtech can also upload the file (twtechLexicon.xml) to Polly from the CLI:
aws polly put-lexicon --name twtechLex --content file://twtechLexicon.xml
twtech may choose to use SSML:
- Speech Synthesis Markup Language (SSML) tags allow twtech to modify speech output, for example by selecting a Newscaster voice, changing the phonetic pronunciation of a word, or adding a pause.
Link to Official documentation on generating speech with SSML.
<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer with Dominion Systems Ontario Canada.</speak>
How twtech saves the audio from text to an S3 bucket: twtechs3
- Save to S3 bucket: twtechs3
Save to S3: S3 output bucket
- Make sure that the twtech S3 bucket is created in the same region as the synthesis task request and that the twtech IAM user can write to it.
- twtech needs to verify in the S3 bucket that the audio MP3 is saved: twtechs3
Yes: successfully saved
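The console's Save to S3 option has a scripted counterpart in Boto3's start_speech_synthesis_task, which writes the audio to a bucket instead of returning it directly. The sketch below is a minimal illustration: the function name start_s3_task and its defaults are assumptions made for this example, and the bucket name is the twtechs3 bucket from this walkthrough.

```python
def start_s3_task(polly_client, text, bucket, voice_id="Joanna",
                  text_type="ssml", lexicon_names=()):
    """Start an asynchronous synthesis task that saves MP3 audio to S3.

    Returns (task_id, output_uri) as reported by Polly.
    """
    params = {
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    if lexicon_names:
        params["LexiconNames"] = list(lexicon_names)
    task = polly_client.start_speech_synthesis_task(**params)["SynthesisTask"]
    return task["TaskId"], task.get("OutputUri")


# Example usage (requires boto3, credentials, and write access to the bucket):
#   import boto3
#   ssml = '<speak>Hi! My name is Pat. <break time="5s"/> I am an IT engineer.</speak>'
#   task_id, uri = start_s3_task(boto3.client("polly"), ssml, "twtechs3",
#                                lexicon_names=["twtechLex"])
#   print(task_id, uri)
```

The task runs in the background, so the object may take a moment to appear in the bucket. The returned task ID is the same ID used to find the audio file there.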
- How to listen to or download the audio from the URL created:
search for the audio in the bucket using its ID and click Open.
- How to download the text audio created from bucket: twtechs3
- How to listen to the text audio created from bucket: twtechs3
- Copy the URL and paste it into the browser:
- The text audio can also be downloaded from the browser as it plays: