Amazon Transcribe - Overview & Hands-On
Scope:
- Intro
- What Amazon Transcribe Does
- Sample APIs: Batch Transcription (Async) and Streaming
- Reference Architectures: Batch Workflow
- Reference Architectures: Streaming Workflow
- IAM Permissions (typical IAM policy for a transcription workflow)
- Output Samples: Batch Transcript JSON
- Best Practices
- Common Pitfalls
- Advanced Use Case Patterns
- Link to official documentation
- Project: Hands-On
Intro:
- Amazon Transcribe is an automatic speech recognition (ASR) service from Amazon Web Services (AWS).
- Amazon Transcribe converts audio and video speech into text using machine learning models.
- Amazon Transcribe is a scalable and secure service used by developers to add speech-to-text capabilities to applications for various use cases, such as in contact centers for transcribing conversations or in classrooms for creating notes.
1. What Amazon Transcribe Does
Amazon Transcribe is AWS’s speech-to-text (STT) service.
It converts audio/video into time-stamped text and supports:
- Batch Transcription (stored files, async)
- Real-time / Streaming Transcription (low-latency speech recognition)
Domain-specific customization:
- Custom Vocabulary (add brand names, jargon)
- Custom Language Models (train with your own text corpus)
- Vocabulary Filtering (block words, profanity filter)
- Speaker Diarization (who said what)
- Channel Identification (multi-channel audio)
- Timestamps + Confidence scores
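The customization features above map directly onto API request fields. A minimal sketch of the relevant payloads as plain dicts (names like "acme-products" and "profanity-filter" are illustrative placeholders, and the vocabulary filter is assumed to already exist):

```python
# CreateVocabulary: teach Transcribe brand names and domain jargon.
create_vocabulary_params = {
    "VocabularyName": "acme-products",   # placeholder name
    "LanguageCode": "en-US",
    "Phrases": ["AcmeCloud", "HyperWidget", "Kubernetes"],
}

# Per-job Settings: diarization, vocabulary filtering, and the vocabulary above.
job_settings = {
    "ShowSpeakerLabels": True,       # speaker diarization ("who said what")
    "MaxSpeakerLabels": 4,
    "VocabularyName": "acme-products",
    "VocabularyFilterName": "profanity-filter",  # assumes a filter already exists
    "VocabularyFilterMethod": "mask",            # "remove" | "mask" | "tag"
}
```

`job_settings` would be passed as the `Settings` parameter of a batch transcription job.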
2. Sample APIs
Batch Transcription (Async):
- Start a job with the StartTranscriptionJob API; output JSON is written to S3.
Streaming Transcription (Real-time):
- Supports WebSocket and HTTP/2.
- Use the AWS SDKs (Python, JS, Java) or the Amazon Transcribe Streaming SDK.
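A minimal batch sketch using boto3. The client is injected as a parameter so the function can be exercised without AWS credentials; job name, bucket, and file names below are placeholders:

```python
import time

def transcribe_file(client, job_name, media_s3_uri, output_bucket,
                    language_code="en-US", poll_seconds=5):
    """Start an async batch job and poll until it finishes.

    `client` is expected to be boto3.client("transcribe").
    """
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_s3_uri},
        MediaFormat="mp3",
        LanguageCode=language_code,
        OutputBucketName=output_bucket,  # transcript JSON lands in this bucket
    )
    while True:
        job = client.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return job
        time.sleep(poll_seconds)

# Usage (assumes AWS credentials are configured):
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe_file(transcribe, "demo-job", "s3://your-bucket/audio.mp3", "your-bucket")
```

Polling is the simplest pattern; an EventBridge rule on job-state changes avoids it in production.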
Sample (Python):
- Audio is sent as raw PCM (pulse-code modulation) chunks; FLAC and Ogg-Opus encodings are also supported.
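A streaming sketch using the Amazon Transcribe Streaming SDK (`pip install amazon-transcribe`). The SDK import lives inside the async function so the chunking helper stays importable without the package; sample rate and region are assumptions. A real-time app would interleave sending and reading with `asyncio.gather` rather than sending everything first:

```python
def chunk_pcm(pcm_bytes, chunk_size=8 * 1024):
    """Split raw PCM audio into chunks sized for the streaming API."""
    return [pcm_bytes[i:i + chunk_size]
            for i in range(0, len(pcm_bytes), chunk_size)]

async def stream_pcm(pcm_bytes, region="us-east-1"):
    """Send PCM audio to Transcribe streaming and print final transcripts."""
    from amazon_transcribe.client import TranscribeStreamingClient

    client = TranscribeStreamingClient(region=region)
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,   # must match the audio you send
        media_encoding="pcm",
    )
    for chunk in chunk_pcm(pcm_bytes):
        await stream.input_stream.send_audio_event(audio_chunk=chunk)
    await stream.input_stream.end_stream()

    async for event in stream.output_stream:
        for result in event.transcript.results:
            if not result.is_partial:   # skip interim partial hypotheses
                print(result.alternatives[0].transcript)
```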
3. Reference Architectures
Batch Workflow:
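One common event-driven batch pattern (a sketch; the service choices are typical, not prescriptive):

```
S3 (audio upload)
      │  ObjectCreated event
      ▼
Lambda ── StartTranscriptionJob ──▶ Amazon Transcribe (async)
                                          │ transcript JSON
                                          ▼
                                   S3 (output bucket)
                                          │ ObjectCreated event
                                          ▼
                     Lambda post-processing (e.g., Comprehend, OpenSearch)
```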
Streaming Workflow:
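A typical low-latency streaming pattern (again a sketch; downstream consumers vary by use case):

```
Microphone / telephony / broadcast feed
      │  PCM audio chunks
      ▼
Client app (WebSocket or HTTP/2) ──▶ Amazon Transcribe Streaming
      │  partial + final transcript events
      ▼
Consumer (live captions UI, Lambda, Kinesis Data Streams)
      ▼
Downstream analytics (Comprehend, dashboards, alerts)
```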
4. IAM Permissions (typical IAM policy for a transcription workflow):
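A minimal policy sketch covering batch jobs, streaming, and the S3 buckets involved. "your-bucket" is a placeholder; in production, scope `Resource` down to specific job-name patterns and buckets rather than `*`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "transcribe:StartTranscriptionJob",
        "transcribe:GetTranscriptionJob",
        "transcribe:ListTranscriptionJobs",
        "transcribe:StartStreamTranscription"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
```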
5. Output Samples
Batch Transcript JSON:
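The batch output JSON nests the full text under `results.transcripts` and per-word timestamps/confidence scores (as strings) under `results.items`. A trimmed, illustrative sample and how to pull words with timestamps out of it:

```python
import json

# Illustrative (trimmed) batch output matching the documented shape.
sample = json.loads("""
{
  "jobName": "demo-job",
  "status": "COMPLETED",
  "results": {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
      {"type": "pronunciation", "start_time": "0.04", "end_time": "0.42",
       "alternatives": [{"confidence": "0.998", "content": "Hello"}]},
      {"type": "pronunciation", "start_time": "0.43", "end_time": "0.90",
       "alternatives": [{"confidence": "0.995", "content": "world"}]},
      {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
""")

full_text = sample["results"]["transcripts"][0]["transcript"]
words = [
    (it["alternatives"][0]["content"], float(it["start_time"]))
    for it in sample["results"]["items"]
    if it["type"] == "pronunciation"   # punctuation items carry no timestamps
]
print(full_text)   # Hello world.
print(words)       # [('Hello', 0.04), ('world', 0.43)]
```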
Streaming Transcript (event payload):
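Streaming results arrive as repeated TranscriptEvent payloads: the same `ResultId` is re-sent with `IsPartial: true` as the hypothesis stabilizes, then once more with `IsPartial: false`. An illustrative payload (dict form) and a helper that keeps only finalized text:

```python
# Illustrative streaming TranscriptEvent payload.
event = {
    "Transcript": {
        "Results": [
            {
                "ResultId": "abc123",
                "StartTime": 0.1,
                "EndTime": 1.5,
                "IsPartial": False,
                "Alternatives": [{"Transcript": "Hello world."}],
            }
        ]
    }
}

def final_transcripts(event):
    """Return only finalized transcript strings from one event payload."""
    return [
        r["Alternatives"][0]["Transcript"]
        for r in event["Transcript"]["Results"]
        if not r["IsPartial"]
    ]

print(final_transcripts(event))   # ['Hello world.']
```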
6. Best Practices
- Batch vs Streaming: use batch for pre-recorded files, streaming for sub-second captions or call analytics.
- Custom Vocabulary for brand names, medical terms, etc.
- Vocabulary Filtering to mask/block sensitive words.
- Post-process with Amazon Comprehend (sentiment, entities, key phrases).
- Combine with Amazon Translate for multilingual captions.
- Speaker Diarization: works best with <10 speakers, clean audio.
- Store transcripts in OpenSearch for searchable meeting/call archives.
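For the OpenSearch archive pattern above, a sketch of the per-call document to index (the index name "transcripts" and field names are assumptions, not a fixed schema):

```python
def build_search_doc(job_name, transcript_json):
    """Flatten one batch transcript into a searchable document."""
    items = transcript_json["results"]["items"]
    return {
        "job_name": job_name,
        "text": transcript_json["results"]["transcripts"][0]["transcript"],
        "word_count": len([i for i in items if i["type"] == "pronunciation"]),
    }

# Usage with opensearch-py (assumes a running cluster):
# from opensearchpy import OpenSearch
# client = OpenSearch("https://localhost:9200")
# client.index(index="transcripts", id="demo-job", body=build_search_doc(...))
```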
7. Common Pitfalls
- Audio quality matters: background noise and overlapping speech reduce accuracy.
- Latency in Streaming: expect ~1–2 sec for stable partial → final transcripts.
- File format support: batch accepts WAV, MP3, MP4/M4A, FLAC, Ogg, AMR, and WebM; streaming accepts PCM, FLAC, and Ogg-Opus (other codecs need conversion).
- Storage cost: transcripts stored in S3 add up; lifecycle policies help.
- Multi-language limits: Some languages don’t yet support streaming or custom vocab.
8. Advanced Use Case Patterns:
- Contact Center Analytics: Record call → Transcribe → Comprehend → sentiment dashboard.
- Media Captioning: Live video → Kinesis → Transcribe → Translate → Amazon IVS / MediaLive captions.
- Compliance: Stream → Transcribe → DynamoDB → alert on prohibited keywords.
Official documentation: https://docs.aws.amazon.com/transcribe/
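The keyword-alerting step of the compliance pattern is a plain text scan over each final transcript segment. A minimal sketch (keyword list and wiring to DynamoDB/alerts are up to the application):

```python
def flag_prohibited(transcript, keywords):
    """Return the prohibited keywords that appear in a transcript segment."""
    text = transcript.lower()
    return sorted({kw for kw in keywords if kw.lower() in text})

# Each final streaming result would be checked as it arrives; hits could be
# written to DynamoDB and fanned out via SNS, per the pattern above.
hits = flag_prohibited("Please share your card number", {"card number", "ssn"})
print(hits)   # ['card number']
```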