Wednesday, September 10, 2025

Amazon Transcribe 🎙️ | Overview & Hands-On.

Scope:

  • Intro,
  • What Amazon Transcribe Does,
  • Sample APIs: Batch Transcription (Async),
  • Reference Architectures: Batch Workflow,
  • Reference Architectures: Streaming Workflow,
  • IAM Permissions (Typical IAM policy for transcription service),
  • Output Samples: Batch Transcript JSON,
  • Best Practices,
  • Common Pitfalls,
  • Advanced use case patterns,
  • Link to official documentation,
  • Project: Hands-On.

Intro:
    • Amazon Transcribe is an automatic speech recognition (ASR) service from Amazon Web Services (AWS). 
    • Amazon Transcribe converts audio and video speech into text using machine learning models.
    • Amazon Transcribe is a scalable and secure service used by developers to add speech-to-text capabilities to applications for various use cases, such as in contact centers for transcribing conversations or in classrooms for creating notes.

 1. What Amazon Transcribe Does

Amazon Transcribe is AWS’s speech-to-text (STT) service.
It converts audio/video into time-stamped text and supports:

    • Batch Transcription (stored files, async)
    • Real-time / Streaming Transcription (low-latency speech recognition)
    • Domain-specific customization:

      • Custom Vocabulary (add brand names, jargon)
      • Custom Language Models (train with your own text corpus)
      • Vocabulary Filtering (block words, profanity filter)
    • Speaker Diarization (who said what)
    • Channel Identification (multi-channel audio)
    • Timestamps + Confidence scores

 2. Sample APIs: Batch Transcription (Async)

import boto3

transcribe = boto3.client('transcribe')

job_name = "twtech-transcription-job"
job_uri = "s3://twtech-s3bucket/audio-file.wav"

# Start an asynchronous (batch) transcription job
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='wav',
    LanguageCode='en-US',
    OutputBucketName='twtech-output-bucket',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 2,
        'VocabularyName': 'twtech-custom-vocab'
    }
)

# Later, check the status
result = transcribe.get_transcription_job(TranscriptionJobName=job_name)
print(result['TranscriptionJob']['TranscriptionJobStatus'])
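The "check the status" step can be sketched as a small polling loop. This is a minimal sketch, not the only way to wait for a job: the helper name `wait_for_job`, the poll interval, and the timeout are assumptions chosen for illustration. It only relies on `get_transcription_job` returning a `TranscriptionJob` dictionary with a `TranscriptionJobStatus` field.

```python
import time


def wait_for_job(client, job_name, poll_seconds=10, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_transcription_job until the job completes, fails, or times out.

    `client` is a boto3 Transcribe client (or any object exposing the same
    get_transcription_job method). Returns the final TranscriptionJob dict.
    """
    waited = 0
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            reason = job.get("FailureReason", "unknown")
            raise RuntimeError(f"Job {job_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job {job_name} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# job = wait_for_job(boto3.client("transcribe"), "twtech-transcription-job")
# print(job["Transcript"]["TranscriptFileUri"])
```

For event-driven pipelines, an S3 notification on the output prefix avoids polling altogether; the loop above is mainly useful in scripts and notebooks.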

Output JSON is written to S3.

Streaming Transcription (Real-time)

    • Supports HTTP/2 and WebSocket.
    • Use AWS SDKs (Python, JS, Java) or Amazon Transcribe Streaming SDK.

Sample (Python):

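import asyncio

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent


class MyHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # Print every alternative of every result (partial and final)
        for result in transcript_event.transcript.results:
            for alt in result.alternatives:
                print("Transcript:", alt.transcript)


async def read_microphone(stream, path="audio.pcm", chunk_size=16 * 1024):
    # Placeholder audio source (an assumption): streams a raw PCM file.
    # Swap in a real microphone reader for live capture.
    with open(path, "rb") as audio:
        while chunk := audio.read(chunk_size):
            await stream.input_stream.send_audio_event(audio_chunk=chunk)
    await stream.input_stream.end_stream()


async def basic_transcribe():
    client = TranscribeStreamingClient(region="us-east-2")
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=44100,
        media_encoding="pcm",  # pcm = pulse-code modulation
    )
    handler = MyHandler(stream.output_stream)
    await asyncio.gather(read_microphone(stream), handler.handle_events())


asyncio.run(basic_transcribe())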

 3. Reference Architectures: Batch Workflow

# Reference Architectures: Streaming Workflow

 🔐 4. IAM Permissions (typical IAM policy for a transcription service):

    • transcribe:StartTranscriptionJob
    • transcribe:GetTranscriptionJob
    • transcribe:DeleteTranscriptionJob
    • transcribe:StartStreamTranscription
    • s3:GetObject
    • s3:PutObject
    • kinesis:PutRecord (if streaming output to Kinesis)
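The actions listed above can be assembled into a policy document. The sketch below is one possible layout, assuming separate input and output buckets (the bucket names are the ones used in the batch sample); the function name `build_transcribe_policy` and the `Sid` labels are illustrative, and the optional Kinesis action is left out.

```python
import json


def build_transcribe_policy(input_bucket, output_bucket):
    """Return an IAM policy document (as a dict) for a transcription workload."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TranscribeJobs",
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                    "transcribe:DeleteTranscriptionJob",
                    "transcribe:StartStreamTranscription",
                ],
                "Resource": "*",
            },
            {
                # Read access only on the bucket holding the audio files
                "Sid": "ReadAudio",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {
                # Write access only on the bucket receiving transcripts
                "Sid": "WriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
        ],
    }


policy = build_transcribe_policy("twtech-s3bucket", "twtech-output-bucket")
print(json.dumps(policy, indent=2))
```

Scoping the S3 statements to specific buckets keeps the policy closer to least privilege than a blanket `Resource: "*"`.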

 5. Output Samples: Batch Transcript JSON:

{
  "jobName": "twtech-transcription-job",
  "results": {
    "transcripts": [
      { "transcript": "Hello and welcome to twtech-transcribe-job." }
    ],
    "items": [
      {
        "start_time": "0.54",
        "end_time": "0.95",
        "alternatives": [
          { "confidence": "0.98", "content": "Hello Team" }
        ],
        "type": "pronunciation"
      }
    ]
  },
  "status": "COMPLETED"
}
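A batch transcript like the one above can be read with the standard `json` module. The sketch below assumes the field layout shown in the sample; the helper names `full_text` and `low_confidence_words` and the confidence threshold are illustrative choices.

```python
import json

# Same shape as the batch sample in this section
SAMPLE = """
{
  "jobName": "twtech-transcription-job",
  "results": {
    "transcripts": [{"transcript": "Hello and welcome to twtech-transcribe-job."}],
    "items": [
      {"start_time": "0.54", "end_time": "0.95",
       "alternatives": [{"confidence": "0.98", "content": "Hello Team"}],
       "type": "pronunciation"}
    ]
  },
  "status": "COMPLETED"
}
"""


def full_text(doc):
    """Join all transcript segments into one string."""
    return " ".join(t["transcript"] for t in doc["results"]["transcripts"])


def low_confidence_words(doc, threshold=0.9):
    """Return (content, start_time, confidence) for items below the threshold."""
    flagged = []
    for item in doc["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # skip punctuation items, which carry no timing
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # values arrive as strings
        if confidence < threshold:
            flagged.append((best["content"], float(item["start_time"]), confidence))
    return flagged


doc = json.loads(SAMPLE)
print(full_text(doc))
print(low_confidence_words(doc, threshold=0.99))
```

Flagged items are a reasonable starting point for human review or for deciding which terms to add to a custom vocabulary.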

Streaming Transcript (event payload):

{
  "TranscriptEvent": {
    "Transcript": {
      "Results": [
        {
          "Alternatives": [
            { "Transcript": "Good morning twtech-Team" }
          ],
          "IsPartial": true
        }
      ]
    }
  }
}
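Streaming results arrive first as partial hypotheses (`"IsPartial": true`) and are later replaced by a final result. A minimal sketch of keeping only the final text, assuming events shaped like the payload above (the function name `final_transcripts` and the sample events are illustrative):

```python
def final_transcripts(events):
    """Yield the top transcript of each finalized result, skipping partials."""
    for event in events:
        results = event["TranscriptEvent"]["Transcript"]["Results"]
        for result in results:
            if result.get("IsPartial", False):
                continue  # partial results may still change; wait for the final one
            alternatives = result.get("Alternatives", [])
            if alternatives:
                yield alternatives[0]["Transcript"]


# Two illustrative events: a partial hypothesis followed by its final version
events = [
    {"TranscriptEvent": {"Transcript": {"Results": [
        {"Alternatives": [{"Transcript": "Good morning"}], "IsPartial": True}]}}},
    {"TranscriptEvent": {"Transcript": {"Results": [
        {"Alternatives": [{"Transcript": "Good morning twtech-Team"}], "IsPartial": False}]}}},
]

for line in final_transcripts(events):
    print(line)
```

Live captions usually display partials immediately and overwrite them; archives and analytics typically store only the final results, as here.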

 6. Best Practices

    • Batch vs Streaming: Use batch for pre-recorded files, streaming for live captions or real-time call analytics.
    • Custom Vocabulary for brand names, medical terms, etc.
    • Vocabulary Filtering to mask/block sensitive words.
    • Post-process with Amazon Comprehend (sentiment, entities, key phrases).
    • Combine with Amazon Translate for multilingual captions.
    • Speaker Diarization: works best with <10 speakers, clean audio.
    • Store transcripts in OpenSearch for searchable meeting/call archives. 
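Several of these practices map onto the `Settings` argument of `start_transcription_job`. The sketch below builds that dictionary; the helper name `build_settings` and the filter name `twtech-profanity-filter` are hypothetical, and the 2 to 30 speaker range is an assumption based on the speaker limit mentioned in this post.

```python
FILTER_METHODS = {"remove", "mask", "tag"}


def build_settings(vocabulary_name=None, filter_name=None, filter_method="mask",
                   max_speakers=None):
    """Build the Settings dict for a batch job from optional features."""
    settings = {}
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name
    if filter_name:
        if filter_method not in FILTER_METHODS:
            raise ValueError(f"filter_method must be one of {sorted(FILTER_METHODS)}")
        settings["VocabularyFilterName"] = filter_name
        settings["VocabularyFilterMethod"] = filter_method
    if max_speakers is not None:
        # Assumed valid range for speaker diarization
        if not 2 <= max_speakers <= 30:
            raise ValueError("max_speakers must be between 2 and 30")
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    return settings


print(build_settings(vocabulary_name="twtech-custom-vocab",
                     filter_name="twtech-profanity-filter",  # hypothetical filter name
                     max_speakers=2))
```

The custom vocabulary and the vocabulary filter must already exist in the account before a job can reference them.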

 🚨 7. Common Pitfalls

    • Audio quality matters: background noise and overlapping speech reduce accuracy.
    • Latency in Streaming: expect ~1–2 sec for stable partial → final transcripts.
    • File format support: WAV, MP3, MP4, FLAC; no AAC-in-MP4 (needs conversion).
    • Storage cost: transcripts stored in S3 add up; lifecycle policies help.
    • Multi-language limits: Some languages don’t yet support streaming or custom vocab.
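For the storage-cost pitfall, an S3 lifecycle rule can move old transcripts to cheaper storage and eventually delete them. This is a sketch under assumptions: the `transcripts/` prefix, the 30 and 365 day thresholds, the rule ID, and the function name are illustrative values, not recommendations from AWS.

```python
def transcript_lifecycle(prefix="transcripts/", archive_after_days=30,
                         expire_after_days=365):
    """Build a lifecycle configuration for put_bucket_lifecycle_configuration."""
    if archive_after_days >= expire_after_days:
        raise ValueError("archive_after_days must be smaller than expire_after_days")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-transcripts",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move to infrequent-access storage first, then delete
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = transcript_lifecycle()
print(config)

# Applying it (requires AWS credentials, so it is left commented out):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="twtech-output-bucket", LifecycleConfiguration=config)
```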

 Advanced use case patterns:

    • Contact Center Analytics: Record call → Transcribe → Comprehend → sentiment dashboard.
    • Media Captioning: Live video → Kinesis → Transcribe → Translate → Amazon IVS / MediaLive captions.
    • Compliance: Stream → Transcribe → DynamoDB → alert on prohibited keywords.
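The keyword-alert step of the compliance pattern can be sketched in plain Python. The function name `find_prohibited` and the keyword list are hypothetical; a real pipeline would run this on each final transcript segment and write hits to DynamoDB or a notification topic.

```python
import re


def find_prohibited(transcript, keywords):
    """Return (keyword, character offset) pairs for whole-word, case-insensitive hits."""
    hits = []
    for keyword in keywords:
        # \b keeps "pin" from matching inside words such as "spinning"
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, transcript, flags=re.IGNORECASE):
            hits.append((keyword, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


text = "Please share your PIN and your password."
print(find_prohibited(text, ["pin", "password"]))
# → [('pin', 18), ('password', 31)]
```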

# twtech-sample-transcribe.yaml

AWSTemplateFormatVersion: '2010-09-09'
Description: >-
  SAM template for a ready-to-deploy Transcribe pipeline.
  Flow: S3 (uploads/) -> StartTranscribe Lambda -> Transcribe writes JSON to S3 (transcripts/) -> ProcessTranscript Lambda -> DynamoDB + OpenSearch + Comprehend
Transform: AWS::Serverless-2016-10-31
Globals:
  Function:
    Timeout: 60
    Runtime: python3.9
    MemorySize: 512
Parameters:
  StageName:
    Type: String
    Default: dev
  TranscriptsBucketName:
    Type: String
    Default: transcribe-pipeline-bucket  # parameter defaults are literal strings; add your own unique suffix
  OpenSearchEnabled:
    Type: String
    AllowedValues: ["true","false"]
    Default: "false"
Resources:
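  TranscribeBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref TranscriptsBucketName
      # Notifications are wired by the Serverless S3 events below
  TranscribeUploadsPrefix:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref TranscribeBucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # Placeholder statement (an assumption): a policy needs at least one
          # statement, so this one denies non-TLS access. Replace as needed.
          - Sid: DenyInsecureTransport
            Effect: Deny
            Principal: '*'
            Action: 's3:*'
            Resource:
              - !GetAtt TranscribeBucket.Arn
              - !Sub '${TranscribeBucket.Arn}/*'
            Condition:
              Bool:
                'aws:SecureTransport': 'false'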
  TranscribeJobsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Sub transcribe-jobs-${StageName}
      AttributeDefinitions:
        - AttributeName: JobId
          AttributeType: S
      KeySchema:
        - AttributeName: JobId
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST
  # Optional OpenSearch domain (disabled by default).
  # If twtech enables it, VPC and access policies must be configured for production.
  OpenSearchDomain:
    Type: AWS::OpenSearchService::Domain
    Condition: OpenSearchOn
    Properties:
      DomainName: !Sub transcribe-index-${StageName}
      EngineVersion: 'OpenSearch_1.3'
      ClusterConfig:
        InstanceType: t3.small.search
        InstanceCount: 1
      EBSOptions:
        EBSEnabled: true
        VolumeSize: 10
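      AccessPolicies:
        Version: '2012-10-17'
        Statement:
          # Scoped to this account instead of Principal '*'; tighten further for production
          - Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: 'es:ESHttp*'
            Resource: !Sub 'arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/transcribe-index-${StageName}/*'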
  StartTranscribeFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: twtech-TranscribeStartPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - transcribe:StartTranscriptionJob
                  - s3:GetObject
                  - s3:PutObject
                Resource: '*'
  ProcessTranscriptFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: twtech-ProcessTranscriptPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: '*'
              - Effect: Allow
                Action:
                  - dynamodb:PutItem
                  - dynamodb:UpdateItem
                Resource: !GetAtt TranscribeJobsTable.Arn
              - Effect: Allow
                Action:
                  - comprehend:DetectEntities
                  - comprehend:DetectKeyPhrases
                  - comprehend:DetectSentiment
                Resource: '*'
              - Effect: Allow
                Action:
                  - es:ESHttpPost
                  - es:ESHttpPut
                Resource: '*'
  StartTranscribeFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub start-transcribe-${StageName}
      Handler: app.lambda_handler
      Role: !GetAtt StartTranscribeFunctionRole.Arn
      CodeUri: ./start_transcribe/
      Environment:
        Variables:
          OUTPUT_BUCKET: !Ref TranscribeBucket
          JOBS_TABLE: !Ref TranscribeJobsTable
      Events:
        S3UploadEvent:
          Type: S3
          Properties:
            Bucket: !Ref TranscribeBucket
            Events: s3:ObjectCreated:Put
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: uploads/
  ProcessTranscriptFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub process-transcript-${StageName}
      Handler: app.lambda_handler
      Role: !GetAtt ProcessTranscriptFunctionRole.Arn
      CodeUri: ./process_transcript/
      Environment:
        Variables:
          JOBS_TABLE: !Ref TranscribeJobsTable
          OPENSEARCH_ENABLED: !Ref OpenSearchEnabled
          OPENSEARCH_DOMAIN: !If [OpenSearchOn, !GetAtt OpenSearchDomain.DomainEndpoint, ""]
      Events:
        TranscriptsPut:
          Type: S3
          Properties:
            Bucket: !Ref TranscribeBucket
            Events: s3:ObjectCreated:Put
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: transcripts/
Conditions:
  OpenSearchOn: !Equals [!Ref OpenSearchEnabled, "true"]

Outputs:
  BucketName:
    Description: S3 bucket used for uploads and transcripts
    Value: !Ref TranscribeBucket
  TranscribeJobsTableName:
    Description: DynamoDB table for job metadata
    Value: !Ref TranscribeJobsTable
  OpenSearchEndpoint:
    Description: OpenSearch endpoint (if enabled)
    Value: !If [OpenSearchOn, !GetAtt OpenSearchDomain.DomainEndpoint, "Disabled"]
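The template points `StartTranscribeFunction` at `./start_transcribe/` with handler `app.lambda_handler`, but the handler code is not shown. The following is a minimal sketch of what `app.py` could look like, not the author's implementation: the job-naming scheme, the fixed `en-US` language, the `transcripts/` output key, and the list of accepted extensions are all assumptions.

```python
import os
import re
import uuid
from urllib.parse import unquote_plus

# Assumed set of extensions accepted as MediaFormat values
MEDIA_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def build_job_request(record, output_bucket):
    """Turn one S3 event record into start_transcription_job keyword arguments."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
    extension = key.rsplit(".", 1)[-1].lower()
    if extension not in MEDIA_FORMATS:
        raise ValueError(f"Unsupported media format: {key}")
    # Job names may only contain letters, digits, '.', '_' and '-'
    stem = re.sub(r"[^0-9a-zA-Z._-]", "-", os.path.basename(key))[:150]
    job_name = f"{stem}-{uuid.uuid4().hex[:8]}"
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": extension,
        "LanguageCode": "en-US",  # assumption: English audio
        "OutputBucketName": output_bucket,
        "OutputKey": f"transcripts/{job_name}.json",  # matches the transcripts/ trigger
    }


def lambda_handler(event, context):
    import boto3  # imported lazily so the module loads without the AWS SDK

    transcribe = boto3.client("transcribe")
    output_bucket = os.environ["OUTPUT_BUCKET"]
    started = []
    for record in event.get("Records", []):
        request = build_job_request(record, output_bucket)
        transcribe.start_transcription_job(**request)
        started.append(request["TranscriptionJobName"])
    return {"started": started}
```

Keeping the request construction in a pure function makes it testable without AWS access; only `lambda_handler` touches the service.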

Insights:

Key Features and Functionality
    • Transcription Methods: It supports both batch transcription (for pre-recorded files stored in an Amazon S3 bucket) and real-time streaming transcription.
    • Accuracy Improvement: Users can create custom vocabularies for domain-specific terms (e.g., brand names, acronyms) or custom language models to improve transcription accuracy for specific use cases.
    • Content Moderation: The service can automatically mask, remove, or flag specific terms (like profane words) from transcripts using vocabulary filters.
    • It also offers Toxicity Detection, which identifies toxic content across categories like hate speech and abuse using both audio and text cues.
    • Data Redaction: Amazon Transcribe can identify and redact personally identifiable information (PII) like names, email addresses, and phone numbers from the generated transcripts.
    • Speaker Recognition: The service offers speaker diarization, which automatically recognizes and labels different speakers in a conversation (up to 30 unique speakers).
    • Output Formatting: Transcripts include automatic punctuation, number normalization (converting spoken numbers into digits), word-level confidence scores, and timestamps for easy navigation or subtitle generation.
    • Language Support: It supports over 100 languages and features automatic language identification.
Link to official documentation:
https://aws.amazon.com/transcribe/

Project: Hands-On

    • How twtech creates and uses Amazon Transcribe for its services.

Search for AWS service: Transcribe.

  • Create a transcript:

Real-time transcription:

  • See how Amazon Transcribe creates a text copy of speech in real time. Choose Start streaming and talk.

Language settings: 

  • twtech can select a specific language for its transcription or have Amazon Transcribe identify the predominant language in its media and perform the transcription in that language.

Start streaming: 

  • “Hello engineers”.

  • How PII (Personally Identifiable Information) content is removed from a transcript:

  • This example shows how PII is redacted (hidden) in the transcript.
  • After enabling PII removal:

Hello my is Pat and my pin is 0000
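The console walkthrough uses real-time streaming; the same redaction can be requested for a batch job through the `ContentRedaction` parameter of `start_transcription_job`. This is a sketch under assumptions: the function name `redaction_request`, the job name, and the choice of `NAME` and `PIN` entity types (picked to match the sample sentence) are illustrative.

```python
def redaction_request(job_name, media_uri, entity_types=("NAME", "PIN")):
    """Build start_transcription_job arguments with PII redaction enabled."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": "en-US",
        "ContentRedaction": {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",  # store only the redacted transcript
            "PiiEntityTypes": list(entity_types),
        },
    }


request = redaction_request("twtech-pii-demo",  # hypothetical job name
                            "s3://twtech-s3bucket/audio-file.wav")
print(request["ContentRedaction"])

# Submitting it (requires AWS credentials, so it is left commented out):
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```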

Language settings: Automatic language identification.

  •  Again, twtech can select a specific language for its transcription or have Amazon Transcribe identify the predominant language in its media and perform the transcription in that language.

How twtech tests the configuration for: Automatic language identification

Hello, my name is Pat.

Est-ce-que tu va bien?
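Outside the console, automatic language identification for a batch job is requested with `IdentifyLanguage` instead of a fixed `LanguageCode`. A minimal sketch, assuming English and French as the candidate languages to match the test phrases above; the function name `language_id_request` and the job name are hypothetical.

```python
def language_id_request(job_name, media_uri, candidates=("en-US", "fr-FR")):
    """Build start_transcription_job arguments that let the service pick the language."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,  # replaces LanguageCode
    }
    if candidates:
        # Narrowing the candidate list tends to improve identification accuracy
        request["LanguageOptions"] = list(candidates)
    return request


request = language_id_request("twtech-language-id-demo",  # hypothetical job name
                              "s3://twtech-s3bucket/audio-file.wav")
print(request)

# Submitting it (requires AWS credentials, so it is left commented out):
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```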



