Amazon Kendra - Overview.
Scope:
- Quick elevator (one-liner),
- Core concepts & components,
- How Kendra “thinks” about relevance,
- Query types & behavior,
- GenAI integration (what’s different),
- Security & governance,
- Scalability, performance & capacity,
- Observability & metrics to watch,
- Costs & pricing model (high level),
- Integrations & common architectures,
- Best practices & tuning checklist,
- Common pitfalls & gotchas,
- Sample reference architecture (textual diagram),
- An Architecture diagram,
- Link to AWS Documentation,
- Insights.
Quick elevator (one-liner)
- Amazon Kendra is a managed, ML-powered enterprise search service that provides semantic / natural-language search over heterogeneous content (S3, SharePoint, databases, FAQs, etc.).
- Amazon Kendra is equipped with features for relevance tuning, connectors, document enrichment, and (newer) GenAI-index integration for retrieval-augmented responses.
Core concepts & components
- Index — the primary unit: stores documents, attributes, embeddings/representations, and ranking metadata.
- Index types include Developer, Enterprise, and GenAI Enterprise (for easier reuse with generative tools).
- Data sources / Connectors — native/scheduled connectors for S3, SharePoint, Salesforce, RDS, Confluence, Drive, etc.
- twtech can also implement custom connectors or push via API.
- Ingestion pipeline — fetch (connector) → convert/parse (file types: PDF, Office, HTML, text, etc.) → optional pre-processing/enrichment → indexing. Custom Document Enrichment lets twtech alter attributes/content at ingest time.
- Document chunking & fields — documents are broken into passages/paragraph chunks; fields/metadata (author, source, date, doc-type) are indexed separately so twtech can boost/filter by them.
- Query engine — combines semantic matching (deep models / embeddings), lexical matching, and a ranking model that uses field boosts, recency, and other signals to produce a ranked result set.
- Supports direct answers (answer extraction) and returning source passages.
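The "push via API" path mentioned above can be sketched as follows. This is a minimal illustration, assuming documents are plain text and sent inline through the BatchPutDocument operation; the helper names (to_kendra_document, batches, push_documents) and the "department" attribute are hypothetical, and the per-call document limit should be checked against current quotas. Custom attributes generally need to be defined on the index before they can be used.

```python
from typing import Any, Dict, Iterator, List

# Assumption: BatchPutDocument accepts up to 10 documents per call.
# Verify this against the current service quotas before relying on it.
MAX_DOCS_PER_BATCH = 10


def to_kendra_document(doc_id: str, title: str, text: str,
                       attributes: Dict[str, str]) -> Dict[str, Any]:
    """Build one entry for the Documents list of BatchPutDocument."""
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),  # inline content instead of an S3 path
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            {"Key": key, "Value": {"StringValue": value}}
            for key, value in sorted(attributes.items())
        ],
    }


def batches(items: List[Any], size: int = MAX_DOCS_PER_BATCH) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def push_documents(client: Any, index_id: str,
                   documents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Send documents in batches and collect any per-document failures."""
    failed: List[Dict[str, Any]] = []
    for batch in batches(documents):
        response = client.batch_put_document(IndexId=index_id, Documents=batch)
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

In practice the client would come from boto3.client("kendra"); passing it in as a parameter keeps the payload-building logic easy to test without AWS access.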
How Kendra “thinks” about relevance
- Hybrid scoring: Kendra blends semantic similarity (contextual/embedding-like signals) with traditional retrieval and engineered ranking features.
- Relevance tuning — twtech can boost documents by field, source, or attribute (e.g., boost FAQs, fresh docs, or a trusted source). This adjusts which documents are favored in ranking; it doesn’t force inclusion but biases ranking. There are UI and API ways to tune.
- Synonyms & domain vocab — upload synonym lists and domain-specific terminology so queries with alternate terms still match the correct content.
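The API side of relevance tuning can be sketched as below. This is a rough illustration, assuming boosts are applied through the UpdateIndex operation's DocumentMetadataConfigurationUpdates; the helper names and the example field values are hypothetical, and a real update may also need to restate each field's search settings.

```python
from typing import Any, Dict, List


def _check_importance(value: int) -> int:
    # Boost strength is expressed as an integer importance from 1 to 10.
    if not 1 <= value <= 10:
        raise ValueError("importance must be between 1 and 10")
    return value


def freshness_boost(date_field: str = "_created_at", importance: int = 5,
                    window_days: int = 30) -> Dict[str, Any]:
    """Favor documents whose date field falls within the recent window."""
    return {
        "Name": date_field,
        "Type": "DATE_VALUE",
        "Relevance": {
            "Freshness": True,
            "Importance": _check_importance(importance),
            "Duration": f"{window_days * 86400}s",  # window expressed in seconds
        },
    }


def value_boost(string_field: str, weights: Dict[str, int]) -> Dict[str, Any]:
    """Favor specific values of a string field, e.g. FAQ content."""
    return {
        "Name": string_field,
        "Type": "STRING_VALUE",
        "Relevance": {
            "ValueImportanceMap": {
                value: _check_importance(weight)
                for value, weight in weights.items()
            },
        },
    }


def apply_tuning(client: Any, index_id: str,
                 updates: List[Dict[str, Any]]) -> None:
    """Apply the boosts; `client` is expected to be a boto3 Kendra client."""
    client.update_index(Id=index_id,
                        DocumentMetadataConfigurationUpdates=updates)
```

For example, apply_tuning(client, index_id, [freshness_boost(), value_boost("_category", {"faq": 10})]) would bias ranking toward recent documents and FAQ-category content without forcing either into the results.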
Query types & behavior
- Natural-language questions (Who, What, How) → Kendra extracts concise answers when the exact answer is present in content.
- Descriptive answers → returns passages or full documents for how-to or explanatory queries.
- Keyword / faceted search → supports filters/facets on metadata and structured attributes.
- Limits — Kendra is not designed to aggregate answers requiring cross-document synthesis or to perform arbitrary calculations over content (it favors direct extraction and ranking).
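The query behaviors above can be illustrated with a small sketch, assuming the Query operation is called through boto3. It builds a request with metadata filters and facets, then groups results by the three result types the API reports (ANSWER, QUESTION_ANSWER, DOCUMENT). The helper names and the attribute keys in the usage note are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_query(index_id: str, text: str,
                filters: Optional[Dict[str, str]] = None,
                facet_keys: Sequence[str] = ()) -> Dict[str, Any]:
    """Build keyword arguments for the Kendra Query operation."""
    request: Dict[str, Any] = {"IndexId": index_id, "QueryText": text}
    if filters:
        # Every filter must match: exact-match on string attributes.
        request["AttributeFilter"] = {
            "AndAllFilters": [
                {"EqualsTo": {"Key": key, "Value": {"StringValue": value}}}
                for key, value in sorted(filters.items())
            ]
        }
    if facet_keys:
        request["Facets"] = [{"DocumentAttributeKey": key} for key in facet_keys]
    return request


def group_by_type(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group result excerpts by result type."""
    grouped: Dict[str, List[str]] = {
        "ANSWER": [], "QUESTION_ANSWER": [], "DOCUMENT": [],
    }
    for item in response.get("ResultItems", []):
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        grouped.setdefault(item.get("Type", "DOCUMENT"), []).append(excerpt)
    return grouped
```

A call would look like group_by_type(client.query(**build_query(index_id, "How do I reset my badge?", {"department": "security"}, ["doc_type"]))), which lets the UI show extracted answers first and ranked documents below.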
GenAI integration (what’s different)
- GenAI Enterprise Index: allows Kendra-indexed data to be used as a retriever for generative systems (e.g., Bedrock, other RAG setups), making it easier to reuse indexed knowledge for assistants and prompt flows.
- It also has more granular capacity options.
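The retriever role described above can be sketched as follows, assuming passages are fetched with the Kendra Retrieve operation and then placed into a prompt for whichever generator twtech chooses. The prompt wording, the character budget, and the helper names are illustrative assumptions, not a prescribed format.

```python
from typing import Any, Dict, List


def retrieve_passages(client: Any, index_id: str, question: str,
                      top_k: int = 5) -> List[Dict[str, str]]:
    """Fetch the top passages for a question via the Retrieve operation."""
    response = client.retrieve(IndexId=index_id, QueryText=question,
                               PageSize=top_k)
    return [
        {
            "title": item.get("DocumentTitle", ""),
            "uri": item.get("DocumentURI", ""),
            "text": item.get("Content", ""),
        }
        for item in response.get("ResultItems", [])
    ]


def build_grounded_prompt(question: str, passages: List[Dict[str, str]],
                          max_chars: int = 4000) -> str:
    """Assemble a prompt that asks the model to answer from cited sources."""
    lines = ["Answer using only the numbered sources below. Cite source numbers.", ""]
    used = 0
    for number, passage in enumerate(passages, start=1):
        text = passage["text"].strip()
        if used + len(text) > max_chars:
            break  # stop once the passage budget is exhausted
        used += len(text)
        lines.append(f"[{number}] {passage['title']} ({passage['uri']})")
        lines.append(text)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The resulting string would then be sent to the generation model (for example through Bedrock); keeping source numbers and URIs in the prompt makes it easier to show citations alongside the generated answer.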
Security & governance
- Encryption — data encrypted at rest (AWS-owned/managed/customer KMS) and in transit (TLS).
- Access control — integrates with IAM and supports document-level access control (ACL propagation from source connectors) so search results can be filtered by user permissions.
- Audit / compliance — use CloudTrail and CloudWatch for logging queries, index actions, and connector activity.
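Document-level access control can be sketched on both sides of the index: attaching an access control list when a document is pushed, and passing the caller's identity at query time. This is a minimal illustration assuming the AccessControlList field of BatchPutDocument documents and the UserContext parameter of Query; the helper names are hypothetical, and connectors that propagate ACLs from the source would normally handle the first half automatically.

```python
from typing import Any, Dict, Iterable, List

VALID_TYPES = {"USER", "GROUP"}
VALID_ACCESS = {"ALLOW", "DENY"}


def acl_entry(name: str, principal_type: str = "GROUP",
              access: str = "ALLOW") -> Dict[str, str]:
    """Build one principal entry for a document's AccessControlList."""
    if principal_type not in VALID_TYPES:
        raise ValueError(f"unknown principal type: {principal_type}")
    if access not in VALID_ACCESS:
        raise ValueError(f"unknown access value: {access}")
    return {"Name": name, "Type": principal_type, "Access": access}


def with_acl(document: Dict[str, Any],
             entries: Iterable[Dict[str, str]]) -> Dict[str, Any]:
    """Return a copy of a BatchPutDocument entry with an ACL attached."""
    return {**document, "AccessControlList": list(entries)}


def query_as_user(client: Any, index_id: str, text: str, user_id: str,
                  groups: List[str]) -> Dict[str, Any]:
    """Run a query so results are filtered by the caller's identity."""
    return client.query(
        IndexId=index_id,
        QueryText=text,
        UserContext={"UserId": user_id, "Groups": list(groups)},
    )
```

The user ID and group names must match what was indexed in the ACLs, so it is worth agreeing on one identity format (for example, the SSO email address) across connectors and the application.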
Scalability, performance & capacity
- Capacity model: twtech provisions index capacity units (storage + query units), which are billed per hour; query throughput and latency correlate with provisioned capacity.
- Sync modes — connectors can schedule incremental syncs; for heavy-change environments twtech can push deltas via API to avoid full crawls.
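Capacity sizing is mostly ceiling arithmetic, which the sketch below makes explicit. It assumes the extra units are applied through the CapacityUnits parameter of UpdateIndex; the amounts included with the base index and provided per unit differ by edition, so they are passed in by the caller rather than hard-coded, and the numbers in the usage note are placeholders to be checked against the published quotas.

```python
from typing import Dict


def extra_units(required: int, included: int, per_unit: int) -> int:
    """Number of additional units needed beyond what the base index includes."""
    if per_unit <= 0:
        raise ValueError("per_unit must be positive")
    deficit = max(0, required - included)
    return -(-deficit // per_unit)  # integer ceiling division


def capacity_update(doc_count: int, queries_per_day: int, *,
                    docs_included: int, docs_per_unit: int,
                    queries_included: int,
                    queries_per_unit: int) -> Dict[str, int]:
    """Build the CapacityUnits argument for UpdateIndex."""
    return {
        "StorageCapacityUnits": extra_units(doc_count, docs_included,
                                            docs_per_unit),
        "QueryCapacityUnits": extra_units(queries_per_day, queries_included,
                                          queries_per_unit),
    }
```

As a worked example with placeholder figures: 250,000 documents against 100,000 included and 100,000 per unit needs ceil(150,000 / 100,000) = 2 storage units, and 20,000 queries per day against 8,000 included and 8,000 per unit needs ceil(12,000 / 8,000) = 2 query units. The result could then be applied with client.update_index(Id=index_id, CapacityUnits=...).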
Observability & metrics to watch
- Query latency,
- Query units utilization,
- Error rate for connectors,
- Document ingestion throughput,
- Freshness lag,
- Top-queries with no matches (helpful for gap analysis).
- Use CloudWatch dashboards + logs to track and alert.
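Two of the metrics above, zero-match queries and query latency, can also be computed from the application's own query log. The sketch below assumes a hypothetical log record shape (a dict with "query", "result_count", and "latency_ms" keys) captured by the search front end; it is not a Kendra or CloudWatch format.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_zero_result_queries(records: List[Dict[str, Any]],
                            limit: int = 10) -> List[Tuple[str, int]]:
    """Most frequent queries that returned no results (case-insensitive)."""
    counts = Counter(
        record["query"].strip().lower()
        for record in records
        if record["result_count"] == 0
    )
    return counts.most_common(limit)


def latency_percentile(records: List[Dict[str, Any]],
                       percentile: int = 95) -> float:
    """Nearest-rank percentile of query latency in milliseconds."""
    values = sorted(record["latency_ms"] for record in records)
    if not values:
        raise ValueError("no records to summarize")
    # Nearest-rank method: rank = ceil(percentile / 100 * n), at least 1.
    rank = max(1, -(-percentile * len(values) // 100))
    return values[rank - 1]
```

Feeding the zero-result list back into content planning (missing articles, missing synonyms) is usually the quickest way to close gaps, while the latency percentile is a natural candidate for an alert threshold.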
Costs & pricing model (high level)
Main bill drivers:
- Number/type of indexes (index hours),
- Storage units for index content,
- Query units (query-hours and/or per-query costs depending on edition),
- Connectors and connector hours.
- GenAI index tiers may have different starting prices and finer-grained capacity.
- Always validate current pricing for twtech's AWS account and region.
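How the bill drivers combine can be sketched as simple arithmetic. This is a rough model under stated assumptions: the index and its extra capacity units are billed for every hour they exist, and connectors are billed for the hours they run. All rates are inputs supplied by the caller; none of the numbers in the usage note are real prices, and per-document or per-query charges that some editions apply are left out.

```python
from typing import Dict


def monthly_estimate(*, hours: float, index_rate: float,
                     storage_units: int, storage_unit_rate: float,
                     query_units: int, query_unit_rate: float,
                     connector_hours: float = 0.0,
                     connector_hour_rate: float = 0.0) -> Dict[str, float]:
    """Estimate cost from hourly rates; every rate is a caller-supplied input."""
    # Index and provisioned capacity accrue for each hour the index exists.
    always_on = hours * (
        index_rate
        + storage_units * storage_unit_rate
        + query_units * query_unit_rate
    )
    # Connector cost accrues only while syncs are running.
    connectors = connector_hours * connector_hour_rate
    return {
        "index_and_capacity": always_on,
        "connectors": connectors,
        "total": always_on + connectors,
    }
```

With placeholder values (hours=10, index_rate=1.0, two storage units at 0.5, one query unit at 0.25, four connector hours at 0.5) the model gives 10 × (1.0 + 1.0 + 0.25) = 22.5 for index and capacity, 2.0 for connectors, and 24.5 in total. The main takeaway is that the index cost accrues around the clock, so idle development indexes are worth deleting.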
Integrations & common architectures
- Intranet / knowledge portal: Kendra index + Experience Builder (or custom UI) + authentication via your SSO + connectors to S3/SharePoint/Drive.
- Customer support / help center: Kendra behind a chat UI (Lex/third-party) to power FAQ answering and surfacing KB articles; use Custom Document Enrichment to surface product metadata/intent.
- RAG / virtual assistant: Kendra GenAI index used as retriever in a RAG pipeline (retriever → reranker → generator) to ground generative responses in company documents.
Best practices & tuning checklist
- Start small, iterate: index a representative subset (top 5–10 content sources) and tune relevance there.
- Enrich on ingest: normalize metadata (product IDs, region, department) via Custom Document Enrichment (CDE) so twtech can filter/boost reliably.
- Use synonyms & FAQs: capture domain synonyms and explicit FAQ entries for high-value Q→A mapping.
- Relevance tuning: add source/freshness boosts for authoritative or time-sensitive content; measure impact with A/B tests (query logs).
- Permissions: propagate ACLs from sources or enforce via attributes to prevent oversharing.
- Instrument queries: capture zero-result queries and answer-quality feedback so twtech can iteratively re-index or add clarifying content.
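Capturing answer-quality feedback, the last item in the checklist, can be sketched with the SubmitFeedback operation. This is a minimal illustration assuming the application has kept the QueryId returned by a query and the Id of each result the user rated; the helper name is hypothetical, and feature availability can differ by index edition, so check the documentation for the index type in use.

```python
from typing import Any, Dict, List

VALID_RELEVANCE = {"RELEVANT", "NOT_RELEVANT"}


def relevance_feedback(index_id: str, query_id: str,
                       ratings: Dict[str, str]) -> Dict[str, Any]:
    """Build keyword arguments for the Kendra SubmitFeedback operation.

    `ratings` maps a result Id to "RELEVANT" or "NOT_RELEVANT".
    """
    items: List[Dict[str, str]] = []
    for result_id, value in ratings.items():
        if value not in VALID_RELEVANCE:
            raise ValueError(f"unknown relevance value: {value}")
        items.append({"ResultId": result_id, "RelevanceValue": value})
    return {
        "IndexId": index_id,
        "QueryId": query_id,
        "RelevanceFeedbackItems": items,
    }
```

A thumbs-up/thumbs-down control in the UI maps naturally onto this: the handler would call client.submit_feedback(**relevance_feedback(index_id, query_id, {result_id: "RELEVANT"})) and also write the rating to the application's own log for later analysis.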
Common pitfalls & gotchas
- Expectations mismatch: Kendra extracts direct answers only when the exact answer text exists; generative paraphrasing is limited unless twtech pairs it with a generation model.
- Poor metadata: if sources lack structured attributes, tuning and faceting are weaker — invest in enrichment.
- Overfitting boosts: overly aggressive boosts can hide useful documents; monitor search quality after each change.
Sample reference architecture (textual diagram)
- Sources: SharePoint, S3 (documents), RDS (DB), Salesforce.
- Connectors: scheduled syncs + API pushes (for example, triggered by source-system webhooks) for near real-time updates.
- Preprocessing: optional Lambda jobs / Comprehend / Textract / Transcribe to extract entities, OCR text, or audio transcripts.
- Kendra Index (GenAI/Enterprise) with Custom Document Enrichment.
- Application layer: Search API / Experience Builder / Chat UI (with RAG: Kendra as retriever + generator service).
- Security & Ops: IAM, KMS, CloudWatch, CloudTrail.
An Architecture diagram
Diagram image showing: Connectors → Kendra → App → Security.
Link to AWS Documentation:
https://docs.aws.amazon.com/kendra/latest/dg/hiw-index-types.html
twtech-Insights: