Case Study: Long-Form Psychiatric Session Voice Analytics

Overview

A mental health product team needed to turn 1.5h+ psychiatric voice sessions into structured, searchable artefacts: transcripts, summaries, risk indicators, and follow-up actions.

Their initial MVP (Next.js + Google Speech-to-Text + Vertex AI) proved the idea but was hard to operate, slow, and not clearly aligned with HIPAA / SOC 2 style requirements.

Webomage was brought in to refactor the MVP into a production-ready, observable system.

Challenges

Very long audio sessions (1.5h+), with dropouts and partial uploads.
Multiple AI components (STT, LLMs, post-processing) wired together in an ad-hoc way.
Latency and reliability issues during transcription and analysis.
Compliance expectations similar to HIPAA / SOC 2 (data handling, access control, auditability).
Product iteration pressure: the team wanted to keep experimenting while stabilising the platform.

Approach

Architecture & data-flow review
- Mapped how audio was uploaded, chunked, transcribed, analysed, and stored.
- Identified failure points, duplication, and missing observability.
Backend refactor with tRPC
- Introduced a tRPC-based backend to formalise API contracts between frontend and backend.
- Centralised validation, error handling, and request tracking.
Streaming-friendly audio & STT pipeline
- Adjusted how long audio was chunked and queued for STT to avoid timeouts.
- Improved retries, backoff, and idempotency for STT calls.
LLM processing & artefacts
- Structured LLM prompts and responses to produce clear artefacts: summaries, highlights, tags, and risk indicators.
- Ensured artefacts were versioned and traceable back to raw inputs.
Compliance-aware storage & access
- Clarified what needed to be stored, for how long, and who could access it.
- Improved separation of concerns between PII, raw audio, transcripts, and derived artefacts.
- Aligned logging and access patterns with HIPAA / SOC 2 style expectations.
Observability and cost visibility
- Added metrics and logs for STT and LLM usage, latency, and failures.
- Gave the team dashboards to understand performance and cost per session.

Outcomes

A production-ready pipeline for multi-hour psychiatric voice sessions, with clear failure modes and retries.
tRPC-backed backend that simplified frontend–backend integration and made the system easier to extend.
Better compliance posture through improved data flows, access control patterns, and auditability.
Visibility into performance and cost per session, enabling informed product decisions.

Relevant capabilities

This project leveraged several of Webomage’s strengths:

AI/LLM infrastructure and multi-step pipelines (STT + LLM + post-processing).
Strong DevOps and observability practices for AI-heavy workloads.
Compliance-aware design for sensitive healthcare/mental health data.
Experience turning MVPs into stable, evolvable products.

➡️ Have an AI/LLM MVP that needs to become a reliable product? Start a conversation.

Overview

Challenges

Approach

Outcomes

Relevant capabilities

Architectural Mastery at Scale

Expert Consultation