This guide shows how to analyze utterance gaps from multiple pre-recorded audio files to automatically determine optimal turn detection settings for real-time streaming transcription. It processes an entire folder, aggregates gap statistics across all recordings, and configures the WebSocket with parameters tailored to your specific conversation patterns.
## Quickstart
```python
import requests
import time

# ... (remainder of the quickstart script; each part is explained
# in the step-by-step guide below) ...

if __name__ == "__main__":
    main()
```
## Step-By-Step Guide
Before we begin, make sure you have an AssemblyAI account and an API key. You can [sign up](https://assemblyai.com/dashboard/signup) and get your API key from your dashboard.
Set up API credentials, file paths, audio parameters (16kHz sample rate, mono channel), and initialize global variables for managing WebSocket connections and audio streaming threads.
```python
import requests

# ... (API credentials, file paths, and audio parameters elided) ...

OPTIMIZED_CONFIG = {}
```
3. **Define `get_audio_files()` Function**
This function scans a specified folder for audio/video files with supported extensions and returns a sorted list of file paths for batch processing.
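A minimal sketch of such a scanner using `pathlib`; the extension set below is an assumption for illustration, not necessarily the guide's exact list:

```python
from pathlib import Path

# Assumed extension set for illustration; the guide's actual list may differ.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4"}

def get_audio_files(folder):
    """Return a sorted list of supported audio/video files in `folder`."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```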
This function uploads an audio file to AssemblyAI, requests transcription with speaker labels enabled, polls until completion, then calculates gap statistics between utterances (average, median, min, max) and saves the transcript JSON.
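The gap calculation at the core of this step can be sketched as follows (the upload-and-poll portion is omitted). Each utterance carries `start` and `end` timestamps in milliseconds, matching the shape of AssemblyAI's transcript `utterances` field:

```python
import statistics

def utterance_gaps_ms(utterances):
    """Silent gaps (ms) between consecutive utterances."""
    gaps = []
    for prev, curr in zip(utterances, utterances[1:]):
        gap = curr["start"] - prev["end"]
        if gap > 0:  # skip overlapping speech
            gaps.append(gap)
    return gaps

def gap_stats(gaps):
    """Average/median/min/max summary of a gap list."""
    if not gaps:
        return None
    return {
        "average": statistics.mean(gaps),
        "median": statistics.median(gaps),
        "min": min(gaps),
        "max": max(gaps),
    }
```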
This function orchestrates the analysis of all files in a folder by calling `analyze_single_file()` for each, aggregates all gap data across files, calculates overall statistics, displays per-file breakdowns, and saves a comprehensive summary JSON.
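The aggregation step can be sketched like this; the input shape (filename mapped to that file's gap list, in milliseconds) is an assumption for illustration:

```python
import statistics

def aggregate_gap_stats(per_file_gaps):
    """Pool per-file gap lists (ms) and compute overall statistics."""
    all_gaps = [g for gaps in per_file_gaps.values() for g in gaps]
    if not all_gaps:
        return None
    return {
        "files_analyzed": len(per_file_gaps),
        "total_gaps": len(all_gaps),
        "average": statistics.mean(all_gaps),
        "median": statistics.median(all_gaps),
        "min": min(all_gaps),
        "max": max(all_gaps),
    }
```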
This function takes aggregated gap statistics and selects one of three preset configurations (Aggressive: average gap < 500 ms; Balanced: 500-1000 ms; Conservative: > 1000 ms) with optimized turn detection parameters for different conversation styles.
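A sketch of the selection logic. The gap thresholds come from the guide; the turn-detection parameter names follow AssemblyAI's streaming API, but the specific values below are illustrative assumptions, not the guide's exact numbers:

```python
def determine_best_config(average_gap_ms):
    """Pick a turn-detection preset from the average utterance gap (ms).

    Thresholds are the guide's; parameter values are illustrative.
    """
    if average_gap_ms < 500:
        return {
            "name": "Aggressive",
            "end_of_turn_confidence_threshold": 0.4,
            "min_end_of_turn_silence_when_confident": 160,
            "max_turn_silence": 400,
        }
    elif average_gap_ms <= 1000:
        return {
            "name": "Balanced",
            "end_of_turn_confidence_threshold": 0.7,
            "min_end_of_turn_silence_when_confident": 400,
            "max_turn_silence": 1280,
        }
    return {
        "name": "Conservative",
        "end_of_turn_confidence_threshold": 0.8,
        "min_end_of_turn_silence_when_confident": 800,
        "max_turn_silence": 2400,
    }
```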
These functions manage the real-time streaming connection lifecycle: `on_open` starts the audio streaming thread, `on_message` processes transcription results (partial and final turns), and the close/error handlers clean up resources.
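A minimal `on_message` sketch; the message shape assumed here (`type == "Turn"` with `transcript` and `end_of_turn` fields) follows AssemblyAI's v3 streaming API, so verify the field names against the current API reference:

```python
import json

def on_message(ws, message):
    """Print partial turns in place and final turns on their own line."""
    msg = json.loads(message)
    if msg.get("type") != "Turn":
        return  # ignore session begin/terminate messages
    transcript = msg.get("transcript", "")
    if msg.get("end_of_turn"):
        print(f"\rFinal: {transcript}")
    else:
        print(f"\rPartial: {transcript}", end="", flush=True)
```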
This function initializes PyAudio to capture microphone input, establishes a WebSocket connection with the optimized configuration parameters, and streams audio in real-time while displaying transcription results until the user stops with Ctrl+C.
```python
def run_streaming(config):
    # ... (function body elided in this excerpt) ...
```
9. **Define `main()` Workflow**
Execute the three-step process: analyze all audio files in the folder, determine the best streaming configuration based on aggregated utterance gaps, then launch real-time streaming with the optimized settings.