Transcribe
The transcribing feature uses WhisperX, an open-source wrapper around Whisper that adds start and stop times for each word, to transcribe audio or video. Transcribing content produces a Transcription object with detailed transcription information, including word-level, character-level, and sentence-level timestamps. Content must be transcribed before it can be clipped.
Usage
from clipsai import Transcriber

transcriber = Transcriber()
# transcribe() returns a Transcription object
transcription = transcriber.transcribe(
    audio_file_path="/abs/path/to/video.mp4"
)
Transcriber Class
A class for transcribing audio or video using WhisperX.
Methods
- Name
transcribe
- Type
- -> Transcription
- Description
Transcribes an audio or video file.
Required Parameters
- Name
- audio_file_path: string
- Description
Absolute path to the audio or video file to transcribe.
Optional Parameters
- Name
- iso6391_lang_code: string = None
- Description
ISO 639-1 language code to transcribe the media in. Defaults to None, which autodetects the media's language.
- Name
- batch_size: int = 16
- Description
WhisperX batch size. Reduce it if the GPU is low on memory.
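Because batch_size is the knob to turn when GPU memory runs out, one option is to retry with a smaller batch instead of failing outright. The sketch below is a hypothetical helper (transcribe_with_backoff is not part of clipsai) that takes any callable accepting (audio_file_path, batch_size) and halves the batch size after an out-of-memory error. It assumes PyTorch reports CUDA out-of-memory as a RuntimeError whose message contains "out of memory", which is how recent PyTorch versions behave.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def transcribe_with_backoff(
    transcribe: Callable[[str, int], T],
    audio_file_path: str,
    batch_size: int = 16,
    min_batch_size: int = 1,
) -> T:
    """Hypothetical helper: retry with a halved batch size on out-of-memory errors."""
    while True:
        try:
            return transcribe(audio_file_path, batch_size)
        except (MemoryError, RuntimeError) as err:
            # CUDA OOM surfaces as a RuntimeError subclass mentioning "out of memory";
            # any other error is re-raised unchanged.
            is_oom = isinstance(err, MemoryError) or "out of memory" in str(err)
            if not is_oom or batch_size <= min_batch_size:
                raise
            # Halve the batch size, but never go below the minimum.
            batch_size = max(min_batch_size, batch_size // 2)
```

With clipsai, the callable could wrap the documented parameters, for example lambda path, bs: transcriber.transcribe(audio_file_path=path, batch_size=bs).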
- Name
detect_language
- Type
- -> string
- Description
Detects the language of an audio or video file.
Required Parameters
- Name
- audio_file_path: string
- Description
Absolute path to the audio or video file whose language to detect.
Optional Parameters
- Name
- iso6391_lang_code: string = None
- Description
ISO 639-1 language code to transcribe the media in. Defaults to None, which autodetects the media's language.
- Name
- batch_size: int = 16
- Description
WhisperX batch size. Reduce it if the GPU is low on memory.
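The language parameters follow a simple rule: an explicit ISO 639-1 code is used as given, and None means "autodetect". The sketch below shows that decision as a hypothetical helper (resolve_language is not a clipsai function); the detect callable stands in for something like transcriber.detect_language, and the two-letter validation is an added assumption based on the ISO 639-1 format.

```python
from typing import Callable, Optional


def resolve_language(
    iso6391_lang_code: Optional[str], detect: Callable[[], str]
) -> str:
    """Hypothetical helper: use an explicit ISO 639-1 code, else autodetect."""
    if iso6391_lang_code is None:
        # Mirrors the documented default: None means autodetect.
        return detect()
    code = iso6391_lang_code.strip().lower()
    # ISO 639-1 codes are exactly two ASCII letters, e.g. "en", "es", "ja".
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 639-1 code: {iso6391_lang_code!r}")
    return code
```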
Transcription Class
The Transcription class provides a detailed breakdown of an audio or video transcription. It gives structured access to the content at three levels: individual characters, words, and full sentences.
Properties
- Name
characters
- Type
- list[Character]
- Description
The transcription's characters, as Character objects ordered by start time.
- Name
words
- Type
- list[Word]
- Description
The transcription's words, as Word objects ordered by start time.
- Name
sentences
- Type
- list[Sentence]
- Description
The transcription's sentences, as Sentence objects ordered by start time.
- Name
text
- Type
- string
- Description
The full textual content of the transcription.
- Name
language
- Type
- string
- Description
The ISO 639-1 language code of the transcription's language.
- Name
created_time
- Type
- datetime
- Description
The time when the transcription was created.
- Name
start_time
- Type
- float
- Description
The start time of the transcript in seconds.
- Name
end_time
- Type
- float
- Description
The end time of the transcript in seconds.
- Name
source_software
- Type
- string
- Description
The software used for transcribing.
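As a small illustration of combining these properties, the sketch below computes an average speaking rate from words, start_time, and end_time. TranscriptionStub and Word here are hypothetical stand-ins that only mirror the documented property names so the example runs without clipsai; words_per_minute relies only on those attributes, so it is meant to work on any object that exposes them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in with a subset of the documented Word properties.
    start_time: float
    end_time: float
    text: str


@dataclass
class TranscriptionStub:
    # Hypothetical stand-in mirroring the documented Transcription properties used here.
    words: List[Word] = field(default_factory=list)
    start_time: float = 0.0
    end_time: float = 0.0


def words_per_minute(transcription) -> float:
    """Average speaking rate over the transcript's start_time..end_time span."""
    duration = transcription.end_time - transcription.start_time
    if duration <= 0:
        # Avoid dividing by zero for empty or degenerate transcripts.
        return 0.0
    return len(transcription.words) / duration * 60.0
```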
Methods
- Name
find_word_index
- Type
- -> int
- Description
Finds the index of the word whose start or end time is closest to target_time (in seconds).
Required Parameters
- Name
- target_time: float
- Description
The time in seconds to search for.
- Name
- type_of_time: string ("start" | "end")
- Description
- start: returns the index of the word with the closest start time before target_time.
- end: returns the index of the word with the closest end time after target_time.
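The lookup described above can be sketched as a binary search over sorted times. This is an illustration of the documented semantics, not clipsai's implementation: find_index is a hypothetical name, "before" and "after" are treated as inclusive, and out-of-range targets are clamped to the first or last index, all of which are assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import List


def find_index(times: List[float], target_time: float, type_of_time: str) -> int:
    """Hypothetical lookup over times sorted in ascending order.

    "start": last index whose time is <= target_time.
    "end": first index whose time is >= target_time.
    """
    if not times:
        raise ValueError("times must be non-empty")
    if type_of_time == "start":
        # bisect_right gives the insertion point after equal values; step back one.
        return max(bisect_right(times, target_time) - 1, 0)
    if type_of_time == "end":
        # bisect_left gives the first position whose time is >= target_time.
        return min(bisect_left(times, target_time), len(times) - 1)
    raise ValueError("type_of_time must be 'start' or 'end'")
```

For words, the input lists would be [w.start_time for w in transcription.words] and [w.end_time for w in transcription.words].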
- Name
find_sentence_index
- Type
- -> int
- Description
Finds the index of the sentence whose start or end time is closest to target_time (in seconds).
Required Parameters
- Name
- target_time: float
- Description
The time in seconds to search for.
- Name
- type_of_time: string ("start" | "end")
- Description
- start: returns the index of the sentence with the closest start time before target_time.
- end: returns the index of the sentence with the closest end time after target_time.
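A typical use of sentence lookups is widening a rough clip window so it starts and ends on sentence boundaries. The sketch below applies the documented start/end semantics to plain lists of sentence times; snap_to_sentences is a hypothetical helper, and it assumes sentences do not overlap (so end times are sorted) and that out-of-range targets clamp to the first or last sentence. With clipsai, the two indices would presumably come from transcription.find_sentence_index with type_of_time="start" and "end".

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def snap_to_sentences(
    sentence_starts: List[float],
    sentence_ends: List[float],
    clip_start: float,
    clip_end: float,
) -> Tuple[float, float]:
    """Hypothetical helper: widen [clip_start, clip_end] to sentence boundaries."""
    # Last sentence starting at or before clip_start (clamped to the first one).
    first = max(bisect_right(sentence_starts, clip_start) - 1, 0)
    # First sentence ending at or after clip_end (clamped to the last one).
    last = min(bisect_left(sentence_ends, clip_end), len(sentence_ends) - 1)
    return sentence_starts[first], sentence_ends[last]
```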
Sentence Class
Represents a sentence in a transcription.
Properties
- Name
start_time
- Type
- float
- Description
The start time of the sentence in seconds.
- Name
end_time
- Type
- float
- Description
The end time of the sentence in seconds.
- Name
start_char
- Type
- int
- Description
The index of the sentence's start character in the full text.
- Name
end_char
- Type
- int
- Description
The index of the sentence's end character in the full text.
- Name
text
- Type
- string
- Description
The text of the sentence.
Methods
- Name
to_dict
- Type
- -> dict
- Description
Returns the properties of the sentence as a dictionary.
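The start_char and end_char properties locate a sentence inside Transcription.text. The sketch below uses a hypothetical Sentence stand-in to show that relationship and a to_dict built with dataclasses.asdict. Two assumptions are not confirmed by this reference: that end_char is exclusive (a Python slice bound), and that to_dict keys match the property names.

```python
from dataclasses import asdict, dataclass


@dataclass
class Sentence:
    # Hypothetical stand-in mirroring the documented Sentence properties.
    start_time: float
    end_time: float
    start_char: int
    end_char: int
    text: str

    def to_dict(self) -> dict:
        # Assumption: keys match the property names.
        return asdict(self)


def sentence_from_span(
    full_text: str, start_char: int, end_char: int, start_time: float, end_time: float
) -> Sentence:
    # Assumes end_char is exclusive, so the slice yields the sentence text.
    return Sentence(start_time, end_time, start_char, end_char, full_text[start_char:end_char])
```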
Word Class
Represents a word in a transcription.
Properties
- Name
start_time
- Type
- float
- Description
The start time of the word in seconds.
- Name
end_time
- Type
- float
- Description
The end time of the word in seconds.
- Name
start_char
- Type
- int
- Description
The index of the word's start character in the full text.
- Name
end_char
- Type
- int
- Description
The index of the word's end character in the full text.
- Name
text
- Type
- string
- Description
The text of the word.
Methods
- Name
to_dict
- Type
- -> dict
- Description
Returns the properties of the word as a dictionary.
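Word timings are enough to build simple captions. The sketch below groups consecutive words into fixed-size chunks, each spanning from its first word's start_time to its last word's end_time. The Word class is a hypothetical stand-in with the documented property names, chunk_captions is not a clipsai function, and it assumes word text carries no surrounding whitespace.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical stand-in with a subset of the documented Word properties.
    start_time: float
    end_time: float
    text: str


def chunk_captions(
    words: List[Word], max_words: int = 3
) -> List[Tuple[float, float, str]]:
    """Group consecutive words into (start_time, end_time, text) caption chunks."""
    captions = []
    for i in range(0, len(words), max_words):
        group = words[i : i + max_words]
        # Each chunk spans from its first word's start to its last word's end.
        captions.append(
            (group[0].start_time, group[-1].end_time, " ".join(w.text for w in group))
        )
    return captions
```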
Character Class
Represents a character in a transcription.
Properties
- Name
start_time
- Type
- float
- Description
The start time of the character in seconds.
- Name
end_time
- Type
- float
- Description
The end time of the character in seconds.
- Name
word_index
- Type
- int
- Description
The index, within the transcription, of the word that contains this character.
- Name
sentence_index
- Type
- int
- Description
The index, within the transcription, of the sentence that contains this character.
- Name
text
- Type
- string
- Description
The text of the character.
Methods
- Name
to_dict
- Type
- -> dict
- Description
Returns the properties of the character as a dictionary.
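Because each character records its word_index, word text and timings can be rebuilt from the character list. The sketch below groups characters by word_index using itertools.groupby, which works because characters are ordered by start time, so a word's characters are contiguous. The Character class is a hypothetical stand-in, and treating whitespace characters as having word_index None (and skipping them) is an assumption about how clipsai labels characters between words.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass
class Character:
    # Hypothetical stand-in mirroring the documented Character properties.
    start_time: float
    end_time: float
    word_index: Optional[int]
    sentence_index: int
    text: str


def words_from_characters(
    characters: List[Character],
) -> List[Tuple[int, str, float, float]]:
    """Rebuild (word_index, text, start_time, end_time) tuples from characters."""
    # Assumption: characters outside any word (e.g. spaces) have word_index None.
    in_words = [c for c in characters if c.word_index is not None]
    words = []
    # Characters are ordered by start time, so each word's characters are contiguous.
    for word_index, group in groupby(in_words, key=lambda c: c.word_index):
        chars = list(group)
        text = "".join(c.text for c in chars)
        words.append((word_index, text, chars[0].start_time, chars[-1].end_time))
    return words
```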