Clip

The clipping feature uses the TextTiling algorithm to segment long-form audio content into coherent clips based on its transcript. TextTiling, introduced by Marti A. Hearst in the 1990s, detects topic shifts in a text by analyzing word usage and distribution patterns. Thanks to recent advances in NLP, TextTiling with BERT embeddings provides significant improvements over the original formulation and can be readily applied to state-of-the-art transcriptions produced with Whisper. The algorithm segments the text at sentence granularity and focuses on detecting topic shifts rather than identifying the topics themselves. This makes it particularly effective at finding distinct sections within a narrative, which in turn yields clips of varying lengths, from short segments to extended ones.
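
To make the idea of detecting topic shifts concrete, here is a minimal sketch of embedding-based TextTiling. It is not the ClipsAI implementation. The toy 2-D vectors stand in for real sentence embeddings, and the helper names (`gap_similarities`, `depth_scores`, `find_boundaries`) and the fixed `threshold` are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gap_similarities(embeddings):
    """Similarity across each gap between consecutive sentences."""
    return [
        cosine(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]


def depth_scores(sims):
    """Depth of each gap: how far it dips below the nearest peak on each side."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1  # climb to the peak on the left
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1  # climb to the peak on the right
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths


def find_boundaries(embeddings, threshold=0.5):
    """Return the sentence indices at which a new segment starts.

    The fixed threshold is an illustrative assumption, not a tuned value.
    """
    depths = depth_scores(gap_similarities(embeddings))
    return [i + 1 for i, d in enumerate(depths) if d > threshold]


# Toy "embeddings": three sentences on one topic, then three on another.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.1, 1.0],
]
boundaries = find_boundaries(toy_embeddings)
```

In this sketch the only deep dip in similarity lies between the third and fourth sentences, so a single boundary is reported there. Only the dips matter, not what each segment is about, which mirrors the focus on topic shifts rather than topics.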

Usage

The following transcribes a media file, finds its clips, and prints the start and end time of the first clip.

from clipsai import ClipFinder, Transcriber

transcriber = Transcriber()
transcription = transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4")

clipfinder = ClipFinder()
clips = clipfinder.find_clips(transcription=transcription)

print("StartTime: ", clips[0].start_time)
print("EndTime: ", clips[0].end_time)

To trim the video using the returned clips, run the following code.

import clipsai  # needed because the names below are accessed as clipsai.<name>

media_editor = clipsai.MediaEditor()

# Choose ONE of the following, depending on the streams the file contains.
# Audio stream only:
# media_file = clipsai.AudioFile("/abs/path/to/audio_only_file.mp4")
# Both audio and video streams:
media_file = clipsai.AudioVideoFile("/abs/path/to/video.mp4")

clip = clips[0]  # select the clip you'd like to trim
clip_media_file = media_editor.trim(
    media_file=media_file,
    start_time=clip.start_time,
    end_time=clip.end_time,
    trimmed_media_file_path="/abs/path/to/clip.mp4",  # doesn't exist yet
)
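
Because the returned clips vary in length, it can be useful to narrow them down before trimming. The following is a small sketch under stated assumptions: `filter_by_duration` is a hypothetical helper, not part of ClipsAI, and it assumes `start_time` and `end_time` are numeric values in seconds. The `SimpleNamespace` objects are stand-ins so the example runs without a transcription.

```python
from types import SimpleNamespace


def filter_by_duration(clips, min_seconds, max_seconds):
    """Keep clips whose duration lies within [min_seconds, max_seconds]."""
    return [
        clip
        for clip in clips
        if min_seconds <= clip.end_time - clip.start_time <= max_seconds
    ]


# Stand-in objects exposing only start_time and end_time (hypothetical data).
sample_clips = [
    SimpleNamespace(start_time=0.0, end_time=12.5),
    SimpleNamespace(start_time=12.5, end_time=75.0),
    SimpleNamespace(start_time=75.0, end_time=400.0),
]

# Keep only clips between 30 seconds and 2 minutes long.
selected = filter_by_duration(sample_clips, min_seconds=30, max_seconds=120)
```

With real data, the list returned by `find_clips` would take the place of `sample_clips`, and each selected clip could then be passed to `media_editor.trim` as shown above.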

ClipFinder Class

A class for finding engaging clips based on the input transcript.

Methods

Name
find_clips
Type
-> list[Clip]
Description
Finds clips in the transcription of an audio or video file using the TextTiling algorithm.

Required Parameters

Name
transcription
Type
Transcription
Description
The transcription of the audio or video file in which to find clips.

Clip Class

Represents a clip of a video or audio file.

Properties

Name
start_time
Type
float
Description
The start time of the clip in seconds.
Name
end_time
Type
float
Description
The end time of the clip in seconds.
Name
start_char
Type
int
Description
The index of the clip's first character in the transcription.
Name
end_char
Type
int
Description
The index of the clip's last character in the transcription.

Methods

Name
copy
Type
-> Clip
Description
Returns a copy of the Clip instance.
Name
to_dict
Type
-> dict
Description
Returns a dictionary representation of the clip.
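
A dictionary representation is convenient for saving clip metadata, for example as JSON. The sketch below uses `ExampleClip`, a hypothetical stand-in that mirrors the documented properties so the example runs on its own. The key names in its `to_dict` are assumptions and may differ from what the real `Clip.to_dict` returns. `clips_to_json` is likewise an illustrative helper, not a ClipsAI function.

```python
import json


class ExampleClip:
    """Hypothetical stand-in exposing the properties documented above."""

    def __init__(self, start_time, end_time, start_char, end_char):
        self.start_time = start_time
        self.end_time = end_time
        self.start_char = start_char
        self.end_char = end_char

    def to_dict(self):
        # Assumed key names; the real Clip.to_dict() may differ.
        return {
            "start_time": self.start_time,
            "end_time": self.end_time,
            "start_char": self.start_char,
            "end_char": self.end_char,
        }


def clips_to_json(clips):
    """Serialize objects that provide to_dict() into a JSON array string."""
    return json.dumps([clip.to_dict() for clip in clips], indent=2)


example_clips = [ExampleClip(0.0, 42.5, 0, 612)]
serialized = clips_to_json(example_clips)
```

The same helper should work with the clips returned by `find_clips`, provided the dictionaries produced by `to_dict` contain only JSON-serializable values.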