webvtt-py


Quickstart

Installation

You can install webvtt-py with pip:

$ pip install webvtt-py

To install with easy_install:

$ easy_install webvtt-py

Requirements

This module requires Python 3.4+.

Source code

This project is hosted on GitHub.

License

Licensed under the MIT License.

Usage

Reading WebVTT caption files

import webvtt

# read the file once; the vtt object is reused in the examples below
vtt = webvtt.read('captions.vtt')

# we can iterate over the captions
for caption in vtt:
    print(caption.start)  # start timestamp in text format
    print(caption.end)  # end timestamp in text format
    print(caption.text)  # caption text

# you can also iterate over the lines of a particular caption
for line in vtt[0].lines:
    print(line)

# caption text is returned clean without class tags
# we can access the raw text of a caption with raw_text
>>> vtt[0].text
'This is a caption text'
>>> vtt[0].raw_text
'This is a <c.colorE5E5E5>caption</c> text'

# caption identifiers
>>> vtt[0].identifier
'crédit de transcription'

Reading WebVTT caption files from file-like object

import webvtt
import requests
from io import StringIO

# .text is a property of the response, not a method
payload = requests.get('http://subtitles.com/1234.vtt').text
buffer = StringIO(payload)

for caption in webvtt.read_buffer(buffer):
    print(caption.start)
    print(caption.end)
    print(caption.text)

Creating captions

from webvtt import WebVTT, Caption

vtt = WebVTT()

# creating a caption with a list of lines
caption = Caption(
    '00:00:00.500',
    '00:00:07.000',
    ['Caption line 1', 'Caption line 2']
)

# adding a caption
vtt.captions.append(caption)

# creating another caption with a text
caption = Caption(
    '00:00:07.000',
    '00:00:11.890',
    'Caption line 1\nCaption line 2'
)

vtt.captions.append(caption)

Manipulating captions

import webvtt

vtt = webvtt.read('captions.vtt')

# update start timestamp
vtt[0].start = '00:00:01.250'

# update end timestamp
vtt[0].end = '00:00:03.890'

# update caption text
vtt[0].text = 'My caption text'

# delete a caption
del vtt.captions[2]

Saving captions

import webvtt

vtt = webvtt.read('captions.vtt')

# save to original file
vtt.save()

# save to a different file
vtt.save('my_captions.vtt')

# write to opened file
with open('my_captions.vtt', 'w') as fd:
    vtt.write(fd)

Converting captions

You can read captions from the following formats:

  • SubRip (.srt)
  • YouTube SBV (.sbv)

import webvtt

# to read from a different format use the method from_ followed by
# the extension.
vtt = webvtt.from_sbv('captions.sbv')
vtt.save()

# if we just want to convert the file we can do this in one line
webvtt.from_sbv('captions.sbv').save()
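
For orientation, the most visible difference between an SBV cue and a WebVTT cue is the timing line. The sketch below is not how webvtt-py implements from_sbv; it is a standalone illustration that assumes SBV timing lines look like 0:00:00.599,0:00:04.160, and sbv_timing_to_webvtt is a hypothetical helper name.

```python
import re

# Assumed SBV timing layout: H:MM:SS.mmm,H:MM:SS.mmm (hours are not zero-padded).
SBV_TIMING = re.compile(
    r'^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$'
)


def sbv_timing_to_webvtt(line):
    """Convert one SBV timing line into a WebVTT timing line (hypothetical helper)."""
    match = SBV_TIMING.match(line.strip())
    if match is None:
        raise ValueError('not an SBV timing line: {!r}'.format(line))
    h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
    # WebVTT timestamps use two-digit hours and " --> " between start and end.
    start = '{:02d}:{}:{}.{}'.format(int(h1), m1, s1, ms1)
    end = '{:02d}:{}:{}.{}'.format(int(h2), m2, s2, ms2)
    return '{} --> {}'.format(start, end)


print(sbv_timing_to_webvtt('0:00:00.599,0:00:04.160'))
```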

We can also convert WebVTT to other formats:

  • SubRip (.srt)

import webvtt

# save in SRT format
vtt = webvtt.read('captions.vtt')
vtt.save_as_srt()

# write to opened file in SRT format
with open('my_captions.srt', 'w') as fd:
    vtt.write(fd, format='srt')
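
To show what changes between the two formats, here is a minimal sketch of SRT rendering. It is not the library's writer: to_srt is a hypothetical helper, and captions are modelled as plain (start, end, text) tuples. It relies only on the SubRip conventions that cues are numbered from 1 and that timestamps use a comma before the milliseconds.

```python
def to_srt(captions):
    """Render (start, end, text) tuples as SubRip text (hypothetical helper).

    Timestamps are expected in WebVTT form with hours, e.g. '00:00:00.500'.
    """
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        # SubRip uses a comma, not a dot, before the milliseconds.
        timing = '{} --> {}'.format(start.replace('.', ','), end.replace('.', ','))
        blocks.append('{}\n{}\n{}\n'.format(index, timing, text))
    # Cues are separated by one blank line.
    return '\n'.join(blocks)


srt = to_srt([
    ('00:00:00.500', '00:00:07.000', 'Caption line 1\nCaption line 2'),
    ('00:00:07.000', '00:00:11.890', 'Caption line 3'),
])
print(srt)
```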

webvtt-py package reference

webvtt.webvtt

class webvtt.webvtt.WebVTT(file='', captions=None, styles=None)

Parse captions in WebVTT format and also from other formats like SRT.

To read WebVTT:

WebVTT().read('captions.vtt')

For other formats like SRT, use from_[format in lower case]:

WebVTT().from_srt('captions.srt')

A list of all supported formats is available by calling list_formats().

captions

Returns the list of captions.

classmethod from_sbv(file)

Reads captions from a file in YouTube SBV format.

classmethod from_srt(file)

Reads captions from a file in SubRip format.

static list_formats()

Provides a list of supported formats that this class can read from.

classmethod read(file)

Reads a WebVTT captions file.

classmethod read_buffer(buffer)

Reads WebVTT captions from a file-like object. The file-like object may be the return value of an io.open call, an io.StringIO object, a tempfile.TemporaryFile object, etc.

save(output='')

Saves the document. If no output is provided, the file is saved in the same location. Otherwise, output can specify a target directory or file.

total_length

Returns the total length of the captions.

webvtt.segmenter

class webvtt.segmenter.WebVTTSegmenter

Provides segmentation of WebVTT captions for HTTP Live Streaming (HLS).

seconds

Returns the number of seconds used for segmenting captions.

segment(webvtt, output='', seconds=10, mpegts=900000)

Segments the captions based on a number of seconds.
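
As a rough illustration of segmenting by a number of seconds, the sketch below groups cues into fixed-length windows. It is an assumption about the general approach, not the WebVTTSegmenter implementation: slice_into_segments is a hypothetical helper, cues are plain (start, end, text) tuples with times in seconds, and a cue that crosses a boundary is copied into every window it overlaps.

```python
import math


def slice_into_segments(cues, seconds=10):
    """Group (start, end, text) cues into fixed-length segments.

    A cue that crosses a segment boundary is placed in every segment it
    overlaps, so each segment can be displayed on its own.
    """
    if not cues:
        return []
    total = max(end for _, end, _ in cues)
    count = max(1, math.ceil(total / seconds))
    segments = [[] for _ in range(count)]
    for cue in cues:
        start, end, _ = cue
        # Clamp so a cue starting exactly at the end falls in the last segment.
        first = min(int(start // seconds), count - 1)
        # A cue ending exactly on a boundary belongs to the earlier segment.
        last = max(first, math.ceil(end / seconds) - 1)
        for index in range(first, last + 1):
            segments[index].append(cue)
    return segments


cues = [(0.5, 7.0, 'a'), (7.0, 11.89, 'b'), (12.0, 20.0, 'c')]
for number, segment in enumerate(slice_into_segments(cues, seconds=10)):
    print(number, [text for _, _, text in segment])
```

With these sample cues, the second cue runs from 7.0 to 11.89 seconds, so it lands in both the first and the second 10-second segment.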

segments

Returns the list of segments.

total_segments

Returns the total number of segments.

webvtt.cli

Usage:
  webvtt segment <file> [--target-duration=SECONDS] [--mpegts=OFFSET] [--output=<dir>]
  webvtt -h | --help
  webvtt --version

Options:
  -h --help                  Show this screen.
  --version                  Show version.
  --target-duration=SECONDS  Target duration of each segment in seconds [default: 10].
  --mpegts=OFFSET            Presentation timestamp value [default: 900000].
  --output=<dir>             Output to directory [default: ./].

Examples:
  webvtt segment captions.vtt --output destination/directory

webvtt.cli.main()

Main entry point for CLI commands.

webvtt.cli.segment(f, output, target_duration, mpegts)

Segment command.
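
The mpegts argument is a presentation timestamp offset. MPEG-2 transport stream timestamps tick at 90 kHz, so the default of 900000 corresponds to 10 seconds. The sketch below works through that arithmetic and builds the X-TIMESTAMP-MAP header that HLS uses to map WebVTT cue times onto the media timeline. Both helper names are hypothetical, and the exact header text written by webvtt-py is an assumption here.

```python
MPEGTS_CLOCK_HZ = 90000  # MPEG-2 transport stream timestamps tick at 90 kHz


def mpegts_to_seconds(offset):
    """Convert a 90 kHz presentation timestamp into seconds."""
    return offset / MPEGTS_CLOCK_HZ


def timestamp_map_header(mpegts=900000, local='00:00:00.000'):
    """Build an HLS X-TIMESTAMP-MAP header line (hypothetical helper)."""
    return 'X-TIMESTAMP-MAP=MPEGTS:{},LOCAL:{}'.format(mpegts, local)


print(mpegts_to_seconds(900000))
print(timestamp_map_header())
```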

webvtt.generic

webvtt.parsers

class webvtt.parsers.SBVParser(parse_options=None)

Bases: webvtt.parsers.TextBasedParser

YouTube SBV parser.

class webvtt.parsers.SRTParser(parse_options=None)

Bases: webvtt.parsers.TextBasedParser

SRT parser.

class webvtt.parsers.TextBasedParser(parse_options=None)

Bases: object

Parser for plain text caption files. This is a generic class; do not use it directly.

read(file)

Reads the captions file.

class webvtt.parsers.WebVTTParser

Bases: webvtt.parsers.TextBasedParser

WebVTT parser.

webvtt.writers

webvtt.exceptions

History

0.4.5 (09-04-2020)

  • Fix issue reading buffer

0.4.4 (27-03-2020)

  • Allow parsing empty SBV captions, thanks to @ishunyu (#26)
  • Fix invalid time cues, thanks to @sontek (#19)
  • Enable pytest as test runner, thanks to @sontek (#20)
  • Packaging improvements
  • Added Python 3.8 support
  • Improve parsing empty lines

0.4.3 (22-11-2019)

  • Parsing improvements, thanks to @sontek (#18)
  • Add support for reading content from a file-like object, thanks to @omerholz (#23)
  • Documentation fixes thanks to @sontek (#22) and @netcmcc (#24)

0.4.2 (08-06-2018)

  • Renamed and reorganized a few of the modules
  • Parsing methods are now class methods: read, from_srt and from_sbv
  • Improved usability with the addition of shortcuts to avoid instantiating the classes so we can do:

import webvtt

webvtt.read('captions.vtt')  # this will return a WebVTT instance

0.4.1 (24-12-2017)

  • Support for saving cue identifiers

0.4.0 (18-09-2017)

The main goal of this release is a refactor of the WebVTT parser to make parsing easier and to support new features of the format.

New features:

  • Support for cue identifiers
  • Support for parsing WebVTT captions with comments
  • Support for parsing WebVTT captions with Style blocks
  • Support for BOM in caption files
  • Added method to write the captions to an opened file
  • Convert WebVTT to SRT format
  • Ignore empty captions in SRT format

Other:

  • Refactored WebVTT parser

0.3.3 (23-08-2017)

The text for the caption is now returned clean (tags removed). The cue text could contain tags like:

  • timestamp tags: <00:19.000>
  • class tags: <c.classname>text</c>
  • and others…

Important: It currently removes any tag present in the cue text. For example <b> would be removed.

Also a new attribute is available on captions to retrieve the text without cleaning tags: raw_text
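
The relationship between raw_text and the cleaned text can be sketched with a single regular expression. This is an assumption about the general technique, not the library's actual code; clean_text is a hypothetical helper that removes anything between angle brackets.

```python
import re

# Matches any tag: class tags, timestamp tags, <b>, closing tags, etc.
TAG = re.compile(r'<[^>]*>')


def clean_text(raw_text):
    """Strip every tag from the cue text, keeping the text between tags."""
    return TAG.sub('', raw_text)


print(clean_text('This is a <c.colorE5E5E5>caption</c> text'))
```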

0.3.2 (11-08-2017)

The goal of this release is to allow the WebVTT parser to read caption files that contain metadata headers spanning more than one line.

0.3.1 (08-08-2017)

  • Made hours in WebVTT parser optional as per specs.
  • Added support to parse WebVTT files that contain metadata headers.
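
To illustrate what optional hours means for a parser, the sketch below accepts both mm:ss.ttt and hh:mm:ss.ttt and converts them to seconds. It is a standalone example, not the parser used by webvtt-py; to_seconds and the TIMESTAMP pattern are hypothetical names.

```python
import re

# The hours group is optional; minutes and seconds are limited to 00-59.
TIMESTAMP = re.compile(r'^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$')


def to_seconds(timestamp):
    """Convert a WebVTT timestamp, with or without hours, into seconds."""
    match = TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError('invalid timestamp: {!r}'.format(timestamp))
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


print(to_seconds('00:19.000'))
print(to_seconds('00:00:01.250'))
```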

0.3.0 (02-06-2016)

New features:

  • Added support for YouTube SBV captions.
  • Added easy iteration to WebVTT class.
  • New CLI command for segmenting captions for HLS.

Other:

  • Improved parsers to reuse functionality.
  • Added an exception for invalid timestamps in captions.
  • Added an exception when saving without a filename.

0.2.0 (23-05-2016)

  • Refactor of the main module and parsers.

0.1.0 (20-05-2016)

This module is released with the following initial features:

  • Read/Edit/Write WebVTT captions.
  • Read SRT captions and convert to WebVTT.
  • Segment WebVTT files for captioning HLS video.
