Improving Subtitling in ScreenFlow
The first step in improving what ScreenFlow calls "captioning" is to get the nomenclature straightened out. The term "captioning" is used only in the US, Canada and Mexico, and it is used in ways that confusingly conflate it with "subtitling." Attempts to differentiate the two usually argue that captioning is for the deaf and hearing impaired whereas subtitling is for supporting multiple languages. The existence of SDH (Subtitles for the Deaf and Hearing impaired) clearly defeats this argument. The other argument is that captions can be positioned on the screen in such a way as to indicate who the speaker is. Though that is correct, you'll be hard-pressed to find current examples of this practice. For whatever reason, it isn't being done. More commonly, subtitles indicate who the speaker is with text that gives the speaker's name, role or other identifying trait (man, woman, girl, etc.). Thus, this aspect of ScreenFlow would be much more readily understood if the term "subtitling" and its various forms were used consistently throughout. This would also help ScreenFlow users see the issues related to on-demand video for computing devices more clearly. Broadcast and real-time video captioning are a very different kettle of fish and so only serve to add fog to the landscape.
The second step is to bring the subtitling UI in ScreenFlow into the more familiar realm of actions. The current UI looks and feels as if it were tacked on, Frankenstein style. Here is a short list of the more obnoxious aspects of the current subtitling UI with proposed solutions:
- You can't have a subtitle that is blank or shorter than 1.5 seconds, and you cannot have a gap between subtitles. This means that a subtitle must be visible from the very beginning of your screencast through to the last subtitle, with no screen time without one, which breaks the highly desirable 1:1 connection between audio and text.
- Instead of using the more intuitive Action UI, one must do a lot of arithmetic or be a really good guesser to figure out what the duration of a subtitle must be in order to match the audio and video. This UI choice completely ignores the presence of an audio waveform that would provide an excellent guide as to where a Subtitle Action should start and stop.
- You can't have more than one soft subtitle track. This precludes supporting any languages other than English (now the only choice in "Set current language"). Why offer a choice between English and English?
So, 1) get the nomenclature clear by using "subtitling" instead of "captioning," 2) render the subtitling UI as an Action so that subtitle lines can have durations informed by audio waveforms, have gaps between them, and be as brief as need be, and 3) enable multiple, language-labeled subtitle tracks to support multilingual audiences.
This will help ScreenFlow content developers more easily acquire the means with which to effectively reach ever wider audiences and do so using the intuitive action UI that makes the rest of ScreenFlow so much easier and so much fun to use.
Hmm, it seems you'd like us to bring over some features from our MacCaption program and "ScreenFlowize" them. ;) It does support web- and device-specific needs, so that would be the focus.
Please do make the feature request since we have developers with expertise in this area.
ScreenFlow Feature Request
Rather than take inspiration from MacCaption, which is an omnibus tool set for pro broadcasters, it would be better, I think, to focus on improving what ScreenFlow already does and to make those ScreenFlow tools faster and easier to use by adopting the action model, which would make them more familiar and intuitive. A simple focus on the creation of video with one or more text-based subtitle tracks plus the import and export of SRT is all that is needed. BTW, Telestream's white paper on the use of MacCaption in education settings is seriously out of date. For example, QuickTime for Windows is no longer viable, nor is QuickTime 7 for macOS.
There are actually two major approaches to adding subtitles to MPEG-4 video: in-band (built into the video file, as with SRT) and out-of-band (a separate file that must accompany the video file, as with WebVTT). Currently, ScreenFlow supports only in-band (SRT) subtitles.
There used to be a clear difference between these two subtitling methods. Out-of-band subtitling was for playback on the web and in-band subtitling was for playback on the desktop and mobile devices. With recent versions of Safari, this is no longer universally true. Safari now supports web-based video containing one or more subtitle tracks and one or more alternate audio tracks.
I don't believe that Firefox and Chrome yet support this aspect of HTML 5 but expect that they will at some point in the future. None of these support chapter tracks yet.
Thus, ScreenFlow's current SRT-based approach is quite adequate for playback via MPEG-4 compliant video players such as QuickTime Player X, VLC and Switch as well as Videos.app on iOS, iTunes.app on macOS, Music.app on iOS, iBooks.app on iOS and macOS etc.
Adding WebVTT to ScreenFlow's import/export repertoire would be trivial since these are both text-based formats. This would enable web-based playback with subtitles on Firefox and Chrome where the hosting system supports out-of-band subtitles.
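To illustrate how close the two text formats are, here is a minimal Python sketch of an SRT-to-WebVTT conversion. It is not how ScreenFlow works internally; the function name srt_to_webvtt is my own invention, and the sketch handles only plain cues (no styling, positioning or other WebVTT extras). The essential differences it covers are the "WEBVTT" header line, the optional cue numbers, and the period instead of a comma before the milliseconds.

```python
import re

# Matches an SRT timing line such as "00:00:24,710 --> 00:00:27,949".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text (hypothetical helper, basic cues only)."""
    # Normalise line endings and drop a byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    out = ["WEBVTT", ""]
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        # Drop the numeric cue counter that SRT puts before the timing line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or not TIMING.search(lines[0]):
            continue  # skip anything that is not a well-formed cue
        # WebVTT uses a period, not a comma, before the milliseconds.
        lines[0] = TIMING.sub(r"\1.\2 --> \3.\4", lines[0])
        out.extend(lines)
        out.append("")  # blank line separates cues
    return "\n".join(out)
```

Reading an exported *.srt file, passing its contents through this function and saving the result with a .vtt extension would give a file suitable for hosting systems that accept out-of-band subtitles.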
For the audiences targeted by screencasters, keeping it simple and easy is of paramount importance. The automation and scale of pro broadcasting software is not yet needed in screencasting. If and when it does become a necessity, that need will probably best be met with an entirely different software title.
As an aside to this topic, interested parties should recognize the importance of developing a transcript as a necessary first step in creating subtitles for a screencast. Despite my exhortations to the contrary, the majority of students in the ScreenFlow classes that I have taught indicate a pretty strong aversion to writing and following scripts. Even those who create scripts in advance sometimes significantly depart from them. Thus, the problem for many screencasters is how to obtain an ex post facto transcript with which to make subtitles.
If you've viewed any of the WWDC videos [example] you'll see that Apple is now offering transcripts as well as subtitles.
Note that this is in English only so Apple is missing an opportunity here. Still, the audience has been expanded to include the hearing impaired who can read and comprehend English. Certainly a step in the right direction.
One can only guess how Apple does this but my guess is that they use human stenographers. That guess is based upon the occasional appearance of a parenthetical note such as "[unintelligible]" which would indicate that the transcriber is listening to the speaker in real time or via a recording and is unable to understand and transcribe a word.
That's one way to get a transcript. ScreenFlow users can listen, stop, transcribe, review, adjust timing, rinse and repeat. If you have Dragon Dictation, you could export the audio as a file from ScreenFlow and then use that file as input to Dragon Dictation, which might need to be trained to that particular voice.
I've been experimenting with a less expensive method using the built-in dictation system in macOS. Normally, dictation requires live input, so a big part of my scheme is to fool dictation into thinking that the audio from a recorded file output by ScreenFlow is live. I use the latest version of Audio Hijack Pro to achieve this bit of legerdemain, the results of which can be seen in this screencast.
Note that macOS dictation doesn't figure out the correct punctuation. Normally, you have to use spoken commands to include punctuation, start a new paragraph, etc. Still, these ex post facto methods are vastly better than trying to do it all from within ScreenFlow.
Yet another option to consider. Using my somewhat ideal example file (actor Sam Waterston reciting Lincoln's Gettysburg Address), I was quite impressed with Google's automated STT process for subtitling. As you'll read in Google's documentation for this feature, it is a work in progress that is subject to a good many ifs, ands or buts that mirror the results I obtained using Apple dictation.
Of greatest interest to ScreenFlow users is the ability to download an *.srt file from YouTube and then import that into a ScreenFlow project.
Unfortunately, there is a bug in this import feature that I will report separately. Here's a screenshot that illustrates the problem.
At time zero (00.00), we should be seeing "four score and seven years ago" but instead we're seeing a subtitle whose SRT timing is 00:00:24,710 --> 00:00:27,949, so it shouldn't come into view until nearly 25 seconds in.
Also interesting are the gaps that can occur with an SRT import. I was not able to create a gap using the ScreenFlow UI.
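For anyone who wants to check an *.srt file before importing it, here is a rough Python sketch that reads the timing lines, reports when the first subtitle starts and lists any gaps between consecutive subtitles. The function names and the SAMPLE text are my own assumptions for illustration: only the 00:00:24,710 --> 00:00:27,949 timing comes from the file discussed above, while the end time of the first cue and the text of the second are invented placeholders. Times are kept as whole milliseconds to avoid rounding surprises.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:00:24,710 --> 00:00:27,949".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def cue_times_ms(srt_text: str) -> List[Tuple[int, int]]:
    """Return (start, end) in milliseconds for each timing line, in file order."""
    cues = []
    for match in TIMING.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        cues.append((start, end))
    return cues


def find_gaps(cues: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_end) wherever one cue ends before the next begins."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(cues, cues[1:])
        if next_start > prev_end
    ]


# Hypothetical sample: the first cue's end time and the second cue's text
# are placeholders, not taken from the real file.
SAMPLE = """1
00:00:00,000 --> 00:00:03,500
four score and seven years ago

2
00:00:24,710 --> 00:00:27,949
(placeholder text)
"""

if __name__ == "__main__":
    cues = cue_times_ms(SAMPLE)
    print("first subtitle starts at (ms):", cues[0][0])
    print("gaps (ms):", find_gaps(cues))
```

Comparing this output with what ScreenFlow shows after import would make it easy to tell whether a misplaced subtitle comes from the file itself or from the import step.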