29. 03. 2022 Charles Callaway Documentation

Making Your Own Tutorials, Part 7: Script Delivery

You might remember from the last post in my series on creating video tutorials that we talked about improving the quality of picture-in-picture video tutorials. I broke it down into two parts, the background and the presenter, with this summary for the latter:

What’s realistically the most natural way you can present yourself in the inset video, given your budget and time constraints?

I’d like to focus on the part about budget and time constraints, because let’s face it, if you’ve got a budget for your tutorial videos, you’ll have staff to split up the workload so you focus on just your role.

The Presenter Problem

When shooting a tutorial video on a budget, you’ll likely be playing every role all by yourself: video subject, director, lighting expert, cinematographer, recording engineer, etc. You can use various techniques to reduce the workload, but no matter what, you’re going to have a naturalness problem. The reason is that you’re doing so much work to set up and make the video itself that you won’t have time to memorize the script and plan your delivery, so you’ll have to do an enormous number of takes to get enough usable material.

If you’re in a more informal context like social media, this is actually less of a problem. Who hasn’t seen YouTube videos of someone talking where there’s a cut every 10 seconds? You might even say it depends on your audience: younger people who’ve grown up with social media seem to find it normal, while those who’ve been around longer and are used to a certain level of professionalism find it rather jarring.

I want to address the mainstream business type of tutorial here, not too hip, but not staid either. Just think of your task as the video variation of the old real estate adage: instead of “location, location, location”, it’s “audience, audience, audience”. And do beware that if you do go the informal route, the amount of work spent on video editing can grow exponentially, so you’ll need some really good management skills to keep down your time spent.

A Baseline Example

With that kind of audience in mind, there’s the expectation of a certain standard of coherence, progression, steadiness, and just generally, “get to the point”. Who has time to listen for hours when the reason you’re following a tutorial in the first place is to get your work done?

So let’s assume you’re basically going to write a college-level essay on how to do something, then deliver it in front of a camera while handling the lighting, filming, and everything except post-editing, all at the same time.

If you’re intimately familiar with the material you’re talking about, you’ve already got a leg up on everyone else. You’ll have less to memorize and you’ll know what to emphasize without having to mark up your script with non-verbal cues. And here we’re already getting to the nut of the problem. Looking natural on camera mostly comes down to knowing what you’re going to say before you say it.

Obviously memorizing the text is one way to do this, but it’s not the only one. And like the others, it has both advantages and disadvantages. Often it comes down to our two good friends, budget and time. The longer and more complicated your video tutorials are, the longer it will take to memorize what you have to say, even if you divide it into short scenes. And since you’ve also got lighting, recording, and editing tasks to do, you won’t have that amount of time.

But if you do want to go the memorization route, it’s much faster to memorize one or two paragraphs, shoot the video of those, then memorize another paragraph or two, etc. And don’t forget, the reason you might want to memorize in the first place is so that you can have constant eye contact with the camera, which is the most natural way to deliver a monologue.

So how can you keep that naturalness without memorizing at all (which is almost a requirement if you’re bad at memorizing like I am)? Well, another set of approaches is to read off a piece of paper placed above or below the camera, or to use a teleprompter placed in front of the camera.

Both solutions solve the problem of memorizing without rendering your voice unnatural. But they don’t necessarily help with the rest of your body. For instance, you need to hold your head still to be able to read, and if your eyesight isn’t that great, squinting isn’t a good look on camera.

My Solution

Obviously everyone is different, so when I tell you what works for me, there’s always the possibility it won’t work for you. In any event, I was initially happy when using a teleprompter, because to be honest, the result was much better than when I was trying to memorize.

But one day, while brainstorming with a colleague about how we could continue to improve our videos, I noticed that a lot of video podcasts had the participants wear headphones, even though they were sitting right next to each other.

It turns out there are a number of good reasons for this, and it’s not just for dialogue: a lot of audiobooks for instance are also recorded using this technique. Of course when recording an audiobook you’re not worried about the visual component. But it got me wondering if it would work better than a teleprompter.

You may have also seen television hosts wearing an earpiece, where the host gets cues from an off-screen producer. Imagine instead that the producer was telling you what to say, word by word.

Now imagine even further (since you’re the writer, producer, videographer, etc.) that you’re the one talking in your headphones. Yes, you’ve figured it out. You’ve recorded yourself ahead of time reading the script at the right pace. Now all you have to do is repeat yourself out loud as you stare into the camera and perform/emote.

I was surprised at the improvement in naturalness, and I think it’s almost entirely due to not having the pressure of memorizing and not having to stare straight ahead all the time. It also ended up saving me a huge amount of time. On average I needed only 25% as much, since I had a lot less preparation to do and many fewer takes during recording, and video editing after only 1 or 2 takes is basically the gold standard.

You do need to be careful with the volume in your headset. Too loud means you can’t hear yourself as you speak – you’d be surprised at how your voice sounds when that happens. Too soft means you can’t hear what you’re supposed to say.

You should also give it a few practice runs the very first time. You may have a bit of difficulty at first keeping your volume steady, or you may make a bit of a funny face as you learn how to hear and repeat at the same time. Try mixing it up with my advice from the last blog post.

Once you do get it working, you’ll be able to record an entire video in one go, which will result in yet another improvement in naturalness. And you can go for your own vibe as well: big headphones, invisible earpieces, or even showy headphones in your corporate colors or fave design.

Give it a try, and see if you can come up with even more improvements!