BPO – a technical challenge.
I’ve recently spent a few hours working on quite a challenging technical issue for some friends who are planning a round-the-world sailing trip.
They’d like to send back 30-second “video postcards” from their trip by satellite phone, with each postcard using less than one “unit” of data. These videos are to be uploaded to YouTube for distribution. So let’s crunch the numbers.
I looked up the unit cost for both Inmarsat and Iridium. Both quoted prices for units of 1 MB (megabyte). Video bitrates are usually quoted in kilobits or megabits per second.
1 byte = 8 bits, so the unit size is 8 Mb or 8000 kb.
The video length is 30 seconds, so we have an available bitrate of 8000/30 ≈ 267 kb/s to be shared between audio and video. For reference, AVC-Intra (a pro camera codec) has a video bitrate of about 113000 kb/s, Freeview HD averages around 8000 kb/s, and even high-quality MPEG-1 Audio Layer III (MP3) audio files may be 256 kb/s.
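The arithmetic above can be checked with a few lines of shell. This is only a minimal sketch, assuming decimal units as used in the text (1 MB = 8000 kb); the function name `budget_kbps` is made up for illustration.

```bash
#!/bin/bash
# Total bitrate budget in kb/s for a clip that must fit in N data units.
# Assumption: decimal units, as in the text (1 MB = 8 * 1000 kb).
budget_kbps() {
    unit_mb=$1      # size of the allowance in MB
    seconds=$2      # clip length in seconds
    kilobits=$(( unit_mb * 8 * 1000 ))
    # Integer division, rounded to the nearest whole kb/s
    echo $(( (kilobits + seconds / 2) / seconds ))
}

budget_kbps 1 30   # prints 267
```

Note that 267 is rounded up from 266.67, so the real encoder target has to sit a little below it.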
So we need a very drastic encoder!
We don’t want vendor lock-in, so we need to use open-standard codecs. Advanced Video Coding (AVC, aka H.264) and Advanced Audio Coding (AAC) fit the bill nicely. Both of these are available in FFmbc. Most non-linear editors produce one track per audio channel, so we also need to create a stereo file from the two mono tracks.
First, this isn’t going to be HD. Reduce the raster size using a decent filter like Lanczos. Make sure the reduction is an integer division of the original, e.g. 1920×1080 to 480×270.
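To pick a raster that is an exact integer division of the source, a small helper like the one below can do the sum and reject divisors that don't fit. It is a sketch only; `scaled_size` is a hypothetical name, not part of FFmbc.

```bash
#!/bin/bash
# Print the reduced raster for an integer divisor, or fail if the divisor
# does not divide both dimensions exactly.
scaled_size() {
    width=$1; height=$2; divisor=$3
    if [ $(( width % divisor )) -ne 0 ] || [ $(( height % divisor )) -ne 0 ]; then
        echo "divisor $divisor does not divide ${width}x${height} evenly" >&2
        return 1
    fi
    echo "$(( width / divisor ))x$(( height / divisor ))"
}

scaled_size 1920 1080 4                      # prints 480x270
scaled_size 1920 1080 7 || echo "rejected"   # 7 divides neither dimension
```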
AVC only sends a full image once every x frames. By increasing x to 250, we drastically reduce the bitrate.
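To get a feel for what x = 250 means for a postcard, here is a rough calculation. It assumes 25 frames per second (the frame rate of my test footage) and a 30 second clip, and it ignores any extra full images the encoder may decide to insert at scene cuts.

```bash
#!/bin/bash
FPS=25            # frame rate of the source footage (assumed 25p)
CLIP_SECONDS=30   # length of one video postcard
GOP=250           # frames between full images (the "x" above)

total_frames=$(( FPS * CLIP_SECONDS ))
# Ceiling division: the first frame is always a full image
full_images=$(( (total_frames + GOP - 1) / GOP ))
echo "one full image every $(( GOP / FPS )) s, $full_images in the clip"
```

So at most a handful of complete pictures are sent; everything else is built from differences.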
The intermediate frames are built by referencing other frames. By increasing the number of frames that can be referenced, we get a better picture.
By throwing the kitchen sink at this algorithm-wise, we can get even more gains. So we allow the referencing algorithm to use its most exhaustive search, choose the most processor-intensive maths algorithm, and so on.
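For reference, this is how those choices map onto the encoder flags used in the script further down. The flag values are the ones from that script; the one-line glosses are the usual x264 meanings as I understand them, so treat them as assumptions rather than documentation. `add_opt` is just a hypothetical helper so that each flag can carry its own comment.

```bash
#!/bin/bash
X264_OPTS=""
# Append one flag (and its value) to the option string.
add_opt() { X264_OPTS="${X264_OPTS:+$X264_OPTS }$*"; }

add_opt -g 250            # a full image at most once every 250 frames
add_opt -bf 4             # up to 4 consecutive B-frames
add_opt -refs 6           # each frame may reference up to 6 others
add_opt -partitions all   # consider every macroblock partition type
add_opt -me umh           # exhaustive-ish motion search (uneven multi-hexagon)
add_opt -me_range 128     # how far the motion search may look, in pixels
add_opt -subme 8          # high-effort sub-pixel motion refinement
add_opt -trellis 2        # trellis quantisation on all decisions

echo "$X264_OPTS"
```

All of these trade encoding time for quality at a given bitrate, which is fine for a 30 second clip.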
Audio is important; poor audio can ruin a video. So I started off by looking at the audio. We can drop the sampling frequency to 32 kHz. This keeps voices intact and only removes the top harmonics of music. We can then encode it at 64 kb/s.
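With the audio fixed, the split of the budget is simple subtraction. The sketch below uses the 267 kb/s total from the earlier calculation and ignores container overhead, so the video figure is nominal only.

```bash
#!/bin/bash
TOTAL_KBPS=267     # 8000 kb / 30 s, from the calculation above
AUDIO_KBPS=64      # AAC stereo at 32 kHz
CLIP_SECONDS=30

video_kbps=$(( TOTAL_KBPS - AUDIO_KBPS ))
audio_kb=$(( AUDIO_KBPS * CLIP_SECONDS ))

echo "audio uses $audio_kb kb of the 8000 kb unit"   # 1920 kb
echo "nominal video budget: $video_kbps kb/s"        # 203 kb/s
```

In other words, the audio takes roughly a quarter of the unit before a single pixel is sent.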
This leaves us with a fixed video bitrate. I’ve never managed to get FFmbc to match the actual video bitrate to the requested bitrate, so I experimented with the input value until I got an overall file size of 0.97 MB, just less than 1 unit.
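Since the bitrate has to be tuned by trial and error, it helps to check the result automatically after each encode. The helper below is a sketch; `fits_in_unit` is a made-up name, and it assumes a unit is 1,000,000 bytes (the conservative reading of "1 MB"). Here it is exercised on a dummy file rather than a real encode.

```bash
#!/bin/bash
# Succeed if FILE is no larger than LIMIT bytes (default: one 1,000,000-byte unit).
fits_in_unit() {
    file=$1
    limit=${2:-1000000}
    # Arithmetic expansion strips any padding some wc implementations add
    size=$(( $(wc -c < "$file") ))
    [ "$size" -le "$limit" ]
}

# Demo with a dummy 970,000-byte file standing in for the encoded video
demo=$(mktemp)
head -c 970000 /dev/zero > "$demo"
if fits_in_unit "$demo"; then echo "fits"; else echo "too big"; fi
rm -f "$demo"
```

If the check fails, lower the requested video bitrate and encode again.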
#!/bin/bash
# Usage: pass the input file and the output file as the two arguments
INPUT=$1
OUTPUT=$2
# Scale down, map the two mono tracks to one stereo track, then encode
ffmbc -y -i "$INPUT" \
  -vf scale=480:320:0 -sws_flags lanczos \
  -map_audio_channel 0:1:0:0:1:0 -map_audio_channel 0:2:0:0:1:1 \
  -vcodec libx264 -vb 245k -maxrate 255k -minrate 235k -bufsize 1500k \
  -g 250 -bf 4 -refs 6 -partitions all \
  -me umh -me_range 128 -subme 8 -trellis 2 \
  -pix_fmt yuv420p -timecode 10:00:00:00 \
  -acodec libfaac -ac 2 -ar 32000 -ab 64k \
  -f mov "$OUTPUT"
The Test Sequence.
Test sequences need to match the usage. This is for talking heads and the odd landscape. To test the software, I shot a quick 35 second sequence consisting of:
- cutaway with detailed bridge and reflective building
- cutaway with detailed brickwork and reflective building
- piece to camera with shallow depth of field
- street scene with movement
- digital zoom into map
This was shot on a Canon 700D (1920x1080p25), which is probably representative of the type of camera in use. To model the audio, I added a rights-free instrumental track and a female voice reading Conrad.
I think it works quite well. YouTube accepts the file, and transcodes it to a very respectable video. The only sequence that doesn’t work is water – which I wasn’t expecting to work.