Building a transcoding VM
Published on 4/19/20, 3:46 PM
Recently at LANified! we have been working on increasing how much content we record at events. We recently recorded a whole bunch of content at Smash Logic's Salt Flats 2020, in addition to other past events. I wanted to streamline the transcoding processes we need for ingestion into our NLE (DaVinci Resolve) and publication (to YouTube or others), as it would save a lot of time now and in the future. Naturally, Information Technology is the solution. This is how I built our transcoding VM, which I have christened "Alchemy".
Right now the primary source of our media is GoPro recordings or HDMI ingest. These typically record into .mp4 containers with the video codec H.264 and the audio codec AAC, with video bitrates in the realm of 20-60 Mbps. The upper end of the video bitrate range is typically used for high-movement First Person Shooter gaming recordings, due to how fast the entire frame changes. These bitrates mean that transcoding tasks take a non-trivial amount of time when dealing with a lot of recorded content. The reason we need to transcode away from these parameters is DaVinci Resolve's supported codecs for the edition we are using (namely, the free one). The free edition of Resolve 16 (latest as of this writing) does not allow you to ingest the H.264 and H.265 video codecs or the AAC audio codec. Resolve can work with a massive list of codecs, however there are some codecs and containers it does not support at all (even in the Studio, aka paid, edition), such as MKV, but we can work around this. I needed to find a combination of codecs, a container and transcode parameters that would minimise or eliminate any fidelity loss in the first pipeline from source material.
Through many permutations of manual transcoding on my workstation, using ffmpeg and HandBrake, I eventually settled on the final target: the video codec MPEG-4 (ffmpeg's mpeg4 encoder), the audio codec PCM (pcm_s24le), the .mov container, and the parameters I would need for ffmpeg. It's rather frustrating that some containers only work with a limited list of codecs; it seems arbitrarily constrained. This combination of codecs and container served well in my testing as an effective target, as I did not appear to lose fidelity (apart from a grey thing that I cannot yet explain).
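For reference, a single manual run against that target could look something like the sketch below. The flags are copied from the transcoding script later in this article; the file names are only examples, and the helper name transcode_cmd is made up for illustration. It prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Print (rather than run) the ffmpeg command for one source file.
# Flags mirror the article's target: MPEG-4 video, 24-bit PCM audio, .mov output.
# Note: file names containing spaces would need quoting before being run.
transcode_cmd() {
    src=$1
    dst=$2
    printf 'ffmpeg -r 60 -i %s -c:a pcm_s24le -c:v mpeg4 -q:v 1 %s\n' "$src" "$dst"
}

# Example file names only
transcode_cmd GOPR0149.MP4 GOPR0149-Combined.mov
```

Copying the printed line into a terminal runs the actual transcode.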
There are other aspects to the target too, not just the codecs and containers. I also wanted to set up the environment so that it was as convenient as it could be, plus reliable. This meant architecting the system with robust scripts, and having the Virtual Machine operate autonomously and interface with network storage. I wanted the User Experience (UX) to be as simple as copying a bunch of files into a particular network folder, coming back in a little while, and finding the media transcoded into another particular network folder, ready for you to pick up and use immediately.
The Virtual Machine
The transcoding work would need to be performed within a Virtual Machine in our clustered environment. I know some people reading this will probably think to themselves "why not a container?", and the reality is I can get the VM up faster than a container right now. In the future that will probably change, but in this example we will be using a Virtual Machine. The hypervisor in the environment is Proxmox VE, effectively Linux KVM at the core, and it works wonders for us.
I provisioned the Virtual Machine with 6x CPU Cores (host type), originally 4GB of RAM (later downsized to 1GB of RAM), 10GB of disk (only using 5.3GB in the end) and installed Ubuntu 18.04 on it. This VM will be mounting an SMB network share, which will house the content and stuff like that. Mounting an SMB network share in this fashion helps keep the VM smaller and has other residual benefits, such as storage-side snapshots (ZFS) of data.
Once the OS was installed, I applied all available updates, added a PPA repo for ffmpeg and installed a few specific packages. The ffmpeg PPA is the savoury1 ffmpeg PPA plus the optional ones recommended on that page. This was just so I could get a newer version of ffmpeg; it is completely optional, and my needs would have been met without this repo. The core packages I needed were ffmpeg, screen (already installed) and cifs-utils. I also installed snmpd for monitoring, but that's not required for transcoding whatsoever.
Next I needed to set up the network storage so that my VM could mount it and work against it. The NAS that we use is a FreeNAS system set up with a spinning rust (HDD) zpool and an all-flash SSD zpool (reserved for VM disk images only). ZFS is the core technology for the storage, so this gives us a good amount of flexibility. Initially I tried creating a series of datasets for the different folders that I anticipated we needed, however this caused problems moving and copying folders around between them (I don't yet know why), so I instead simplified it down to a single dataset and created regular folders on-disk. This dataset is on the HDD zpool, as SSD storage is not warranted for this task.
The advantage of creating even a single dataset for this function, instead of just folders for everything, is that I can do stuff regular folders can't. Firstly, I can see from FreeNAS' webGUI (and ZFS CLI if I were so inclined) how much space that single dataset is using. Secondly, I can adjust parameters on that dataset separately from other datasets, such as permissions, ZFS flags and many other things, should I choose. Thirdly, it helps keep things organised and visible when evaluating the zpool and storage system.
Once this dataset was made, I created a user on the NAS and made it a member of a particular group, then set rwx (full permissions) for that group on the dataset, recursively. I set the user's shell to "nologin", so they cannot SSH into the NAS, and set their default group to the group that has permissions to that dataset. The VM will use this account to access the dataset over the network. I then set up a new SMB share serving SMB version 3.0, with no particular parameters otherwise.
I then set up two ZFS snapshot tasks within FreeNAS to take ZFS snapshots of this dataset. This means that if something goes wrong (due to human or computer error) I can go retrieve content at the storage level. The first one takes a snapshot every 30 minutes and keeps each snapshot for 12 hours. The second one takes a snapshot every 12 hours and keeps each one for 2 weeks. This is likely to increase my data usage, but that space frees itself automatically as snapshots expire. I may tune them further over time.
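To put numbers on those two schedules, here is a quick back-of-the-envelope sketch of how many snapshots each task holds once it reaches steady state (retention divided by interval). The function name is just for illustration.

```shell
#!/bin/sh
# Snapshots kept at steady state = retention period / snapshot interval
# (both arguments in minutes).
retained_snapshots() {
    echo $(( $1 / $2 ))
}

# Every 30 minutes, kept for 12 hours
short_term=$(retained_snapshots $((12 * 60)) 30)
# Every 12 hours, kept for 2 weeks
long_term=$(retained_snapshots $((14 * 24 * 60)) $((12 * 60)))

echo "short-term: $short_term, long-term: $long_term, total: $((short_term + long_term))"
# prints: short-term: 24, long-term: 28, total: 52
```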
Within the VM I then modified /etc/fstab so that I could mount the dataset over the network. I used the below declaration, naturally removing the sensitive parts.
//IPv4.Address.Goes.Here/Transcoding /mnt/transcoding cifs username=REDACTED,password=REDACTED,rw,dir_mode=0770,file_mode=0770,vers=3.0 0 0
The declarations of "dir_mode" and "file_mode" ensure that as I create folders and files within the mount point in the VM, the permissions and modes stay consistent. "vers=3.0" is how I ensure the VM mounts at SMB version 3.0. Once this was mounted, and I had verified that I could create/modify/delete content within the mount, I moved on to the scripting component.
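If you would rather not keep the password inline in /etc/fstab, mount.cifs also accepts a "credentials=" option pointing at a file that only root can read. This is not what I used above; it is a minimal sketch of the alternative, and the credentials path /root/.smb-transcoding and both function names are made up for the example.

```shell
#!/bin/sh
# Write a CIFS credentials file that only its owner can read.
# mount.cifs reads "username=" and "password=" lines from such a file.
write_cifs_credentials() {
    cred_file=$1
    old_umask=$(umask)
    umask 077
    printf 'username=%s\npassword=%s\n' "$2" "$3" > "$cred_file"
    umask "$old_umask"
    chmod 600 "$cred_file"
}

# Print an fstab entry that references the credentials file instead of
# embedding the username and password.
fstab_entry() {
    printf '%s %s cifs credentials=%s,rw,dir_mode=0770,file_mode=0770,vers=3.0 0 0\n' "$1" "$2" "$3"
}

# Hypothetical usage (run as root):
# write_cifs_credentials /root/.smb-transcoding REDACTED REDACTED
fstab_entry //IPv4.Address.Goes.Here/Transcoding /mnt/transcoding /root/.smb-transcoding
```

The printed line replaces the existing entry in /etc/fstab; the other mount options are unchanged.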
At this point we have the whole infrastructure ready to go, and now we just need to automate this transcoding task. As I mentioned earlier, I want to set this up to be as convenient as possible from the user's perspective (initially me, but others may use it in the future). It also needs to be resilient, so it checks for a few things to avoid duplicating work, starting transcode tasks on files that are still being copied, or anything else that could break the process.
The desired logical flow is:
- Check if the transcode task (screen) is already running, stop if it is. If it is not running, start a screen for the script handling below.
- Check if the files in the target folder are still changing (is it still being copied into place?), stop if they are. If the files do not look like they are changing, start the transcoding loop.
- Start a for loop against all files with the .mp4 file extension (case agnostic) in the target folder. Each file is transcoded into another folder, then the source file is moved into that folder too so it doesn't get transcoded infinitely.
This logic flow happens once a minute through a cron job that runs the first script. This cron job is defined in the /etc/crontab file, running as the root user, because there's nothing sensitive on this system and this system doesn't have access to sensitive info. Running as root is not recommended as it is a bad habit to get into, but for the sake of example, that's what I'm doing here. This is the cron declaration:
* * * * * root /mnt/transcoding/Source-To-ResolveFree-60fps-PCM-mpeg4Video-mp4ToMov-Combined/transcodeStart.sh
The five stars declare that this task will run every minute, of every hour, of every day. Then "root" declares the user the task is run as, followed by the thing to do, in this case running the first script. Declaring the absolute location of the script file is necessary to avoid any scoping issues, so that cron is never uncertain where the script I want to run is. On Ubuntu, cron checks the modification time of /etc/crontab every minute and reloads it when it has changed, so there is no need to restart the cron daemon after saving.
This is the contents of the first script:
#!/bin/bash
#Checking if Screen runs then executing task script
cd "$(dirname "$0")" || exit 1   #so the relative path below resolves under cron
if [ "$(screen -ls | grep -c transcoding)" = "0" ]; then
    screen -dmS transcoding ./transcodeTask.sh
else
    echo "A transcoding task & screen is already running ya goomba!"
fi
What this script does is check if the screen we want to run is already running, by checking the name of the screen. If it is already running, it exits out after reporting a human-legible error message (in case I run this script by hand). If the screen does not exist, it launches a new screen calling the second script. I do not declare the absolute path for the second script as that is not necessary in this case, but in your case it might be. Running these tasks in a screen process means that I can go look at the task if I am trying to troubleshoot it. It just makes it easy to see the output of the task if I really want to, plus it also helps me block running the task more than once.
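One thing to be aware of with a plain grep on the screen name is that it also matches any other session whose name merely contains "transcoding". If that ever becomes a problem, a stricter match could look like the sketch below. It assumes the usual "screen -ls" layout of a tab, the PID, a dot, the session name and another tab; the function name is made up for the example.

```shell
#!/bin/sh
# Succeed only if `screen -ls` output (read from stdin) lists a session
# whose name is exactly the one given, e.g. "12345.transcoding".
# Assumes the session name contains no regex metacharacters.
session_listed() {
    grep -Eq "^[[:space:]]+[0-9]+\.$1[[:space:]]"
}

# Typical use in the start script:
#   if screen -ls | session_listed transcoding; then echo "already running"; fi
```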
This is the contents of the second script that gets called:
#!/bin/bash
#Checks if files are still being modified (copied into place), exits if copying in progress.
#Then loops transcode against all the relevant files, outputting into other folder &
#then moving each source video into the Output folder.
#Folder names below are placeholders (the real values are not shown); adjust to suit
INPUT="./Input"
OUTPUT="./Output"
OUTPUT_EXT="mov"
#Check if files are still growing/copying
while [ "$(ls -la "$INPUT"; sleep 15)" != "$(ls -la "$INPUT")" ]; do
    exit
done
#Run loop for multiple files
#Adjust the glob for file extension; nocaseglob makes it match any capitalisation
shopt -s nullglob nocaseglob
for FILE in "$INPUT"/*.mp4; do
    #Strip the folder and extension to get the bare file name
    FILENAME="$(basename "${FILE%.*}")"
    ffmpeg -r 60 -i "$FILE" -c:a pcm_s24le -c:v mpeg4 -q:v 1 "$OUTPUT/$FILENAME-Combined.$OUTPUT_EXT"
    mv "$FILE" "$OUTPUT/"
done
echo 1 > /proc/sys/vm/drop_caches
This script has two core parts to it. The first part is the first while loop, where it checks if files are still changing in the target $INPUT folder. It compares the output of "ls -la" on the target folder, versus the same output 15 seconds later. If they are different (probably files still copying), it exits out. If they are the same, it proceeds to the next step, the actual transcoding loop.
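The "is anything still changing?" check can also be pulled out into a small function, which makes it easier to reuse or to try with a shorter wait. This is only a sketch of the same idea; the function name is mine.

```shell
#!/bin/sh
# Succeed if the long listing of a folder is identical before and after
# waiting the given number of seconds, i.e. nothing appears to be
# copying into it.
folder_is_stable() {
    before=$(ls -la "$1")
    sleep "$2"
    after=$(ls -la "$1")
    [ "$before" = "$after" ]
}

# Typical use at the top of the task script:
#   folder_is_stable "$INPUT" 15 || exit 0
```

Because "ls -la" includes file sizes and modification times, a file that is still growing changes the listing even though its name stays the same.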
The second part, the for loop, processes every FILE in the target folder with the file extension ".mp4", and it accepts varied capitalisation, such as ".MP4", ".mP4" and other such things. I don't want it to stop just because the file has the "wrong" capitalisation. The part dealing with FILENAME, basename and such is there to strip out the text of the file name before the file extension, so that I can use the same text to build the name of the output file. This is used in the "$FILENAME-Combined.$OUTPUT_EXT" part, where we take that first bit of text before the file extension, add "-Combined" to it, then add the desired "$OUTPUT_EXT" (the file extension we want the output file to use), and that makes up the output file name. So, "GOPR0149.MP4" turns into "GOPR0149-Combined.mov" in this particular case, for the transcoded file. This does not modify the source file.
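The naming and extension logic is easy to try on its own. The sketch below does the same two jobs with plain shell features; the function names and the /mnt/transcoding/Input path are made up for the example.

```shell
#!/bin/sh
# Build the output file name: strip any folder and the final extension,
# then append "-Combined" and the desired output extension.
combined_name() {
    base=$(basename "$1")
    printf '%s-Combined.%s\n' "${base%.*}" "$2"
}

# Succeed if the name ends in .mp4 in any mix of upper and lower case.
is_mp4() {
    case $1 in
        *.[mM][pP]4) return 0 ;;
        *) return 1 ;;
    esac
}

combined_name /mnt/transcoding/Input/GOPR0149.MP4 mov   # prints GOPR0149-Combined.mov
```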
After the transcode step happens, the script moves the source file it just used into the same folder the output file went into, so that we don't process files infinitely. The last step about dropping caches is not a requirement, but part of my effort to stop having shit go into swap files. The justification for this part is outside the scope of this article. The dropping of caches is completely optional, so if you don't want to use that, so be it.
And that's the scripting part!
Thanks for reading my article. If you've gotten this far, I hope this helps you. When I was trying to make this happen for myself I had to cobble together a whole lot of sources to get something I was happy with working. Hopefully this helps you get your goals met just a bit faster.
Now that I have this set up, I'm starting to see that I need to adjust the way the scripts work a bit so that I can have multiple transcoding pipelines processed in a similar way, as right now there are some pitfalls that I had not anticipated for multiple pipelines, namely around the 15 second wait time. If I run the task every minute, and I have two pipelines checking folders for 15 seconds each, that's 30 of the 60 seconds in a minute spent just waiting. If I use this method for 4 pipelines, then we're probably overlapping tasks. So, I'm not yet sure of the best way to improve that further, but that's for another day. Right now, I just want to finish this article so I stop procrastinating. Thanks for reading!
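One direction I may explore (untested against the real pipelines, so treat it purely as a sketch) is starting each pipeline's folder check in the background so the waits overlap instead of adding up. In the example below check_pipeline is a stand-in that only sleeps, and the pipeline names are placeholders.

```shell
#!/bin/sh
# Stand-in for one pipeline's "is the folder still changing?" check.
# A real check would compare directory listings; this one only waits.
check_pipeline() {
    sleep "$2"
    echo "$1 checked"
}

# Start every check in the background so the waits overlap, then block
# until all of them have finished.
check_all() {
    wait_seconds=$1
    shift
    for name in "$@"; do
        check_pipeline "$name" "$wait_seconds" &
    done
    wait
}

# Three placeholder pipelines with a 1 second wait each: the whole run
# takes roughly 1 second rather than 3.
check_all 1 pipelineA pipelineB pipelineC
```

With a 15 second wait and four pipelines, that would be about 15 seconds of waiting per run instead of 60.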