Video streaming: for fun and research?
October 14, 2020
Anton Malmquist & Pascal Hertleif
This is the second entry in our “Behind the scenes” column, and it is about something completely different from Multi Touch in Unity3D. This time, we are talking about streaming video feeds from multiple cameras and merging them in real time.
Pascal: Hey Anton, remember when we did the video thingy a couple weeks ago? Wanna talk about that for the website? Gotta beat that Multi Touch in Unity3D interview from last time.
Pascal: Cool! So, help me out, why did we do this again? All I remember is that there were two camera inputs on a Linux box.
Anton: Right. So, our client has multiple cameras with overlapping fields of view, but with different augmentations. To demonstrate and emphasize the different strengths of both systems, they wanted to merge both camera feeds into one. Basically, the whole setup should show how the two camera systems work together in different situations.
Pascal: It’s about merging two video feeds then? How does that work?
Anton: Our first simple approach was to take the two different video feeds and pipe them into FFmpeg. Then we added a simple overlay – one of FFmpeg’s built-in filters – while making one camera layer slightly translucent. It’s a naïve approach, and nowhere near a finished tool, but it gave us a good idea of what we wanted the end product to look like.
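As an illustration of that first approach, here is a minimal sketch that builds an FFmpeg command line for a translucent overlay. The interview does not give the actual command, so the filter graph (`colorchannelmixer` to lower the alpha of the top layer, then `overlay`), the helper name `overlay_command`, the file names, and the 50% opacity are all assumptions.

```python
import shlex


def overlay_command(bottom, top, output, opacity=0.5):
    """Build an ffmpeg argument list that draws `top` over `bottom`.

    Hypothetical helper: the filter graph is one plausible way to get a
    translucent overlay, not necessarily the one used in the project.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    filter_graph = (
        # Give the second input an alpha channel and scale it down.
        f"[1:v]format=rgba,colorchannelmixer=aa={opacity}[top];"
        # Draw the translucent layer over the first input.
        "[0:v][top]overlay=0:0[out]"
    )
    return [
        "ffmpeg",
        "-i", bottom,
        "-i", top,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output,
    ]


if __name__ == "__main__":
    # Placeholder file names; live cameras would need capture-specific input options.
    print(shlex.join(overlay_command("camera_a.mkv", "camera_b.mkv", "merged.mkv")))
```

Returning an argument list instead of a shell string means it can be handed to `subprocess.run` directly, without quoting issues.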
Pascal: The stream is now inside FFmpeg, on a Linux system. But where do we show it?
Anton: For this project, the two cameras are hooked up to an Ubuntu machine, and from there served as a video stream to be watched on an Android tablet. This means we need to send the merged video stream over a network. Since we were using FFmpeg, we picked the quick and easy solution: FFserver. This is a fairly common setup for streaming video over the internet, where clients connect to the server. Our Android tablet is then just one of these clients, using a native video player.
Pascal: To make it an effective demo of the two camera systems, this should be as real-time as possible. How’d that work out?
Anton: FFserver is, in internet terms, ancient. Also deprecated by the FFmpeg team. And primarily geared towards streaming video over the internet to many clients. This means that many of its design choices were made to keep streaming stutter-free on devices with unknown or bad connectivity and capabilities. The result is processing that is not just-in-time, but instead optimized to make it easier to resume a broken connection. On top of this, FFmpeg is, in internet terms, ancient. It is, however, still actively maintained. And one of the best media conversion tools around. Its primary design does not seem to be geared towards just-in-time processing, though. Our original setup used FFmpeg to process the video streams we got from the two sources, scaling, blending and re-encoding them to get our desired output stream to send to FFserver. While this was very convenient to do with FFmpeg’s filters, it sadly introduced even more latency into the system. In the end, the whole setup (including pre-processing in the camera systems) had about 10 s of latency.
Pascal: Okay, so we needed to move away from FFserver, and probably also FFmpeg. Now the fun begins!
Anton: We were searching for open source tools; something that we could put on our existing Ubuntu setup without too much hassle. The leading streaming-optimized media toolchain (say this three times fast) within these constraints is for sure GStreamer. It is, in internet terms, ancient. But it’s designed to handle streaming as fast as possible, and can pretty much run as a distributed application on a local network, where we mirror the encoding and decoding process to serve up video as fast as we can.
Pascal: GStreamer is super powerful! It’s based on the idea that you define a pipeline, which describes the different operations you want to perform on your video stream. This worked out quite well, right?
Anton: This pipeline definition creates a highly optimized process. Contrast this with FFmpeg, where each step of processing is run as a separate process, resulting in larger I/O overhead. However, this also means that the parts of the pipeline are more finicky and particular about their inputs and outputs, which meant that there was a lot of trial and error before we got our desired output. An absolute highlight was picking blending filters from “gstreamer-plugins-bad” when possible.
Pascal: I remember one day of just trying different configurations! We wrote a quick Python script just to concatenate the textual pipeline definition, so we could iterate more quickly through our attempts at aligning the input streams.
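A script of that kind might look roughly like the sketch below, which assembles a `gst-launch-1.0` style pipeline description from small pieces. The original script is not shown in the interview, so everything here is an assumption: the helper names, the choice of `v4l2src` and `compositor`, the device paths, and the resolution.

```python
def source_branch(device, mixer, pad, width=640, height=480):
    """One camera branch: capture, convert, scale, then link to a mixer pad.

    Hypothetical helper; the element choice is an assumption.
    """
    elements = [
        f"v4l2src device={device}",
        "videoconvert",
        "videoscale",
        f"video/x-raw,width={width},height={height}",
        f"{mixer}.sink_{pad}",
    ]
    return " ! ".join(elements)


def blend_pipeline(devices, alphas, sink="autovideosink"):
    """Concatenate a textual pipeline that blends several camera branches."""
    if len(devices) != len(alphas):
        raise ValueError("need exactly one alpha value per device")
    # Per-pad properties such as sink_1::alpha control each layer's opacity.
    pad_props = " ".join(f"sink_{i}::alpha={a}" for i, a in enumerate(alphas))
    mixer = f"compositor name=mix {pad_props} ! videoconvert ! {sink}"
    branches = [source_branch(d, "mix", i) for i, d in enumerate(devices)]
    return " ".join([mixer] + branches)


if __name__ == "__main__":
    # Placeholder devices; pass the printed text to gst-launch-1.0 to try it out.
    print(blend_pipeline(["/dev/video0", "/dev/video1"], [1.0, 0.5]))
```

Keeping each branch as a list of elements makes it cheap to swap a single element or caps string between attempts, which is the kind of iteration described above.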
Anton: And once we had our video stream just the way we wanted it, the next step was to forward it to the Android tablet. We first tried to keep the format the same, an RTP stream with either H.264 or MJPEG as the only video channel – we tried a bunch of settings. But eventually we saw that directly sending RTP packets with MJPEG over UDP (piping “rtpjpegpay” into a “udpsink” in GStreamer speak) had the absolute lowest latency, which was around 1.5 s.
But that meant having a compatible client on the Android side – and to be absolutely sure we picked GStreamer there, too.
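To make the “rtpjpegpay” into “udpsink” idea concrete, here is a hedged sketch of what a matching sender and receiver description could look like. Only those two element names come from the interview. The rest is assumed for illustration: the helper names, the test source, the host address, the port, and the receiving elements (`udpsrc`, `rtpjpegdepay`, `jpegdec`).

```python
def rtp_mjpeg_sender(host, port, source="videotestsrc"):
    """Encode frames as JPEG, wrap them in RTP, and send them over UDP."""
    elements = [
        source,
        "videoconvert",
        "jpegenc",
        "rtpjpegpay",
        f"udpsink host={host} port={port}",
    ]
    return " ! ".join(elements)


def rtp_mjpeg_receiver(port, sink="autovideosink"):
    """Mirror of the sender: receive RTP over UDP, unwrap, decode, display."""
    # udpsrc cannot guess the stream type, so the RTP caps are spelled out.
    caps = (
        "application/x-rtp,media=video,clock-rate=90000,"
        "encoding-name=JPEG,payload=26"
    )
    elements = [
        f'udpsrc port={port} caps="{caps}"',
        "rtpjpegdepay",
        "jpegdec",
        "videoconvert",
        sink,
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    # Placeholder address and port for a tablet on the local network.
    print(rtp_mjpeg_sender("192.168.0.42", 5000))
    print(rtp_mjpeg_receiver(5000))
```

The receiver is simply the sender in reverse, which matches the idea of mirroring the encoding and decoding steps on both ends.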
Pascal: And what an intense experience that was! Android is, in theory, a supported target for GStreamer. Up to the point where there’s a downloadable archive that has all plugins (including the “bad” ones) and even a Makefile that Android Studio can use. And – tada! – a few tries and one empty but linked C++ file later we have a static library.
Anton: And then it was just a matter of writing/adapting a couple hundred lines of JNI code to actually set up the GStreamer receiving pipeline and we have a video showing up!
Pascal: All in all, a great success!
Anton: And we’ve learned so much.
Pascal: Thanks so much for your time! Let’s do this again soon!