The core element of the mediafai product is the video generation tool, and its design centres on making video generation extremely cost-effective, rapid, and scalable. Having built a previous video engine, I was aware of the high costs and low speed that plagued it, although it was at least horizontally scalable.
What is Horizontal Scaling?
For everyone reading this to learn how a SaaS is built, you would have seen this term mentioned a couple of times. Horizontal scaling refers to a design where, if you need to support more users simultaneously or need something calculated faster, you can run multiple copies of the application alongside each other so they work concurrently. You can think of "horizontal" as the servers sitting side by side in a line. If you no longer need to support so many users, you remove some servers. This can be achieved dynamically in the cloud in various ways.
The important part to note is that most websites are already designed this way, because they are simple cases that only need to keep track of the user. When you have a use case where something needs to be calculated that may take minutes, such as video generation, the platform underneath has to be designed differently so that a single long-running process can benefit from the extra servers.
Horizontal scaling also exists within a single server, by creating threads. An application, or parts of an application, can run many times simultaneously within the same server. This requires careful design to avoid computing the same work more than once and to prevent conflicts. Hope that helps! Back to solving the video problem.
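For readers who want to see the threading point in code before moving on, here is a minimal Python sketch. It is only an illustration, not how any particular product does it: the function name `run_workers` and the "square a number" job are made up. The shared queue is what stops two threads from computing the same job, and the lock is what prevents conflicts when they write their results.

```python
import queue
import threading

def run_workers(jobs, worker_count=4):
    """Process every job exactly once using several threads (illustrative only)."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Each job can be taken off the queue by only one thread.
                job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            outcome = job * job  # stand-in for the real work
            with results_lock:  # avoid conflicting writes to the shared dict
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_workers(range(10))` spreads the ten jobs across four threads, and each job appears in the result exactly once.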
How to horizontally scale video generation
Generating a video is fairly straightforward. There are two main steps:
- Generate each layer, capturing it frame by frame.
- Combine all the layers frame by frame, resulting in a video.
That’s an extremely simplified version of it, but the key part is that each layer needs to be captured. Scaling this step horizontally is simple: just send each layer off to a different server. Unfortunately, when it came to combining the layers, that could only be done on a single server.
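The two steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual engine: threads stand in for separate servers, frames are plain strings, and the names `capture_layer`, `combine_layers`, and `render_video` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

FRAME_COUNT = 3  # hypothetical; a real 30 second video has hundreds of frames

def capture_layer(layer_name):
    """Stand-in for one server capturing a single layer frame by frame."""
    return [f"{layer_name}@{i}" for i in range(FRAME_COUNT)]

def combine_layers(captured_layers):
    """Single-server step: merge the layers for each frame, in order."""
    return ["+".join(parts) for parts in zip(*captured_layers)]

def render_video(layer_names):
    # Step 1 fans out: each layer is captured by its own worker.
    with ThreadPoolExecutor(max_workers=len(layer_names)) as pool:
        captured = list(pool.map(capture_layer, layer_names))
    # Step 2 runs in one place, so adding workers does not make it faster.
    return combine_layers(captured)

video = render_video(["background", "text", "logo"])
```

Adding more workers speeds up the first step only. The combine step still has to walk every frame in one place, which is the bottleneck described above.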
The high costs of the previous video engine came from the number of servers needed to keep rendering times at a reasonable level (~50 servers). An interesting point was that, because of the combining step, it could never get below a rendering time of around 6 minutes for a 30 second video. That figure only held when a single video was being generated at the time; if 10 videos were generated at once, a user could be waiting an hour before receiving a result. While the servers were arranged to be extremely cost effective, it’s not a solution for our product. We needed to find a better way, and an old animated movie held one of the secrets.
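The waiting-time figures above follow from simple arithmetic, assuming the farm works through queued videos one after another (an assumption made for this sketch; the real scheduling may have differed). The function name `wait_minutes` is hypothetical.

```python
MINUTES_PER_VIDEO = 6  # best-case render time for a 30 second video, from the text

def wait_minutes(position_in_queue, minutes_per_video=MINUTES_PER_VIDEO):
    """Minutes until the video at this queue position (1 = first) is finished."""
    return position_in_queue * minutes_per_video

first_user = wait_minutes(1)   # 6 minutes
tenth_user = wait_minutes(10)  # 60 minutes
```

So the first user waits about 6 minutes, while the tenth waits about an hour.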
How does the 1995 film Toy Story fit in?
I have always enjoyed the art of 3D modelling and rendering: the tools used, and the ways the images and videos are created. When Toy Story burst onto the scene I was fascinated, and I became aware of the tools Pixar used to create the movie. Their rendering engine was called RenderMan. This application was out of my reach, however there was a free alternative called BMRT that gave everyone the ability to generate more realistic renders. But how did they generate a complex 3D movie with the relatively limited computing power available at the time?
One technique they used was to create a render farm, their version of the horizontal scaling mentioned above, where they linked multiple rendering servers to work together to generate the video. But instead of sending a frame to each server (as we were doing), all the servers would work on the one frame at the same time.
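To make the "many servers, one frame" idea concrete, here is a small Python sketch. How a frame is actually divided is not covered here, so splitting it into horizontal bands of rows is purely an assumption for illustration, as are the names `split_rows`, `render_band`, and `render_frame`. Threads again stand in for servers.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 6  # hypothetical tiny frame

def shade(x, y):
    """Stand-in for an expensive per-pixel render."""
    return (x + y) % 256

def split_rows(height, workers):
    """Divide rows 0..height into contiguous bands, one per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        bands.append((start, end))
        start = end
    return bands

def render_band(band):
    """Render only the rows in [start, end) of the frame."""
    start, end = band
    return [[shade(x, y) for x in range(WIDTH)] for y in range(start, end)]

def render_frame(workers=4):
    bands = split_rows(HEIGHT, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(render_band, bands))  # order matches bands
    # Stitch the bands back together, top to bottom.
    return [row for part in parts for row in part]
```

Because every worker handles a different part of the same frame, a single frame finishes sooner as workers are added, instead of each frame taking as long as one machine needs.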

Video generation and capture
