Commercial drone products can tackle some automated tasks, but one thing those systems don’t address is filming artistically. A team led by Carnegie Mellon University researchers has proposed a complete system for aerial cinematography that learns humans’ visual preferences. The fully autonomous system does not require scripted scenes, GPS tags to localize targets or prior maps of the environment.

“We’re putting the power of a director inside the drone,” said Rogerio Bonatti, a Ph.D. student in CMU’s Robotics Institute. “The drone positions itself to record the most important aspects in a scene. It autonomously understands the context of the scene — where obstacles are, where actors are — and it actively reasons about which viewpoints are going to make a more visually interesting scene. It also reasons about remaining safe and not crashing.”

As a goal, “artistically interesting” is subjective and difficult to mathematically quantify, so the system was trained using a technique called deep reinforcement learning. In a user study, people viewed scenes on a photo-realistic simulator that changed between frontal, back, left and right perspectives. Shot scale and distance were also explored, as well as the actor’s position on the screen. Users scored scenes based on how visually appealing they were and how artistically interesting they found them.


The system learned that some movements were more interesting than others. For example, other autonomous drone products often use a continuous backshot because it allows the drone to follow a clear, safe path behind the actor. But in the user study, participants reported that a constant backshot becomes boring after a while. The study also showed that the drone had to change angles periodically for the shot to remain interesting, but not too frequently.
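The trade-off described above, where a shot is held long enough to avoid jarring cuts but not so long that it gets boring, can be illustrated with a toy scoring function. This is only a sketch and not the team's learned model: the function name `shot_sequence_score` and the `min_hold` and `max_hold` thresholds are hypothetical values chosen for illustration, whereas the actual system learned its preferences from user ratings.

```python
from typing import List, Sequence


def run_lengths(shots: Sequence[str]) -> List[int]:
    """Return how many consecutive time steps each shot type is held."""
    if not shots:
        return []
    runs = []
    hold = 1
    for prev, cur in zip(shots, shots[1:]):
        if cur == prev:
            hold += 1
        else:
            runs.append(hold)
            hold = 1
    runs.append(hold)
    return runs


def shot_sequence_score(shots: Sequence[str],
                        min_hold: int = 2,
                        max_hold: int = 5) -> float:
    """Toy score: reward holds within [min_hold, max_hold] steps and
    penalize holds that are too short (jumpy) or too long (boring).
    The thresholds are hypothetical, not values from the study."""
    runs = run_lengths(shots)
    if not runs:
        return 0.0
    score = 0.0
    for run in runs:
        if run < min_hold:
            score -= min_hold - run      # switched too quickly
        elif run > max_hold:
            score -= run - max_hold      # held the same angle too long
        else:
            score += 1.0                 # comfortable hold duration
    return score / len(runs)


constant_back = ["back"] * 12
jumpy = ["front", "back", "left", "right"] * 3
balanced = ["back"] * 3 + ["left"] * 3 + ["front"] * 3 + ["right"] * 3

print(shot_sequence_score(constant_back))
print(shot_sequence_score(jumpy))
print(shot_sequence_score(balanced))
```

Under these assumed thresholds, the balanced sequence scores higher than both the constant backshot and the sequence that changes angle at every step, which mirrors the preference participants reported.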

Bonatti said the team wanted to make the learned behavior generalizable, going from training in simulation to deployment in real-life scenarios. Although the system averaged users’ preferences for shots as an actor walked through a narrow corridor between buildings, it can apply those preferences to similar environments, such as a forest path, using topographic mapping.

“Future work could explore many different parameters or create customized artistic preferences based on a director’s style or genre,” said Sebastian Scherer, an associate research professor in the Robotics Institute.

The aerial system is also skilled at maintaining a clear view of the actor, avoiding what’s known as occlusions. “We were the first group to come up with new ways of dealing with occlusion that aren’t just binary, but can actually quantify how bad the occlusion is,” Bonatti said.
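To make the idea of a non-binary occlusion measure concrete, the sketch below scores occlusion as the fraction of sight lines from the camera to sample points on the actor that are blocked. It is an illustration under stated assumptions, not the method from the paper: the 2D occupancy grid, the fixed-step ray sampling, and the names `ray_blocked` and `occlusion_fraction` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[int]]        # hypothetical 2D occupancy grid: 1 = occupied, 0 = free
Point = Tuple[float, float]   # (x, y) in grid coordinates


def ray_blocked(grid: Grid, start: Point, end: Point, steps: int = 50) -> bool:
    """Sample points along the segment start -> end and report whether
    any sample falls in an occupied cell."""
    for i in range(steps + 1):
        t = i / steps
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        col = math.floor(x + 0.5)   # nearest cell index
        row = math.floor(y + 0.5)
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col]:
            return True
    return False


def occlusion_fraction(grid: Grid, camera: Point,
                       actor_points: Sequence[Point]) -> float:
    """Return a value in [0, 1]: 0 means the actor is fully visible,
    1 means every sampled sight line is blocked."""
    if not actor_points:
        return 0.0
    blocked = sum(ray_blocked(grid, camera, p) for p in actor_points)
    return blocked / len(actor_points)


# A 5x5 map with a single obstacle in the middle cell.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1

camera = (0.0, 2.0)
actor = [(4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]   # e.g. feet, torso, head samples
print(occlusion_fraction(grid, camera, actor))
```

Here only the middle sight line passes through the obstacle, so the score is one third rather than a plain "occluded" or "visible" flag. A planner could use such a graded value to prefer viewpoints where less of the actor is hidden.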

Other innovations include efficient motion planners to anticipate the trajectories of actors, and an incremental and efficient mapping system of the environment using LiDAR.

This system could be useful beyond entertainment and sports. Governments and police departments already use manually flown drones for many applications, including monitoring crowds and understanding traffic patterns. But manually flying a drone requires a lot of attention, which leaves the officer at the controls little energy to actually look at the scene. “Just like learning artistic principles, the machine could be taught the shots necessary for other applications like security,” Bonatti said.

“The goal of the research is not to replace humans. We will still have a market for highly trained professional experts,” said Bonatti. “The goal is to democratize drone cinematography and allow people to really focus on what matters to them.”

This work will be presented at the 2019 International Conference on Intelligent Robots and Systems next month, and has been accepted for publication in the Journal of Field Robotics. The research is sponsored by Yamaha Motor Company.