They have absolutely thought through what they want in terms of features, and the features they described absolutely require machine learning as it stands today. I cannot think of any other method to remove advertisements from objects in a live video feed, like the pitcher's mound example OP provides.
I had fun with that for a few minutes.