No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here.
More likely, you would just train the model to emit SVG for a description of a scene, and create the training data from raster images.