Cloning gets you the raw text objects directly. If you scrape the web UI you're dealing with a lot of markup overhead that just burns compute during ingestion. For training data you usually want the structure to be as clean as possible from the start.