Right. Last time I checked, this was easy to demonstrate with word logic problems:
"Adam has two apples and Ben has four bananas. Cliff has two pieces of cardboard. How many pieces of fruit do they have?" (Or something slightly more complex; this one would probably be solved easily, but you get my drift.)
Change the wording to something entirely random, i.e. something not likely to be found in the LLM's training corpus, like walruses and skyscrapers and carbon molecules, and the LLM will give you a suitably nonsensical answer, showing that it is incapable of handling even simple substitutions that a middle schooler would recognize.
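
If anyone wants to try this themselves, here is a rough sketch of how the substitution could be set up. It only builds the prompts and the expected count; it does not call any model. The names `TEMPLATE` and `make_problem` are made up for illustration, and "seals" and "marine mammals" are my own fill-ins to go with the walruses and skyscrapers.

```python
# Template for the word problem: two items that belong to the category
# and one distractor that does not. Names here are hypothetical.
TEMPLATE = (
    "Adam has {n1} {x} and Ben has {n2} {y}. "
    "Cliff has {n3} {z}. How many {category} do they have?"
)


def make_problem(x, y, z, category, n1=2, n2=4, n3=2):
    """Return (prompt, expected_answer).

    x and y are assumed to belong to `category`; z is the distractor,
    so the expected answer is n1 + n2 regardless of n3.
    """
    prompt = TEMPLATE.format(
        n1=n1, n2=n2, n3=n3, x=x, y=y, z=z, category=category
    )
    return prompt, n1 + n2


# The familiar version from the comment above.
original = make_problem(
    "apples", "bananas", "pieces of cardboard", "pieces of fruit"
)

# Same structure, unfamiliar nouns (seals / marine mammals are assumptions).
substituted = make_problem(
    "walruses", "seals", "skyscrapers", "marine mammals"
)

for prompt, expected in (original, substituted):
    print(prompt)
    print("expected:", expected)
```

Both prompts have the same structure and the same expected count, so any difference in the answers you get back would come down to the nouns alone.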