The reasoning ability is likely already part of the original model. It is well known that a 1bn-parameter model cannot be made to reason, even with RL.