Generally, a higher frame rate looks better, up to a point (*), even with a smaller frame size. So with a fixed number of bits to work with, lean toward the faster frame rate and the smaller frame until you reach that point, then increase the frame size to use the remaining bits. Provided, of course, that the source actually has that extra detail; upscaling for the sake of upscaling doesn't add anything.
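To make the trade-off concrete, here's a rough bit-budget sketch. The bitrate, resolutions, and frame rates below are illustrative numbers I've chosen, not figures from the post; the point is just that with a fixed bitrate, the bits available per pixel depend on both the frame rate and the frame size.

```python
def bits_per_pixel(bitrate_bps, fps, width, height):
    """Average coded bits available per pixel per frame."""
    return bitrate_bps / fps / (width * height)

# Same 4 Mbit/s budget, spent two different ways:
a = bits_per_pixel(4_000_000, 30, 1920, 1080)  # 30 fps at 1080p
b = bits_per_pixel(4_000_000, 60, 1280, 720)   # 60 fps at 720p
# The 60 fps / 720p combination actually leaves slightly *more*
# bits per pixel here, on top of the smoother motion.
```

Real codecs exploit temporal redundancy between frames, so the true cost of doubling the frame rate is less than this naive per-frame accounting suggests, which tilts the trade-off even further toward the higher rate.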
(*) The frame rate for video is analogous to the sample rate for audio. There's no point in sampling faster than 48 kHz for final distribution of audio, because that rate already encodes everything we can hear; any more than that simply spends extra bits on the *exact same* perception. There might be a production, systems-engineering, or perhaps marketing reason to use more than 48 kHz for audio, or more than 60 fps for video, but actual physical perception is not it.
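The "48 kHz is already enough" claim falls straight out of the sampling theorem; the arithmetic is a one-liner (the 20 kHz hearing limit is the usual textbook figure):

```python
# Nyquist: a sample rate of fs captures every frequency below fs / 2.
fs = 48_000             # Hz, a common distribution sample rate
nyquist = fs / 2        # 24 kHz
hearing_limit = 20_000  # Hz, approximate upper bound of human hearing
# 24 kHz > 20 kHz, so 48 kHz already encodes everything audible,
# with headroom left over for the anti-aliasing filter's rolloff.
```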
The idea that things have to line up on discrete samples is also wrong: a band-limited signal is reconstructed exactly from its samples, including everything that falls between them. It's completely unintuitive, but it's true.
www.xiph.org (see the digital media demonstration videos there)
That presentation is entirely about audio, but the exact same principles apply to video, both in time (frame rate) and in space (distance between pixels), and to perceptible bit depth as well (the amount actually needed is surprisingly low for most real content, even without the tricks).
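The between-samples claim above can be checked numerically. This is a minimal sketch, not anything from the presentation: sample a sine tone well below the Nyquist limit, then evaluate the ideal (sinc-interpolation) reconstruction at an instant that deliberately falls between samples. The frequencies and the test instant are arbitrary choices of mine.

```python
import numpy as np

# A band-limited signal is recovered exactly from its samples by sinc
# interpolation, even at points between samples -- nothing has to
# "line up" on a sample instant.
fs = 100.0                  # sample rate (Hz)
f = 13.7                    # tone frequency, well below fs / 2
n = np.arange(-500, 500)    # sample indices (window long enough
                            # that truncation error is tiny)
samples = np.sin(2 * np.pi * f * n / fs)

def reconstruct(t):
    """Ideal sinc interpolation of the sampled tone at time t (seconds)."""
    # np.sinc is the normalized sinc: sin(pi*x) / (pi*x)
    return np.sum(samples * np.sinc(fs * t - n))

t = 0.03217                 # an arbitrary instant between two samples
err = abs(reconstruct(t) - np.sin(2 * np.pi * f * t))
# err is tiny -- limited only by truncating the infinite sinc sum
```

The spatial version of the same statement is why an edge in an image doesn't need to sit exactly on a pixel boundary to be represented faithfully, as long as the image is properly band-limited before sampling.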