First of all, that was of course a rhetorical question. Modern interns will probably assume at least 4 bytes per pixel (R, G, B, and A).
But the original post actually talks about CGA [1] with just four colors. Encoding a color needs two bits then, so each byte encodes four pixels.
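To make that arithmetic concrete, here is a minimal Python sketch of packing four 2-bit colour indices into one byte and unpacking them again. The function names are made up for illustration, and the bit order (leftmost pixel in the top two bits) is an assumption of this sketch rather than something stated in the post.

```python
def pack_pixels(indices):
    """Pack 2-bit colour indices (0..3) four to a byte.

    Assumed layout: the first pixel of each group lands in bits 7-6,
    the last one in bits 1-0.
    """
    if len(indices) % 4 != 0:
        raise ValueError("pixel count must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(indices), 4):
        byte = 0
        for value in indices[i:i + 4]:
            if not 0 <= value <= 3:
                raise ValueError("colour index must fit in 2 bits")
            # Shift earlier pixels up and append the new one at the bottom.
            byte = (byte << 2) | value
        out.append(byte)
    return bytes(out)


def unpack_pixels(data):
    """Inverse of pack_pixels: yield four 2-bit indices per byte."""
    pixels = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            pixels.append((byte >> shift) & 0b11)
    return pixels


packed = pack_pixels([0, 1, 2, 3])
restored = unpack_pixels(packed)
```

Assuming a 320x200 screen in this four-colour mode, that is 64,000 pixels in 16,000 bytes, compared with 256,000 bytes at 4 bytes per pixel.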
Oh right. I guess the "(2 bits per color? how is that possible)" is what threw me off there, because I read it as 2 bits per colour channel rather than a 2-bit CGA colour index. Of course, "indexed" colours can get away with far fewer bits.
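A rough Python sketch of the indexed-colour idea: each pixel stores only a small index, and a separate palette table maps that index to a full colour. The palette values and helper names below are illustrative assumptions, not taken from the post.

```python
# Illustrative four-entry palette (RGB triples); the exact values are an
# assumption chosen to resemble one of the CGA four-colour palettes.
PALETTE = [
    (0, 0, 0),        # index 0: black
    (0, 170, 170),    # index 1: cyan
    (170, 0, 170),    # index 2: magenta
    (170, 170, 170),  # index 3: light grey
]


def bits_needed(num_colours):
    """Bits per pixel required to index a palette of the given size."""
    if num_colours < 1:
        raise ValueError("palette must have at least one colour")
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 2; use 1 bit minimum.
    return max(1, (num_colours - 1).bit_length())


def to_rgb(indices, palette=PALETTE):
    """Expand palette indices into full RGB triples."""
    return [palette[i] for i in indices]


rgb = to_rgb([0, 3, 1])
```

The pixel data stays at 2 bits per pixel no matter how many bits each palette entry uses; only the small lookup table carries the full colour values.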