Excellent question!!
1. There are different answers depending on the manufacturing technology. Fundamentally, it comes down to the number of 'transistors' (or circuits) per unit area divided by the number of 'transistors' (or circuits) needed to make each photosensitive bucket plus the number needed to transport its data off the chip(s). That gives you, roughly, pixels per unit area.
I intentionally used common terminology to make this a bit more readable. The chip makers have been very guarded about what technology they use and how far they will push it. I believe there's easily a doubling or two left (to ~100 megapixels) in the technology currently used for DSLR-format cameras. Could you use the image? I doubt it, for many reasons.
100 megapixels on a large chip (MF or sLF) is virtually available now. It's slow and unwieldy, and you'd need to mortgage something to get one.
2. This does not factor in noise, heat, and power, which are all related by formulas available in the appropriate manufacturer's spec books. Different levels of integration (UVLSI, SSIC, etc.) have distinct characteristics.
3. The large MF-style chips exceeded the resolution of many lenses some time ago. I've got older lenses with unique, characteristic flaws that add character on film but are just unusable with the bigger chips. Hence the redesign of all the Hasselblad gear over the past few years, and I believe the same applies to the new Leitz lens offerings.
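To make the rule of thumb from point 1 concrete, here's a minimal sketch of the arithmetic: circuits available per mm² divided by what each pixel costs (its bucket plus its share of the readout path), then scaled up to a whole sensor. The function names and every input number are hypothetical placeholders I picked for illustration, not anything from a manufacturer's spec book.

```python
"""Rough pixel-budget estimate following the rule of thumb in point 1.

All names and numbers here are hypothetical placeholders, not manufacturer data.
"""


def pixels_per_area(circuits_per_mm2: float,
                    circuits_per_bucket: float,
                    circuits_per_transport: float) -> float:
    """Circuits available per mm^2 divided by the circuits each pixel costs
    (its photosensitive bucket plus its share of the readout path)."""
    cost_per_pixel = circuits_per_bucket + circuits_per_transport
    if cost_per_pixel <= 0:
        raise ValueError("per-pixel circuit cost must be positive")
    return circuits_per_mm2 / cost_per_pixel


def max_megapixels(circuits_per_mm2: float,
                   circuits_per_bucket: float,
                   circuits_per_transport: float,
                   width_mm: float,
                   height_mm: float) -> float:
    """Scale the per-area figure up to a whole sensor and express it in MP."""
    density = pixels_per_area(circuits_per_mm2, circuits_per_bucket,
                              circuits_per_transport)
    return density * width_mm * height_mm / 1e6


if __name__ == "__main__":
    # Made-up inputs: 500,000 circuits/mm^2, 4 per bucket, 1 per transport,
    # on a 36 x 24 mm (full-frame DSLR-sized) sensor.
    print(f"{max_megapixels(500_000, 4, 1, 36, 24):.1f} MP")  # prints "86.4 MP"
```

Note this deliberately ignores the noise, heat, and power issues from point 2, which in practice cap things well before the raw circuit count does.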
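For point 3, one common way to see when a chip out-resolves a lens is to compare the sensor's Nyquist limit (one line pair needs at least two pixels) with the lens's resolving power in line pairs per mm. This is a simplified sketch assuming a plain monochrome pixel grid; a Bayer or anti-aliasing filter lowers the real figure, and the 50 mm / 10,000-pixel chip and the 60 lp/mm lens are hypothetical examples, not real products.

```python
"""Compare a sensor's Nyquist limit with a lens's resolving power.

Assumes a plain monochrome pixel grid (colour and anti-aliasing filters lower
the real figure); the example numbers are hypothetical.
"""


def pixel_pitch_um(sensor_width_mm: float, pixels_across: int) -> float:
    """Centre-to-centre pixel spacing in micrometres."""
    return sensor_width_mm * 1000.0 / pixels_across


def nyquist_lp_per_mm(pitch_um: float) -> float:
    """One line pair needs at least two pixels, so the limit is 1 / (2 * pitch)."""
    return 1000.0 / (2.0 * pitch_um)


def lens_is_bottleneck(lens_lp_per_mm: float,
                       sensor_width_mm: float,
                       pixels_across: int) -> bool:
    """True when the sensor can record finer detail than the lens delivers."""
    pitch = pixel_pitch_um(sensor_width_mm, pixels_across)
    return nyquist_lp_per_mm(pitch) > lens_lp_per_mm


if __name__ == "__main__":
    pitch = pixel_pitch_um(50, 10_000)  # hypothetical 50 mm-wide chip
    print(f"pitch {pitch:.1f} um, Nyquist {nyquist_lp_per_mm(pitch):.0f} lp/mm")
    print("lens limits detail:", lens_is_bottleneck(60, 50, 10_000))
```

With those made-up numbers the chip resolves about 100 lp/mm, so an older lens that tops out around 60 lp/mm becomes the weak link, which is the situation described in point 3.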
Finally, as new sensor arrangements emerge, the limiting factor may also become how much in-camera processing the market will bear, and whether the cameras will be full-time video capable as well.
I'm waiting for something along the lines of a 3-chip 'still' camera. There have been experiments, but they have been too costly for consumers, and even for professionals, due to product considerations.
I've got some more thoughts on this for later!