The problem with bootstrapping turned out to be a misunderstanding of what the code was doing, so that's working fine now. From that code I calculated the clipped mean and median variances of the color term and zero-point for each band. Each median is close to its respective mean, which is a good sign. Mean/median values are:
g color uncert.: 0.0130472/0.0130471
g zero-point uncert.: 0.00871615/0.00871514
r color uncert.: 0.0238134/0.0238136
r zero-point uncert.: 0.0199431/0.0199430
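For reference, a clipped mean and median of the kind quoted above can be sketched as follows. This is only an illustration, not the bootstrap code itself: the 3-sigma threshold, the iteration cap, and the function names `sigma_clip` and `clipped_mean_median` are assumptions of mine.

```python
import statistics

def sigma_clip(values, n_sigma=3.0, max_iter=10):
    """Iteratively drop points further than n_sigma standard deviations
    from the median. n_sigma=3 and max_iter=10 are assumed defaults."""
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)
        if spread == 0:
            break
        clipped = [v for v in kept if abs(v - center) <= n_sigma * spread]
        if len(clipped) == len(kept):
            break  # nothing rejected this pass, so the clip has converged
        kept = clipped
    return kept

def clipped_mean_median(values, n_sigma=3.0):
    """Return (mean, median) of the values that survive sigma clipping."""
    kept = sigma_clip(values, n_sigma)
    return statistics.mean(kept), statistics.median(kept)
```

If the clipped mean and median of the bootstrap samples agree closely, as they do here, the distribution is probably not being pulled around by a few outlying resamples.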
More robust mean values of the zero-point offset and color term for each band are:
g zero-point: 6.84060
g color term: 0.0709389
r zero-point: 7.15033
r color term: 0.0352746
The g band is looking good--the uncertainties are relatively small, and the color term and the mean zero-point offset are about what they were before. I'm more concerned about the r band. The zero-point is similar to what it was before, with a relatively small uncertainty. What concerns me is the color term--its uncertainty is comparable to the measured value. This means we can't really trust that value for the color term. I'm not terribly sure how to handle this.
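To put a number on that concern, here is a quick check of how significant each color term is relative to its uncertainty. It assumes the quoted values are 1-sigma uncertainties (if they are really variances, the square root would need to be taken first); the `significance` helper is just an illustrative name.

```python
# Clipped-mean values copied from the lists above.
color_term = {"g": 0.0709389, "r": 0.0352746}
color_uncert = {"g": 0.0130472, "r": 0.0238134}  # assumed to be 1-sigma

def significance(value, uncertainty):
    """Ratio of a measured value to its uncertainty, in units of sigma."""
    return abs(value) / uncertainty

for band in ("g", "r"):
    snr = significance(color_term[band], color_uncert[band])
    print(f"{band}: color term = {color_term[band]:.4f} "
          f"+/- {color_uncert[band]:.4f} ({snr:.1f} sigma)")
```

Under that assumption the g color term comes out at roughly 5.4 sigma, while the r color term is only about 1.5 sigma, so it is not well separated from zero.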
My next step, as per Ricardo's paper, is to check these values for each exposure to see whether the variances between exposures are small compared to the uncertainties described here for the stack. When Ricardo did this he compared between chips, but he also had 36 chips. We have only 8 chips, and I expect that the differences between exposures are where any problems will lie. We have robust weight maps and distortion maps for all chips, and it was the background levels between exposures that concerned us about the stacks to begin with. So I'm more worried about discrepancies between exposures than between chips.
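A rough sketch of what that per-exposure check could look like is below. The function name `exposure_scatter_check` and the use of a plain sample standard deviation are my assumptions, and the per-exposure numbers in the usage lines are placeholders, not measurements; only the stack uncertainty is taken from the values above.

```python
import statistics

def exposure_scatter_check(per_exposure_values, stack_uncertainty):
    """Compare the exposure-to-exposure scatter of a fitted quantity
    (zero-point or color term) with its uncertainty from the stack.

    Returns (scatter, ratio). A ratio well below 1 suggests the exposures
    agree to within the stack uncertainty.
    """
    scatter = statistics.stdev(per_exposure_values)  # sample std. dev.
    return scatter, scatter / stack_uncertainty

# PLACEHOLDER per-exposure r zero-points, for illustration only.
placeholder_r_zero_points = [7.14, 7.15, 7.16]
r_zero_point_uncert = 0.0199431  # stack value quoted above

scatter, ratio = exposure_scatter_check(placeholder_r_zero_points,
                                        r_zero_point_uncert)
print(f"scatter = {scatter:.4f}, scatter / stack uncertainty = {ratio:.2f}")
```

The same function could be run per chip instead of per exposure, but with only 8 chips the between-exposure comparison is the more informative one.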