Experiments often require comparing an observation against a reference, where both observation and reference are selections made from some specified domain. The goal is to determine how close the observation is to the ideal result represented by the reference, so that, all other things being equal, systems whose outputs lie closer to that ideal can be preferred for deployment. Both observation and reference might be sets of items, or might be ordered sequences (rankings) of items, giving four possible combinations of sets and rankings. Three of those combinations are already familiar to IR researchers, and have received detailed exploration. Here we consider the fourth: comparing an observation set against a reference ranking. We introduce a new measurement that we call rank-biased recall to cover this scenario, and demonstrate its usefulness with a case study from multi-phase ranking. We also present a new top-weighted ‘ranking compared to ranking’ measurement, and show that it complements the previous rank-biased overlap mechanism while possessing distinctive characteristics of its own.
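To make the ‘observation set versus reference ranking’ scenario concrete, the sketch below shows one plausible instantiation of such a measurement: reference positions are weighted geometrically with a persistence parameter `phi`, in the style of rank-biased precision and rank-biased overlap, and the observation set earns the weight of every reference item it contains. The function name, the weighting scheme, and the parameter `phi` are illustrative assumptions here, not necessarily the exact definition used in the paper.

```python
def rank_biased_recall(observation, reference_ranking, phi=0.8):
    """Illustrative sketch (assumed form, not the paper's exact definition).

    Reference position i (1-based) carries geometric weight
    (1 - phi) * phi ** (i - 1); the score is the total weight of the
    reference positions whose items appear in the observation set.
    Scores lie in [0, 1), approaching 1 only when the observation
    covers an arbitrarily deep prefix of the reference ranking.
    """
    obs = set(observation)
    return sum(
        (1 - phi) * phi ** i  # weight of reference position i + 1
        for i, item in enumerate(reference_ranking)
        if item in obs
    )


# With phi = 0.5 the position weights are 0.5, 0.25, 0.125, 0.0625, ...;
# an observation holding the items at reference positions 1 and 3 scores
# 0.5 + 0.125 = 0.625.
score = rank_biased_recall({"a", "c"}, ["a", "b", "c", "d"], phi=0.5)
```

Because the weights are top-heavy, capturing the reference's leading items matters far more than capturing its tail, which is the sense in which the measurement is rank-biased.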