We consider load balancing in service systems with affinity relations between jobs and servers. Specifically, an arriving job can be assigned to a fast, primary server from a particular selection associated with this job or to a secondary server to be processed at a slower rate. Such job–server affinity relations can model network topologies based on geographical proximity, or data locality in cloud scenarios. We introduce load balancing schemes that assign jobs to primary servers if available, and otherwise to secondary servers. A novel coupling construction is developed to obtain stability conditions and performance bounds. We also conduct a fluid limit analysis for symmetric model instances, which reveals a delicate interplay between the model parameters and load balancing performance.
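
As a rough illustration of the assignment rule described above (primary servers if available, otherwise secondary servers), the following Python sketch may help. Several details are assumptions not fixed by the text: "available" is read as "currently idle", ties among idle servers are broken uniformly at random, and the function returns `None` when no server is idle, leaving queueing or blocking to the caller. The names `assign_job`, `primary_set`, `idle`, and `all_servers` are hypothetical.

```python
import random


def assign_job(primary_set, idle, all_servers, rng=random):
    """Pick a server for an arriving job.

    primary_set: servers that process this job at the fast rate (assumed given).
    idle:        set of servers currently idle (assumed meaning of "available").
    all_servers: iterable of every server in the system.
    Returns (server, is_primary), or None if no server is idle (an assumption;
    the source does not specify this case).
    """
    # Prefer an idle primary server; sort first so the choice is reproducible
    # for a seeded rng.
    idle_primary = sorted(s for s in primary_set if s in idle)
    if idle_primary:
        return rng.choice(idle_primary), True

    # Otherwise fall back to an idle secondary (non-primary) server,
    # which would process the job at the slower rate.
    idle_secondary = sorted(
        s for s in all_servers if s in idle and s not in primary_set
    )
    if idle_secondary:
        return rng.choice(idle_secondary), False

    return None
```

In this sketch the returned flag `is_primary` indicates which service rate would apply to the job; a simulation could use it to draw the service time from the fast or the slow distribution.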