We present a summary of important computational issues and opportunities that arise from the use of semi-aggregate data (where the explanatory data for choice scenarios are not necessarily unique for each decision-maker) in discrete choice models. Such data are encountered in large transactional databases that have limited consumer information, a common feature of some transportation planning applications, such as airline itinerary choice modeling. We developed a freeware package called Larch, written in Python and C++, to take advantage of this kind of data to greatly speed the estimation of discrete choice model parameters. Benchmarking experiments against Stata (a commonly used commercial package), Biogeme (a commonly used freeware package), and ALOGIT (a highly specialized commercial package for discrete choice modeling), based on an industry dataset for airline itinerary choice modeling applications, show that the input estimation files are 50–100 times larger in Stata and Biogeme, respectively. Estimation is also much faster in ALOGIT and Larch; e.g., for a small itinerary choice problem, a multinomial logit model estimated in ALOGIT or Larch converged in less than one second, whereas the same model took almost 15 seconds in Stata and more than three minutes in Biogeme.
- Airline itinerary choice models
- Discrete choice models
- Semi-aggregate data