Under this project, we are working to improve the reliability and reproducibility of recommender system evaluations. This encompasses several lines of work:
- Research on evaluation methods, metrics, and protocols, looking to understand their behavior and failure modes.
- Development of the LensKit software to support recommender systems research and promote reproducibility.
Additional, pre-PIReT work on this theme is cataloged at Dr. Ekstrand’s page.
Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, and Ben Carterette. 2020. “Evaluating Stochastic Rankings with Expected Exposure”. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20). ACM, 10 pp. DOI:10.1145/3340531.3411962.
Mucun Tian and Michael D. Ekstrand. 2020. “Estimating Error and Bias in Offline Evaluation Results”. Short paper in Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (CHIIR ’20). ACM, 5 pp. DOI:10.1145/3343413.3378004.
Michael D. Ekstrand. 2018. “The LKPY Package for Recommender Systems Experiments”. In Proceedings of the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems, co-located with ACM RecSys 2018.
Mucun Tian and Michael D. Ekstrand. 2018. “Monte Carlo Estimates of Evaluation Metric Error and Bias”. In Proceedings of the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems, co-located with ACM RecSys 2018.
Nicola Ferro, Norbert Fuhr, Gregory Grefenstette, Joseph A. Konstan, Pablo Castells, Elizabeth M. Daly, Thierry Declerck, Michael D. Ekstrand, Werner Geyer, Julio Gonzalo, Tsvi Kuflik, Krister Lindén, Bernardo Magnini, Jian-Yun Nie, Raffaele Perego, Bracha Shapira, Ian Soboroff, Nava Tintarev, Karin Verspoor, Martijn C. Willemsen, and Justin Zobel. 2018. “The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction”. SIGIR Forum 52(1) (June 2018), 91–101.
Michael D. Ekstrand and Vaibhav Mahant. 2017. “Sturgeon and the Cool Kids: Problems with Top-N Recommender Evaluation”. In Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (FLAIRS 2017).