This paper presents a case study on predicting daily customer flow in a university's self-service restaurant. We systematically compare multiple machine learning techniques and diverse feature sets to identify the best-performing model within our experimental scope, with the aim of improving forecasting accuracy in this dynamic environment. We analyze real-time data collected via RFID sensors from spring 2019 to 2024. To ensure high data quality, we apply a robust preprocessing pipeline, followed by careful feature engineering to construct 10 distinct feature sets, labeled M1 to M10. These feature sets draw on temporal attributes such as day, month, year, and season, as well as external factors such as local weather conditions, public holidays, and menu choices. Comparing all feature sets, we identify M10 as the optimal combination. A key finding is that handling missing data, particularly during the COVID-19 period, is one of the most critical steps in the preprocessing stage. To evaluate the predictive power of the selected features, we test several machine learning models: linear regression, random forest, extreme gradient boosting (XGBoost), and long short-term memory (LSTM). XGBoost achieves the lowest mean absolute error (MAE) and mean squared error (MSE) and outperforms the other models on every feature set from M1 to M10. XGBoost is particularly effective because it draws on past customer counts and exponential smoothing features, which capture short-term customer behavior. Our analysis identifies the most influential features as the previous day's customer count, exponential smoothing outputs, holidays, day of the week, and weather data. Overall, we recommend XGBoost as the most effective model for predicting daily customer numbers in similar contexts, given its superior accuracy across diverse feature sets.
Citation: Himat Shah, Niko Myller, Carolina Islas Sedano. Forecasting daily customer flow in restaurants: a multifactor machine learning approach[J]. Applied Computing and Intelligence, 2025, 5(2): 168-190. doi: 10.3934/aci.2025011
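The abstract attributes much of XGBoost's accuracy to two history-based inputs, the previous day's customer count and exponential smoothing outputs, and compares models by MAE and MSE. The sketch below shows one plausible way to build such features and compute those metrics in plain Python. It assumes simple (single) exponential smoothing with a hypothetical smoothing factor `alpha=0.3`; the paper's actual smoothing variant, parameters, and pipeline are not described here, and all function names and the sample counts are illustrative.

```python
from typing import List, Sequence, Tuple


def exponential_smoothing(values: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s[0] = y[0], s[t] = alpha*y[t] + (1 - alpha)*s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if len(values) == 0:
        return []
    smoothed = [float(values[0])]
    for y in values[1:]:
        smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed


def build_lag_features(
    counts: Sequence[float], alpha: float = 0.3
) -> List[Tuple[float, float, float]]:
    """Return (lag1, ses_level, target) rows for days 1..n-1.

    Both features use only information up to the previous day, so a
    target day's own count never leaks into its features.
    """
    smoothed = exponential_smoothing(counts, alpha)
    return [
        (float(counts[t - 1]), smoothed[t - 1], float(counts[t]))
        for t in range(1, len(counts))
    ]


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    _check(y_true, y_pred)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical daily customer counts (not data from the study).
    daily_counts = [820, 790, 845, 760, 300, 810, 835, 870]
    rows = build_lag_features(daily_counts, alpha=0.3)
    targets = [target for _, _, target in rows]
    lag1_forecast = [lag1 for lag1, _, _ in rows]
    ses_forecast = [level for _, level, _ in rows]
    print(f"lag-1 baseline: MAE={mae(targets, lag1_forecast):.1f} "
          f"MSE={mse(targets, lag1_forecast):.1f}")
    print(f"SES baseline:   MAE={mae(targets, ses_forecast):.1f} "
          f"MSE={mse(targets, ses_forecast):.1f}")
```

The two baselines above are naive: they predict the lag-1 value or the smoothed level directly. In a setup like the one the abstract describes, such columns would instead be inputs to a regressor such as XGBoost, alongside calendar, holiday, weather, and menu features.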