In this study, we identified the financial indicators that significantly affect the performance of initial public offerings (IPOs) using multinomial logistic regression (MLR), with financial variables extracted from prospectuses as the independent variables. The prospectus is an important source of information for potential investors and significantly increases the likelihood of attracting their attention. Twelve features in two groups, "prospectus characteristics" and "financial ratios", were used to classify IPO performance in the Saudi stock market into three categories: BELOW AVERAGE, AVERAGE, and ABOVE AVERAGE. Model performance was evaluated using accuracy, recall, precision, and F1 scores, together with the confusion matrix and the area under the ROC curve (AUC). The model was developed in Python. The classifier achieved an accuracy of 71.4%, with an AUC of 0.71 for the ABOVE AVERAGE class. The most significant variable affecting IPO returns was the subscription quarter (SQ), followed by sector code (SC) and the most recent year's net profit margin (NPM%). The MLR model was more accurate than the other machine learning algorithms it was compared with. Using the model developed here, investors can improve their ability to predict the direction of the return on an IPO investment, at least for the first month. The paper also discusses a practical application of the MLR method and how it can be used to predict the probability of each performance category for a future IPO.
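To make the method concrete, the following is a minimal sketch of how a multinomial logistic regression model turns feature values into class probabilities, and how accuracy, precision, recall, and F1 can be read from a confusion matrix. It is not the authors' implementation: the coefficients, intercepts, feature values, and labels are all hypothetical and chosen only for illustration, and the AUC computation is omitted.

```python
import math

CLASSES = ["BELOW AVERAGE", "AVERAGE", "ABOVE AVERAGE"]


def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, intercepts):
    """Multinomial logistic regression: softmax over per-class linear scores."""
    scores = [
        b + sum(w * x for w, x in zip(ws, features))
        for ws, b in zip(weights, intercepts)
    ]
    return softmax(scores)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix, k):
    """Precision, recall, and F1 for class k from a confusion matrix."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column total
    actual = sum(matrix[k])                    # row total
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical coefficients for three standardized features (subscription
# quarter, sector code, net profit margin). These are NOT fitted values.
WEIGHTS = [[-0.8, 0.1, -0.5], [0.0, 0.0, 0.0], [0.7, -0.2, 0.6]]
INTERCEPTS = [0.0, 0.2, -0.1]

# Class probabilities for one hypothetical IPO.
probs = predict_proba([1.0, 0.0, 0.5], WEIGHTS, INTERCEPTS)
predicted_class = CLASSES[probs.index(max(probs))]

# Made-up labels (0 = BELOW, 1 = AVERAGE, 2 = ABOVE) for the metrics.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = sum(cm[k][k] for k in range(3)) / len(y_true)
precision, recall, f1 = per_class_metrics(cm, 2)

print(predicted_class, [round(p, 3) for p in probs])
print(cm, accuracy, precision, recall, f1)
```

In practice the coefficients would be estimated from training data, for example with a library such as scikit-learn, rather than written by hand as they are here.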
Citation: Mazin Fahad Alahmadi, Mustafa Tahsin Yilmaz. Prediction of IPO performance from prospectus using multinomial logistic regression, a machine learning model[J]. Data Science in Finance and Economics, 2025, 5(1): 105-135. doi: 10.3934/DSFE.2025006