Traditional methods for identifying strategic partners often rely on human judgment and informal networks, which can lead to poor, strongly biased choices and missed opportunities. In this study, we aimed to support partner selection through a machine learning approach using publicly available startup data. We addressed class imbalance using SMOTE and generated negative labels through heuristic clustering. A Logistic Regression model was then trained to classify high-potential startups based on features such as technological capabilities, funding phase, innovation hub association, founders' structure, and geographical proximity. The model achieved 88.2% accuracy, 80% precision, and an F1-score of 0.727. Technological capability had the strongest influence on whether a startup was classified as a 'high-potential' partner, closely followed by funding stage and geographical proximity to the focal company. This study provides a replicable, data-driven framework that companies, especially SMEs, can apply at the early partner screening stage, along with insights on selection criteria for startups.
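The workflow summarized in the abstract (oversample the minority class, fit a logistic regression, report precision and F1) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the toy data, the three feature names, the hyperparameters, and the hand-written `smote`, `train_logreg`, and `implied_recall` helpers are hypothetical and are not the authors' implementation or dataset. The last line also shows that the reported precision (80%) and F1-score (0.727) together imply a recall of roughly two thirds, a figure the abstract does not state directly.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sampled minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=300):
    """Fit logistic regression weights by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)

# Toy data (NOT the study's dataset): hypothetical features
# [tech_score, funding_stage, proximity], each scaled to [0, 1].
rng = random.Random(42)
positives = [[rng.uniform(0.6, 1.0) for _ in range(3)] for _ in range(6)]
negatives = [[rng.uniform(0.0, 0.5) for _ in range(3)] for _ in range(20)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(positives, n_new=len(negatives) - len(positives))
X = positives + synthetic + negatives
y = [1] * (len(positives) + len(synthetic)) + [0] * len(negatives)

w, b = train_logreg(X, y)
train_accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
print("weights:", [round(wj, 2) for wj in w])
print("train accuracy:", round(train_accuracy, 3))

# Recall is not reported in the abstract; it follows from precision and F1.
print("implied recall:", round(implied_recall(0.80, 0.727), 3))
```

In a real evaluation, oversampling would be applied to the training split only, so that synthetic points never leak into the held-out test data. The toy example skips the split for brevity.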
Citation: Dung Hai Dinh, Van Thanh Tran, Nicole Ondrusch. Classifying high-potential startups for strategic partnerships using machine learning – The case of german digital startups[J]. Data Science in Finance and Economics, 2026, 6(1): 1-30. doi: 10.3934/DSFE.2026001