This study examined whether a mature, empirically validated generative artificial intelligence (GenAI) intervention framework can produce reliable process evidence when deployed in unsupervised take-home assessments. Twenty-five group submissions from two cybersecurity management cohorts, designed under the Structured AI-Guided Education (SAGE) framework, were audited using the five-check SAGE Audit Protocol. Only 3 of 25 submissions (12%) produced evidence chains that were substantially auditable, and full traceability between documented AI outputs and human evaluation claims was not achieved in any submission. Across both cohorts, the remaining submissions showed mismatches between human-authored tables and AI outputs, generic compliance text, workflow-focused reflections, and process records that were often indistinguishable from reconstructed accounts. The paper identifies a compliance gradient in which conscientious students who follow the process in good faith incur a disproportionate documentation burden, while simulated compliance can produce comparable outputs with less effort. It also highlights a marker's dilemma, where auditing AI-supported process evidence approximately doubles marking time and shifts educators from assessing learning to interpreting logs. The paper argues that conventional unsupervised take-home assessment can no longer function as sufficient standalone assurance in the GenAI era. Rather than abandoning AI-integrated learning or defaulting to low-technology examinations, the findings support assessment architectures in which assurance is structurally embedded. The Defend step subsequently added to the SAGE framework operationalises this principle.
Citation: Mahmoud Elkhodr, Ergun Gide. Embedding assurance within learning: Empirical evidence from the SAGE framework for repositioning take-home assessment in AI-integrated higher education. STEM Education, 2026, 6(4): 584-605. doi: 10.3934/steme.2026024