1. Introduction
With the continued advancement of technology, social platforms and search engines not only provide convenience but also generate a large amount of information [1]. The amount of data that users face every day is growing explosively. On the one hand, this growth greatly enriches users' lives; on the other hand, the redundant data seriously interferes with their behavioral decisions [2]. This phenomenon, commonly termed information overload, has prompted researchers to develop recommender systems as a countermeasure. By collecting and analyzing users' behavioral data, including browsing histories, click patterns, and evaluation records, these systems identify individual preferences to deliver targeted recommendations [3]. This methodology has become both a widely studied solution and an extensively implemented strategy in digital environments.
The core of recommendation algorithms is to actively provide personalized services to users based on the interaction history between users and resources [4]. Traditional recommendation algorithms include content-based recommendation algorithms and collaborative filtering recommendation algorithms [5]. With the development of artificial intelligence (AI), building recommendation algorithms based on deep learning has become the mainstream approach.
Currently, deep learning (DL)-based recommendation models outperform most linear collaborative filtering (CF) methods, mainly because they use deep neural networks (DNNs) to capture higher-order features and thereby model the complex relationships between users and resources. For example, Sinha and Dhanalakshmi [6] proposed neural network matrix decomposition by combining neural networks and matrix decomposition, He et al. [7] proposed neural collaborative filtering by combining neural networks and collaborative filtering, and Fu et al. [8] proposed capturing implicit user-item interactions with a feed-forward neural network, which acquires co-occurrence relationships between users and resources more accurately than traditional processing methods. Pan et al. [9] added social relationships as auxiliary information to the idea of collaborative deep learning and proposed learning social representations via a sparse superposition denoising autoencoder to solve the problem of data sparsity in social networks. Feng et al. [10] combined rating-oriented probabilistic matrix factorization with pairwise ranking-oriented Bayesian personalized ranking to address cold-start scenarios. Bai et al. [11] proposed cold-start KT for the same problem; it guides learning from short sequences while ensuring accurate predictions for longer sequences, and it introduces cone attention to better capture complex hierarchical relationships between knowledge components in cold-start scenarios. Alfarhood and Cheng [12] proposed applying matrix decomposition to combine the rating-matrix information learned by multilayer perceptrons with the resource-rating information learned by convolutional neural networks. Saifudin et al. [13] utilized a mixture of feedback associated with users, items, and tags to recommend tags.
These recommendation algorithms utilize deep learning techniques centered on learning feature vector representations of users and resources. The training phase usually uses pairwise or pointwise loss functions to optimize the network parameters, while the recommendation phase is based on the learned user and resource vectors for matching and recommendation.
Although deep learning-based recommender systems have achieved notable results, several problems remain. First, when dealing with implicit feedback data, some models simply equate user behavior with explicit preference and ignore the noise in the data. For example, in a student course recommendation scenario, a student may choose a course for credit rather than out of personal interest. Existing models fail to distinguish such cases effectively, so they misjudge users' true preferences and lose recommendation accuracy. Because implicit feedback is noisy, using it directly for training may mislead model learning; reducing this noise and accurately extracting users' real preferences is therefore an important challenge for recommender systems. Second, some models rely on basic matrix decomposition techniques, which have difficulty capturing the complex nonlinear relationships between users and resources and thus limit the in-depth understanding of their interactions. How to optimize the model structure so that it captures these interaction patterns and generalizes better remains a difficult problem. Third, many models perform poorly on the cold-start problem and cannot fully utilize the limited user or course information for accurate recommendation.
This study aims to address these issues by focusing on how to effectively utilize implicit feedback data, optimize the model structure to better capture the complex interactions between users and resources, and improve the model's ability to generalize across different datasets and cold-start scenarios. To this end, we propose a personalized recommendation algorithm model based on implicit feedback: neural collaborative filtering with multiple attention (NCF-MAH). First, the inner products of the latent features of users and resources are taken using matrix factorization. At the embedding layer, user and resource IDs are mapped into an embedding vector space, and the embedding vectors are projected into query, key, and value vectors. Then, a weighted sum vector is computed from the attention scores of each attention head. Finally, the output vectors are combined, according to weight ratios, with the output of the multilayer perceptron that processes the latent vectors to produce a prediction of user preferences. The main contributions are as follows:
1) We present a novel neural network architecture for user and resource modeling. Departing from prior single architectures that rely solely on matrix decomposition or an MLP, it integrates generalized matrix factorization (GMF) and an MLP. User-item data are one-hot encoded into sparse vectors and embedded in a low-dimensional space. The GMF and MLP layers then process linear and nonlinear features, respectively, and their results are fused for prediction. This design comprehensively captures complex user-resource interactions and enhances the model's feature-handling ability.
2) We introduce a multi-attention mechanism. Differing from traditional applications, this study linearly transforms the user embedding vectors, the resource embedding vectors, and the rating information via distinct weight matrices. Each attention head can focus on user-resource-rating interactions from a different perspective, capturing data patterns more precisely and thus improving recommendation accuracy and personalization.
3) We employ the binary cross-entropy loss function. Unlike previous simple uses, this study optimizes it by integrating the model's overall architecture and implicit feedback data characteristics. During training, it effectively measures the gap between predicted and real values, updating model parameters iteratively through gradient descent. This enables the model to better adapt to data and structure, enhancing prediction accuracy and generalization. Experiments show the model outperforms traditional methods significantly in metrics like hit rate (HR) and normalized discounted cumulative gain (NDCG).
2. Preliminary
2.1. Explicit vs. implicit feedback
Early recommendation models primarily relied on explicit feedback, such as users' historical ratings of resources, to predict their ratings for target items. This approach was based on the idea that by estimating the ratings users might give to target resources, top-K recommendations could be made by ranking items according to these predicted scores [14]. However, the results were often unsatisfactory, as explicit feedback fails to account for negative feedback. Users may avoid rating resources they dislike, and ignoring this absence of feedback can lead to sub-optimal model performance [15]. In contrast, implicit feedback, while abundant and easy to collect, is noisy. For example, a user purchasing an item does not necessarily indicate that they like it; they may have bought it as a gift or later realized they do not like the product. Implicit feedback also introduces challenges due to the presence of negative samples, which are difficult to identify and account for. Most existing studies treat implicit feedback as mere additional input features, failing to fully explore its intrinsic value.
To overcome these limitations, this study introduces a multi-head attention mechanism designed to better process implicit feedback data. The multi-head attention mechanism allows the model to analyze implicit feedback from multiple perspectives by processing it in parallel across different subspaces. Each attention head focuses on distinct feature dimensions and interaction patterns, enabling the model to more accurately capture users' preferences and behavioral patterns. Moreover, the model distinguishes between two types of data by encoding missing values or browsing behaviors in the user-resource interaction matrix as binary signals. This approach enhances the model's ability to capture users' behaviors and intentions from various angles, improving recommendation accuracy and revealing hidden characteristics within implicit feedback data, as illustrated in the table below.
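The binary encoding described above can be sketched as follows. This is a minimal illustration that assumes the common convention of encoding any observed interaction as 1 and a missing entry as 0; the function name `binarize_interactions` and the toy interaction log are hypothetical and not taken from the paper.

```python
def binarize_interactions(interactions, num_users, num_items):
    """Build a binary user-resource matrix from (user, item) pairs.

    Assumption: any observed interaction (click, enrollment, view) is
    encoded as 1; an unobserved entry is 0, meaning "unknown" rather
    than "disliked".
    """
    matrix = [[0] * num_items for _ in range(num_users)]
    for user, item in interactions:
        matrix[user][item] = 1
    return matrix


# Hypothetical toy log: repeated events collapse into a single signal.
log = [(0, 0), (0, 2), (1, 1), (0, 2)]
Y = binarize_interactions(log, num_users=2, num_items=3)
print(Y)  # [[1, 0, 1], [0, 1, 0]]
```

Because a 0 only means that no interaction was observed, such entries are usually treated as candidate negatives to sample from rather than as confirmed dislikes.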
2.2. Multilayer perceptron
The multilayer perceptron was developed from the perceptron, and its main feature is that it has multiple layers of neurons that can process nonlinear data [16]. The basic model structure includes an input layer, one or more hidden layers, and an output layer; the connection from the input layer to the hidden layer can be regarded as a fully connected layer, and the connection from the hidden layer to the output layer can be regarded as a classifier. Ordinary recommendation algorithms predict ratings by multiplying user feature vectors with resource feature vectors, so each user feature is multiplied with each resource feature one by one, which consumes time and memory [17]. The multilayer perceptron instead computes each layer as shown in the following formula:
$$y = f(Wx + b) \tag{1}$$
where x is the input vector of a layer, y is its output, W is the weight matrix, b is the bias vector, and f is the activation function.
Therefore, the model in this paper utilizes the advantages of multilayer perceptrons for nonlinear data processing, transforms the original vector multiplication through the multilayer perceptron, inputs the user features and resource features obtained from the model into the multilayer perceptron, and the final output value is the predicted rating value, as shown in the following figure.
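As a rough illustration of this idea, the sketch below concatenates a user feature vector and a resource feature vector and passes them through fully connected layers, each computing an activation of a weighted sum plus a bias. The layer sizes, the toy weights, and the helper names (`dense`, `mlp_predict`) are assumptions made for the example, not the configuration used in the paper.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def dense(x, W, b, activation):
    """One fully connected layer: activation(W x + b)."""
    z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return activation(z)


def mlp_predict(user_vec, item_vec, layers):
    """Concatenate user and resource features, then apply each layer in turn."""
    h = user_vec + item_vec  # list concatenation
    for W, b, activation in layers:
        h = dense(h, W, b, activation)
    return h[0]  # the last layer has a single output unit


# Hypothetical toy weights: one hidden layer (ReLU) and one output unit (sigmoid).
layers = [
    ([[0.5, -0.5, 1.0, 0.0], [0.0, 1.0, -1.0, 0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
score = mlp_predict([1.0, 2.0], [0.5, 1.0], layers)
print(round(score, 4))
```

In practice the weights are learned rather than fixed, but the forward pass has this shape regardless of how many hidden layers are used.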
2.3. Generalized matrix decomposition
The basic idea of matrix decomposition recommendation algorithms [18] is to decompose the user-resource rating matrix R into two low-dimensional matrices, a user feature matrix U and a resource feature matrix V, as shown in Eq (2). In the process of rating prediction, users and resources are usually represented in a two-dimensional matrix form, i.e., the user-resource rating matrix. Koren [19] introduced implicit feedback into recommendation systems through the SVD++ model, addressing cold-start issues caused by sparse explicit ratings. Subsequently, the temporal SVD++ algorithm extended this framework by incorporating time-sensitive mechanisms, dynamically adjusting recommendation weights through two key operations: decaying the influence of historical user behavior while amplifying the implicit feedback patterns of recent neighbors. Different from traditional matrix decomposition models, our method deeply integrates generalized matrix factorization with a multilayer perceptron. Traditional matrix decomposition models mainly focus on mining latent features of users and items and therefore have difficulty handling complex nonlinear interactions. In our model, on the other hand, the GMF layer is responsible for capturing linear and low-order nonlinear relationships, and the MLP layer handles high-order nonlinear relationships; the two complement each other and enable the model to portray the complex user-resource interactions more comprehensively.
Here n denotes the number of resources and d the latent feature dimension of users and resources. The matrices U (user preferences) and V (resource attributes) model the observed ratings while predicting unrated interactions. To fit the factorization to real-world rating data, the matrix factorization algorithm employs linear regression principles, constructing the following objective function:
In Eq (3), m represents the number of users, $I_{ij}$ is an indicator that equals 1 if user i has rated resource j and 0 otherwise, $R_{ij}$ is the actual rating of user i on resource j, $\hat{R}_{ij}$ is the predicted rating, and λ is a regularization parameter that prevents overfitting.
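To make the role of the indicator and the regularization term concrete, the sketch below evaluates a regularized squared-error objective of the commonly used form: the sum over observed entries of the squared difference between the actual rating and the inner product of the user and resource feature vectors, plus λ times the squared Frobenius norms of U and V. The exact constants and the orientation of U and V in the paper's equation are not recoverable here, so this form, the function name `mf_objective`, and the toy matrices are assumptions.

```python
def mf_objective(R, I, U, V, lam):
    """Regularized squared error over observed ratings only.

    Assumed form:
        sum_ij I[i][j] * (R[i][j] - dot(U[i], V[j]))**2
        + lam * (||U||_F^2 + ||V||_F^2)
    U has one row per user and V one row per resource (an assumption).
    """
    error = 0.0
    for i, u_row in enumerate(U):
        for j, v_row in enumerate(V):
            if I[i][j]:  # skip unrated entries
                predicted = sum(a * b for a, b in zip(u_row, v_row))
                error += (R[i][j] - predicted) ** 2
    reg = sum(x * x for row in U for x in row)
    reg += sum(x * x for row in V for x in row)
    return error + lam * reg


# Toy data: two users, two resources, latent dimension d = 2.
R = [[5.0, 0.0], [0.0, 3.0]]
I = [[1, 0], [0, 1]]          # only the diagonal entries were rated
U = [[1.0, 2.0], [1.0, 0.0]]
V = [[1.0, 2.0], [2.0, 1.0]]
loss = mf_objective(R, I, U, V, lam=0.1)
print(round(loss, 6))  # 2.6
```

Unrated entries contribute nothing to the error term, which is why the zeros in R do not pull the predictions toward zero.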
2.4. Multi-attention mechanisms
The attention mechanism [20] is one of the major breakthroughs in deep learning and has been widely used in computer vision, natural language processing, and other fields. Attention can be viewed as a weighted transformation of features. In the multi-head self-attention mechanism [21], every head is computed in the same way and differs only in its parameters, so the heads can represent features from multiple subspaces; compared with ordinary self-attention, it therefore captures features along multiple dimensions. In our model, the user and resource embedding vectors are input to the attention module, with the user embedding serving as the query and the resource embeddings serving as the keys and values. Since each user can be associated with multiple resources, the batch size can be larger than 1. The attention module computes the user's attention score for each resource, and the output is an aggregated representation of the user: the attention-weighted sum of the resource embedding vectors.
3. Personalized recommendation algorithm model based on implicit feedback
To address the above issues, we propose a personalized recommendation algorithm model based on implicit feedback, named NCF-MAH, as shown in Figure 2. We will now detail its components.
3.1. Input and preprocessing layer
In the NCF-MAH model, the first type of data we need to process is the interaction information between users and resources. These data are typically presented in matrix form, where rows represent users, columns represent resources, and the element values in the matrix indicate the strength of interaction between users and resources, such as rating scores.
To effectively utilize these data in constructing a recommendation model, we convert the input User-Item data into sparse vectors through one-hot encoding, as defined below:
Specifically, for M users and N resource items, we transform each user and each item into a 1 × M and a 1 × N vector, respectively. For example, the vector for the i-th user is [0, 1, 0, ..., 0], where the i-th element is 1 and the rest are 0, which uniquely identifies this user. Similarly, the vector for the j-th item is [0, 0, ..., 0, 1], where the j-th element is 1 and the rest are 0, which uniquely identifies this item. Afterward, we embed the user and item vectors into a lower-dimensional space: multiplying a one-hot input vector by the embedding matrix P yields the embedded vector of the corresponding user or item.
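The following sketch shows why multiplying a one-hot vector by an embedding matrix amounts to selecting one row of that matrix. The matrix values and the helper names `one_hot` and `embed` are hypothetical; P is assumed to have one row per user and one column per latent dimension.

```python
def one_hot(index, size):
    """Return a 1 x size vector with a single 1 at position index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(one_hot_vec, P):
    """Multiply a 1 x M row vector by an M x d embedding matrix."""
    d = len(P[0])
    return [
        sum(one_hot_vec[r] * P[r][c] for r in range(len(P)))
        for c in range(d)
    ]


# Hypothetical embedding matrix for M = 3 users and d = 2 latent dimensions.
P = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
v = one_hot(1, 3)   # second user
e = embed(v, P)
print(e)            # [0.3, 0.4], i.e. row 1 of P
```

This equivalence is why deep learning frameworks implement embedding layers as table lookups instead of materializing the sparse one-hot vectors.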
3.2. Enhanced attention layer
In this paper, we improve the performance of NCF models by introducing a multi-head attention mechanism. Multi-head attention is an innovation of the Transformer architecture that applies richer linear transformations to the output vectors of the previous layer, capturing the interactions between users, resources, and ratings in more detail. The user embedding matrix is U of size m×du and the item embedding matrix is V of size n×dv, where m is the number of users, n is the number of items, du is the dimension of user embeddings, and dv is the dimension of item embeddings. For each user u and item i, the embedding vectors u_vec and i_vec are first obtained by direct lookup from pre-trained embedding tables. The multi-head attention layer then performs independent linear transformations on the user embedding vector u_vec, the item embedding vector i_vec, and their rating information. These transformations are realized by different weight matrices, denoted WQ, WK, and WV. To generate the query vector Qu for user u, we multiply the user embedding vector u by WQ, i.e., Qu=WQ×u. Similarly, the key vector of item i is Ki=WK×i, and its value vector is Vi=WV×i. The item matrix V contributes to this computation by providing the item embeddings: each row of the item matrix represents an item's embedding in the low-dimensional space. When computing the query, key, and value vectors for a particular user-item pair, the corresponding item embedding is retrieved from the item matrix and linearly transformed using the weight matrices WK and WV, while the user embedding is transformed using WQ.
Next, we compute the attention weights of user u and item i by taking the dot product of Qu and Ki and then applying the scaling factor and the softmax function. Subsequently, we sum the attention-weighted value vectors of all resources to obtain a weighted representation of user u. Finally, this weighted representation is linearly transformed once more to generate the final output vector, which is used as input for the next layer, as follows:
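$$a_{ui} = \mathrm{softmax}_i\left(\frac{Q_u \cdot K_i}{\sqrt{d_k}}\right), \qquad z_u = \sum_i a_{ui} V_i, \qquad o_u = W_O z_u$$
where a_ui is the attention weight of user u on item i, z_u is the weighted representation of user u, o_u is the final output vector, dk is the dimension of the key vectors, and WO is the weight matrix of the final linear transformation. The specific principle is shown in Figure 3.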
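The attention computation described in this subsection can be sketched in plain Python as follows. The sketch assumes standard scaled dot-product attention with one user query attending over several item embeddings, and it concatenates the head outputs before the final linear transformation, as is usual for multi-head attention. The function names, the identity projection matrices, and the toy vectors are illustrative assumptions rather than the paper's trained parameters.

```python
import math


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(u_vec, item_vecs, WQ, WK, WV):
    """Scaled dot-product attention of one user query over item keys/values."""
    q = matvec(WQ, u_vec)
    keys = [matvec(WK, i_vec) for i_vec in item_vecs]
    values = [matvec(WV, i_vec) for i_vec in item_vecs]
    d_k = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    out = [sum(w * v[c] for w, v in zip(weights, values))
           for c in range(len(values[0]))]
    return out, weights


def multi_head_attention(u_vec, item_vecs, heads, WO):
    """Run every head, concatenate the outputs, and apply WO."""
    concat = []
    for WQ, WK, WV in heads:
        out, _ = attention_head(u_vec, item_vecs, WQ, WK, WV)
        concat.extend(out)
    return matvec(WO, concat)


# Toy setup: two heads with identity projections; WO averages the two heads.
I2 = [[1.0, 0.0], [0.0, 1.0]]
heads = [(I2, I2, I2), (I2, I2, I2)]
WO = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
u_vec = [1.0, 0.0]
item_vecs = [[1.0, 0.0], [0.0, 1.0]]

_, weights = attention_head(u_vec, item_vecs, I2, I2, I2)
output = multi_head_attention(u_vec, item_vecs, heads, WO)
print(weights, output)
```

In this toy setup the first item is aligned with the user vector, so it receives the larger attention weight and dominates the aggregated output.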
3.3. Output and prediction layer
To complement the attention mechanism with stronger non-linear fitting ability, the output layer combines a GMF branch and an MLP branch. In the GMF branch, we take the user and resource vectors pGu and qGi from the embedding matrices PG and QG and multiply them element-wise. This element-wise multiplication is the fundamental step of GMF; it highlights the specific relationships between the user characteristics and item features encoded in the embeddings. We then calculate the scores through a linear layer and the sigmoid activation function, resulting in the vector ϕGMFui. In the MLP branch, we take the user and resource vectors pMu and qMi from the embedding matrices PM and QM and concatenate them, and then calculate the scores through fully connected layers with ReLU activation functions, resulting in the vector ϕMLPui. We then concatenate these two vectors and pass them to a neural layer that maps them to a one-dimensional space to obtain the final predicted score, as follows:
where α is a hyper-parameter that controls the contribution ratio of the GMF and MLP parts, and h is the weight vector of the final neural layer. This combination allows the model to leverage the advantages of both GMF's simple interaction capture and MLP's complex pattern learning ability.
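One plausible reading of this fusion step, following the weighted concatenation used in neural collaborative filtering, is that the GMF vector is scaled by α, the MLP vector by 1 − α, and the concatenation is projected by h and squashed by a sigmoid. The paper's exact equation is not reproduced here, so this form, the function names, and the toy numbers are assumptions.

```python
import math


def gmf_interaction(p_u, q_i):
    """Element-wise product of the user and resource embeddings."""
    return [a * b for a, b in zip(p_u, q_i)]


def fuse_and_predict(phi_gmf, phi_mlp, h, alpha):
    """Assumed fusion: sigmoid(h . [alpha * phi_gmf ; (1 - alpha) * phi_mlp])."""
    fused = [alpha * x for x in phi_gmf] + [(1.0 - alpha) * x for x in phi_mlp]
    logit = sum(w * x for w, x in zip(h, fused))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical branch outputs and output-layer weights.
phi_gmf = gmf_interaction([1.0, 2.0], [0.5, 0.5])  # [0.5, 1.0]
phi_mlp = [0.2, 0.4]
h = [1.0, 1.0, 1.0, 1.0]

y_balanced = fuse_and_predict(phi_gmf, phi_mlp, h, alpha=0.5)
y_gmf_only = fuse_and_predict(phi_gmf, phi_mlp, h, alpha=1.0)
print(round(y_balanced, 4), round(y_gmf_only, 4))
```

Setting α to 1 removes the MLP contribution entirely and α to 0 removes the GMF contribution, which makes the hyper-parameter a convenient switch when comparing the two branches.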
The model is a multi-layer fully connected neural network. Hidden layers use the ReLU activation function to boost non-linear expressiveness and learn complex feature relationships. The output layer applies the sigmoid function to map results to probability scores for predictions. The binary cross-entropy loss function (BCE loss) measures the gap between predicted and actual values. A gradient descent algorithm iteratively updates model parameters to enhance prediction accuracy and generalization. This iterative process enables the model to adapt to various datasets and make more accurate predictions.
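For reference, the binary cross-entropy loss over a batch of implicit-feedback labels can be written as the short function below. The clipping constant `eps` and the toy labels are assumptions added to keep the example numerically safe and self-contained.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]  # 1 = observed interaction, 0 = sampled negative
confident = bce_loss(labels, [0.9, 0.1, 0.8, 0.2])
uncertain = bce_loss(labels, [0.5, 0.5, 0.5, 0.5])
print(round(confident, 4), round(uncertain, 4))
```

Predictions close to the labels give a lower loss than uninformative predictions of 0.5, whose loss equals ln 2. With a sigmoid output, the gradient of this loss with respect to the pre-activation logit is simply the prediction minus the label, which is what gradient descent uses to update the parameters.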
3.4. Improved model pseudo-code
4. Experiments
4.1. Dataset
To evaluate the effectiveness of the proposed method, we conduct experiments on two real-world datasets: MOOC [22] and EdX (https://www.kaggle.com/datasets/edx/course-study). The MOOC Cube dataset from XuetangX has 706 MOOC courses, 38,181 videos, 114,563 interactions, and 199,199 users. The EdX dataset is based on 290 edX online courses from Harvard and MIT, with 250,000 certifications, 4.5 million participants, and 28 million hours of data. Table 2 shows detailed dataset statistics. To compare methods fairly, we adopt the same data processing as in [23]. Each instance in the training or test set is a sequence of historical courses paired with a target course. In training, the last course in a sequence is the target, and the others are historical. Each positive example is paired with 1000 randomly sampled negative ones. In testing, each course in a user's test set is treated as the target, and the courses of the same user in the training set are the history. As in [24], each positive instance is paired with 100 randomly sampled negative ones to form the test data.
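The negative sampling step can be sketched as below. The sketch assumes that negatives are drawn uniformly without replacement from the courses the user has not interacted with; the function name `sample_negatives`, the catalogue size, and the fixed random seed are hypothetical choices for the example.

```python
import random


def sample_negatives(interacted, num_items, num_negatives, rng):
    """Draw distinct item ids that the user has never interacted with."""
    candidates = [i for i in range(num_items) if i not in interacted]
    if num_negatives > len(candidates):
        raise ValueError("not enough non-interacted items to sample from")
    return rng.sample(candidates, num_negatives)


rng = random.Random(0)   # fixed seed so the example is repeatable
interacted = {1, 3, 5}   # hypothetical courses the user already took
target = 5               # positive (held-out) course
negatives = sample_negatives(interacted, num_items=200,
                             num_negatives=100, rng=rng)

# One test instance: the positive target ranked against 100 negatives.
candidates = [target] + negatives
print(len(candidates))  # 101
```

Evaluating against a sampled candidate list keeps ranking metrics cheap to compute, at the cost of making absolute metric values depend on the number of negatives.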
4.2. Experimental setup
This study completed comparative experiments of the improved NCF algorithm and baseline algorithms in an environment based on Python 3.8, PyTorch 1.11.0, CUDA 11.3, and on an RTX 3090. The specific parameter settings for both datasets are shown in the following figure, where "-" indicates the same parameters as the previous column.
4.3. Evaluation metrics
The following metrics, which are widely used in related work, are used in this paper to evaluate the performance of all models.
The hit rate (HR) is mainly used to measure whether the recommended list contains resources that users are really interested in. Specifically, the hit rate of the first K resources is a recall-based metric, and HR@K, which represents the percentage of resources successfully recommended to users, is defined as follows:
where GT is the ground-truth set of all users in the test set, the hit count is the number of resources in the top-K recommendation list of the u-th user that belong to the test set, and | | denotes the size of a set.
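A minimal sketch of HR@K is given below, assuming the common leave-one-out setting in which each test user has exactly one held-out target, so that the metric is the fraction of users whose target appears in their top-K list. The function name `hit_rate_at_k` and the toy rankings are hypothetical.

```python
def hit_rate_at_k(ranked_lists, ground_truth, k):
    """Fraction of test users whose held-out target is in their top-k list."""
    hits = sum(
        1 for user, target in ground_truth.items()
        if target in ranked_lists[user][:k]
    )
    return hits / len(ground_truth)


# Hypothetical ranked recommendations and held-out targets.
ranked_lists = {"u1": [3, 1, 2], "u2": [5, 4, 6]}
ground_truth = {"u1": 1, "u2": 6}

print(hit_rate_at_k(ranked_lists, ground_truth, k=2))  # 0.5
print(hit_rate_at_k(ranked_lists, ground_truth, k=3))  # 1.0
```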
Normalized discounted cumulative gain (NDCG) evaluates ranking performance by considering the position of the correct resource. Specifically, NDCG@K for the top K resources is an accuracy-based metric that considers the positions predicted in the recommendation lists of different users. The specific definitions are as follows:
IDCGu@K denotes the ideal discounted cumulative gain achieved by the best possible top-K recommendation list for the u-th user, and reliu is the graded relevance of the i-th recommendation result to the u-th user.
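The sketch below computes NDCG@K for a single user. It assumes the widely used exponential-gain variant, in which the item at position i (counting from 1) contributes (2^rel − 1) / log2(i + 1), and the ideal DCG is obtained by sorting the same relevance values in descending order; other DCG variants exist, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k positions."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, hence + 2
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of the recommended items in ranked order (1 = the held-out target).
print(ndcg_at_k([1, 0, 0], k=3))             # 1.0, target ranked first
print(round(ndcg_at_k([0, 1, 0], k=3), 4))   # 0.6309, target ranked second
```

With a single relevant item per user, NDCG@K reduces to 1 / log2(rank + 1) when the target is within the top K and 0 otherwise.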
Precision, a metric used to evaluate the performance of a recommender system, focuses on the number of correct recommendations in the recommendation results. Specifically, for the first K recommendation results, Precision@K is the ratio of the number of correctly recommended items to the total number of recommendations, calculated as follows:
where U is the total number of users, Relu denotes the set of relevant resources for the user u, and Recu (K) denotes the set of the top-k items recommended for the user.
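Following the symbols defined above, Precision@K can be sketched as the average over users of the overlap between the relevant set and the top-K recommended set, divided by K. The function name `precision_at_k` and the toy data are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Mean over users of |Rel_u intersect Rec_u(K)| / K."""
    total = 0.0
    for user, ranked in recommended.items():
        hits = len(set(ranked[:k]) & relevant.get(user, set()))
        total += hits / k
    return total / len(recommended)


# Hypothetical ranked recommendations and relevant sets for two users.
recommended = {"u1": [1, 2, 3, 4], "u2": [5, 6, 7, 8]}
relevant = {"u1": {2, 4}, "u2": {9}}

print(precision_at_k(recommended, relevant, k=2))  # 0.25
print(precision_at_k(recommended, relevant, k=4))  # 0.25
```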
5. Results and analysis
5.1. Comparison experiment
To verify the effectiveness of the proposed model, the NCF-MAH model was compared with other baseline models, including the classic matrix factorization model (GMF), supervised learning model (MLP), and neural network-based collaborative filtering model (NeuMF). A brief overview of these models is as follows:
MLP [25]: It uses MLP on a pair of user and course embeddings to generate recommendation probabilities.
GMF [26]: It directly models the linear relationship between users and resources through the dot product of their low-dimensional embedding vectors, achieving personalized recommendations.
NeuMF [7]: It utilizes neural networks to merge user and resource embedding representations, learning their non-linear interactions for user recommendations.
FM [27]: The interactions between features are modeled by mapping each feature to a low-dimensional vector space and computing the inner product between these vectors, thus effectively capturing the nonlinear relationships implicit in the data.
Wide&Deep [28]: Combining the memory capabilities of generalized linear models and the learning capabilities of deep neural networks, it can simultaneously capture known feature interactions and discover new complex patterns to provide more accurate personalized recommendations.
Comparative experiments between the MLP, GMF, NeuMF models, and the NCF-MAH model were conducted using the MOOC and EdX datasets. The top-k values were set at 10 and 20, separately, and the model experimental curves are shown in the following figures, with specific values presented in the tables below.
The experimental results show that the NCF-MAH model achieves higher prediction accuracy than the MLP, GMF, and traditional NCF models on two datasets of different sizes. As can be seen from Figures 4–7, on the MOOC dataset containing more than 600,000 pieces of data, the improved model improves the HR@10, HR@20, NDCG@10, and NDCG@20 metrics by 13%, 9.8%, 10.3%, and 10.2%, respectively. As can be seen from Figures 8–11, on the small EdX dataset with about 300 data entries, the corresponding improvements are 15.7%, 12.8%, 7.5%, and 10.9%, which are larger than on MOOC for every metric except NDCG@10.
This is mainly due to the addition of the multi-head attention mechanism in the NCF model, which allows the model to simultaneously focus on multiple different information dimensions, with each dimension handled by a separate attention head. This way, different heads can learn various aspects of the user-resource interactions, enriching the model's understanding of the data. For instance, one head might focus on the user's interest in learning course categories, while another head could concentrate on the duration of the user's engagement with learning courses. Through this approach, the model gains a more comprehensive understanding of user behavior and preferences.
During this process, each head independently calculates the attention scores between user embedding vectors and item embedding vectors, and then these scores are summed up. This total score is weighted and summed with the vectors calculated using the GMF and MLP methods. This structure enables the model to better handle the complex relationships between different users and resources while maintaining personalized recommendations.
Additionally, we use the BCE loss function as the model's loss function to optimize its performance. The BCE loss effectively measures the difference between the model's predictions and the true values, allowing the model to update parameters more efficiently during training, thus improving prediction accuracy.
These results clearly indicate that the proposed NCF-MAH model demonstrates superior performance compared to traditional baseline models when handling datasets of different scales.
From Tables 4 and 5, it can be seen more directly that, under the same experimental parameters, the NCF-MAH model exhibits superior performance metrics on both datasets. This is because the model takes users' implicit interaction data into account and, through the multi-head attention mechanism combined with BCE as the loss function, better captures user behavior and improves recommendation accuracy.
5.2. Ablation experiments
To test the contribution of each component of the NCF-MAH model to recommendation performance, we conducted the following ablation experiments on both datasets.
To investigate how different layer architectures of the multilayer perceptron affect the performance of the model that fuses neural collaborative filtering with the multi-head attention mechanism, we carried out detailed experiments on the two datasets. We set up three MLP layer architectures, namely (128, 64, 32), (256, 128, 64, 32), and (64, 32, 16); the corresponding models are denoted NCF-MAH_128_64_32, NCF-MAH_256_128_64_32, and NCF-MAH_64_32_16. Their performance is evaluated by the hit rate and normalized discounted cumulative gain in the Top-10 and Top-20 recommendation scenarios, and the results are shown in Tables 6 and 7.
The experimental results clearly show that the differences in the MLP layer architectures have a significant impact on the performance of the NCF-MAH model on both datasets. Among them, the NCF-MAH_64_32_16 model with the (64, 32, 16) architecture shows obvious advantages, which fully highlights the superiority of this architecture.
On the MOOC dataset, the NCF-MAH_64_32_16 model performs well, with significant performance improvement. Compared with the NCF-MAH_128_64_32 and NCF-MAH_256_128_64_32 models, the NCF-MAH_64_32_16 model achieves considerable improvement in the recommendation hit rate measured by the HR@10 and HR@20 metrics, as well as in the quality of the recommendations measured by the NDCG@10 and NDCG@20 metrics. This indicates that the architecture can capture the interaction information between users and items more accurately and effectively improve the accuracy of recommendations. Similarly, the NCF-MAH_64_32_16 model also performs well on the EdX dataset. Compared with the NCF-MAH_128_64_32 and NCF-MAH_256_128_64_32 models, it shows a significant improvement in the hit rate and recommendation quality metrics, which further supports the adaptability and effectiveness of the (64, 32, 16) architecture on different datasets.
Combining the experimental results of the two datasets, the (64, 32, 16) architecture excels in balancing model complexity and performance, and is able to efficiently capture the complex interaction information between users and items, thus significantly improving the accuracy and quality of recommendations.
In addition, three variant models of NCF-MAH are described, namely PureMLP, PureMF, and NCF, and these three variant models are compared with the NCF-MAH model. The specific results are shown in Tables 8 and 9.
1) Comparing the PureMLP algorithm and the NCF-MAH algorithm on the Precision@10 and Precision@20 metrics, NCF-MAH, which adds matrix decomposition and the multi-head attention mechanism, improves on PureMLP, which uses only a multilayer perceptron, by 28% and 15% on the EdX dataset and by 18% and 11% on the MOOC dataset, respectively. Thus, the personalized recommendation algorithm proposed in this paper represents the user-resource interaction in a way that allows for a more comprehensive learning of the user's behavior.
2) Comparing the PureMF algorithm and the NCF-MAH algorithm on the Precision@10 and Precision@20 metrics, NCF-MAH, which adds a multilayer perceptron and the multi-head attention mechanism, improves on PureMF, which uses only matrix decomposition, by 28% and 15% on the EdX dataset and by 30% and 70% on the MOOC dataset, respectively. Therefore, the personalized recommendation algorithm proposed in this paper can better improve the recommendation accuracy by capturing higher-order features between users and resources.
3) Comparing the NCF algorithm and the NCF-MAH algorithm on the Precision@10 and Precision@20 metrics, adding the multi-head attention mechanism improves on the traditional NCF recommendation algorithm by 19% and 7% on the EdX dataset and by 20% and 22% on the MOOC dataset, respectively. Therefore, the multi-head attention mechanism introduced in this paper significantly enhances the ability to capture the complex relationship between users and resources and thus achieves a significant improvement in recommendation accuracy.
It can also be seen from Tables 8 and 9 that the NCF-MAH algorithm, which jointly considers generalized matrix decomposition, a multilayer perceptron, and a multi-head attention mechanism, achieves higher Precision@10 and Precision@20 than the three variant algorithms. The ablation experiments show that adding the attention mechanism and capturing complex relationships from the implicit feedback between users and resources yields better results than a model that considers only the explicit interaction between users and resources. Therefore, training these components jointly recommends resources of interest to the user more effectively.
6. Conclusions
In this paper, we propose NCF-MAH, a personalized recommendation algorithm model that incorporates implicit feedback, combines the ideas of matrix decomposition and multilayer perceptrons, and introduces the multi-head attention mechanism to enhance the model's learning and expressive ability. The model is trained with the binary cross-entropy loss function, which adjusts the parameters of the connection layers. By analyzing the implicit interaction data between users and resources and using matrix decomposition, the model maps the high-dimensional feature vectors of users and items to a low-dimensional embedding space. Combined with the multi-attention mechanism, the model can effectively capture complex feature relationships, and the negative sample selection strategy helps keep training effective and stable. Ultimately, the model integrates matrix decomposition and multilayer perceptron methods to improve the prediction accuracy, thus achieving more accurate personalized recommendation. The core advantage of the model is that it can learn the implicit cross-features of users, capture the interaction between low-order and high-order features, further extract user behavioral preferences, improve the prediction performance and generalization ability of the model, and effectively alleviate the data sparsity problem. Comparison experiments on the MOOC dataset and the EdX dataset show that the model outperforms the comparison models on both the HR and NDCG evaluation metrics and achieves good recommendation results, supporting the effectiveness of the proposed model.
In the future, we plan to further optimize the overall structure of the model, add a convolutional neural network for contextual feature extraction and learning, and utilize multiple types of data, such as audio and images, to perform multimodal feature fusion and thereby improve the recommendation performance of the model.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The work was supported by the National Natural Science Foundation of China: 72174079, Lianyungang sixth "521" project: LYG06521202351, Lianyungang Science and Technology Program: CG2325. The material in this paper was not presented at any conference.
Conflict of interest
The authors declare there are no conflicts of interest.