Working with Graphs: Pre-training and Application
Latest findings in pre-training graphs and using them for link recommendation
Introduction
A graph, in short, is a description of items linked by relations, where the items of a graph are called nodes (or vertices) and their relations are called edges (or links). Examples of graphs can include social networks (e.g. Instagram) or knowledge graphs (e.g. Wikipedia).
On Instagram, for instance, a graph can represent users as nodes and their relationships (who follows whom) as edges. This graph can then be used to build a friend recommendation model that suggests accounts a user might want to follow, based on the connections they already have. In the case of Wikipedia, a knowledge graph can use nodes to represent articles and edges to represent the connections between these articles.
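To make the follower-graph idea concrete, here is a minimal sketch in Python. The user names, the `follows` dictionary, and the `recommend` function are all made up for illustration, and the "accounts followed by the people you follow" heuristic is only an assumed stand-in for a real recommendation model.

```python
from collections import Counter

# Hypothetical toy data: each user (node) maps to the set of users they follow (edges).
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol", "dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": {"alice"},
}


def recommend(user, graph, k=2):
    """Rank accounts that are followed by the people `user` already follows."""
    already = graph[user] | {user}
    counts = Counter(
        candidate
        for friend in graph[user]
        for candidate in graph.get(friend, set())
        if candidate not in already
    )
    # Sort by how often a candidate appears (descending), then by name
    # so that the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


print(recommend("alice", follows))  # ['dave', 'erin']
```

Here "dave" ranks first because two of the accounts alice follows also follow him, while "erin" is followed by only one of them.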
There is growing research interest in applying machine learning techniques to graphs to solve various kinds of problems, such as friend recommendation on a social network. In this post, we introduce two recent findings: one on graph pre-training and one on link recommendation.
WAS
In recent years, graph pre-training has become very successful at learning how to represent graph data. This approach learns from large amounts of unlabeled data and uses that knowledge to help with specific tasks that come later. In the case of a social network like Instagram, pre-training lets a model learn from vast amounts of unlabeled user interactions in order to improve specific tasks such as friend recommendation or post recommendation.
However, existing methods have focused only on importance weighing (how to weigh the importance of the selected tasks) and have overlooked task selecting (how to select an optimal combination of tasks from a given task pool based on task compatibility). Importance weighing involves evaluating how important different tasks are. In the Instagram scenario, one can weigh the “friend recommendation” task as more important than the “post recommendation” task so that the trained model becomes better suited to friend recommendation.
The compatibility issue in task selecting refers to possible conflicts between different tasks, which cannot be resolved simply by weighing all tasks. For example, imagine two tasks:
- Task A: Identify influential users in the network.
- Task B: Recommend posts with high potential for virality.
Task A might benefit from analyzing the entire network structure and identifying users with many connections and high engagement. Task B, however, might focus on features of the posts themselves (hashtags, visuals) and on user interactions with similar posts. While some overlap might exist, these tasks rely on partially independent data within the graph. Training for both simultaneously might not make full use of each task’s specific data, and existing methods may stop gaining performance as the task pool grows larger. In this case, we need to select the best set of tasks, taking their compatibility with each other into account.
To solve this issue, a novel framework called Weigh And Select (WAS) was proposed for task selecting and importance weighing. The two collaborative processes are combined in decoupled siamese networks (a siamese network applies the same weights to two different input vectors to produce comparable output vectors), where:
- An optimal combination of tasks is selected for each instance, using a sampling distribution derived from task compatibility.
- Task weights are then calculated for the selected tasks according to their importance.
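The two steps above can be sketched roughly as follows. This is not the authors' implementation: the independent per-task sampling, the softmax weighting, and every name here (`select_tasks`, `weigh_tasks`, `combine`, and the example scores) are assumptions chosen only to illustrate "select first, then weigh the selected tasks".

```python
import math
import random


def select_tasks(select_probs, rng):
    """Sample a task subset: task i is kept with probability select_probs[i]."""
    chosen = [i for i, p in enumerate(select_probs) if rng.random() < p]
    if not chosen:
        # Assumed fallback: keep the most probable task so the subset is never empty.
        chosen = [max(range(len(select_probs)), key=select_probs.__getitem__)]
    return chosen


def weigh_tasks(importance_scores, chosen):
    """Softmax over the importance scores of the selected tasks only."""
    peak = max(importance_scores[i] for i in chosen)
    exps = {i: math.exp(importance_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


def combine(task_outputs, weights):
    """Weighted sum of the per-task embedding vectors of one instance."""
    dim = len(task_outputs[0])
    return [
        sum(w * task_outputs[i][d] for i, w in weights.items())
        for d in range(dim)
    ]


# Hypothetical numbers for one node and a pool of three pre-training tasks.
rng = random.Random(0)
chosen = select_tasks([0.9, 0.1, 0.8], rng)
weights = weigh_tasks([1.0, 2.0, 0.5], chosen)
embedding = combine([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], weights)
print(chosen, weights, embedding)
```

Because the weights are normalised over the selected tasks only, a task that was not selected contributes nothing, however high its importance score. That is the sense in which selecting and weighing are kept as separate decisions in this sketch.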
Extensive experiments on 16 datasets show that WAS can achieve comparable performance to other leading counterparts for both node-level and graph-level tasks, opening up exciting possibilities for future research.
AIS
Online social networks (OSNs) are becoming an increasingly powerful medium for disseminating useful content. An important scientific problem related to OSNs is Influence Maximisation (IM), which aims to select a set of nodes in a social network as the sources of influence spread so as to maximise the expected number of influenced nodes. For instance, one can imagine selecting a popular Instagram influencer as a source to post about a new product, where the goal is to reach as many followers as possible; these followers are the influenced nodes in this context. One way to boost the influence spread in an OSN is to increase the connectivity among users with a recommendation system. For example, OSN platforms like Twitter use “people recommendations” to increase connectivity.
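The "expected number of influenced nodes" can be estimated by simulation. The sketch below assumes the independent cascade model, a common diffusion model in the IM literature that this post does not itself name, with one uniform activation probability `prob`. The toy graph and the function names are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """One independent-cascade run; returns the set of influenced nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, ()):
                # Each newly active node gets one chance to activate each neighbour.
                if neighbour not in active and rng.random() < prob:
                    active.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, prob, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, prob, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy network: an edge u -> v means u can influence v.
toy_graph = {
    "influencer": ["f1", "f2"],
    "f1": ["f3"],
    "f2": [],
    "f3": [],
}
print(expected_spread(toy_graph, ["influencer"], prob=0.5))
```

Influence maximisation then asks which seed set of a given size makes this estimate as large as possible.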
In contrast to previous work on influence maximisation, where the network topology remains unchanged, a recently proposed algorithm called AIS (Augmenting the Influence of Seeds) focuses on recommending links that can augment the social influence of a target group of users. The method is inspired by the Reverse Influence Sampling (RIS) method and uses two new techniques: an efficient estimator (which estimates the influence of a node) and an accelerated sampling approach (which samples sets of nodes and tries to maximise the influence on them), both tailored to this Influence Maximisation with Augmentation (IMA) problem. According to the authors, these enhancements make the method both more effective and faster than existing strategies.
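For readers unfamiliar with RIS, the sketch below shows the basic idea that AIS builds on, not AIS itself: sample a random node, collect the nodes that could have influenced it (a reverse-reachable set), and estimate a seed set's spread from how often it hits such sets. It assumes the independent cascade model with a uniform probability `prob`. All names and the toy graph are hypothetical, and the paper's own estimator and accelerated sampling are not reproduced here.

```python
import random


def reverse_graph(graph):
    """Map each node to the nodes that have an edge pointing at it."""
    reverse = {node: [] for node in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


def sample_rr_set(reverse, nodes, prob, rng):
    """Reverse-reachable set of a uniformly random node."""
    root = rng.choice(nodes)
    reached = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for source in reverse.get(node, ()):
            # Keep each incoming edge with probability `prob`.
            if source not in reached and rng.random() < prob:
                reached.add(source)
                frontier.append(source)
    return reached


def estimate_spread(graph, seeds, prob, samples=2000, seed=0):
    """Estimate spread as n times the fraction of RR sets hit by the seeds."""
    rng = random.Random(seed)
    reverse = reverse_graph(graph)
    nodes = sorted(reverse)
    seed_set = set(seeds)
    hits = sum(
        1 for _ in range(samples)
        if seed_set & sample_rr_set(reverse, nodes, prob, rng)
    )
    return len(nodes) * hits / samples


# Hypothetical example: compare the estimate before and after adding one link.
before = {"a": ["b"], "b": [], "c": []}
after = {"a": ["b", "c"], "b": [], "c": []}  # recommended link a -> c
print(estimate_spread(before, ["a"], prob=1.0))
print(estimate_spread(after, ["a"], prob=1.0))
```

Re-running the estimator for every candidate link, as done here, is the naive way to judge which link augments the seeds' influence most.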
Extensive experiments on various datasets show that this algorithm outperforms the baselines in terms of approximation ratio and running time, and the authors present it as the first method that can be applied to large datasets without compromising its theoretical guarantees.
Conclusion
The use of graphs in data science has advanced significantly, opening up several research directions. In this post, we discussed two recent findings: one on graph pre-training and one on link recommendation. We also pointed to some promising directions for future exploration of these approaches. With continued investigation and refinement, we believe graphs will keep opening up new opportunities.
References
- Fan, T., Wu, L., Huang, Y., Lin, H., Tan, C., Gao, Z., & Li, S. Z. (2024). Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2403.01400
- Chen, X., Song, Y., & Tang, J. (2024). Link Recommendation to Augment Influence Diffusion with Provable Guarantees (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2402.19189