Introduction

A graph, in short, is a description of items linked by relations: the items are called nodes (or vertices) and the relations between them are called edges (or links). Examples of graphs include social networks (e.g. Instagram) and knowledge graphs (e.g. Wikipedia). In recent years, there has been a rising trend of research applying machine learning techniques to graphs to solve a wide variety of problems.

In this post, we introduce two recent findings in which machine learning techniques are applied to graphs.

Uncertainty-aware Few-Shot Knowledge Graph Completion (UFKGC)

Knowledge Graphs represent pieces of information as entities and the relations between them, usually organised in the form of triples: a head entity, a relation, and a tail entity. For example, a Knowledge Graph might record that “Chicago Bulls” (head entity) is “located in” (relation) “America” (tail entity).

In real-world scenarios, relations in existing Knowledge Graphs generally suffer from the so-called long-tail problem: a large number of relations occur only occasionally in a Knowledge Graph, i.e. only a limited number of (head, tail) entity pairs exist for these relations. This makes traditional Knowledge Graph completion methods impractical, as the training samples are insufficient.

Most Knowledge Graphs contain many more rare relations than common relations.

  • A common relation, like “located in,” might connect countless cities and countries.
  • Rare relations, on the other hand, might describe very specific things, like “known for the invention of the stapler” (person, stapler). These relations only connect a handful of entities.
An example of the few-shot Knowledge Graph completion task. (a) The task is to predict tail entities on a Knowledge Graph. (b) The task utilises only K triples for the specific relation to predict others. (Li et al., 2024) Article link: https://arxiv.org/abs/2403.04521

To address this, the few-shot Knowledge Graph completion (FKGC) task has gained wide attention. Like standard completion, FKGC aims to predict the missing tail entity for a given head entity and relation, but with a different constraint: it may only use a few examples (hence “few-shot”) of the specific relation it is dealing with. These few examples are provided as a set of triples (head, relation, tail) that share the relation being predicted.
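To make the task concrete, here is a minimal sketch of few-shot tail prediction. The embeddings are random toy values, and the TransE-style scoring (a relation as a translation vector, r ≈ t − h) is an illustrative assumption, not the method used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy entity embeddings; in practice these would be learned.
entities = {name: rng.normal(size=dim) for name in
            ["chicago_bulls", "america", "lakers", "canada"]}

def infer_relation(support_pairs):
    """Estimate a relation vector from K support (head, tail) pairs,
    TransE-style: r is the average of (tail - head)."""
    return np.mean([entities[t] - entities[h] for h, t in support_pairs], axis=0)

def score(head, rel_vec, tail):
    """Higher score = more plausible triple (negative TransE distance)."""
    return -np.linalg.norm(entities[head] + rel_vec - entities[tail])

# One support triple (1-shot) for a hypothetical "located_in" relation.
r = infer_relation([("chicago_bulls", "america")])

# Rank candidate tails for a new head entity.
candidates = ["america", "canada"]
ranked = sorted(candidates, key=lambda t: score("lakers", r, t), reverse=True)
print(ranked)
```

The key point is that the relation vector is estimated from only K support triples; everything downstream of that estimate inherits the uncertainty of such a small sample, which is exactly what UFKGC sets out to model.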

However, uncertainty usually arises when the model is provided with a limited number of samples. To further enhance the performance of FKGC, a framework called Uncertainty-aware Few-Shot Knowledge Graph Completion (UFKGC) was proposed recently to model uncertainty for a better understanding of the limited data and expand the data under the same distribution.

The framework of UFKGC. (Li et al., 2024) Article link: https://arxiv.org/abs/2403.04521

The UFKGC framework contains three modules: Uncertainty Representation, UR-GNN (uncertainty-aware relational graph neural network), and Uncertainty Optimisation.

Uncertainty Representation — Estimates the uncertainty of entities/relations in the form of a Gaussian distribution.

  • Consider a knowledge graph statement like “Albert Einstein wrote the Theory of Relativity.” There might be some uncertainty about other works Einstein wrote (less certain) compared to the well-established fact that he wrote the Theory of Relativity (more certain).
  • This module aims to quantify this uncertainty by representing it as a Gaussian Distribution, a mathematical way to describe the probability of different values occurring.
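A common way to represent such uncertainty is to give each embedding a mean vector and a per-dimension variance, and to sample from that Gaussian with the reparameterisation trick. The sketch below is a generic illustration of this idea under assumed toy values, not the paper's exact parameterisation:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4

# An entity embedding as a diagonal Gaussian: a mean vector plus a
# per-dimension variance capturing how uncertain the representation is.
mu = rng.normal(size=dim)                    # mean embedding
var = np.array([0.1, 0.5, 1.0, 2.0])         # assumed per-dimension variances
log_var = np.log(var)

def sample_embedding(mu, log_var, rng):
    """Reparameterisation trick: sample = mu + sigma * epsilon, which keeps
    the sample differentiable with respect to mu and log_var."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Drawing many samples recovers the mean; spread reflects the variance.
samples = np.stack([sample_embedding(mu, log_var, rng) for _ in range(1000)])
print(samples.mean(axis=0))
```

Dimensions with larger variance (here the last one) produce more scattered samples, which is how "less certain" facts get encoded.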

UR-GNN — Achieves better integration using the learned Gaussian distributions.

  • This module builds upon the concept of a Graph Neural Network (GNN), a type of AI model adept at learning from interconnected data like a knowledge graph.
  • The “uncertainty-aware” part comes in here. The UR-GNN incorporates the uncertainty information (Gaussian Distributions) estimated in the previous module.
  • By considering this uncertainty, the UR-GNN can potentially achieve a better understanding of the relationships between entities in the Knowledge Graph, even with limited data.
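One simple way an aggregation step can "consider uncertainty" is to down-weight messages from high-variance neighbours. The precision-weighted average below is a simplified illustration of that principle, not the actual UR-GNN architecture:

```python
import numpy as np

# Toy neighbour messages: each row is a Gaussian (mean, variance per dim).
means = np.array([[1.0, 0.0], [0.8, 0.2], [5.0, 5.0]])
vars_ = np.array([[0.1, 0.1], [0.2, 0.2], [10.0, 10.0]])  # last one is unreliable

def precision_weighted_aggregate(means, vars_):
    """Aggregate neighbour messages with weights proportional to
    1/variance (precision), so uncertain messages contribute less."""
    w = 1.0 / vars_
    agg_mean = (w * means).sum(axis=0) / w.sum(axis=0)
    agg_var = 1.0 / w.sum(axis=0)   # fused variance shrinks as evidence adds up
    return agg_mean, agg_var

m, v = precision_weighted_aggregate(means, vars_)
print(m)  # stays close to the two reliable messages despite the outlier
```

Note how the third message, despite its extreme mean, barely moves the aggregate because its variance makes its weight tiny.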

Uncertainty Optimisation — Uses multiple samples from the learned distributions to evaluate the completion prediction.

  • This module leverages the uncertainty information to improve the final prediction of the missing entity.
  • It utilises a technique called “multiple sampling.” The UR-GNN model, informed by the uncertainty, proposes multiple possible tail entities (the missing part) for a given head entity and relation, considering the probability distribution learned earlier.
  • By evaluating these multiple proposals, the Uncertainty Optimisation module aims to identify the most likely tail entity to complete the knowledge graph statement accurately.
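The multiple-sampling idea can be sketched as Monte Carlo scoring: draw several samples from the query's and each candidate's distributions, score every draw, and average. The distributions and the distance-based score below are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8

# Assumed Gaussian embeddings (mean, std) for the query (head + relation)
# and for two candidate tail entities.
query_mu, query_std = rng.normal(size=dim), 0.3 * np.ones(dim)
tails = {
    "america": (query_mu + 0.1 * rng.normal(size=dim), 0.3 * np.ones(dim)),
    "canada": (rng.normal(size=dim), 0.3 * np.ones(dim)),
}

def mc_score(query_mu, query_std, tail_mu, tail_std, n_samples=100):
    """Average a distance-based score over samples drawn from both
    distributions, so uncertainty enters the final prediction."""
    scores = []
    for _ in range(n_samples):
        q = query_mu + query_std * rng.normal(size=dim)
        t = tail_mu + tail_std * rng.normal(size=dim)
        scores.append(-np.linalg.norm(q - t))
    return np.mean(scores)

# Pick the candidate whose distribution best matches the query's.
best = max(tails, key=lambda name: mc_score(query_mu, query_std, *tails[name]))
print(best)
```

Averaging over many samples makes the final ranking robust to any single noisy draw, which is the point of evaluating multiple proposals.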

Experimental results show that UFKGC achieves state-of-the-art results in terms of accuracy on two FKGC benchmark datasets, opening exciting possibilities for future research.

Bilevel Optimisation

Graph machine learning covers a wide range of modelling tasks involving graph-structured data, where cross-instance/node dependencies are reflected by edges. Cross-instance or node dependencies refer to the fact that information about one node can be crucial for understanding another node, especially if they’re connected by an edge. For instance, in a social network, knowing your friend’s interests (one node) might influence what content you see (another node). Among all the graph machine learning techniques, Graph Neural Networks (GNNs) have emerged as a promising class of predictive models for handling tasks such as node classification (assign labels to nodes in the graph) or link prediction (predict the existence of a link between two entities).
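To ground the GNN idea, here is a minimal single message-passing layer: each node averages its neighbours' features (plus its own) and applies a learned linear map with a ReLU. The graph, features, and weights are toy values, and this mean-aggregation layer is just one simple GNN variant:

```python
import numpy as np

# Tiny graph: node 0 is connected to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])

def gnn_layer(adj, feats, weight):
    """One simplified message-passing layer: average each node's
    neighbour features (with a self-loop), then linear map + ReLU."""
    adj_hat = adj + np.eye(len(adj))          # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)
    h = (adj_hat @ feats) / deg               # mean aggregation over neighbours
    return np.maximum(h @ weight, 0.0)        # transform + nonlinearity

weight = np.array([[1.0, -1.0], [0.5, 1.0]])  # toy learned weights
out = gnn_layer(adj, feats, weight)
print(out.shape)  # one updated feature vector per node
```

After the layer, each node's representation mixes in information from its neighbours; stacking such layers lets information flow further across the graph, which is what powers node classification and link prediction.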

However, past graph machine learning models have primarily been motivated from diverse perspectives, without a single, transparent lens through which to evaluate their commonalities. To address this, the idea of operating from the unifying vantage point afforded by bilevel optimisation was recently proposed.

Here, bilevel optimisation refers to an optimisation problem with two interconnected levels, where the optimal solution of a lower-level objective serves as an input to an upper-level decision. For example, consider the following scenario:

Upper-level Problem — Teacher

  • A teacher wants to design a quiz that maximises the average score of the students in the class. The teacher can choose the difficulty level and the number of questions for the quiz.

Lower-level Problem — Student

  • Each student aims to maximise their individual score on the quiz by choosing how much time to allocate for studying and which topics to focus on.

In this scenario, the study efforts and topic choices made by individual students influence their quiz scores, and those scores in turn serve as inputs to the teacher’s objective of maximising the class’s average score.
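The teacher–student scenario can be written out numerically: the lower level has a closed-form best response, and the upper level searches over difficulties while anticipating it. The utility functions below are invented for illustration only:

```python
import numpy as np

def student_best_effort(d):
    """Lower level: the student maximises utility U(e) = d*e - e**2
    (harder quizzes reward effort more; effort has quadratic cost).
    Setting dU/de = d - 2e = 0 gives the best response e* = d / 2."""
    return d / 2.0

def class_average_score(d):
    """Upper level: the teacher's objective. The student's optimal
    effort e*(d) is an input; raw scores fall as difficulty rises."""
    e_star = student_best_effort(d)
    return e_star * (10.0 - d)

# The teacher searches over quiz difficulty, anticipating student behaviour.
difficulties = np.linspace(0.0, 10.0, 101)
best_d = max(difficulties, key=class_average_score)
print(best_d, class_average_score(best_d))  # optimum at d = 5.0, score 12.5
```

The nested structure is the essence of bilevel optimisation: the upper-level choice only makes sense once the lower-level problem has been solved for each candidate decision.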

Based on the same idea, a broadly applicable bilevel optimisation framework called BloomGML was proposed, which is capable of handling generic GNN tasks beyond node classification. Experimental results show that BloomGML achieves state-of-the-art performance in terms of accuracy on tasks such as node classification, which provides avenues for future research.

Conclusion

Machine learning has seen significant advances, particularly when combined with various kinds of graphs, and this opens up multiple exciting research directions. In this post, we discussed two recent findings on applying machine learning to graphs, and shared some promising directions for future exploration of these approaches. Through continued investigation and refinement, we believe that graphs, together with different machine learning techniques, will open up exciting opportunities.

References

  • Li, Q., Guo, S., Chen, Y., Ji, C., Sheng, J., & Li, J. (2024). Uncertainty-Aware Relational Graph Neural Network for Few-Shot Knowledge Graph Completion (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2403.04521
  • Zheng, A. Y., He, T., Qiu, Y., Wang, M., & Wipf, D. (2024). BloomGML: Graph Machine Learning through the Lens of Bilevel Optimization (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2403.04763
