Paper Key : IRJ************197
Author: Apurva Kumar, Shilpa Priyadarshni
Date Published: 16 Oct 2023
Abstract
Entity resolution is the process of linking data references that refer to the same real-world entity. The problem occurs in many large-scale processes and applications. Ambiguous references arise in many kinds of networks, including social networks, biological networks, and citation graphs. Ambiguity in references leads not only to data redundancy but also to inaccuracies in knowledge representation, extraction, and query processing. Entity resolution addresses this problem. Existing approaches include pair-wise similarity over the attributes of references; P-Swoosh [2], a parallel approach that morphs the graph data onto a cluster of nodes; and relational clustering, which uses relational information in addition to attribute similarity. In this paper, we use a relational clustering algorithm to resolve author name ambiguities in a subset of a real-world dataset: a US patent network consisting of more than 650,000 author references. Evaluating the algorithm with both attribute and neighborhood similarity, we achieved high precision (92%); relying on attribute similarity alone, precision is much lower (67%). Thus, when resolving references to real-world entities in large graphs, using neighborhood similarity in addition to attribute similarity leads to higher-precision resolution.
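To illustrate why neighborhood similarity can raise precision, the following is a minimal Python sketch, not the algorithm evaluated in the paper. It assumes a greedy agglomerative merge, a weighted mix of name similarity and Jaccard overlap of co-occurring names, and hypothetical values for the weight `alpha` and the merge `threshold`. The function names and the toy references are invented for illustration. For simplicity, neighborhoods are compared by co-inventor name, whereas a full relational clustering would typically compare resolved clusters and update them as merges happen.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(name_a, name_b):
    """String similarity of two names in [0, 1] (placeholder measure)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def jaccard(set_a, set_b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, names, neighbors, alpha):
    """Weighted mix of attribute and neighborhood similarity of two clusters."""
    attr = max(attribute_similarity(names[a], names[b]) for a in c1 for b in c2)
    nbrs1 = {names[n].lower() for r in c1 for n in neighbors[r]}
    nbrs2 = {names[n].lower() for r in c2 for n in neighbors[r]}
    return (1 - alpha) * attr + alpha * jaccard(nbrs1, nbrs2)


def resolve(names, neighbors, alpha=0.5, threshold=0.6):
    """Greedily merge the most similar cluster pair until none exceeds threshold."""
    clusters = [frozenset([ref]) for ref in names]
    while True:
        best_score, best_pair = threshold, None
        for c1, c2 in combinations(clusters, 2):
            score = cluster_similarity(c1, c2, names, neighbors, alpha)
            if score > best_score:
                best_score, best_pair = score, (c1, c2)
        if best_pair is None:
            return set(clusters)
        c1, c2 = best_pair
        clusters = [c for c in clusters if c not in best_pair] + [c1 | c2]


# Hypothetical toy data: three "J. Smith" references on three patents.
names = {
    "r1": "J. Smith", "r2": "Apurva Kumar",  # patent 1
    "r3": "J. Smith", "r4": "Apurva Kumar",  # patent 2
    "r5": "J. Smith", "r6": "L. Chen",       # patent 3
}
# Each reference's neighborhood: the references it co-occurs with on a patent.
neighbors = {"r1": {"r2"}, "r2": {"r1"}, "r3": {"r4"},
             "r4": {"r3"}, "r5": {"r6"}, "r6": {"r5"}}

relational = resolve(names, neighbors, alpha=0.5)      # attribute + neighborhood
attribute_only = resolve(names, neighbors, alpha=0.0)  # attribute similarity only

print(sorted(sorted(c) for c in relational))
print(sorted(sorted(c) for c in attribute_only))
```

In this toy case, attribute similarity alone merges all three "J. Smith" references, while the combined score keeps `r5` apart because its co-inventor differs. Whether `r5` is truly a different person is an assumption of the example, not a result from the dataset.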
DOI LINK : 10.56726/IRJMETS45298 https://www.doi.org/10.56726/IRJMETS45298