Abstract
This paper presents CompanyKG (version 2), a large-scale heterogeneous graph developed for fine-grained company similarity quantification and relationship prediction, both of which are crucial for investment-industry applications such as market mapping, competitor analysis, and mergers and acquisitions. CompanyKG comprises 1.17 million companies represented as graph nodes, each enriched with a company description embedding, and 51.06 million weighted edges denoting 15 distinct inter-company relations. To support a thorough evaluation of methods for these two problems, we have created four annotated evaluation tasks: similarity prediction, competitor retrieval, similarity ranking, and edge prediction. We report extensive benchmarking results for 11 reproducible predictive methods, categorized into three groups: node-only, edge-only, and node-edge. To our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset derived from a real-world investment platform and specifically tailored for quantifying inter-company similarity and relationships.
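To make the structure described above concrete, the following is a minimal toy sketch, not the dataset's actual loader or storage format. It assumes that each edge is undirected and carries one weight per relation type (15 in total), and it uses cosine similarity between description embeddings as a stand-in for a node-only method. The class `ToyCompanyGraph`, its methods, and the 3-dimensional embeddings are all hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

NUM_RELATION_TYPES = 15  # from the abstract: 15 distinct inter-company relations


@dataclass
class ToyCompanyGraph:
    """Hypothetical in-memory stand-in for a heterogeneous company graph."""

    # node id -> description embedding (toy dimension, not the real one)
    embeddings: dict[int, list[float]] = field(default_factory=dict)
    # (low id, high id) -> one weight per relation type; 0.0 means "absent".
    # Treating edges as undirected is an assumption of this sketch.
    edges: dict[tuple[int, int], list[float]] = field(default_factory=dict)

    def add_company(self, node_id: int, embedding: list[float]) -> None:
        self.embeddings[node_id] = list(embedding)

    def add_relation(self, a: int, b: int, relation_type: int, weight: float) -> None:
        if a not in self.embeddings or b not in self.embeddings:
            raise KeyError("both companies must be added before relating them")
        if not 0 <= relation_type < NUM_RELATION_TYPES:
            raise ValueError("relation_type out of range")
        key = (min(a, b), max(a, b))
        weights = self.edges.setdefault(key, [0.0] * NUM_RELATION_TYPES)
        weights[relation_type] = weight

    def neighbors(self, node_id: int) -> list[int]:
        """Return ids of companies sharing at least one edge with node_id."""
        found = set()
        for a, b in self.edges:
            if a == node_id:
                found.add(b)
            elif b == node_id:
                found.add(a)
        return sorted(found)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Node-only similarity: cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy usage with made-up embeddings and a single made-up relation.
graph = ToyCompanyGraph()
graph.add_company(0, [1.0, 0.0, 0.0])
graph.add_company(1, [0.8, 0.6, 0.0])
graph.add_company(2, [0.0, 0.0, 1.0])
graph.add_relation(0, 1, relation_type=3, weight=2.5)

sim_01 = cosine_similarity(graph.embeddings[0], graph.embeddings[1])
sim_02 = cosine_similarity(graph.embeddings[0], graph.embeddings[2])
```

In this sketch a node-only method sees only `embeddings`, an edge-only method would see only `edges`, and a node-edge method would combine the two. How the benchmarked methods actually do this is not specified in the abstract.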
Keywords
Citation
@article{Cao2026CompanyKG,
title={CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification},
author={Lele Cao and Vilhelm von Ehrenheim and Mark Granroth-Wilding and Richard Anselmo Stahl and Andrew McCornack and Armin Catovic and Dhiana Deva Cavalcanti Rocha},
year={2026},
url={https://cspaper.org/openprint/20260401.0001v1},
journal={OpenPrint:20260401.0001v1}
}
Version History
| Version | Archived Date | Submitter |
|---|---|---|
| v1 (Current) | Apr 1, 2026 | Vilhelm von Ehrenheim |
