CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

Abstract

This paper presents CompanyKG (version 2), a large-scale heterogeneous graph developed for fine-grained company similarity quantification and relationship prediction, crucial for applications in the investment industry such as market mapping, competitor analysis, and mergers and acquisitions. CompanyKG comprises 1.17 million companies represented as graph nodes, enriched with company description embeddings, and 51.06 million weighted edges denoting 15 distinct inter-company relations. To facilitate a thorough evaluation of methods for company similarity quantification and relationship prediction, we have created four annotated evaluation tasks: similarity prediction, competitor retrieval, similarity ranking, and edge prediction. We offer extensive benchmarking results for 11 reproducible predictive methods, categorized into three groups: node-only, edge-only, and node-edge. To our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset derived from a real-world investment platform, specifically tailored for quantifying inter-company similarity and relationships.
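As a rough illustration of the structure the abstract describes (nodes carrying description embeddings, plus weighted edges of several relation types), the following minimal Python sketch builds a toy graph and scores a company pair in two ways: from node embeddings alone and from edge weights alone. It is not the CompanyKG data format or any of the benchmarked methods. The node IDs, embedding values, relation names (`rel_a`, `rel_b`), and the choice to sum edge weights are all assumptions made for illustration.

```python
import math

# Toy stand-in for the graph: every ID, vector, and weight below is hypothetical.
# Node ID -> description embedding (real embeddings would be much longer).
embeddings = {
    0: [1.0, 0.0],
    1: [0.8, 0.6],
    2: [0.0, 1.0],
}

# (source, target, relation type) -> edge weight.
# "rel_a" and "rel_b" are placeholder names, not actual CompanyKG relations.
edges = {
    (0, 1, "rel_a"): 2.0,
    (0, 1, "rel_b"): 1.0,
    (1, 2, "rel_a"): 0.5,
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def node_only_similarity(a, b):
    """Score a pair using node features only: cosine of the two embeddings."""
    return cosine(embeddings[a], embeddings[b])


def edge_only_similarity(a, b):
    """Score a pair using edges only: total weight over all relation types.

    Summing weights is an assumed simplification, not a method from the paper.
    """
    return sum(w for (s, t, _), w in edges.items() if {s, t} == {a, b})


print(node_only_similarity(0, 1))
print(edge_only_similarity(0, 1))
```

A node-edge method would combine both signals, for example by propagating embeddings along the weighted edges; that step is omitted here to keep the sketch small.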

Citation

@article{Cao2026CompanyKG,
  title={CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification},
  author={Lele Cao and Vilhelm von Ehrenheim and Mark Granroth-Wilding and Richard Anselmo Stahl and Andrew McCornack and Armin Catovic and Dhiana Deva Cavalcanti Rocha},
  year={2026},
  url={https://cspaper.org/openprint/20260401.0001v1},
  journal={OpenPrint:20260401.0001v1}
}

Version	Archived Date	Submitter
v1 (Current)	Apr 1, 2026	Vilhelm von Ehrenheim
