Document network embedding aims at learning low-dimensional representations of documents that are connected by links, taking both their content and the network structure into account. These representations can then be used in downstream tasks, such as document clustering or link prediction.
First, we’ve designed GVNR-t, a matrix factorization-based method that learns document and word embeddings jointly (presented at WWW 2019). Second, we’ve developed IDNE, a neural, attention-based approach that also learns document and word embeddings jointly, while learning topics that help reduce the computational cost and improve interpretability (presented at ECIR 2020). We’ve also studied how to efficiently project documents into a pre-trained word embedding space (work presented at ECIR 2020), and how to embed documents when some content and/or links are partially missing (work presented at SAC 2020).
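To make the link prediction use case concrete, here is a minimal sketch of how learned document embeddings might be used downstream: unlinked document pairs are ranked by the cosine similarity of their vectors. This is not GVNR-t or IDNE; the embedding values, the document names, and the helper `rank_candidate_links` are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_links(embeddings, existing_links):
    """Score every unlinked document pair and return them, most similar first.

    `embeddings` maps a document id to its vector; `existing_links` is an
    iterable of (doc_id, doc_id) pairs, treated as undirected.
    """
    linked = {tuple(sorted(pair)) for pair in existing_links}
    scored = []
    for a, b in combinations(sorted(embeddings), 2):
        if (a, b) in linked:
            continue  # already connected, nothing to predict
        scored.append(((a, b), cosine(embeddings[a], embeddings[b])))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings; a real model would learn these
# from document content and network structure.
embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.8, 0.2, 0.1],
    "doc_c": [0.0, 0.1, 0.9],
    "doc_d": [0.1, 0.0, 0.8],
}
existing_links = [("doc_a", "doc_b")]

ranked = rank_candidate_links(embeddings, existing_links)
for pair, score in ranked:
    print(pair, round(score, 3))
```

With these toy vectors, the pair `("doc_c", "doc_d")` ranks first, since the two documents point in nearly the same direction of the embedding space. Cosine similarity is only one possible scoring function; a dot product or a learned classifier over pairs of embeddings could be substituted.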