wikinet

wikinet is a Python package. With wikinet, you can

  • read Wikipedia articles from zipped XML dumps,

  • read through Wikipedia articles as a gensim corpus,

  • create networkx networks from a list of article names, and

  • run persistent homology on networks.

See Usage for details.

Data

Wikipedia XML dumps are available at https://dumps.wikimedia.org/enwiki. Only two files are required to reproduce the results: (1) enwiki-DATE-pages-articles-multistream.xml.bz2 and (2) enwiki-DATE-pages-articles-multistream-index.txt.bz2, where DATE is the date of the dump. Both belong to the multistream version of the compressed dump, which lets the user access an article without decompressing the whole file. In the article, we used the archived dump from August 1, 2019, which is available on Dropbox.
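To illustrate why the multistream format avoids unpacking the whole file, here is a minimal sketch that uses only the Python standard library. It is not wikinet's own reader. It assumes the index lists one entry per line in the form offset:page_id:title, where offset is the byte position of the bz2 stream that contains the page. The helper names parse_index_line and read_stream are hypothetical, and the demo data is a tiny synthetic stand-in for a real dump.

```python
import bz2
import io


def parse_index_line(line):
    """Split one index line 'offset:page_id:title' into its three fields."""
    # Titles may contain colons themselves, so split at most twice.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def read_stream(dump, offset):
    """Decompress the single bz2 stream that starts at byte `offset`."""
    dump.seek(offset)
    decompressor = bz2.BZ2Decompressor()
    chunks = []
    while not decompressor.eof:
        block = dump.read(64 * 1024)
        if not block:
            break  # truncated file: return what was decoded so far
        chunks.append(decompressor.decompress(block))
    return b"".join(chunks).decode("utf-8")


# Synthetic stand-in for a dump: two independently compressed streams
# concatenated into one file-like object, plus a matching index.
first = bz2.compress(b"<page><title>Alpha</title></page>")
second = bz2.compress(b"<page><title>Beta</title></page>")
demo_dump = io.BytesIO(first + second)
demo_index = ["0:1:Alpha", f"{len(first)}:2:Beta"]

# Look up the second page and decompress only the stream that holds it.
offset, page_id, title = parse_index_line(demo_index[1])
xml = read_stream(demo_dump, offset)
print(title, xml)
```

In a real dump a stream typically holds many pages, so the decompressed text would still need to be searched for the page with the wanted title or id.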

Citation

To cite wikinet, please use the following publication: https://arxiv.org/abs/2010.08381.
