wikinet
wikinet is a Python package. With wikinet, you can
- read Wikipedia articles from zipped XML dumps,
- read through Wikipedia articles as a gensim corpus,
- create networkx networks from a list of article names, and
- run persistent homology on those networks.
See Usage for details.
Data
Wikipedia XML dumps are available at https://dumps.wikimedia.org/enwiki. Only two files are required for reproduction: (1) enwiki-DATE-pages-articles-multistream.xml.bz2 and (2) enwiki-DATE-pages-articles-multistream-index.txt.bz2, where DATE is the date of the dump. Both are multistream versions of the bz2-compressed files, which let the user access an article without unpacking the whole file. In the article, we used the archived dump from August 1, 2019, which is available in Dropbox.
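To illustrate why the multistream format avoids unpacking the whole dump, here is a minimal sketch that uses only the Python standard library. It is not wikinet's API: the function names find_offset and read_stream are hypothetical, and the sketch assumes the usual index layout of one offset:page_id:title entry per line, where the offset is the byte position of a self-contained bz2 stream in the dump.

```python
import bz2


def find_offset(index_path, title):
    """Return the byte offset of the bz2 stream that holds `title`, or None."""
    with bz2.open(index_path, mode="rt", encoding="utf-8") as index:
        for line in index:
            # Assumed line layout: "offset:page_id:title".
            # Titles may themselves contain ':', so split at most twice.
            offset, _page_id, page_title = line.rstrip("\n").split(":", 2)
            if page_title == title:
                return int(offset)
    return None


def read_stream(dump_path, offset, chunk_size=64 * 1024):
    """Decompress only the single bz2 stream that starts at `offset`."""
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as dump:
        dump.seek(offset)
        # Stop as soon as this stream ends; later streams are never read.
        while not decompressor.eof:
            chunk = dump.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts).decode("utf-8")
```

Calling find_offset on the index file and passing the result to read_stream on the dump returns the XML of one stream. A stream usually holds a batch of pages rather than a single one, so the requested article still has to be located by its title element within that XML.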
Citation
To cite wikinet, please use the following publication: https://arxiv.org/abs/2010.08381.