Usage

Installation

Run pip install wikinet. Then, import wikinet as wiki.

Reading zipped Wikipedia XML dumps

dump = wiki.Dump(path_xml, path_index)
page = dump.load_page('Science')

Then, you can view the page and information about the page.

print(page)
print(dump.page)
print(dump.links) # all links
print(dump.article_links) # just links to articles
print(dump.years) # years in article (intro & history sections)

Creating a network of Wikipedia articles

network = wiki.Net.build_graph(
    name='my network', dump=dump, nodes=['Science', 'Mathematics', 'Philosophy']
)

Optionally, for edge weights with cosine distance between tf-idf vectors of articles

network = wiki.Net.build_graph(
    name='my network', dump=dump, nodes=['Science', 'Mathematics', 'Philosophy'],
    model=tfidf, # gensim.models
    dct=dct, # gensim.corpora.Dictionary
)

Then, network.graph gives you a networkx.DiGraph.