API Reference
Modules:
- Corpus: iterate through articles
- PersistentHomology: run persistent homology
- class wikinet.dump.Dump(path_xml, path_idx)
Dump loads and parses dumps from Wikipedia from path_xml with index path_idx.
- idx: dictionary
{'page_name': (byte offset, page id, block size)}. Cached. Lazy.
- links: list of strings
All links.
- article_links: list of strings
Article links (not files, categories, etc.)
- years: list of int
Years in the History section of a Wikipedia page. BC years are denoted as negative values.
- page: mwparserfromhell.wikicode
Current loaded wiki page
- path_xml: string
Path to the zipped XML dump file.
- path_idx: string
Path to the zipped index file.
- offset_max: int
Maximum offset. Set as the size of the zipped dump.
- cache: xml.etree.ElementTree.Node
Cache of the XML tree in current block
- load_page(page_name, filter_top=False)
Loads and returns the page (mwparserfromhell.wikicode) named page_name from the dump file. Returns only the top section if filter_top is True.
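The idx layout above, (byte offset, page id, block size), suggests random access into a multistream compressed dump: seek to the offset, read one block, and decompress only that block. The following is a minimal sketch of that idea using only the standard library. It assumes bz2 compression and builds a tiny in-memory archive; `read_block` and the sample pages are hypothetical and not part of wikinet.

```python
import bz2
import io


def read_block(stream, offset, size):
    """Decompress one bz2 stream of `size` bytes starting at `offset`."""
    stream.seek(offset)
    return bz2.decompress(stream.read(size)).decode("utf-8")


# Stand-in for a multistream dump: two independently compressed
# blocks concatenated into one byte string (hypothetical data).
blocks = [
    "<page><title>Alpha</title></page>",
    "<page><title>Beta</title></page>",
]
compressed = [bz2.compress(b.encode("utf-8")) for b in blocks]
archive = io.BytesIO(b"".join(compressed))

# Index in the documented shape:
# {'page_name': (byte offset, page id, block size)}
idx = {
    "Alpha": (0, 1, len(compressed[0])),
    "Beta": (len(compressed[0]), 2, len(compressed[1])),
}

# Look up a page and decompress only the block that holds it.
offset, page_id, size = idx["Beta"]
xml_block = read_block(archive, offset, size)
```

Only the requested block is decompressed, which is why an index of offsets makes lookups in a large dump practical.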
- class wikinet.corpus.Corpus(dump, output='doc', dct=None, load_index=True)
Corpus is an iterable and an iterator that uses Dump to iterate through articles.
    corpus = wikinet.Corpus(dump)
    print(corpus[100])
    [c for c in corpus]
- dump: wikinet.Dump
A Dump object.
- output: string
doc for array of documents, tag for TaggedDocument(doc, [self.i]), bow for bag of words [(int, int)].
- dct: gensim.corpora.Dictionary
Used to create the BoW representation.
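To illustrate what "an iterable and an iterator" with indexing and several output modes can look like, here is a small self-contained sketch. `TinyCorpus` is a hypothetical stand-in, not the wikinet implementation: it takes pre-tokenized documents instead of a Dump, a plain dict instead of a gensim dictionary, and returns a tuple where the real class is documented to return a TaggedDocument.

```python
class TinyCorpus:
    """Hypothetical iterable + iterator over tokenized documents."""

    def __init__(self, docs, output="doc", token2id=None):
        self.docs = docs                 # list of token lists
        self.output = output             # 'doc', 'tag', or 'bow'
        self.token2id = token2id or {}   # stand-in for a dictionary object
        self.i = 0                       # current iterator position

    def _format(self, i):
        doc = self.docs[i]
        if self.output == "tag":
            return (doc, [i])            # stand-in for TaggedDocument(doc, [i])
        if self.output == "bow":
            counts = {}
            for token in doc:
                token_id = self.token2id[token]
                counts[token_id] = counts.get(token_id, 0) + 1
            return sorted(counts.items())  # [(token id, count), ...]
        return doc

    def __getitem__(self, i):
        return self._format(i)

    def __iter__(self):
        self.i = 0                       # restart so the corpus can be reused
        return self

    def __next__(self):
        if self.i >= len(self.docs):
            raise StopIteration
        item = self._format(self.i)
        self.i += 1
        return item


docs = [["graph", "theory"], ["graph", "graph", "node"]]
token2id = {"graph": 0, "theory": 1, "node": 2}
corpus = TinyCorpus(docs, output="bow", token2id=token2id)
first_pass = list(corpus)
second_pass = list(corpus)   # works again because __iter__ resets the position
```

Resetting the position in `__iter__` matters for training loops that pass over the corpus more than once.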
- class wikinet.net.Net(path_graph='', path_barcodes='')
Net is a wrapper for networkx.DiGraph. Uses dionysus for persistent homology.
- tfidf: scipy.sparse.csc.csc_matrix
Sparse column matrix of tfidfs, ordered by nodes. Also stored in self.graph.graph['tfidf']. Lazy.
- MAX_YEAR: int
year = MAX_YEAR (2020) for nodes with parents without years
- YEAR_FILLED_DELTA: int
year = year of parents + YEAR_FILLED_DELTA (1)
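The two constants above describe how missing years get filled. A minimal sketch of that rule follows, under stated assumptions: the graph is represented by plain dicts, a node with several dated parents takes the latest parent year, and filling happens in a single pass over the original years. `fill_empty_years_sketch` is a hypothetical helper, and the actual behaviour in wikinet may differ on these points.

```python
MAX_YEAR = 2020
YEAR_FILLED_DELTA = 1


def fill_empty_years_sketch(years, parents):
    """Return a copy of `years` with missing entries (None) filled in.

    years:   {node: int or None}
    parents: {node: list of parent nodes}
    """
    filled = dict(years)
    for node, year in years.items():
        if year is not None:
            continue  # already dated
        known = [
            years[p] for p in parents.get(node, [])
            if years.get(p) is not None
        ]
        if known:
            # Assumption: use the latest dated parent.
            filled[node] = max(known) + YEAR_FILLED_DELTA
        else:
            # No dated parents: fall back to MAX_YEAR.
            filled[node] = MAX_YEAR
    return filled


years = {"Geometry": -300, "Topology": 1736, "Homology": None,
         "Unknown": None, "Orphan": None}
parents = {"Homology": ["Geometry", "Topology"], "Orphan": ["Unknown"]}
filled = fill_empty_years_sketch(years, parents)
```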
- static assign_communities(graph)
Computes modular communities of graph (nx.DiGraph). Assigns a community number (community) to each node and modularity to graph. See greedy_modularity_communities in networkx.
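As a rough illustration of the steps just described, the sketch below runs networkx's `greedy_modularity_communities` and stores the results as a node attribute and a graph attribute. The attribute names follow the description above. The helper name `assign_communities_sketch` and the conversion to an undirected copy before clustering are assumptions, not a statement of how wikinet does it.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)


def assign_communities_sketch(graph):
    """Label each node with a community number; store modularity on graph."""
    undirected = nx.Graph(graph)  # assumption: cluster an undirected copy
    communities = list(greedy_modularity_communities(undirected))
    for number, members in enumerate(communities):
        for node in members:
            graph.nodes[node]["community"] = number
    graph.graph["modularity"] = modularity(undirected, communities)
    return graph


# Two directed triangles joined by a single edge.
g = nx.DiGraph()
g.add_edges_from([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])
assign_communities_sketch(g)
```

On this toy graph each triangle ends up in its own community.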
- static assign_core_periphery(graph)
Computes core-periphery of graph (nx.DiGraph; converted to symmetric nx.Graph). Assigns core as 1 or 0 to each node and coreness to graph. See core_periphery_dir() in bctpy.
- static build_graph(name='', dump=None, nodes=None, depth_goal=1, filter_top=True, remove_isolates=True, add_years=True, fill_empty_years=True, model=None, dct=None, compute_core_periphery=True, compute_communities=True, compute_community_cores=True)
Builds network.graph (networkx.Graph) from nodes (list of string). Set model (from gensim) and dct (gensim.corpora.Dictionary) for weighted edges. Set filter_top to True if you want only the top “lead” section of the article.
- load_barcodes(path)
Loads barcodes from pickle.
- load_graph(path)
Loads graph from path. If the filename ends in .gexf, reads it as gexf. Otherwise, uses pickle.
- randomize(null_type, compute_core_periphery=True, compute_communities=True, compute_community_cores=True)
Returns a new wikinet.Net with a randomized copy of graph. Set null_type to one of 'year', 'target'.
- save_barcodes(path)
Saves barcodes as pickle.
- save_graph(path)
Saves graph at path. If the filename ends in .gexf, saves it as gexf. Otherwise, uses pickle.
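The save and load methods above pick a format from the file extension. A minimal sketch of that dispatch, written as free functions with hypothetical names (`save_graph_sketch`, `load_graph_sketch`) and using networkx's `write_gexf` and `read_gexf`, might look like this:

```python
import os
import pickle
import tempfile

import networkx as nx


def save_graph_sketch(graph, path):
    """Write gexf when the path ends in .gexf, otherwise pickle."""
    if path.endswith(".gexf"):
        nx.write_gexf(graph, path)
    else:
        with open(path, "wb") as f:
            pickle.dump(graph, f)


def load_graph_sketch(path):
    """Read gexf when the path ends in .gexf, otherwise unpickle."""
    if path.endswith(".gexf"):
        return nx.read_gexf(path)
    with open(path, "rb") as f:
        return pickle.load(f)


graph = nx.DiGraph()
graph.add_node("Geometry", year=-300)
graph.add_node("Topology", year=1736)
graph.add_edge("Geometry", "Topology")

with tempfile.TemporaryDirectory() as tmp:
    gexf_path = os.path.join(tmp, "graph.gexf")
    pickle_path = os.path.join(tmp, "graph.pickle")
    save_graph_sketch(graph, gexf_path)
    save_graph_sketch(graph, pickle_path)
    from_gexf = load_graph_sketch(gexf_path)
    from_pickle = load_graph_sketch(pickle_path)
```

gexf is readable by tools such as Gephi, while pickle round-trips arbitrary Python attributes. Only unpickle files from sources you trust.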
- class wikinet.persistent_homology.PersistentHomology
Net is a child of PersistentHomology, so you can call any of the following with any wikinet.Net object.
- cliques: list of lists
lazy
- filtration: dionysus.filtration
lazy
- persistence: dionysus.reduced_matrix
lazy
- barcodes: pandas.DataFrame
lazy
- static compute_barcodes(f, m, graph, names)
Uses dionysus filtration & persistence (in reduced matrix form) to compute barcodes.
- f: dionysus.Filtration
filtration
- m: dionysus.ReducedMatrix
(see homology_persistence)
- names: list of strings
names of node indices
- Returns
pandas.DataFrame
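To give a feel for what a barcode records, the sketch below computes birth and death intervals for connected components only (dimension 0) with a union-find structure. It is a simplified stand-in, not the dionysus-based computation documented above. The function name `h0_barcodes_sketch`, the use of node years as filtration values, and the rule that an edge enters at the later of its endpoint years are all assumptions made for illustration.

```python
def h0_barcodes_sketch(node_years, edges):
    """Return sorted (birth, death) intervals for connected components.

    node_years: {node: year the node enters the filtration}
    edges:      iterable of (u, v); assumed to enter at max(year[u], year[v])
    Components that never merge get death = float('inf').
    """
    parent = {n: n for n in node_years}
    birth = dict(node_years)  # birth year of a component, read at its root

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    bars = []
    timed = sorted((max(node_years[u], node_years[v]), u, v) for u, v in edges)
    for t, u, v in timed:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle, no change in dimension 0
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the older component
        bars.append((birth[rv], t))  # elder rule: the younger one dies at t
        parent[rv] = ru
    roots = {find(n) for n in node_years}
    bars.extend((birth[r], float("inf")) for r in roots)
    return sorted(bars)


node_years = {"A": 1900, "B": 1950, "C": 1910, "D": 1920}
edges = [("A", "B"), ("C", "D"), ("B", "D")]
bars = h0_barcodes_sketch(node_years, edges)
```

Here the component started by C in 1910 survives until it merges with the older component of A in 1950, giving the interval (1910, 1950). The component of A never dies.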