Code of word network topic model

Source: DataCite Commons (updated 2020-09-01, indexed 2024-07-25)
Download link:
https://figshare.com/articles/dataset/Code_of_word_network_topic_model/5572591/1
Description:
1. Introduction
---------------

Class PrepareInput converts the original documents into a word co-occurrence network, re-weights it, and saves it as pseudo documents. The output of PrepareInput can be used directly as the input of JGibbLDA. Class InferenceTopicsForOrgDocs infers the topics of the original documents after JGibbLDA has been run.

* Note that JGibbLDA is free software written by Xuan-Hieu Phan. More details can be found at http://jgibblda.sourceforge.net.

2. Installation
---------------

Straightforward Java compilation can be done with the following commands:

> tar -xzf wntm.tar.gz
> cd wntm
> javac *.java

3. Usage
--------

> java PrepareInput

Example usage:

> java PrepareInput sample.txt ./ sample 10

* Note that constructing the word network might require a lot of memory, especially when the original input file is large. For example, an original file of 250MB needed 7GB of memory in our experiment. In this case, one can use the following command to set the maximum memory available to Java:

> java -Xmx10g PrepareInput sample.txt ./ sample 10

4. Output
---------

When PrepareInput completes, it outputs two files. The file with the suffix ".word" stores all the nodes in the word network. The file with the suffix ".adjacent" stores the pseudo documents, ready to be used in JGibbLDA.

5. Inference
------------

WNTM models topics for each word's adjacent node list rather than for the original documents. Therefore, once JGibbLDA has finished training on the pseudo documents, the topic distribution of the original documents has to be inferred in a separate step. One can use the following command to do the inference:

> java InferenceTopicsForOrgDocs

to get the topics of the original documents, i.e. the .theta file for the original corpus.

6. Contact
----------

If you have any problems, please feel free to contact Jichang Zhao (jichang@buaa.edu.cn).
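
To illustrate the word-network step that the Introduction attributes to PrepareInput, here is a minimal Java sketch of turning documents into a co-occurrence network and then into one pseudo document per word. It is not the code shipped in wntm.tar.gz: the class name WordNetworkSketch, the methods buildNetwork and pseudoDocument, and the whitespace tokenization are all hypothetical. The sketch counts each word pair once per document position pair within the window, and it omits the re-weighting step, which the README does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the PrepareInput class from the archive.
class WordNetworkSketch {

    /**
     * Builds an undirected, weighted co-occurrence network: two distinct
     * words gain one unit of edge weight each time they appear fewer than
     * `window` positions apart in the same document.
     */
    static Map<String, Map<String, Integer>> buildNetwork(List<String> docs, int window) {
        Map<String, Map<String, Integer>> net = new LinkedHashMap<>();
        for (String doc : docs) {
            String[] tokens = doc.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                // Pair token i with the tokens that follow it inside the window.
                for (int j = i + 1; j < tokens.length && j < i + window; j++) {
                    if (tokens[i].equals(tokens[j])) {
                        continue; // skip self-loops
                    }
                    addEdge(net, tokens[i], tokens[j]);
                    addEdge(net, tokens[j], tokens[i]);
                }
            }
        }
        return net;
    }

    private static void addEdge(Map<String, Map<String, Integer>> net, String from, String to) {
        net.computeIfAbsent(from, k -> new LinkedHashMap<>()).merge(to, 1, Integer::sum);
    }

    /** Pseudo document of a word: each neighbour repeated as often as its edge weight. */
    static String pseudoDocument(Map<String, Map<String, Integer>> net, String word) {
        List<String> out = new ArrayList<>();
        Map<String, Integer> neighbours =
                net.getOrDefault(word, Collections.<String, Integer>emptyMap());
        for (Map.Entry<String, Integer> e : neighbours.entrySet()) {
            for (int k = 0; k < e.getValue(); k++) {
                out.add(e.getKey());
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("topic model short text", "short text topic");
        Map<String, Map<String, Integer>> net = buildNetwork(docs, 3);
        for (String word : net.keySet()) {
            System.out.println(word + " -> " + pseudoDocument(net, word));
        }
    }
}
```

The nested maps keep the whole network in memory, which is consistent with the README's warning that large inputs may need a bigger Java heap (-Xmx).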
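
The inference step in section 5 can be sketched in the same spirit. The README does not state how InferenceTopicsForOrgDocs combines the trained model with the original documents, so the following assumes one common formulation: the topic distribution of a document is the average of the topic distributions of its words, P(z|d) = sum over w of P(z|w) * P(w|d), where P(z|w) is the trained topic distribution of the pseudo document of w and P(w|d) is the relative frequency of w in d. The class DocTopicInferenceSketch, the method inferDocTopics, and the in-memory map of word topic distributions are hypothetical; reading the actual .theta and .word files is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the InferenceTopicsForOrgDocs class from the archive.
class DocTopicInferenceSketch {

    /**
     * Estimates P(z|d) as the average of P(z|w) over the tokens of the
     * document. Tokens without a trained pseudo document are skipped, so the
     * average is taken over known tokens only. Returns all zeros if no token
     * is known.
     */
    static double[] inferDocTopics(String doc, Map<String, double[]> wordTopics, int numTopics) {
        double[] theta = new double[numTopics];
        int known = 0;
        for (String token : doc.trim().split("\\s+")) {
            double[] topicsOfWord = wordTopics.get(token);
            if (topicsOfWord == null) {
                continue; // word was not a node in the trained network
            }
            for (int z = 0; z < numTopics; z++) {
                theta[z] += topicsOfWord[z];
            }
            known++;
        }
        if (known > 0) {
            for (int z = 0; z < numTopics; z++) {
                theta[z] /= known;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Made-up word-level topic distributions for two topics.
        Map<String, double[]> wordTopics = new HashMap<>();
        wordTopics.put("apple", new double[] {0.9, 0.1});
        wordTopics.put("stock", new double[] {0.2, 0.8});

        double[] theta = inferDocTopics("apple apple stock", wordTopics, 2);
        System.out.printf("P(z=0|d)=%.4f P(z=1|d)=%.4f%n", theta[0], theta[1]);
    }
}
```

Summing over tokens rather than over distinct words weights each word by its frequency in the document, which is what the P(w|d) factor expresses.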
Provider:
figshare
Created:
2017-11-05