English language Web page dataset
Source: Figshare. Updated: 2017-01-27. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/English_language_Web_page_dataset/4588729/1
Description:
This dataset is a list of English-language Web pages. We randomly collected 10,000 unique URLs from DMOZ, then tested the content of each Web page for its language using four language detection methods (HTTP content, title tag, trigram, and langID). We considered a Web page part of the English Web if it passed any one of the language tests.

Note: this dataset was live as of December 2015 to March 2016.

This dataset is part of the journal article: Lulwah M. Alkwai, Michael L. Nelson, and Michele C. Weigle. 2017. Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages. TOIS.

This work was an extension of the paper: Lulwah M. Alkwai, Michael L. Nelson, and Michele C. Weigle. 2015. How Well Are Arabic Websites Archived? In Proceedings of the 15th IEEE/ACM Joint Conference on Digital Libraries (JCDL). ACM.
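
The inclusion rule in the description (a page counts as English if it passes any one of the four language tests) can be illustrated with a minimal Python sketch. This is not the authors' code: the field names, the `make_field_detector` helper, the `passes_any_test` function, and the example URLs are all hypothetical, and each detector here simply reads a precomputed language code rather than running a real detection method.

```python
from typing import Callable, Iterable, Mapping, Optional

# A page record maps a test name to the language code that test reported
# (or None if the test could not decide). Hypothetical layout, for illustration.
Page = Mapping[str, Optional[str]]
Detector = Callable[[Page], Optional[str]]


def make_field_detector(field: str) -> Detector:
    """Return a toy detector that reads a precomputed language code from `field`."""
    def detect(page: Page) -> Optional[str]:
        return page.get(field)
    return detect


# Stand-ins for the four tests named in the description.
DETECTORS = [
    make_field_detector(name)
    for name in ("http_content", "title_tag", "trigram", "langid")
]


def passes_any_test(
    page: Page,
    detectors: Iterable[Detector] = DETECTORS,
    target: str = "en",
) -> bool:
    """Apply the any-one-test rule: True if at least one detector reports `target`."""
    return any(detect(page) == target for detect in detectors)


# Hypothetical sample records keyed by URL.
pages = {
    "http://example.com/a": {"http_content": "da", "title_tag": "en",
                             "trigram": "da", "langid": "da"},
    "http://example.com/b": {"http_content": "ko", "title_tag": None,
                             "trigram": "ko", "langid": "ko"},
}

english_urls = [url for url, page in pages.items() if passes_any_test(page)]
```

Because the rule is a logical OR over the tests, it favors recall: a page with mixed signals, such as the first sample record, is still included.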
Provider:
Lulwah Alkwai
Created:
2017-01-27



