
Database of Indian Political Speeches (DIPS) dataset, current version, clean & subsetted [dips-v-b1-2]

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://doi.org/10.7910/DVN/R59EUT
Description:
Contents of the DIPS dataset (1888-20), version beta 1.2
Locutors: 28
Locutors+type of addresses (clean): 70
Number of words of the subsetted corpus: 66,917,278
Combined codes are in column GE [[187]] of "metadata.odf"

G: M.K. Gandhi (1888-48)*1, 1 54 82 932*2
A: B.R. Ambedkar (1913-56), 32 08 826
N: Jawaharlal Nehru (1946-64), 9 03 746
I: Indira Gandhi (1966-84), 15 94 216
D: M.R. Desai (1977-79), 1 01 056
C: Charan Singh (1979), 16 814
L: Rajiv Gandhi (1984-89), 10 06 819
P: V.P. Singh (1989-90), 1 01 464
Q: Chandra Shekhar Singh (1990-91), 45 252
R: P.V. Rao (1991-95), 8 07 382
V: A.B. Vajpayee (1998-04), 3 26 142
S: Manmohan Singh (2004-14), 17 58 132
M: Narendra Modi (2010*3-20), 43 85 892
W: 13th Lok Sabha (1999-04), 17 68 345*4
X: 14th Lok Sabha (2004-09), 14 80 312*4
Y: 15th Lok Sabha (2009-14), 17 66 763*4
Z: 16th Lok Sabha (2014-19), 24 27 617*4
H: Budget speeches (1947-19), 11 65 566
O: Union Budget*5 (1947-1900), 54 42 555
F: Five Year Plans (1951-12), 42 52 477
E: Economic Survey (1958-20), 55 03 946
U: RBI speeches (1990-20), 43 02 044
T: RBI*6 Annual reports (1936-2019), 43 02 044
&: Financial statements (2004-2015), 229 765
J: J.V. Sadhguru (2013-20), 19 46 249
K: Ravi Shankar (2009-20), 1 61 067
$: Baba Ramdev (2011-18), 99 824
B: B.K Shivani (2011-20), 17 18 211

s: Speech (public/TV address), 2 13 91 598
n: Independence/Rep. Day speech, 60 795
m: Reg. radio show (Mann ki baat), 2 80 646
i: Interview, 9 27 164
j: Debate (public/commission), 10 94 723
q: Q&A (commission, parliament), 74 74 035
t: Statement (no audience), 16 08 431
o: Report (general)*7, 1 95 01 022
f: Financial declaration*8, 2 29 765
a: Private correspondence, 59 49 577
d: Diary entry, 46 617
c: Column/editorial, 55 36 630
b: Book (general), 20 34 938
y: Poetry*9, -
r: Resolution, 66 828
e: Petition, 80 208
x: Draft, 2 558
v: Trial, 20 856
w: Tweet, 6 10 887

l: Post-liberalization*10 (>1991), 3 50 62 898
p: Pre-liberalization (<1991), 3 18 54 380

Notes:
*1 Years covered in the database for a particular locutor.
*2 Total No. of words contained in a particular corpus section. The subsetted database used here contains 6 69 17 278 words.
*3 Few entries (n=88) in N. Modi books are dated prior to 2010.
*4 To limit computing load, the Lok Sabha corpus was subsetted (approx. 1/10th of its original size) [data courtesy TCPD].
*5 Budget documents prior to 1997, except budget speeches, have been removed from the analysis due to the low quality of text extraction (dirty OCR).
*6 Reserve Bank of India.
*7 Masked due to redundancy. 'Administrative documents' are always reports except for H & T.
*8 Masked due to redundancy with the locutor category 'Financial statements'.
*9 Not in use, included in the format category 'book'.
*10 Label applied only to administrative documents and speeches spanning over a period of 20 years or more.
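The word counts in this description are written with space-separated digit groups in the Indian (lakh/crore) style, for example 6 69 17 278 for 66,917,278, and the combined codes are said to sit in spreadsheet column GE [[187]]. The following is a minimal Python sketch of how a reader might normalize such counts and confirm that column GE is the 187th column. The helper names `parse_grouped_count` and `column_index` are hypothetical and are not part of the dataset or its tooling.

```python
import re


def parse_grouped_count(text: str) -> int:
    """Convert a space- or comma-grouped count such as '6 69 17 278' to an int.

    Hypothetical helper: it only strips the group separators, so it works
    for both Indian-style and Western-style grouping.
    """
    digits = re.sub(r"[\s,]", "", text)
    if not digits.isdigit():
        raise ValueError(f"not a grouped count: {text!r}")
    return int(digits)


def column_index(letters: str) -> int:
    """Return the 1-based spreadsheet column index for letters such as 'GE'.

    Hypothetical helper: column letters are read as a bijective base-26
    number (A=1, ..., Z=26, AA=27, ...).
    """
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column label: {letters!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


if __name__ == "__main__":
    print(parse_grouped_count("6 69 17 278"))  # 66917278
    print(column_index("GE"))                  # 187
```

Under these assumptions, the total in note *2 agrees with the headline figure of 66,917,278 words, and GE resolves to column 187 (7 × 26 + 5), which matches the bracketed number in the description.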
Created:
2020-11-21