
Self-Reported Myers-Briggs Personality Types on Twitter

Source: Figshare. Updated 2023-07-03. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Self-Reported_Myers-Briggs_Personality_Types_on_Twitter/23620554
Description:
We collected the data for our analysis using the academic Twitter API (v2). The four-letter acronyms of the Myers-Briggs Type Indicator (MBTI) give people a short categorisation of their personality that is easy to self-report on social media and follows a regular pattern that a regular expression can match. As a result, people are much more likely to self-report their MBTI type than other personality classifications. The four-letter MBTI acronyms are also unique to the Myers-Briggs questionnaire, so they can be easily queried using the Twitter API. This also means these personality types won't be confused with any other acronym or word, which reduces the likelihood that we incorrectly classify a user.

When we initially explored Twitter, we found that some users self-reported their personality type in their biography, while others did so in their tweets. We therefore formulated two methods for querying and labelling the Myers-Briggs personality type of accounts, described below.

First, we used Tweepy's 'search_users' method to obtain the set of users who currently self-report their MBTI type in their username or biography. Due to the rate limits associated with this endpoint, we could obtain no more than 1000 users for each unique search query.

Second, we used the Twitter API's 'full_archive_search' endpoint to obtain the set of users who self-reported their Myers-Briggs personality type in a Tweet at any time since Twitter's creation (March 26, 2006). We searched for users who tweeted any of three phrases followed by their personality type: 'I am...', 'I am a...' or 'I am an...'. We searched only original Tweets and excluded Retweets, Quotes and Replies from the query, because these are much more likely to label an account incorrectly. We were bound by a rate limit of 300 requests per 15-minute window, but there was no hard bound on the number of tweets or users we could obtain, so we ran this query for each personality type until the search was exhausted.

In both cases, the queries were not case-sensitive. In the attached dataset, we provide the Twitter User IDs and the Myers-Briggs personality types of the 68,958 users obtained using the two methods above. We provide this dataset prior to any of the preprocessing steps performed in our paper.
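
The tweet-based method described above can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`build_tweet_query`, `label_from_text`), the exact query layout, and the labelling regular expression are assumptions made for illustration. The sketch only builds query strings and labels text locally, and makes no API calls. The `-is:retweet`, `-is:quote` and `-is:reply` operators and quoted exact-phrase matching come from the Twitter API v2 search query syntax.

```python
import re
from itertools import product

# The 16 four-letter MBTI types, generated from the four dichotomies.
MBTI_TYPES = ["".join(letters) for letters in product("EI", "SN", "TF", "JP")]

# The three self-report prefixes named in the description.
PREFIXES = ("I am", "I am a", "I am an")

# 300 requests per 15-minute window works out to one request every 3 seconds.
MIN_SECONDS_BETWEEN_REQUESTS = (15 * 60) / 300


def build_tweet_query(mbti_type: str) -> str:
    """Build one search query per type (hypothetical layout, not the authors')."""
    phrases = " OR ".join(f'"{prefix} {mbti_type}"' for prefix in PREFIXES)
    # Exclude Retweets, Quotes and Replies, as the description specifies.
    return f"({phrases}) -is:retweet -is:quote -is:reply"


# Case-insensitive pattern: "I am", an optional "a"/"an", then an MBTI type.
SELF_REPORT = re.compile(
    r"\bI am (?:an? )?(" + "|".join(MBTI_TYPES) + r")\b",
    re.IGNORECASE,
)


def label_from_text(text: str):
    """Return the self-reported type in upper case, or None if none is found."""
    match = SELF_REPORT.search(text)
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    print(build_tweet_query("INTJ"))
    print(label_from_text("i am an infj and proud of it"))
```

Running one query per entry in `MBTI_TYPES`, paced at `MIN_SECONDS_BETWEEN_REQUESTS`, would stay within the stated rate limit. How the authors actually paced their requests and paginated results is not given in the description.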

Created:
2023-07-03