
Self-Reported Myers-Briggs Personality Types on Twitter

Source: figshare.com. Updated 2023-07-03. Indexed 2025-03-25.
Download link:
https://figshare.com/articles/dataset/Self-Reported_Myers-Briggs_Personality_Types_on_Twitter/23620554/1
Description:
We collected the data for our analysis using the academic Twitter API (V2). The four-letter acronyms associated with the Myers-Briggs Type Indicator (MBTI) give people a short categorisation of their personality that is easy to self-report on social media and whose fixed form can be matched with a regular expression. As a result, people are much more likely to self-report their categorical MBTI than other personality types. The four-letter MBTI acronyms are also unique to the Myers-Briggs questionnaire, meaning they can be easily queried using the Twitter API. This also means these personality types won't be confused with any other acronym or word, reducing the likelihood that we incorrectly classify a user.

When we initially explored Twitter, we found that some users self-reported their personality type in their biography, while others self-reported it in their tweets. As a result, we formulated two methods for querying and labelling the Myers-Briggs personality type of accounts, described below.

Firstly, we used Tweepy's 'search_users' method to obtain the set of users who currently self-report their MBTI in their username or biography. Due to the rate limits associated with this endpoint, we could obtain no more than 1000 users for each unique search query.

Secondly, we used the Twitter API's 'full_archive_search' endpoint to obtain the set of users who self-reported their Myers-Briggs personality type in a Tweet at any point since Twitter's creation (March 26, 2006). We searched for users who tweeted any of three phrases followed by their personality type: 'I am...', 'I am a...' or 'I am an...'. Note that we only searched for self-reports in Tweets and excluded Retweets, Quotes and Replies from our query, because these have a much higher potential of incorrectly labelling an account. We were bound by a rate limit of 300 requests per 15-minute window, but there were no hard bounds on the number of tweets or users we could obtain. As a result, we ran this query for each personality type until the search was exhausted.

Note that in both cases, the queries were not case-sensitive.

In the attached dataset, we provide both the Twitter User IDs and the Myers-Briggs Personality Types associated with the 68,958 users obtained using the two methods discussed above. We provide this dataset prior to any preprocessing steps performed in our paper.
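The tweet-based method described above can be illustrated with a minimal sketch. This is not the authors' code: the exact query string they sent is not given, so `build_query` is an assumption that combines the three self-report phrases with the Twitter API v2 operators `-is:retweet`, `-is:quote` and `-is:reply`. The names `build_query`, `label_tweet` and `MIN_SECONDS_BETWEEN_REQUESTS` are hypothetical and exist only for illustration. No network calls are made.

```python
import re
from itertools import product

# The 16 four-letter MBTI codes, built from the four dichotomies.
MBTI_TYPES = ["".join(p) for p in product("EI", "SN", "TF", "JP")]

# The three self-report phrases named in the description.
PREFIXES = ("I am", "I am a", "I am an")

# 300 requests per 15-minute window works out to one request every 3 seconds.
MIN_SECONDS_BETWEEN_REQUESTS = 15 * 60 / 300


def build_query(mbti_type: str) -> str:
    """Assumed query string for one personality type (hypothetical helper).

    Exact phrases are quoted and OR-ed together; Retweets, Quotes and
    Replies are excluded, as the description states.
    """
    phrases = " OR ".join(f'"{prefix} {mbti_type}"' for prefix in PREFIXES)
    return f"({phrases}) -is:retweet -is:quote -is:reply"


# Case-insensitive pattern for "I am", "I am a" or "I am an" followed by a type.
SELF_REPORT = re.compile(
    r"\bI am (?:an? )?(" + "|".join(MBTI_TYPES) + r")\b",
    re.IGNORECASE,
)


def label_tweet(text: str):
    """Return the upper-cased MBTI type self-reported in text, or None."""
    match = SELF_REPORT.search(text)
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    print(build_query("INTJ"))
    print(label_tweet("i am an infp and proud"))
```

Running one query per entry in `MBTI_TYPES` mirrors the statement that the search was repeated for each personality type until exhausted; the local regular expression is only one possible way to turn a returned tweet into a label.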

Provider:
figshare.com