PhishDecloaker Datasets
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11228973
Description:
This record contains the datasets accompanying the paper "PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models", published at USENIX Security '24.
Phishing Kit Dataset
Section: 2
Description: Used for the empirical study.
Contents: 100 defanged PHP phishing kits, one per entry in the following list of targeted brands (some brands appear more than once):
1. Microsoft
2. Banco de Oro
3. Microsoft OneDrive
4. Deutsche Kreditbank
5. Adobe Acrobat
6. N26
7. Absa Group
8. DHL
9. Microsoft
10. Correos
11. Kempinski Summerland Hotel & Resort Beirut
12. Vantage West Credit Union
13. Netflix
14. Agencia Tributaria
15. Square
16. Chronopost
17. PayPal
18. American Express
19. Allegro
20. LinkedIn
21. virtru
22. Citibank
23. AOL
24. Credit Agricole
25. Mercado Pago
26. Université de Pau et des Pays de l'Adour (UPPA)
27. Fifth Third Bank
28. Columbia Bank
29. Alibaba Mail
30. Microsoft OneDrive
31. Intesa Sanpaolo
32. Santander
33. America First Credit Union
34. Barclays
35. Interac
36. USPS
37. Wells Fargo
38. Yahoo
39. XFINITY
40. Berliner Sparkasse
41. OneDrive
42. Standard Bank
43. Wells Fargo
44. aruba.it
45. Bancolombia
46. Caisse d’Epargne
47. DubaiPay
48. Chase Bank
49. M&T Bank
50. Postmaster
51. Volksbanken Raiffeisenbanken
52. Facebook
53. Huntington Bank
54. Commonwealth Bank of Australia
55. Orange
56. shopify
57. Google Drive
58. WalletConnect
59. Meritrust Credit Union
60. Credit Agricole
61. Desjardins
62. Postbank
63. Dropbox
64. DocuSign
65. dpdgroup
66. L'Assurance Maladie
67. Adobe Acrobat
68. Global Sources
69. Microsoft Excel
70. SFR
71. FedEx
72. Citibank
73. Royal Credit Union
74. GoDaddy
75. ADP
76. International Card Services
77. Israeli Post
78. UNI Financial Cooperation
79. TD Bank
80. ATB Mobile
81. HSBC
82. Bank of Montreal
83. RBC Royal Bank
84. IONOS
85. AlaskaUSA Federal Credit Union
86. French Government
87. UOL SAC
88. Banco Itaú Paraguay
89. Amazon
90. Apple
91. AT&T
92. Australian Government
93. Bank of America
94. BNP Paribas
95. eBay
96. ING Group
97. Instagram
98. MetaMask
99. SingTel
100. Société Générale
Landscape Dataset
Section: 4.3
Description: For training the rotation CAPTCHA solver model.
Contents: 7,268 natural and man-made landscape images (320×180).
Format: JPEG images.
CAPTCHA Detection Dataset
Section: 5.2.1
Description: For training the CAPTCHA detection model.
Contents: 19,680 webpage screenshots (1920×1080), 10,680 with annotated CAPTCHA bounding boxes, 9,000 without.
Format: PNG images with annotations in both PASCAL VOC and COCO formats. All bounding boxes are labeled with the single class "CAPTCHA" (no categorization by CAPTCHA type).
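The two annotation formats store boxes differently: PASCAL VOC uses corner coordinates (xmin, ymin, xmax, ymax), while COCO uses [x, y, width, height]. The following is a minimal sketch of reading a VOC annotation and converting a box to the COCO convention. The sample XML, the file name example.png, and the function names are illustrative assumptions, not taken from the dataset; the actual file layout in the archive may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one 1920x1080 screenshot (not a real dataset file).
SAMPLE_VOC = """<annotation>
  <filename>example.png</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>CAPTCHA</name>
    <bndbox><xmin>760</xmin><ymin>400</ymin><xmax>1160</xmax><ymax>680</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes


def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to a COCO-style [x, y, width, height] box."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


boxes = read_voc_boxes(SAMPLE_VOC)
coco_box = voc_to_coco_bbox(*boxes[0][1:])
```

Screenshots without a CAPTCHA would simply yield an empty list from read_voc_boxes, assuming their annotation files contain no object elements.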
CAPTCHA Recognition Dataset
Section: 5.2.2
Description: For training the CAPTCHA recognition model.
Contents: 6,612 CAPTCHA images distributed across 38 classes.
Format: PNG images with their corresponding class labels in a CSV file.
CAPTCHA classes:
1. baidu_slide_rotate
2. dingxiang_audio
3. dingxiang_click_area
4. dingxiang_click_difference
5. dingxiang_click_font
6. dingxiang_click_icon
7. dingxiang_click_vr
8. dingxiang_click_word
9. dingxiang_drag
10. dingxiang_slide_puzzle
11. dingxiang_slide_puzzle2
12. dingxiang_slide_rotate
13. geetest_checkbox
14. geetest_click_icon
15. geetest_click_phrase
16. geetest_click_word
17. geetest_game_playing
18. geetest_game_playing2
19. geetest_select
20. geetest_slide_puzzle
21. hcaptcha
22. hcaptcha_checkbox
23. netease_click_icon
24. netease_click_phrase
25. netease_click_vr
26. netease_click_word
27. netease_drag
28. netease_slide
29. press_and_hold
30. recaptchav2
31. recaptchav2_checkbox
32. tencent_slide
33. text_1
34. text_2
35. text_3
36. text_4
37. text_5
38. text_6
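Because the 6,612 images are spread over 38 classes, it can be useful to inspect the per-class counts before training. The sketch below tallies labels from a CSV using only the standard library. The column names filename and label, and the sample rows, are assumptions made for illustration; check the header of the actual CSV in the archive and adjust label_column accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content; the real file's column names may differ.
SAMPLE_CSV = """filename,label
0001.png,geetest_slide_puzzle
0002.png,recaptchav2
0003.png,geetest_slide_puzzle
"""


def count_labels(csv_file, label_column="label"):
    """Count how many rows carry each class label in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


counts = count_labels(io.StringIO(SAMPLE_CSV))
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

For a file on disk, pass open(path, newline="", encoding="utf-8") instead of the in-memory sample.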
CAPTCHA Open-set Dataset
Section: 5.2.2
Description: For testing the CAPTCHA detection and recognition pipeline.
Contents: 1,100 webpage screenshots (1920×1080), all of which have annotated CAPTCHA classes spanning 11 different categories.
Format: PNG images (CAPTCHA crops and full screenshots) with their corresponding class labels in a CSV file.
CAPTCHA classes:
1. arkose_select_2
2. capycaptcha_drag
3. dicecaptcha_qa
4. funcaptcha_select
5. funcaptcha_select_2
6. funcaptcha_select_3
7. funcaptcha_select_4
8. funcaptcha_select_5
9. funcaptcha_select_6
10. keycaptcha_drag
11. mtcaptcha_text
Ablation Dataset
Section: 5.4
Description: For training the CAPTCHA recognition model
Contents: 722 webpage screenshots (1920×1080), 622 with CAPTCHAs spanning 38 classes, 100 without.
Format: PNG images with their corresponding bounding boxes and class labels in a CSV file. Class IDs 0-37 map directly to the class names listed for the CAPTCHA Recognition Dataset. Class ID 38 denotes samples without CAPTCHAs.
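The ID-to-name mapping can be sketched as a simple lookup. This assumes that class ID 0 corresponds to the first entry of the CAPTCHA Recognition Dataset class list and ID 37 to the last, that is, ID = list position minus 1. The record does not state the ordering explicitly, so verify it against the CSV before relying on it. The names CAPTCHA_CLASSES, NO_CAPTCHA_ID, and class_name are illustrative.

```python
# Class names in the order listed for the CAPTCHA Recognition Dataset.
CAPTCHA_CLASSES = [
    "baidu_slide_rotate", "dingxiang_audio", "dingxiang_click_area",
    "dingxiang_click_difference", "dingxiang_click_font", "dingxiang_click_icon",
    "dingxiang_click_vr", "dingxiang_click_word", "dingxiang_drag",
    "dingxiang_slide_puzzle", "dingxiang_slide_puzzle2", "dingxiang_slide_rotate",
    "geetest_checkbox", "geetest_click_icon", "geetest_click_phrase",
    "geetest_click_word", "geetest_game_playing", "geetest_game_playing2",
    "geetest_select", "geetest_slide_puzzle", "hcaptcha", "hcaptcha_checkbox",
    "netease_click_icon", "netease_click_phrase", "netease_click_vr",
    "netease_click_word", "netease_drag", "netease_slide", "press_and_hold",
    "recaptchav2", "recaptchav2_checkbox", "tencent_slide",
    "text_1", "text_2", "text_3", "text_4", "text_5", "text_6",
]

# ID reserved for screenshots that contain no CAPTCHA.
NO_CAPTCHA_ID = 38


def class_name(class_id):
    """Map an Ablation Dataset class ID to a class name (None means no CAPTCHA)."""
    if class_id == NO_CAPTCHA_ID:
        return None
    if 0 <= class_id < len(CAPTCHA_CLASSES):
        return CAPTCHA_CLASSES[class_id]
    raise ValueError(f"unknown class ID: {class_id}")
```

Returning None for ID 38 keeps the "no CAPTCHA" case distinct from the 38 real class names, so downstream code cannot confuse it with a recognized CAPTCHA type.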
Created:
2024-05-21



