Content-based file identification (512-byte)
Source: DataCite Commons | Updated: 2022-07-22 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/content-based-file-identification-512-byte
Description:
A content-based dataset comprising 12 features for eight common file types (JPG, PNG, HTML, TXT, MP4, M4A, MOV, and MP3), intended for file type identification (FTI). The features were extracted from a pool of file fragments of 512 bytes each, drawn from all eight of these types. The dataset is designed for use with both supervised and unsupervised ML models. It supports classifying and clustering these types at two levels: a fine-grained level (by exact file type: JPG, PNG, HTML, TXT, MP4, M4A, MOV, or MP3) and a coarse-grained level (by broad type: image, text, audio, or video).
Provider:
IEEE DataPort
Created:
2022-07-22
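
The two label levels and the 512-byte fragment size mentioned in the description can be illustrated with a minimal Python sketch. The names `COARSE_LABEL`, `split_into_fragments`, and `coarse_label` are hypothetical and not part of the dataset; the fine-to-coarse grouping is inferred from the broad types listed in the description, and dropping a short trailing fragment is an assumption rather than the authors' documented procedure. The 12 features themselves are not specified in this listing, so they are not reproduced here.

```python
from typing import List

# Fine-grained file type -> coarse-grained class. The grouping is inferred
# from the description (image, text, audio, video); the name is hypothetical.
COARSE_LABEL = {
    "JPG": "image", "PNG": "image",
    "HTML": "text", "TXT": "text",
    "M4A": "audio", "MP3": "audio",
    "MP4": "video", "MOV": "video",
}

FRAGMENT_SIZE = 512  # bytes, as stated in the dataset description


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> List[bytes]:
    """Split raw bytes into full-size fragments.

    Assumption: a trailing piece shorter than `size` is discarded.
    """
    full = len(data) - len(data) % size
    return [data[i:i + size] for i in range(0, full, size)]


def coarse_label(fine_label: str) -> str:
    """Map a fine-grained label (e.g. 'MP3') to its broad class."""
    return COARSE_LABEL[fine_label.upper()]
```

For example, `coarse_label("mov")` yields the video class, and a 1300-byte input passed to `split_into_fragments` produces two full 512-byte fragments under the stated assumption.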



