Diagnostic Evaluation Dataset for Web of Science Smart Search vs. Advanced Search: A Controlled, Tiered Comparative Analysis with Query Parsing and Relevance Judgment Data
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/5bkxpvnktt
Description:
This dataset contains the complete evidence for a diagnostic evaluation of Web of Science (WoS) "Smart Search". The study hypothesized that Smart Search lacks the semantic understanding and precise control needed to substitute for the traditional Advanced Search in rigorous academic work.
Data Content & Structure:
The data are organized according to a Three-Tier Model of Retrieval Intelligence:
Tier 1 (Lexical): Tests on keyword processing, wildcards (*), spelling correction, and cross-lingual mapping.
Tier 2 (Pattern): Tests on Boolean logic recognition (case-sensitive AND/OR/NOT), field recognition (author:), and complex query execution.
Tier 3 (Semantic): Natural language query results and manual relevance judgments for two case studies (using prototyping... and review of...), demonstrating semantic failure.
Key Findings (Data-Driven):
Wildcards fail completely (e.g., cell* is run as cell).
Boolean logic is brittle (only uppercase AND/OR/NOT recognized).
Query expansion distorts intent (e.g., adding 90 unrelated documents to a precise Boolean query).
Semantic understanding collapses: Natural language is reduced to a "bag-of-words AND" strategy. Manual assessment shows high rates of thematic deviation (36%) and strategy failure (4% precision).
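The behaviors listed above can be illustrated with a toy model. The sketch below is not WoS code and makes no claim about how Smart Search is implemented. It only mimics the reported outcomes: operators are recognized only in uppercase, a trailing wildcard is dropped, and a natural language query is reduced to its content words joined by AND. The function names and the stop-word list are hypothetical.

```python
import re

# Operators recognized only in this exact (uppercase) form.
OPERATORS = {"AND", "OR", "NOT"}

# Hypothetical stop-word list, chosen for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and", "or", "not"}


def recognized_operators(query):
    """Return the tokens treated as Boolean operators under a case-sensitive rule."""
    return [token for token in query.split() if token in OPERATORS]


def bag_of_words_and(query):
    """Reduce a natural language query to its content words joined by AND.

    Non-alphanumeric characters (including the wildcard *) are discarded,
    which reproduces the reported 'cell* is run as cell' outcome.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    terms = [word for word in words if word not in STOP_WORDS]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return " AND ".join(dict.fromkeys(terms))


if __name__ == "__main__":
    print(recognized_operators("cats AND dogs"))
    print(recognized_operators("cats and dogs"))
    print(bag_of_words_and("cell*"))
    print(bag_of_words_and("review of the methods for prototyping"))
```

Under this model, word order, phrasing, and the relationships between concepts are all lost, which is one way to picture why a natural language question can retrieve thematically unrelated records.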
Data Collection Method:
Controlled, comparative experiment. Each Smart Search query was benchmarked against an equivalent, precisely formulated query in WoS Advanced Search. For Tier 3, random samples (n=50) were assessed by two independent coders. All searches were limited to the WoS Core Collection (pre-2025 publications).
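For readers reusing the Tier 3 relevance judgments, the following sketch shows how precision and inter-coder agreement could be computed from two coders' binary labels. It assumes labels are coded as 1 (relevant) and 0 (not relevant). Cohen's kappa is used here as a common agreement measure; the dataset description does not state which agreement statistic, if any, the study reports. All sample labels below are invented for illustration.

```python
from collections import Counter


def precision(judgments):
    """Share of sampled records judged relevant (1 = relevant, 0 = not)."""
    return sum(judgments) / len(judgments)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1:
        return 1.0  # Both coders used a single identical label throughout.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented labels, not taken from the dataset.
    coder_a = [1, 1, 0, 0]
    coder_b = [1, 0, 0, 0]
    print(cohens_kappa(coder_a, coder_b))

    # A sample of 50 with 2 relevant records yields a precision of 0.04.
    sample = [1, 1] + [0] * 48
    print(precision(sample))
```

If the 4% precision figure refers to a single sample of 50, it corresponds to 2 records judged relevant, as in the second example.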
Reuse & Interpretation:
The data support the associated paper's findings and can be reused for:
Verification & Replication: Repeat tests on WoS or apply the framework to other databases.
Methodology Template: The Three-Tier Model offers a structured approach for evaluating "smart" search interfaces.
Information Literacy: Demonstrates practical limits of automated search tools.
Note: Result counts are a snapshot; behavioral patterns (e.g., wildcard failure) are stable design features.
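When repeating the tests, one simple way to compare a Smart Search result set with its Advanced Search benchmark is to export the record identifiers from both and compute their overlap. The sketch below assumes each result set is available as a list of unique record identifiers; the identifiers and the function name are hypothetical, and the dataset's actual file layout may differ.

```python
def compare_result_sets(smart_ids, advanced_ids):
    """Summarize the overlap between two sets of record identifiers."""
    smart, advanced = set(smart_ids), set(advanced_ids)
    shared = smart & advanced
    union = smart | advanced
    return {
        "shared": len(shared),
        # Records returned only by Smart Search (e.g., added by query expansion).
        "smart_only": len(smart - advanced),
        # Records the benchmark query found that Smart Search missed.
        "advanced_only": len(advanced - smart),
        # Jaccard similarity; defined as 1.0 when both sets are empty.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    smart_results = ["A", "B", "C", "D"]
    advanced_results = ["B", "C", "E"]
    print(compare_result_sets(smart_results, advanced_results))
```

Because result counts are a snapshot, the absolute numbers from a rerun will likely differ from those in the dataset; the size of the `smart_only` group relative to the benchmark is the more comparable quantity.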
Created:
2026-02-02



