Dataset Search Command¶
The dataset-search command finds papers that use specific datasets, including benchmark results and performance metrics.
Basic Usage¶
Examples¶
Simple Dataset Search¶
# Find papers using ImageNet
scoutml dataset-search "ImageNet"
# Find papers using COCO
scoutml dataset-search "COCO"
# Domain-specific datasets
scoutml dataset-search "GLUE"
With Benchmark Results¶
# Include benchmark scores
scoutml dataset-search "ImageNet" --include-benchmarks
# Exclude benchmark tables
scoutml dataset-search "CIFAR-10" --no-benchmarks
Options¶
Option | Type | Default | Description |
---|---|---|---|
--limit | INTEGER | 20 | Number of results |
--include-benchmarks | FLAG | False | Include benchmark results |
--no-benchmarks | FLAG | False | Exclude benchmark results |
--year-min | INTEGER | None | Minimum publication year |
--year-max | INTEGER | None | Maximum publication year |
--output | CHOICE | table | Output format: table/json/csv |
--export | PATH | None | Export results to file |
Popular Datasets¶
Computer Vision¶
# Classification
scoutml dataset-search "ImageNet" --include-benchmarks
scoutml dataset-search "CIFAR-10"
scoutml dataset-search "CIFAR-100"
# Object Detection
scoutml dataset-search "COCO"
scoutml dataset-search "Pascal VOC"
scoutml dataset-search "Open Images"
# Segmentation
scoutml dataset-search "ADE20K"
scoutml dataset-search "Cityscapes"
Natural Language Processing¶
# Benchmarks
scoutml dataset-search "GLUE"
scoutml dataset-search "SuperGLUE"
scoutml dataset-search "SQuAD"
# Language Modeling
scoutml dataset-search "WikiText-103"
scoutml dataset-search "BookCorpus"
scoutml dataset-search "Common Crawl"
Multimodal¶
# Vision-Language
scoutml dataset-search "MS-COCO Captions"
scoutml dataset-search "Conceptual Captions"
scoutml dataset-search "LAION-400M"
Benchmark Analysis¶
Finding SOTA Results¶
# List recent ImageNet papers with benchmark scores to spot the current SOTA
scoutml dataset-search "ImageNet" \
--include-benchmarks \
--year-min 2022 \
--limit 10
Tracking Progress¶
# See improvement over time
scoutml dataset-search "GLUE" \
--include-benchmarks \
--year-min 2018 \
--output json \
--export glue_progress.json
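The exported file can then be summarized per year. The following Python sketch is illustrative and not part of ScoutML. It assumes the export uses the record shape shown under JSON Format, that `average` is a key inside `benchmark_results` (a hypothetical metric name), and that every `arxiv_id` is a new-style identifier (`YYMM.NNNNN`), from which the year is derived because the sample records carry no explicit year field. The helper names and the sample values are made up for the example.

```python
import json
from pathlib import Path


def arxiv_year(arxiv_id):
    """Year from a new-style arXiv ID such as '2201.03545' (YYMM.NNNNN)."""
    return 2000 + int(arxiv_id[:2])


def best_per_year(records, metric):
    """Map each year to the highest value of `metric` reported that year."""
    best = {}
    for rec in records:
        score = (rec.get("benchmark_results") or {}).get(metric)
        if score is None:
            continue  # this paper did not report the metric
        year = arxiv_year(rec["arxiv_id"])
        best[year] = max(score, best.get(year, score))
    return dict(sorted(best.items()))


def load_export(path):
    """Read a file written with --output json --export <path> (assumed to be a JSON array)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Inline sample so the sketch runs without the CLI; IDs and scores are made up.
sample = [
    {"arxiv_id": "1810.00001", "benchmark_results": {"average": 80.5}},
    {"arxiv_id": "1907.00002", "benchmark_results": {"average": 88.4}},
    {"arxiv_id": "1909.00003", "benchmark_results": {"average": 89.1}},
    {"arxiv_id": "2006.00004", "benchmark_results": {}},
]
progress = best_per_year(sample, "average")
print(progress)
```

For real data, replace `sample` with `load_export("glue_progress.json")`.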
Advanced Usage¶
Comparing Methods on Same Dataset¶
# Find diverse approaches
scoutml dataset-search "CIFAR-10" \
--include-benchmarks \
--limit 30 \
--year-min 2021
Dataset Combinations¶
Some papers use multiple datasets:
# Pre-training datasets
scoutml dataset-search "ImageNet-21K"
# Fine-tuning datasets
scoutml dataset-search "iNaturalist"
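To find papers that appear in both searches, the two result sets can be intersected on `arxiv_id`. This is a rough Python sketch, assuming each search was saved as a JSON array of records with an `arxiv_id` field (as in the JSON Format example). The function name and the sample records are hypothetical.

```python
def papers_in_both(first, second):
    """Records from `first` whose arxiv_id also appears in `second`."""
    ids = {rec["arxiv_id"] for rec in second}
    return [rec for rec in first if rec["arxiv_id"] in ids]


# Placeholder records standing in for two JSON exports.
pretrain = [
    {"arxiv_id": "0000.00001", "title": "Paper A"},
    {"arxiv_id": "0000.00002", "title": "Paper B"},
]
finetune = [
    {"arxiv_id": "0000.00002", "title": "Paper B"},
    {"arxiv_id": "0000.00003", "title": "Paper C"},
]
shared = papers_in_both(pretrain, finetune)
print([rec["title"] for rec in shared])
```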
Domain Transfer Studies¶
Output Formats¶
With Benchmarks¶
When using --include-benchmarks, the output shows:
- Paper details
- Model/method used
- Top-1 accuracy
- Top-5 accuracy
- Other metrics (FLOPs, parameters, etc.)
JSON Format¶
Returns:
[
  {
    "arxiv_id": "2201.03545",
    "title": "DETReg: Unsupervised Pretraining with...",
    "dataset_usage": "COCO object detection",
    "benchmark_results": {
      "mAP": 45.5,
      "AP50": 64.3,
      "AP75": 49.2
    }
  }
]
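A minimal Python sketch of how such output might be consumed: it ranks papers by one metric and skips records that do not report it. The function name `rank_by_metric` and the second sample record are hypothetical. Only the field names come from the example above.

```python
import json

raw = """
[
  {"arxiv_id": "2201.03545",
   "title": "DETReg: Unsupervised Pretraining with...",
   "dataset_usage": "COCO object detection",
   "benchmark_results": {"mAP": 45.5, "AP50": 64.3, "AP75": 49.2}},
  {"arxiv_id": "0000.00000",
   "title": "Placeholder paper without benchmarks",
   "dataset_usage": "COCO object detection",
   "benchmark_results": {}}
]
"""


def rank_by_metric(records, metric):
    """Return (score, title) pairs for papers reporting `metric`, best first."""
    scored = [
        (rec["benchmark_results"][metric], rec["title"])
        for rec in records
        if metric in (rec.get("benchmark_results") or {})
    ]
    return sorted(scored, reverse=True)


records = json.loads(raw)
ranking = rank_by_metric(records, "mAP")
for score, title in ranking:
    print(f"{score:6.1f}  {title}")
```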
Finding Datasets by Task¶
Classification Datasets¶
# Image Classification
scoutml dataset-search "ImageNet"
scoutml dataset-search "Places365"
scoutml dataset-search "iNaturalist"
# Fine-grained Classification
scoutml dataset-search "CUB-200"
scoutml dataset-search "Stanford Cars"
scoutml dataset-search "FGVC Aircraft"
Detection Datasets¶
# General Object Detection
scoutml dataset-search "COCO"
scoutml dataset-search "Objects365"
# Specific Domains
scoutml dataset-search "KITTI" # Autonomous driving
scoutml dataset-search "WiderFace" # Face detection
Segmentation Datasets¶
# Semantic Segmentation
scoutml dataset-search "ADE20K"
scoutml dataset-search "Cityscapes"
# Instance Segmentation
scoutml dataset-search "COCO" --include-benchmarks
Dataset-Specific Insights¶
Low-Resource Datasets¶
# Few-shot learning datasets
scoutml dataset-search "miniImageNet"
scoutml dataset-search "Omniglot"
Synthetic Datasets¶
Video Datasets¶
Best Practices¶
- Use exact names: "MS-COCO" or "COCO", not "coco dataset"
- Check variants: Some datasets have multiple versions
- Include benchmarks: For comparing performance across papers
- Filter by year: Recent papers often report stronger results, so use --year-min to focus on them
- Export results: For tracking SOTA progression
Benchmark Tracking¶
Creating SOTA Tables¶
# Export benchmark results
scoutml dataset-search "ImageNet" \
--include-benchmarks \
--year-min 2020 \
--output csv \
--export imagenet_sota.csv
Analyzing Trends¶
# JSON export for analysis
scoutml dataset-search "GLUE" \
--include-benchmarks \
--output json | \
jq '.[] | {paper: .title, score: .benchmark_results.average}'
Common Issues¶
Dataset Name Variations¶
Common alternatives:
- "MS-COCO" vs "COCO"
- "ImageNet" vs "ILSVRC"
- "Pascal VOC" vs "VOC2012"
If a search returns no results, try a different variation of the name.
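Trying variations can be scripted. The Python sketch below is one possible approach, not a ScoutML feature. It assumes `scoutml dataset-search NAME --output json` prints a JSON array to standard output (suggested by the jq pipeline under Analyzing Trends). `run_search` and `first_variant_with_results` are hypothetical helper names, and the demo uses a stand-in search function so it runs without the CLI installed.

```python
import json
import subprocess


def run_search(name):
    """Run the CLI and parse its output (assumes a JSON array on stdout)."""
    result = subprocess.run(
        ["scoutml", "dataset-search", name, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout or "[]")


def first_variant_with_results(variants, search=run_search):
    """Return (name, records) for the first variant that yields any results."""
    for name in variants:
        records = search(name)
        if records:
            return name, records
    return None, []


# Stand-in for the CLI with a placeholder record; pass search=run_search for real use.
fake_index = {"COCO": [{"arxiv_id": "0000.00000"}]}
name, records = first_variant_with_results(
    ["MS-COCO", "COCO"], search=lambda n: fake_index.get(n, [])
)
print(name, len(records))
```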
Missing Benchmarks¶
If --include-benchmarks returns no results:
1. The dataset might not have standardized metrics
2. Papers might not report comparable numbers
3. Try the search without --include-benchmarks first
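To check whether papers report comparable numbers, it can help to count how often each metric name appears. The Python sketch below assumes records shaped like the JSON Format example, with metrics stored as keys of `benchmark_results`. The function name and the sample records are hypothetical.

```python
from collections import Counter


def metric_coverage(records):
    """Count how many papers report each metric name."""
    counts = Counter()
    for rec in records:
        counts.update((rec.get("benchmark_results") or {}).keys())
    return counts


# Placeholder records; the metric values are made up.
sample = [
    {"arxiv_id": "0000.00001", "benchmark_results": {"mAP": 45.5, "AP50": 64.3}},
    {"arxiv_id": "0000.00002", "benchmark_results": {"mAP": 47.0}},
    {"arxiv_id": "0000.00003", "benchmark_results": {}},
]
coverage = metric_coverage(sample)
for metric, count in coverage.most_common():
    print(f"{metric}: reported by {count} of {len(sample)} papers")
```

A metric reported by only a few papers is a weak basis for comparison across the result set.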
Related Commands¶
- search - General paper search
- method-search - Search by methods