Similar Command¶
The similar command finds papers similar to a given paper or abstract text, using semantic similarity matching.
Basic Usage¶
Examples¶
Find Similar Papers by ID¶
# Basic similarity search
scoutml similar --paper-id 1810.04805
# More results
scoutml similar --paper-id 1810.04805 --limit 20
# Higher similarity threshold
scoutml similar --paper-id 1810.04805 --threshold 0.8
Find Similar Papers by Abstract¶
# Using abstract text
scoutml similar --abstract "We propose a new method for self-supervised learning that..."
# From file
scoutml similar --abstract-file my_abstract.txt
Options¶
Option | Type | Default | Description |
---|---|---|---|
--paper-id | TEXT | None | ArXiv ID of source paper |
--abstract | TEXT | None | Abstract text to match |
--abstract-file | PATH | None | File containing abstract |
--limit | INTEGER | 10 | Number of results |
--threshold | FLOAT | 0.7 | Similarity threshold (0-1) |
--output | CHOICE | table | Output format: table/json |
--export | PATH | None | Export results to file |
Input Methods¶
Using Paper ID¶
Most common approach:
Using Abstract Text¶
For unpublished work or ideas:
# Direct abstract
scoutml similar --abstract "This paper introduces a novel architecture for
efficient vision transformers that reduces computational complexity while
maintaining accuracy through adaptive token selection..."
Using Abstract File¶
For longer abstracts:
# Save abstract to file
cat > abstract.txt << EOF
We present a new approach to few-shot learning that combines
meta-learning with self-supervised pre-training. Our method...
EOF
# Search for similar papers
scoutml similar --abstract-file abstract.txt --limit 20
Understanding Similarity¶
Similarity Scores¶
- 0.9-1.0: Nearly identical research
- 0.8-0.9: Very similar approach/problem
- 0.7-0.8: Related work in same domain
- 0.6-0.7: Loosely related
- Below 0.6: Different domain/approach
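The bands above can be turned into a small lookup for labeling exported scores. This is a minimal sketch, not part of scoutml: `similarity_band` is a hypothetical helper, and because the listed ranges share their endpoints, it assumes each band includes its lower bound (so 0.8 is treated as "Very similar").

```shell
# Map a similarity score to the descriptive bands listed above.
# Assumption: each band includes its lower bound and excludes its upper
# bound; the documentation does not say which way boundary values fall.
similarity_band() {
  LC_ALL=C awk -v s="$1" 'BEGIN {
    # Reject anything that is not a plain decimal in [0, 1].
    if ((s !~ /^[0-9]*\.?[0-9]+$/) || ((s + 0) > 1)) { print "invalid score"; exit 1 }
    s += 0
    if      (s >= 0.9) print "Nearly identical research"
    else if (s >= 0.8) print "Very similar approach/problem"
    else if (s >= 0.7) print "Related work in same domain"
    else if (s >= 0.6) print "Loosely related"
    else               print "Different domain/approach"
  }'
}

similarity_band 0.89   # prints: Very similar approach/problem
```

The function only interprets a number, so it can be applied to the similarity field of any exported result.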
Threshold Settings¶
# High threshold for very similar papers
scoutml similar --paper-id 2103.00020 --threshold 0.85
# Lower threshold for broader exploration
scoutml similar --paper-id 2103.00020 --threshold 0.65 --limit 30
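When the threshold comes from a script variable or user input, it may help to check it against the documented 0-1 range before running a search. The sketch below is one way to do that; `valid_threshold` is a hypothetical helper, and how scoutml itself reacts to out-of-range values is not covered here.

```shell
# Hypothetical helper: succeed only for a plain decimal between 0 and 1,
# the range the Options table gives for --threshold.
valid_threshold() {
  printf '%s\n' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'
}

threshold=0.85
if valid_threshold "$threshold"; then
  echo "threshold ok: $threshold"
else
  echo "threshold must be a number between 0 and 1" >&2
fi
```

In the success branch, the value can then be passed on, for example as `scoutml similar --paper-id 2103.00020 --threshold "$threshold"`.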
Use Cases¶
Literature Review¶
Build comprehensive related work sections:
# Start with core paper
scoutml similar --paper-id 1706.03762 --limit 25 \
--export transformer_related.json
# Process results
cat transformer_related.json | \
jq -r '.[] | "\(.similarity)\t\(.arxiv_id)\t\(.title)"' | \
sort -rn
Research Validation¶
Check if your idea already exists:
# Write your abstract
cat > my_idea.txt << EOF
We propose using diffusion models for video generation by treating
temporal consistency as a denoising problem...
EOF
# Search for similar work
scoutml similar --abstract-file my_idea.txt --threshold 0.7
Finding Research Gaps¶
# Get similar papers
scoutml similar --paper-id 2010.11929 --limit 30 --output json > similar.json
# Inspect lower-similarity matches, which may point to less-covered angles
jq '.[] | select(.similarity < 0.8) | .title' similar.json
Advanced Usage¶
Building Paper Networks¶
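# Create network of related papers (breadth-first, capped)
explored=()
to_explore=("1810.04805")
max_papers=10  # stop after expanding this many papers
while [ ${#to_explore[@]} -gt 0 ] && [ ${#explored[@]} -lt "$max_papers" ]; do
  current="${to_explore[0]}"
  to_explore=("${to_explore[@]:1}")
  if [[ ! " ${explored[*]} " =~ " ${current} " ]]; then
    explored+=("$current")
    # Get similar papers, record them, and queue them for exploration
    while read -r id; do
      echo "$id" >> network.txt
      to_explore+=("$id")
    done < <(scoutml similar --paper-id "$current" --threshold 0.8 \
      --output json | jq -r '.[].arxiv_id')
  fi
done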
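Because the loop appends every match to network.txt, the same arXiv ID can appear several times. A possible follow-up, sketched here under the assumption that network.txt holds one ID per line, is to count how often each ID occurs; papers that recur are reachable from more than one starting point. `rank_network_nodes` is a hypothetical helper name.

```shell
# Count how often each arXiv ID appears in a one-ID-per-line file and
# print "count<TAB>id", most frequent first (ties broken by ID).
rank_network_nodes() {
  LC_ALL=C awk 'NF { count[$0]++ }
    END { for (id in count) print count[id] "\t" id }' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Calling `rank_network_nodes network.txt` after the crawl lists the most frequently matched papers at the top.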
Tracking Research Evolution¶
# Find papers building on specific work
BASE_PAPER="1706.03762" # Attention is All You Need
# Get direct extensions (high similarity)
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.85 --limit 10
# Get inspired work (medium similarity)
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.7 --limit 20
Competitive Analysis¶
# Your paper's abstract
MY_ABSTRACT="Our method combines..."
# Find competing approaches
scoutml similar --abstract "$MY_ABSTRACT" --threshold 0.75 \
--output json | \
jq '.[] | {paper: .arxiv_id, similarity: .similarity, title: .title}'
Output Formats¶
Table Format (Default)¶
Shows:
- Similarity score
- Paper title (linked)
- Authors
- Year
- Brief description
JSON Format¶
Returns:
[
{
"arxiv_id": "1907.11692",
"title": "RoBERTa: A Robustly Optimized BERT...",
"similarity": 0.89,
"authors": ["Yinhan Liu", ...],
"year": 2019,
"abstract": "Language model pretraining has..."
}
]
Best Practices¶
For Literature Reviews¶
- Start with high threshold (0.8+) for directly related work
- Lower threshold (0.7) for broader context
- Export results for systematic analysis
- Check multiple papers to find all relevant work
For Idea Validation¶
- Write detailed abstract with key technical terms
- Use moderate threshold (0.7-0.75)
- Check high-similarity matches carefully
- Read abstracts of matches to verify
For Research Exploration¶
- Start with seminal papers
- Use lower thresholds (0.65-0.7)
- Explore iteratively using found papers
- Track similarity scores to understand relationships
Common Workflows¶
Pre-submission Check¶
# Before submitting your paper
echo "Your abstract here..." > my_abstract.txt
# Check for similar work
scoutml similar --abstract-file my_abstract.txt \
--threshold 0.75 \
--limit 20 \
--export similar_work.json
# Review high-similarity papers
cat similar_work.json | jq '.[] | select(.similarity > 0.8)'
Building Reading Lists¶
# Core papers to explore
papers=("1706.03762" "1810.04805" "2010.11929")
# Find related papers for each
for paper in "${papers[@]}"; do
  echo "=== Papers similar to $paper ==="
  scoutml similar --paper-id "$paper" \
    --threshold 0.75 \
    --limit 5
done
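When several seed papers are searched, the same match can turn up more than once. The sketch below merges such results into a single ranked list. It assumes each search was run with --output json and converted to tab-separated rows (similarity, arXiv ID, title) with the jq expression shown in the Literature Review example; `merge_ranked` is a hypothetical helper name.

```shell
# Read "similarity<TAB>arxiv_id<TAB>title" rows on stdin, keep the
# highest-scoring row for each arXiv ID, and print them best-first.
merge_ranked() {
  tab=$(printf '\t')
  # Group rows by ID with the best score first, keep one row per ID,
  # then order the survivors by score (ties broken by ID).
  LC_ALL=C sort -t "$tab" -k2,2 -k1,1nr |
    awk -F '\t' '!seen[$2]++' |
    LC_ALL=C sort -t "$tab" -k1,1nr -k2,2
}
```

For example, `cat seed_*.tsv | merge_ranked > reading_list.tsv` would combine per-seed files into one list, where the file names are placeholders.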
Troubleshooting¶
No Similar Papers Found¶
If no results:
1. Lower the threshold (try 0.6)
2. Check if paper ID is correct
3. For abstracts, ensure sufficient detail
4. Try different keywords/phrasing
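For abstract-based searches, one rough way to catch an abstract that is too thin is a word-count check before searching. This is only an illustrative sketch: `abstract_is_detailed` is a hypothetical helper, and the 50-word cutoff is an arbitrary assumption rather than a scoutml requirement.

```shell
# Hypothetical pre-flight check: flag an abstract file that looks too short.
# MIN_WORDS is an arbitrary illustrative cutoff, not a scoutml requirement.
MIN_WORDS=50

# Exit status: 0 = long enough, 1 = too short, 2 = file not readable.
abstract_is_detailed() {
  [ -r "$1" ] || return 2
  words=$(wc -w < "$1" | tr -d ' ')
  [ "$words" -ge "$MIN_WORDS" ]
}
```

A script could run `abstract_is_detailed my_abstract.txt || echo "Consider adding more detail" >&2` before calling `scoutml similar --abstract-file my_abstract.txt`.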
Too Many Results¶
To focus results:
1. Increase threshold (0.8+)
2. Reduce limit
3. Be more specific in abstract
4. Filter by year after export