Similar Command

The similar command finds papers that are semantically similar to a given paper (identified by its arXiv ID) or to a piece of abstract text.

Basic Usage

scoutml similar [OPTIONS]

Examples

Find Similar Papers by ID

# Basic similarity search
scoutml similar --paper-id 1810.04805

# More results
scoutml similar --paper-id 1810.04805 --limit 20

# Higher similarity threshold
scoutml similar --paper-id 1810.04805 --threshold 0.8

Find Similar Papers by Abstract

# Using abstract text
scoutml similar --abstract "We propose a new method for self-supervised learning that..."

# From file
scoutml similar --abstract-file my_abstract.txt

Options

Option	Type	Default	Description
`--paper-id`	TEXT	None	arXiv ID of the source paper
`--abstract`	TEXT	None	Abstract text to match against
`--abstract-file`	PATH	None	Text file containing the abstract to match
`--limit`	INTEGER	10	Maximum number of results to return
`--threshold`	FLOAT	0.7	Minimum similarity score (0-1) for a result to be included
`--output`	CHOICE	table	Output format: table or json
`--export`	PATH	None	Export results to a file

Input Methods

Using Paper ID

Most common approach:

# Find papers similar to BERT
scoutml similar --paper-id 1810.04805 --limit 15

Using Abstract Text

For unpublished work or ideas:

# Direct abstract
scoutml similar --abstract "This paper introduces a novel architecture for 
efficient vision transformers that reduces computational complexity while 
maintaining accuracy through adaptive token selection..."

Using Abstract File

For longer abstracts:

# Save abstract to file
cat > abstract.txt << 'EOF'
We present a new approach to few-shot learning that combines 
meta-learning with self-supervised pre-training. Our method...
EOF

# Search for similar papers
scoutml similar --abstract-file abstract.txt --limit 20

Understanding Similarity

Similarity Scores

0.9-1.0: Nearly identical research
0.8-0.9: Very similar approach/problem
0.7-0.8: Related work in same domain
0.6-0.7: Loosely related
Below 0.6: Different domain/approach
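
The bands above can be applied mechanically when post-processing results. The sketch below is a hypothetical helper, not part of scoutml; because the listed ranges share their endpoints, it assumes a score exactly on a boundary (for example 0.8) belongs to the higher band.

```shell
# Hypothetical helper: map a similarity score (0-1) to the bands listed above.
# Assumption: a score exactly on a boundary goes into the higher band.
similarity_band() {
    awk -v v="$1" 'BEGIN {
        s = v + 0    # force numeric comparison
        if (s >= 0.9)      print "Nearly identical research"
        else if (s >= 0.8) print "Very similar approach/problem"
        else if (s >= 0.7) print "Related work in same domain"
        else if (s >= 0.6) print "Loosely related"
        else               print "Different domain/approach"
    }'
}

# Example: label a few scores
for score in 0.93 0.81 0.72 0.55; do
    printf '%s\t%s\n' "$score" "$(similarity_band "$score")"
done
```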

Threshold Settings

# High threshold for very similar papers
scoutml similar --paper-id 2103.00020 --threshold 0.85

# Lower threshold for broader exploration
scoutml similar --paper-id 2103.00020 --threshold 0.65 --limit 30
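
To pick a threshold for a particular paper, it can help to see how the number of matches changes as the threshold drops. This is a minimal sketch of a hypothetical `threshold_sweep` helper that uses only the options documented above; it assumes each result in the JSON output contains an `arxiv_id` key (as in the JSON Format example), and the counts it reports are capped by `--limit`.

```shell
# Hypothetical helper: count results at each threshold for one paper.
# Assumes every JSON result object has an "arxiv_id" key; counts are capped at --limit.
threshold_sweep() {
    paper_id="$1"
    shift
    for t in "$@"; do
        # Count occurrences of the "arxiv_id" key, one per result object
        n=$(scoutml similar --paper-id "$paper_id" --threshold "$t" \
                --limit 30 --output json | grep -o '"arxiv_id"' | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$t" "$n"
    done
}

# Usage (requires scoutml):
# threshold_sweep 2103.00020 0.85 0.8 0.75 0.7 0.65
```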

Use Cases

Literature Review

Build comprehensive related work sections:

# Start with core paper
scoutml similar --paper-id 1706.03762 --limit 25 \
    --export transformer_related.json

# Process results
jq -r '.[] | "\(.similarity)\t\(.arxiv_id)\t\(.title)"' transformer_related.json | \
    sort -rn
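
The sorted list can serve as a starting point for a related-work section. The hypothetical `to_reading_list` helper below assumes the tab-separated `similarity`, `arxiv_id`, `title` lines produced by the jq command above, and builds links with the standard `https://arxiv.org/abs/<id>` URL pattern.

```shell
# Hypothetical helper: turn "similarity<TAB>arxiv_id<TAB>title" lines
# into a numbered reading list with arXiv abstract links.
to_reading_list() {
    awk -F '\t' '{ printf "%d. %s (%.2f) https://arxiv.org/abs/%s\n", NR, $3, $1, $2 }'
}

# Usage:
# jq -r '.[] | "\(.similarity)\t\(.arxiv_id)\t\(.title)"' transformer_related.json |
#     sort -rn | to_reading_list > related_work.txt
```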

Research Validation

Check if your idea already exists:

# Write your abstract
cat > my_idea.txt << 'EOF'
We propose using diffusion models for video generation by treating 
temporal consistency as a denoising problem...
EOF

# Search for similar work
scoutml similar --abstract-file my_idea.txt --threshold 0.7

Finding Research Gaps

# Get similar papers
scoutml similar --paper-id 2010.11929 --limit 30 --output json > similar.json

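# List the less closely related papers (similarity below 0.8); they diverge from
# the source paper, and topics that appear only here may point to open directions
jq -r '.[] | select(.similarity < 0.8) | .title' similar.json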

Advanced Usage

Building Paper Networks

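# Create a network of related papers (breadth-first crawl)
explored=()
to_explore=("1810.04805")
max_papers=20    # safety cap on how many papers to query; adjust as needed
: > network.txt  # start with an empty edge list

while [ ${#to_explore[@]} -gt 0 ] && [ ${#explored[@]} -lt "$max_papers" ]; do
    current="${to_explore[0]}"
    to_explore=("${to_explore[@]:1}")

    # Skip papers that have already been queried
    if [[ " ${explored[*]} " == *" ${current} "* ]]; then
        continue
    fi
    explored+=("$current")

    # Record each similar paper as a "source<TAB>target" edge and queue it for exploration
    while read -r neighbor; do
        [ -n "$neighbor" ] || continue
        printf '%s\t%s\n' "$current" "$neighbor" >> network.txt
        to_explore+=("$neighbor")
    done < <(scoutml similar --paper-id "$current" --threshold 0.8 \
                 --output json | jq -r '.[].arxiv_id')
done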
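
Once the crawl finishes, papers reached from several starting points tend to be central to the cluster. A rough way to spot them is to count how often each ID appears in `network.txt`. The `network_hubs` helper below is a hypothetical sketch that reads the last tab-separated field of each line, so it works with a file of plain IDs or of source/target pairs.

```shell
# Hypothetical helper: rank papers by how often they were reached.
# Arguments: file (default network.txt), number of rows to show (default 10).
network_hubs() {
    awk -F '\t' 'NF { print $NF }' "${1:-network.txt}" |
        sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Usage:
# network_hubs network.txt 10
```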

Tracking Research Evolution

# Find closely related follow-up work on a specific paper
# (similarity measures content overlap, not citations)
BASE_PAPER="1706.03762"  # Attention Is All You Need

# Closest matches (high similarity), likely to include direct extensions
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.85 --limit 10

# Broader related work (medium similarity); this also includes the matches above
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.7 --limit 20
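
To isolate the medium tier, subtract the high-threshold IDs from the broader list. The sketch below assumes each run's IDs were saved to a file (`high.txt` and `medium.txt` are placeholder names), and `only_in_second` is a hypothetical helper built on `sort` and `comm`. The split is only exact when neither query is truncated by its `--limit`.

```shell
# Hypothetical helper: print IDs that appear in the second file but not the first.
only_in_second() {
    a=$(mktemp)
    b=$(mktemp)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -13 "$a" "$b"   # -1 and -3 hide lines unique to file 1 and lines in both
    rm -f "$a" "$b"
}

# Usage (file names are placeholders):
# scoutml similar --paper-id "$BASE_PAPER" --threshold 0.85 --limit 10 --output json | jq -r '.[].arxiv_id' > high.txt
# scoutml similar --paper-id "$BASE_PAPER" --threshold 0.7 --limit 20 --output json | jq -r '.[].arxiv_id' > medium.txt
# only_in_second high.txt medium.txt
```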

Competitive Analysis

# Your paper's abstract
MY_ABSTRACT="Our method combines..."

# Find competing approaches
scoutml similar --abstract "$MY_ABSTRACT" --threshold 0.75 \
    --output json | \
    jq '.[] | {paper: .arxiv_id, similarity: .similarity, title: .title}'

Output Formats

Table Format (Default)

scoutml similar --paper-id 1810.04805 --limit 5

Shows:
- Similarity score
- Paper title (linked)
- Authors
- Year
- Brief description

JSON Format

scoutml similar --paper-id 1810.04805 --output json

Returns:

[
  {
    "arxiv_id": "1907.11692",
    "title": "RoBERTa: A Robustly Optimized BERT...",
    "similarity": 0.89,
    "authors": ["Yinhan Liu", ...],
    "year": 2019,
    "abstract": "Language model pretraining has..."
  }
]

Best Practices

For Literature Reviews

Start with a high threshold (0.8+) for directly related work
Lower the threshold (0.7) for broader context
Export results for systematic analysis
Query several core papers to find all relevant work

For Idea Validation

Write a detailed abstract that includes key technical terms
Use a moderate threshold (0.7-0.75)
Read the abstracts of high-similarity matches to verify any overlap

For Research Exploration

Start with seminal papers
Use lower thresholds (0.65-0.7)
Explore iteratively, using the papers you find as new starting points
Track similarity scores to understand how papers relate

Common Workflows

Pre-submission Check

# Before submitting your paper
echo "Your abstract here..." > my_abstract.txt

# Check for similar work
scoutml similar --abstract-file my_abstract.txt \
    --threshold 0.75 \
    --limit 20 \
    --export similar_work.json

# Review high-similarity papers
jq '.[] | select(.similarity > 0.8)' similar_work.json
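
For a scripted pre-submission check, the review step can be turned into a pass/fail gate. This hypothetical `check_max_similarity` helper reads one similarity score per line (for example from `jq -r '.[].similarity'`), prints the highest score, and exits with status 1 when it exceeds the cutoff. It assumes the exported file uses the JSON layout shown under JSON Format.

```shell
# Hypothetical helper: fail (exit 1) if any score on stdin exceeds the cutoff.
check_max_similarity() {
    awk -v c="$1" '
        BEGIN { cut = c + 0; top = 0 }
        NF    { if ($1 + 0 > top) top = $1 + 0 }   # track the highest score
        END   {
            printf "highest similarity: %.2f\n", top
            if (top > cut) exit 1
        }'
}

# Usage:
# jq -r '.[].similarity' similar_work.json | check_max_similarity 0.8 ||
#     echo "Review close matches before submitting"
```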

Building Reading Lists

# Core papers to explore
papers=("1706.03762" "1810.04805" "2010.11929")

# Find related papers for each
for paper in "${papers[@]}"; do
    echo "=== Papers similar to $paper ==="
    scoutml similar --paper-id "$paper" \
        --threshold 0.75 \
        --limit 5
done
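
When several core papers share neighbors, the per-paper lists above overlap. This sketch merges them into one deduplicated reading list; `merge_reading_list` is a hypothetical helper that reads arXiv IDs on stdin and drops the seed papers passed as arguments.

```shell
# Hypothetical helper: deduplicate IDs from stdin and remove the seed papers.
merge_reading_list() {
    seeds=$(printf '%s\n' "$@")      # one seed ID per line
    sort -u | grep -vxF -e "$seeds"  # -x: whole-line match, -F: fixed strings
}

# Usage (bash, reusing the papers array above):
# for paper in "${papers[@]}"; do
#     scoutml similar --paper-id "$paper" --threshold 0.75 --limit 5 \
#         --output json | jq -r '.[].arxiv_id'
# done | merge_reading_list "${papers[@]}" > reading_list.txt
```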

Troubleshooting

No Similar Papers Found

If no results are returned:
1. Lower the threshold (try 0.6)
2. Check that the paper ID is correct
3. For abstracts, include sufficient technical detail
4. Try different keywords or phrasing
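
A quick format check catches malformed IDs (for example a full URL pasted in place of the ID) before any query is sent. This hypothetical `is_arxiv_id` helper accepts new-style identifiers such as `1810.04805` (with an optional version suffix like `v2`) and old-style ones such as `hep-th/9901001`; whether scoutml itself accepts version suffixes or old-style IDs is not stated on this page.

```shell
# Hypothetical helper: rough arXiv ID format check (exit 0 if the format looks valid).
# New style: YYMM.NNNN or YYMM.NNNNN; old style: archive[.XX]/YYMMNNN; optional vN suffix.
is_arxiv_id() {
    printf '%s\n' "$1" |
        grep -Eq '^([0-9]{4}\.[0-9]{4,5}|[a-z-]+(\.[A-Z]{2})?/[0-9]{7})(v[0-9]+)?$'
}

# Usage:
# id="1810.04805"
# if is_arxiv_id "$id"; then scoutml similar --paper-id "$id"; else echo "Not an arXiv ID: $id" >&2; fi
```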

Too Many Results

To focus the results:
1. Increase the threshold (0.8+)
2. Reduce the limit
3. Make the abstract more specific
4. Filter by year after exporting
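
Filtering by year can be done on the exported JSON using its `year` field. The sketch below is one way to do it with standard text tools: jq flattens each result into a tab-separated line, and the hypothetical `filter_since` helper keeps lines whose year is at or after the given value.

```shell
# Hypothetical helper: keep "year<TAB>arxiv_id<TAB>title" lines from a given year onward.
filter_since() {
    awk -F '\t' -v y="$1" '$1 + 0 >= y + 0'
}

# Usage, assuming results were exported as JSON with a "year" field:
# jq -r '.[] | "\(.year)\t\(.arxiv_id)\t\(.title)"' similar_work.json | filter_since 2020
```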

Related Commands

paper - Get details on specific papers
compare - Compare multiple papers
search - Search with custom queries
review - Generate literature review