Similar Command

The similar command finds papers similar to a given paper or abstract text, using semantic similarity matching.

Basic Usage

scoutml similar [OPTIONS]

Examples

Find Similar Papers by ID

# Basic similarity search
scoutml similar --paper-id 1810.04805

# More results
scoutml similar --paper-id 1810.04805 --limit 20

# Higher similarity threshold
scoutml similar --paper-id 1810.04805 --threshold 0.8

Find Similar Papers by Abstract

# Using abstract text
scoutml similar --abstract "We propose a new method for self-supervised learning that..."

# From file
scoutml similar --abstract-file my_abstract.txt

Options

Option           Type     Default  Description
--paper-id       TEXT     None     arXiv ID of source paper
--abstract       TEXT     None     Abstract text to match
--abstract-file  PATH     None     File containing abstract
--limit          INTEGER  10       Number of results
--threshold      FLOAT    0.7      Similarity threshold (0-1)
--output         CHOICE   table    Output format: table/json
--export         PATH     None     Export results to file
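
The value ranges in the table can be checked in a script before the command is called. The helpers below are a minimal sketch: valid_threshold and valid_limit are hypothetical names, not part of scoutml, and scoutml may well perform its own validation.

```shell
# Hypothetical pre-flight checks for values passed to `scoutml similar`.

# Succeeds if $1 is a plain decimal number between 0 and 1 inclusive.
valid_threshold() {
    awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t + 0 <= 1) }'
}

# Succeeds if $1 is a positive integer without leading zeros.
valid_limit() {
    case $1 in
        '' | *[!0-9]* | 0*) return 1 ;;
    esac
    return 0
}

# Usage:
#   valid_threshold "$t" && valid_limit "$n" &&
#       scoutml similar --paper-id 1810.04805 --threshold "$t" --limit "$n"
```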

Input Methods

Using Paper ID

Most common approach:

# Find papers similar to BERT
scoutml similar --paper-id 1810.04805 --limit 15

Using Abstract Text

For unpublished work or ideas:

# Direct abstract
scoutml similar --abstract "This paper introduces a novel architecture for 
efficient vision transformers that reduces computational complexity while 
maintaining accuracy through adaptive token selection..."

Using Abstract File

For longer abstracts:

# Save abstract to file (quoting 'EOF' keeps $ and backticks in the text literal)
cat > abstract.txt << 'EOF'
We present a new approach to few-shot learning that combines 
meta-learning with self-supervised pre-training. Our method...
EOF

# Search for similar papers
scoutml similar --abstract-file abstract.txt --limit 20

Understanding Similarity

Similarity Scores

  • 0.9-1.0: Nearly identical research
  • 0.8-0.9: Very similar approach/problem
  • 0.7-0.8: Related work in same domain
  • 0.6-0.7: Loosely related
  • Below 0.6: Different domain/approach
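
When post-processing scores in a script, the bands above can be turned into labels. This is a small sketch: similarity_band is a hypothetical helper, and it assumes each band includes its lower bound (so 0.8 counts as "Very similar"), which the list above leaves open.

```shell
# Map a similarity score to the band labels listed above.
# Assumption: a score on a boundary belongs to the higher band.
similarity_band() {
    awk -v s="$1" 'BEGIN {
        s = s + 0   # force numeric comparison
        if      (s >= 0.9) print "Nearly identical research"
        else if (s >= 0.8) print "Very similar approach/problem"
        else if (s >= 0.7) print "Related work in same domain"
        else if (s >= 0.6) print "Loosely related"
        else               print "Different domain/approach"
    }'
}

# Usage:
#   similarity_band 0.89
```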

Threshold Settings

# High threshold for very similar papers
scoutml similar --paper-id 2103.00020 --threshold 0.85

# Lower threshold for broader exploration
scoutml similar --paper-id 2103.00020 --threshold 0.65 --limit 30
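
To pick a threshold, it can help to see how many results each setting returns. The sketch below assumes that --output json prints a JSON array (as shown under JSON Format) and that jq is installed; threshold_sweep is a hypothetical helper name.

```shell
# Print "threshold<TAB>result count" for each threshold given after the paper ID.
threshold_sweep() {
    local paper_id="$1" t count
    shift
    for t in "$@"; do
        count=$(scoutml similar --paper-id "$paper_id" --threshold "$t" \
            --limit 30 --output json | jq 'length')
        printf '%s\t%s\n' "$t" "$count"
    done
}

# Usage:
#   threshold_sweep 2103.00020 0.85 0.75 0.65
```

Counts are capped by --limit, so a count equal to the limit means more papers may exist at that threshold.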

Use Cases

Literature Review

Build comprehensive related work sections:

# Start with core paper
scoutml similar --paper-id 1706.03762 --limit 25 \
    --export transformer_related.json

# Process results: one line per paper, highest similarity first
jq -r '.[] | "\(.similarity)\t\(.arxiv_id)\t\(.title)"' transformer_related.json | \
    sort -rn

Research Validation

Check if your idea already exists:

# Write your abstract (quoting 'EOF' keeps $ and backticks in the text literal)
cat > my_idea.txt << 'EOF'
We propose using diffusion models for video generation by treating 
temporal consistency as a denoising problem...
EOF

# Search for similar work
scoutml similar --abstract-file my_idea.txt --threshold 0.7
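
In a script, the check can be reduced to a yes/no answer. The following sketch assumes JSON output in the shape shown under JSON Format and requires jq; has_close_match is a hypothetical helper, and the 0.8 default cutoff is only a starting point.

```shell
# Succeeds (exit status 0) when any result on stdin has a similarity
# at or above the cutoff (default 0.8); fails otherwise, including for [].
has_close_match() {
    jq -e --argjson cutoff "${1:-0.8}" \
        'any(.[]; .similarity >= $cutoff)' > /dev/null
}

# Usage:
#   if scoutml similar --abstract-file my_idea.txt --output json | has_close_match 0.8
#   then echo "Close matches exist; read them before continuing"
#   fi
```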

Finding Research Gaps

# Get similar papers
scoutml similar --paper-id 2010.11929 --limit 30 --output json > similar.json

# List the less similar results (below 0.8): adjacent topics that may be less explored
jq '.[] | select(.similarity < 0.8) | .title' similar.json

Advanced Usage

Building Paper Networks

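# Create network of related papers (requires bash; breadth-first, capped)
explored=()
to_explore=("1810.04805")
max_papers=20   # stop after exploring this many papers

while [ ${#to_explore[@]} -gt 0 ] && [ ${#explored[@]} -lt "$max_papers" ]; do
    current="${to_explore[0]}"
    to_explore=("${to_explore[@]:1}")

    if [[ ! " ${explored[*]} " =~ " ${current} " ]]; then
        explored+=("$current")

        # Get similar papers, record them, and queue them for exploration
        while read -r id; do
            echo "$id" >> network.txt
            to_explore+=("$id")
        done < <(scoutml similar --paper-id "$current" --threshold 0.8 \
            --output json | jq -r '.[].arxiv_id')
    fi
done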

Tracking Research Evolution

# Find papers building on specific work
BASE_PAPER="1706.03762"  # Attention is All You Need

# Get direct extensions (high similarity)
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.85 --limit 10

# Get inspired work (medium similarity)
scoutml similar --paper-id "$BASE_PAPER" --threshold 0.7 --limit 20
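
To see how follow-up work is spread over time, the results can be counted per year. This is a sketch that assumes the JSON output carries a numeric year field, as in the sample under JSON Format, and that jq is installed; results_by_year is a hypothetical helper.

```shell
# Count results per publication year, oldest year first.
# Reads JSON results on stdin, prints "year<TAB>count" lines.
results_by_year() {
    jq -r 'group_by(.year) | .[] | "\(.[0].year)\t\(length)"'
}

# Usage:
#   scoutml similar --paper-id 1706.03762 --threshold 0.7 --limit 20 \
#       --output json | results_by_year
```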

Competitive Analysis

# Your paper's abstract
MY_ABSTRACT="Our method combines..."

# Find competing approaches
scoutml similar --abstract "$MY_ABSTRACT" --threshold 0.75 \
    --output json | \
    jq '.[] | {paper: .arxiv_id, similarity: .similarity, title: .title}'

Output Formats

Table Format (Default)

scoutml similar --paper-id 1810.04805 --limit 5

Shows:

  • Similarity score
  • Paper title (linked)
  • Authors
  • Year
  • Brief description

JSON Format

scoutml similar --paper-id 1810.04805 --output json

Returns:

[
  {
    "arxiv_id": "1907.11692",
    "title": "RoBERTa: A Robustly Optimized BERT...",
    "similarity": 0.89,
    "authors": ["Yinhan Liu", ...],
    "year": 2019,
    "abstract": "Language model pretraining has..."
  }
]
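
For spreadsheets, the JSON can be converted to CSV with jq. The sketch below relies only on the field names shown in the sample above; results_to_csv is a hypothetical helper, and joining authors with "; " is an arbitrary choice.

```shell
# Convert JSON results (stdin) to CSV with a header row.
# jq's @csv quotes strings and escapes embedded quotes; numbers stay bare.
results_to_csv() {
    jq -r '["arxiv_id", "similarity", "year", "title", "authors"],
           (.[] | [.arxiv_id, .similarity, .year, .title, (.authors | join("; "))])
           | @csv'
}

# Usage:
#   scoutml similar --paper-id 1810.04805 --output json | results_to_csv > similar.csv
```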

Best Practices

For Literature Reviews

  1. Start with high threshold (0.8+) for directly related work
  2. Lower threshold (0.7) for broader context
  3. Export results for systematic analysis
  4. Check multiple papers to find all relevant work

For Idea Validation

  1. Write detailed abstract with key technical terms
  2. Use moderate threshold (0.7-0.75)
  3. Check high-similarity matches carefully
  4. Read abstracts of matches to verify

For Research Exploration

  1. Start with seminal papers
  2. Use lower thresholds (0.65-0.7)
  3. Explore iteratively using found papers
  4. Track similarity scores to understand relationships

Common Workflows

Pre-submission Check

# Before submitting your paper
echo "Your abstract here..." > my_abstract.txt

# Check for similar work
scoutml similar --abstract-file my_abstract.txt \
    --threshold 0.75 \
    --limit 20 \
    --export similar_work.json

# Review high-similarity papers
jq '.[] | select(.similarity > 0.8)' similar_work.json

Building Reading Lists

# Core papers to explore
papers=("1706.03762" "1810.04805" "2010.11929")

# Find related papers for each
for paper in "${papers[@]}"; do
    echo "=== Papers similar to $paper ==="
    scoutml similar --paper-id "$paper" \
        --threshold 0.75 \
        --limit 5
done
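
A paper that is similar to several seed papers shows up once per seed in the loop above. If each search is saved as JSON instead, the lists can be merged into one deduplicated reading list. This is a sketch: merge_results is a hypothetical helper, it needs jq, and it assumes every file holds a non-empty array in the shape shown under JSON Format.

```shell
# Merge several JSON result files into one array with a single entry per
# arXiv ID (the entry with the highest similarity), most similar first.
merge_results() {
    jq -s 'add
           | group_by(.arxiv_id)
           | map(max_by(.similarity))
           | sort_by(-.similarity)' "$@"
}

# Usage:
#   for paper in 1706.03762 1810.04805 2010.11929; do
#       scoutml similar --paper-id "$paper" --threshold 0.75 --limit 5 \
#           --output json > "similar_$paper.json"
#   done
#   merge_results similar_*.json > reading_list.json
```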

Troubleshooting

No Similar Papers Found

If no results:

  1. Lower the threshold (try 0.6)
  2. Check if paper ID is correct
  3. For abstracts, ensure sufficient detail
  4. Try different keywords/phrasing
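
Lowering the threshold can be automated by trying several values in turn. The sketch below makes two assumptions that are not confirmed here: that an empty result is printed as [] with --output json, and that scoutml exits with a non-zero status on errors such as an unknown paper ID. similar_relaxed is a hypothetical helper and needs jq.

```shell
# Try thresholds from strict to loose and print the first non-empty result.
# Usage: similar_relaxed PAPER_ID [THRESHOLD ...]   (default: 0.8 0.7 0.6)
similar_relaxed() {
    local paper_id="$1" t results count
    shift
    [ $# -gt 0 ] || set -- 0.8 0.7 0.6
    for t in "$@"; do
        results=$(scoutml similar --paper-id "$paper_id" \
            --threshold "$t" --output json) || return 1
        count=$(printf '%s' "$results" | jq 'length' 2>/dev/null)
        if [ "${count:-0}" -gt 0 ]; then
            echo "Results found at threshold $t" >&2
            printf '%s\n' "$results"
            return 0
        fi
    done
    echo "No results down to threshold $t" >&2
    return 1
}
```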

Too Many Results

To focus results:

  1. Increase threshold (0.8+)
  2. Reduce limit
  3. Be more specific in abstract
  4. Filter by year after export
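
Filtering by year after export can be done with jq. This sketch assumes the exported file is a JSON array with a numeric year field, as in the sample under JSON Format; filter_since is a hypothetical helper.

```shell
# Keep only results published in or after the given year.
# Reads JSON results on stdin and prints the filtered array.
filter_since() {
    jq --argjson min_year "$1" '[.[] | select(.year >= $min_year)]'
}

# Usage:
#   filter_since 2020 < similar_work.json > recent_work.json
```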

Related Commands

  • paper - Get details on specific papers
  • compare - Compare multiple papers
  • search - Search with custom queries
  • review - Generate literature review