Best Practices¶
This guide covers best practices for using ScoutML effectively and efficiently.
Search Strategies¶
Start Broad, Then Narrow¶
Begin with general queries and progressively refine:
# Too specific initially
❌ scoutml search "BERT fine-tuning for biomedical NER with BiLSTM-CRF on PubMed abstracts"
# Better approach
✅ scoutml search "biomedical NER" --limit 50
✅ scoutml search "BERT biomedical" --limit 30
✅ scoutml search "BERT NER PubMed" --limit 20
Use Technical Terminology¶
Use precise technical terms for better results:
# Vague terms
❌ scoutml search "AI for images"
❌ scoutml search "better neural networks"
# Specific technical terms
✅ scoutml search "convolutional neural networks"
✅ scoutml search "vision transformer architectures"
Combine Search Commands¶
Use different search commands for comprehensive results:
# Complete search strategy
TOPIC="attention mechanisms"
# 1. General search
scoutml search "$TOPIC" --limit 30
# 2. Method-specific search
scoutml method-search "self-attention" --sort-by citations
# 3. Dataset-based search
scoutml dataset-search "ImageNet" --include-benchmarks | grep -i attention
# 4. Find similar papers to best result
scoutml similar --paper-id 1706.03762 --threshold 0.75
Efficient Workflows¶
Batch Processing¶
Process multiple items efficiently:
# Inefficient: Sequential processing
❌ for paper in "${papers[@]}"; do
scoutml paper "$paper"
sleep 1
done
# Efficient: Parallel processing
✅ printf '%s\n' "${papers[@]}" | \
xargs -P 4 -I {} scoutml paper {} --output json > results.json
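One caveat: with `-P 4`, four concurrent processes write to the same `results.json`, so large JSON outputs can interleave. A pattern that avoids this writes one file per item and concatenates afterwards, sketched here with `echo` as a stand-in for the real `scoutml paper` call:

```shell
#!/usr/bin/env bash
papers=(1706.03762 1810.04805 2005.14165)

outdir="$(mktemp -d)"
# One output file per paper, so parallel writers never share a file.
# Replace the inner 'echo' with: scoutml paper "$2" --output json
printf '%s\n' "${papers[@]}" | \
    xargs -P 4 -I {} sh -c 'echo "{\"id\":\"$2\"}" > "$1/$2.json"' _ "$outdir" {}

# Merge the per-paper files once all writers have finished.
cat "$outdir"/*.json > results.json
```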
Cache Intermediate Results¶
Save API calls by caching results:
# Create cache directory
CACHE_DIR="$HOME/.scoutml_cache"
mkdir -p "$CACHE_DIR"
# Function to cache results
cached_search() {
    local query="$1"
    local cache_file="$CACHE_DIR/$(echo "$query" | md5sum | cut -d' ' -f1).json"
    # Reuse the cached copy if it is less than 7 days old
    if [ -f "$cache_file" ] && [ -n "$(find "$cache_file" -mtime -7)" ]; then
        cat "$cache_file"
    else
        scoutml search "$query" --output json | tee "$cache_file"
    fi
}
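The freshness check used by `cached_search` can be factored into a reusable helper. A minimal sketch (the `cache_fresh` name and the 7-day window are illustrative, not part of ScoutML):

```shell
#!/usr/bin/env bash
# cache_fresh FILE DAYS: succeed if FILE exists and was modified
# within the last DAYS days; fail otherwise.
cache_fresh() {
    local file="$1" days="$2"
    [ -f "$file" ] && [ -n "$(find "$file" -mtime "-$days")" ]
}

# Example: a file created just now is fresh; a missing file is not.
tmp="$(mktemp)"
cache_fresh "$tmp" 7 && echo "cache hit"
cache_fresh "/no/such/file" 7 || echo "cache miss"
rm -f "$tmp"
```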
Pipeline Design¶
Build efficient data pipelines:
# Good pipeline design
✅ scoutml search "transformer" --output json | \
    jq '[.[] | select(.citations > 100)] | sort_by(.year) | reverse | .[0:10]' \
    > top_transformers.json
# Avoid multiple API calls
❌ scoutml search "transformer" --limit 100
❌ # Then manually filter...
❌ # Then search again...
Data Management¶
Output Format Selection¶
Choose the right format for your use case:
| Use Case | Recommended Format | Example |
|---|---|---|
| Visual inspection | `rich` or `table` | `--output rich` |
| Data processing | `json` | `--output json` |
| Spreadsheets | `csv` | `--output csv` |
| Documentation | `markdown` | `--output markdown` |
Structured Data Storage¶
Organize outputs systematically:
# Good structure
PROJECT_DIR="research_project"
mkdir -p "$PROJECT_DIR"/{searches,papers,implementations,analysis}
# Save with descriptive names
scoutml search "federated learning" \
--output json \
--export "$PROJECT_DIR/searches/federated_learning_$(date +%Y%m%d).json"
Version Control¶
Track your research:
# Initialize git repository
cd "$PROJECT_DIR"
git init
# Create .gitignore
cat > .gitignore << EOF
*.log
.env
cache/
*.tmp
EOF
# Commit research artifacts
git add searches/*.json
git commit -m "Add federated learning search results"
API Key Management¶
Security Best Practices¶
# Never do this
❌ export SCOUTML_API_KEY="sk-abc123" # In scripts
❌ scoutml configure --api-key sk-abc123 # In command history
# Use secure methods
✅ # Read from file
export SCOUTML_API_KEY=$(cat ~/.secrets/scoutml_key)
✅ # Use password manager
export SCOUTML_API_KEY=$(pass show scoutml/api_key)
✅ # Interactive prompt
read -s -p "Enter API key: " SCOUTML_API_KEY
export SCOUTML_API_KEY
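If you keep the key in a file, as in the first option above, it is also worth refusing key files that are group- or world-readable. A hedged sketch (GNU `stat -c` syntax; on BSD/macOS use `stat -f '%Lp'` instead):

```shell
#!/usr/bin/env bash
# load_key FILE: export SCOUTML_API_KEY from FILE, refusing files
# readable by anyone other than the owner.
load_key() {
    local file="$1" mode
    mode="$(stat -c '%a' "$file" 2>/dev/null)" || {
        echo "Error: cannot read $file" >&2; return 1
    }
    case "$mode" in
        600|400)
            SCOUTML_API_KEY="$(cat "$file")"
            export SCOUTML_API_KEY
            ;;
        *)
            echo "Error: $file has mode $mode; run: chmod 600 $file" >&2
            return 1
            ;;
    esac
}
```

`load_key ~/.secrets/scoutml_key` would then replace the bare `cat` above.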
Environment Management¶
# Development environment
cat > .env.development << EOF
SCOUTML_API_KEY=your-dev-key
SCOUTML_CACHE_DIR=/tmp/scoutml_cache
EOF
# Production environment
cat > .env.production << EOF
SCOUTML_API_KEY=your-prod-key
SCOUTML_CACHE_DIR=/var/cache/scoutml
EOF
# Load environment
export $(cat .env.development | xargs)
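One caveat: `export $(cat .env.development | xargs)` breaks if any value contains spaces or `#`. Assuming the file holds plain `KEY=value` lines, a more robust pattern is to source it with auto-export enabled:

```shell
#!/usr/bin/env bash
# load_env FILE: source a dotenv-style file, exporting every
# variable it assigns.
load_env() {
    set -a   # auto-export all assignments made while sourcing
    # shellcheck disable=SC1090
    . "$1"
    set +a
}
```

Usage: `load_env .env.development`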
Performance Optimization¶
Minimize API Calls¶
# Inefficient: Multiple small queries
❌ for year in {2020..2023}; do
scoutml search "BERT" --year-min $year --year-max $year --limit 5
done
# Efficient: Single query with post-processing
✅ scoutml search "BERT" --year-min 2020 --limit 100 --output json | \
jq 'group_by(.year) | map({year: .[0].year, papers: .[0:5]})'
Use Appropriate Limits¶
# For exploration
scoutml search "topic" --limit 20 # Default is usually enough
# For comprehensive analysis
scoutml search "topic" --limit 100 --export all_results.json
# For quick checks
scoutml search "topic" --limit 5
Filter Early¶
Apply filters at the API level:
# Inefficient: Get all, then filter
❌ scoutml search "transformer" --limit 200 --output json | \
jq '.[] | select(.year >= 2022)'
# Efficient: Filter at source
✅ scoutml search "transformer" --year-min 2022 --limit 50
Research Quality¶
Validate Sources¶
Always verify important findings:
# Cross-reference papers
paper_id="2103.00020"
# Get multiple perspectives
scoutml paper "$paper_id" # Basic info
scoutml agent critique "$paper_id" # Critical analysis
scoutml similar --paper-id "$paper_id" --limit 5 # Related work
scoutml compare "$paper_id" "competing_paper_id" # Direct comparison
Track Provenance¶
Document your search process:
# Create research log
cat > research_log.md << EOF
# Research Log: $(date)
## Search Parameters
- Query: "$SEARCH_QUERY"
- Filters: year >= 2022, citations >= 50
- Date: $(date)
## Key Findings
$(scoutml search "$SEARCH_QUERY" --year-min 2022 --min-citations 50 --limit 5 --output markdown)
## Next Steps
- Implement paper X
- Compare with approach Y
EOF
Reproducible Research¶
Make your research reproducible:
#!/bin/bash
# reproducible_analysis.sh
# Document environment
cat > environment.txt << EOF
ScoutML Version: $(scoutml --version)
Date: $(date)
Platform: $(uname -a)
EOF
# Fixed parameters
RANDOM_SEED=42
SEARCH_DATE="2024-01-01"
# Save exact commands
cat > commands.sh << 'EOF'
# Exact commands used
scoutml search "federated learning" \
--year-min 2020 \
--year-max 2023 \
--min-citations 20 \
--limit 50 \
--output json > results.json
EOF
# Execute with logging
bash commands.sh 2>&1 | tee analysis.log
Error Handling¶
Robust Scripts¶
#!/bin/bash
set -euo pipefail # Exit on errors, undefined variables, and pipeline failures
# Error handling function
handle_error() {
    echo "Error on line $1" >&2
    exit 1
}
trap 'handle_error $LINENO' ERR
# Validate inputs
if [ -z "${1:-}" ]; then
    echo "Usage: $0 <search_term>" >&2
    exit 1
fi
# Check API key
if [ -z "${SCOUTML_API_KEY:-}" ]; then
    echo "Error: SCOUTML_API_KEY not set" >&2
    exit 1
fi
# Safe execution
if ! scoutml search "$1" --output json > results.json; then
    echo "Search failed" >&2
    exit 1
fi
Retry Logic¶
# Retry function
retry_command() {
    local max_attempts=3
    local delay=2
    local attempt=1
    while [ $attempt -le $max_attempts ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt failed. Retrying in $delay seconds..." >&2
        sleep $delay
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
    echo "Command failed after $max_attempts attempts" >&2
    return 1
}
# Usage
retry_command scoutml paper "2103.00020" --output json
Collaboration¶
Sharing Results¶
# Create shareable report
create_report() {
    local topic="$1"
    local output="report_${topic// /_}_$(date +%Y%m%d).md"
    cat > "$output" << EOF
# Research Report: $topic
Generated: $(date)
Author: $(whoami)
## Executive Summary
$(scoutml review "$topic" --year-min 2022 --limit 30 | head -20)
## Key Papers
$(scoutml search "$topic" --year-min 2022 --sota-only --limit 5)
## Reproducible Papers
$(scoutml insights reproducibility --domain "$topic" --limit 5)
## Next Steps
- [ ] Implement top paper
- [ ] Compare approaches
- [ ] Design experiments
EOF
    echo "Report created: $output"
}
Team Workflows¶
# Shared configuration
cat > team_config.sh << 'EOF'
# Team ScoutML Configuration
export SCOUTML_OUTPUT_DEFAULT="json"
export SCOUTML_CACHE_DIR="/shared/cache/scoutml"
export SCOUTML_LOG_LEVEL="INFO"
# Common functions
log_search() {
echo "$(date),$(whoami),$*" >> /shared/logs/scoutml_searches.csv
}
# Wrapper function
team_search() {
log_search "$@"
scoutml search "$@"
}
EOF
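Note that `log_search` above writes a malformed CSV row whenever a query contains a comma. A hedged sketch of a quoting helper (standard CSV double-quote escaping; this version prints to stdout rather than appending to the shared log):

```shell
#!/usr/bin/env bash
# csv_field STR: wrap STR in double quotes, doubling any embedded
# quotes, so commas inside the value cannot break the CSV row.
csv_field() {
    local s="${1//\"/\"\"}"
    printf '"%s"' "$s"
}

# Safer variant of log_search:
log_search() {
    printf '%s,%s,%s\n' "$(date -Iseconds)" "$(whoami)" "$(csv_field "$*")"
}
```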
Common Pitfalls to Avoid¶
1. Over-relying on Single Metrics¶
# Don't just look at citations
❌ scoutml search "topic" --min-citations 1000
# Consider multiple factors
✅ scoutml search "topic" --min-citations 50 --year-min 2022
✅ # Then check reproducibility, implementation quality, etc.
2. Ignoring Computational Constraints¶
# Check for computational limitations
✅ scoutml agent solve-limitations "paper_id" --focus computational
3. Not Validating Results¶
# Always verify important findings
✅ scoutml agent critique "paper_id" # Check methodology
✅ scoutml compare "paper_id" "alternative_id" # Compare approaches
✅ scoutml insights reproducibility # Verify implementability
Conclusion¶
Following these best practices will help you:
- 🚀 Work more efficiently with ScoutML
- 📊 Produce higher quality research
- 🔄 Create reproducible workflows
- 👥 Collaborate effectively with teams
- 🛡️ Avoid common pitfalls
Remember: ScoutML is a powerful tool, but it's most effective when used thoughtfully as part of a comprehensive research workflow.