Best Practices¶
This guide covers best practices for using ScoutML effectively and efficiently.
Search Strategies¶
Start Broad, Then Narrow¶
Begin with general queries and progressively refine:
# Too specific initially
❌ scoutml search "BERT fine-tuning for biomedical NER with BiLSTM-CRF on PubMed abstracts"
# Better approach
✅ scoutml search "biomedical NER" --limit 50
✅ scoutml search "BERT biomedical" --limit 30
✅ scoutml search "BERT NER PubMed" --limit 20
Use Technical Terminology¶
Use precise technical terms for better results:
# Vague terms
❌ scoutml search "AI for images"
❌ scoutml search "better neural networks"
# Specific technical terms
✅ scoutml search "convolutional neural networks"
✅ scoutml search "vision transformer architectures"
Combine Search Commands¶
Use different search commands for comprehensive results:
# Complete search strategy
TOPIC="attention mechanisms"
# 1. General search
scoutml search "$TOPIC" --limit 30
# 2. Method-specific search
scoutml method-search "self-attention" --sort-by citations
# 3. Dataset-based search
scoutml dataset-search "ImageNet" --include-benchmarks | grep -i attention
# 4. Find similar papers to best result
scoutml similar --paper-id 1706.03762 --threshold 0.75
Efficient Workflows¶
Batch Processing¶
Process multiple items efficiently:
# Inefficient: Sequential processing
❌ for paper in "${papers[@]}"; do
scoutml paper "$paper"
sleep 1
done
# Efficient: Parallel processing
✅ printf '%s\n' "${papers[@]}" | \
xargs -P 4 -I {} scoutml paper {} --output json > results.json
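One caveat: with `-P 4`, four concurrent processes write to the same `results.json`, so large JSON outputs can interleave. A pattern that avoids this writes one file per item and concatenates afterwards, sketched here with `echo` as a stand-in for the real `scoutml paper` call:

```shell
#!/usr/bin/env bash
papers=(1706.03762 1810.04805 2005.14165)

outdir="$(mktemp -d)"
# One output file per paper, so parallel writers never share a file.
# Replace the inner 'echo' with: scoutml paper "$2" --output json
printf '%s\n' "${papers[@]}" | \
    xargs -P 4 -I {} sh -c 'echo "{\"id\":\"$2\"}" > "$1/$2.json"' _ "$outdir" {}

# Merge the per-paper files once all writers have finished.
cat "$outdir"/*.json > results.json
```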
Cache Intermediate Results¶
Save API calls by caching results:
# Create cache directory
CACHE_DIR="$HOME/.scoutml_cache"
mkdir -p "$CACHE_DIR"
# Function to cache results
cached_search() {
    local query="$1"
    local cache_file="$CACHE_DIR/$(echo "$query" | md5sum | cut -d' ' -f1).json"
    # Reuse the cached copy if it is less than 7 days old
    if [ -f "$cache_file" ] && [ -n "$(find "$cache_file" -mtime -7)" ]; then
        cat "$cache_file"
    else
        scoutml search "$query" --output json | tee "$cache_file"
    fi
}
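The freshness check used by `cached_search` can be factored into a reusable helper. A minimal sketch (the `cache_fresh` name and the 7-day window are illustrative, not part of ScoutML):

```shell
#!/usr/bin/env bash
# cache_fresh FILE DAYS: succeed if FILE exists and was modified
# within the last DAYS days; fail otherwise.
cache_fresh() {
    local file="$1" days="$2"
    [ -f "$file" ] && [ -n "$(find "$file" -mtime "-$days")" ]
}

# Example: a file created just now is fresh; a missing file is not.
tmp="$(mktemp)"
cache_fresh "$tmp" 7 && echo "cache hit"
cache_fresh "/no/such/file" 7 || echo "cache miss"
rm -f "$tmp"
```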
Pipeline Design¶
Build efficient data pipelines:
# Good pipeline design
✅ scoutml search "transformer" --output json | \
    jq '[.[] | select(.citations > 100)] | sort_by(.year) | reverse | .[0:10]' \
    > top_transformers.json
# Avoid multiple API calls
❌ scoutml search "transformer" --limit 100
❌ # Then manually filter...
❌ # Then search again...
Data Management¶
Output Format Selection¶
Choose the right format for your use case:
| Use Case | Recommended Format | Example |
|---|---|---|
| Visual inspection | `rich` or `table` | `--output rich` |
| Data processing | `json` | `--output json` |
| Spreadsheets | `csv` | `--output csv` |
| Documentation | `markdown` | `--output markdown` |
Structured Data Storage¶
Organize outputs systematically:
# Good structure
PROJECT_DIR="research_project"
mkdir -p "$PROJECT_DIR"/{searches,papers,implementations,analysis}
# Save with descriptive names
scoutml search "federated learning" \
--output json \
--export "$PROJECT_DIR/searches/federated_learning_$(date +%Y%m%d).json"
Version Control¶
Track your research:
# Initialize git repository
cd "$PROJECT_DIR"
git init
# Create .gitignore
cat > .gitignore << EOF
*.log
.env
cache/
*.tmp
EOF
# Commit research artifacts
git add searches/*.json
git commit -m "Add federated learning search results"
API Key Management¶
Security Best Practices¶
# Never do this
❌ export SCOUTML_API_KEY="sk-abc123" # In scripts
❌ scoutml configure --api-key sk-abc123 # In command history
# Use secure methods
✅ # Read from file
export SCOUTML_API_KEY=$(cat ~/.secrets/scoutml_key)
✅ # Use password manager
export SCOUTML_API_KEY=$(pass show scoutml/api_key)
✅ # Interactive prompt
read -s -p "Enter API key: " SCOUTML_API_KEY
export SCOUTML_API_KEY
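If you keep the key in a file, as in the first option above, it is also worth refusing key files that are group- or world-readable. A hedged sketch (GNU `stat -c` syntax; on BSD/macOS use `stat -f '%Lp'` instead):

```shell
#!/usr/bin/env bash
# load_key FILE: export SCOUTML_API_KEY from FILE, refusing files
# readable by anyone other than the owner.
load_key() {
    local file="$1" mode
    mode="$(stat -c '%a' "$file" 2>/dev/null)" || {
        echo "Error: cannot read $file" >&2; return 1
    }
    case "$mode" in
        600|400)
            SCOUTML_API_KEY="$(cat "$file")"
            export SCOUTML_API_KEY
            ;;
        *)
            echo "Error: $file has mode $mode; run: chmod 600 $file" >&2
            return 1
            ;;
    esac
}
```

`load_key ~/.secrets/scoutml_key` would then replace the bare `cat` above.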
Environment Management¶
# Development environment
cat > .env.development << EOF
SCOUTML_API_KEY=your-dev-key
SCOUTML_CACHE_DIR=/tmp/scoutml_cache
EOF
# Production environment
cat > .env.production << EOF
SCOUTML_API_KEY=your-prod-key
SCOUTML_CACHE_DIR=/var/cache/scoutml
EOF
# Load environment
export $(cat .env.development | xargs)
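One caveat: `export $(cat .env.development | xargs)` breaks if any value contains spaces or `#`. Assuming the file holds plain `KEY=value` lines, a more robust pattern is to source it with auto-export enabled:

```shell
#!/usr/bin/env bash
# load_env FILE: source a dotenv-style file, exporting every
# variable it assigns.
load_env() {
    set -a   # auto-export all assignments made while sourcing
    # shellcheck disable=SC1090
    . "$1"
    set +a
}
```

Usage: `load_env .env.development`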
Performance Optimization¶
Minimize API Calls¶
# Inefficient: Multiple small queries
❌ for year in {2020..2023}; do
scoutml search "BERT" --year-min $year --year-max $year --limit 5
done
# Efficient: Single query with post-processing
✅ scoutml search "BERT" --year-min 2020 --limit 100 --output json | \
jq 'group_by(.year) | map({year: .[0].year, papers: .[0:5]})'
Use Appropriate Limits¶
# For exploration
scoutml search "topic" --limit 20 # Default is usually enough
# For comprehensive analysis
scoutml search "topic" --limit 100 --export all_results.json
# For quick checks
scoutml search "topic" --limit 5
Filter Early¶
Apply filters at the API level:
# Inefficient: Get all, then filter
❌ scoutml search "transformer" --limit 200 --output json | \
jq '.[] | select(.year >= 2022)'
# Efficient: Filter at source
✅ scoutml search "transformer" --year-min 2022 --limit 50
Research Quality¶
Validate Sources¶
Always verify important findings:
# Cross-reference papers
paper_id="2103.00020"
# Get multiple perspectives
scoutml paper "$paper_id" # Basic info
scoutml agent critique "$paper_id" # Critical analysis
scoutml similar --paper-id "$paper_id" --limit 5 # Related work
scoutml compare "$paper_id" "competing_paper_id" # Direct comparison
Track Provenance¶
Document your search process:
# Create research log
cat > research_log.md << EOF
# Research Log: $(date)
## Search Parameters
- Query: "$SEARCH_QUERY"
- Filters: year >= 2022, citations >= 50
- Date: $(date)
## Key Findings
$(scoutml search "$SEARCH_QUERY" --year-min 2022 --min-citations 50 --limit 5 --output markdown)
## Next Steps
- Implement paper X
- Compare with approach Y
EOF
Reproducible Research¶
Make your research reproducible:
#!/bin/bash
# reproducible_analysis.sh
# Document environment
cat > environment.txt << EOF
ScoutML Version: $(scoutml --version)
Date: $(date)
Platform: $(uname -a)
EOF
# Fixed parameters
RANDOM_SEED=42
SEARCH_DATE="2024-01-01"
# Save exact commands
cat > commands.sh << 'EOF'
# Exact commands used
scoutml search "federated learning" \
--year-min 2020 \
--year-max 2023 \
--min-citations 20 \
--limit 50 \
--output json > results.json
EOF
# Execute with logging
bash commands.sh 2>&1 | tee analysis.log
Error Handling¶
Robust Scripts¶
#!/bin/bash
set -euo pipefail # Exit on errors, undefined variables, and pipeline failures
# Error handling function
handle_error() {
    echo "Error on line $1" >&2
    exit 1
}
trap 'handle_error $LINENO' ERR
# Validate inputs
if [ -z "${1:-}" ]; then
    echo "Usage: $0 <search_term>" >&2
    exit 1
fi
# Check API key
if [ -z "${SCOUTML_API_KEY:-}" ]; then
    echo "Error: SCOUTML_API_KEY not set" >&2
    exit 1
fi
# Safe execution
if ! scoutml search "$1" --output json > results.json; then
    echo "Search failed" >&2
    exit 1
fi
Retry Logic¶
# Retry function
retry_command() {
    local max_attempts=3
    local delay=2
    local attempt=1
    while [ $attempt -le $max_attempts ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt failed. Retrying in $delay seconds..." >&2
        sleep $delay
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
    echo "Command failed after $max_attempts attempts" >&2
    return 1
}
# Usage
retry_command scoutml paper "2103.00020" --output json
Collaboration¶
Sharing Results¶
# Create shareable report
create_report() {
    local topic="$1"
    local output="report_${topic// /_}_$(date +%Y%m%d).md"
    cat > "$output" << EOF
# Research Report: $topic
Generated: $(date)
Author: $(whoami)
## Executive Summary
$(scoutml review "$topic" --year-min 2022 --limit 30 | head -20)
## Key Papers
$(scoutml search "$topic" --year-min 2022 --sota-only --limit 5)
## Reproducible Papers
$(scoutml insights reproducibility --domain "$topic" --limit 5)
## Next Steps
- [ ] Implement top paper
- [ ] Compare approaches
- [ ] Design experiments
EOF
    echo "Report created: $output"
}
Team Workflows¶
# Shared configuration
cat > team_config.sh << 'EOF'
# Team ScoutML Configuration
export SCOUTML_OUTPUT_DEFAULT="json"
export SCOUTML_CACHE_DIR="/shared/cache/scoutml"
export SCOUTML_LOG_LEVEL="INFO"
# Common functions
log_search() {
echo "$(date),$(whoami),$*" >> /shared/logs/scoutml_searches.csv
}
# Wrapper function
team_search() {
log_search "$@"
scoutml search "$@"
}
EOF
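Note that `log_search` above writes a malformed CSV row whenever a query contains a comma. A hedged sketch of a quoting helper (standard CSV double-quote escaping; this version prints to stdout rather than appending to the shared log):

```shell
#!/usr/bin/env bash
# csv_field STR: wrap STR in double quotes, doubling any embedded
# quotes, so commas inside the value cannot break the CSV row.
csv_field() {
    local s="${1//\"/\"\"}"
    printf '"%s"' "$s"
}

# Safer variant of log_search:
log_search() {
    printf '%s,%s,%s\n' "$(date -Iseconds)" "$(whoami)" "$(csv_field "$*")"
}
```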
Common Pitfalls to Avoid¶
1. Over-relying on Single Metrics¶
# Don't just look at citations
❌ scoutml search "topic" --min-citations 1000
# Consider multiple factors
✅ scoutml search "topic" --min-citations 50 --year-min 2022
✅ # Then check reproducibility, implementation quality, etc.
2. Ignoring Computational Constraints¶
# Check for computational limitations
✅ scoutml agent solve-limitations "paper_id" --focus computational
3. Not Validating Results¶
# Always verify important findings
✅ scoutml agent critique "paper_id" # Check methodology
✅ scoutml compare "paper_id" "alternative_id" # Compare approaches
✅ scoutml insights reproducibility # Verify implementability
Conclusion¶
Following these best practices will help you:
- 🚀 Work more efficiently with ScoutML
- 📊 Produce higher quality research
- 🔄 Create reproducible workflows
- 👥 Collaborate effectively with teams
- 🛡️ Avoid common pitfalls
Remember: ScoutML is a powerful tool, but it's most effective when used thoughtfully as part of a comprehensive research workflow.