
Output Formats

ScoutML supports multiple output formats to suit different use cases. This guide explains each format and when to use them.

Available Formats

Format     Use Case                       Supported Commands
table      Terminal viewing               Search commands
rich       Enhanced terminal display      Most commands
json       Data processing, automation    All commands
csv        Spreadsheets, data analysis    Search, insights
markdown   Documentation, reports         Compare, review

Format Details

Table Format

The default format for search commands, optimized for terminal viewing.

scoutml search "transformer" --output table

Features:
- Clean ASCII tables
- Truncated text for readability
- Clickable links (in supported terminals)
- Color-coded information

Example Output:

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━┓
┃ ArXiv ID  ┃ Title                 ┃ Authors  ┃ Year ┃ Citations┃ Score┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━┩
│ 1706.03762│ Attention Is All You  │ Vaswani  │ 2017 │ 50000    │ 98.5 │
│           │ Need                  │ et al.   │      │          │      │
└───────────┴───────────────────────┴──────────┴──────┴──────────┴──────┘

Rich Format

Enhanced terminal output with panels, colors, and formatting.

scoutml paper 1810.04805 --output rich

Features:
- Syntax highlighting
- Collapsible sections
- Progress indicators
- Formatted text (bold, italic)
- Panels and boxes

Best for:
- Interactive exploration
- Detailed paper analysis
- Agent command outputs
- Review generation

JSON Format

Structured data format for programmatic use.

scoutml search "BERT" --output json

Features:
- Complete data preservation
- Machine-readable
- Supports all fields
- Enables complex processing

Example Output:

[
  {
    "arxiv_id": "1810.04805",
    "title": "BERT: Pre-training of Deep Bidirectional Transformers",
    "authors": ["Jacob Devlin", "Ming-Wei Chang", "Kenton Lee"],
    "year": 2018,
    "citations": 50000,
    "score": 95.2,
    "abstract": "We introduce a new language representation model...",
    "categories": ["cs.CL"],
    "url": "https://arxiv.org/abs/1810.04805"
  }
]
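
Because the JSON output is a plain list of objects, it parses directly with any JSON library. A minimal Python sketch, using a record copied from the example above (the exact field set may vary by command):

```python
import json

# A result list shaped like the example output above
raw = """[
  {"arxiv_id": "1810.04805",
   "title": "BERT: Pre-training of Deep Bidirectional Transformers",
   "authors": ["Jacob Devlin", "Ming-Wei Chang", "Kenton Lee"],
   "year": 2018, "citations": 50000, "score": 95.2}
]"""

papers = json.loads(raw)

# Keep only highly cited papers, highest score first
top = sorted(
    (p for p in papers if p["citations"] > 1000),
    key=lambda p: p["score"],
    reverse=True,
)
print([p["arxiv_id"] for p in top])  # ['1810.04805']
```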

CSV Format

Comma-separated values for spreadsheet applications.

scoutml search "federated learning" --output csv --export results.csv

Features:
- Excel/Google Sheets compatible
- Statistical analysis ready
- Easy filtering and sorting
- Compact representation

Example Output:

arxiv_id,title,authors,year,citations,score
1810.04805,"BERT: Pre-training of Deep...","Devlin et al.",2018,50000,95.2
2103.00020,"Learning Transferable Visual...","Radford et al.",2021,5000,92.1
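
Quoted CSV fields can contain commas (as the truncated titles above suggest), so prefer a real CSV parser over splitting on commas. A Python sketch using the standard csv module (the sample row is illustrative):

```python
import csv
import io

# Sample row shaped like the output above; the quoted title contains a
# comma that naive comma-splitting would break on
raw = """arxiv_id,title,authors,year,citations,score
1810.04805,"BERT: Pre-training, Deep Bidirectional","Devlin et al.",2018,50000,95.2
"""

rows = list(csv.DictReader(io.StringIO(raw)))
avg_citations = sum(int(r["citations"]) for r in rows) / len(rows)
print(rows[0]["title"], avg_citations)
```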

Markdown Format

Formatted text for documentation and reports.

scoutml compare 1810.04805 2005.14165 --output markdown

Features:
- GitHub-compatible
- Preserves formatting
- Includes links
- Ready for documentation

Example Output:

# Comparison: BERT vs GPT-3

## Overview

| Aspect | BERT (1810.04805) | GPT-3 (2005.14165) |
|--------|-------------------|---------------------|
| Year | 2018 | 2020 |
| Citations | 50000 | 25000 |
| Architecture | Bidirectional | Unidirectional |

## Key Differences
...
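
A comparison table in this shape can also be assembled directly from JSON records. A sketch in Python (the labels and numbers are illustrative, not actual command output):

```python
# Each dict stands for one paper; values are illustrative
papers = [
    {"label": "BERT (1810.04805)", "year": 2018, "citations": 50000},
    {"label": "GPT-3 (2005.14165)", "year": 2020, "citations": 25000},
]

# Build the markdown table row by row
header = "| Aspect | " + " | ".join(p["label"] for p in papers) + " |"
divider = "|--------|" + "|".join("---" for _ in papers) + "|"
rows = [
    "| Year | " + " | ".join(str(p["year"]) for p in papers) + " |",
    "| Citations | " + " | ".join(str(p["citations"]) for p in papers) + " |",
]
table = "\n".join([header, divider] + rows)
print(table)
```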

Processing Output

JSON Processing with jq

# Extract specific fields
scoutml search "transformer" --output json | \
  jq '.[] | {title: .title, citations: .citations}'

# Filter results
scoutml search "bert" --output json | \
  jq '.[] | select(.citations > 1000)'

# Sort by custom criteria
scoutml search "nlp" --output json | \
  jq 'sort_by(.citations / (.year - 2000))'

# Aggregate statistics
scoutml search "vision" --output json | \
  jq '{
    total: length,
    avg_citations: (map(.citations) | add / length),
    years: (map(.year) | unique)
  }'

CSV Processing

# With standard tools (naive split: assumes no commas inside quoted fields)
scoutml search "ml" --output csv | \
  awk -F',' 'NR>1 {sum+=$5; count++} END {print "Avg citations:", sum/count}'

# Import to pandas
python3 << EOF
import pandas as pd
df = pd.read_csv('results.csv')
print(df.groupby('year')['citations'].mean())
EOF

# Quick analysis with csvkit
scoutml search "ai" --output csv | csvstat

Markdown Processing

# Convert to HTML
scoutml review "topic" --output markdown | pandoc -f markdown -t html > review.html

# Convert to PDF
scoutml compare 1 2 3 --output markdown | \
  pandoc -f markdown -o comparison.pdf  # PDF is selected by the .pdf extension

# Extract one section (a bare /^## Key Papers/,/^##/ range would end on
# its own start line, so use a flag instead)
scoutml review "topic" --output markdown | \
  awk '/^## /{p = ($0 == "## Key Papers")} p'
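
The same section extraction works in Python with a simple heading flag; a sketch over an in-memory string (the sample markdown is illustrative):

```python
# Illustrative review output with two second-level sections
md = """# Review
## Key Papers
- BERT
- GPT
## Methods
- fine-tuning
"""

def extract_section(text: str, title: str) -> str:
    """Return the '## <title>' heading and its body, up to the next '## ' heading."""
    lines = []
    keep = False
    for line in text.splitlines():
        if line.startswith("## "):
            keep = line == f"## {title}"
        if keep:
            lines.append(line)
    return "\n".join(lines)

print(extract_section(md, "Key Papers"))
```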

Format Selection Guide

When to Use Table/Rich

Use for:
- Interactive terminal sessions
- Quick visual inspection
- Demonstrations
- Initial exploration

Avoid for:
- Automated processing
- Large result sets
- Piping to other commands

When to Use JSON

Use for:
- Automation scripts
- Data pipelines
- Complex filtering
- API integration
- Custom analysis

Avoid for:
- Human reading
- Quick checks

When to Use CSV

Use for:
- Spreadsheet analysis
- Statistical software
- Data visualization
- Simple databases

Avoid for:
- Nested data
- Long text fields
- Complex relationships

When to Use Markdown

Use for:
- Documentation
- Reports
- Blog posts
- Team sharing
- Version control

Avoid for:
- Data processing
- Automated workflows

Advanced Format Usage

Custom Formatting

# Create custom table from JSON
scoutml search "bert" --output json | \
  jq -r '.[] | [.arxiv_id, .title[0:50], .citations] | @tsv' | \
  column -t -s $'\t'

# Generate HTML report
cat > template.html << 'EOF'
<!DOCTYPE html>
<html>
<head><title>Research Report</title></head>
<body>
<h1>Papers</h1>
<ul>
{{PAPERS}}
</ul>
</body>
</html>
EOF

papers=$(scoutml search "ai" --output json | \
  jq -r '.[] | "<li><a href=\"https://arxiv.org/abs/\(.arxiv_id)\">\(.title)</a> - \(.citations) citations</li>"')

# sed cannot substitute a multi-line value, so splice it in with awk
awk -v repl="$papers" '$0 == "{{PAPERS}}" {print repl; next} {print}' \
  template.html > report.html

Format Conversion

# JSON to CSV
scoutml search "ml" --output json | \
  jq -r '["arxiv_id","title","year","citations"], 
         (.[] | [.arxiv_id, .title, .year, .citations]) | @csv' > data.csv
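
The same JSON-to-CSV conversion can be done in Python, where the csv module quotes embedded commas correctly (the record values are illustrative):

```python
import csv
import io
import json

# One record shaped like the search output; the title's embedded comma
# is quoted automatically by csv.DictWriter
papers = json.loads(
    '[{"arxiv_id": "1810.04805", "title": "BERT: Pre-training, Deep", '
    '"year": 2018, "citations": 50000}]'
)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["arxiv_id", "title", "year", "citations"])
writer.writeheader()
writer.writerows(papers)
print(buf.getvalue())
```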

# CSV to Markdown table (naive: breaks on quoted fields containing commas)
csv2md() {
    local file=$1
    head -1 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
    head -1 "$file" | sed 's/[^,]/-/g; s/,/|/g' | sed 's/^/|/; s/$/|/'
    tail -n +2 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
}

# Rich output to plain text
scoutml paper 1810.04805 --output rich | \
  sed 's/\x1b\[[0-9;]*m//g' > plain.txt
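
The same ANSI stripping is easy to express in Python; a small sketch matching the SGR color codes the sed command above removes:

```python
import re

# SGR escape sequences: ESC [ <params> m
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color/style escape sequences from terminal output."""
    return ANSI_RE.sub("", text)

sample = "\x1b[1m\x1b[32mAttention Is All You Need\x1b[0m (2017)"
print(strip_ansi(sample))  # Attention Is All You Need (2017)
```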

Streaming Processing

# Process large result sets
scoutml search "deep learning" --limit 1000 --output json | \
  jq -c '.[]' | \
  while read -r paper; do
    # Process each paper
    echo "$paper" | jq -r '.arxiv_id'
  done

# Real-time filtering: rebuild each element of the top-level array from
# the event stream, then filter
scoutml search "ai" --output json --limit 100 | \
  jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | select(.citations > 100)'

Export Options

Using --export Flag

# Export to specific file
scoutml search "bert" --output json --export results.json
scoutml review "transformers" --output markdown --export review.md

# Export with timestamp
scoutml search "ai" --output csv \
  --export "results_$(date +%Y%m%d_%H%M%S).csv"

Piping vs Export

# Piping (for immediate processing)
scoutml search "ml" --output json | jq '.[] | .title'

# Export (for storage)
scoutml search "ml" --output json --export data.json
jq '.[] | .title' data.json  # Process later

Format-Specific Tips

JSON Tips

  1. Use jq for processing - It's powerful and fast
  2. Validate JSON - jq empty file.json
  3. Pretty print - jq '.' file.json
  4. Compact output - jq -c '.'

CSV Tips

  1. Quote handling - Use proper CSV parsers
  2. Encoding - Ensure UTF-8 for international characters
  3. Large files - Use streaming tools like csvkit
  4. Headers - First row contains column names

Markdown Tips

  1. Pandoc conversion - Convert to any format
  2. GitHub rendering - Test in GitHub preview
  3. Table limits - Some renderers have column limits
  4. Link format - Use full URLs for compatibility

Performance Considerations

Format Performance

Format     Speed     Memory    File Size
JSON       Fast      Medium    Large
CSV        Fastest   Low       Small
Table      Slow      High      N/A
Rich       Slowest   Highest   N/A
Markdown   Medium    Medium    Medium

Optimization Tips

# For large datasets, use streaming
scoutml search "ai" --limit 10000 --output json | \
  jq -c '.[]' > large_dataset.jsonl  # JSON Lines format

# Process in chunks (process_chunk stands in for your own per-chunk handler)
split -l 1000 large_dataset.jsonl chunk_
for chunk in chunk_*; do
    process_chunk "$chunk" &
done
wait
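
The JSON Lines file produced above can then be consumed one record at a time. A Python sketch over an in-memory stand-in for the file:

```python
import io
import json

# Stand-in for open("large_dataset.jsonl"); records are illustrative
stream = io.StringIO(
    '{"arxiv_id": "1810.04805", "citations": 50000}\n'
    '{"arxiv_id": "2005.14165", "citations": 25000}\n'
)

# One json.loads per line keeps memory flat regardless of file size
ids = [json.loads(line)["arxiv_id"] for line in stream if line.strip()]
print(ids)
```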

Conclusion

Choosing the right output format is crucial for efficient workflows:

  • Use rich/table for human consumption
  • Use JSON for data processing
  • Use CSV for spreadsheet analysis
  • Use markdown for documentation

Always consider your end goal when selecting a format, and don't hesitate to convert between formats as needed.