
Output Formats

ScoutML supports multiple output formats to suit different use cases. This guide explains each format and when to use them.

Available Formats

Format     Use Case                       Supported Commands
table      Terminal viewing               Search commands
rich       Enhanced terminal display      Most commands
json       Data processing, automation    All commands
csv        Spreadsheets, data analysis    Search, insights
markdown   Documentation, reports         Compare, review

Format Details

Table Format

The default format for search commands, optimized for terminal viewing.

scoutml search "transformer" --output table

Features:
- Clean ASCII tables
- Truncated text for readability
- Clickable links (in supported terminals)
- Color-coded information

Example Output:

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━┓
┃ ArXiv ID  ┃ Title                 ┃ Authors  ┃ Year ┃ Citations┃ Score┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━┩
│ 1706.03762│ Attention Is All You  │ Vaswani  │ 2017 │ 50000    │ 98.5 │
│           │ Need                  │ et al.   │      │          │      │
└───────────┴───────────────────────┴──────────┴──────┴──────────┴──────┘

Rich Format

Enhanced terminal output with panels, colors, and formatting.

scoutml paper 1810.04805 --output rich

Features:
- Syntax highlighting
- Collapsible sections
- Progress indicators
- Formatted text (bold, italic)
- Panels and boxes

Best for:
- Interactive exploration
- Detailed paper analysis
- Agent command outputs
- Review generation

JSON Format

Structured data format for programmatic use.

scoutml search "BERT" --output json

Features:
- Complete data preservation
- Machine-readable
- Supports all fields
- Enables complex processing

Example Output:

[
  {
    "arxiv_id": "1810.04805",
    "title": "BERT: Pre-training of Deep Bidirectional Transformers",
    "authors": ["Jacob Devlin", "Ming-Wei Chang", "Kenton Lee"],
    "year": 2018,
    "citations": 50000,
    "score": 95.2,
    "abstract": "We introduce a new language representation model...",
    "categories": ["cs.CL"],
    "url": "https://arxiv.org/abs/1810.04805"
  }
]
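
Because the JSON output is a plain list of objects, it parses directly with any JSON library. A minimal Python sketch, using a record copied from the example above (the exact field set may vary by command):

```python
import json

# A result list shaped like the example output above
raw = """[
  {"arxiv_id": "1810.04805",
   "title": "BERT: Pre-training of Deep Bidirectional Transformers",
   "authors": ["Jacob Devlin", "Ming-Wei Chang", "Kenton Lee"],
   "year": 2018, "citations": 50000, "score": 95.2}
]"""

papers = json.loads(raw)

# Keep only highly cited papers, highest score first
top = sorted(
    (p for p in papers if p["citations"] > 1000),
    key=lambda p: p["score"],
    reverse=True,
)
print([p["arxiv_id"] for p in top])  # ['1810.04805']
```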

CSV Format

Comma-separated values for spreadsheet applications.

scoutml search "federated learning" --output csv --export results.csv

Features:
- Excel/Google Sheets compatible
- Statistical analysis ready
- Easy filtering and sorting
- Compact representation

Example Output:

arxiv_id,title,authors,year,citations,score
1810.04805,"BERT: Pre-training of Deep...","Devlin et al.",2018,50000,95.2
2103.00020,"Learning Transferable Visual...","Radford et al.",2021,5000,92.1
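
Quoted CSV fields can contain commas (as the truncated titles above suggest), so prefer a real CSV parser over splitting on commas. A Python sketch using the standard csv module (the sample row is illustrative):

```python
import csv
import io

# Sample row shaped like the output above; the quoted title contains a
# comma that naive comma-splitting would break on
raw = """arxiv_id,title,authors,year,citations,score
1810.04805,"BERT: Pre-training, Deep Bidirectional","Devlin et al.",2018,50000,95.2
"""

rows = list(csv.DictReader(io.StringIO(raw)))
avg_citations = sum(int(r["citations"]) for r in rows) / len(rows)
print(rows[0]["title"], avg_citations)
```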

Markdown Format

Formatted text for documentation and reports.

scoutml compare 1810.04805 2005.14165 --output markdown

Features:
- GitHub-compatible
- Preserves formatting
- Includes links
- Ready for documentation

Example Output:

# Comparison: BERT vs GPT-3

## Overview

| Aspect | BERT (1810.04805) | GPT-3 (2005.14165) |
|--------|-------------------|---------------------|
| Year | 2018 | 2020 |
| Citations | 50000 | 25000 |
| Architecture | Bidirectional | Unidirectional |

## Key Differences
...
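
A comparison table in this shape can also be assembled directly from JSON records. A sketch in Python (the labels and numbers are illustrative, not actual command output):

```python
# Each dict stands for one paper; values are illustrative
papers = [
    {"label": "BERT (1810.04805)", "year": 2018, "citations": 50000},
    {"label": "GPT-3 (2005.14165)", "year": 2020, "citations": 25000},
]

# Build the markdown table row by row
header = "| Aspect | " + " | ".join(p["label"] for p in papers) + " |"
divider = "|--------|" + "|".join("---" for _ in papers) + "|"
rows = [
    "| Year | " + " | ".join(str(p["year"]) for p in papers) + " |",
    "| Citations | " + " | ".join(str(p["citations"]) for p in papers) + " |",
]
table = "\n".join([header, divider] + rows)
print(table)
```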

Processing Output

JSON Processing with jq

# Extract specific fields
scoutml search "transformer" --output json | \
  jq '.[] | {title: .title, citations: .citations}'

# Filter results
scoutml search "bert" --output json | \
  jq '.[] | select(.citations > 1000)'

# Sort by custom criteria
scoutml search "nlp" --output json | \
  jq 'sort_by(.citations / (.year - 2000))'

# Aggregate statistics
scoutml search "vision" --output json | \
  jq '{
    total: length,
    avg_citations: (map(.citations) | add / length),
    years: (map(.year) | unique)
  }'

CSV Processing

# With standard tools (naive split: assumes no commas inside quoted fields)
scoutml search "ml" --output csv | \
  awk -F',' 'NR>1 {sum+=$5; count++} END {print "Avg citations:", sum/count}'

# Import to pandas
python3 << EOF
import pandas as pd
df = pd.read_csv('results.csv')
print(df.groupby('year')['citations'].mean())
EOF

# Quick analysis with csvkit
scoutml search "ai" --output csv | csvstat

Markdown Processing

# Convert to HTML
scoutml review "topic" --output markdown | pandoc -f markdown -t html > review.html

# Convert to PDF
scoutml compare 1 2 3 --output markdown | \
  pandoc -f markdown -o comparison.pdf  # PDF is selected by the .pdf extension

# Extract one section (a bare /^## Key Papers/,/^##/ range would end on
# its own start line, so use a flag instead)
scoutml review "topic" --output markdown | \
  awk '/^## /{p = ($0 == "## Key Papers")} p'
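
The same section extraction works in Python with a simple heading flag; a sketch over an in-memory string (the sample markdown is illustrative):

```python
# Illustrative review output with two second-level sections
md = """# Review
## Key Papers
- BERT
- GPT
## Methods
- fine-tuning
"""

def extract_section(text: str, title: str) -> str:
    """Return the '## <title>' heading and its body, up to the next '## ' heading."""
    lines = []
    keep = False
    for line in text.splitlines():
        if line.startswith("## "):
            keep = line == f"## {title}"
        if keep:
            lines.append(line)
    return "\n".join(lines)

print(extract_section(md, "Key Papers"))
```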

Format Selection Guide

When to Use Table/Rich

Use for:
- Interactive terminal sessions
- Quick visual inspection
- Demonstrations
- Initial exploration

Avoid for:
- Automated processing
- Large result sets
- Piping to other commands

When to Use JSON

Use for:
- Automation scripts
- Data pipelines
- Complex filtering
- API integration
- Custom analysis

Avoid for:
- Human reading
- Quick checks

When to Use CSV

Use for:
- Spreadsheet analysis
- Statistical software
- Data visualization
- Simple databases

Avoid for:
- Nested data
- Long text fields
- Complex relationships

When to Use Markdown

Use for:
- Documentation
- Reports
- Blog posts
- Team sharing
- Version control

Avoid for:
- Data processing
- Automated workflows

Advanced Format Usage

Custom Formatting

# Create custom table from JSON
scoutml search "bert" --output json | \
  jq -r '.[] | [.arxiv_id, .title[0:50], .citations] | @tsv' | \
  column -t -s $'\t'

# Generate HTML report
cat > template.html << 'EOF'
<!DOCTYPE html>
<html>
<head><title>Research Report</title></head>
<body>
<h1>Papers</h1>
<ul>
{{PAPERS}}
</ul>
</body>
</html>
EOF

papers=$(scoutml search "ai" --output json | \
  jq -r '.[] | "<li><a href=\"https://arxiv.org/abs/\(.arxiv_id)\">\(.title)</a> - \(.citations) citations</li>"')

# sed cannot substitute a multi-line value, so splice it in with awk
awk -v repl="$papers" '$0 == "{{PAPERS}}" {print repl; next} {print}' \
  template.html > report.html

Format Conversion

# JSON to CSV
scoutml search "ml" --output json | \
  jq -r '["arxiv_id","title","year","citations"], 
         (.[] | [.arxiv_id, .title, .year, .citations]) | @csv' > data.csv
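
The same JSON-to-CSV conversion can be done in Python, where the csv module quotes embedded commas correctly (the record values are illustrative):

```python
import csv
import io
import json

# One record shaped like the search output; the title's embedded comma
# is quoted automatically by csv.DictWriter
papers = json.loads(
    '[{"arxiv_id": "1810.04805", "title": "BERT: Pre-training, Deep", '
    '"year": 2018, "citations": 50000}]'
)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["arxiv_id", "title", "year", "citations"])
writer.writeheader()
writer.writerows(papers)
print(buf.getvalue())
```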

# CSV to Markdown table (naive: breaks on quoted fields containing commas)
csv2md() {
    local file=$1
    head -1 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
    head -1 "$file" | sed 's/[^,]/-/g; s/,/|/g' | sed 's/^/|/; s/$/|/'
    tail -n +2 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
}

# Rich output to plain text
scoutml paper 1810.04805 --output rich | \
  sed 's/\x1b\[[0-9;]*m//g' > plain.txt
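
The same ANSI stripping is easy to express in Python; a small sketch matching the SGR color codes the sed command above removes:

```python
import re

# SGR escape sequences: ESC [ <params> m
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color/style escape sequences from terminal output."""
    return ANSI_RE.sub("", text)

sample = "\x1b[1m\x1b[32mAttention Is All You Need\x1b[0m (2017)"
print(strip_ansi(sample))  # Attention Is All You Need (2017)
```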

Streaming Processing

# Process large result sets
scoutml search "deep learning" --limit 1000 --output json | \
  jq -c '.[]' | \
  while read -r paper; do
    # Process each paper
    echo "$paper" | jq -r '.arxiv_id'
  done

# Real-time filtering: rebuild each element of the top-level array from
# the event stream, then filter
scoutml search "ai" --output json --limit 100 | \
  jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | select(.citations > 100)'

Export Options

Using --export Flag

# Export to specific file
scoutml search "bert" --output json --export results.json
scoutml review "transformers" --output markdown --export review.md

# Export with timestamp
scoutml search "ai" --output csv \
  --export "results_$(date +%Y%m%d_%H%M%S).csv"

Piping vs Export

# Piping (for immediate processing)
scoutml search "ml" --output json | jq '.[] | .title'

# Export (for storage)
scoutml search "ml" --output json --export data.json
jq '.[] | .title' data.json  # Process later

Format-Specific Tips

JSON Tips

  1. Use jq for processing - It's powerful and fast
  2. Validate JSON - jq empty file.json
  3. Pretty print - jq '.' file.json
  4. Compact output - jq -c '.'

CSV Tips

  1. Quote handling - Use proper CSV parsers
  2. Encoding - Ensure UTF-8 for international characters
  3. Large files - Use streaming tools like csvkit
  4. Headers - First row contains column names

Markdown Tips

  1. Pandoc conversion - Convert to any format
  2. GitHub rendering - Test in GitHub preview
  3. Table limits - Some renderers have column limits
  4. Link format - Use full URLs for compatibility

Performance Considerations

Format Performance

Format     Speed     Memory    File Size
JSON       Fast      Medium    Large
CSV        Fastest   Low       Small
Table      Slow      High      N/A
Rich       Slowest   Highest   N/A
Markdown   Medium    Medium    Medium

Optimization Tips

# For large datasets, use streaming
scoutml search "ai" --limit 10000 --output json | \
  jq -c '.[]' > large_dataset.jsonl  # JSON Lines format

# Process in chunks (process_chunk stands in for your own per-chunk handler)
split -l 1000 large_dataset.jsonl chunk_
for chunk in chunk_*; do
    process_chunk "$chunk" &
done
wait
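
The JSON Lines file produced above can then be consumed one record at a time. A Python sketch over an in-memory stand-in for the file:

```python
import io
import json

# Stand-in for open("large_dataset.jsonl"); records are illustrative
stream = io.StringIO(
    '{"arxiv_id": "1810.04805", "citations": 50000}\n'
    '{"arxiv_id": "2005.14165", "citations": 25000}\n'
)

# One json.loads per line keeps memory flat regardless of file size
ids = [json.loads(line)["arxiv_id"] for line in stream if line.strip()]
print(ids)
```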

Conclusion

Choosing the right output format is crucial for efficient workflows:

  • Use rich/table for human consumption
  • Use JSON for data processing
  • Use CSV for spreadsheet analysis
  • Use markdown for documentation

Always consider your end goal when selecting a format, and don't hesitate to convert between formats as needed.