Output Formats¶
ScoutML supports multiple output formats to suit different use cases. This guide explains each format and when to use them.
Available Formats¶
| Format | Use Case | Supported Commands |
|---|---|---|
| `table` | Terminal viewing | Search commands |
| `rich` | Enhanced terminal display | Most commands |
| `json` | Data processing, automation | All commands |
| `csv` | Spreadsheets, data analysis | Search, insights |
| `markdown` | Documentation, reports | Compare, review |
Format Details¶
Table Format¶
The default format for search commands, optimized for terminal viewing.
Features:

- Clean ASCII tables
- Truncated text for readability
- Clickable links (in supported terminals)
- Color-coded information
Example Output:
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━┓
┃ ArXiv ID ┃ Title ┃ Authors ┃ Year ┃ Citations┃ Score┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━┩
│ 1706.03762│ Attention Is All You │ Vaswani │ 2017 │ 50000 │ 98.5 │
│ │ Need │ et al. │ │ │ │
└───────────┴───────────────────────┴──────────┴──────┴──────────┴──────┘
Rich Format¶
Enhanced terminal output with panels, colors, and formatting.
Features:

- Syntax highlighting
- Collapsible sections
- Progress indicators
- Formatted text (bold, italic)
- Panels and boxes

Best for:

- Interactive exploration
- Detailed paper analysis
- Agent command outputs
- Review generation
JSON Format¶
Structured data format for programmatic use.
Features:

- Complete data preservation
- Machine-readable
- Supports all fields
- Enables complex processing
Example Output:
[
{
"arxiv_id": "1810.04805",
"title": "BERT: Pre-training of Deep Bidirectional Transformers",
"authors": ["Jacob Devlin", "Ming-Wei Chang", "Kenton Lee"],
"year": 2018,
"citations": 50000,
"score": 95.2,
"abstract": "We introduce a new language representation model...",
"categories": ["cs.CL"],
"url": "https://arxiv.org/abs/1810.04805"
}
]
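Because the JSON output is a plain array of records, it drops straight into any language's JSON parser. A minimal Python sketch, assuming the field names shown in the example above:

```python
import json

# Sample input shaped like the example output above
raw = '''[
  {"arxiv_id": "1810.04805", "title": "BERT", "year": 2018, "citations": 50000}
]'''

papers = json.loads(raw)
# Build a quick lookup from arXiv ID to citation count
citations = {p["arxiv_id"]: p["citations"] for p in papers}
print(citations["1810.04805"])  # → 50000
```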
CSV Format¶
Comma-separated values for spreadsheet applications.
Features:

- Excel/Google Sheets compatible
- Statistical analysis ready
- Easy filtering and sorting
- Compact representation
Example Output:
arxiv_id,title,authors,year,citations,score
1810.04805,"BERT: Pre-training of Deep...","Devlin et al.",2018,50000,95.2
2103.00020,"Learning Transferable Visual...","Radford et al.",2021,5000,92.1
Markdown Format¶
Formatted text for documentation and reports.
Features:

- GitHub-compatible
- Preserves formatting
- Includes links
- Ready for documentation
Example Output:
# Comparison: BERT vs GPT-3
## Overview
| Aspect | BERT (1810.04805) | GPT-3 (2005.14165) |
|--------|-------------------|---------------------|
| Year | 2018 | 2020 |
| Citations | 50000 | 25000 |
| Architecture | Bidirectional | Unidirectional |
## Key Differences
...
Processing Output¶
JSON Processing with jq¶
# Extract specific fields
scoutml search "transformer" --output json | \
jq '.[] | {title: .title, citations: .citations}'
# Filter results
scoutml search "bert" --output json | \
jq '.[] | select(.citations > 1000)'
# Sort by custom criteria
scoutml search "nlp" --output json | \
jq 'sort_by(.citations / (.year - 2000))'
# Aggregate statistics
scoutml search "vision" --output json | \
jq '{
total: length,
avg_citations: (map(.citations) | add / length),
years: (map(.year) | unique)
}'
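If `jq` is unavailable, the same aggregation is a few lines of Python. A sketch assuming the record schema from the examples above:

```python
import json

# Sample input shaped like the search output
raw = '''[
  {"arxiv_id": "1", "year": 2018, "citations": 100},
  {"arxiv_id": "2", "year": 2020, "citations": 300},
  {"arxiv_id": "3", "year": 2018, "citations": 200}
]'''

papers = json.loads(raw)
stats = {
    "total": len(papers),
    "avg_citations": sum(p["citations"] for p in papers) / len(papers),
    "years": sorted({p["year"] for p in papers}),  # unique years, ascending
}
print(stats)  # → {'total': 3, 'avg_citations': 200.0, 'years': [2018, 2020]}
```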
CSV Processing¶
# With standard tools (naive split: assumes no commas inside quoted fields)
scoutml search "ml" --output csv | \
awk -F',' 'NR>1 {sum+=$5; count++} END {print "Avg citations:", sum/count}'
# Import to pandas (assumes results.csv was exported beforehand)
python3 << EOF
import pandas as pd
df = pd.read_csv('results.csv')
print(df.groupby('year')['citations'].mean())
EOF
# Quick analysis with csvkit
scoutml search "ai" --output csv | csvstat
Markdown Processing¶
# Convert to HTML
scoutml review "topic" --output markdown | pandoc -f markdown -t html > review.html
# Convert to PDF
scoutml compare 1 2 3 --output markdown | \
pandoc -f markdown -o comparison.pdf
# Extract a section (from "## Key Papers" up to, but not including, the next H2)
scoutml review "topic" --output markdown | \
awk '/^## Key Papers/{f=1; print; next} /^## /{f=0} f'
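The same section extraction in Python, which is easier to extend with nested headings or fuzzy matching (a sketch; the heading name is from the example above):

```python
def extract_section(markdown: str, heading: str) -> str:
    """Return the H2 section with the given title, up to the next H2."""
    out, capturing = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Toggle capture on at the target heading, off at any other H2
            capturing = line[3:].strip() == heading
        if capturing:
            out.append(line)
    return "\n".join(out)

doc = "# Review\n## Key Papers\n- BERT\n## Methods\n- stuff"
print(extract_section(doc, "Key Papers"))
```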
Format Selection Guide¶
When to Use Table/Rich¶
✅ Use for:

- Interactive terminal sessions
- Quick visual inspection
- Demonstrations
- Initial exploration

❌ Avoid for:

- Automated processing
- Large result sets
- Piping to other commands
When to Use JSON¶
✅ Use for:

- Automation scripts
- Data pipelines
- Complex filtering
- API integration
- Custom analysis

❌ Avoid for:

- Human reading
- Quick checks
When to Use CSV¶
✅ Use for:

- Spreadsheet analysis
- Statistical software
- Data visualization
- Simple databases

❌ Avoid for:

- Nested data
- Long text fields
- Complex relationships
When to Use Markdown¶
✅ Use for:

- Documentation
- Reports
- Blog posts
- Team sharing
- Version control

❌ Avoid for:

- Data processing
- Automated workflows
Advanced Format Usage¶
Custom Formatting¶
# Create custom table from JSON
scoutml search "bert" --output json | \
jq -r '.[] | [.arxiv_id, .title[0:50], .citations] | @tsv' | \
column -t -s $'\t'
# Generate HTML report
cat > template.html << 'EOF'
<!DOCTYPE html>
<html>
<head><title>Research Report</title></head>
<body>
<h1>Papers</h1>
<ul>
{{PAPERS}}
</ul>
</body>
</html>
EOF
papers=$(scoutml search "ai" --output json | \
jq -r '.[] | "<li><a href=\"https://arxiv.org/abs/\(.arxiv_id)\">\(.title)</a> - \(.citations) citations</li>"')
# sed cannot substitute multi-line text; use awk's ENVIRON instead
export papers
awk '{gsub(/\{\{PAPERS\}\}/, ENVIRON["papers"])} 1' template.html > report.html
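Shell quoting makes multi-line substitutions fragile; Python's `string.Template` sidesteps the problem entirely. A sketch, using the same `{{…}}`-style placeholder idea rewritten as a `$papers` template variable:

```python
from string import Template

# Template with a $papers placeholder (string.Template convention)
template = Template("""<!DOCTYPE html>
<html>
<head><title>Research Report</title></head>
<body>
<h1>Papers</h1>
<ul>
$papers
</ul>
</body>
</html>""")

# Sample records; in practice these would come from the JSON output
items = [("1810.04805", "BERT", 50000)]
lis = "\n".join(
    f'<li><a href="https://arxiv.org/abs/{aid}">{title}</a> - {cites} citations</li>'
    for aid, title, cites in items
)
html = template.substitute(papers=lis)
print(html)
```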
Format Conversion¶
# JSON to CSV
scoutml search "ml" --output json | \
jq -r '["arxiv_id","title","year","citations"],
(.[] | [.arxiv_id, .title, .year, .citations]) | @csv' > data.csv
# CSV to Markdown table (naive: assumes no commas inside quoted fields)
csv2md() {
    local file=$1
    head -1 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
    head -1 "$file" | sed 's/[^,]/-/g; s/,/|/g' | sed 's/^/|/; s/$/|/'
    tail -n +2 "$file" | sed 's/,/|/g' | sed 's/^/|/; s/$/|/'
}
# Usage: csv2md results.csv > results.md
# Rich output to plain text
scoutml paper 1810.04805 --output rich | \
sed 's/\x1b\[[0-9;]*m//g' > plain.txt
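The `\x1b` escape in that `sed` pattern is GNU-specific and may not work on BSD/macOS `sed`; a portable Python equivalent of the same ANSI-stripping regex:

```python
import re

# Matches ANSI SGR escape sequences (colors, bold, reset)
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color/style codes, leaving plain text."""
    return ANSI_RE.sub("", text)

print(strip_ansi("\x1b[1;31mError\x1b[0m: done"))  # → Error: done
```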
Streaming Processing¶
# Process large result sets
scoutml search "deep learning" --limit 1000 --output json | \
jq -c '.[]' | \
while read -r paper; do
# Process each paper
echo "$paper" | jq -r '.arxiv_id'
done
# Streaming filter (reconstructs objects without loading the whole array)
scoutml search "ai" --output json --limit 100 | \
jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | select(.citations > 100)'
Export Options¶
Using --export Flag¶
# Export to specific file
scoutml search "bert" --output json --export results.json
scoutml review "transformers" --output markdown --export review.md
# Export with timestamp
scoutml search "ai" --output csv \
--export "results_$(date +%Y%m%d_%H%M%S).csv"
Piping vs Export¶
# Piping (for immediate processing)
scoutml search "ml" --output json | jq '.[] | .title'
# Export (for storage)
scoutml search "ml" --output json --export data.json
jq '.[] | .title' data.json # Process later
Format-Specific Tips¶
JSON Tips¶
- Use `jq` for processing - It's powerful and fast
- Validate JSON - `jq empty file.json`
- Pretty print - `jq '.' file.json`
- Compact output - `jq -c '.'`
CSV Tips¶
- Quote handling - Use proper CSV parsers
- Encoding - Ensure UTF-8 for international characters
- Large files - Use streaming tools like
csvkit
- Headers - First row contains column names
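The quote-handling tip matters in practice: paper titles often contain commas, and a naive `split(',')` mangles them. A quick sketch with Python's stdlib `csv` parser:

```python
import csv
import io

# A row whose quoted title contains a comma
row = '1810.04805,"BERT: Pre-training, revisited","Devlin et al.",2018'
fields = next(csv.reader(io.StringIO(row)))
print(fields)
# → ['1810.04805', 'BERT: Pre-training, revisited', 'Devlin et al.', '2018']
```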
Markdown Tips¶
- Pandoc conversion - Convert to any format
- GitHub rendering - Test in GitHub preview
- Table limits - Some renderers have column limits
- Link format - Use full URLs for compatibility
Performance Considerations¶
Format Performance¶
| Format | Speed | Memory | File Size |
|---|---|---|---|
| JSON | Fast | Medium | Large |
| CSV | Fastest | Low | Small |
| Table | Slow | High | N/A |
| Rich | Slowest | Highest | N/A |
| Markdown | Medium | Medium | Medium |
Optimization Tips¶
# For large datasets, convert to JSON Lines so they can be processed incrementally
scoutml search "ai" --limit 10000 --output json | \
jq -c '.[]' > large_dataset.jsonl # JSON Lines format
# Process in chunks
split -l 1000 large_dataset.jsonl chunk_
for chunk in chunk_*; do
process_chunk "$chunk" &
done
wait
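The same chunked idea in Python: JSON Lines can be read one record at a time, so memory stays flat regardless of dataset size (a sketch; the per-record filter is a placeholder for your own logic):

```python
import io
import json

# Simulated JSON Lines input: one JSON object per line
jsonl = io.StringIO(
    '{"arxiv_id": "1", "citations": 50}\n'
    '{"arxiv_id": "2", "citations": 500}\n'
)

kept = []
for line in jsonl:  # iterates line by line; never loads the whole file
    paper = json.loads(line)
    if paper["citations"] > 100:  # placeholder per-record filter
        kept.append(paper["arxiv_id"])
print(kept)  # → ['2']
```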
Conclusion¶
Choosing the right output format is crucial for efficient workflows:
- Use rich/table for human consumption
- Use JSON for data processing
- Use CSV for spreadsheet analysis
- Use markdown for documentation
Always consider your end goal when selecting a format, and don't hesitate to convert between formats as needed.