Quick Start Guide

New in v0.1.18.1: Complete Feature Parity & Test Framework

NetIntel-OCR v0.1.18.1 achieves 100% feature parity between CLI and API v2, includes a comprehensive test framework, and defaults to Milvus for vector operations!

All 30+ CLI options are now available through the API, with complete multi-model support for text, network, and flow diagram processing.

What's New → | API Feature Parity → | Testing Guide →

Installation

System Requirements

Operating System Support

Linux Only - NetIntel-OCR is currently tested and supported only on Linux distributions.

Windows and macOS support is planned for future releases but not currently available.

Python Version

Python 3.11 or 3.12 Required - NetIntel-OCR is tested and supported only on Python 3.11 and 3.12.

Other Python versions may work but are not officially supported.

Verified Configurations

  • OS: Ubuntu 20.04/22.04, RHEL 8/9, Debian 11/12
  • Python: 3.11.x or 3.12.x
  • RAM: 8GB minimum (16GB recommended)
  • Storage: 10GB for models + processing space
  • Ollama: Version 0.1.0 or higher

Python Setup

# Check Python version (must be 3.11 or 3.12)
python3 --version

# Install Python 3.11 on Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev

# Or install Python 3.12
sudo apt install python3.12 python3.12-venv python3.12-dev

# Create virtual environment with Python 3.11
python3.11 -m venv venv
source venv/bin/activate

# Or with Python 3.12
python3.12 -m venv venv
source venv/bin/activate

Install NetIntel-OCR

Choose Your Installation

# Option 1: Base installation (500MB) - Core OCR only
pip install netintel-ocr

# Option 2: With Knowledge Graph (+1.5GB) - Recommended
pip install "netintel-ocr[kg]"

# Option 3: Production setup (+2GB) - KG + Vector + API
pip install "netintel-ocr[production]"

# Option 4: Everything (+2.5GB) - All features
pip install "netintel-ocr[all]"

Verify Installation

# Check version and installed modules
netintel-ocr --version

# Example output showing what's installed:
# NetIntel-OCR v0.1.18.1
# ├── Core Components:
# │   ├── C++ Core: ✓ v1.0.1
# │   ├── AVX2: ✓
# │   └── Platform: Linux x86_64
# ├── Installed Modules:
# │   ├── [base] Core OCR: ✓ (always installed)
# │   ├── [kg] Knowledge Graph: ✓ (pykeen 1.10.1)
# │   └── [vector] Vector Store: ✗ (not installed)
# ├── Available for Install:
# │   └── [vector] Vector Store: pip install netintel-ocr[vector]
# └── Active Features:
#     ├── FalkorDB: ✓ (connected to localhost:6379)
#     └── Ollama: ✓ (connected to localhost:11434)

# Get detailed JSON output
netintel-ocr --version --json
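The JSON output can be consumed programmatically, for example to detect optional modules that still need installing. The snippet below is a sketch that assumes a hypothetical payload shape with a `"modules"` map of name to installed-flag; check the actual `--version --json` output for the real schema.

```python
import json

def missing_modules(version_info: dict) -> list:
    """Return module names reported as not installed.

    Assumes a hypothetical JSON shape like
    {"modules": {"kg": true, "vector": false}}.
    """
    modules = version_info.get("modules", {})
    return sorted(name for name, installed in modules.items() if not installed)

info = json.loads('{"version": "0.1.18.1", '
                  '"modules": {"base": true, "kg": true, "vector": false}}')
print(missing_modules(info))
# ['vector']
```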

Package Information

NetIntel-OCR v0.1.18.1 is available on PyPI with modular installation options.

See Installation Guide for all options.

Configure External Ollama Server

NetIntel-OCR requires an external Ollama server with different models for different purposes:

# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"

# Verify Ollama has required models
curl $OLLAMA_HOST/api/tags | jq '.models[].name'

# IMPORTANT: Models serve different purposes
# ==========================================
# INGESTION MODELS (for processing PDFs):
# - qwen2.5vl:7b (network diagram analysis)
# - Nanonets-OCR-s:latest (OCR text extraction)
#
# MINIRAG MODELS (for Q&A after ingestion):
# - gemma3:4b-it-qat (answer generation)
# - qwen3-embedding:8b (semantic search)

# Pull INGESTION models (for PDF processing)
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"qwen2.5vl:7b"}'
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"Nanonets-OCR-s:latest"}'

# Pull MINIRAG models (for Q&A after ingestion)
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"gemma3:4b-it-qat"}'
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"qwen3-embedding:8b"}'

# Optional: Pull alternative models
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"llava:13b"}'        # Alternative vision model
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"minicpm-v:latest"}'  # Lightweight option
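To script the verification step, you can diff the required model set against what `/api/tags` reports. This is a minimal sketch: the required set mirrors the lists above, and the payload shown is a truncated example of the Ollama tags response.

```python
import json

REQUIRED_MODELS = {
    # Ingestion models
    "qwen2.5vl:7b",
    "Nanonets-OCR-s:latest",
    # MiniRAG models
    "gemma3:4b-it-qat",
    "qwen3-embedding:8b",
}

def missing_models(tags_response: dict) -> set:
    """Return required models absent from an /api/tags response."""
    available = {m["name"] for m in tags_response.get("models", [])}
    return REQUIRED_MODELS - available

# Truncated example of what $OLLAMA_HOST/api/tags might return:
payload = json.loads('{"models": [{"name": "qwen2.5vl:7b"}, '
                     '{"name": "gemma3:4b-it-qat"}]}')
print(sorted(missing_models(payload)))
# ['Nanonets-OCR-s:latest', 'qwen3-embedding:8b']
```

Pull whatever the function reports missing before processing any PDFs.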

Quick Configuration

Using Configuration Templates (NEW in v0.1.17)

NetIntel-OCR v0.1.17 provides pre-configured templates for different scenarios:

# List available templates
netintel-ocr config template list
# Available templates:
# - minimal: Single-user local setup
# - development: Development environment
# - staging: Staging server configuration
# - production: Production deployment
# - enterprise: Full enterprise features
# - cloud: Cloud-native deployment

# Quick start with minimal template
netintel-ocr config template apply minimal --output config.json
netintel-ocr config use config.json

# Or use development template for more features
netintel-ocr config template apply development --output dev.json
netintel-ocr --config dev.json process pdf document.pdf

Model Selection

Ingestion Models vs MiniRAG Models

The models below are ingestion models used during PDF processing. They are completely separate from the MiniRAG models (gemma3:4b-it-qat, qwen3-embedding:8b), which are used for Q&A after ingestion.

Parameter         Purpose                    Recommended Model       Model Type
--model           OCR and text extraction    Nanonets-OCR-s:latest   Ingestion
--network-model   Network diagram analysis   qwen2.5vl:7b            Ingestion
--flow-model      Flow chart processing      qwen2.5vl:7b            Ingestion
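The table above amounts to a simple mapping from content type to ingestion model. A hypothetical helper, not part of the NetIntel-OCR API, illustrating the fallback behavior (unrecognized content is handled by the OCR model):

```python
# Mirrors the table above: one ingestion model per content type.
INGESTION_MODELS = {
    "text": "Nanonets-OCR-s:latest",   # --model
    "network": "qwen2.5vl:7b",         # --network-model
    "flow": "qwen2.5vl:7b",            # --flow-model
}

def model_for(content_type: str) -> str:
    """Pick the ingestion model, falling back to the OCR model."""
    return INGESTION_MODELS.get(content_type, INGESTION_MODELS["text"])

print(model_for("network"))
# qwen2.5vl:7b
```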

What's New in v0.1.18.1

Complete Multi-Model Support

# Use different models for different content types
# (--model: text extraction, --network-model: network diagrams,
#  --flow-model: flow diagrams)
netintel-ocr process file document.pdf \
    --model nanonets-ocr-s \
    --network-model qwen2.5vl \
    --flow-model custom-flow

API v2 with Full Feature Parity

# All CLI options now available in API
from netintel_ocr import APIClient

client = APIClient()
result = client.process_document(
    "document.pdf",
    model="nanonets-ocr-s",
    network_model="qwen2.5vl",
    confidence=0.8,
    fast_extraction=True,
    with_kg=True,
    vector_format="milvus"  # Milvus is now default
)

Basic Usage

Process a PDF Document

Enhanced in v0.1.18.1

All processing now defaults to Milvus for vector storage and supports complete multi-model configuration.

# Process entire document with default settings
netintel-ocr process file document.pdf

# Multi-model processing (NEW in v0.1.18.1!)
netintel-ocr process file document.pdf \
    --model nanonets-ocr-s \
    --network-model qwen2.5vl

# Process specific pages
netintel-ocr process file document.pdf --pages 5-10

# Extract only text
netintel-ocr process file document.pdf --text-only

# Extract only network diagrams
netintel-ocr process file document.pdf --network-only

# Extract tables
netintel-ocr process pdf document.pdf --extract-tables

# Enable debug output
netintel-ocr --debug process pdf document.pdf

Example: Network Architecture Document

# Using new v0.1.17 CLI structure
netintel-ocr process pdf cisco-sdwan-design-guide.pdf \
  --model Nanonets-OCR-s:latest \
  --network-model qwen2.5vl:7b \
  --start 5 --end 10 \
  --output results.json \
  --debug

Output structure:

output/
├── cisco-sdwan-design-guide/
│   ├── page_005.md          # Extracted text
│   ├── page_006_network.md  # Network diagram with Mermaid
│   ├── page_007.md          # Regular page
│   ├── summary.json         # Processing summary
│   ├── kg_entities.json     # Extracted entities (v0.1.17)
│   ├── kg_relations.cypher  # Graph relationships (v0.1.17)
│   └── kg_embeddings.npy    # Learned embeddings (v0.1.17)
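A small script can summarize a directory with this layout, e.g. to count how many pages yielded network diagrams. This sketch assumes the `page_*.md` / `page_*_network.md` naming shown in the tree; the exact layout may differ between versions.

```python
import tempfile
from pathlib import Path

def summarize_output(doc_dir: Path) -> dict:
    """Count page files in an output directory laid out as above."""
    pages = sorted(doc_dir.glob("page_*.md"))
    network = [p for p in pages if p.stem.endswith("_network")]
    return {
        "pages": len(pages),
        "network_diagrams": len(network),
        "has_kg": (doc_dir / "kg_entities.json").exists(),
    }

# Demonstrate against a throwaway directory:
doc_dir = Path(tempfile.mkdtemp()) / "cisco-sdwan-design-guide"
doc_dir.mkdir(parents=True)
(doc_dir / "page_005.md").write_text("text")
(doc_dir / "page_006_network.md").write_text("diagram")
print(summarize_output(doc_dir))
# {'pages': 2, 'network_diagrams': 1, 'has_kg': False}
```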

Batch Processing

Process multiple PDFs efficiently:

# Process entire directory
netintel-ocr process batch /path/to/pdfs/ --output-dir results/

# Process with parallel workers
netintel-ocr process batch /path/to/pdfs/ --parallel 4

# Process from file list
printf 'doc1.pdf\ndoc2.pdf\ndoc3.pdf\n' > file_list.txt
netintel-ocr process batch file_list.txt

# Watch directory for new PDFs
netintel-ocr process watch /input/folder --pattern "*.pdf"
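Conceptually, `--parallel N` spreads the file list across N workers. A round-robin sketch of that idea (an illustration, not NetIntel-OCR's actual scheduler):

```python
def split_batches(files: list, workers: int) -> list:
    """Round-robin a file list across N workers; drop empty batches."""
    batches = [[] for _ in range(workers)]
    for i, f in enumerate(files):
        batches[i % workers].append(f)
    return [b for b in batches if b]

print(split_batches(["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"], 2))
# [['a.pdf', 'c.pdf', 'e.pdf'], ['b.pdf', 'd.pdf']]
```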

Query Processed Data

Database Queries

# Search for specific content
netintel-ocr db query "firewall configuration"

# Query with filters
netintel-ocr db query "router" --limit 10 --threshold 0.8

# Export results
netintel-ocr db query "network topology" --format json > results.json

Knowledge Graph Queries (NEW)

# Query knowledge graph
netintel-ocr kg query "show all routers"

# RAG-enhanced query
netintel-ocr kg rag-query "What are the security vulnerabilities?"

# Visualize graph
netintel-ocr kg visualize --output network-graph.html

Start Server Services

Quick Development Server

# Start development server with hot reload
netintel-ocr server dev --reload

# Access at:
# - API: http://localhost:8000
# - MCP: http://localhost:8001

Production Deployment

# Start all services
netintel-ocr server all --api-port 8000 --mcp-port 8001

# Or start individually
netintel-ocr server api --port 8000 --workers 8 &
netintel-ocr server mcp --port 8001 --auth &
netintel-ocr server worker --count 4 --queue redis &

# Check health
netintel-ocr server health

Output Formats

Markdown Files

Each page generates a markdown file containing:

  • Extracted text content
  • Mermaid diagrams for network/flow charts
  • Context analysis and interpretations
  • Component and connection listings

Mermaid Diagrams

Network diagrams are converted to Mermaid syntax:

graph TB
    Router["Core Router"]
    Switch1["Access Switch 1"]
    Switch2["Access Switch 2"]

    Router --> Switch1
    Router --> Switch2
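The conversion boils down to emitting one node declaration per component and one arrow per connection. A sketch of that rendering step, matching the output style above (not NetIntel-OCR's internal renderer):

```python
def to_mermaid(nodes: dict, edges: list) -> str:
    """Render node labels and (src, dst) edges as a Mermaid graph TB block."""
    lines = ["graph TB"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    lines.append("")
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

diagram = to_mermaid(
    {"Router": "Core Router", "Switch1": "Access Switch 1"},
    [("Router", "Switch1")],
)
print(diagram)
```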

Knowledge Graph (v0.1.17)

Entities and relationships are automatically extracted:

  • Entities: Routers, Switches, Firewalls, Servers, etc.
  • Relationships: Connected_To, Routes_Through, Protects, etc.
  • Properties: IP addresses, VLANs, protocols, ports
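Once extracted, `kg_entities.json` can be inspected directly. This sketch assumes a hypothetical schema of `[{"name": ..., "type": ...}, ...]`; check your own `kg_entities.json` for the actual field names.

```python
import json
from collections import Counter

def entity_type_counts(kg_entities_json: str):
    """Tally entity types from a kg_entities.json payload (assumed schema)."""
    entities = json.loads(kg_entities_json)
    return Counter(e["type"] for e in entities)

sample = ('[{"name": "edge-rtr-1", "type": "Router"}, '
          '{"name": "fw-1", "type": "Firewall"}, '
          '{"name": "edge-rtr-2", "type": "Router"}]')
print(entity_type_counts(sample))
# Counter({'Router': 2, 'Firewall': 1})
```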

Next Steps

  1. Explore Commands: Run netintel-ocr --help to see all command groups
  2. Configure Templates: Try different configuration templates for your use case
  3. Process Documents: Start with a sample network document
  4. Query Results: Use database and knowledge graph queries
  5. Visualize: Generate network visualizations from extracted data
  6. Deploy: Set up production services when ready

Getting Help

# View general help
netintel-ocr --help

# View help for command groups
netintel-ocr process --help
netintel-ocr server --help
netintel-ocr kg --help

# View help for specific commands
netintel-ocr process pdf --help
netintel-ocr kg query --help

# Check system status
netintel-ocr system check
netintel-ocr system diagnose

Common Issues

Command Not Found

If you see "command not found", update to the new v0.1.17 syntax:

  • Old: netintel-ocr file.pdf
  • New: netintel-ocr process pdf file.pdf

Ollama Connection

Ensure OLLAMA_HOST is set:

export OLLAMA_HOST="http://your-server:11434"
netintel-ocr system check

Missing Models

Pull required models:

curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"qwen2.5vl:7b"}'

For more help, see the Troubleshooting Guide or CLI Reference.