MiniRAG - Enhanced Retrieval with Knowledge Graphs¶

Overview¶

MiniRAG is NetIntel-OCR's Retrieval-Augmented Generation (RAG) system. It combines vector search over unstructured text with Knowledge Graph context from structured graph data, so that answers to complex queries are more accurate, explainable, and context-aware.

MiniRAG vs Ingestion Models

MiniRAG models are used AFTER document ingestion, for Q&A and retrieval:
- gemma3:4b-it-qat - For generating answers to questions
- qwen3-embedding:8b - For semantic search during retrieval

Ingestion models are used DURING PDF processing:
- qwen2.5vl:7b - For analyzing network diagrams
- Nanonets-OCR-s:latest - For OCR text extraction

These are completely separate model sets serving different purposes!

Architecture¶

┌────────────────────────────────────────┐
│           User Query                    │
└────────────┬───────────────────────────┘
             │
┌────────────▼───────────────────────────┐
│      Query Intent Classifier            │
│  (Determines optimal retrieval path)    │
└────────────┬───────────────────────────┘
             │
    ┌────────┴────────┬──────────┬──────────┐
    ▼                 ▼          ▼          ▼
┌─────────┐    ┌──────────┐  ┌─────────┐ ┌──────────┐
│ FalkorDB │    │  PyKEEN  │  │ Milvus  │ │  Ollama  │
│  Graph   │    │Embeddings│  │ Vectors │ │   LLM    │
└─────────┘    └──────────┘  └─────────┘ └──────────┘
    │                 │          │          │
    └────────┬────────┴──────────┴──────────┘
             │
┌────────────▼───────────────────────────┐
│        Result Fusion (RRF)              │
│   (Combines multiple retrieval paths)   │
└────────────┬───────────────────────────┘
             │
┌────────────▼───────────────────────────┐
│      Context-Enhanced Response          │
└────────────────────────────────────────┘
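
The Result Fusion step in the diagram is labeled RRF (Reciprocal Rank Fusion). As a rough illustration of that technique, and not of MiniRAG's internal code, the sketch below fuses two ranked result lists. The function name, the sample IDs, and the constant `k=60` (a value commonly used for RRF) are assumptions for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of IDs into one ranking.

    Each item scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from two retrieval paths.
graph_hits = ["core-router-01", "fw-prod-01", "switch-12"]
vector_hits = ["fw-prod-01", "upgrade-guide", "core-router-01"]

fused = reciprocal_rank_fusion([graph_hits, vector_hits])
print(fused)
```

Items that rank well in several lists rise to the top, which is why RRF works without having to normalize the raw scores of each retrieval path.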

Installation and Setup¶

Prerequisites¶

# Ensure NetIntel-OCR v0.1.17+ is installed
pip install "netintel-ocr>=0.1.17"

# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"

# Verify Ollama has MiniRAG models (NOT ingestion models)
curl $OLLAMA_HOST/api/tags | jq '.models[].name'

# MiniRAG MODELS ONLY (for Q&A after ingestion):
# - qwen3-embedding:8b (for semantic search embeddings)
# - gemma3:4b-it-qat (for answer generation)

# Pull MiniRAG models if missing
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"qwen3-embedding:8b"}'
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"gemma3:4b-it-qat"}'

# Note: Ingestion models (qwen2.5vl:7b, Nanonets-OCR-s) are separate!

# Start required graph/vector services
docker-compose up -d falkordb milvus

# Verify MiniRAG components
netintel-ocr rag check

# Output:
# ✅ FalkorDB: Connected
# ✅ Milvus: Connected  
# ✅ Ollama: Connected (http://your-ollama-server:11434)
# ✅ MiniRAG: Ready
# ✅ Available Models: qwen3-embedding:8b, gemma3:4b-it-qat
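
The model check above can also be scripted. The sketch below assumes the `/api/tags` response has the shape implied by the `jq '.models[].name'` filter (a `models` list whose entries carry a `name`); the helper name `missing_models` and the sample payload are made up for illustration.

```python
REQUIRED_MODELS = {"qwen3-embedding:8b", "gemma3:4b-it-qat"}

def missing_models(tags_response, required=REQUIRED_MODELS):
    """Return the required model names absent from a parsed /api/tags response."""
    available = {m["name"] for m in tags_response.get("models", [])}
    return sorted(set(required) - available)

# Hypothetical response: one MiniRAG model and one ingestion model present.
sample = {"models": [{"name": "gemma3:4b-it-qat"}, {"name": "qwen2.5vl:7b"}]}
print(missing_models(sample))
```

Any name this returns would still need to be pulled before running `rag init`.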

Initialize MiniRAG¶

# Initialize MiniRAG with Q&A models (NOT ingestion models):
#   --llm-model        answer generation
#   --embedding-model  semantic search
netintel-ocr rag init \
  --collection network_docs \
  --llm-model gemma3:4b-it-qat \
  --embedding-model qwen3-embedding:8b

# Advanced initialization with custom settings
# (both models are MiniRAG models, NOT ingestion models)
netintel-ocr rag init \
  --collection production_infrastructure \
  --llm-model gemma3:4b-it-qat \
  --embedding-model qwen3-embedding:8b \
  --chunk-size 512 \
  --chunk-overlap 50 \
  --temperature 0.7 \
  --max-tokens 2000 \
  --ollama-host "$OLLAMA_HOST"
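
To make the `--chunk-size` and `--chunk-overlap` settings concrete, here is a minimal sliding-window chunker. It is a sketch only: it assumes sizes are counted in tokens and that consecutive chunks share `chunk_overlap` tokens, which may differ from how MiniRAG actually splits documents. The function name is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks

tokens = [f"t{i}" for i in range(1000)]  # stand-in for a tokenized document
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

A larger overlap keeps more shared context between neighboring chunks at the cost of indexing more text.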

Basic Usage¶

Simple Question Answering¶

# Ask a question about your processed telecom infrastructure
netintel-ocr rag query \
  --question "What are the main components of our 5G network architecture?" \
  --collection telecom_docs

# Output:
# Question: What are the main components of our 5G network architecture?
#
# Answer: Based on the analyzed documentation, your 5G network architecture consists of:
#
# RAN Infrastructure:
# • 12,000 gNodeBs (5G base stations) across 3 frequency bands
# • 8,000 Small cells for urban densification
# • 450 Distributed Units (DUs) for edge processing
# • 45 Centralized Units (CUs) for baseband processing
#
# 5G Core Network:
# • AMF (Access and Mobility Management Function) - 4 instances
# • SMF (Session Management Function) - 6 instances
# • UPF (User Plane Function) - 24 edge instances
# • PCF (Policy Control Function) - 2 instances
# • UDM (Unified Data Management) - 2 instances
#
# Transport Network:
# • MPLS backbone with 100G/400G links
# • Segment Routing for network slicing
# • Edge compute nodes at 150 locations
#
# Sources: network-topology.pdf (pages 3-5), infrastructure-design.pdf (pages 12-14)
# Confidence: 0.94

Query with Context¶

# Include surrounding context for better answers
netintel-ocr rag query \
  --question "How does traffic flow from a UE to the internet in our 5G network?" \
  --include-context \
  --context-window 3 \
  --collection telecom_docs

# Output includes:
# - Direct answer with traffic flow path
# - Network diagram visualization
# - Security checkpoints along the path
# - Relevant firewall rules
# - Performance metrics for each hop

Query Modes¶

1. Graph Mode - Structured Data Queries¶

Best for queries about relationships, dependencies, and network topology.

# Query using only Knowledge Graph
netintel-ocr rag query \
  --mode graph \
  --question "List all network functions in the 5G Core" \
  --collection telecom_infrastructure

# Output:
# Network Functions in 5G Core (from Knowledge Graph):
# 
# Control Plane Functions:
# • AMF-01, AMF-02 (Access & Mobility Management)
# • SMF-01 to SMF-06 (Session Management)
# • PCF-01, PCF-02 (Policy Control)
# • UDM-01, UDM-02 (Unified Data Management)
# • AUSF-01 (Authentication Server)
# • NSSF-01 (Network Slice Selection)
# 
# User Plane Functions:
# • UPF-EDGE-01 to UPF-EDGE-24 (Edge locations)
# • UPF-CORE-01 to UPF-CORE-04 (Core locations)
# 
# Supporting Functions:
# • NRF-01, NRF-02 (Network Repository)
# • NEF-01 (Network Exposure Function)
# 
# Total: 45 network functions
# Graph traversal time: 12ms

2. Vector Mode - Unstructured Text Queries¶

Best for policy questions, procedures, and descriptive content.

# Query using only vector search
netintel-ocr rag query \
  --mode vector \
  --question "What are the password policy requirements?" \
  --collection security_policies

# Output:
# Password Policy Requirements (from document search):
# 
# According to the Security Policy document (v2.3):
# 
# 1. Minimum Length: 12 characters
# 2. Complexity Requirements:
#    • At least one uppercase letter
#    • At least one lowercase letter
#    • At least one number
#    • At least one special character
# 3. Password History: Cannot reuse last 12 passwords
# 4. Expiration: 90 days
# 5. Account Lockout: 5 failed attempts
# 6. Multi-Factor Authentication: Required for privileged accounts
# 
# Source: security-policy.pdf (page 23)
# Relevance Score: 0.96

3. Hybrid Mode - Combined Intelligence¶

Best for complex queries requiring both structured and unstructured data.

# Query using hybrid retrieval (default)
netintel-ocr rag query \
  --mode hybrid \
  --question "What would be the impact of upgrading the core router firmware?" \
  --collection network_docs

# Output:
# Impact Analysis for Core Router Firmware Upgrade:
# 
# From Knowledge Graph Analysis:
# • Affected Devices: 47 directly connected systems
# • Service Dependencies: 23 critical services rely on core router
# • Redundancy: Core-Router-02 can handle traffic during upgrade
# • Estimated Affected Users: 2,500
# 
# From Documentation:
# • Upgrade Window: 2-4 hours (per upgrade guide)
# • Required Downtime: 15-30 minutes for failover
# • Rollback Procedure: Available (documented in section 4.3)
# • Known Issues: Memory leak fixed in new version (CVE-2024-1234)
# 
# Risk Assessment: MEDIUM
# Recommendation: Schedule during maintenance window with failover testing
# 
# Sources: 
# - Knowledge Graph: 47 entities, 156 relationships analyzed
# - Documents: upgrade-guide.pdf (p.12), network-sop.pdf (p.45)
# Confidence: 0.91

4. Embedding Mode - Similarity Queries¶

Best for finding similar configurations or patterns.

# Query using KG embeddings
netintel-ocr rag query \
  --mode embeddings \
  --question "Find all firewalls with similar configurations to FW-PROD-01" \
  --similarity-threshold 0.85 \
  --collection network_configs

# Output:
# Firewalls Similar to FW-PROD-01:
# 
# 1. FW-PROD-02 (Similarity: 0.98)
#    • Location: Data Center 1
#    • Config Match: 98% identical rules
#    • Differences: NAT pool ranges
# 
# 2. FW-DR-01 (Similarity: 0.92)
#    • Location: DR Site
#    • Config Match: 85% identical rules
#    • Differences: IP ranges, VLAN assignments
# 
# 3. FW-PROD-03 (Similarity: 0.87)
#    • Location: Data Center 2
#    • Config Match: 82% identical rules
#    • Differences: Additional DMZ rules
# 
# Embedding Model: RotatE (200D)
# Comparison Time: 34ms
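
The effect of `--similarity-threshold` can be sketched with plain cosine similarity. This is an illustration under assumptions: the three-dimensional real-valued vectors below are toy data (actual Knowledge Graph embeddings such as RotatE are higher-dimensional and complex-valued), and the helper names are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similar_entities(target, embeddings, threshold=0.85):
    """Return (name, score) pairs at or above threshold, best match first."""
    ref = embeddings[target]
    scored = [
        (name, cosine_similarity(ref, vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])

# Toy embeddings; FW-TEST-01 is a made-up, deliberately dissimilar entity.
embeddings = {
    "FW-PROD-01": [1.0, 0.0, 1.0],
    "FW-PROD-02": [1.0, 0.1, 1.0],
    "FW-TEST-01": [0.0, 1.0, 0.0],
}
matches = similar_entities("FW-PROD-01", embeddings)
print(matches)
```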

Advanced Features¶

Multi-Hop Reasoning¶

# Complex reasoning across multiple relationships
netintel-ocr rag query \
  --question "If the primary 5G Core AMF fails, what services are affected and what's the recovery plan?" \
  --reasoning-depth 4 \
  --include-alternatives \
  --collection telecom_infrastructure

# Output:
# Multi-Hop Analysis for AMF Failure:
# 
# Immediate Impact (1 hop):
# • UE Registration → Failed for new devices
# • Mobility Management → Handover failures
# • Session Management → Cannot establish new PDU sessions
# 
# Cascading Impact (2-3 hops):
# • Voice Services (VoNR) → New calls fail
# • Network Slicing → Slice selection unavailable
# • Roaming Services → Inbound roamers affected
# 
# Recovery Plan (from runbooks):
# 1. Automatic failover to DR database (RTO: 5 min)
# 2. If failover fails:
#    a. Start manual recovery procedure
#    b. Restore from backup (RPO: 1 hour)
#    c. Replay transaction logs
# 3. Notify stakeholders per escalation matrix
# 
# Alternative Paths:
# • Read-only mode using replica
# • Cache-based operations for 2 hours
# • Queue writes for later processing
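
One way to picture `--reasoning-depth` is as a hop limit on a graph traversal. The sketch below runs a breadth-first search over a small, hypothetical dependency graph and records how many hops each affected node is from the failed component. It illustrates the idea only and is not MiniRAG's actual traversal logic.

```python
from collections import deque

def impacted_within(graph, start, max_depth):
    """Breadth-first search returning {node: hop_count} for nodes within max_depth hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the failed component itself is not an "impact"
    return hops

# Hypothetical "is depended on by" edges.
dependents = {
    "AMF": ["UE Registration", "Mobility Management"],
    "UE Registration": ["VoNR"],
    "VoNR": ["Emergency Calling"],
}
impact = impacted_within(dependents, "AMF", max_depth=2)
print(impact)
```

Raising the depth widens the blast radius that is reported, at the cost of visiting more of the graph.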

Comparative Analysis¶

# Compare configurations or architectures
netintel-ocr rag compare \
  --entities "gNodeB-NYC-001,gNodeB-NYC-002" \
  --aspects "config,performance,capacity" \
  --collection telecom_configs

# Output:
# Comparative Analysis: gNodeB-NYC-001 vs gNodeB-NYC-002
# 
# ┌──────────────────┬────────────────┬────────────────┐
# │ Aspect           │ gNodeB-NYC-001 │ gNodeB-NYC-002 │
# ├──────────────────┼────────────────┼────────────────┤
# │ Frequency Bands  │ n77, n41, n5   │ n77, n41       │
# │ MIMO Config      │ 64T64R         │ 32T32R         │
# │ Max UEs          │ 4,000          │ 2,000          │
# │ Active UEs       │ 3,421          │ 1,876          │
# │ Throughput       │ 8.2 Gbps       │ 4.7 Gbps       │
# │ PRB Utilization  │ 78%            │ 82%            │
# │ Handover Success │ 99.2%          │ 98.7%          │
# │ CPU Usage        │ 62%            │ 71%            │
# │ Power Output     │ 40W            │ 20W            │
# │ Last Config      │ 2024-01-10     │ 2024-01-09     │
# └──────────────────┴────────────────┴────────────────┘
# 
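# Key Differences:
# • gNodeB-NYC-001 supports an additional band (n5) and a larger MIMO configuration (64T64R vs 32T32R)
# • gNodeB-NYC-001 has twice the UE capacity (4,000 vs 2,000) and higher throughput (8.2 vs 4.7 Gbps)
# • gNodeB-NYC-002 shows higher PRB utilization (82% vs 78%) and CPU usage (71% vs 62%)
# • Last configuration dates differ by 1 day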

Temporal Queries¶

# Query with time context
netintel-ocr rag query \
  --question "What changes were made to the network in the last 30 days?" \
  --temporal \
  --lookback-days 30 \
  --include-changelog \
  --collection network_docs

# Output:
# Network Changes (Last 30 Days):
# 
# Week 1 (Jan 1-7):
# • Added VLAN 245 for new development team
# • Updated firewall rules for cloud migration
# • Replaced Switch-Access-12 (hardware failure)
# 
# Week 2 (Jan 8-14):
# • Implemented new QoS policies
# • Added redundant link between DC1 and DC2
# • Updated routing tables for new subnet
# 
# Week 3 (Jan 15-21):
# • Patched 15 devices for CVE-2024-1234
# • Migrated 3 services to cloud
# • Decommissioned legacy mail server
# 
# Week 4 (Jan 22-28):
# • Upgraded core router firmware
# • Added new load balancer for web tier
# • Implemented zero-trust policies in DMZ
# 
# Total Changes: 23
# Change Frequency: Increasing trend (+35%)
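
The `--lookback-days` window amounts to a date filter over change records. A minimal sketch, assuming each record carries a `date` field (the record layout and function name here are hypothetical):

```python
from datetime import date, timedelta

def recent_changes(changes, today, lookback_days=30):
    """Keep the changes dated within the last lookback_days, inclusive of today."""
    cutoff = today - timedelta(days=lookback_days)
    return [c for c in changes if cutoff <= c["date"] <= today]

# Hypothetical change log entries.
changes = [
    {"date": date(2024, 1, 5), "summary": "Added VLAN 245"},
    {"date": date(2024, 1, 24), "summary": "Upgraded core router firmware"},
    {"date": date(2023, 12, 1), "summary": "Older change outside the window"},
]
recent = recent_changes(changes, today=date(2024, 1, 30))
print([c["summary"] for c in recent])
```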

Compliance and Security Queries¶

Compliance Checking¶

# Check compliance with specific framework
netintel-ocr rag compliance-check \
  --question "Does our network segmentation meet PCI-DSS requirements?" \
  --framework PCI-DSS-v4.0 \
  --include-evidence \
  --generate-report \
  --collection compliance_docs

# Output:
# PCI-DSS Network Segmentation Compliance Check:
# 
# ✅ Requirement 1.1: Network diagram documented
#    Evidence: network-topology.pdf, last updated 2024-01-15
# 
# ✅ Requirement 1.2: Firewall configuration standards
#    Evidence: 234 rules reviewed, all follow standard
# 
# ⚠️ Requirement 1.3: DMZ implementation
#    Issue: Direct route found between DMZ and Internal
#    Risk: Medium
#    Remediation: Add deny rule on FW-Internal
# 
# ✅ Requirement 1.4: Personal firewall software
#    Evidence: Endpoint protection policy enforced
# 
# ❌ Requirement 1.5: Security policy review
#    Issue: Policy last reviewed 13 months ago (requires annual)
#    Risk: High
#    Remediation: Schedule immediate policy review
# 
# Overall Compliance: 78%
# Critical Issues: 1
# Warnings: 1
# 
# Report saved to: pci_compliance_report_2024-01-30.pdf

Security Analysis¶

# Analyze security posture
netintel-ocr rag security-analysis \
  --question "What are the potential attack vectors to our database?" \
  --threat-model "MITRE-ATT&CK" \
  --include-mitigations \
  --collection security_docs

# Output:
# Attack Vector Analysis for Database Access:
# 
# Identified Attack Vectors:
# 
# 1. External Network Path (High Risk)
#    Path: Internet → Firewall → DMZ → App Server → Database
#    MITRE Techniques: T1190 (Exploit Public-Facing Application)
#    Current Mitigations:
#    • WAF in place
#    • IPS monitoring
#    • Rate limiting enabled
#    Gaps: No API gateway authentication
# 
# 2. Lateral Movement (Medium Risk)
#    Path: Compromised Workstation → Internal Network → Database
#    MITRE Techniques: T1021 (Remote Services)
#    Current Mitigations:
#    • Network segmentation
#    • MFA on privileged accounts
#    Gaps: Some service accounts without MFA
# 
# 3. Insider Threat (Medium Risk)
#    Path: Direct database access via admin credentials
#    MITRE Techniques: T1078 (Valid Accounts)
#    Current Mitigations:
#    • Audit logging
#    • Privileged access management
#    Gaps: No behavior analytics
# 
# Recommended Actions:
# 1. Implement API gateway with authentication
# 2. Enable MFA for all service accounts
# 3. Deploy user behavior analytics
# 4. Regular penetration testing

Batch Processing¶

Process Multiple Questions¶

# Create batch query file
cat > queries.txt << EOF
What is our current network capacity?
Which systems have no redundancy?
What are the critical single points of failure?
How many firewall rules allow any-to-any traffic?
What is the backup retention policy?
EOF

# Run batch queries
netintel-ocr rag batch \
  --input queries.txt \
  --collection network_docs \
  --output results.json \
  --parallel 4 \
  --format json

# View results
cat results.json | jq '.queries[0]'
# {
#   "question": "What is our current network capacity?",
#   "answer": "Current network capacity: Core: 40Gbps (60% utilized)...",
#   "confidence": 0.92,
#   "sources": ["capacity-report.pdf", "network-metrics.xlsx"],
#   "response_time_ms": 234
# }
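
Batch output is convenient to post-process. The sketch below assumes `results.json` has the shape shown above (a top-level `queries` list whose entries include `question` and `confidence`) and flags answers that fall below a chosen confidence threshold for manual review. The threshold and the second sample entry are made up for illustration.

```python
import json

def low_confidence(results, threshold=0.8):
    """Return the questions whose answers scored below the confidence threshold."""
    return [q["question"] for q in results["queries"] if q["confidence"] < threshold]

# Inline stand-in for the contents of results.json.
raw = """
{"queries": [
  {"question": "What is our current network capacity?", "confidence": 0.92},
  {"question": "Which systems have no redundancy?", "confidence": 0.61}
]}
"""
flagged = low_confidence(json.loads(raw))
print(flagged)
```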

Interactive Session¶

# Start interactive RAG session
netintel-ocr rag interactive \
  --collection network_docs \
  --history-file session.log \
  --context-memory 5

# Interactive prompt appears:
# MiniRAG Interactive Mode
# Type 'help' for commands, 'exit' to quit
# 
# rag> what is our primary data center location?
# Answer: The primary data center is located in Dallas, TX...
# 
# rag> how many servers are there?
# Answer: Based on the context, there are 47 servers total...
# 
# rag> show graph
# [Displays interactive network graph visualization]
# 
# rag> export session
# Session exported to: rag_session_2024-01-30.md

Performance Optimization¶

Caching Configuration¶

# Enable response caching
netintel-ocr rag config \
  --enable-cache \
  --cache-size 1000 \
  --cache-ttl 3600 \
  --collection network_docs

# Query with cache
netintel-ocr rag query \
  --question "What is the network topology?" \
  --use-cache \
  --collection network_docs

# Clear cache
netintel-ocr rag cache-clear --collection network_docs
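
To show how `--cache-size` and `--cache-ttl` interact, here is a small response cache that evicts the least recently used entry when full and drops entries older than the TTL. It sketches the general technique; the class name, the LRU eviction policy, and the injectable clock are assumptions, not MiniRAG's implementation.

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size=1000, ttl_seconds=3600, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (value, stored_at)
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[key]  # expired
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self._clock())
        self._items.move_to_end(key)
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=1000, ttl_seconds=3600)
cache.put("What is the network topology?", "cached answer text")
print(cache.get("What is the network topology?"))
```

Keying on the exact question text is the simplest choice; a real cache might normalize the question or include the collection name in the key.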

Retrieval Tuning¶

# Optimize retrieval parameters
netintel-ocr rag tune \
  --test-queries evaluation_set.txt \
  --optimize-for accuracy \
  --collection network_docs

# Output:
# Optimization Results:
# 
# Best Parameters:
# • Chunk Size: 512
# • Chunk Overlap: 64
# • Top-K: 8
# • Temperature: 0.7
# • Retrieval Strategy: hybrid
# • Vector Weight: 0.4
# • Graph Weight: 0.6
# 
# Performance Improvement:
# • Accuracy: 87% → 94% (+7%)
# • Latency: 380ms → 290ms (-24%)
# • Relevance: 0.81 → 0.93 (+15%)

Monitoring and Metrics¶

# View RAG performance metrics
netintel-ocr rag metrics \
  --period 7d \
  --collection network_docs

# Output:
# MiniRAG Performance Metrics (7 days):
# 
# Query Statistics:
# • Total Queries: 1,847
# • Unique Questions: 423
# • Avg Response Time: 312ms
# • P95 Response Time: 780ms
# • Cache Hit Rate: 67%
# 
# Retrieval Performance:
# • Graph Queries: 34% (avg 120ms)
# • Vector Queries: 28% (avg 290ms)
# • Hybrid Queries: 38% (avg 410ms)
# 
# Accuracy Metrics:
# • User Satisfaction: 92%
# • Answer Relevance: 0.89
# • Source Accuracy: 94%
# 
# Top Query Categories:
# 1. Configuration (34%)
# 2. Troubleshooting (28%)
# 3. Compliance (22%)
# 4. Capacity Planning (16%)
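
Figures such as the average and P95 response time can be reproduced from raw latencies. The sketch below uses the nearest-rank definition of the 95th percentile, which is one common convention and may not be the one the `metrics` command uses; the function name and sample data are illustrative.

```python
def latency_summary(latencies_ms):
    """Return the mean and nearest-rank 95th percentile of a list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return {"avg_ms": sum(ordered) / n, "p95_ms": ordered[rank - 1]}

samples = list(range(10, 210, 10))  # 20 made-up samples: 10, 20, ..., 200 ms
summary = latency_summary(samples)
print(summary)
```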

Export and Integration¶

Export Conversations¶

# Export Q&A as documentation
netintel-ocr rag export \
  --format markdown \
  --include-sources \
  --include-confidence \
  --output network_qa.md \
  --collection network_docs

# Export as JSON for API integration
netintel-ocr rag export \
  --format json \
  --schema openapi \
  --output rag_api.json \
  --collection network_docs

Generate Knowledge Base¶

# Build comprehensive KB from queries
netintel-ocr rag build-kb \
  --min-confidence 0.8 \
  --categories "network,security,operations" \
  --format html \
  --output knowledge_base.html \
  --collection network_docs

# Generate FAQ
netintel-ocr rag generate-faq \
  --top-questions 50 \
  --group-by-category \
  --output faq.md \
  --collection network_docs

API Integration¶

REST API Usage¶

import requests

# Query via API
response = requests.post(
    "http://localhost:8000/rag/query",
    json={
        "question": "What is the database connection string?",
        "collection": "network_docs",
        "mode": "hybrid",
        "include_sources": True,
    },
    timeout=60,  # avoid hanging indefinitely if the server is unreachable
)
response.raise_for_status()  # surface HTTP errors before parsing the body

result = response.json()
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Sources: {result['sources']}")

Python SDK¶

import os
from netintel_ocr.rag import MiniRAG

# Configure external Ollama
os.environ['OLLAMA_HOST'] = "http://your-ollama-server:11434"

# Initialize MiniRAG
rag = MiniRAG(
    collection="network_docs",
    llm_model="gemma3:4b-it-qat",
    embedding_model="qwen3-embedding:8b",
    retrieval_strategy="adaptive",
    ollama_host=os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
)

# Query
result = rag.query(
    question="What are the backup procedures?",
    include_context=True,
    max_tokens=500
)

print(result.answer)
print(f"Retrieved from: {result.sources}")
print(f"Confidence: {result.confidence}")

# Batch processing
questions = [
    "What is the network capacity?",
    "How many servers do we have?",
    "What is the DR strategy?"
]

results = rag.batch_query(questions)
for q, r in zip(questions, results):
    print(f"Q: {q}")
    print(f"A: {r.answer}\n")

Troubleshooting¶

Common Issues¶

# Debug slow queries
netintel-ocr rag debug \
  --question "Your question here" \
  --show-retrieval \
  --show-timing \
  --show-reasoning \
  --collection network_docs

# Output:
# Query Debug Information:
# 
# 1. Query Classification (12ms)
#    Type: entity_centric
#    Strategy: graph-first
# 
# 2. Graph Retrieval (45ms)
#    Entities found: 12
#    Relationships: 34
# 
# 3. Vector Retrieval (89ms)
#    Documents: 5
#    Chunks: 15
# 
# 4. Context Building (23ms)
#    Context size: 2048 tokens
# 
# 5. LLM Generation (234ms)
#    Model: gemma3:4b-it-qat
#    Tokens: 450
# 
# Total Time: 403ms
# Bottleneck: LLM Generation (58%)
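
The total and bottleneck lines in the debug output follow directly from the per-stage timings. A short sketch using the numbers shown above (the function name is hypothetical):

```python
def bottleneck(stage_ms):
    """Return (slowest stage, total time in ms, slowest stage's share in percent)."""
    total = sum(stage_ms.values())
    stage = max(stage_ms, key=stage_ms.get)
    return stage, total, round(100 * stage_ms[stage] / total)

stages = {
    "Query Classification": 12,
    "Graph Retrieval": 45,
    "Vector Retrieval": 89,
    "Context Building": 23,
    "LLM Generation": 234,
}
print(bottleneck(stages))
```

When generation dominates like this, a lower `--max-tokens` or a smaller context is usually a more effective lever than retrieval tuning.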

Improve Answer Quality¶

# Analyze answer quality
netintel-ocr rag analyze \
  --question "Your question" \
  --answer "Generated answer" \
  --check-hallucination \
  --check-completeness \
  --collection network_docs

# Re-index for better retrieval
netintel-ocr rag reindex \
  --optimize-embeddings \
  --update-graph \
  --collection network_docs

Best Practices¶

1. Choose the Right Mode:
   - Use graph mode for structural queries
   - Use vector mode for policy/procedure questions
   - Use hybrid mode for complex analysis
   - Use embeddings mode for similarity searches
2. Optimize for Your Use Case:
   - Tune chunk size based on document types
   - Adjust temperature for creativity vs accuracy
   - Use caching for frequently asked questions
   - Enable monitoring to track performance
3. Maintain Quality:
   - Regularly update your knowledge graph
   - Re-index after major document changes
   - Monitor confidence scores
   - Collect user feedback for improvements
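
The mode guidance above can be turned into a simple pre-selection helper for scripts that set `--mode` explicitly. This is a keyword heuristic written for illustration; the keyword lists are assumptions and it is not how MiniRAG's Query Intent Classifier works.

```python
def suggest_mode(question):
    """Suggest a --mode value from keywords in the question (illustrative heuristic)."""
    q = question.lower()
    if "similar" in q:
        return "embeddings"  # similarity searches
    if "impact" in q or "what would" in q:
        return "hybrid"      # complex analysis
    if any(word in q for word in ("policy", "procedure", "requirement")):
        return "vector"      # policy/procedure questions
    if any(word in q for word in ("list all", "connected", "depend", "topology")):
        return "graph"       # structural queries
    return "hybrid"          # the default mode

print(suggest_mode("List all network functions in the 5G Core"))
```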

Command Reference¶

# Essential MiniRAG commands
netintel-ocr rag init              # Initialize MiniRAG
netintel-ocr rag query             # Ask a question
netintel-ocr rag batch             # Process multiple questions
netintel-ocr rag interactive       # Start interactive session
netintel-ocr rag compare           # Compare entities
netintel-ocr rag compliance-check  # Check compliance
netintel-ocr rag metrics           # View performance metrics
netintel-ocr rag export            # Export Q&A pairs
netintel-ocr rag debug             # Debug queries
netintel-ocr rag tune              # Optimize parameters