Imouto: OpenRouter Council Scoring

Scored all 18,615 remaining US tasks with a 3-model OpenRouter council (gemma4:e4b, llama3.2, mistral) using majority voting, reaching 79% agreement with Sonnet at a fraction of the cost.

5 Phases
14 Tasks
2 Days

Council Scoring at Scale

A 3-model council running on OpenRouter cloud inference (gemma4:e4b, llama3.2, mistral) replaced the planned local Ollama scoring. All three models score each task concurrently via ThreadPoolExecutor. A majority vote decides the automatable classification, and the three sets of ai_tools suggestions are merged and deduplicated.
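A minimal sketch of one council round, assuming each model is wrapped as a plain Python callable that returns a dict with `automatable` and `ai_tools` keys. This result shape is hypothetical, and the real score.py prompt and response parsing are not shown. The label values and the fallback to the first model's label on a three-way split are also assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def council_score(task: str, models: dict[str, Callable[[str], dict]]) -> dict:
    """Score one task with every council model concurrently, then merge the results."""
    # One worker per model so all calls are in flight at the same time.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        results = {name: f.result() for name, f in futures.items()}

    # Majority vote on the automatable label. If no label wins a strict
    # majority (e.g. a three-way split), fall back to the first model (assumption).
    votes = Counter(r["automatable"] for r in results.values())
    label, count = votes.most_common(1)[0]
    if count <= len(results) // 2:
        label = next(iter(results.values()))["automatable"]

    # Merge ai_tools, deduplicating case-insensitively and keeping first-seen order.
    seen: set[str] = set()
    tools: list[str] = []
    for r in results.values():
        for tool in r["ai_tools"]:
            key = tool.strip().lower()
            if key and key not in seen:
                seen.add(key)
                tools.append(tool.strip())

    return {"automatable": label, "ai_tools": tools, "votes": dict(votes)}
```

A thread pool suits this workload because each call spends most of its time waiting on the network, so a round takes roughly as long as the slowest model rather than the sum of all three.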

Provider Abstraction

A providers/ package with BaseProvider, AnthropicProvider, OllamaProvider, and OpenRouterProvider abstracts away the inference backend. The --provider and --council flags on score.py select the backend, or the full council, from the command line without code changes.
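The provider layer can be sketched as an abstract base class plus a name-to-class registry that the CLI reads. The `complete` method name, the `EchoProvider` stand-in, and treating `--council` as a boolean switch are hypothetical. Real Anthropic, Ollama and OpenRouter subclasses would make their network calls inside `complete`.

```python
import argparse
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Common interface every inference backend implements (method name is hypothetical)."""

    name = "base"

    @abstractmethod
    def complete(self, prompt: str, model: str) -> str:
        """Return the raw model response for a prompt."""


class EchoProvider(BaseProvider):
    """Offline stand-in; a real subclass would call Anthropic, Ollama or OpenRouter here."""

    name = "echo"

    def complete(self, prompt: str, model: str) -> str:
        return f"[{model}] {prompt}"


# Registry mapping --provider values to classes; real backends would be added here.
PROVIDERS: dict[str, type[BaseProvider]] = {"echo": EchoProvider}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="score.py")
    parser.add_argument("--provider", choices=sorted(PROVIDERS), default="echo")
    parser.add_argument("--council", action="store_true",
                        help="score with every council model and majority-vote (assumed semantics)")
    return parser


def make_provider(args: argparse.Namespace) -> BaseProvider:
    """Instantiate the backend named on the command line."""
    return PROVIDERS[args.provider]()
```

Keeping the registry as the single source of `--provider` choices means adding a backend only requires a new subclass and one registry entry.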

Bulk Run

All 18,615 US tasks were scored with checkpoint/resume support and a progress display showing per-task timing and ETA. After the bulk run, a Sonnet-graded validation on 30 stratified tasks confirmed that agreement stayed within 5 percentage points of the eval-time figure. A SQLite database layer replaced the JSON file so that writes are atomic.
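The validation step can be sketched as drawing an equal number of tasks from each stratum, computing the share of tasks where the council label matches Sonnet's, and comparing that rate with the eval-time rate. Stratifying by a caller-supplied key (for example the council's own label), the helper names, and the fixed seed are assumptions; the source states only that 30 stratified tasks were graded and that agreement landed within 5 percentage points.

```python
import random
from collections import defaultdict


def stratified_sample(tasks, key, n, seed=0):
    """Draw up to n tasks, an equal share from each stratum defined by key(task)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for task in tasks:
        strata[key(task)].append(task)
    groups = sorted(strata)
    per_group = max(1, n // len(groups))
    sample = []
    for group in groups:
        members = strata[group]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    # Only trims when there are more strata than n (per_group forced to 1).
    return sample[:n]


def agreement(council_labels, reference_labels):
    """Fraction of tasks where the council label equals the reference (Sonnet) label."""
    if len(council_labels) != len(reference_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(c == r for c, r in zip(council_labels, reference_labels))
    return matches / len(council_labels)


def within_tolerance(bulk_rate, eval_rate, tol_pp=5.0):
    """True if the two agreement rates differ by at most tol_pp percentage points."""
    return abs(bulk_rate - eval_rate) <= tol_pp / 100 + 1e-9
```

With only 30 tasks, each disagreement moves the rate by about 3.3 points, so a 5-point tolerance allows for roughly one extra miss.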

Features Delivered

Provider Package

  • Provider abstraction — BaseProvider, Anthropic, Ollama, OpenRouter implementations

Council Scorer

  • 3-model majority voting — Concurrent council calls with merged results
  • Checkpoint and resume — JSON checkpoints for long-running bulk jobs
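A sketch of JSON checkpoint/resume, assuming results are keyed by task id and flushed every `every` tasks (a hypothetical interval). Writing to a temporary file in the same directory and then calling `os.replace` means an interrupted run leaves either the old checkpoint or the new one, never a half-written file. Ids are stored as strings because JSON object keys are always strings.

```python
import json
import os
import tempfile
import time


def load_checkpoint(path):
    """Return previously scored results keyed by task id, or {} on a fresh run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, results):
    """Write to a temp file beside the target, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(results, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def run(tasks, score_fn, path, every=50):
    """Score tasks not yet in the checkpoint, printing per-task timing and ETA."""
    results = load_checkpoint(path)
    todo = [t for t in tasks if str(t["id"]) not in results]
    start = time.monotonic()
    for i, task in enumerate(todo, 1):
        results[str(task["id"])] = score_fn(task)
        elapsed = time.monotonic() - start
        per_task = elapsed / i
        eta_min = per_task * (len(todo) - i) / 60
        print(f"{len(results)}/{len(tasks)}  {per_task:.1f}s/task  ETA {eta_min:.1f} min")
        if i % every == 0:
            save_checkpoint(path, results)
    save_checkpoint(path, results)
    return results
```

Re-running `run` with the same path skips every task already in the checkpoint, so a crash costs at most `every` tasks of work.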

Bulk Scoring

  • 18,615 tasks scored — Full US catalogue via OpenRouter council
  • SQLite database layer — Atomic writes replacing JSON file
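A sketch of the SQLite layer using Python's standard `sqlite3` module; the `scores` table layout and the helper names are assumptions. Using the connection as a context manager commits each write on success and rolls it back on error, and `INSERT OR REPLACE` makes re-scoring a task overwrite its previous row.

```python
import json
import sqlite3

# Hypothetical schema: one row per task, ai_tools stored as a JSON-encoded list.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scores (
    task_id     TEXT PRIMARY KEY,
    automatable TEXT NOT NULL,
    ai_tools    TEXT NOT NULL
)
"""


def open_db(path):
    conn = sqlite3.connect(path)
    # WAL lets readers see the last committed state while a bulk run is writing.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def save_score(conn, task_id, result):
    """Insert or overwrite one task's score inside its own transaction."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO scores (task_id, automatable, ai_tools) VALUES (?, ?, ?)",
            (task_id, result["automatable"], json.dumps(result["ai_tools"])),
        )


def scored_ids(conn):
    """Task ids already stored, used to skip finished work on resume."""
    return {row[0] for row in conn.execute("SELECT task_id FROM scores")}
```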