Imouto: AU Dataset Expansion

Expanded the Australian dataset from 20 hand-picked occupations to all 1,156 in the OSCA classification, with full AU-to-US matching via the OpenRouter model council.

5 Phases
15 Tasks
1 Day

Complete Australian Coverage

The Australian dataset grew from 20 carefully selected occupations to the full OSCA catalogue of 1,156. Scoring cost was effectively zero thanks to the JSA bypass: pre-existing JSA Gen AI per-task scores were used in place of Claude API calls.
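
The bypass idea can be illustrated with a minimal sketch. This is not the project's actual code: the function name `score_occupation`, the shape of `jsa_task_scores` (occupation code mapped to per-task scores), the use of a plain mean to aggregate, and the optional `fallback` callable are all assumptions made for illustration.

```python
from statistics import mean


def score_occupation(code, jsa_task_scores, fallback=None):
    """Return an occupation score, preferring pre-existing per-task scores.

    jsa_task_scores: dict of occupation code -> {task name: score}
                     (hypothetical shape, assumed for this sketch).
    fallback:        optional callable standing in for an API-based scorer;
                     it is only invoked when no pre-existing scores exist.
    """
    tasks = jsa_task_scores.get(code)
    if tasks:
        # Bypass path: aggregate existing task scores, no API call needed.
        # Averaging is an assumption; the real aggregation may differ.
        return mean(tasks.values())
    if fallback is None:
        raise KeyError(f"no pre-existing scores for occupation {code!r}")
    return fallback(code)
```

With every occupation present in the pre-existing data, the fallback is never called, which is what would make bulk scoring effectively free.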

Full-Catalogue Matching

The match.py matcher was refactored to use the OpenRouter 3-model council with SequenceMatcher pre-filtering (top 10 US candidates per AU occupation). All 1,156 AU occupations received bidirectional matched_id links to their US equivalents, enabling cross-country comparison for every occupation in the dataset.
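
The pre-filtering step can be sketched as follows, assuming the SequenceMatcher in question is the one in Python's standard difflib module and that similarity is computed on occupation titles. The function name `top_candidates` and the lowercase normalisation are illustrative assumptions, not taken from match.py; the council call that would consume the shortlist is omitted.

```python
from difflib import SequenceMatcher


def top_candidates(au_title, us_titles, k=10):
    """Return the k US titles most similar to au_title, best first."""
    scored = [
        (SequenceMatcher(None, au_title.lower(), us.lower()).ratio(), us)
        for us in us_titles
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [us for _, us in scored[:k]]


if __name__ == "__main__":
    us = ["Software Developers", "Registered Nurses", "Civil Engineers"]
    # The shortlist, not the full US catalogue, would go to the council.
    print(top_candidates("Software Engineer", us, k=2))
```

Cutting each AU occupation down to a shortlist of 10 keeps the council's prompts small: the models only choose among plausible candidates instead of the whole US catalogue.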

Features Delivered

Pipeline Expansion

  • Full OSCA catalogue — 1,156 AU occupations loaded and validated

Scoring and Matching

  • Bulk AU scoring — All 1,156 occupations scored via JSA bypass
  • Full AU-US matching — Council-based matching with text similarity pre-filtering