Koji Operator

Development Progress — Fine-Tuned Instruction Model for CHO Platform

What is Koji?

Koji is the instruction-tuned operator model that powers CHO's decision-making. It translates natural language tasks into structured actions that control the desktop environment, manage memory, execute research, and communicate with users.

Built on an open-weight foundation model, Koji is fine-tuned specifically for agentic workflows—learning to think step-by-step, verify before acting, and chain multiple operations to complete complex tasks.

Pushing Technical Limits

Building an AI operator that controls a computer like a human requires solving problems that don't exist in traditional AI assistants. Koji is designed to operate at the edge of what's possible with current hardware.

Sub-Second Response

Traditional LLM inference is too slow for real-time desktop control. Koji runs locally on Apple Silicon with optimized quantization, achieving response times under one second for most actions. No cloud round-trip. No API latency. Pure local inference.
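The latency budget above can be sketched as a simple timing wrapper. The model call here is a stub standing in for the locally loaded quantized model; the real binding (e.g. to a llama.cpp-style runtime) is an implementation detail, and all names below are illustrative assumptions, not Koji's actual API.

```python
import time

LATENCY_BUDGET_S = 1.0  # the sub-second target described above

def timed_inference(model_fn, prompt):
    """Run one inference call and report whether it met the latency budget.

    `model_fn` stands in for the locally loaded quantized model.
    """
    start = time.perf_counter()
    output = model_fn(prompt)
    elapsed = time.perf_counter() - start
    return output, elapsed, elapsed < LATENCY_BUDGET_S

# Stubbed model for illustration: returns a structured action instantly.
def stub_model(prompt):
    return {"action": "list_directory", "args": {"path": "."}}

action, elapsed, within_budget = timed_inference(stub_model, "Check my files")
```

Because inference is local, the measured time is the whole cost: there is no network round-trip to subtract.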

Human-Like Reasoning

Unlike chatbots that respond to single prompts, Koji maintains a continuous reasoning loop: observe → think → act → observe feedback → adapt. This mirrors how humans approach complex tasks—checking work, adjusting plans, recovering from errors.
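The observe → think → act → adapt loop can be sketched in a few lines. The three callables are illustrative stand-ins, not Koji's real interfaces; the toy task (incrementing a counter to a goal) only demonstrates the control flow.

```python
def agent_loop(observe, think, act, max_steps=10):
    """Minimal observe -> think -> act loop with feedback, as described above."""
    history = []
    for _ in range(max_steps):
        observation = observe()
        plan = think(observation, history)
        if plan is None:            # the model decides the task is complete
            break
        feedback = act(plan)        # execute and capture the result
        history.append((plan, feedback))
    return history

# Toy task: keep incrementing until the counter reaches 3.
state = {"count": 0}
observe = lambda: state["count"]

def think(obs, history):
    return "increment" if obs < 3 else None

def act(plan):
    state["count"] += 1
    return state["count"]

trace = agent_loop(observe, think, act)
```

Each iteration re-observes before deciding, which is what lets the loop adjust plans and recover from errors instead of committing to a fixed script.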

Memory Architecture

Real intelligence requires memory. Koji implements a hierarchical memory system inspired by human cognition—short-term working memory for active tasks, long-term storage for learned preferences, and associative recall that surfaces relevant context automatically.
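A minimal sketch of the three-tier idea, assuming a bounded working memory and a durable key-value store; the class name, capacity, and substring-based recall are assumptions for illustration, not Koji's actual implementation.

```python
from collections import deque

class HierarchicalMemory:
    """Illustrative sketch of the tiered memory described above."""

    def __init__(self, working_capacity=8):
        self.working = deque(maxlen=working_capacity)  # short-term, active task
        self.long_term = {}                            # learned preferences, facts

    def remember(self, key, value, durable=False):
        self.working.append((key, value))
        if durable:
            self.long_term[key] = value

    def recall(self, query):
        """Associative recall: surface entries whose key mentions the query."""
        hits = [v for k, v in self.long_term.items() if query in k]
        hits += [v for k, v in self.working if query in k]
        return hits

mem = HierarchicalMemory(working_capacity=2)
mem.remember("editor.theme", "dark", durable=True)   # a learned preference
mem.remember("task.current", "build site")           # active-task context
mem.remember("task.next", "deploy")                  # evicts oldest working entry
```

The bounded deque models working-memory eviction: short-term entries fall away as new ones arrive, while anything marked durable survives in long-term storage and remains recallable.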

Research-First Learning

When encountering unfamiliar technology, Koji researches before acting—searching documentation, reading examples, storing knowledge, then proceeding with implementation. This reduces hallucination and helps outputs reflect current best practices.
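The research-first pattern can be sketched as a gate in front of execution. `search` and `store` are hypothetical stand-ins for Koji's documentation-lookup and memory tools; the task shape is an assumption for illustration.

```python
def research_first(task, known, search, store):
    """Look up unfamiliar technologies and persist what was learned
    before proceeding with implementation."""
    unfamiliar = [t for t in task["technologies"] if t not in known]
    for term in unfamiliar:
        notes = search(term)   # e.g. documentation lookup
        store(term, notes)     # persist to long-term memory
        known.add(term)
    return {"ready": True, "researched": unfamiliar}

# Toy run: one known and one unfamiliar technology.
known = {"python"}
notes_db = {}
result = research_first(
    {"technologies": ["python", "htmx"]},
    known,
    search=lambda t: f"docs for {t}",
    store=notes_db.__setitem__,
)
```

Only the unfamiliar term triggers a lookup; known technology passes straight through, so the research step adds latency only when it is actually needed.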

Cognitive Architecture

Koji's architecture is modeled after human cognition, with distinct subsystems for different types of processing:

  • Sensory: Visual processing, screen understanding
  • Memory: Short-term, long-term, episodic recall
  • Attention: Focus management, priority weighting
  • Reasoning: Planning, problem decomposition
  • Motor: Mouse, keyboard, gesture execution
  • Language: Understanding and generating responses
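One way to picture the subsystem split is as a dispatch table routing incoming events to the component that handles them. The enum values come from the list above; the event names and routing rules are toy assumptions, not Koji's real dispatch logic.

```python
from enum import Enum

class Subsystem(Enum):
    SENSORY = "visual processing, screen understanding"
    MEMORY = "short-term, long-term, episodic recall"
    ATTENTION = "focus management, priority weighting"
    REASONING = "planning, problem decomposition"
    MOTOR = "mouse, keyboard, gesture execution"
    LANGUAGE = "understanding and generating responses"

def route(event_kind):
    """Toy router: map an incoming event to the subsystem that handles it."""
    table = {
        "screenshot": Subsystem.SENSORY,
        "recall_request": Subsystem.MEMORY,
        "click": Subsystem.MOTOR,
        "user_message": Subsystem.LANGUAGE,
    }
    # Anything unrecognized falls through to deliberate reasoning.
    return table.get(event_kind, Subsystem.REASONING)
```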

Development Timeline

v0.1.0 (November 2025)

Initial training phase. Single-turn instruction following. Basic command execution without multi-step reasoning.

v0.1.1 (December 2025)

Introduced multi-turn trajectories. The model learns the agentic loop (Act → Receive Feedback → Decide Next Step) driven by natural language prompts.

v0.1.2 (January 2026)

Production-quality training data with comprehensive scenario coverage. Web development, authentication, deployment, research-first patterns, and multi-file project creation.

Training Approach

Unlike traditional fine-tuning that optimizes for single responses, Koji's training emphasizes multi-turn trajectory learning. Each training sample represents a complete task execution: intent → research → planning → execution → verification → response.
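A trajectory sample covering the full arc might look like the sketch below. The field names, phase labels, and tool-call syntax are assumptions for illustration, not the actual dataset schema.

```python
# One multi-turn trajectory sample, shaped as chat turns.
trajectory = {
    "intent": "Create a landing page",
    "turns": [
        {"role": "user",      "phase": "intent",       "content": "Create a landing page"},
        {"role": "assistant", "phase": "research",     "content": "<search: landing page patterns>"},
        {"role": "tool",      "phase": "research",     "content": "docs: semantic HTML, single CTA"},
        {"role": "assistant", "phase": "planning",     "content": "<plan: index.html + style.css>"},
        {"role": "assistant", "phase": "execution",    "content": "<write_file: index.html>"},
        {"role": "tool",      "phase": "execution",    "content": "ok"},
        {"role": "assistant", "phase": "verification", "content": "<open_preview>"},
        {"role": "tool",      "phase": "verification", "content": "renders correctly"},
        {"role": "assistant", "phase": "response",     "content": "Done: landing page is live."},
    ],
}

EXPECTED_PHASES = ["intent", "research", "planning", "execution", "verification", "response"]

def covers_full_arc(sample):
    """True if the sample passes through every phase of the arc, in order."""
    seen = []
    for turn in sample["turns"]:
        if not seen or turn["phase"] != seen[-1]:
            seen.append(turn["phase"])
    return seen == EXPECTED_PHASES
```

Training on whole arcs like this, rather than isolated question-answer pairs, is what teaches the model to verify before responding instead of stopping after the first plausible action.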

Training Coverage
[Chart: training scenario coverage across v0.1.0, v0.1.1, and v0.1.2]

Behavioral Improvement

Before

User: "Check my files"

→ Generic response, no action

After

User: "Check my files"

→ Lists directory, reports results

Capability Coverage

  • Core system operations (file, memory, terminal)
  • Web development workflows (project scaffolding, components)
  • Research-first patterns (learn before implementing)
  • Multi-file project creation
  • Error recovery and debugging
  • Authentication and database integration
  • Deployment workflows
  • Cross-domain tasks (medical, robotics, science)

Performance Characteristics

  • Inference Latency: < 1 second (local)
  • Memory Footprint: Optimized for 16 GB+ RAM
  • Context Window: 32K tokens active
  • Action Accuracy: Continuous improvement

Validation

After each training cycle, Koji is tested against real-world scenarios including multi-file projects, research tasks, error recovery, and memory recall to ensure production readiness.
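The per-cycle validation described above can be sketched as a small harness that runs each scenario and aggregates results. The scenario names mirror the list in this section; `run_scenario` is a hypothetical callable, not Koji's real test runner.

```python
SCENARIOS = [
    "multi_file_project",
    "research_task",
    "error_recovery",
    "memory_recall",
]

def validate(run_scenario):
    """Run each post-training scenario; collect pass/fail plus an overall flag.

    `run_scenario(name)` should return True when the scenario succeeds.
    """
    results = {name: run_scenario(name) for name in SCENARIOS}
    results["all_passed"] = all(results[name] for name in SCENARIOS)
    return results
```

A model version would only be promoted to production when `all_passed` holds; a single failing scenario (say, error recovery) blocks the release.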

Roadmap

  • Current: Production training, full coverage
  • Next: Real-world fine-tuning from feedback
  • Future: Extended context, longer memory