Koji Operator

Development Progress — Fine-Tuned Instruction Model for CHO Platform

What is Koji?

Koji is the instruction-tuned operator model that powers CHO's decision-making. It translates natural language tasks into structured actions that control the desktop environment, manage memory, execute research, and communicate with users.

Built on an open-weight foundation model, Koji is fine-tuned specifically for agentic workflows—learning to think step-by-step, verify before acting, and chain multiple operations to complete complex tasks.

Pushing Technical Limits

Building an AI operator that controls a computer like a human requires solving problems that don't exist in traditional AI assistants. Koji is designed to operate at the edge of what's possible with current hardware.

Sub-Second Response

Traditional LLM inference is too slow for real-time desktop control. Koji runs locally on Apple Silicon with optimized quantization, achieving response times under one second for most actions. No cloud round-trip. No API latency. Pure local inference.
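The latency budget above can be sketched as a simple timing wrapper. The model call here is a stub standing in for the locally loaded quantized model; the real binding (e.g. to a llama.cpp-style runtime) is an implementation detail, and all names below are illustrative assumptions, not Koji's actual API.

```python
import time

LATENCY_BUDGET_S = 1.0  # the sub-second target described above

def timed_inference(model_fn, prompt):
    """Run one inference call and report whether it met the latency budget.

    `model_fn` stands in for the locally loaded quantized model.
    """
    start = time.perf_counter()
    output = model_fn(prompt)
    elapsed = time.perf_counter() - start
    return output, elapsed, elapsed < LATENCY_BUDGET_S

# Stubbed model for illustration: returns a structured action instantly.
def stub_model(prompt):
    return {"action": "list_directory", "args": {"path": "."}}

action, elapsed, within_budget = timed_inference(stub_model, "Check my files")
```

Because inference is local, the measured time is the whole cost: there is no network round-trip to subtract.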

Human-Like Reasoning

Unlike chatbots that respond to single prompts, Koji maintains a continuous reasoning loop: observe → think → act → observe feedback → adapt. This mirrors how humans approach complex tasks—checking work, adjusting plans, recovering from errors.
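The observe → think → act → adapt loop can be sketched in a few lines. The three callables are illustrative stand-ins, not Koji's real interfaces; the toy task (incrementing a counter to a goal) only demonstrates the control flow.

```python
def agent_loop(observe, think, act, max_steps=10):
    """Minimal observe -> think -> act loop with feedback, as described above."""
    history = []
    for _ in range(max_steps):
        observation = observe()
        plan = think(observation, history)
        if plan is None:            # the model decides the task is complete
            break
        feedback = act(plan)        # execute and capture the result
        history.append((plan, feedback))
    return history

# Toy task: keep incrementing until the counter reaches 3.
state = {"count": 0}
observe = lambda: state["count"]

def think(obs, history):
    return "increment" if obs < 3 else None

def act(plan):
    state["count"] += 1
    return state["count"]

trace = agent_loop(observe, think, act)
```

Each iteration re-observes before deciding, which is what lets the loop adjust plans and recover from errors instead of committing to a fixed script.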

Memory Architecture

Real intelligence requires memory. Koji implements a hierarchical memory system inspired by human cognition—short-term working memory for active tasks, long-term storage for learned preferences, and associative recall that surfaces relevant context automatically.
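A minimal sketch of the three-tier idea, assuming a bounded working memory and a durable key-value store; the class name, capacity, and substring-based recall are assumptions for illustration, not Koji's actual implementation.

```python
from collections import deque

class HierarchicalMemory:
    """Illustrative sketch of the tiered memory described above."""

    def __init__(self, working_capacity=8):
        self.working = deque(maxlen=working_capacity)  # short-term, active task
        self.long_term = {}                            # learned preferences, facts

    def remember(self, key, value, durable=False):
        self.working.append((key, value))
        if durable:
            self.long_term[key] = value

    def recall(self, query):
        """Associative recall: surface entries whose key mentions the query."""
        hits = [v for k, v in self.long_term.items() if query in k]
        hits += [v for k, v in self.working if query in k]
        return hits

mem = HierarchicalMemory(working_capacity=2)
mem.remember("editor.theme", "dark", durable=True)   # a learned preference
mem.remember("task.current", "build site")           # active-task context
mem.remember("task.next", "deploy")                  # evicts oldest working entry
```

The bounded deque models working-memory eviction: short-term entries fall away as new ones arrive, while anything marked durable survives in long-term storage and remains recallable.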

Research-First Learning

When encountering unfamiliar technology, Koji researches before acting—searching documentation, reading examples, storing knowledge, then proceeding with implementation. This reduces hallucination and helps outputs reflect current best practices.
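The research-first pattern can be sketched as a gate in front of execution. `search` and `store` are hypothetical stand-ins for Koji's documentation-lookup and memory tools; the task shape is an assumption for illustration.

```python
def research_first(task, known, search, store):
    """Look up unfamiliar technologies and persist what was learned
    before proceeding with implementation."""
    unfamiliar = [t for t in task["technologies"] if t not in known]
    for term in unfamiliar:
        notes = search(term)   # e.g. documentation lookup
        store(term, notes)     # persist to long-term memory
        known.add(term)
    return {"ready": True, "researched": unfamiliar}

# Toy run: one known and one unfamiliar technology.
known = {"python"}
notes_db = {}
result = research_first(
    {"technologies": ["python", "htmx"]},
    known,
    search=lambda t: f"docs for {t}",
    store=notes_db.__setitem__,
)
```

Only the unfamiliar term triggers a lookup; known technology passes straight through, so the research step adds latency only when it is actually needed.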

Cognitive Architecture

Koji's architecture is modeled after human cognition, with distinct subsystems for different types of processing:

  • Sensory: Visual processing, screen understanding
  • Memory: Short-term, long-term, episodic recall
  • Attention: Focus management, priority weighting
  • Reasoning: Planning, problem decomposition
  • Motor: Mouse, keyboard, gesture execution
  • Language: Understanding and generating responses
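One way to picture the subsystem split is as a dispatch table routing incoming events to the component that handles them. The enum values come from the list above; the event names and routing rules are toy assumptions, not Koji's real dispatch logic.

```python
from enum import Enum

class Subsystem(Enum):
    SENSORY = "visual processing, screen understanding"
    MEMORY = "short-term, long-term, episodic recall"
    ATTENTION = "focus management, priority weighting"
    REASONING = "planning, problem decomposition"
    MOTOR = "mouse, keyboard, gesture execution"
    LANGUAGE = "understanding and generating responses"

def route(event_kind):
    """Toy router: map an incoming event to the subsystem that handles it."""
    table = {
        "screenshot": Subsystem.SENSORY,
        "recall_request": Subsystem.MEMORY,
        "click": Subsystem.MOTOR,
        "user_message": Subsystem.LANGUAGE,
    }
    # Anything unrecognized falls through to deliberate reasoning.
    return table.get(event_kind, Subsystem.REASONING)
```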

Development Timeline

v0.1.0 (November 2025)

Initial training phase. Single-turn instruction following. Basic command execution without multi-step reasoning.

v0.1.1 (December 2025)

Introduced multi-turn trajectories. The model learns the agentic loop (Act → Receive Feedback → Decide Next Step) driven by natural language prompts.

v0.1.2 (January 2026)

Production-quality training data with comprehensive scenario coverage. Web development, authentication, deployment, research-first patterns, and multi-file project creation.

Training Approach

Unlike traditional fine-tuning that optimizes for single responses, Koji's training emphasizes multi-turn trajectory learning. Each training sample represents a complete task execution: intent → research → planning → execution → verification → response.
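A trajectory sample covering the full arc might look like the sketch below. The field names, phase labels, and tool-call syntax are assumptions for illustration, not the actual dataset schema.

```python
# One multi-turn trajectory sample, shaped as chat turns.
trajectory = {
    "intent": "Create a landing page",
    "turns": [
        {"role": "user",      "phase": "intent",       "content": "Create a landing page"},
        {"role": "assistant", "phase": "research",     "content": "<search: landing page patterns>"},
        {"role": "tool",      "phase": "research",     "content": "docs: semantic HTML, single CTA"},
        {"role": "assistant", "phase": "planning",     "content": "<plan: index.html + style.css>"},
        {"role": "assistant", "phase": "execution",    "content": "<write_file: index.html>"},
        {"role": "tool",      "phase": "execution",    "content": "ok"},
        {"role": "assistant", "phase": "verification", "content": "<open_preview>"},
        {"role": "tool",      "phase": "verification", "content": "renders correctly"},
        {"role": "assistant", "phase": "response",     "content": "Done: landing page is live."},
    ],
}

EXPECTED_PHASES = ["intent", "research", "planning", "execution", "verification", "response"]

def covers_full_arc(sample):
    """True if the sample passes through every phase of the arc, in order."""
    seen = []
    for turn in sample["turns"]:
        if not seen or turn["phase"] != seen[-1]:
            seen.append(turn["phase"])
    return seen == EXPECTED_PHASES
```

Training on whole arcs like this, rather than isolated question-answer pairs, is what teaches the model to verify before responding instead of stopping after the first plausible action.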

Training Coverage
[Chart: training scenario coverage across v0.1.0, v0.1.1, and v0.1.2]

Behavioral Improvement

Before

User: "Check my files"

→ Generic response, no action

After

User: "Check my files"

→ Lists directory, reports results

Capability Coverage

  • Core system operations (file, memory, terminal)
  • Web development workflows (project scaffolding, components)
  • Research-first patterns (learn before implementing)
  • Multi-file project creation
  • Error recovery and debugging
  • Authentication and database integration
  • Deployment workflows
  • Cross-domain tasks (medical, robotics, science)

Performance Characteristics

  • Inference Latency: < 1 second (local)
  • Memory Footprint: Optimized for 16 GB+ RAM
  • Context Window: 32K tokens active
  • Action Accuracy: Continuous improvement

Validation

After each training cycle, Koji is tested against real-world scenarios including multi-file projects, research tasks, error recovery, and memory recall to ensure production readiness.
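The per-cycle validation described above can be sketched as a small harness that runs each scenario and aggregates results. The scenario names mirror the list in this section; `run_scenario` is a hypothetical callable, not Koji's real test runner.

```python
SCENARIOS = [
    "multi_file_project",
    "research_task",
    "error_recovery",
    "memory_recall",
]

def validate(run_scenario):
    """Run each post-training scenario; collect pass/fail plus an overall flag.

    `run_scenario(name)` should return True when the scenario succeeds.
    """
    results = {name: run_scenario(name) for name in SCENARIOS}
    results["all_passed"] = all(results[name] for name in SCENARIOS)
    return results
```

A model version would only be promoted to production when `all_passed` holds; a single failing scenario (say, error recovery) blocks the release.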

Roadmap

  • Current: Production training, full coverage
  • Next: Real-world fine-tuning from feedback
  • Future: Extended context, longer memory