Koji Operator
Development Progress — Fine-Tuned Instruction Model for CHO Platform
What is Koji?
Koji is the instruction-tuned operator model that powers CHO's decision-making. It translates natural language tasks into structured actions that control the desktop environment, manage memory, execute research, and communicate with users.
Built on an open-weight foundation model, Koji is fine-tuned specifically for agentic workflows—learning to think step-by-step, verify before acting, and chain multiple operations to complete complex tasks.
Pushing Technical Limits
Building an AI operator that controls a computer like a human requires solving problems that don't exist in traditional AI assistants. Koji is designed to operate at the edge of what's possible with current hardware.
Traditional LLM inference is too slow for real-time desktop control. Koji runs locally on Apple Silicon with optimized quantization, achieving response times under one second for most actions. No cloud round-trip. No API latency. Pure local inference.
Unlike chatbots that respond to single prompts, Koji maintains a continuous reasoning loop: observe → think → act → observe feedback → adapt. This mirrors how humans approach complex tasks—checking work, adjusting plans, recovering from errors.
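The loop described above can be sketched in a few lines. This is a minimal illustration, not Koji's actual interface — `observe`, `act`, and `plan_next_action` are injected stand-ins for the real subsystems:

```python
# Minimal sketch of the observe → think → act → adapt loop.
# All callables here are illustrative placeholders, not Koji's real API.

def run_agent_loop(task, observe, act, plan_next_action, max_steps=10):
    """Repeat observe → think → act until the model judges the task done."""
    history = []
    for _ in range(max_steps):
        observation = observe()                                # observe
        action = plan_next_action(task, observation, history)  # think
        if action is None:                                     # task complete
            break
        result = act(action)                                   # act
        history.append((observation, action, result))          # feedback
    return history
```

The `history` list is what lets the model adapt: each planning step sees every prior observation, action, and result, so it can check its work and recover from errors rather than replaying a fixed script.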
Real intelligence requires memory. Koji implements a hierarchical memory system inspired by human cognition—short-term working memory for active tasks, long-term storage for learned preferences, and associative recall that surfaces relevant context automatically.
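A toy version of that hierarchy might look like the following. The class and method names are assumptions for illustration; the real system would use embedding-based retrieval rather than keyword overlap:

```python
# Hedged sketch of a hierarchical memory: capacity-bounded working memory,
# a persistent long-term store, and simple associative recall.
from collections import deque

class HierarchicalMemory:
    def __init__(self, working_capacity=5):
        self.working = deque(maxlen=working_capacity)  # short-term, bounded
        self.long_term = {}                            # durable key → fact

    def remember(self, key, fact):
        self.working.append((key, fact))  # active-task context
        self.long_term[key] = fact        # learned preference storage

    def recall(self, query):
        """Associative recall: surface long-term facts whose key shares
        a word with the query (stand-in for semantic retrieval)."""
        words = set(query.lower().split())
        return [fact for key, fact in self.long_term.items()
                if words & set(key.lower().split())]
```

The `deque` with `maxlen` captures the key property of working memory: old context falls out automatically as new context arrives, while long-term storage keeps everything.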
When encountering unfamiliar technology, Koji researches before acting—searching documentation, reading examples, storing knowledge, then proceeding with implementation. This prevents hallucination and ensures outputs reflect current best practices.
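The research-first pattern reduces to a cache-miss check: consult stored knowledge, and only reach out to external documentation when the topic is unknown. A small sketch, with `search` and `build` as injected placeholders:

```python
# Illustrative research-first pattern: look up unfamiliar topics before
# implementing, and cache what was learned. Function names are assumptions.

knowledge = {}  # learned facts, keyed by topic

def research(topic, search):
    """Search external docs and store the result for later recall."""
    knowledge[topic] = search(topic)
    return knowledge[topic]

def implement(topic, search, build):
    # Research before acting: unknown topics trigger a docs search first.
    facts = knowledge.get(topic) or research(topic, search)
    return build(facts)
```

Because results are cached, repeated tasks on the same technology skip the research step and proceed straight to implementation.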
Cognitive Architecture
Koji's architecture is modeled after human cognition, with distinct subsystems for different types of processing:
| System | Function |
|---|---|
| Sensory | Visual processing, screen understanding |
| Memory | Short-term, long-term, episodic recall |
| Attention | Focus management, priority weighting |
| Reasoning | Planning, problem decomposition |
| Motor | Mouse, keyboard, gesture execution |
| Language | Understanding and generating responses |
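One way to picture how these subsystems cooperate is as a dispatch table that routes incoming events to the subsystem responsible for them. The event names and routing below are illustrative assumptions, not Koji internals:

```python
# Sketch: the subsystems from the table above as an enum, with a simple
# event router. The specific event → subsystem mapping is assumed.
from enum import Enum

class Subsystem(Enum):
    SENSORY = "visual processing, screen understanding"
    MEMORY = "short-term, long-term, episodic recall"
    ATTENTION = "focus management, priority weighting"
    REASONING = "planning, problem decomposition"
    MOTOR = "mouse, keyboard, gesture execution"
    LANGUAGE = "understanding and generating responses"

def route(event_type):
    """Send an event to the subsystem that handles it; anything
    unrecognized falls through to reasoning."""
    table = {
        "screenshot": Subsystem.SENSORY,
        "recall_request": Subsystem.MEMORY,
        "user_message": Subsystem.LANGUAGE,
        "click": Subsystem.MOTOR,
    }
    return table.get(event_type, Subsystem.REASONING)
```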
Development Timeline
- Initial training phase: single-turn instruction following and basic command execution, without multi-step reasoning.
- Multi-turn trajectories introduced: from natural language prompts, the model learns the agentic loop of Act → Receive Feedback → Decide Next Step.
- Production-quality training data with comprehensive scenario coverage: web development, authentication, deployment, research-first patterns, and multi-file project creation.
Training Approach
Unlike traditional fine-tuning that optimizes for single responses, Koji's training emphasizes multi-turn trajectory learning. Each training sample represents a complete task execution: intent → research → planning → execution → verification → response.
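A single trajectory sample following those stages might be serialized like this. The field names and actions are illustrative assumptions about the data format, not the actual training schema:

```python
# Hypothetical shape of one multi-turn trajectory sample, following the
# intent → research → planning → execution → verification → response
# stages described above. Field names are assumptions.
import json

trajectory = {
    "intent": "Create a login page",
    "turns": [
        {"stage": "research", "action": "search docs on auth best practices"},
        {"stage": "planning", "action": "outline form, validation, session"},
        {"stage": "execution", "action": "write the page and its handlers"},
        {"stage": "verification", "action": "open page, check form renders"},
        {"stage": "response", "action": "report the result to the user"},
    ],
}

sample = json.dumps(trajectory, indent=2)  # one serialized training sample
```

Training on whole trajectories rather than isolated responses is what teaches the model to carry a plan across turns instead of answering each prompt in isolation.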
Behavioral Improvement
Before fine-tuning:

User: "Check my files"
→ Generic response, no action

After fine-tuning:

User: "Check my files"
→ Lists directory, reports results
Capability Coverage
- Core system operations (file, memory, terminal)
- Web development workflows (project scaffolding, components)
- Research-first patterns (learn before implementing)
- Multi-file project creation
- Error recovery and debugging
- Authentication and database integration
- Deployment workflows
- Cross-domain tasks (medical, robotics, science)
Performance Characteristics
| Metric | Target |
|---|---|
| Inference Latency | < 1 second (local) |
| Memory Footprint | Optimized for 16GB+ RAM |
| Context Window | 32K tokens active |
| Action Accuracy | Continuous improvement |
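The latency target in the table is the kind of thing that can be checked with a trivial timing harness. A sketch, where `infer` stands in for the local inference call:

```python
# Simple latency check against the sub-second target. `infer` is a
# placeholder for the actual local inference call.
import time

def measure_latency(infer, prompt, budget_s=1.0):
    """Time one inference call and report whether it met the budget."""
    start = time.perf_counter()
    result = infer(prompt)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < budget_s
```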
Validation
After each training cycle, Koji is tested against real-world scenarios including multi-file projects, research tasks, error recovery, and memory recall to ensure production readiness.
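In outline, that validation pass is a scenario sweep with an all-pass gate. The `run_scenario` callable below is a placeholder for the real test runner; the scenario names follow the text:

```python
# Sketch of a post-training validation pass: run each scenario and gate
# readiness on all of them passing. run_scenario is a placeholder.

def validate(run_scenario, scenarios=("multi-file project", "research task",
                                      "error recovery", "memory recall")):
    results = {name: bool(run_scenario(name)) for name in scenarios}
    ready = all(results.values())  # production-ready only if every scenario passes
    return results, ready
```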
Roadmap
| Phase | Focus |
|---|---|
| Current | Production training, full coverage |
| Next | Real-world fine-tuning from feedback |
| Future | Extended context, longer memory |