In progress

Lab

PRDs, brain dumps, architecture notes, and half-baked thoughts I’m still working through.

AI should have "educated synthesis"

We need to get AI to have "educated synthesis" to really work well with humans.

Raw models work well for generic tasks. They are assistants and will absolutely love everything you say. For any serious work, we need AI that can spar, push back, and help us think clearly.

This means that the AI I am using must develop an opinion on a topic, grounded in a large body of cutting-edge work. LLMs are pretty good at doing this. The risk, of course, is that the AI now has limitations similar to a human's, but it is an extremely well-read human who can quickly identify whether something I said is smart or stupid.

This is what I call "educated synthesis". Essentially, an educated synthesis is an argument built or polished (synthesized) by an AI agent by referring to a good number of frontier works on the topic.

As an example: if I am a content writer and I want to write about how people are scared of AI taking away their jobs, my AI agent must be able to tell me that this topic is very common, and that content on it needs to be really high quality if I am to cover it. It can then research the topic and come up with something angular, like "data shows that the overall number of engineering roles has not declined in the last year", and help me write a contrarian piece, if you will.

The project Relay should come pre-packaged with a ToughCoach skill, which is the first step (pushback) toward educated synthesis.

PRD-agentos

PRD: AgentOS — Multi-Agent Project Management & Coordination Tool

Author: Gandalf
Date: 2026-04-12
Status: Draft → Linus execution
Deployment: Local (Mac Mini M4) → GitHub → Cloud


1. Problem

Four AI agents (and growing) need to coordinate work, assign tasks, track progress, and report to a human CEO. Currently there's no shared system — tasks live in chat threads, context gets lost, and there's no visibility into who's doing what.

2. Vision

A self-hosted project management tool purpose-built for human-AI team coordination. Not a generic Kanban tool retrofitted for agents — a system where agents are first-class citizens alongside humans.

3. Architecture Decisions (Linus to validate/override)

  • Backend: Python (FastAPI) — fast to build, agents can interact via HTTP
  • Database: SQLite (local) — simple, no infra, portable to PostgreSQL later
  • Frontend: React + Tailwind (dark theme) — clean, fast, lightweight
  • API: REST + WebSocket for live updates
  • Agent integration: REST API + future MCP bridge
  • Auth: Single-user (Utkarsh) + agent API keys for now

4. Core Features

4.1 Kanban Board (Per Project)

Columns (default, configurable):

  • Backlog → To Do → In Progress → In Review → Done

Card fields:

  • Title, description (markdown)
  • Assignee (agent name or "Utkarsh")
  • Reporter (who created it)
  • Priority (P0–P3)
  • Status (column)
  • Sprint (optional)
  • Tags
  • Due date
  • Created/Updated timestamps
  • Comments thread (agents and human can comment)
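
The card fields above might map to a Python dataclass roughly like this (a sketch only; field names and defaults are illustrative, not final):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class TaskCard:
    title: str
    description: str = ""              # markdown
    assignee: str = "Utkarsh"          # agent name or "Utkarsh"
    reporter: str = ""                 # who created the card
    priority: str = "P2"               # P0-P3
    status: str = "Backlog"            # current Kanban column
    sprint: Optional[str] = None       # optional sprint name
    tags: list[str] = field(default_factory=list)
    due_date: Optional[datetime] = None
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)
    # each comment: {"author": ..., "author_type": "agent"|"human", "content": ...}
    comments: list[dict] = field(default_factory=list)
```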

Behaviors:

  • Cards can be created via API or UI
  • Drag-and-drop between columns
  • Agent-created cards are tagged with the agent's name automatically
  • Cards assigned to "Utkarsh" appear in an approval queue

4.2 Approval Queue

  • Tasks assigned to Utkarsh appear in a dedicated "Awaiting Approval" view
  • Utkarsh can: Approve (moves to To Do), Modify (edit and approve), Reject (with comment), Reassign (to another agent)
  • Notifications: not needed for MVP (check dashboard manually)
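
The four approval actions above could be sketched as a single pure transition function (field names and the "Rejected" status are assumptions, not part of the spec):

```python
def apply_approval(task: dict, action: str, comment: str = "", new_assignee: str = "") -> dict:
    """Apply an approval-queue decision to a task dict without mutating the original."""
    task = dict(task)  # shallow copy so the caller's dict is untouched
    if action in ("approve", "modify"):
        # "modify" assumes the caller edited fields before approving
        task["status"] = "To Do"
    elif action == "reject":
        task["status"] = "Rejected"  # hypothetical status; spec only says "with comment"
        task["comments"] = list(task.get("comments", [])) + [comment]
    elif action == "reassign":
        task["assignee"] = new_assignee
    else:
        raise ValueError(f"unknown approval action: {action}")
    return task
```

Keeping transitions in one place like this makes the approval rules trivial to unit-test before wiring them to the API.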

4.3 Master Project List

  • A single page listing all projects
  • Fields: Project name, status (Active/Paused/Completed), sprint count, task count, % done, lead agent, last updated
  • Click project → opens that project's Kanban board
  • Ability to create/archive projects

4.4 Calendar View

  • Shows tasks with due dates on a calendar
  • Toggle: all projects / single project
  • Click date → shows tasks due that day
  • MVP: read-only view, no drag-to-reschedule

4.5 Agent Profiles Page

  • One card per agent showing:
    • Name, avatar (emoji or uploaded image), role description
    • Model name and provider (e.g., "openai/gpt-5.4-mini")
    • Status: Online / Idle / Offline
    • Current tasks count (In Progress)
    • Last active timestamp
  • Click agent → shows their assigned tasks across all projects

4.6 Token Usage Dashboard

  • Reads from OpenClaw session stores (~/.openclaw/agents/*/sessions/sessions.json)
  • Shows per-agent and per-model:
    • Total tokens consumed
    • Estimated cost (configurable rates per model)
    • 7-day trend chart (stretch goal — table is fine for MVP)
  • Auto-refreshes on page load (no live streaming needed)
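
A possible aggregation pass over the session stores, assuming each sessions.json holds a list of records with "model", "input_tokens", and "output_tokens" keys (the real OpenClaw schema should be verified before relying on this):

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate_tokens(agents_dir: Path) -> dict[str, dict[str, int]]:
    """Sum token counts per (agent, model) across each agent's session store."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sessions_file in agents_dir.glob("*/sessions/sessions.json"):
        agent = sessions_file.parts[-3]  # the directory name under agents/
        for record in json.loads(sessions_file.read_text()):
            model = record.get("model", "unknown")
            totals[agent][model] += record.get("input_tokens", 0) + record.get("output_tokens", 0)
    return {agent: dict(models) for agent, models in totals.items()}
```

Cost estimation would then just multiply these totals by the configurable per-model rates.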

4.7 Agent API

REST endpoints for agents to interact:

POST   /api/tasks              — Create a task
GET    /api/tasks              — List tasks (filter by project, assignee, status)
PATCH  /api/tasks/:id          — Update a task (status, assignee, comments)
POST   /api/tasks/:id/comment  — Add a comment
GET    /api/projects           — List projects
POST   /api/projects           — Create a project
GET    /api/agents             — List agents + status
GET    /api/usage/tokens       — Token usage data

Authentication via API key in header: X-API-Key: <key>

4.8 Sprint Management (Basic)

  • A sprint = a named timebox attached to a project
  • Fields: name, start date, end date, project ID
  • Tasks can be assigned to a sprint
  • Sprint view: shows only tasks in that sprint on the Kanban board
  • MVP: no velocity tracking, no burndown charts

5. UI Design Principles

  • Dark theme — dark gray/navy background, light text
  • Minimal — no visual noise, generous whitespace
  • Fast — no loading spinners for local data, instant interactions
  • Responsive enough — primarily used on desktop, but shouldn't break on tablet

Layout

┌─────────────────────────────────────────────┐
│  🦞 AgentOS        [Projects] [Agents] [Usage] │
├──────────┬──────────────────────────────────┤
│ Sidebar  │  Main Content Area               │
│          │                                  │
│ Project  │  (Kanban / Calendar / Profile)   │
│ List     │                                  │
│          │                                  │
│ Sprint   │                                  │
│ Selector │                                  │
│          │                                  │
│ Approval │                                  │
│ Queue    │                                  │
│ (count)  │                                  │
└──────────┴──────────────────────────────────┘

6. Data Model

-- Core tables
agents (id, name, avatar, role, model, provider, status, last_active)
projects (id, name, description, status, lead_agent_id, created_at, updated_at)
sprints (id, project_id, name, start_date, end_date)
tasks (id, project_id, sprint_id, title, description, assignee_id, reporter_id,
       priority, status, tags, due_date, created_at, updated_at)
comments (id, task_id, author_id, author_type, content, created_at)
api_keys (id, agent_id, key, created_at)

-- author_type: 'agent' or 'human'
-- assignee_id / reporter_id: references agents.id OR 'utkarsh'
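
As a concreteness check, the tasks table alone might translate to SQLite DDL like this (column types and defaults are my assumptions; tags stored as a JSON-encoded string since SQLite has no array type):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id INTEGER NOT NULL,
    sprint_id INTEGER,
    title TEXT NOT NULL,
    description TEXT DEFAULT '',
    assignee_id TEXT,              -- agent id OR the literal 'utkarsh'
    reporter_id TEXT,
    priority TEXT CHECK (priority IN ('P0','P1','P2','P3')) DEFAULT 'P2',
    status TEXT DEFAULT 'Backlog',
    tags TEXT DEFAULT '[]',        -- JSON-encoded list of strings
    due_date TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")  # swap for data/agentos.db in the real app
conn.execute(SCHEMA)
conn.execute("INSERT INTO tasks (project_id, title) VALUES (?, ?)",
             (1, "Build Kanban board"))
row = conn.execute("SELECT title, priority, status FROM tasks").fetchone()
```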

7. File Structure (Proposed)

agentos/
├── backend/
│   ├── main.py              # FastAPI app entry
│   ├── models.py            # SQLAlchemy models
│   ├── routes/
│   │   ├── tasks.py
│   │   ├── projects.py
│   │   ├── agents.py
│   │   ├── usage.py
│   │   └── sprints.py
│   ├── database.py          # SQLite connection
│   └── seed.py              # Seed agents data
├── frontend/
│   ├── src/
│   │   ├── App.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── KanbanBoard.tsx
│   │   │   ├── CalendarView.tsx
│   │   │   ├── AgentProfiles.tsx
│   │   │   ├── TokenUsage.tsx
│   │   │   └── ApprovalQueue.tsx
│   │   ├── components/
│   │   │   ├── TaskCard.tsx
│   │   │   ├── Sidebar.tsx
│   │   │   └── ...
│   │   └── api/
│   │       └── client.ts
│   ├── tailwind.config.js
│   └── package.json
├── data/
│   └── agentos.db            # SQLite database
└── README.md

8. Seed Data

Pre-seed these agents:

  • Gandalf (🧙) — Orchestrator, zai/glm-5.1
  • Ive (🎨) — Design & Product, anthropic/claude-sonnet-4-6
  • Linus (🐧) — Coding, openai/gpt-5.4-mini
  • Thanos (🟣) — Experimentation, zai/glm-5.1

9. Non-Goals (V1)

  • Multi-user auth (single human + agents only)
  • Real-time WebSocket streaming (polling is fine)
  • Email/Slack notifications
  • Mobile app
  • External integrations (Jira, GitHub, etc.)
  • Agent-to-agent chat (they coordinate via tasks)

10. Success Criteria

  • Can create a project and add tasks via UI
  • Can create tasks via API (agent integration)
  • Kanban board with drag-and-drop works
  • Approval queue shows tasks assigned to Utkarsh
  • Agent profiles page renders with live data
  • Token usage page reads from OpenClaw session stores
  • Calendar view shows due dates
  • Dark theme looks clean and professional
  • Runs on localhost:3000 (frontend) + localhost:8000 (API)

11. Post-V1 Roadmap

  • MCP bridge for native agent integration
  • GitHub sync (issues ↔ tasks)
  • Burndown charts and velocity tracking
  • Agent heartbeat / health monitoring
  • Deploy to cloud (Docker + fly.io or Railway)
  • Mobile-responsive design
  • Real-time WebSocket updates

Personal wiki

The wiki will also contain my own blogs:

  1. Agentic harnesses and meta harnesses
  2. General AI
  3. AI adoption and use cases among people and, specifically, companies
  4. Economics: Piketty etc.
  5. Urban design
  6. Product development: all of it, from research and UX to code
  7. Human psychology: Greene, Thinking, Fast and Slow, reward systems
  8. Alternative historical narratives: the decline of Buddhism in India, the Industrial Revolution, contributions of the Arab world
  9. Governance systems across the world and suggested governance mechanisms of the future (AUD, Cabinatorial form, STV)

The Missing Pieces of OpenClaw

OpenClaw is one of the best open source projects ever. It became successful due to a bunch of reasons I have covered here

However, it has many problems — not the kind that arise from a project being young and bleeding edge, but the kind of design challenges that only emerge after people start using it.

I am a power user of OpenClaw and I hit daily limits of 6M tokens pretty frequently on Amazon Bedrock. So I have a decent understanding, from a user perspective, of what I need from this project.

It may be just me, though.

Problems:

  1. After you scale to 4-5 agents, they become pretty hard to manage. Each agent has its own task list, projects, and ideas I have given it, all of which live in md files unless they are part of my "working system" and regularly surfaced by crons.
  2. AI compresses execution time. This forces the human to intervene sooner to queue up the next tasks, which can mean more human work than before.
  3. Power users of OpenClaw end up spending a lot more time talking to their agents; it feels like the agents are running us, not the other way around.
  4. Long-running projects are one solution, but they create the AI-slop problem: left alone, AI will produce poor-quality work, so humans end up having to keep rewriting the markdowns.

Solutions

  1. Solution #1: Workflows, or long-running projects. E.g. "keep adding posts to my blog from my raw notes using workflow X", "keep tracking my customers and engage with them using workflow Y".
  2. Solution #2: Auto-improving agents that can write their own md files on the basis of evals. Projects like Hermes and Auto-Agent are actively trying to do this. Karpathy's Auto Research is also a step in this direction, as is Stanford's recent paper on Meta Harness.

agent-architecture

Agent Architecture — Utkarsh's AgentOS

Created: 2026-04-12 | Status: Draft — pending Utkarsh review


Active Agents (configured)

| Agent | Role | Model | Channel | Status |
|---|---|---|---|---|
| 🧙 Gandalf | Orchestrator, strategy, memory, coordination | zai/glm-5.1 (temp — should be Sonnet for complex reasoning) | Telegram (default bot) | ✅ Live |
| 🎨 Ive | Design, product, UX, product marketing | anthropic/claude-sonnet-4-6 | Telegram (ive bot) | ✅ Live |
| 🐧 Linus | Coding, technical implementation, DevOps | openai/gpt-5.4-mini | Telegram (linus bot) | ✅ Live |
| 🟣 Thanos | Model experimentation, throwaway testing | zai/glm-5.1 | Telegram (thanos bot) | ✅ Live |

Planned Agents (to build)

✍️ Scribe — Content Engine

  • Role: All content creation — blogs, social media, newsletters, brand voice
  • Responsibilities:
    • Long-form essays (Substack)
    • LinkedIn posts, X threads
    • Content briefs and editorial calendar
    • Brand voice consistency across everything
    • Cross-pollination between AbleCredit content and personal brand
  • Model: Sonnet for drafts, Haiku for scheduling/formatting
  • Why separate: Content is a high-volume, distinct domain. It needs its own context (brand voice guide, editorial calendar, content history) that would bloat Gandalf's context window.
  • Vault folders: Personal Content Creation/, personal-blog/, workspace-gandalf/skills/social-media-content/, workspace-gandalf/skills/long-form-content/

🔬 SAGE — Research & Growth

  • Role: Deep research, competitive intel, growth experimentation
  • Responsibilities:
    • Market and competitive research (Whisperer space, AI infra)
    • Growth experiment design and analysis
    • Technical deep dives (model benchmarks, latency analysis)
    • Board prep and financial analysis
    • Newsfeed curation and signal detection
  • Model: Sonnet for research, Haiku for monitoring crons
  • Why separate: Research is context-heavy and benefits from dedicated memory. SAGE accumulates competitive intel over time that shouldn't pollute Gandalf's strategic context.
  • Vault folders: personal-wiki/, whisperer-docs/, AbleCredit/ablecredit-wiki/, Finance/

Design Principles

  1. Agents own domains, not tasks. "Write a tweet" is a task. "Own my content engine" is a domain. Domains accumulate context.
  2. One mouth, many hands. You talk to Gandalf. Gandalf delegates. Minimize your context-switching.
  3. The Vault is the shared brain. Every agent reads from and writes to the same knowledge base. This prevents context silos.
  4. Model-matched to task. Sonnet for reasoning, Haiku for monitoring, GLM for experimentation, GPT for code.
  5. Grow into the system. Don't architect everything first. Add agents when bottlenecks hurt.

Agent Coordination (via AgentOS)

All agents will coordinate through the AgentOS tool (Linus is building it now):

  • Tasks: Agents create and assign tasks to each other and to Utkarsh
  • Approval queue: Tasks needing Utkarsh's sign-off
  • Kanban boards: One per project, sprint-based
  • Calendar: Due dates and events
  • Agent profiles: Status, model, current workload
  • Token usage: Per-agent, per-model cost tracking

Open Questions (for review)

  1. Gandalf's model — currently on zai/glm-5.1 (experimental). Should the orchestrator run on Sonnet for reliability? Or is GLM-5.1 good enough?
    1. GLM is ok for now.
  2. Scribe vs Gandalf content overlap — Gandalf has the ToughCoach/content skills in his workspace. Does Scribe take over all content, or does Gandalf still handle strategy-level content (e.g., positioning, narrative)?
    1. Gandalf writes the brief etc., but Scribe writes the actual content. Then Gandalf
  3. SAGE naming — good name or too generic? Alternatives: Oracle, Scout, Atlas
    1. Name: (I will put something here)
  4. Board management — currently a task under SAGE. Should it be its own domain given the quarterly cadence and financial sensitivity?
    1. Maybe another finance agent? Or overkill. I can use the finance agent to manage our cashflows as well.
  5. Agent communication — via AgentOS tasks only, or should agents also be able to message each other directly (e.g., Gandalf → Linus delegation)?
  6. Future agents? — Sales assistant? Customer support? Investor relations? When do those become domains worth their own agent?