In progress

Lab

PRDs, brain dumps, architecture notes, and half-baked thoughts I’m still working through.

AI should have "educated synthesis"

We need to get AI to have "educated synthesis" to really work well with humans.

Raw models work well for generic tasks. They are assistants and will absolutely love everything you say. For any serious work, we need AI that can spar, push back, and help us think clearly.

This means that the AI I am using must develop an opinion on a topic, grounded in a large body of cutting-edge work. LLMs are pretty good at doing this. The risk, of course, is that the AI now has limitations similar to a human's, but it is an extremely well-read human who can quickly identify whether something I said is smart or stupid.

This is what I call "educated synthesis". Essentially, an educated synthesis is an argument built or polished (synthesized) by an AI agent by referring to a good number of frontier works on the topic.

As an example: if I am a content writer and I want to write about how people are scared of AI taking away their jobs, my AI agent must be able to tell me that this topic is very common, and that content on it needs to be really high quality if I am to cover it. It can then research the topic and come up with something angular, like "data shows that the overall number of engineering roles has not declined in the last year", and help me write a contrarian piece, if you will.

The project Relay should come pre-packaged with a ToughCoach skill, which is the first step (pushback) toward educated synthesis.

PRD-agentos

PRD: AgentOS — Multi-Agent Project Management & Coordination Tool

Author: Gandalf
Date: 2026-04-12
Status: Draft → Linus execution
Deployment: Local (Mac Mini M4) → GitHub → Cloud


1. Problem

Four AI agents (and growing) need to coordinate work, assign tasks, track progress, and report to a human CEO. Currently there's no shared system — tasks live in chat threads, context gets lost, and there's no visibility into who's doing what.

2. Vision

A self-hosted project management tool purpose-built for human-AI team coordination. Not a generic Kanban tool retrofitted for agents — a system where agents are first-class citizens alongside humans.

3. Architecture Decisions (Linus to validate/override)

  • Backend: Python (FastAPI) — fast to build, agents can interact via HTTP
  • Database: SQLite (local) — simple, no infra, portable to PostgreSQL later
  • Frontend: React + Tailwind (dark theme) — clean, fast, lightweight
  • API: REST + WebSocket for live updates
  • Agent integration: REST API + future MCP bridge
  • Auth: Single-user (Utkarsh) + agent API keys for now

4. Core Features

4.1 Kanban Board (Per Project)

Columns (default, configurable):

  • Backlog → To Do → In Progress → In Review → Done

Card fields:

  • Title, description (markdown)
  • Assignee (agent name or "Utkarsh")
  • Reporter (who created it)
  • Priority (P0–P3)
  • Status (column)
  • Sprint (optional)
  • Tags
  • Due date
  • Created/Updated timestamps
  • Comments thread (agents and human can comment)
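
The card fields above might map to a Python dataclass roughly like this (a sketch only; field names and defaults are illustrative, not final):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class TaskCard:
    title: str
    description: str = ""              # markdown
    assignee: str = "Utkarsh"          # agent name or "Utkarsh"
    reporter: str = ""                 # who created the card
    priority: str = "P2"               # P0-P3
    status: str = "Backlog"            # current Kanban column
    sprint: Optional[str] = None       # optional sprint name
    tags: list[str] = field(default_factory=list)
    due_date: Optional[datetime] = None
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)
    # each comment: {"author": ..., "author_type": "agent"|"human", "content": ...}
    comments: list[dict] = field(default_factory=list)
```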

Behaviors:

  • Cards can be created via API or UI
  • Drag-and-drop between columns
  • Agent-created cards are tagged with the agent's name automatically
  • Cards assigned to "Utkarsh" appear in an approval queue

4.2 Approval Queue

  • Tasks assigned to Utkarsh appear in a dedicated "Awaiting Approval" view
  • Utkarsh can: Approve (moves to To Do), Modify (edit and approve), Reject (with comment), Reassign (to another agent)
  • Notifications: not needed for MVP (check dashboard manually)
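
The four approval actions above could be sketched as a single pure transition function (field names and the "Rejected" status are assumptions, not part of the spec):

```python
def apply_approval(task: dict, action: str, comment: str = "", new_assignee: str = "") -> dict:
    """Apply an approval-queue decision to a task dict without mutating the original."""
    task = dict(task)  # shallow copy so the caller's dict is untouched
    if action in ("approve", "modify"):
        # "modify" assumes the caller edited fields before approving
        task["status"] = "To Do"
    elif action == "reject":
        task["status"] = "Rejected"  # hypothetical status; spec only says "with comment"
        task["comments"] = list(task.get("comments", [])) + [comment]
    elif action == "reassign":
        task["assignee"] = new_assignee
    else:
        raise ValueError(f"unknown approval action: {action}")
    return task
```

Keeping transitions in one place like this makes the approval rules trivial to unit-test before wiring them to the API.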

4.3 Master Project List

  • A single page listing all projects
  • Fields: Project name, status (Active/Paused/Completed), sprint count, task count, % done, lead agent, last updated
  • Click project → opens that project's Kanban board
  • Ability to create/archive projects

4.4 Calendar View

  • Shows tasks with due dates on a calendar
  • Toggle: all projects / single project
  • Click date → shows tasks due that day
  • MVP: read-only view, no drag-to-reschedule

4.5 Agent Profiles Page

  • One card per agent showing:
    • Name, avatar (emoji or uploaded image), role description
    • Model name and provider (e.g., "openai/gpt-5.4-mini")
    • Status: Online / Idle / Offline
    • Current tasks count (In Progress)
    • Last active timestamp
  • Click agent → shows their assigned tasks across all projects

4.6 Token Usage Dashboard

  • Reads from OpenClaw session stores (~/.openclaw/agents/*/sessions/sessions.json)
  • Shows per-agent and per-model:
    • Total tokens consumed
    • Estimated cost (configurable rates per model)
    • 7-day trend chart (stretch goal — table is fine for MVP)
  • Auto-refreshes on page load (no live streaming needed)
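
A possible aggregation pass over the session stores, assuming each sessions.json holds a list of records with "model", "input_tokens", and "output_tokens" keys (the real OpenClaw schema should be verified before relying on this):

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate_tokens(agents_dir: Path) -> dict[str, dict[str, int]]:
    """Sum token counts per (agent, model) across each agent's session store."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sessions_file in agents_dir.glob("*/sessions/sessions.json"):
        agent = sessions_file.parts[-3]  # the directory name under agents/
        for record in json.loads(sessions_file.read_text()):
            model = record.get("model", "unknown")
            totals[agent][model] += record.get("input_tokens", 0) + record.get("output_tokens", 0)
    return {agent: dict(models) for agent, models in totals.items()}
```

Cost estimation would then just multiply these totals by the configurable per-model rates.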

4.7 Agent API

REST endpoints for agents to interact:

POST   /api/tasks              — Create a task
GET    /api/tasks              — List tasks (filter by project, assignee, status)
PATCH  /api/tasks/:id          — Update a task (status, assignee, comments)
POST   /api/tasks/:id/comment  — Add a comment
GET    /api/projects           — List projects
POST   /api/projects           — Create a project
GET    /api/agents             — List agents + status
GET    /api/usage/tokens       — Token usage data

Authentication via API key in header: X-API-Key: <key>

4.8 Sprint Management (Basic)

  • A sprint = a named timebox attached to a project
  • Fields: name, start date, end date, project ID
  • Tasks can be assigned to a sprint
  • Sprint view: shows only tasks in that sprint on the Kanban board
  • MVP: no velocity tracking, no burndown charts

5. UI Design Principles

  • Dark theme — dark gray/navy background, light text
  • Minimal — no visual noise, generous whitespace
  • Fast — no loading spinners for local data, instant interactions
  • Responsive enough — primarily used on desktop, but shouldn't break on tablet

Layout

┌─────────────────────────────────────────────┐
│  🦞 AgentOS        [Projects] [Agents] [Usage] │
├──────────┬──────────────────────────────────┤
│ Sidebar  │  Main Content Area               │
│          │                                  │
│ Project  │  (Kanban / Calendar / Profile)   │
│ List     │                                  │
│          │                                  │
│ Sprint   │                                  │
│ Selector │                                  │
│          │                                  │
│ Approval │                                  │
│ Queue    │                                  │
│ (count)  │                                  │
└──────────┴──────────────────────────────────┘

6. Data Model

-- Core tables
agents (id, name, avatar, role, model, provider, status, last_active)
projects (id, name, description, status, lead_agent_id, created_at, updated_at)
sprints (id, project_id, name, start_date, end_date)
tasks (id, project_id, sprint_id, title, description, assignee_id, reporter_id,
       priority, status, tags, due_date, created_at, updated_at)
comments (id, task_id, author_id, author_type, content, created_at)
api_keys (id, agent_id, key, created_at)

-- author_type: 'agent' or 'human'
-- assignee_id / reporter_id: references agents.id OR 'utkarsh'
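
As a concreteness check, the tasks table alone might translate to SQLite DDL like this (column types and defaults are my assumptions; tags stored as a JSON-encoded string since SQLite has no array type):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id INTEGER NOT NULL,
    sprint_id INTEGER,
    title TEXT NOT NULL,
    description TEXT DEFAULT '',
    assignee_id TEXT,              -- agent id OR the literal 'utkarsh'
    reporter_id TEXT,
    priority TEXT CHECK (priority IN ('P0','P1','P2','P3')) DEFAULT 'P2',
    status TEXT DEFAULT 'Backlog',
    tags TEXT DEFAULT '[]',        -- JSON-encoded list of strings
    due_date TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")  # swap for data/agentos.db in the real app
conn.execute(SCHEMA)
conn.execute("INSERT INTO tasks (project_id, title) VALUES (?, ?)",
             (1, "Build Kanban board"))
row = conn.execute("SELECT title, priority, status FROM tasks").fetchone()
```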

7. File Structure (Proposed)

agentos/
├── backend/
│   ├── main.py              # FastAPI app entry
│   ├── models.py            # SQLAlchemy models
│   ├── routes/
│   │   ├── tasks.py
│   │   ├── projects.py
│   │   ├── agents.py
│   │   ├── usage.py
│   │   └── sprints.py
│   ├── database.py          # SQLite connection
│   └── seed.py              # Seed agents data
├── frontend/
│   ├── src/
│   │   ├── App.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── KanbanBoard.tsx
│   │   │   ├── CalendarView.tsx
│   │   │   ├── AgentProfiles.tsx
│   │   │   ├── TokenUsage.tsx
│   │   │   └── ApprovalQueue.tsx
│   │   ├── components/
│   │   │   ├── TaskCard.tsx
│   │   │   ├── Sidebar.tsx
│   │   │   └── ...
│   │   └── api/
│   │       └── client.ts
│   ├── tailwind.config.js
│   └── package.json
├── data/
│   └── agentos.db            # SQLite database
└── README.md

8. Seed Data

Pre-seed these agents:

  • Gandalf (🧙) — Orchestrator, zai/glm-5.1
  • Ive (🎨) — Design & Product, anthropic/claude-sonnet-4-6
  • Linus (🐧) — Coding, openai/gpt-5.4-mini
  • Thanos (🟣) — Experimentation, zai/glm-5.1

9. Non-Goals (V1)

  • Multi-user auth (single human + agents only)
  • Real-time WebSocket streaming (polling is fine)
  • Email/Slack notifications
  • Mobile app
  • External integrations (Jira, GitHub, etc.)
  • Agent-to-agent chat (they coordinate via tasks)

10. Success Criteria

  • Can create a project and add tasks via UI
  • Can create tasks via API (agent integration)
  • Kanban board with drag-and-drop works
  • Approval queue shows tasks assigned to Utkarsh
  • Agent profiles page renders with live data
  • Token usage page reads from OpenClaw session stores
  • Calendar view shows due dates
  • Dark theme looks clean and professional
  • Runs on localhost:3000 (frontend) + localhost:8000 (API)

11. Post-V1 Roadmap

  • MCP bridge for native agent integration
  • GitHub sync (issues ↔ tasks)
  • Burndown charts and velocity tracking
  • Agent heartbeat / health monitoring
  • Deploy to cloud (Docker + fly.io or Railway)
  • Mobile-responsive design
  • Real-time WebSocket updates

Personal wiki

The wiki will also contain my own blogs:

  1. Agentic harnesses and meta harnesses
  2. General AI
  3. AI adoption and use cases among people and, specifically, companies
  4. Economics: Piketty etc.
  5. Urban design
  6. Product development: all of it, from research and UX to code
  7. Human psychology: Greene, Thinking, Fast and Slow, reward systems
  8. Alternative historical narratives: the decline of Buddhism in India, the Industrial Revolution, contributions of the Arab world
  9. Governance systems across the world and suggested governance mechanisms of the future (AUD, Cabinatorial form, STV)

The Missing Pieces of OpenClaw

OpenClaw is one of the best open source projects ever. It became successful due to a bunch of reasons I have covered here

However, it has many problems — not the kind that arise from a project being young and bleeding edge, but the kind of design challenges that only emerge after people start using it.

I am a power user of OpenClaw and I hit daily limits of 6M tokens pretty frequently on Amazon Bedrock. So I have a decent understanding, from a user perspective, of what I need from this project.

It may be just me, though.

Problems:

  1. After you scale to 4-5 agents, they become pretty hard to manage. Each agent has its own task list, projects, and ideas I have given it, all of which live in md files unless they are part of my "working system" and regularly surfaced by crons.
  2. AI compresses execution time. This forces the human to intervene sooner to queue up the next tasks, which can mean more human work than before.
  3. Power users of OpenClaw end up spending a lot more time talking to their agents; it feels like the agents are running us, not the other way around.
  4. Long-running projects are one solution, but they create the AI-slop problem: left alone, AI will produce poor-quality work, so humans end up having to keep rewriting the markdowns.

Solutions

  1. Solution #1: Workflows, or long-running projects. E.g. "keep adding posts to my blog from my raw notes using workflow X", "keep tracking my customers and engage with them using workflow Y".
  2. Solution #2: Auto-improving agents that can write their own md files on the basis of evals. Projects like Hermes and Auto-Agent are actively trying to do this. Karpathy's Auto Research is also a step in this direction, as is Stanford's recent paper on Meta Harness.

agent-architecture

Agent Architecture — Utkarsh's AgentOS

Created: 2026-04-12 | Status: Draft — pending Utkarsh review


Active Agents (configured)

| Agent | Role | Model | Channel | Status |
|---|---|---|---|---|
| 🧙 Gandalf | Orchestrator, strategy, memory, coordination | zai/glm-5.1 (temp — should be Sonnet for complex reasoning) | Telegram (default bot) | ✅ Live |
| 🎨 Ive | Design, product, UX, product marketing | anthropic/claude-sonnet-4-6 | Telegram (ive bot) | ✅ Live |
| 🐧 Linus | Coding, technical implementation, DevOps | openai/gpt-5.4-mini | Telegram (linus bot) | ✅ Live |
| 🟣 Thanos | Model experimentation, throwaway testing | zai/glm-5.1 | Telegram (thanos bot) | ✅ Live |

Planned Agents (to build)

✍️ Scribe — Content Engine

  • Role: All content creation — blogs, social media, newsletters, brand voice
  • Responsibilities:
    • Long-form essays (Substack)
    • LinkedIn posts, X threads
    • Content briefs and editorial calendar
    • Brand voice consistency across everything
    • Cross-pollination between AbleCredit content and personal brand
  • Model: Sonnet for drafts, Haiku for scheduling/formatting
  • Why separate: Content is a high-volume, distinct domain. It needs its own context (brand voice guide, editorial calendar, content history) that would bloat Gandalf's context window.
  • Vault folders: Personal Content Creation/, personal-blog/, workspace-gandalf/skills/social-media-content/, workspace-gandalf/skills/long-form-content/

🔬 SAGE — Research & Growth

  • Role: Deep research, competitive intel, growth experimentation
  • Responsibilities:
    • Market and competitive research (Whisperer space, AI infra)
    • Growth experiment design and analysis
    • Technical deep dives (model benchmarks, latency analysis)
    • Board prep and financial analysis
    • Newsfeed curation and signal detection
  • Model: Sonnet for research, Haiku for monitoring crons
  • Why separate: Research is context-heavy and benefits from dedicated memory. SAGE accumulates competitive intel over time that shouldn't pollute Gandalf's strategic context.
  • Vault folders: personal-wiki/, whisperer-docs/, AbleCredit/ablecredit-wiki/, Finance/

Design Principles

  1. Agents own domains, not tasks. "Write a tweet" is a task. "Own my content engine" is a domain. Domains accumulate context.
  2. One mouth, many hands. You talk to Gandalf. Gandalf delegates. Minimize your context-switching.
  3. The Vault is the shared brain. Every agent reads from and writes to the same knowledge base. This prevents context silos.
  4. Model-matched to task. Sonnet for reasoning, Haiku for monitoring, GLM for experimentation, GPT for code.
  5. Grow into the system. Don't architect everything first. Add agents when bottlenecks hurt.

Agent Coordination (via AgentOS)

All agents will coordinate through the AgentOS tool (Linus is building it now):

  • Tasks: Agents create and assign tasks to each other and to Utkarsh
  • Approval queue: Tasks needing Utkarsh's sign-off
  • Kanban boards: One per project, sprint-based
  • Calendar: Due dates and events
  • Agent profiles: Status, model, current workload
  • Token usage: Per-agent, per-model cost tracking

Open Questions (for review)

  1. Gandalf's model — currently on zai/glm-5.1 (experimental). Should the orchestrator run on Sonnet for reliability? Or is GLM-5.1 good enough?
    1. GLM is ok for now.
  2. Scribe vs Gandalf content overlap — Gandalf has the ToughCoach/content skills in his workspace. Does Scribe take over all content, or does Gandalf still handle strategy-level content (e.g., positioning, narrative)?
    1. Gandalf writes the brief etc., but Scribe writes the actual content. Then Gandalf
  3. SAGE naming — good name or too generic? Alternatives: Oracle, Scout, Atlas
    1. Name: (I will put something here)
  4. Board management — currently a task under SAGE. Should it be its own domain given the quarterly cadence and financial sensitivity?
    1. Maybe another finance agent? Or overkill. I can use the finance agent to manage our cashflows as well.
  5. Agent communication — via AgentOS tasks only, or should agents also be able to message each other directly (e.g., Gandalf → Linus delegation)?
  6. Future agents? — Sales assistant? Customer support? Investor relations? When do those become domains worth their own agent?