Agent RDP

What it does

Agent RDP is a CLI tool that lets AI agents connect to Windows machines over RDP and interact with them through two modes:

Computer use — screenshot the screen, then click/type/scroll based on what the agent sees (works with any application)
Accessibility-based interaction — read and manipulate UI elements directly via the Windows UI Automation tree (faster, more precise, no vision model needed)

It bridges the gap between language-model tool use and GUI-based desktop applications that have no API.
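The computer-use mode amounts to a loop: capture the screen, let the agent choose an action, perform it, repeat. The sketch below is a minimal illustration of that loop, not Agent RDP's actual interface. The names `Desktop`, `Action`, `Decide`, and `runLoop` are hypothetical, and the step limit is an assumption added so the loop always terminates.

```typescript
// Hypothetical action vocabulary; the real tool's commands may differ.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "done" };

// Hypothetical abstraction over a remote session.
interface Desktop {
  screenshot(): Promise<string>; // base64-encoded image
  perform(action: Exclude<Action, { kind: "done" }>): Promise<void>;
}

// The agent's policy: look at a screenshot, return the next action.
type Decide = (screenshotBase64: string) => Promise<Action>;

// Runs screenshot -> decide -> act until the agent reports "done"
// or maxSteps is reached. Returns the number of actions performed.
async function runLoop(
  desktop: Desktop,
  decide: Decide,
  maxSteps = 20,
): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const shot = await desktop.screenshot();
    const action = await decide(shot);
    if (action.kind === "done") return step;
    await desktop.perform(action);
  }
  return maxSteps;
}
```

Passing the desktop and the decision function in as parameters keeps the loop independent of any particular agent framework or vision model.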

Why it exists

A surprising amount of enterprise software only has a GUI. No API, no CLI, no webhooks. If you want to automate it, you either reverse-engineer the internals or you drive the interface. Agent RDP takes the second approach and makes it available to any agent framework that supports tool calling.

How it works

The system uses a daemon-per-session architecture: the CLI talks to persistent background daemons over IPC, one daemon per session. Sessions are named and managed independently of each other. Responses are structured JSON so that agents can parse them directly, and a WebSocket stream provides real-time desktop capture.
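Because responses are JSON, an agent can validate them before acting on them. The following is a minimal sketch of that step. The envelope shape is an assumption made for illustration: the field names `ok`, `session`, `data`, and `error` are hypothetical and are not taken from Agent RDP's actual output.

```typescript
// Hypothetical response envelope; the real field names may differ.
type DaemonResponse =
  | { ok: true; session: string; data: unknown }
  | { ok: false; session: string; error: string };

// Parses one raw response and normalizes every failure mode
// (bad JSON, wrong shape, reported error) into the same union.
function parseResponse(raw: string): DaemonResponse {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, session: "", error: "response was not valid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { ok: false, session: "", error: "response was not a JSON object" };
  }
  const obj = value as Record<string, unknown>;
  const session = typeof obj.session === "string" ? obj.session : "";
  if (obj.ok === true) {
    return { ok: true, session, data: obj.data };
  }
  const error = typeof obj.error === "string" ? obj.error : "unknown error";
  return { ok: false, session, error };
}
```

Returning a discriminated union lets the caller branch on `ok` once and have the compiler check that `data` and `error` are only read in the right branch.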

Beyond the basic screenshot → decide → act loop, it also supports:

Windows UI Automation — interacts with native UI elements using accessibility patterns (InvokePattern, SelectionItemPattern, TogglePattern)
OCR — uses the ocrs library to locate text on screen when UI Automation can’t reach an element
PowerShell agent injection — runs a PowerShell agent in the remote session and captures the accessibility tree via a Dynamic Virtual Channel (DVC)
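One way to combine these mechanisms is as a fallback chain: try UI Automation first, fall back to OCR when the element is not reachable there, and leave everything else to the screenshot loop. The sketch below illustrates that ordering on data the agent has already fetched. The types `UiaNode`, `OcrWord`, and `Target` and the function `resolveTarget` are hypothetical, and exact-name matching is a simplifying assumption.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical shapes for an accessibility-tree node and an OCR hit.
interface UiaNode {
  name: string;
  patterns: string[];
  bounds: Rect;
  children?: UiaNode[];
}
interface OcrWord { text: string; bounds: Rect }

type Target =
  | { via: "uia"; pattern: string; node: UiaNode }
  | { via: "ocr"; x: number; y: number }
  | { via: "none" }; // caller falls back to the screenshot loop

// The UI Automation patterns named in the list above.
const ACTION_PATTERNS = ["InvokePattern", "SelectionItemPattern", "TogglePattern"];

// Depth-first search of the tree for a node with an exact name match.
function findByName(root: UiaNode, name: string): UiaNode | undefined {
  if (root.name === name) return root;
  for (const child of root.children ?? []) {
    const hit = findByName(child, name);
    if (hit) return hit;
  }
  return undefined;
}

function resolveTarget(label: string, tree: UiaNode, ocr: OcrWord[]): Target {
  // 1. Prefer UI Automation when the node exposes a usable pattern.
  const node = findByName(tree, label);
  const pattern = node?.patterns.find((p) => ACTION_PATTERNS.includes(p));
  if (node && pattern) return { via: "uia", pattern, node };

  // 2. Otherwise click the center of matching OCR text.
  const word = ocr.find((w) => w.text === label);
  if (word) {
    const { x, y, width, height } = word.bounds;
    return { via: "ocr", x: Math.round(x + width / 2), y: Math.round(y + height / 2) };
  }
  return { via: "none" };
}
```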

Technical details

Protocol: RDP with TLS and CredSSP, built on IronRDP (Rust)
Languages: TypeScript/Node.js CLI, Rust native binary
Input: Mouse, keyboard, key combos, clipboard sync, drag, scroll
Screen capture: PNG or JPEG screenshots, base64-encoded
Storage: Local directories can be mapped as network drives on the remote machine
Integration: Works as a tool provider for any AI agent framework
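Since screenshots arrive base64-encoded in either PNG or JPEG, a consumer may need to tell the two apart before handing the image to a vision model. The sketch below does this by checking the standard file signatures (PNG starts with the bytes 89 50 4E 47 0D 0A 1A 0A, JPEG with FF D8 FF). The function name `detectImageFormat` is hypothetical, and the sketch assumes a bare base64 string with no data-URL prefix.

```typescript
type ImageFormat = "png" | "jpeg" | "unknown";

// Standard file signatures ("magic bytes") for the two formats.
const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_MAGIC = [0xff, 0xd8, 0xff];

function hasPrefix(bytes: number[], magic: number[]): boolean {
  return magic.every((b, i) => bytes[i] === b);
}

function detectImageFormat(base64: string): ImageFormat {
  // 16 base64 characters decode to 12 bytes, enough for both signatures,
  // so the full image never has to be decoded just to identify it.
  const head = atob(base64.slice(0, 16));
  const bytes = Array.from(head, (ch) => ch.charCodeAt(0));
  if (hasPrefix(bytes, PNG_MAGIC)) return "png";
  if (hasPrefix(bytes, JPEG_MAGIC)) return "jpeg";
  return "unknown";
}
```

For example, a base64 string beginning with `iVBORw0KGgo` is a PNG and one beginning with `/9j/` is a JPEG.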

Status

Active development. Used in production for automating legacy Windows applications that resist every other form of integration.