In the previous post, I covered the reverse-engineering story: using Claude Code and Frida to crack open WeChat’s encryption, extract database keys from memory, and build a programmatic interface to the official client. This post is about how the system actually works, and how you’d use it.
Architecture: one API, many clients
Two problems drive the architecture:
WeChat needs a controlled environment. As we established in the previous post, UI automation, database reads, and memory instrumentation all need to run alongside the WeChat binary. A Docker container gives us that: one container = one isolated WeChat instance with everything it needs.
Different use cases need different interfaces. A CLI is great for quick control. OpenClaw needs a channel plugin. Wechaty users want to connect their existing bots. Rather than building each of these into the container, we put a REST + WebSocket API in front of everything and let each client talk to it in its own idiomatic way.
graph LR
subgraph Clients
CLI["CLI (wx)"]
WP["Wechaty Puppet"]
OC["OpenClaw Plugin"]
WC2["Wechaty Client"]
end
subgraph Gateway Container
GW["Wechaty Gateway<br/>(gRPC)"]
end
subgraph Agent Container
AS["agent-server<br/>(Rust/Axum)"]
WC["WeChat Binary"]
XV["Xvfb<br/>(Virtual Display)"]
DB["SQLCipher DBs"]
end
CLI -->|REST + WS| AS
WP -->|REST + WS| AS
OC -->|REST + WS| AS
WC2 -->|gRPC| GW
GW -->|REST + WS| AS
AS -->|AT-SPI| WC
AS -->|Frida| WC
AS -->|SQLCipher| DB
WC --> XV
WC -->|Read/Write| DB
One container = one WeChat instance. The design puts all the intelligence in the container: the agent-server handles database reads, memory instrumentation, and UI automation, so clients don’t need to know anything about WeChat internals. They just make API calls. This means you can swap clients freely, run the container locally or in the cloud, and the same server handles everything.
The agent-server is written in Rust to keep resource usage low, since it shares a container with WeChat itself. The Wechaty Gateway is a separate container that bridges existing Wechaty clients (over gRPC) to the agent-server’s REST API, so you can plug agent-wechat into a Wechaty codebase without changing anything.
A few examples of what the API looks like:
# Check login status
GET /api/status/auth
# List recent chats
GET /api/chats?limit=20
# Send a message
POST /api/messages/send { "chatId": "wxid_abc123", "text": "hello" }
# Get messages from a chat
GET /api/messages/wxid_abc123?limit=50
# Download media from a message
GET /api/messages/wxid_abc123/media/12345
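To make the shape of the API concrete, here is a rough TypeScript sketch that turns the endpoints above into typed request descriptors. The names (`ApiRequest`, `listChats`, and so on) are hypothetical and not part of any published client. Only the paths and JSON fields from the examples are used. How the token is attached to a request isn't shown in this post, so the sketch leaves authentication out.

```typescript
// Hypothetical request descriptors for the endpoints listed above.
// Only the paths and JSON fields shown in the examples are used.
interface ApiRequest {
  method: "GET" | "POST";
  path: string;
  body?: string; // JSON-encoded payload for POST requests
}

const authStatus = (): ApiRequest => ({ method: "GET", path: "/api/status/auth" });

const listChats = (limit: number): ApiRequest => ({
  method: "GET",
  path: `/api/chats?limit=${limit}`,
});

const sendMessage = (chatId: string, text: string): ApiRequest => ({
  method: "POST",
  path: "/api/messages/send",
  body: JSON.stringify({ chatId, text }),
});

const listMessages = (chatId: string, limit: number): ApiRequest => ({
  method: "GET",
  path: `/api/messages/${encodeURIComponent(chatId)}?limit=${limit}`,
});

// Assumption: the trailing number in the media path identifies the message.
const downloadMedia = (chatId: string, id: number): ApiRequest => ({
  method: "GET",
  path: `/api/messages/${encodeURIComponent(chatId)}/media/${id}`,
});
```

Each descriptor can be handed to any HTTP client by prefixing `path` with the server URL, for example `http://localhost:6174` for a local container.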
Lifecycle
When a client issues a command, the agent-server coordinates between the client and WeChat:
sequenceDiagram
participant C as Client
participant S as agent-server
participant W as WeChat
Note over S,W: Container starts: Xvfb, D-Bus, WeChat boot up
C->>S: Command (e.g. login, send message)
S->>W: Read DB / instrument memory / drive UI
W-->>S: State changes, data
S-->>C: Result (or timeout)
UI state automation: react to what you see, not what you expect
Two problems make UI-based RPA hard, and both need to be solved for automation to be reliable:
UI state is non-deterministic. You can’t predict what screen you’ll see next. Network errors cause popups. The user scans a QR code but doesn’t confirm on their phone. The app is already logged in from a previous session. These are all external factors outside the automation’s control, and any of them can derail a script that assumes a fixed sequence of screens.
Commands span multiple UI states. A single operation like “login” touches several screens: QR code, phone confirmation, possibly an error dialog, then the main chat window. The automation needs to track where it is in this multi-step sequence separately from what’s currently on screen, because the two don’t always align.
The solution borrows directly from Redux:
- There is a single, central AppState that represents everything the automation knows about the UI.
- State is immutable. You never mutate it directly.
- Changes happen through reducers: the loop observes the screen, and a pure function takes the previous state plus the observation and produces a new state.
- Actions are modeled after side effects: they describe what to do (click this button, type this text), and the execution engine carries them out.
The plan selects actions based on the current state, but never modifies state itself. This separation means the plan doesn’t assume a fixed sequence of screens; it just responds to whatever state the reducer produces.
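Here is a minimal sketch of the reducer half of that idea, written in TypeScript for readability rather than the agent-server's Rust. Apart from `AppState`, every name is hypothetical: the screen labels, the `loginStep` field, and the `popupsSeen` counter are illustrations, not the real state shape. The point is that command progress is tracked separately from what is on screen, and that each observation produces a new state object instead of mutating the old one.

```typescript
// Hypothetical screen labels for illustration.
type Screen = "qr_login" | "phone_confirm" | "chat_window" | "popup" | "unknown";

// Hypothetical AppState shape: what is on screen, plus where the
// multi-step login command currently stands, tracked separately.
interface AppState {
  readonly screen: Screen;
  readonly loginStep: "idle" | "awaiting_scan" | "awaiting_confirm" | "logged_in";
  readonly popupsSeen: number;
}

const initialState: AppState = { screen: "unknown", loginStep: "idle", popupsSeen: 0 };

// Pure reducer: previous state + observation in, new state out.
function reduce(prev: AppState, observed: Screen): AppState {
  switch (observed) {
    case "qr_login":
      return { ...prev, screen: observed, loginStep: "awaiting_scan" };
    case "phone_confirm":
      return { ...prev, screen: observed, loginStep: "awaiting_confirm" };
    case "chat_window":
      return { ...prev, screen: observed, loginStep: "logged_in" };
    case "popup":
      // A popup changes what is on screen but not the login progress.
      return { ...prev, screen: observed, popupsSeen: prev.popupsSeen + 1 };
    case "unknown":
      return { ...prev, screen: observed };
  }
}
```

Because a popup only updates `screen`, the login progress survives the interruption, which is the "two don't always align" case from above.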
graph TD
subgraph s1 [" "]
A["OBSERVE<br/>a11y tree + screenshot"] --> B["IDENTIFY<br/>match known UI state"]
subgraph s2 [" "]
B --> C["REDUCE<br/>update abstract AppState"]
C --> D["SELECT<br/>plan picks next action"]
D --> E["EXECUTE<br/>click / type / key / scroll"]
E --> F{"Goal<br/>reached?"}
F -->|Yes| G["Return result"]
end
end
F -->|No| A
classDef invisible fill:none,stroke:none
class s1,s2 invisible
- Observe: capture the accessibility tree via AT-SPI and, optionally, a screenshot
- Identify: pattern-match against known states (QR login screen? Chat window? Popup?)
- Reduce: update an abstract AppState (a pure function of previous state + observation, like a Redux reducer)
- Select: the active plan picks the next action based on the current state and its goal
- Execute: carry out the action (click a button, type text, press a key, scroll)
One iteration during login, when the QR code is on screen:
| Step | Inputs | Outputs | Side effects |
|---|---|---|---|
| Observe | Virtual display | A11y tree with QR image and “Scan to Log In” label | |
| Identify | A11y tree | Matched state: QR login screen | |
| Reduce | Previous state + identified state | LoginQr | |
| Select | Current state + login plan | Extract QR action | |
| Execute | Extract QR action | | Send QR to client via WebSocket |
On the next iteration, if the user has scanned the code, the screen has changed, so the reducer produces a new state and the plan adjusts. If a popup appeared instead, the same thing happens.
The same pattern handles sending messages, opening chats, and logging out.
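The whole loop can be sketched in a few dozen lines. This TypeScript version is a toy under stated assumptions: observation and identification are replaced by a scripted list of screens, execution only records the action name, and every identifier (`selectLoginAction`, `runLogin`, the action kinds) is hypothetical. It does show the property that matters, which is that the plan reacts to whatever state comes out of the reducer instead of assuming an order.

```typescript
type Screen = "qr_login" | "popup" | "chat_window";

// Actions are descriptions of side effects, not the effects themselves.
type Action = { kind: "extract_qr" } | { kind: "dismiss_popup" } | { kind: "wait" };

interface AppState {
  readonly screen: Screen | "unknown";
  readonly loggedIn: boolean;
}

// REDUCE: pure function of previous state + identified screen.
const reduce = (prev: AppState, screen: Screen): AppState => ({
  ...prev,
  screen,
  loggedIn: prev.loggedIn || screen === "chat_window",
});

// SELECT: the login plan reads state and returns an action. It never writes state.
function selectLoginAction(state: AppState): Action {
  if (state.screen === "popup") return { kind: "dismiss_popup" };
  if (state.screen === "qr_login") return { kind: "extract_qr" };
  return { kind: "wait" };
}

// Drive the loop against scripted screens instead of a real display.
function runLogin(
  screens: Screen[],
  maxIterations = 10,
): { state: AppState; executed: string[] } {
  let state: AppState = { screen: "unknown", loggedIn: false };
  const executed: string[] = [];
  for (const screen of screens.slice(0, maxIterations)) {
    // OBSERVE + IDENTIFY are stubbed: `screen` is the identified UI state.
    state = reduce(state, screen);
    if (state.loggedIn) break; // goal reached
    const action = selectLoginAction(state);
    executed.push(action.kind); // EXECUTE is stubbed: record the action only
  }
  return { state, executed };
}
```

Feeding it `["qr_login", "popup", "qr_login", "chat_window"]` dismisses the popup in the middle and still ends logged in. If the script runs out before the chat window appears, the caller gets `loggedIn: false` back, which stands in for the timeout case.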
Using agent-wechat: four ways in
There are four ways to talk to agent-wechat. All of them end up hitting the same REST + WebSocket API on the container.
CLI
The wx command gives you direct access from your terminal. Start by pulling and running the container (requires Docker Desktop or Colima):
$ wx up
Pulling ghcr.io/thisnick/agent-wechat:latest...
Starting agent-wechat container...
Container running on http://localhost:6174
$ wx auth login
Scan the QR code with WeChat:
█████████████████████████████
█████████████████████████████
█████████████████████████████
Login successful. User: Nick
$ wx chats list --limit 3
wxid_abc123 Nick Hey! What's up? 22:53
wxid_def456 Workflowly OpenClaw's bug.. 21:58
wxid_ghi789 File Transfer 21:30
$ wx messages send wxid_abc123 "Meeting at 3pm tomorrow?"
Sent.
$ wx messages list wxid_abc123 --limit 3
[22:54] → Meeting at 3pm tomorrow?
[22:53] ← Hey! What's up?
[22:50] ← Here?
Install with npm install -g @agent-wechat/cli. By default, the CLI connects to http://localhost:6174 and reads your token from ~/.config/agent-wechat/token. To point it at a remote instance:
export AGENT_WECHAT_URL=https://your-instance.agent-wx.app
export AGENT_WECHAT_TOKEN=your-token-here
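If you are writing your own client and want the same behavior, the lookup can be sketched as below. This is not the CLI's actual code: `resolveConfig` is a hypothetical helper, and the precedence (environment variables first, then the default URL and the token file) is my assumption. The environment, home directory, and file reader are passed in as arguments so the function stays easy to test.

```typescript
// Values from the post: default URL and token file location.
const DEFAULT_URL = "http://localhost:6174";
const TOKEN_FILE = ".config/agent-wechat/token"; // relative to the home directory

interface ClientConfig {
  url: string;
  token: string | undefined;
}

// Hypothetical resolver. Assumed precedence: environment first, then defaults.
function resolveConfig(
  env: Record<string, string | undefined>,
  homeDir: string,
  readFile: (path: string) => string | undefined,
): ClientConfig {
  const url = env["AGENT_WECHAT_URL"] ?? DEFAULT_URL;
  // Token files written by a shell redirect end with a newline, so trim it.
  const token = env["AGENT_WECHAT_TOKEN"] ?? readFile(`${homeDir}/${TOKEN_FILE}`)?.trim();
  // Drop trailing slashes so paths like "/api/chats" can be appended directly.
  return { url: url.replace(/\/+$/, ""), token };
}
```

In a Node script you would call it with `process.env`, `os.homedir()`, and a small wrapper around `fs.readFileSync` that returns `undefined` when the file is missing.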
Wechaty Puppet
If you’re building a bot, the Wechaty puppet gives you an event-driven API. It implements Wechaty’s standard puppet interface, so you get message handlers, contact management, and room support.
import { WechatyBuilder } from 'wechaty'
import { PuppetAgentWeChat } from '@agent-wechat/wechaty-puppet'
const bot = WechatyBuilder.build({
  puppet: new PuppetAgentWeChat({
    serverUrl: 'http://localhost:6174',
    token: process.env.AGENT_WECHAT_TOKEN,
  }),
})
bot.on('scan', (qrcode, status) => {
  console.log(`Scan QR: https://wechaty.js.org/qrcode/${encodeURIComponent(qrcode)}`)
})
bot.on('login', (user) => {
  console.log(`Logged in as ${user.name()}`)
})
bot.on('message', async (msg) => {
  if (msg.text() === 'ping') {
    await msg.say('pong')
  }
})
await bot.start()
The puppet polls for new messages every 2 seconds and streams login events over WebSocket. It works against a local container or a remote hosted instance; just change the serverUrl.
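Polling on a fixed interval means consecutive polls can return overlapping messages, so something has to filter out the ones already delivered. I haven't described how the puppet does this, so treat the following TypeScript as one common approach and not as the puppet's implementation. It assumes message IDs are numeric and increase over time, and `PolledMessage` and `takeNew` are hypothetical names.

```typescript
// Hypothetical message shape; assumes numeric IDs that increase over time.
interface PolledMessage {
  id: number;
  chatId: string;
  text: string;
}

// Keep only messages newer than the cursor, oldest first, and advance the cursor.
function takeNew(
  batch: PolledMessage[],
  lastSeenId: number,
): { fresh: PolledMessage[]; lastSeenId: number } {
  const fresh = batch
    .filter((m) => m.id > lastSeenId)
    .sort((a, b) => a.id - b.id);
  const newest = fresh.reduce((max, m) => Math.max(max, m.id), lastSeenId);
  return { fresh, lastSeenId: newest };
}
```

A polling loop would call `takeNew` on each result, emit one `message` event per entry in `fresh`, and carry the returned cursor into the next poll two seconds later.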
Wechaty Gateway
The gateway wraps the puppet in a gRPC service. If you don’t want to add @agent-wechat/wechaty-puppet as a dependency in your code, you can run the gateway as a sidecar and connect to it using the standard wechaty-puppet-service client. This is especially useful if you already have a Wechaty codebase and just want to point it at a hosted agent-wechat instance.
Connect to a remote gateway using wechaty-puppet-service:
import { existsSync, readFileSync } from 'fs'
import { PuppetService } from 'wechaty-puppet-service'
// Use system CAs for TLS verification
for (const p of ['/etc/ssl/cert.pem', '/etc/ssl/certs/ca-certificates.crt']) {
  if (existsSync(p)) {
    process.env.WECHATY_PUPPET_SERVICE_TLS_CA_CERT = readFileSync(p, 'utf-8')
    break
  }
}
const endpoint = 'your-instance.agent-wx.app:8443'
const token = process.env.WECHATY_TOKEN // provided as-is, includes SNI prefix
const puppet = new PuppetService({
  endpoint,
  token,
  tls: { serverName: endpoint.split(':')[0] },
})
puppet.on('message', async (payload) => {
  const msg = await puppet.messagePayload(payload.messageId)
  console.log(`[${msg.talkerId}] ${msg.text}`)
})
await puppet.start()
If you’re using a hosted instance, the token we provide already includes the SNI prefix, so you can pass it directly.
OpenClaw
OpenClaw is an open-source AI personal assistant you interact with through messaging platforms. The agent-wechat plugin adds WeChat as a channel, so your assistant can send and receive WeChat messages, handle images, and manage group conversations.
Install and configure it:
# Install the extension
openclaw plugins install @agent-wechat/wechat
# Add WeChat as a channel (defaults to localhost:6174)
openclaw channels add --channel wechat
# Or with a remote server
openclaw channels add --channel wechat --url <url> --token <token>
# Restart the gateway to pick up the new channel
openclaw gateway restart
Once running, tell your agent “Log in to WeChat” in whatever channel you’ve set up (Slack, Telegram, etc.). The agent will generate a QR code image right in the chat. Scan it with WeChat on your phone, confirm, and the session is live. You only need to do this once; the session persists across container restarts.
The plugin supports DM and group chat policies (open, allowlist, or disabled), and per-group mention requirements.
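To illustrate how those policies combine, here is a rough TypeScript sketch of the decision for a single incoming message. The configuration shape (`ChannelPolicy`, `allowlist`, `requireMention`) and the function `shouldHandle` are hypothetical and do not reflect the plugin's real config keys. Only the three policy names and the idea of per-group mention requirements come from the plugin.

```typescript
type Policy = "open" | "allowlist" | "disabled";

// Hypothetical config shape for illustration.
interface ChannelPolicy {
  dm: Policy;
  group: Policy;
  allowlist: string[]; // chat IDs accepted under the "allowlist" policy
  requireMention: string[]; // group IDs where the bot must be mentioned
}

interface Incoming {
  chatId: string;
  isGroup: boolean;
  mentionsBot: boolean;
}

function shouldHandle(msg: Incoming, cfg: ChannelPolicy): boolean {
  const policy = msg.isGroup ? cfg.group : cfg.dm;
  if (policy === "disabled") return false;
  if (policy === "allowlist" && !cfg.allowlist.some((id) => id === msg.chatId)) {
    return false;
  }
  // Mention requirements apply to groups only, and only to the listed groups.
  const needsMention = msg.isGroup && cfg.requireMention.some((id) => id === msg.chatId);
  return !needsMention || msg.mentionsBot;
}
```

With this shape, an open DM policy and an allowlisted group policy can coexist, and a noisy group can be limited to messages that mention the assistant.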
Hosting: run it yourself or let me run it for you
Self-hosting
The simplest setup is Docker Compose:
# Generate a token first:
# mkdir -p ~/.config/agent-wechat
# openssl rand -hex 32 > ~/.config/agent-wechat/token
services:
  agent-wechat:
    image: ghcr.io/thisnick/agent-wechat:latest
    security_opt:
      - seccomp=unconfined
    cap_add:
      - SYS_PTRACE
      - NET_ADMIN
    ports:
      - "6174:6174"
    volumes:
      - agent-wechat-data:/data
      - agent-wechat-home:/home/wechat
      - ~/.config/agent-wechat/token:/data/auth-token:ro
    environment:
      - PROXY=${PROXY:-}
    restart: unless-stopped
volumes:
  agent-wechat-data:
  agent-wechat-home:
SYS_PTRACE and seccomp=unconfined mean you need a real VM or a container runtime that allows privileged capabilities. Cloud Run, Fargate, and similar serverless container platforms won’t work.
Avoiding datacenter IP detection
Cloud providers use datacenter IPs, and WeChat may flag them. If you’re hosting in the cloud, route your outgoing traffic through a residential proxy. Set the PROXY environment variable and the container uses redsocks to transparently route all traffic through it. A residential proxy with sticky sessions works best, since you want WeChat to see a consistent, residential IP.
Don’t want to self-host?
I can set up a hosted instance for you, running on GCE with residential proxy routing already configured. You get a URL and a token. Plug them into the CLI, Wechaty Puppet, or OpenClaw and you’re up. The Wechaty Gateway is available on a separate port for existing Wechaty users.
The hosting fee supports ongoing development. WeChat ships new binaries regularly, and each update means finding new memory offsets for key extraction.
Interested? Reach out on GitHub or at [email protected].
The code is open source at thisnick/agent-wechat. The previous post covers how the reverse engineering worked. Questions? [email protected].