
Porting Go-Rod to TypeScript: Browser Automation from Scratch

I needed browser automation for VHS — Charmbracelet's terminal recording tool that controls Chrome to generate screenshots and videos. VHS depends on go-rod. So I ported it.

Now three packages are complete, with two more to go.


I – What Shipped

The go-rod library is being ported as a monorepo of five independent packages:

Package | Status | Tests | Expects
@oakoliver/cdp-input | ✅ Complete | 28 | 38
@oakoliver/cdp | ✅ Complete | 16 | 40
@oakoliver/browser-launcher | ✅ Complete | 63 | 101
@oakoliver/cdp-proto | 🔄 Pending | |
@oakoliver/rod | 🔄 Pending | |

npm install @oakoliver/cdp-input @oakoliver/cdp @oakoliver/browser-launcher

Zero runtime dependencies. Each package is independently usable. Total: 107 tests, 179 expects, running on Bun.


II – The Architecture

Go-rod is not a monolith. It's a layered system where each layer handles one concern:

Layer 1: Input Encoding (lib/input) — Converts human-readable keys like "Enter" or "Ctrl+A" into Chrome DevTools Protocol structures. Handles keyboard events, mouse buttons, and platform-specific behavior (macOS text editing commands differ from Linux).

Layer 2: CDP Client (lib/cdp) — A WebSocket client that speaks Chrome DevTools Protocol. Sends JSON-RPC calls, receives responses and events, manages session multiplexing.

Layer 3: Browser Launcher (lib/launcher) — Finds Chrome on your system, spawns it with correct flags, parses the WebSocket debug URL from stderr, manages cleanup.

Layer 4: Protocol Types (lib/proto) — 35,000 lines of auto-generated Go structs matching every CDP domain: DOM, Network, Page, Runtime, Input, etc.

Layer 5: High-Level API (rod.go) — The Browser, Page, and Element types that users interact with. Methods like page.Navigate(), element.Click(), page.Screenshot().

Each layer depends only on the layers below it. Input encoding has no dependencies. CDP client uses input encoding. Browser launcher is independent. The high-level API ties everything together.

flowchart BT
    INPUT["Layer 1: Input Encoding\n@oakoliver/cdp-input"]
    CDP["Layer 2: CDP Client\n@oakoliver/cdp"]
    LAUNCH["Layer 3: Browser Launcher\n@oakoliver/browser-launcher"]
    ROD["Layer 5: High-Level API\n@oakoliver/rod"]

    CDP --> INPUT
    ROD --> CDP
    ROD --> LAUNCH
    ROD --> INPUT

III – The Hard Parts

Key Encoding: A Clever Collision-Avoidance Formula

The keyboard encoding needed to handle a subtle problem: some keys have the same keyCode but different location values. The regular 1 key and numpad 1 key both send keyCode: 49, but one has location: 0 (standard) and the other has location: 3 (numpad).

Go-rod solves this with a formula:

// For non-printable keys, encode location into the key
const encodedKey = keyCode + (location + 1) * 256;

This produces unique lookup keys: regular 1 maps to 49 + (0+1)*256 = 305, while numpad 1 maps to 49 + (3+1)*256 = 1073. The keymap uses these encoded values as keys, avoiding collisions.

When encoding an input, the lookup tries the key as-is first, then falls back to the location-encoded variants:

export function encodeKey(key: Key): KeyInput | null {
  // Try direct lookup first (printable chars)
  let def = keyDefinitions[key];
  if (def) return createKeyInput(key, def);

  // Try with Standard location
  def = keyDefinitions[(key as number) + 256];
  if (def) return createKeyInput(key, def);

  // Try with Numpad location
  def = keyDefinitions[(key as number) + 1024];
  if (def) return createKeyInput(key, def);

  return null;
}

This approach is simpler than maintaining separate keymap tables for each location.
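
To make the arithmetic concrete, here is a minimal sketch of the formula and its inverse. The helper names encodeLocation and decodeLocation are hypothetical (they are not part of the package's API), and the inverse assumes every keyCode is below 256 so the two parts never overlap.

```typescript
// Hypothetical helpers illustrating the formula; not the package's exported API.
function encodeLocation(keyCode: number, location: number): number {
  // Shift the location into the "high" part so it never collides with keyCode.
  return keyCode + (location + 1) * 256;
}

// Inverse of encodeLocation. Assumes keyCode < 256.
function decodeLocation(encoded: number): { keyCode: number; location: number } {
  return {
    keyCode: encoded % 256,
    location: Math.floor(encoded / 256) - 1,
  };
}

const standardOne = encodeLocation(49, 0); // 305
const numpadOne = encodeLocation(49, 3); // 1073
```

Because the location is stored as (location + 1), an encoded value is always at least 256, which keeps it out of the range used by single printable characters looked up directly.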

macOS Text Editing Commands

Chrome on macOS supports native text editing shortcuts that don't exist on other platforms. Pressing Cmd+Left moves to the beginning of the line. Cmd+Shift+Left selects to the beginning. These are "editing commands" in CDP terminology.

Go-rod maintains a lookup table mapping key combinations to their macOS editing commands:

export const macCommands: Record<string, string> = {
  "Meta+ArrowLeft": "moveToBeginningOfLine:",
  "Meta+Shift+ArrowLeft": "moveToBeginningOfLineAndModifySelection:",
  "Meta+ArrowRight": "moveToEndOfLine:",
  "Meta+Shift+ArrowRight": "moveToEndOfLineAndModifySelection:",
  "Alt+ArrowLeft": "moveWordLeft:",
  "Alt+Shift+ArrowLeft": "moveWordLeftAndModifySelection:",
  // ... 16 more combinations
};

When generating a key press event on macOS, the system looks up the editing command and includes it in the CDP message. Without this, text selection and cursor movement break in headless Chrome on Mac.
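
The lookup step might look roughly like the sketch below. The names shortcutName and editingCommands are hypothetical, the table is only the subset shown above, and the modifier ordering (Control, Meta, Alt, Shift) is an assumption chosen to match how the table's keys are spelled.

```typescript
// Subset of the table from the article, repeated so the sketch is self-contained.
const macCommands: Record<string, string> = {
  "Meta+ArrowLeft": "moveToBeginningOfLine:",
  "Meta+Shift+ArrowLeft": "moveToBeginningOfLineAndModifySelection:",
  "Alt+ArrowLeft": "moveWordLeft:",
};

type Modifier = "Control" | "Meta" | "Alt" | "Shift";

// Assumed canonical order, so "Shift then Meta" and "Meta then Shift" hit the same entry.
const modifierOrder: Modifier[] = ["Control", "Meta", "Alt", "Shift"];

// Builds a table key such as "Meta+Shift+ArrowLeft" from the held modifiers.
function shortcutName(held: Set<Modifier>, key: string): string {
  const mods = modifierOrder.filter((m) => held.has(m));
  return [...mods, key].join("+");
}

// Returns the editing commands to attach to a key event (empty off macOS).
function editingCommands(held: Set<Modifier>, key: string, platform: string): string[] {
  if (platform !== "darwin") return [];
  const command = macCommands[shortcutName(held, key)];
  return command ? [command] : [];
}
```

Returning an empty array on other platforms lets the caller attach the result to every key event without branching on the OS.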

CDP WebSocket Protocol

The CDP client handles async message correlation. Every outgoing call gets an incrementing ID. The client stores a pending callback keyed by that ID. When a response arrives, it looks up the callback and resolves it:

export class CDPClient {
  private pending = new Map<number, {
    resolve: (result: any) => void;
    reject: (error: Error) => void;
  }>();
  private nextId = 1;

  async call<T>(method: string, params?: object): Promise<T> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.ws.send(JSON.stringify({ id, method, params }));
    });
  }

  private handleMessage(data: string) {
    const msg = JSON.parse(data);
    if (msg.id !== undefined) {
      const pending = this.pending.get(msg.id);
      if (pending) {
        this.pending.delete(msg.id);
        if (msg.error) {
          pending.reject(new CDPError(msg.error));
        } else {
          pending.resolve(msg.result);
        }
      }
    } else {
      // It's an event, emit to listeners
      this.emitEvent(msg.method, msg.params);
    }
  }
}

Events (messages without an id) are handled separately through an async iterator, so consumers can loop over CDP events with for await.

sequenceDiagram
    participant A as Caller A
    participant B as Caller B
    participant C as CDPClient
    participant Chrome

    A->>C: call("Page.navigate") → id=1
    C->>Chrome: {"id":1, ...}
    B->>C: call("DOM.getDocument") → id=2
    C->>Chrome: {"id":2, ...}

    Chrome-->>C: event (no id)
    C-->>A: emit → async iterator

    Chrome-->>C: {"id":2, result}
    C-->>B: resolve Promise (id=2)

    Chrome-->>C: {"id":1, result}
    C-->>A: resolve Promise (id=1)
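
The event side of that flow can be sketched as a small queue that buffers events until a consumer asks for them. This is an illustration under assumptions, not the package's implementation: CDPEvent and EventQueue are hypothetical names, and a real client would likely also need filtering by method or session.

```typescript
// Hypothetical types for illustration only.
interface CDPEvent {
  method: string;
  params?: unknown;
}

class EventQueue implements AsyncIterable<CDPEvent> {
  private buffered: CDPEvent[] = [];
  private waiting: Array<(result: IteratorResult<CDPEvent>) => void> = [];
  private closed = false;

  // Called by the message handler for every message that has no id.
  push(event: CDPEvent): void {
    if (this.closed) return;
    const waiter = this.waiting.shift();
    if (waiter) waiter({ value: event, done: false });
    else this.buffered.push(event);
  }

  // Ends iteration for consumers that are currently waiting or arrive later.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiting.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<CDPEvent> {
    return {
      next: async (): Promise<IteratorResult<CDPEvent>> => {
        const event = this.buffered.shift();
        if (event) return { value: event, done: false };
        if (this.closed) return { value: undefined, done: true };
        // Nothing buffered yet: park until push() or close() is called.
        return new Promise<IteratorResult<CDPEvent>>((resolve) => {
          this.waiting.push(resolve);
        });
      },
    };
  }
}
```

A consumer would then write for await (const event of queue) { ... } and receive events in arrival order, while call() responses keep flowing through the pending map independently.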

Browser Detection

Finding Chrome is platform-specific. Go-rod maintains path lists for each OS:

const browserPaths: Record<string, string[]> = {
  darwin: [
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
    "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
    // ...
  ],
  linux: [
    "/usr/bin/google-chrome",
    "/usr/bin/chromium",
    "/snap/bin/chromium",
    // ...
  ],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    `${process.env.LOCALAPPDATA}\\Google\\Chrome\\Application\\chrome.exe`,
    // ...
  ],
};

The lookPath() function iterates through these, checking each for existence and executability. First match wins.
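
A minimal sketch of that first-match scan, assuming Node-compatible fs APIs. The name firstExecutable is hypothetical; the actual lookPath() may differ in signature and details.

```typescript
import { accessSync, constants } from "node:fs";

// Hypothetical helper: returns the first candidate that exists and is executable.
function firstExecutable(candidates: string[]): string | null {
  for (const candidate of candidates) {
    try {
      // X_OK asks for execute permission; on Windows it only checks existence.
      accessSync(candidate, constants.X_OK);
      return candidate;
    } catch {
      // Missing or not executable: fall through to the next candidate.
    }
  }
  return null;
}
```

Called as firstExecutable(browserPaths[process.platform] ?? []), it returns null when no known browser is installed, which leaves the caller free to fall back to downloading one or reporting an error.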

Container Detection

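In Docker containers, Chrome typically needs --no-sandbox to start, because the container usually runs it as root or without the kernel features its sandbox relies on. Go-rod auto-detects this by reading /proc/1/cgroup and checking for docker or kubepods: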

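import * as fs from "node:fs";

// Returns true when PID 1's cgroup file mentions a known container runtime.
function isInContainer(): boolean {
  // /proc only exists on Linux, so other platforms are never treated as containers.
  if (process.platform !== "linux") return false;
  try {
    const cgroup = fs.readFileSync("/proc/1/cgroup", "utf-8");
    return cgroup.includes("docker") || cgroup.includes("kubepods");
  } catch {
    // Missing or unreadable file: assume we are not in a container.
    return false;
  }
}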

When detected, the launcher automatically adds --no-sandbox to the Chrome flags.


IV – Test Parity

Every Go test has a TypeScript counterpart, though not always one-to-one:

Package | Go Tests | TS Tests | Expects
cdp-input | 15 | 28 | 38
cdp | 8 | 16 | 40
browser-launcher | 27 | 63 | 101
Total | 50 | 107 | 179

The TypeScript tests actually exceed the Go test count because I split some multi-assertion tests into separate test cases for clearer failure messages.

Some Go tests couldn't be directly ported. Tests that spawn real Chrome processes are marked as integration tests — they work locally but aren't run in CI without Chrome installed. Tests using Go-specific patterns (goroutines, channels) were adapted to JavaScript equivalents (Promises, async/await).


V – Default Automation Flags

When Chrome launches for automation, it needs specific flags to behave predictably:

// Disable features that interfere with automation
this.set("disable-background-networking");
this.set("disable-background-timer-throttling");
this.set("disable-backgrounding-occluded-windows");
this.set("disable-breakpad");
this.set("disable-component-extensions-with-background-pages");
this.set("disable-default-apps");
this.set("disable-dev-shm-usage");
this.set("disable-hang-monitor");
this.set("disable-ipc-flooding-protection");
this.set("disable-popup-blocking");
this.set("disable-prompt-on-repost");
this.set("disable-renderer-backgrounding");
this.set("disable-sync");

// Enable automation-specific features
this.set("enable-automation");
this.set("enable-features", "NetworkService", "NetworkServiceInProcess");
this.set("force-color-profile", "srgb");
this.set("metrics-recording-only");
this.set("use-mock-keychain");

Each flag prevents a different automation footgun. disable-backgrounding-occluded-windows prevents Chrome from throttling tabs that aren't visible. disable-ipc-flooding-protection prevents rate limiting when sending rapid input events. force-color-profile=srgb ensures consistent screenshot colors across systems.
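
For context, here is a rough sketch of how a set() call like the ones above could end up on Chrome's command line. FlagSet and toArgs are hypothetical names, and joining multiple values with a comma is an assumption about the serialization rather than something confirmed by the launcher package.

```typescript
// Hypothetical flag container; insertion order is preserved by Map.
class FlagSet {
  private flags = new Map<string, string[]>();

  // A flag with no values is a plain switch; extra arguments become its values.
  set(name: string, ...values: string[]): this {
    this.flags.set(name, values);
    return this;
  }

  // Serializes to argv entries: "--name" or "--name=v1,v2" (assumed format).
  toArgs(): string[] {
    return Array.from(this.flags, ([name, values]) =>
      values.length > 0 ? `--${name}=${values.join(",")}` : `--${name}`,
    );
  }
}

const flags = new FlagSet()
  .set("enable-automation")
  .set("enable-features", "NetworkService", "NetworkServiceInProcess")
  .set("force-color-profile", "srgb");
```

Keeping flags in a map rather than a raw string array makes it easy to override or remove a default before launch, since each name appears at most once.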


VI – What's Next

Two packages remain:

@oakoliver/cdp-proto — The 35,000 lines of protocol types. These are auto-generated in Go from Chrome's protocol definition. I may write a generator that produces TypeScript from the same JSON schema, rather than hand-porting.
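
As a rough idea of what such a generator could look like, the sketch below turns one object type from the protocol JSON into a TypeScript interface. The ProtoType and ProtoProperty shapes are an assumed subset of the protocol definition (real entries carry more fields, such as descriptions, enums, and array items), and emitInterface is a hypothetical name.

```typescript
// Assumed subset of the protocol JSON; real definitions have more fields.
interface ProtoProperty {
  name: string;
  type?: string;
  $ref?: string;
  optional?: boolean;
}

interface ProtoType {
  id: string;
  type: string;
  properties?: ProtoProperty[];
}

// Maps protocol primitive names to TypeScript types.
const primitives: Record<string, string> = {
  string: "string",
  integer: "number",
  number: "number",
  boolean: "boolean",
};

function tsTypeOf(prop: ProtoProperty): string {
  // References to other protocol types are emitted by name, unchanged.
  if (prop.$ref) return prop.$ref;
  return primitives[prop.type ?? ""] ?? "unknown";
}

// Emits one interface declaration for an object type.
function emitInterface(t: ProtoType): string {
  const fields = (t.properties ?? []).map(
    (p) => `  ${p.name}${p.optional ? "?" : ""}: ${tsTypeOf(p)};`,
  );
  return [`export interface ${t.id} {`, ...fields, "}"].join("\n");
}
```

Anything the sketch does not recognize falls back to unknown, which keeps the output type-safe while the generator is extended to cover arrays, enums, and cross-domain references.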

@oakoliver/rod — The high-level Browser, Page, and Element API. This is the user-facing interface that ties all layers together. Methods like page.navigate(url), page.click(selector), page.screenshot().

Once complete, VHS can be ported — the terminal recording tool that started this whole chain of dependencies.

"Simplicity is the ultimate sophistication."