All source listed below is under the MIT license unless a LICENSE file states otherwise.

WebKitGTK WebSocket Browser Automation Server

This project provides a lightweight browser automation server built with WebKit2GTK and controlled over a WebSocket interface. It allows external clients to control browser instances, navigate to URLs, execute JavaScript, and load custom HTML, making it suitable for web scraping, automated testing, or other browser-driven tasks. A screenshot command is defined in the protocol but is currently a placeholder. The server opens a visible GTK window; a headless mode is not yet implemented.

The server is implemented in C using GTK3, WebKit2GTK-4.1, Libsoup, and Jansson. The provided client examples are in Python using websockets and asyncio.

Features

  • Lightweight: Built directly on WebKit2GTK, which can be more resource-efficient than full browser automation frameworks for certain tasks.
  • WebSocket Control: A simple, message-based protocol over WebSockets for client-server communication.
  • JavaScript Execution: Execute arbitrary JavaScript in the browser context.
  • Navigation: Load URLs and local HTML content.
  • Screenshot Command: A screenshot command is reserved in the protocol. The server currently returns a placeholder; capturing PNG screenshots requires additional server-side implementation.
  • Parallel Window Support: Each WebSocket connection can control a separate browser window, enabling parallel automation.

Server Setup (C Application)

Prerequisites

You need the following libraries and their development headers installed on your Debian/Ubuntu-based system:

  • build-essential (for gcc, make, etc.)
  • pkg-config
  • libgtk-3-dev
  • libwebkit2gtk-4.1-dev
  • libsoup-2.4-dev
  • libjansson-dev

Install them using apt:

sudo apt update
sudo apt install build-essential pkg-config libgtk-3-dev libwebkit2gtk-4.1-dev libsoup-2.4-dev libjansson-dev

Compilation

Navigate to the directory containing webapp.c and compile using gcc:

gcc webapp.c $(pkg-config --cflags --libs gtk+-3.0 webkit2gtk-4.1 libsoup-2.4 jansson) -o webapp

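If pkg-config cannot find webkit2gtk-4.1, your system may ship the library under a different name such as webkit2gtk-4.0. Run pkg-config --list-all | grep webkit2gtk to see which versions are available and adjust the command accordingly. Keep the libsoup version matched to the WebKitGTK build: webkit2gtk-4.0 is built against libsoup 2.4, while webkit2gtk-4.1 is built against libsoup 3 (pkg-config name libsoup-3.0), and the two libsoup versions cannot be loaded into the same process. If the binary builds but aborts at startup with a libsoup conflict, switch to a matching pair.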

Running the Server

Execute the compiled binary:

./webapp

The server will start listening for WebSocket connections on ws://localhost:8080/. You will also see a GTK window appear, which is the browser instance.

Note on Sandbox: If the browser crashes or stays blank because of the WebKit sandbox, unprivileged user namespaces may be disabled on your system. On Debian-based kernels that expose this setting, you can enable them with sysctl:

sudo sysctl -w kernel.unprivileged_userns_clone=1

Modern WebKit versions rely on unprivileged user namespaces for sandboxing. A value set with sysctl -w lasts only until the next reboot.

Client Usage (Python Examples)

The provided client.py demonstrates how to interact with the server using asyncio and websockets.

Prerequisites for Client

pip install websockets

Running the Python Client Demos

python client.py

The client script will present a menu allowing you to choose from different automation scenarios:

  1. Parallel Browsers: Opens multiple browser windows concurrently, navigates to different sites, and executes basic JavaScript in each.
  2. Automated Testing: Navigates to a list of URLs, extracts specific content (e.g., heading text), and takes screenshots.
  3. Form Automation: Loads a custom HTML page with a form, then uses JavaScript to programmatically fill and submit the form fields.
  4. Run All Demos: Executes all the above demos sequentially.
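
The Parallel Browsers scenario can be sketched with asyncio.gather. This is an illustrative outline, not the code in client.py: drive_window and run_parallel are hypothetical names, and it assumes the first message after the handshake is the reply to the navigate command. The connect argument is any coroutine function that returns an object with send, recv, and close, which is how websockets.connect behaves when awaited.

```python
import asyncio
import json

SERVER_URI = "ws://localhost:8080/"  # address from "Running the Server"


async def drive_window(connect, url):
    """Open one connection (one browser window) and start a navigation."""
    ws = await connect(SERVER_URI)
    try:
        # The server sends its handshake before accepting commands.
        handshake = json.loads(await ws.recv())
        await ws.send(json.dumps({
            "command": "navigate",
            "url": url,
            "request_id": f"nav-{handshake['connection_id']}",
        }))
        # Assumption: the next message is the reply to this command.
        return json.loads(await ws.recv())
    finally:
        await ws.close()


async def run_parallel(connect, urls):
    """Drive one window per URL concurrently."""
    return await asyncio.gather(*(drive_window(connect, u) for u in urls))
```

With the server running, asyncio.run(run_parallel(websockets.connect, urls)) would open one window per URL. Passing connect in as a parameter keeps the sketch usable with a stand-in connection during testing.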

WebSocket Protocol

The communication between the client and the server happens over a standard WebSocket connection. All messages are JSON-encoded.

1. Initial Connection Handshake (Server to Client)

Upon a successful WebSocket connection, the server immediately sends a JSON message to the client acknowledging the connection.

Server to Client:

{
  "status": "connected",
  "connection_id": "unique-uuid-for-this-connection"
}

  • status: Always "connected".
  • connection_id: A unique identifier for the established WebSocket session. This can be used by the client for logging or internal tracking.
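
A client may want to validate this first message before sending commands. The following is a minimal sketch; parse_handshake is a hypothetical helper name, and the checks cover only the two fields documented above.

```python
import json


def parse_handshake(raw):
    """Return the connection_id from the server's first message."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("status") != "connected":
        raise ValueError(f"unexpected handshake: {raw!r}")
    connection_id = msg.get("connection_id")
    if not isinstance(connection_id, str) or not connection_id:
        raise ValueError("handshake is missing connection_id")
    return connection_id
```

The returned ID is useful mainly as a log prefix when several connections run in parallel.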

2. Client Commands (Client to Server)

The client sends JSON objects to the server to issue commands. Each command includes a command field, any command-specific parameters, and a request_id, which the server echoes back so the client can match each response to its request.
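
One way to build such messages is a small helper that fills in request_id automatically. This is a sketch under assumptions: make_command is a hypothetical name, and using a random UUID is an illustrative choice, since the protocol only asks for a unique string.

```python
import json
import uuid


def make_command(command, **params):
    """Serialize a command; returns (request_id, json_text)."""
    # Use the caller's request_id if given, otherwise generate one.
    request_id = params.pop("request_id", None) or uuid.uuid4().hex
    payload = {"command": command, **params, "request_id": request_id}
    return request_id, json.dumps(payload)
```

For example, make_command("navigate", url="https://www.example.com") returns the generated ID together with the text to pass to the WebSocket send call.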

A. Navigate to URL

Loads a specified URL in the browser window.

Client to Server:

{
  "command": "navigate",
  "url": "https://www.example.com",
  "request_id": "my_navigation_request_123"
}

  • command: "navigate"
  • url: The URL to load.
  • request_id: A unique ID for this request.

Server to Client (Response):

{
  "status": "success",
  "command": "navigate",
  "request_id": "my_navigation_request_123",
  "message": "Navigation initiated"
}

  • status: "success" or "error".
  • command: The original command, "navigate".
  • request_id: The request_id from the original client command.
  • message: A descriptive message.
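
Because every response carries the request_id of its command, a client can wait for exactly the reply it needs. The sketch below shows one way to do that. The request and navigate helpers are hypothetical, ws is any object with send and recv coroutines (such as a websockets connection), and skipping non-matching messages is an assumption about how a client might treat stray replies. Note that a "Navigation initiated" reply means the load has started, not that the page has finished loading.

```python
import asyncio
import json
import uuid


async def request(ws, command, timeout=10.0, **params):
    """Send one command and wait for the reply with the same request_id."""
    request_id = uuid.uuid4().hex
    await ws.send(json.dumps({"command": command, "request_id": request_id, **params}))

    async def matching_reply():
        while True:
            reply = json.loads(await ws.recv())
            if reply.get("request_id") == request_id:
                return reply
            # Messages for other requests are skipped in this sketch.

    # Raises a timeout error if no matching reply arrives in time.
    return await asyncio.wait_for(matching_reply(), timeout)


async def navigate(ws, url):
    """Start loading url; raise if the server reports an error."""
    reply = await request(ws, "navigate", url=url)
    if reply.get("status") != "success":
        raise RuntimeError(reply.get("message", "navigate failed"))
    return reply
```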

B. Execute JavaScript

Executes a JavaScript string in the context of the current page. The result of the JavaScript execution is returned.

Client to Server:

{
  "command": "execute_js",
  "script": "document.body.style.backgroundColor = 'blue'; document.title;",
  "request_id": "my_js_execution_request_456"
}

  • command: "execute_js"
  • script: The JavaScript code to execute. The last expression's value is returned.
  • request_id: A unique ID for this request.

Server to Client (Response):

{
  "status": "success",
  "command": "execute_js",
  "request_id": "my_js_execution_request_456",
  "result": "Your Page Title",
  "result_type": "string"
}

Or for an error in the JavaScript:

{
  "status": "success",
  "command": "execute_js",
  "request_id": "my_js_execution_request_456",
  "result_type": "javascript_error",
  "result": {
    "_error": true,
    "message": "ReferenceError: nonExistentVar is not defined",
    "stack": "..."
  }
}

  • status: "success" (even for JS errors, as the command itself executed successfully) or "error" (if the server couldn't execute the command at all).
  • command: The original command, "execute_js".
  • request_id: The request_id from the original client command.
  • result: The value produced by the script, serialized as JSON. This can be a string, number, boolean, object, array, or null. If a JavaScript error occurred, result is an object with _error: true, message, and stack.
  • result_type: A string naming the JSON type of the result field (e.g., "string", "number", "object"), or "javascript_error" when the script threw an error.
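
Since a JavaScript error still arrives with status "success", a client has to inspect result_type to tell the two cases apart. A minimal sketch of that check follows; JavaScriptError and unwrap_js_result are hypothetical names, and the message field on a server-level error is assumed to mirror the other responses.

```python
class JavaScriptError(Exception):
    """Raised when the page script itself threw an error."""

    def __init__(self, message, stack=""):
        super().__init__(message)
        self.stack = stack


def unwrap_js_result(response):
    """Return the script's value, or raise for either kind of failure."""
    if response.get("status") != "success":
        # The server could not run the command at all.
        raise RuntimeError(response.get("message", "execute_js failed"))
    result = response.get("result")
    if response.get("result_type") == "javascript_error":
        # The command ran, but the script threw inside the page.
        details = result if isinstance(result, dict) else {}
        raise JavaScriptError(details.get("message", "unknown error"),
                              details.get("stack", ""))
    return result
```

Turning the two failure modes into different exception types lets calling code retry on server errors while treating script errors as test failures.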

C. Set HTML Content

Replaces the entire content of the current page with the provided HTML string.

Client to Server:

{
  "command": "set_html",
  "html": "<h1>Hello from Client!</h1><p>This is custom HTML.</p>",
  "request_id": "my_set_html_request_789"
}

  • command: "set_html"
  • html: The full HTML content to load.
  • request_id: A unique ID for this request.

Server to Client (Response):

{
  "status": "success",
  "command": "set_html",
  "request_id": "my_set_html_request_789",
  "message": "HTML content set"
}

  • status: "success" or "error".
  • command: The original command, "set_html".
  • request_id: The request_id from the original client command.
  • message: A descriptive message.
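
HTML and JavaScript strings usually contain quotes, so it is safer to let a JSON encoder build the messages than to concatenate strings by hand. The sketch below pairs set_html with a follow-up execute_js that fills a field, in the spirit of the Form Automation demo. FORM_HTML, set_html_command, and fill_field_command are hypothetical names, not taken from client.py.

```python
import json

# Hypothetical page used only for illustration.
FORM_HTML = """<form id="signup">
  <input name="email" type="text">
  <button type="submit">Send</button>
</form>"""


def set_html_command(html, request_id):
    """Build a set_html message; json.dumps escapes quotes and newlines."""
    return json.dumps({"command": "set_html", "html": html, "request_id": request_id})


def fill_field_command(selector, value, request_id):
    """Build an execute_js message that sets one field and reads it back."""
    # json.dumps output is also a valid JavaScript string literal, so
    # quotes or backslashes in the inputs cannot break out of the script.
    target = f"document.querySelector({json.dumps(selector)})"
    script = f"{target}.value = {json.dumps(value)}; {target}.value;"
    return json.dumps({"command": "execute_js", "script": script, "request_id": request_id})
```

Sending set_html_command(FORM_HTML, "req-1") followed by fill_field_command('input[name="email"]', "user@example.com", "req-2") should make the second response carry the value that was just written.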

D. Take Screenshot (Placeholder - Server-side implementation needed)

Note: The current C server implementation does not yet include the code to actually capture and send a screenshot. This command is a placeholder in the protocol. To implement this, you would need to use WebKitGTK's screenshot capabilities (e.g., webkit_web_view_get_snapshot) and encode the image (e.g., Base64) to send it over WebSocket.

Client to Server:

{
  "command": "screenshot",
  "request_id": "my_screenshot_request_101"
}

  • command: "screenshot"
  • request_id: A unique ID for this request.

Server to Client (Response - Current Placeholder):

{
  "status": "success",
  "command": "screenshot",
  "request_id": "my_screenshot_request_101",
  "result": "Placeholder: Screenshot functionality not fully implemented.",
  "result_type": "string"
}

When implemented, result would likely be a Base64-encoded PNG/JPEG string.
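
A client written today can already guard against both cases. The sketch below assumes the eventual format is a Base64-encoded PNG in result, which the server does not yet produce; decode_screenshot is a hypothetical helper that returns None while the placeholder text is still being sent.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_screenshot(response):
    """Return PNG bytes, or None if result is not a Base64-encoded PNG."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("message", "screenshot failed"))
    result = response.get("result")
    if not isinstance(result, str):
        return None
    try:
        data = base64.b64decode(result, validate=True)
    except (binascii.Error, ValueError):
        return None  # e.g. the current placeholder sentence
    return data if data.startswith(PNG_SIGNATURE) else None
```

A caller can write the returned bytes to a .png file in binary mode. If the server ends up sending JPEG data instead, the signature check would need to change.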

Contributing

Feel free to fork, contribute, and enhance this browser automation server. Ideas for improvement include:

  • Full screenshot implementation and sending as Base64.
  • Error handling for invalid JSON messages from client.
  • More robust error reporting from the C server.
  • Support for more browser interactions (e.g., mouse clicks, key presses, element selection).
  • Headless mode option for the WebKitGTK window.
  • Option to specify the port on server startup.

client.py
demo.py
Makefile
README.md
server.py
webapp.c