A proxy service to mediate access to Chrome that is run in headless mode, for use in high-frequency application load testing, end-user behaviour simulations and programmatic access to Chrome Devtools.
Enables automatic initialisation of the underlying chrome browser pages upon the request for a connection, as well as closing the page once the WebSocket connection is closed.
This project was born out of necessity, as we needed to orchestrate a large number of concurrent browser scenario executions, with low-level control and advanced introspection capabilities.
Elixir: 1.8+ OTP: 21.3+
See .travis.yml
for complete list of supported versions.
erlexec
Cowboy is a major dependency of Phoenix, as such here is a little notice as to which versions of cowboy are hard dependencies of Chroxy. This notice will be removed at version 1.0 of Chroxy.
Cowboy 1.x
<= version 0.5.1
Cowboy 2.x
> version 0.6.0
Cowboy 2.8+
> version 0.7.0
The objective of this project is to enable connections to headless chrome
instances with minimal overhead and abstractions. Unlike browser testing
frameworks such as Hound
and Wallaby
, Chroxy aims to provide direct
unfettered access to the underlying browser using the Chrome Debug
protocol whilst
enabling many 1000s of concurrent connections channelling these to an underlying
chrome browser resource pool.
Chroxy uses Elixir processes and OTP supervision to manage the chrome instances, as well as including a transparent proxy to facilitate automatic initialisation and termination of the underlying chrome page based on the upstream connection lifetime.
Get dependencies and compile:
$ mix do deps.get, compile
Run the Chroxy Server:
$ mix run --no-halt
Run with an attached session:
$ iex -S mix
Run Docker image
Note: Chrome required a bump in shared memory allocation when running within docker in order to function in a stable manner.
Exposes 1330, and 1331 (default ports for connection api and chrome proxy endpoint).
docker build . -t chroxy
docker run --shm-size 2G -p 1330:1330 -p 1331:1331 chroxy
ChromeRemoteInterface
Establish 100 Browser Connections:
clients = Enum.map(1..100, fn(_) ->
ChroxyClient.page_session!(%{host: "localhost", port: 1330})
end)
Run 100 Asynchronous browser operations:
Task.async_stream(clients, fn(client) ->
url = "https://github.com/holsee"
{:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(client, %{url: url})
end, timeout: :infinity) |> Stream.run
You can then use any Page
related functionality using
ChromeRemoteInterface
.
Get the address for a connection:
$ curl http://localhost:1330/api/v1/connection
ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF
With this address you can establish the connection to the chrome instance (which is routed via a transparent proxy).
The configuration is designed to be friendly for containerisation as such uses environment variables
def deps do
[{:chroxy, "~> 0.3"}]
end
If using Chroxy as a dependency of another mix projects you may wish to leverage
the configuration implementation of Chroxy by replication the configuration in
"../deps/chroxy/config/config.exs"
.
Example: Create a Page Session, Registering for Event and Navigating to URL
ws_addr = Chroxy.connection()
{:ok, page} = ChromeRemoteInterface.PageSession.start_link(ws_addr)
ChromeRemoteInterface.RPC.Page.enable(page)
ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self())
url = "https://github.com/holsee"
{:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
# Message Received by self() => {:chrome_remote_interface, "Page.loadEventFired", _}
Ports, Proxy Host and Endpoint Scheme are managed via Env Vars.
Variable | Default | Desc. |
---|---|---|
CHROXY_CHROME_PORT_FROM | 9222 | Starting port in the Chrome Browser port range |
CHROXY_CHROME_PORT_TO | 9223 | Last port in the Chrome Browser port range |
CHROXY_PROXY_HOST | "127.0.0.1" | Host which is substituted to route connections via proxy |
CHROXY_PROXY_PORT | 1331 | Port which proxy listener will accept connections on |
CHROXY_ENDPOINT_SCHEME | :http |
HTTP or HTTPS
|
CHROXY_ENDPOINT_PORT | 1330 | HTTP API will register on this port |
CHROXY_CHROME_SERVER_PAGE_WAIT_MS | 200 | Milliseconds to wait after asking chrome to create a page |
CHROME_CHROME_SERVER_CRASH_DUMPS_DIR | "/tmp" | Directory to which chrome will write crash dumps |
An intermediary TCP proxy is in place to allow for monitoring of the upstream client and downstream chrome RSP web socket connections, in order to clean up resources after connections are closed.
Chroxy.ProxyListener
- Incoming Connection Management & Delegation
CHROXY_PROXY_HOST
:CHROXY_PROXY_PORT
.accept/1
function which will accept the next upstream TCP connection andProxyServer
process along with the proxy_opts
Chroxy.ProxyServer
- Dynamically Configured Transparent Proxy
proxy_opts
orProxyServer.Hook.up/2
hook modules response, at initialisation.Chroxy.ProxyServer.Hook
- Behaviour for ProxyServer
hooks. Example: ChromeProxy
ProxyServer
@spec up(indentifier(), proxy_opts()) :: proxy_opts()
@spec down(indentifier(), proxy_state) :: :ok
Chrome is the first browser supported, and the following server processes manage the communication and lifetime of the Chrome Browsers and Tabs.
Chroxy.ChromeProxy
- Implements ProxyServer.Hook
for Chrome resource management
connection/1
which returns the websocket connectionProxyServer
process.ProxyServer
, implementing thedown/2
callback in order to clean up the Chrome resource when connectionsChroxy.ChromeServer
- Wraps Chrome Browser OS Process
Chroxy.BrowserPool
- Inits & Controls access to pool of browser processes
connection/0
function which will return a WebSocket connection to aChroxy.BrowerPool.Chrome
- Chrome Process Pool
ChromeServer
process pool, responsible for spawning a browserChroxy.Endpoint
GET /api/v1/connection
Returns WebSocket URI ws://
to a Chrome Browser Page which is routed via the
Proxy. This is the first port of call for an external client connecting to the service.
Request:
$ curl http://localhost:1330/api/v1/connection
Response:
ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF
The following is an example configuration which can be used to run Chroxy on Kubernetes.
deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: crawler
namespace: default
labels:
app: myApp
tier: crawler
spec:
replicas: 1
revisionHistoryLimit: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app: myApp
tier: crawler
template:
metadata:
labels:
app: myApp
tier: crawler
spec:
containers:
- image: eu.gcr.io/..../...:latest # your consumer
name: api
imagePullPolicy: Always
resources:
requests:
cpu: 30m
memory: 100Mi
ports:
- containerPort: 4000
env:
- name: USER_AGENT
value: ...
- name: INSTANCE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# [START chroxy]
- name: headless-chrome
image: eu.gcr.io/..../chroxy:latest # chroxy
imagePullPolicy: Always
resources:
requests:
cpu: 30m
memory: 100Mi
env:
- name: CHROXY_CHROME_PORT_FROM
value: "9222"
- name: CHROXY_CHROME_PORT_TO
value: "9223"
ports:
- containerPort: 1331
- containerPort: 1330
# [END chroxy]
service.yaml
apiVersion: v1
kind: Service
metadata:
namespace: default
name: crawler-api
labels:
app: myApp
tier: crawler
spec:
selector:
app: myApp
tier: crawler
ports:
- port: 4000
protocol: TCP