Securing a Serverless Multi-Tenancy Puppeteer Service

If you ever try to stand up a Puppeteer service, you will almost immediately find that it is difficult to secure when running inside a Docker environment.

I love my serverless, so I was not prepared to take no for an answer. With a lot of sweat, I think I was able to stand up a Puppeteer service with full customer isolation and protection against server-side request forgery from within a multi-tenant Docker container.

Customer Isolation

Customers should not be able to view each other’s data.

--no-sandbox

Chrome itself is natively very good at sandboxing tabs for security reasons. Ideally, we would simply exploit the inbuilt security model, e.g. put each customer in their own tab(s). Not so fast though! Unfortunately, Chrome won't boot under that configuration. The way the sandbox is implemented does not work in containers. As a result, nearly every Dockerfile for Puppeteer on the internet launches Chrome with the --no-sandbox flag.

--cap-add=SYS_ADMIN

The few Dockerfiles I could find without --no-sandbox rely on the container being granted the SYS_ADMIN capability. This is one way to keep the sandbox, but unfortunately most managed Docker environments don't expose this control. So I needed a different approach that works on serverless.

Linux process isolation

Normal Linux processes cannot mess with each other’s memory. So the OS approach for customer isolation is to run a different browser for each customer.
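
One way to sketch "a different browser for each customer" is a small registry keyed by customer. This is a minimal illustration, not the service's actual code: the TenantBrowserPool class and the injected launch function are hypothetical names, and in a real service the launcher would presumably wrap something like puppeteer.launch().

```typescript
// Generic per-tenant browser registry. `B` stands in for whatever handle the
// launcher returns (for example a Puppeteer Browser). The launcher is
// injected, so this sketch has no dependencies.
type Launcher<B> = (tenantId: string) => Promise<B>;

class TenantBrowserPool<B> {
  // Store promises rather than resolved browsers, so two concurrent
  // requests for the same tenant share a single launch.
  private readonly browsers = new Map<string, Promise<B>>();
  private readonly launch: Launcher<B>;

  constructor(launch: Launcher<B>) {
    this.launch = launch;
  }

  get(tenantId: string): Promise<B> {
    const existing = this.browsers.get(tenantId);
    if (existing !== undefined) return existing;

    const browser = this.launch(tenantId);
    this.browsers.set(tenantId, browser);
    // If the launch fails, drop the entry so a later request can retry.
    browser.catch(() => {
      if (this.browsers.get(tenantId) === browser) {
        this.browsers.delete(tenantId);
      }
    });
    return browser;
  }

  // Forget a tenant's browser. The caller is responsible for closing it.
  release(tenantId: string): boolean {
    return this.browsers.delete(tenantId);
  }
}
```

The key property is that a browser handle is never shared across customers: each customer id maps to exactly one OS-level Chrome process tree.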

Resource isolation

You still need to be careful though, as even separate Chrome processes can still access common resources (e.g. the filesystem). In particular, user sessions, cookies, and cached website data need to be stored in different directories for each customer, using flags such as:

--disk-cache-dir 
--media-cache-dir
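
As a rough sketch of how those flags might be derived per customer: the tenantChromeArgs function, the base directory, and the profile/cache/media layout below are assumptions made for illustration. The --user-data-dir flag is my addition; it is the standard Chrome switch for the profile directory, which is where cookies and sessions live.

```typescript
// Build per-customer Chrome flags so no two tenants share a profile or cache.
// `baseDir` and the directory layout are assumptions for illustration.
function tenantChromeArgs(baseDir: string, tenantId: string): string[] {
  // Only allow a conservative character set, so a tenant id can never
  // escape its own directory (e.g. "../other-tenant").
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(tenantId)) {
    throw new Error("invalid tenant id: " + JSON.stringify(tenantId));
  }
  // Strip trailing slashes from the base directory before joining.
  const root = baseDir.replace(/\/+$/, "") + "/" + tenantId;
  return [
    "--user-data-dir=" + root + "/profile", // cookies, sessions, local storage
    "--disk-cache-dir=" + root + "/cache", // HTTP cache
    "--media-cache-dir=" + root + "/media", // media cache
  ];
}
```

The resulting array can be passed as the args option when launching Chrome through Puppeteer. Validating the customer id matters as much as the flags themselves, since the id ends up in a filesystem path.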

Protection against Serverside Request Forgery

A Puppeteer service essentially allows end-users to run code within our infrastructure. The big danger is that the Puppeteer instance will become a bastion for network intrusion. This is not an academic concern either: a ton of exploits observed in the wild used Puppeteer or similar tools to launch server-side request forgery attacks. Check out

In particular, an easy attack is to have Puppeteer run a webpage that probes for Cloud metadata servers, which can then be used to obtain credentials.

So, we must prevent Chrome from accessing certain URLs and local IP addresses.

You can find a list of IP addresses to block on owasp.org.
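
To make the idea concrete, here is a minimal sketch of an IPv4 deny check. The ranges below are only the well-known private and special-purpose blocks, chosen by me for illustration; they are not a copy of the OWASP list, and the function names are hypothetical.

```typescript
// Illustrative IPv4 ranges an SSRF filter typically denies, as [base, prefix
// length] pairs. This is NOT the full OWASP list.
const BLOCKED_RANGES: ReadonlyArray<readonly [string, number]> = [
  ["0.0.0.0", 8], // "this" network
  ["10.0.0.0", 8], // private (RFC 1918)
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes 169.254.169.254 metadata
  ["172.16.0.0", 12], // private (RFC 1918)
  ["192.168.0.0", 16], // private (RFC 1918)
];

// Parse a dotted-quad IPv4 address into a number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function inRange(ip: number, base: string, prefixBits: number): boolean {
  const baseInt = ipv4ToInt(base);
  if (baseInt === null) return false;
  const size = 2 ** (32 - prefixBits); // number of addresses in the block
  return Math.floor(ip / size) === Math.floor(baseInt / size);
}

function isBlockedIPv4(ip: string): boolean {
  const value = ipv4ToInt(ip);
  // Fail closed: anything we cannot parse is treated as blocked.
  if (value === null) return true;
  return BLOCKED_RANGES.some(([base, bits]) => inRange(value, base, bits));
}
```

A check like this only covers literal IPv4 addresses. Hostnames that resolve to internal addresses, and IPv6, need handling as well, which is one reason to enforce the rules in a proxy rather than only in application code.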

--proxy-server

Chrome can be configured to use an outbound HTTP proxy server, which we can use to intercept and filter traffic. For our service, we used TinyProxy as it has a very low resource overhead (2MB).
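
A small sketch of the Chrome flags involved, assuming the proxy listens on a host and port of your choosing (the proxyChromeArgs name and the example values are hypothetical). The second flag is my addition: as I understand Chromium's proxy documentation, Chrome by default sends requests for loopback and link-local addresses directly instead of through the proxy, and a `<-loopback>` entry in the bypass list removes that implicit rule. Treat that as something to verify against your Chrome version.

```typescript
// Build the Chrome flags that route all traffic through a filtering proxy.
function proxyChromeArgs(proxyHost: string, proxyPort: number): string[] {
  if (!Number.isInteger(proxyPort) || proxyPort < 1 || proxyPort > 65535) {
    throw new Error("invalid proxy port: " + proxyPort);
  }
  return [
    // Send HTTP and HTTPS traffic through the outbound proxy.
    "--proxy-server=http://" + proxyHost + ":" + proxyPort,
    // Assumption: remove Chrome's implicit bypass for loopback and
    // link-local addresses so those requests are filtered too.
    "--proxy-bypass-list=<-loopback>",
  ];
}
```

For example, proxyChromeArgs("127.0.0.1", 8888) would point Chrome at a proxy running in the same container.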

The TinyProxy configuration then protects against access to sensitive IP addresses and domains.

Our translation of OWASP's IP filters to TinyProxy configuration can be found on GitHub.

Exposing the Chrome DevTools protocol port

The most exciting thing for me is allowing users to script Chrome remotely from within their own browser environment. This is enabled by exposing the DevTools debug WebSocket protocol.

To allow us to meter access, we expose a WebSocket endpoint that requires an access_token to be in the path. We can then verify the access_token, boot Puppeteer, and then proxy the public WebSocket endpoint to the internal WebSocket endpoint on demand.
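
The verify, boot, then proxy flow might look roughly like the sketch below. Everything here is an assumption for illustration: the /puppeteer/<access_token> path shape, the routeUpgrade function, and the injected verifyToken and ensureBrowser helpers. In a real service, ensureBrowser would presumably return the value of Puppeteer's browser.wsEndpoint(), and the actual WebSocket piping is left out.

```typescript
// Outcome of handling an incoming WebSocket upgrade request.
type Route =
  | { ok: true; tenantId: string; target: string }
  | { ok: false; status: number };

// Hypothetical collaborators, injected so the sketch stays dependency-free.
interface Deps {
  // Returns the tenant id for a valid token, or null if it is rejected.
  verifyToken(token: string): Promise<string | null>;
  // Boots (or reuses) the tenant's browser and returns its internal
  // DevTools WebSocket URL.
  ensureBrowser(tenantId: string): Promise<string>;
}

// Assumed public path shape: /puppeteer/<access_token>
const TOKEN_PATH = /^\/puppeteer\/([A-Za-z0-9._~-]+)\/?$/;

async function routeUpgrade(path: string, deps: Deps): Promise<Route> {
  const match = TOKEN_PATH.exec(path);
  if (match === null) return { ok: false, status: 404 };

  const tenantId = await deps.verifyToken(match[1]);
  if (tenantId === null) return { ok: false, status: 401 };

  // Only boot a browser after the token has been verified, so
  // unauthenticated traffic cannot trigger an expensive Chrome launch.
  const target = await deps.ensureBrowser(tenantId);
  return { ok: true, tenantId, target };
}
```

The caller would then pipe frames between the public socket and the returned target, so the internal DevTools port is never exposed directly.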

Try it in your Browser without installation

With that in place, we are now able to offer Puppeteer access from right within a browser. Check it out: we are hosting it in a notebook on @observablehq, and you can sign in using IndieAuth.

Full source code is available on GitHub, and it is designed to be run on Google Cloud Run.


Observablehq/Cloud consultant. Developing webcode.run to serverless to Observablehq. Ex-Firebase, Ex-Google Cloud.

Tom Larkworthy
