How to set up Puppeteer with Node.js and Docker
We recently released our scraper API and a React web scraper component, which let you scrape a given website. Both are free to use if you are interested :)
Anyway, when deploying our backend services, we ran into an issue: Chrome could not be found when the service ran inside our Docker container. We were clearly not the first ones in this situation, so today I will share how we solved it, in the hope that it helps whoever runs into the same problem.
Architecture
In this case, we are running Puppeteer in a Node.js Express API and deploying it to Cloud Run using Docker. Cloud Run itself is not important here; the Docker container is the main character of the issue, so the same fix should apply if you deploy to some other service.
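Cloud Run tells the container which port to listen on through the PORT environment variable, so the API should honor it. Below is a minimal sketch of the server side. It assumes the app falls back to 8080, the port exposed in the Dockerfile. To stay dependency-free, it uses Node's built-in http module as a stand-in for Express. The names resolvePort, createServer and the /healthz route are hypothetical and only for illustration.

```javascript
const http = require("http");

// Cloud Run passes the port to listen on via the PORT env var.
// Fall back to 8080 (the port EXPOSEd in the Dockerfile) if it is unset or invalid.
function resolvePort(env = process.env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 8080;
}

// Hypothetical stand-in for the Express app: a single health-check route.
function createServer() {
  return http.createServer((req, res) => {
    if (req.url === "/healthz") {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("ok");
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

In index.js you would then start it with createServer().listen(resolvePort()). With Express, the equivalent is app.listen(resolvePort()).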
Solution
Let's start with the Puppeteer version. In this case, we are using
"puppeteer": "^20.1.0"
and we launch the browser as follows:
const browser = await puppeteer.launch({
  headless: true,
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
  executablePath: "/usr/bin/google-chrome",
});
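The executablePath option is the important part here: it must point to the Chrome executable installed in the container. When Chrome is installed from Google's apt repository, as in the Dockerfile below, that binary is /usr/bin/google-chrome. The two sandbox flags are needed because the container runs Chrome as root, and Chrome refuses to start as root with its sandbox enabled.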
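If you prefer to keep the path configurable rather than hard-coded, a minimal sketch might look like the following. buildLaunchOptions, assertChromeInstalled and the CHROME_EXECUTABLE_PATH environment variable are hypothetical names of ours, not part of Puppeteer's API. The helpers only build the options object you would pass to puppeteer.launch() and check that the binary exists, so they run without Puppeteer installed.

```javascript
const fs = require("fs");

// Where google-chrome-stable ends up when installed from Google's apt repository.
const DEFAULT_CHROME_PATH = "/usr/bin/google-chrome";

// Hypothetical helper: build the options object for puppeteer.launch().
// CHROME_EXECUTABLE_PATH is an assumed env var name used to override the path.
function buildLaunchOptions(env = process.env) {
  return {
    headless: true,
    // Chrome will not start as root (the default user in node:slim) with its sandbox on.
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
    executablePath: env.CHROME_EXECUTABLE_PATH || DEFAULT_CHROME_PATH,
  };
}

// Fail fast with a readable error if the Chrome binary is missing from the image.
function assertChromeInstalled(options) {
  if (!fs.existsSync(options.executablePath)) {
    throw new Error(`Chrome not found at ${options.executablePath}`);
  }
  return options;
}
```

Usage inside an async function: const browser = await puppeteer.launch(assertChromeInstalled(buildLaunchOptions()));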
Dockerfile
The Dockerfile contains everything our Node project needs:
FROM node:slim

# Set the working directory to /app inside the container
WORKDIR /app

# Copy app files
COPY . .

# Install Google Chrome (stable) and the system libraries it depends on
RUN apt-get update && apt-get install gnupg wget -y && \
    wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
    sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
    apt-get update && \
    apt-get install google-chrome-stable -y --no-install-recommends && \
    rm -rf /var/lib/apt/lists/*

# ==== BUILD PART =====
RUN npm install

# Set the env to "production"
ENV NODE_ENV production

# Expose the port on which the app will be running
EXPOSE 8080

# Start the app
CMD [ "node", "index.js" ]
The important part is the RUN step that installs Google Chrome from Google's apt repository. Installing the package also pulls in the system libraries Chrome needs to run, which the slim Node image does not include. The rest may not matter for your project, but I included it in case you don't have a Docker setup yet.
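To confirm that the image actually contains a working Chrome, you can ask the binary for its version. The helper below is a hypothetical sketch, not part of our original setup. It shells out to the given path with --version and returns null if the binary is missing or fails to run.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical check: return the output of `<chromePath> --version`,
// or null if the binary is missing or exits with an error.
function chromeVersion(chromePath = "/usr/bin/google-chrome") {
  try {
    return execFileSync(chromePath, ["--version"], { encoding: "utf8" }).trim();
  } catch {
    return null;
  }
}
```

You could call this at server startup and exit early if it returns null. Alternatively, add a RUN google-chrome --version step right after the Chrome install, so a broken image fails at build time instead of at the first request.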
When you build the project with Docker, it should now work. If it doesn't, or if you think something could be done differently, feel free to send us an email and we will try to help out.
Full deployment tutorial
For a full Node.js deployment flow to Cloud Run, check out this guide.
Outro
In this guide, we shared how we solved the issue of Puppeteer not finding Chrome inside a Docker container. I hope you found some guidance or help here, and I hope to see you in another article as well.
All the best,