How to run Puppeteer Sharp in a Linux Docker container

Puppeteer Sharp is a .NET port of Puppeteer, the browser automation library, and it works well as a website crawler for C#. I personally use it to crawl websites for price information on products that I am interested in. In this article, you will learn which configuration is needed to use the crawler in a web application hosted on Linux.

Create a new .NET Core API

When creating the new project, do not forget to enable Docker support. Then create a new C# class to hold the code that follows.

Use the PuppeteerSharp NuGet package

Reference the latest version of PuppeteerSharp in your project and use it in the class you created.

Crawl a webpage

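Create a new method that returns the content of a webpage as a string. You can use the following code as guidance.

Pay attention to these points in the code: the --no-sandbox argument, which Chrome usually needs in order to start inside a container; the ExecutablePath, which must match the location where the Dockerfile installs Chrome; and the timeouts set to 0, which disable the navigation timeout so that slow pages do not abort the crawl.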

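// Requires: using PuppeteerSharp;
public async Task<string> GetContentAsync(string url)
{
    // Use the Chrome binary installed by the Dockerfile; --no-sandbox lets Chrome start inside the container.
    await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Args = ["--no-sandbox"],
        ExecutablePath = "/usr/bin/google-chrome-stable"
    });
    await using var page = await browser.NewPageAsync();

    // A value of 0 disables the navigation timeout for this page.
    page.DefaultNavigationTimeout = 0;

    // Networkidle2 waits until at most two network connections remain open, so dynamically loaded content is present.
    await page.GoToAsync(url, new NavigationOptions { WaitUntil = [WaitUntilNavigation.Networkidle2], Timeout = 0 });

    return await page.GetContentAsync();
}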
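To show how the method could be exposed to a caller, here is a minimal sketch of a Program.cs that wires it into an endpoint. It assumes an ASP.NET Core (.NET 8) web project with implicit usings enabled and the PuppeteerSharp package referenced. The class name PageCrawler, the /content route and the url query parameter are hypothetical names chosen for illustration; they are not prescribed by Puppeteer Sharp.

```csharp
// Program.cs of an ASP.NET Core (.NET 8) web project with implicit usings enabled.
using PuppeteerSharp;

var builder = WebApplication.CreateBuilder(args);

// PageCrawler is a hypothetical name for the class that holds GetContentAsync.
builder.Services.AddSingleton<PageCrawler>();

var app = builder.Build();

// GET /content?url=https://example.com returns the rendered HTML as plain text.
app.MapGet("/content", async (string url, PageCrawler crawler) =>
{
    // Only accept absolute http(s) URLs before handing them to the browser.
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) ||
        (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
    {
        return Results.BadRequest("The url query parameter must be an absolute http(s) URL.");
    }

    var html = await crawler.GetContentAsync(uri.ToString());
    return Results.Text(html, "text/plain");
});

app.Run();

public class PageCrawler
{
    public async Task<string> GetContentAsync(string url)
    {
        // Same launch settings as in the article: installed Chrome, no sandbox.
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Args = ["--no-sandbox"],
            ExecutablePath = "/usr/bin/google-chrome-stable"
        });
        await using var page = await browser.NewPageAsync();

        await page.GoToAsync(url, new NavigationOptions
        {
            WaitUntil = [WaitUntilNavigation.Networkidle2],
            Timeout = 0
        });

        return await page.GetContentAsync();
    }
}
```

The sketch launches a new browser for every request, which keeps it simple but is slow; whether to reuse a single browser instance is a design choice that depends on your load.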

Set up the Dockerfile

We want our code to run inside a Docker container, possibly on a Linux App Service in Azure. For that, we have to tell Docker how to build and start the application.

You can adapt your Dockerfile based on the following configuration. The comments explain why each step is needed.

FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base

# Puppeteer recipe
# Based on this code: https://github.com/armbues/chrome-headless/blob/master/Dockerfile
RUN apt-get update && apt-get -f install && apt-get -y install wget gnupg2 apt-utils
RUN wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' >> /etc/apt/sources.list
RUN apt-get update \
&& apt-get install -y google-chrome-stable --no-install-recommends --allow-downgrades fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf

# We are setting the same path as before in the C# code
ENV PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"

# The rest of the file is the standard configuration for restoring, building and publishing a .NET application.
# Replace the YourApi placeholders with the name of your project.
USER app
WORKDIR /app
EXPOSE 8080
EXPOSE 8081

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
ARG BUILD_CONFIGURATION=Release
WORKDIR /src
COPY ["YourApi.Api/YourApi.Api.csproj", "YourApi.Api/"]
RUN dotnet restore "YourApi.Api/YourApi.Api.csproj"
COPY . .
WORKDIR "/src/YourApi.Api"
RUN dotnet build "YourApi.Api.csproj" -c $BUILD_CONFIGURATION -o /app/build

# UseAppHost=false skips generating a native executable, since the container starts the application with "dotnet YourApi.Api.dll"
FROM build AS publish
ARG BUILD_CONFIGURATION=Release
RUN dotnet publish "YourApi.Api.csproj" -c $BUILD_CONFIGURATION -o /app/publish /p:UseAppHost=false

# Define the entrypoint of the application
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "YourApi.Api.dll"]
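
The Dockerfile exports the Chrome location as PUPPETEER_EXECUTABLE_PATH, while the C# method hard-codes the same path. One way to keep the two in sync is to read the variable in code and fall back to the default only when it is missing. The following is a minimal sketch of that idea; ChromeLocator is a hypothetical helper written for this article and is not part of Puppeteer Sharp.

```csharp
using System;

// Prefer the path from the environment (set by ENV in the Dockerfile) and
// fall back to the location used in the rest of this article.
string chromePath = ChromeLocator.ResolveExecutablePath();
Console.WriteLine($"Chrome executable: {chromePath}");

// Hypothetical helper, not part of PuppeteerSharp.
public static class ChromeLocator
{
    public const string EnvironmentVariable = "PUPPETEER_EXECUTABLE_PATH";
    public const string DefaultPath = "/usr/bin/google-chrome-stable";

    public static string ResolveExecutablePath()
    {
        var fromEnvironment = Environment.GetEnvironmentVariable(EnvironmentVariable);

        // Treat a missing or blank variable as "not configured".
        return string.IsNullOrWhiteSpace(fromEnvironment) ? DefaultPath : fromEnvironment;
    }
}
```

With such a helper in place, the launch options could use ExecutablePath = ChromeLocator.ResolveExecutablePath() instead of the string literal, so that changing the path in the Dockerfile is enough.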

Conclusion

It is possible to run Puppeteer Sharp in a Linux environment. However, it requires some specific configuration, both in the code and in the Dockerfile. I hope this article answers some of the open questions you might have had.
