January 23, 2024

Building a custom nginx access log parser

After launching the pruvious.com website, I sought to understand its visitor count and traffic sources. Rather than using tools like Google Analytics, I opted to develop a simple parser for the nginx access logs on the server. This parser displays page impressions and visitors, along with their country, city, and operating system, directly in the Pruvious dashboard.

I started by creating a new Pruvious project using pnpm exec pruvious@latest init access-log-parser. Then, in the module options in nuxt.config.ts, I disabled some default API routes and standard collections that this project doesn't use.

# nuxt.config.ts

export default defineNuxtConfig({
  ...
  pruvious: {
    api: {
      routes: {
        'pages.get': false,
        'previews.get': false,
        'robots.txt.get': false,
        'sitemap.xml.get': false,
      },
    },
    standardCollections: {
      pages: false,
      presets: false,
      previews: false,
      redirects: false,
      seo: false,
    },
  },
})

Next, I created three things: the root.global.ts middleware, which redirects visitors from the root page of the site to /dashboard; the custom collections impressions and visitors, which store the log data; and a parser that crawls the access logs every minute using job queues in Pruvious. Here are the results:

The project is open source and free to use. Check it out on GitHub: https://github.com/pruvious/access-log-parser

How to use it

Clone the repository to your local machine and deploy it to a remote server, either with the Pruvious CLI or manually, like a normal Nuxt app.

Once the app is running, configure the paths to the access logs you want to parse and upload the GeoLite2-City.mmdb database from MaxMind, which is used to look up a location from each visitor's IP address. You can obtain this database for free by registering an account at https://www.maxmind.com/en/geolite2/signup, or get it from another source.

Access log parser settings

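Make sure the user running the app can read the access.log file. Here's an example for a user named pruvious. It grants read access to the /var/log/nginx directory and sets a default ACL that files created in it afterwards inherit. Files that already exist are not affected by the default ACL and may need the permission set on them directly.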

# Terminal

sudo setfacl -m u:pruvious:r-x,d:u:pruvious:r-x /var/log/nginx
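
To confirm that the permissions took effect, you can run a quick check as the user that runs the app. The following is a minimal sketch, not part of the repository: the canRead helper is a hypothetical name, and the log path is only an example.

```typescript
import { accessSync, constants } from 'node:fs'

// Hypothetical helper (not from the repository): returns true when the
// current process is allowed to read the file at `path`.
function canRead(path: string): boolean {
  try {
    accessSync(path, constants.R_OK)
    return true
  } catch {
    // Thrown when the file is missing or the permission is denied.
    return false
  }
}

// Example path; replace it with the paths configured in the dashboard.
console.log(canRead('/var/log/nginx/access.log'))
```

If this prints false when run as the app's user, the parser will most likely not be able to open the log either.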

Customization

You can integrate this parser directly into your Pruvious site by copying the components from the repository and adapting them to your needs. You can also use the parser for any other type of site on your nginx server, or as a basis for parsers for other log files. The key caveat lies in the parse-logs job: it only counts GET requests that returned status 200, skips every request whose path contains a period (which typically filters out static files), and skips API requests except those under /api/pages/, which serve public pages. Here's the crucial section:

# jobs/parse-logs.ts

import { defineJob, type CreateInput } from '#pruvious'
import { query } from '#pruvious/server'
import { Parser } from '~/utils/Parser'
import { getLocation } from '~/utils/geoip'

export default defineJob({
  name: 'parse-logs',
  interval: 60,
  callback: async () => {
    const { accessLogPaths } = await query('settings').read()
    const visitorsMap: Record<string, number> = {}

    for (const { path, site } of accessLogPaths) {
      const parser = new Parser(path)
      const newImpressions: CreateInput['impressions'][] = []

      // ...

      for (const entry of parser.walk()) {
        if (entry.status !== 200 || !entry.request.match(/^GET (\/(?!api(?!\/pages\/))[^\.]*) HTTP/)) {
          continue
        }

        const page = entry.request
          .slice(4)
          .split(' ')[0]
          .replace(/^\/api\/pages\//, '/')

        // ...
      }

      if (newImpressions.length) {
        await query('impressions').createMany(newImpressions)
      }
    }
  },
})
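
To see what the filter keeps and drops, here is a small standalone sketch that runs single log lines through the same regular expression and page extraction as the job above. It assumes nginx's default "combined" log format. The LogEntry shape, parseLine and toPage are hypothetical names for illustration; the Parser class in the repository may work differently.

```typescript
// Hypothetical shape of a parsed entry; the repository's Parser may differ.
interface LogEntry {
  ip: string
  status: number
  request: string
}

// Assumes nginx's default "combined" format, read up to the status code:
// ip - user [time] "request" status ...
const LINE_PATTERN = /^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) /

// Same pattern as in the parse-logs job.
const PAGE_REQUEST = /^GET (\/(?!api(?!\/pages\/))[^\.]*) HTTP/

// Splits one log line into the fields the filter needs.
function parseLine(line: string): LogEntry | null {
  const match = LINE_PATTERN.exec(line)
  if (!match) return null
  return { ip: match[1], request: match[2], status: Number(match[3]) }
}

// Returns the page path for a counted impression, or null when the entry is skipped.
function toPage(entry: LogEntry): string | null {
  if (entry.status !== 200 || !PAGE_REQUEST.test(entry.request)) return null
  return entry.request
    .slice(4) // drop the leading "GET "
    .split(' ')[0] // keep the path, drop " HTTP/1.1"
    .replace(/^\/api\/pages\//, '/') // map the pages API back to the page path
}

// Made-up sample lines using a documentation IP address.
const samples = [
  '203.0.113.7 - - [23/Jan/2024:10:00:00 +0000] "GET /docs HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '203.0.113.7 - - [23/Jan/2024:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
  '203.0.113.7 - - [23/Jan/2024:10:00:02 +0000] "GET /api/pages/docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
  '203.0.113.7 - - [23/Jan/2024:10:00:03 +0000] "GET /api/login HTTP/1.1" 200 64 "-" "Mozilla/5.0"',
]

for (const line of samples) {
  const entry = parseLine(line)
  console.log(entry ? toPage(entry) : null)
}
// Prints "/docs", null, "/docs", null
```

If your site serves pages whose paths contain a period, or uses a different API prefix, the PAGE_REQUEST pattern is the part to adapt.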

Feel free to experiment with this project. If you encounter any problems, don't hesitate to open an issue on GitHub.

Last updated on February 11, 2024 at 10:59
