DIY Quick and dirty DNS client stats collection



A common task when decommissioning or moving a name server is figuring out which clients are querying it. In this article, I discuss how you can parse syslog messages and build a DNS Top Talkers list.

OVERVIEW

I recently worked with a customer on a project to help them decommission an Infoblox DNS appliance from their Grid. They needed the device to be removed from "play" immediately and there was very little time to accomplish the task. Our first task was to determine the following:

  • What IP addresses were querying the DNS server?
  • What was the volume of DNS queries for each IP?

In the absence of a Reporting Server, netflow data, or some other RMON tool, I needed a quick way of building a list of DNS Top Talkers. For that, I decided I would rely on DNS query logging. I would run DNS query logging on this particular member for a period of time and then analyze the data. That brings me to the point of this blog post. I'm sharing my "quick n' dirty" method of how I solved this in short order.

This article applies to the following:

  • Infoblox NIOS 6.12.x and above
  • Sift-tool
  • Python 3

The Problem

The problem is that we need to quickly, efficiently, and easily enumerate all the DNS clients, by IP address, that are "hitting" this particular Infoblox DNS Grid Member. We also want to know the volume of DNS requests being made by each client. This information allows us to build a DNS Top Talkers report. That data can be used to evaluate the risk to the firm if the system were forced to be shut down at a moment's notice.

Extra credit: For some enterprise environments, this problem begets another problem - ownership:

  1. Who owns the IP addresses?
  2. Who supports the IP addresses?
  3. Where are the IP addresses located physically?
  4. What contact info do we have for each of the IP addresses?

I mention this because, more often than not, ownership actually becomes THE problem. For now, this is extra credit and somewhat outside the scope of the article, but I felt it was important to raise. Why? Because for many businesses this is a weakness. It is vitally important data, and enterprises should be able to correlate an IP address to a user (or owner).

The Resolution

Our problem can be solved with a few tools at our disposal, including CLIs and a script or two. First, we must enable DNS query logging.

Caveat Emptor (Buyer Beware) - Enabling DNS query logging puts significant I/O overhead on the name server, reducing its performance as a result. Do this with caution! Check the resources of the server, evaluate the risk, and proceed carefully.

In our case, we had a server that had to be removed, so there was no choice but to resort to DNS query logging, given the very little time to remediate and the lack of other network monitoring and reporting tools.

After running our server with query logging enabled for a period of time, we had several days' worth of log data. We harvested the data and brought it back to our engineering workstation. Next, we use CLIs and scripts to extract and parse all DNS client source IP addresses from that data and count the number of times each IP queried the DNS server. That, in a nutshell, is our resolution.

Our Tasks

I recommend the use of an invaluable tool called sift. Sift is an uber-powerful and lightning-fast replacement for grep, with widespread support for most operating systems. If you've never used it, RUN, don't walk, to get it! You can get more info on sift from the website at https://sift-tool.org/

The tasks for obtaining the data are as follows:

  1. Enable DNS query logging ONLY on the impacted Infoblox DNS Grid Member
  2. Operate normally for a period of time, keeping an eye on the server's resources, stability, and such
  3. Fetch an Infoblox supportBundle with the rolled syslog files - this will contain several hours' or days' worth of compressed data (see note below)
  4. Extract the data from the supportBundle.gz file into a working directory - the logs will be in <working_directory>/var/log
  5. Use sift to extract the client IPs and their query counts from the compressed log files:
    for i in $(seq 0 9)
    do
        # capture group 1 ($1) is the client IP; sort and count the occurrences of each IP
        sift 'info\s+client\s+(.*)#\d+\s+' messages.$i.gz -z --replace '$1' | sort | uniq -c > stats$i
    done

    The code above iterates from 0 to 9, searching each of the rolled syslog files named messages.0.gz through messages.9.gz for regex capture group 1 (the client IP), then sorts the matches and counts the occurrences of each IP address. The results are written to separate stats files named stats0 through stats9. (If sift isn't available on your workstation, see the Python-only sketch near the end of this post.)

  6. Lastly, combine these stats files, since there is one stats file per messages log file. Ideally, we want a unique list of IP addresses with the query counts summed across all files. To solve this, we use the following Python 3 script, called fixup.py. This script coalesces the data and outputs the results to a single statistics file that can then be sorted by qcount in descending order.

The Python fixup.py script I used is quite simple:

#!/usr/bin/env python3
"""Coalesce the per-file stats into a single CSV of client IP and total query count."""

import csv
from collections import Counter


def main():
    cnt = Counter()

    file_prefix = 'stats'

    # Each stats file holds `uniq -c` output: "<count> <client IP>" per line
    for i in range(10):
        with open(f'{file_prefix}{i}', 'r') as fh:
            for line in fh:
                qcount, source = line.split()
                cnt[source] += int(qcount)

    # Write one row per client IP with its summed query count
    with open('dns-client-resolvers-stats.csv', 'w', newline='') as fh:
        mywriter = csv.writer(fh)
        for source, qcount in cnt.items():
            mywriter.writerow([source, qcount])


if __name__ == '__main__':
    main()

Execute the script as follows:

./fixup.py

This assumes that our stats0 through stats9 files are in the same directory.
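
Once the CSV exists, building the Top Talkers view is just a matter of sorting by query count in descending order. Here is a minimal sketch of that step in Python; the file name and two-column layout match what fixup.py writes above, and the top-25 cutoff is an arbitrary choice:

#!/usr/bin/env python3
"""Print the top DNS talkers from the CSV produced by fixup.py."""

import csv

TOP_N = 25  # arbitrary cutoff (adjust as needed)

with open('dns-client-resolvers-stats.csv', newline='') as fh:
    rows = [(source, int(qcount)) for source, qcount in csv.reader(fh)]

# Sort by query count, highest first, and print a simple report
for source, qcount in sorted(rows, key=lambda r: r[1], reverse=True)[:TOP_N]:
    print(f'{qcount:>10}  {source}')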

NOTE: Regarding DNS query logging and the volume of data it can potentially generate - that depends on the name server you've targeted for remediation: how busy it is, how many resolvers are using it, how many queries each resolver generates, and so forth. You may need to experiment with the size of the syslog files, the number of files to keep, and how often you collect the data.
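
As an aside: if installing sift on the analysis workstation isn't an option, steps 5 and 6 can be approximated in pure Python. The following is only a rough sketch under the same assumptions as the sift loop above - it is run from the directory holding messages.0.gz through messages.9.gz, and the query log lines contain the same "info client <IP>#<port>" text the sift regex matches:

#!/usr/bin/env python3
"""Rough Python-only stand-in for the sift loop plus fixup.py: read the
compressed syslog files directly and tally queries per client IP."""

import csv
import gzip
import re
from collections import Counter

# Mirrors the sift pattern, with \S+ instead of .* so only the address token is captured
CLIENT_RE = re.compile(r'info\s+client\s+(\S+)#\d+\s+')

cnt = Counter()

for i in range(10):  # messages.0.gz through messages.9.gz
    with gzip.open(f'messages.{i}.gz', 'rt', errors='replace') as fh:
        for line in fh:
            match = CLIENT_RE.search(line)
            if match:
                cnt[match.group(1)] += 1

with open('dns-client-resolvers-stats.csv', 'w', newline='') as fh:
    writer = csv.writer(fh)
    # most_common() yields (ip, count) pairs already sorted by count, descending
    writer.writerows(cnt.most_common())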

That's it! I hope this snippet helps someone in need.
