DNSSEC secured blog: raising awareness on DNS security

Hurray! My blog and the whole pierky.com domain are now running on a DNSSEC secured zone.

Thanks to the recent move of the blog from the WordPress.org hosted infrastructure to the OVH hosting service, I finally managed to enable IPv6 and DNSSEC support.

If you are using a DNSSEC-aware resolver (are you? check it out…) you can verify it yourself:

:~# dig +multi +dnssec blog.pierky.com

; <<>> DiG 9.8.1-P1 <<>> +multi +dnssec blog.pierky.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31643
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
...

There it is: the ad (Authentic Data) flag.

If your resolvers are not DNSSEC-aware – what a shame! Tell your ISP to enable DNSSEC validation 🙂 – you can try the same using an open resolver which supports DNSSEC, like Google's…

:~# dig +dnssec blog.pierky.com @8.8.8.8

… or you can try an online test suite, like the one provided by Verisign Labs or DNSViz.

A nice browser addon – available for Internet Explorer, Firefox and Chrome – allows you to check the DNSSEC validity of the domain names in your browser window. Its name is DNSSEC Validator and it works even if your resolvers are not DNSSEC enabled (you can set an external resolver different from the one used by your operating system); here is a screenshot showing my blog's status:

DNSSEC secured blog as seen by DNSSEC Validator addon

(In the above screenshot you can also see a green 6, generated by another Chrome addon, IPvFoo, which indicates whether the current page was fetched using IPv4 or IPv6.)

This is just a small drop in the ocean of the Internet, but I like to believe that it might raise awareness about DNS security matters and encourage DNSSEC adoption (it seems that as of September 2012 only 1.7% of the visible DNS resolvers on the Internet were performing DNSSEC validation).

References

RIPE Labs – Counting and Re-Counting DNSSEC

dnssec-deployment.org – DNSSEC in ccTLDs, Past, Present, and Future

dnssec-deployment.org – ccTLD DNSSEC Adoption as of 2013-07-30 [PDF]

CZ.NIC – DNSSEC Validator

Verisign Labs – Test if you are benefiting from DNSSEC

Verisign Labs – DNSSEC-Debugger

Sandia.gov – DNSViz

Cluster fencing using SNMP fence_ifmib and Cisco switch

Fencing is a vital component of a virtualization cluster: when a cluster member fails it must be prevented from accessing shared resources such as network disks or a SAN, so that any virtual machine still running on it can be restarted on other members with the certainty that no data will be corrupted by simultaneous access.

Many methods exist to fence failed cluster members, mostly based on powering them off or on disconnecting their network cards; here I would like to show how to use network fencing in a Linux cluster environment (CMAN based), using the fence_ifmib agent against a Cisco managed switch.

The logic behind this mechanism is very simple: once a node has been marked as dead the agent uses the SNMP SET method to tell the managed switch to shut the ports down.
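
As a rough illustration, the relevant fragments of /etc/cluster/cluster.conf might look like the sketch below. The switch address, SNMP community and interface name are placeholders, and attribute names may vary slightly between fence-agents releases, so check fence_ifmib's man page on your distribution:

<fencedevices>
    <!-- the managed Cisco switch, reachable via SNMP with a read-write community -->
    <fencedevice agent="fence_ifmib" name="sw1" ipaddr="192.0.2.10" community="private" snmp_version="2c"/>
</fencedevices>
<clusternodes>
    <clusternode name="node1" nodeid="1">
        <fence>
            <method name="ifmib">
                <!-- "port" is the switch interface the node is plugged into -->
                <device name="sw1" port="GigabitEthernet0/1"/>
            </method>
        </fence>
    </clusternode>
</clusternodes>

A manual test from a node – something like fence_ifmib -a 192.0.2.10 -c private -n GigabitEthernet0/1 -o off – helps to verify SNMP write access before relying on the agent for real fencing.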

Read more …

Windows Server 2008 / IIS 7.5: client source port logging

Many countermeasures taken by ISPs to face IPv4 exhaustion (DS-Lite, NAT64, NAT44, CGN) need more than the old IP-address/timestamp couple to uniquely identify an end-user on the Internet. Even with full logging of activities and sessions, an ISP can't trace a specific user if no source TCP/UDP port is given. So content providers, whether large or small, need to enable client source port logging; it doesn't matter if they run an enormous e-commerce website or a small blog like this one: if they want to provide Law Enforcement Agencies (LEAs) with a set of information capable of uniquely tracing a user, they need client source port logging.

Many software products have simple built-in configuration commands to accomplish this task; here I show how to enable this feature under Microsoft Windows Server 2008 R2 – IIS 7.5.

Advanced Logging IIS extension

The IIS built-in logging module doesn't allow client source port logging, so an extension is needed: Advanced Logging. Once installed, a new icon appears in the IIS Management Console:

IIS Advanced Logging icon

Enable client port logging

Configuration can be done at any level: global, web site, directory. Open the Advanced Logging icon then, in the Actions pane, click Enable Advanced Logging. Once the feature is enabled, you just need to add the client port to the list of logged fields: again from the Actions pane, click Edit Logging Fields, then the Add Field button, and use the following data:

Field ID: Client-IP
Source type: Server variables
Source name: REMOTE_PORT

Hit the OK button a couple of times and go back to the main window, where you find the default log definition named %COMPUTERNAME%-Server; double click it to open its details, then select your logging preferences, being careful to add the Client-IP field ID to the list of the selected ones (from the Selected Fields section click the Select Fields button and check it).

After you have generated some activity on your web site you can check the log content by clicking View log files in the Actions pane; the client port will be there somewhere, its position depending on the field sequence in the log definition's Selected Fields list.

LVM: disable udev sync to avoid “udevd timeout: killing watershed” error

While I was working on the setup of a simple cluster with LVM running on top of DRBD and managed by Pacemaker/Corosync I had a problem with LVM resources not coming up after a reboot.

The cluster was running on Ubuntu 12.04 (3.5.0-36 kernel); LVM logical volumes were used for data storage only (mapped to iSCSI target LUNs), while the whole system was running on physical disks.

The message logged by the Heartbeat OCF resource agent was "ERROR: LVM: MyVolumeGroup did not activate correctly" and the cluster status was stuck this way:

Resource Group: MyCluster
     LVM_Group     (ocf::heartbeat:LVM):   Started MYHOSTNAME (unmanaged) FAILED

To manually resolve the situation I had to deactivate volume groups and restart the cluster manager every time:

MYHOSTNAME:~# vgchange -a n 
MYHOSTNAME:~# service corosync restart

Further log investigation led me to some udevd errors:

udevd[2083]: timeout: killing 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'' [4934]
udevd[2083]: 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'' [4934] terminated by signal 9 (Killed)

Something was going wrong in the LVM/udev synchronisation, resulting in the deadlock or failure of the resource manager, so I decided to bypass it by setting the udev_sync parameter to 0 (zero) in /etc/lvm/lvm.conf:

[...]
activation {
    udev_sync = 0   # please read further - do it at your own risk
    [...]
}
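
Once lvm.conf has been edited, a quick sanity check is to dump the effective configuration and make sure LVM actually picked the new value up (a minimal sketch; on newer LVM2 releases the dumpconfig subcommand has been superseded by lvmconfig):

MYHOSTNAME:~# lvm dumpconfig activation/udev_sync
udev_sync=0

If the root filesystem also lived on LVM, the change would likely need to be propagated to the initramfs as well (update-initramfs -u on Ubuntu); see the caveat below about using this workaround on system storage.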

This solution worked very well: LVM resources started coming up properly after reboot and, still now, they are up and running and fully manageable by Pacemaker.

A couple of searches on the web showed me Ubuntu Bug #995645, which could be somehow related to my case. Of course, if LVM had been used for system storage many other problems could arise due to the lack of synchronisation with udev, but that was not my case.

At the time of writing the bug is still confirmed but unassigned.

DNS-amplification attack reflection on backhaul circuit

As many of us already know, DNS amplification attacks are a big plague for those who fight every day for the sake of Internet security and service availability.

Infected hosts are instructed by botnet controllers to send DNS queries which yield big responses to open recursive resolvers, using spoofed UDP packets that carry the victim's IP address in the source field, so that a small request generates a large amount of traffic toward the victim.

Small efforts are needed in order to mitigate these attacks – a proper DNS resolver configuration to avoid open recursion, IP source validation (such as Cisco uRPF) to block source IP spoofing at the access network layer – but they may not be sufficient to immunize a network against annoying issues.
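
As a reference, strict unicast RPF on a Cisco IOS access interface is a single command (a sketch only: the interface name is a placeholder, and strict mode is meant for single-homed access ports, not for links with asymmetric routing):

interface GigabitEthernet0/1
 ip verify unicast source reachable-via rx

Packets entering the interface with a source address that is not reachable back through that same interface are dropped, which is what blocks the spoofed queries described below.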

An unpleasant side effect

Even on secure networks an unpleasant side effect may occur: attack reflection against infected hosts, with the consequent backhaul circuit saturation and users’ downstream degradation.

Take, for example, the following not uncommon scenario:

DNS Amplification Reflection - Scenario

An ISP, running a properly configured DNS resolver, connects many users through a shared backhaul link between its core network and a local metro area; one or more users have infected devices responding to a botnet C&C server which aims to launch a DDoS against a given target.

A well implemented network access layer would stop spoofed packets whose source IP can not be reached through the same link they came in from. At the same time, a properly configured DNS resolver would not answer recursive queries coming from untrusted sources. The problem arises when proper DNS queries come in from trusted users and reach the ISP DNS resolver.

A not-really-failed attack attempt

Failed Attack

In the above diagram, at step 1, the botnet controller instructs the infected host to start a DNS amplification attack against the victim's IP address 1.2.3.4. At step 2 the malicious software tries to send a spoofed packet containing the victim's address in the source field, but something goes wrong: the operating system doesn't let the malware forge such a packet and rewrites it using its LAN address, or the router/firewall/CPE changes it with the WAN IP address (NAT). Either way, at step 3, a proper DNS query comes out of the user's network and heads to the ISP DNS resolver, which in turn sends back a response with the huge DNS zone (step 4).

It's easy to understand how this behaviour could lead to ISP internal issues: backhaul link saturation and degradation of the users' experience.

Consequences

A small upstream user query (65 bytes for an ANY query on isc.org) produces a big downstream response (~ 4 KB for the isc.org zone), a multiplicative factor of roughly 60x. Every infected host may send a great many queries over a long period, even more than 1 query per second for many days, and many compromised hosts may be triggered at the same time by the same botnet controller.
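
The amplification factor is easy to estimate yourself with dig; figures are only indicative, since response sizes change over time and depend on the resolver being queried:

:~# dig +dnssec ANY isc.org @8.8.8.8

The ";; MSG SIZE  rcvd:" value printed at the bottom of the output, divided by the ~65-byte query, gives the multiplicative factor experienced by the victim.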

Backhaul links may be rented from incumbent local carriers and may be characterized by an overbooking ratio calculated on the expected usage of the customers who share them; the high speed links which connect the DNS resolvers to the core can overwhelm those backhaul circuits when filled with UDP response packets, leading to congestion and drops caused by the traffic policing operated by the carrier.

Customers may also report a bad user experience: it's true, their links are operating at 100% of their capacity, yet Facebook is slow and VoIP is unusable.

A very big headache, even for an ISP with a properly configured network.

Symptoms

The first symptom that can be observed is an abnormal peak in the resolvers' bandwidth usage:

DNS resolver bandwidth usage during an attack attempt – response traffic in green

During an attack attempt the network usage (servers' upstream) may rise to hundreds of times higher than average.

NetFlow may also help to identify this kind of traffic: big UDP response datagrams may be fragmented across the network, and they show up as port-0 UDP packets in the output of nfdump or similar tools, with a high Bpp (bytes-per-packet) ratio:

Proto Src IP Addr:Port  Dst IP Addr:Port   Packets    Bytes  pps     bps   Bpp Flows
UDP    RESOLVER_1:0   ->  A.B.1.155:0        78966  106.6 M   48  519300  1350    79
UDP    RESOLVER_1:0   ->   G.H.4.73:0        35798   48.3 M   25  274100  1350    38
UDP    RESOLVER_1:53  ->  I.J.5.101:14068     7430    9.3 M    4   46712  1249   187

A 65-byte request generated a 4157-byte response in 3 fragments – calculated at IP level
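
The table above looks like aggregated flow data; a hypothetical nfdump invocation along these lines can isolate the fragment flows (the resolver address and the flow directory are placeholders to adapt to your collector setup):

:~# nfdump -R /var/cache/nfdump -a 'src ip 192.0.2.53 and proto udp and src port 0'

Port 0 shows up because non-initial fragments carry no UDP header, so the exporter accounts them against port 0, which also explains the high Bpp ratio.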

Mitigation

Unfortunately, as far as I know, there are still no specific implementations aimed at mitigating this kind of attack.

BIND9 has a generic rate-limit option which prevents a requestor from being told the same answer more than a specific number of times within a one-second interval, but there is no way to apply it only to a subset of responses (like the ones used in DDoS attacks, such as ANY queries for isc.org or ripe.net). DNS RRL (Response Rate Limiting) is focused on authoritative servers, not on recursive ones.

A suitable way would be the use of the iptables recent module on recursive resolvers, but other aspects have to be considered, such as server load and performance degradation.
A first deep-packet inspection of the incoming DNS requests would filter those queries whose type has been set to ANY, then the recent module would look up the source IP address in a local list and drop the packet if it violates the predetermined policy. For example, a policy may allow one or two queries with type = ANY every 5 seconds, so that "regular" usage would be allowed while malware-initiated traffic would be dropped within a few seconds.
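
A rough sketch of such a rule set is shown below; it relies on the string match to spot the QTYPE=ANY, QCLASS=IN pattern (hex 00 00ff 0001, i.e. the end of the query name followed by type and class) in the question section, and on the recent module to keep per-source state. Offsets, thresholds and list names are arbitrary choices to benchmark and adapt on your own resolvers, and the string match may occasionally hit the same byte pattern elsewhere in a packet:

# send incoming DNS queries carrying the QTYPE=ANY / QCLASS=IN pattern to a dedicated chain
# (offset 40 skips the minimal IP + UDP + DNS headers, so matching starts at the question section)
iptables -N DNS_ANY
iptables -A INPUT -p udp --dport 53 -m string --algo bm --from 40 \
         --hex-string "|0000ff0001|" -j DNS_ANY

# drop if this source already sent 2 or more matching queries in the last 5 seconds...
iptables -A DNS_ANY -m recent --name dnsany --update --seconds 5 --hitcount 2 -j DROP
# ...otherwise record the source and hand the packet back to the INPUT chain
iptables -A DNS_ANY -m recent --name dnsany --set -j RETURN

With these thresholds a source can issue a couple of ANY queries every 5 seconds, which is more than enough for "regular" usage, while a compromised host firing one or more queries per second is dropped almost immediately.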

Number of different IP addresses on the recent module's queue – peak during an attack attempt

References

“Alert (TA13-088A) DNS Amplification Attacks”, US-Cert: http://www.us-cert.gov/ncas/alerts/TA13-088A

“DNS Response Rate Limiting (DNS RRL)”, Paul Vixie, ISC – Vernon Schryver, Rhyolite: http://ss.vix.su/~vixie/isc-tn-2012-1.txt