
Archive for January, 2013

SRP tools problems

January 12, 2013

It’s like rain on your wedding day
It’s a free ride when you’ve already paid
It’s the good advice that you just didn’t take
Who would’ve thought… it figures

(Alanis Morissette – Ironic)

SRP stands for SCSI RDMA Protocol. If you still don’t know what it is: a protocol that lets you connect to SCSI devices attached to another computer over Remote Direct Memory Access. Remote-Direct? Yeah, an oxymoron. If you wish to use RDMA, the underlying network has to support it; so far, the most common usage is on InfiniBand networks.
I had a chance to play with an SRP target / SRP client connection, and my impression is that the whole SRP field is still, let’s put it this way, not tested enough. I wasn’t impressed with what I saw; the software isn’t mature yet. But since I’ve already started, let’s dive in.

My distro of choice for production environments is CentOS, so I’ll talk about the implementations available there. So, if you have an InfiniBand adapter, how should you start?

# yum install rdma infiniband-diags mstflint qperf
# yum install librdmacm libmlx4 libmthca srptools opensm
# /etc/init.d/rdma start
# /etc/init.d/opensm start

As you can see, I’m using an mlx4 adapter. And now the fun starts.
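
Before hunting for targets, it’s worth a quick sanity check that the port actually came up (ibstat ships with the infiniband-diags package installed above; adjust the HCA name and port number to your setup). On a cabled port with a running subnet manager it should report:

# ibstat mlx4_0 1 | grep "State:"
State: Active
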
I have the targets already set up on port 1 of a two-port Mellanox adapter. Port 2 is not connected (even IB cables cost a fortune), so I won’t be using multipathing. I just want to connect to the target and use the disk. The ibsrpdm command lets you scan the network for available targets, so let’s use it:

# ibsrpdm -c
id_ext=0002c9030051060c,ioc_guid=0002c9030051060c,\
dgid=fe800000000000000002c9030051060d,pkey=ffff,service_id=0002c9030051060c

The output of the command can be fed to ‘/sys/class/infiniband_srp/srp-mlx4_0-1/add_target’ (the sysfs name encodes the HCA and port, here mlx4_0 port 1), and the client will connect to the target and see all the disks exported to it.
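
For example, taking the ibsrpdm line above and adding the target by hand (assuming HCA mlx4_0, port 1, the same sysfs path used throughout this post):

# echo "id_ext=0002c9030051060c,ioc_guid=0002c9030051060c,\
dgid=fe800000000000000002c9030051060d,pkey=ffff,service_id=0002c9030051060c" \
  > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
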
To automate this task and perform periodic rescans, you can use the SRP daemon. At least, that’s what the documentation says.

First, let me describe how the daemon starts on CentOS.
After you start the init script, it launches another bash script as a daemon, ‘/usr/sbin/srp_daemon.sh’. This script sets up signal traps, manages the log, and then walks through all available adapters and ports, running yet another bash script, ‘run_srp_daemon’, for each of them. Finally, ‘run_srp_daemon’ executes the actual ‘srp_daemon’ binary.
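
If you want to see what the wrapper scripts end up executing, you can run the binary by hand for a single port. This is just a sketch using the same flags the stock script passes (visible in the patch below): -e executes the target additions, -c asks for the compact command-line format, -n adds the initiator extension, -i/-p select the HCA and port, and -R 60 forces a rescan every 60 seconds:

# /usr/sbin/srp_daemon -e -c -n -i mlx4_0 -p 1 -R 60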

The first problem I noticed is that ‘srp_daemon’ is started for every available port, no matter whether the port is up or down. This results in these errors:

25/10/12 17:34:58 : No response to inform info registration
25/10/12 17:34:58 : Fail to register to traps, maybe there is no opensm running on fabric
25/10/12 17:34:58 : SM LID is 0, maybe no opensm is running
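
These messages simply mean the port in question can’t reach a subnet manager. On a properly cabled port you can confirm SM reachability with sminfo from infiniband-diags (output abbreviated; the GUID and counters will differ on your fabric):

# sminfo
sminfo: sm lid 1 sm guid 0x..., activity count 1234 priority 0 state 3 SMINFO_MASTER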

The next thing I did was grab and unpack the Debian Sid package to see how they did it. I found a piece of code that checks whether the interface is actually up and running before starting the daemon, so I ported that code to the RHEL script. That fixed the first problem, but then I hit a new issue: ‘srp_daemon’ just didn’t add targets at all. If I ran the following command:

for i in `/usr/sbin/ibsrpdm -c`; do \
 echo $i > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target;\
done

all the block devices were added and registered correctly, and could be seen in the ‘fdisk -l’ output. So for the first couple of days I ran this command from ‘/etc/rc.local’, but eventually I got fed up and decided to debug the thing properly. Running ‘srp_daemon’ in verbose mode (-V) shows the following useful information:

enter do_port
Found an SRP target with id_ext 0002c90300510648 - check if it allowed by rules file
Found an SRP target with id_ext 0002c90300510648 - check if it is already connected
id_ext=0002c90300510648,ioc_guid=0002c90300510648,dgid=fe800000000000000002c90300510649,pkey=ffff,service_id=0002c90300510648
Adding target returned 156

Now, this ‘returned 156’ looked suspicious, although it is actually just the return value of write(2), i.e. the number of bytes written rather than an error code. Since the targets still weren’t showing up, I fetched the source code of srptools and applied this patch to see exactly what was being written:

--- srptools-0.0.4/srp_daemon/srp_daemon.c	2009-08-30 15:56:11.000000000 +0200
+++ srp_daemon.c	2013-01-12 03:11:09.000000000 +0100
@@ -183,6 +183,7 @@
 		}
 		ret = write(fd, target_str, strlen(target_str));
 		pr_debug("Adding target returned %d\n", ret);
+		pr_debug("Target string: %s\n", target_str);
 		close(fd);
 	}
 }

This let me see that srp_daemon was trying to add the target with initiator_ext=… appended to the target string; that is what the ‘-n’ option does (it selects the new connection command format, which includes the initiator extension). So my solution was to drop the ‘-n’ option. The final patch looks like this:

--- srp_daemon.sh	2013-01-12 02:17:37.000000000 +0100
+++ srp_daemon.sh_patched	2013-01-12 02:17:52.000000000 +0100
@@ -108,8 +108,14 @@
 do
     for port in `/bin/ls -1 ${ibdir}/${hca_id}/ports/`
     do
-        ${prog} -e -c -n -i ${hca_id} -p ${port} -R ${retries} ${params}&
-        pids="$pids $!"
+        STATUS=`/usr/sbin/ibstat $hca_id $port | grep "State:"`
+        if [ "$STATUS" = "State: Active" ] ; then
+            ${prog} -e -c -i ${hca_id} -p ${port} -R ${retries} ${params}&
+            pids="$pids $!"
+        fi 
     done
 done

Now the script started ‘srp_daemon’ only when the port was active, and targets were added correctly. The logs were finally quiet.

But the next problem came after a reboot: the daemon didn’t start at all, although it was enabled in the correct runlevel. After a few more reboots and a few dozen service restarts, I noticed the issue was the slow warmup of ‘opensm’. The init script of the Subnet Manager has a ‘sleep’ command after the start procedure, but it sleeps for only one second. Increasing that to 20 seconds did the trick. Yeah, I know, that’s a lot! But servers these days tend to run plenty of self-checks at each startup, so a few additional seconds to allow ‘srp_daemon’ to start sanely isn’t all that much. I’ve opened a bug report: https://bugzilla.redhat.com/show_bug.cgi?id=894546 , and here’s my patch:

--- opensm	2013-01-12 02:40:35.000000000 +0100
+++ opensm_patched	2013-01-12 03:04:52.000000000 +0100
@@ -61,7 +61,7 @@
     else
         $prog -B $prio >/dev/null 2>&1
     fi
-    sleep 1
+    sleep 20
     OSM_PID=`pidof $prog`
     checkpid $OSM_PID
     RC=$?

I will write about the SCST target daemon in another post, so stay tuned 😉


Finding UID that is generating traffic

January 8, 2013

And though our hearts are broken
We have to wipe the tears away
In vain they did not suffer
Ten Thousand Strong will seize the day

(Iced Earth – Ten Thousand Strong)

Have you ever had one of those days when you notice strange traffic in your firewall logs and don’t know who is responsible for it? Is your machine compromised, or is it legitimate traffic? Or maybe your server ends up on spam blacklists every now and then, although mail.log is as clean as your car? Well, the first step in such a case is to find out which UID is responsible for the suspicious traffic.
Iptables on Linux offers the owner match, which works on the OUTPUT chain only and matches characteristics of the packet creator. Of course, this works only for locally generated packets. In this example, we’ll try to match the UID of the user who is sending the strange traffic. First of all, let’s enumerate the UIDs of all running processes:

# ps -ef n | grep -v UID | sed 's/^\s*//' | cut -d' ' -f1 | sort | uniq
0
25
27
29
32
38
43
482
487
488
490
501
502
89
91
97
99

The next step is to generate iptables rules in the OUTPUT chain to log outgoing connections. Let’s suppose we want to focus on packets going to destination port SMTP (TCP/25), because we suspect someone is sending mail directly, bypassing the local MTA. We can achieve this by running:

# for i in \
`ps -ef n | grep -v UID | sed 's/^\s*//' | cut -d' ' -f1 | sort | uniq`; \
do \
  iptables -A OUTPUT \
    -m owner --uid-owner $i \
    -p tcp --dport 25 \
    -j ULOG --ulog-prefix "GENERATED BY UID $i: "; \
done
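
Before settling in, a quick sanity check that the rules actually landed in the chain (one rule per UID should be listed):

# iptables -vnL OUTPUT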

With iptables populated, we can relax, sit back in a comfortable chair, and tail the log:

# tail -f /var/log/ulog/syslogemu | grep "GENERATED BY UID"
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=64.12.90.33 LEN=60 TOS=00 PREC=0x00 TTL=64 ID=1285
  DF PROTO=TCP SPT=54558 DPT=25 SEQ=1228047343 ACK=0 WINDOW=5840 SYN URGP=0 
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=65.55.92.152 LEN=60 TOS=00 PREC=0x00 TTL=64 ID=21290
  DF PROTO=TCP SPT=52895 DPT=25 SEQ=2552747462 ACK=0 WINDOW=5840 SYN URGP=0  
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=173.194.69.26 LEN=60 TOS=00 PREC=0x00 TTL=64 ID=46380 CE
  DF PROTO=TCP SPT=46744 DPT=25 SEQ=314520542 ACK=0 WINDOW=5840 SYN URGP=0 
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=173.194.69.26 LEN=52 TOS=00 PREC=0x00 TTL=64 ID=46381 CE
  DF PROTO=TCP SPT=46744 DPT=25 SEQ=314520543 ACK=814882206 WINDOW=46 ACK URGP=0 
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=98.139.54.60 LEN=60 TOS=00 PREC=0x00 TTL=64 ID=57227 CE
  DF PROTO=TCP SPT=54942 DPT=25 SEQ=2517129359 ACK=0 WINDOW=5840 SYN URGP=0 
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=173.194.69.26 LEN=52 TOS=00 PREC=0x00 TTL=64 ID=46382 CE
  DF PROTO=TCP SPT=46744 DPT=25 SEQ=314520543 ACK=814882251 WINDOW=46 ACK URGP=0 
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=173.194.69.26 LEN=58 TOS=00 PREC=0x00 TTL=64 ID=46383 CE
  DF PROTO=TCP SPT=46744 DPT=25 SEQ=314520543 ACK=814882251 WINDOW=46 ACK PSH URGP=0  
Dec  6 17:07:10 hostname GENERATED BY UID 502:  IN= OUT=eth0 MAC=
  SRC=local_ip DST=65.54.188.110 LEN=60 TOS=00 PREC=0x00 TTL=64 ID=54361 CE
  DF PROTO=TCP SPT=44414 DPT=25 SEQ=3003601745 ACK=0 WINDOW=5840 SYN URGP=0

OK, so we’ve found the culprit!
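
To turn the UID into an account name, standard NSS tooling does the job (502 is the UID from the log above; the output line here is just illustrative):

# getent passwd 502
baduser:x:502:502::/home/baduser:/bin/bash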

Note that we only monitor UIDs that have running processes. Whether or not to log all the existing UIDs on the local system is out of the scope of this article and depends on each particular case.
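
Once the investigation is over, don’t forget to clean up: the same loop with -D instead of -A deletes the rules, provided the same UIDs still have running processes (otherwise list rule numbers with ‘iptables -L OUTPUT --line-numbers’ and delete by number):

# for i in \
`ps -ef n | grep -v UID | sed 's/^\s*//' | cut -d' ' -f1 | sort | uniq`; \
do \
  iptables -D OUTPUT \
    -m owner --uid-owner $i \
    -p tcp --dport 25 \
    -j ULOG --ulog-prefix "GENERATED BY UID $i: "; \
done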

Hope you guys enjoyed it, and see you guys next time (as my favorite e-sports commentator, Husky, would say) 😉

Categories: Linux, Security

Expanding ZFS zpool RAID

January 1, 2013

I’m a big fan of ZFS and all the volume management options it offers. ZFS often makes hard things easy and impossible things possible. In an era of ever-growing data sets, sysadmins are regularly pressed with the need to expand volumes. While this may be easy to accomplish in an enterprise environment with IBM or Hitachi storage solutions, problems arise on mid- and low-end servers. Most often, expanding volumes means an online rsync to a new data pool, then another rsync while the production system is down, and finally putting the new system into production. ZFS makes this process a breeze.

Here is one example where ZFS really shines. Take a look at the following pool:

# zfs list | grep "tank "
tank                        1.75T  31.4G  40.0K  /tank

and its geometry:

# zpool status tank
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors

The 1.75 TiB pool is slowly getting filled. As you can see, it’s a six-disk pool consisting of two RAID-Zs in a stripe, roughly the equivalent of RAID 50 in conventional RAID nomenclature. That’s a lot of data to rsync over, isn’t it? Well, ZFS offers a neat solution: replace a single disk with a bigger one, let the RAID rebuild, and repeat the procedure six times. After the last rebuild, we can ‘grow’ the pool to the new size. In this particular case I decided to replace the 500 GB Seagates with 2 TB Western Digital drives.
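
A side note: if your controller supports it, it’s cleaner to offline the old disk before physically pulling it, rather than yanking it live as I did below (standard zpool commands; a sketch only):

# zpool offline tank c2t5d0
  <swap the 500 GB drive for the 2 TB one in the same slot>
# zpool replace tank c2t5d0 c2t5d0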

This is how the pool looks after disk c2t5d0 has been replaced with the 2 TB drive:

# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scan: resilvered 449G in 7h9m with 0 errors on Mon Dec 24 20:58:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  DEGRADED     0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  UNAVAIL      0     0     0  cannot open

errors: No known data errors

Now we need to tell ZFS to rebuild the pool:

# zpool replace tank c2t5d0 c2t5d0

After this command, the rebuild process starts. A few hours later, the state of the system is:

# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Tue Jan  1 14:43:22 2013
    91.6M scanned out of 2.63T at 3.52M/s, 217h26m to go
    14.5M resilvered, 0.00% done
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          raidz1-0        ONLINE       0     0     0
            c2t0d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
          raidz1-1        DEGRADED     0     0     0
            c2t3d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            replacing-2   DEGRADED     0     0     0
              c2t5d0/old  FAULTED      0     0     0  corrupted data
              c2t5d0      ONLINE       0     0     0  (resilvering)

errors: No known data errors

After the process finishes, the pool will look something like this:

# zpool status tank
  pool: tank
 state: ONLINE
 scan: resilvered 449G in 7h1m with 0 errors on Tue Jan  1 21:44:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors

Once all the disks are replaced, the only thing needed to grow the pool is to set autoexpand to on. If it was already on, first turn it off and then back on to trigger the expansion:

# zfs list | grep "tank "
tank                        1.75T  31.4G  40.0K  /tank
# zpool set autoexpand=off tank
# zpool set autoexpand=on  tank
# zfs list | grep "tank "
tank                        1.75T  5.40T  40.0K  /tank
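
As an alternative to toggling autoexpand, ZFS versions that support it can also expand already-replaced devices in place with ‘zpool online -e’ (a hedged alternative, listing the same six disks):

# zpool online -e tank c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0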

And that’s it! We’ve grown a striped 2x RAID-Z configuration from 500 GB drives to 2 TB drives, increasing the total capacity from roughly 1.8 TiB to roughly 7.2 TiB. Enjoy the wonders of ZFS!

Categories: Solaris, Storage