November 19, 2012

Using the EC2 High I/O instance SSDs

At $3.10 per hour, the Amazon EC2 SSD option (called High I/O Quadruple Extra Large) isn't exactly cheap. What it is, though, is fast. So it's a great way to try out large(-ish) storage with low, predictable latency and see its effect on your product or application.

I had a chance to give it a whirl today so here are my first impressions.
Amazon makes the 2 1TB SSD drives available as /dev/sdf and /dev/sdg. For the sake of sheer performance, we went with a RAID 0 setup.

[root@ip-10-155-240-195 ~]# mdadm --create --verbose /dev/md/ssd --level=stripe --raid-devices=2 /dev/sdf /dev/sdg
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/ssd started.


[root@ip-10-155-240-195 ~]# mkfs.ext3 /dev/md/ssd 
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
134217728 inodes, 536870400 blocks
26843520 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
16384 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
102400000, 214990848, 512000000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done       

[root@ip-10-155-240-195 ~]# mount /dev/md/ssd /ssd
[root@ip-10-155-240-195 ~]# df -h /ssd
Filesystem            Size  Used Avail Use% Mounted on
/dev/md127            2.0T  199M  1.9T   1% /ssd

Tadaaaa! Now we have 2TB of SSD goodness.

On with testing now...
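One caveat worth noting before testing: an array assembled like this will not come back by itself after a reboot unless it is recorded. Here's roughly what I'd add, assuming the device and mount point above (and keeping in mind these EC2 SSDs are ephemeral, so the data is gone on a stop/start anyway):

```shell
# Record the array definition so mdadm can reassemble it at boot:
mdadm --detail --scan >> /etc/mdadm.conf

# And mount it via /etc/fstab; noatime spares the SSDs needless metadata writes:
echo "/dev/md/ssd  /ssd  ext3  defaults,noatime  0  0" >> /etc/fstab
```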

November 12, 2012

Find duplicate files on OSX - one liner

As I was consolidating 15 years of photos onto a single disk, I ended up with a disk bloated with duplicates. Bloated might actually be a new level of understatement. As I looked into resolving this troublesome issue, I found a wealth of commercial products that simply detect duplicates based on MD5 sums, which are trivial and cheap to compute. Not wanting to shell out $30 or more to solve a simple problem, I googled around, as anyone would, to find a script that would do this on OSX, but found mostly similar things for Linux.

So, time to dust off the awk/find/sed tools and see if I could crack that nut. It turns out it's trivial to do on OSX as well, but the OSX tools don't have the same options as their Linux counterparts...

find . -type f -print0 | xargs -0 md5 \
  | awk '{sub(/^MD5 \(/,"",$0); sub(/\) =/,"",$0); md5=$NF; $NF=""; print md5" "$0}' \
  | tee /tmp/files.out \
  | awk '{print $1}' | sort | uniq -d > /tmp/dupes.out
for i in `cat /tmp/dupes.out`; do
  echo $i
  awk '$1 ~ /^'$i'$/ {$1=" "; print}' /tmp/files.out
  echo; echo
done > /tmp/duplicate-files.txt
rm -f /tmp/files.out /tmp/dupes.out

You can then simply cat the result file:
cat /tmp/duplicate-files.txt

Pretty simple stuff; I can now rid my pix of useless duplicates!
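The same idea can also be packaged as a small function without the temp files. This is my sketch, not the one-liner above: the `dupes` name and the grouping awk are mine, and it assumes either GNU md5sum or the BSD `md5 -r` variant (which both emit "hash filename") is on the PATH.

```shell
# Hypothetical helper: print groups of files under a directory that share
# an MD5 hash. Picks md5sum (Linux) or `md5 -r` (OSX), whichever exists.
dupes() {
  dir=${1:-.}
  if command -v md5sum >/dev/null 2>&1; then hasher=md5sum; else hasher="md5 -r"; fi
  find "$dir" -type f -print0 \
    | xargs -0 $hasher \
    | sort \
    | awk '$1 == prev { if (!shown) print prevline; print; shown = 1; next }
           { prev = $1; prevline = $0; shown = 0 }'
}
```

Sorting by hash puts duplicates on adjacent lines, so the awk just prints every line whose hash matches its predecessor (plus the first line of each group).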

November 11, 2012

backing up to a USB drive when plugged in

For some reason, I cannot really use Time Machine. I have it set up and going, but if I am, for example, as I recently was, looking for a lost picture in iPhoto, how do you find it with Time Machine?

So I set out to complement Time Machine with a "simpler" backup. Not simpler to set up, granted, but simpler in the sense that it just copies files over to another disk. That's it. Just somewhere else.

So ... I stumbled upon this article describing how to use launchd to trigger a script when a "watched" folder's state changes. Their solution works great; here's how I implemented mine. Same recipe: you will want a plist to kick off the backup script, and the backup script itself, using rsync.

I used ~/Documents/perso/com.arnaudlacour.osx.backup.plist and ~/Documents/perso/backup
That's because I'm going to use that to back up ~/Documents/perso to a USB disk called OPALE.

Here's the plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Label</key>
        <string>com.arnaudlacour.osx.backup</string>
        <key>LowPriorityIO</key>
        <true/>
        <key>Program</key>
        <string>/Users/arno/Documents/perso/backup</string>
        <key>ProgramArguments</key>
        <array>
                <string>backup</string>
                <string>OPALE</string>
        </array>
        <key>WatchPaths</key>
        <array>
                <string>/Volumes</string>
        </array>
</dict>
</plist>

And here is the backup script:

#!/bin/bash -l
log=/var/log/com.arnaudlacour.osx.backup
echo "`date` - starting back up to ${1}" >> $log
from=~/Documents/perso
to=/Volumes/${1}
if ! test -z "$1" && test -e "${to}" ; then
  # WAIT some time until the disk is ready
  sleep 10
  # Start rsync'ing to the target
  rsync -ahmuv --progress ${from} ${to} 2>&1 >> ${log}
else
  echo "target [${to}] not valid" > ${log}
fi
echo "`date` - done" >> ${log}
Then it is a simple matter of registering the plist in launchd:
launchctl load ~/Documents/perso/com.arnaudlacour.osx.backup.plist

When I then plug in my USB drive and OSX mounts it, the script kicks in:

Sun Nov 11 16:38:18 MST 2012 - starting back up to OPALE
rsync -ahmuv --delete --progress /Users/arno/Documents/perso /Volumes/OPALE
building file list ... 
37419 files to consider

sent 670.76K bytes  received 20 bytes  268.31K bytes/sec
total size is 24.61G  speedup is 36691.20
Sun Nov 11 16:38:31 MST 2012 - done

Incremental backup upon USB Drive plug-in

EDIT: it's worth noting that rsync will prefer the source and destination partitions to be of the same type. I tried going from HFS+ to FAT and it ended up rsync'ing from scratch every time instead of doing an incremental backup, because FAT does not preserve the HFS metadata. Just sayin'

November 9, 2012

OSX swapping ... or not


After my wife's repeated complaints about the performance of our family iMac, I had to do some housecleaning on it and found out that it was spending an inordinate amount of time swapping. Looking into that in more detail, I also realized you can't control OSX swapping policies the way you can on Linux. So I did a number of mundane cleaning operations, but the most effective seems to have been the following: forcing OSX to purge at regular intervals. It seems like the operating system ought to do a better job of it, but this simple step has helped limit the amount of swap consumed. I ended up implementing this on all our macs, whether they showed signs of swap strain or not.

featherweight:~ arno$ sudo crontab -l
Password:
*/15 * * * * purge

November 7, 2012

changing motd on amazon EC2 linux

Having changed /etc/motd on my EC2 AMI only to find the original message upon the next spawn, I got annoyed and looked into making the change permanent, as I originally intended.

It turns out that the message of the day is generated by a script, namely /usr/sbin/update-motd. Further, the banner displayed is stored in /etc/update-motd.d/30-banner.

All I had to do now was edit this one file and let it be. Easy!
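For illustration, the generator's job essentially amounts to running the executable parts under /etc/update-motd.d in lexical order and collecting their output. A rough sketch of that mechanism (`motd_build` is my name, not Amazon's script):

```shell
# Sketch of an update-motd style generator: run every executable part
# under the given directory, in lexical order, and emit their output.
motd_build() {
  for part in "$1"/*; do
    if [ -x "$part" ]; then "$part"; fi
  done
}
```

So a fragment like 30-banner contributes its output at the position its number dictates, which is why editing that one file is enough.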

November 6, 2012

EC2 t1.micro overprovisioning

I was playing with a bunch of instances today. Trying to keep the budget low as I pushed upwards of 20 EC2 instances, I spawned a t1.micro instance to generate some load on my servers.
This is what I got:


Roughly 87% steal!

Meaning the hypervisor is gently telling me to go to hell 9 times out of 10.
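For the record, steal is easy to read straight off /proc/stat on any Linux guest: it is the eighth counter on the aggregate "cpu" line (per proc(5)). One way to print its share of all jiffies since boot:

```shell
# Print cumulative steal time as a percentage of all CPU jiffies since boot.
# Fields on the "cpu" line: user nice system idle iowait irq softirq steal ...
awk '/^cpu / { total = 0
               for (i = 2; i <= NF; i++) total += $i
               printf "steal: %.1f%%\n", 100 * $9 / total }' /proc/stat
```

(`top` shows the same number live as the %st field of its CPU line.)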

There must be a bunch of these micro instances running... What I think is happening is that Amazon must be using the t1.micro format to fill the gaps on otherwise underutilized racks. In the end, the quest to maximize resources ends up delivering a massively underwhelming experience for this instance type.
Fortunately, the rest of the infrastructure is second to none.