December 31, 2012

Super simple D3 line with axis

Following last week's post, I continued looking at D3, and this time we're adding an axis. Very simple again, just to get things started ... here goes:

<meta charset="UTF-8">
<title>super simple d3 line with axis</title>
<style>
.line {
  fill: none;
  stroke: #00F;
  stroke-width: 2px;
}
.axis path,
.axis line {
  fill: none;
  stroke: black;
  shape-rendering: crispEdges;
}
.axis text {
  font-family: sans-serif;
  font-size: 10px;
}
</style>
<script src="js/jquery-1.7.2.min.js"></script>
<script src="js/d3.v3.min.js"></script>

<script>
var data = [0,2,4,6,8,10,8,6,4,2,4,6,8,6,4,6,6,6,6],
    width = 400,
    height = 200,
    padding = 20,
    x = d3.scale.linear().domain([0, data.length]).range([padding, padding+width]),
    y = d3.scale.linear().domain([0, d3.max(data)]).range([padding+height, padding]);

// add a visualisation:
// select the body tag,
// join the data array to the body tag for our next data driven moves,
// append an svg container tag,
// give it the right width, and the right height as well
var visualisation = d3
    .select("body")
    .data([data])
    .append("svg")
    .attr("width", width + 2*padding)
    .attr("height", height + 2*padding);

// this is the line to put on the graph:
// append a path tag as a child of the svg tag we added to the body tag,
// apply the css style to make the line blue
visualisation
    .append("path")
    .attr("class", "line")
    .attr("d", d3.svg.line()
        .x(function(d,i) {return x(i);})
        .y(function(d) {return y(d);}));

// this is the line to represent an axis:
// - it is drawn in an SVG g tag
// - set the class to "axis" for that g tag
// - translate the coordinates down to the bottom of the graph
// - call a D3 axis function scaled to the right domain and range with 5 ticks
visualisation
    .append("g")
    .attr("class", "axis")
    .attr("transform", "translate(0," + (height+padding) + ")")
    .call(d3.svg.axis().scale(x).ticks(5));
</script>

You get:

I found that Scott Murray's tutorial was really well done. Check it out:

December 24, 2012

Super duper simple D3 line graph

I have been trying to find a REALLY stupid simple example of a line graph with d3js to no avail, so here is a 3-liner...

<title>super simple d3 line</title>
<style>
.line {
  fill: none;
  stroke: #00F;
  stroke-width: 2px;
}
</style>
<script src=""></script>
<script src=""></script>

<script>
var data = [0,2,4,6,8,10,8,6,4,2,4,6,8,6,4,6,6,6,6],
    width = 300,
    height = 180,
    x = d3.scale.linear().domain([0, data.length]).range([0, width]),
    y = d3.scale.linear().domain([0, d3.max(data)]).range([height, 0]);

// add a visualisation:
// select the body tag,
// join the data array to the body tag for our next data driven moves,
// append an svg container tag,
// give it the right width, and the right height as well
var visualisation = d3
    .select("body")
    .data([data])
    .append("svg")
    .attr("width", width)
    .attr("height", height);

// this is the line to put on the graph:
// append a path tag as a child of the svg tag we added to the body tag,
// apply the css style to make the line blue
visualisation
    .append("path")
    .attr("class", "line")
    .attr("d", d3.svg.line()
        .x(function(d,i) {return x(i);})
        .y(function(d) {return y(d);}));
</script>

And you get ... this

December 7, 2012

enter a path in OSX finder

Command + Shift + G is the charm ...
courtesy of

November 19, 2012

Using the EC2 High I/O instance SSDs

At $3.10 per hour, the Amazon EC2 SSD option (called High I/O Quadruple Extra Large) isn't exactly cheap. What it is, though, is fast. So it is a great way to try out the impact of large(-ish) storage with low, predictable latency on your product or application.

I had a chance to give it a whirl today so here are my first impressions.
Amazon makes the two 1 TB SSD drives available as /dev/sdf and /dev/sdg. For the sake of sheer performance, we went with a RAID 0 setup.

[root@ip-10-155-240-195 ~]# mdadm --create --verbose /dev/md/ssd --level=stripe --raid-devices=2 /dev/sdf /dev/sdg
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/ssd started.

[root@ip-10-155-240-195 ~]# mkfs.ext3 /dev/md/ssd 
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
134217728 inodes, 536870400 blocks
26843520 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
16384 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
102400000, 214990848, 512000000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done       

[root@ip-10-155-240-195 ~]# mount /dev/md/ssd /ssd
[root@ip-10-155-240-195 ~]# df -h /ssd
Filesystem            Size  Used Avail Use% Mounted on
/dev/md127            2.0T  199M  1.9T   1% /ssd

Tadaaaa! Now we have 2 TB of SSD goodness.

on with testing now ...

November 12, 2012

Find duplicate files on OSX - one liner

As I was consolidating 15 years of photos onto a single disk, I ended up with a disk bloated with duplicates. Bloated might actually be a new level of understatement. As I looked into resolving this troublesome issue, I found a wealth of commercial products that simply detect duplicates based on MD5 sums, which are trivial and cheap to compute. Not wanting to shell out $30 or more to solve a simple problem, I googled around, as anyone would, to find a script that would do this on OSX, but found mostly similar things for Linux.

So, time to dust off the awk/find/sed tools and see if I could crack that nut. Turns out it's trivial to do on OSX as well, but the OSX tools don't have the same options as on Linux...

find . -type f -print0 | xargs -0 md5 | awk '{sub(/^MD5 \(/,"",$0);sub(/\) =/,"",$0);md5=$NF;$NF=""; print md5" "$0}'|tee /tmp/files.out|awk '{ print $1}'|sort|uniq -d >/tmp/dupes.out ; for i in `cat /tmp/dupes.out` ; do echo $i; awk '$1 ~ /^'$i'$/ {$1=" ";print} ' /tmp/files.out; echo;echo; done > /tmp/duplicate-files.txt; rm -f /tmp/files.out /tmp/dupes.out ; 

you can then simply cat the result file:
cat /tmp/duplicate-files.txt

pretty simple stuff, I can now clean my pix from useless duplicates!
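For the curious, the pipeline above can be unpacked into a more readable sketch. This is my restating, not the original one-liner: it builds its own demo data, picks `md5 -r` on OSX or `md5sum` on Linux (both print "hash path"), and uses a two-pass awk instead of the sort/uniq dance. Note it assumes paths without spaces, which the real one-liner handles more carefully.

```shell
#!/bin/sh
# Demo data: two identical files and one unique file in a scratch dir.
dir=$(mktemp -d)
echo "same content"   > "$dir/a.txt"
echo "same content"   > "$dir/b.txt"
echo "unique content" > "$dir/c.txt"

# Pick a hasher: OSX ships `md5` (-r prints "hash path"), Linux ships md5sum.
if command -v md5 >/dev/null 2>&1 ; then hasher="md5 -r" ; else hasher="md5sum" ; fi

# Hash every file once; pass 1 of the awk counts each hash,
# pass 2 prints the paths whose hash occurs more than once.
hashfile=$(mktemp)
find "$dir" -type f -exec $hasher {} + > "$hashfile"
dupes=$(awk 'NR==FNR { n[$1]++; next } n[$1] > 1 { print $2 }' "$hashfile" "$hashfile")
echo "$dupes"

rm -rf "$dir" "$hashfile"
```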

November 11, 2012

backing up to a USB drive when plugged in

For some reason, I cannot use Time Machine. I have it set up and running, but if, for example, you are looking for a lost picture in iPhoto, as I recently was, how do you find it with Time Machine?

So I set out to complement Time Machine with a "simpler" back up. Not simpler to set up, granted, but simpler in the sense that it just copies files over to another disk. That's it. Just somewhere else.

So ... I stumbled upon this article describing how to use launchd to trigger a script when a "watched" folder's state changes. Their solution works great; here's how I implemented mine: same recipe, you will want a plist to kick off the backup script, and the backup script itself, using rsync.

I used ~/Documents/perso/com.arnaudlacour.osx.backup.plist and ~/Documents/perso/backup
That's because I'm going to use that to back up ~/Documents/perso to a USB disk called OPALE.

here's the plist:
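(The plist itself didn't survive the move to this page. As a reference point, here is a minimal sketch of what it might have looked like; the Label and script path come from the post, but the WatchPaths choice of /Volumes and passing the volume name as the script's first argument are my assumptions:)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.arnaudlacour.osx.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/arno/Documents/perso/backup</string>
        <string>OPALE</string>
    </array>
    <key>WatchPaths</key>
    <array>
        <string>/Volumes</string>
    </array>
</dict>
</plist>
```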


And here is the backup script:

#!/bin/bash -l
log=/var/log/com.arnaudlacour.osx.backup
echo "`date` - starting back up to ${1}" >> ${log}
from=~/Documents/perso
to=/Volumes/${1}
if ! test -z "$1" && test -e "${to}" ; then
  # WAIT some time until the disk is ready
  sleep 10
  # Start rsync'ing to the target
  rsync -ahmuv --progress ${from} ${to} 2>&1 >> ${log}
else
  echo "target [${to}] not valid" > ${log}
fi
echo "`date` - done" >> ${log}
Then it is a simple matter of registering the plist in launchd:
launchctl load ~/Documents/perso/com.arnaudlacour.osx.backup.plist

When I then plug in my USB drive and OSX mounts it, the script kicks in:

Sun Nov 11 16:38:18 MST 2012 - starting back up to OPALE
rsync -ahmuv --delete --progress /Users/arno/Documents/perso /Volumes/OPALE
building file list ... 
37419 files to consider

sent 670.76K bytes  received 20 bytes  268.31K bytes/sec
total size is 24.61G  speedup is 36691.20
Sun Nov 11 16:38:31 MST 2012 - done

Incremental backup upon USB Drive plug-in

EDIT: it's worth noting that rsync will prefer the source and destination partitions to be of the same type. I tried from HFS+ to FAT and it ended up rsync'ing from scratch every time without doing the incremental backup because FAT does not preserve the HFS metadata. Just sayin'

November 9, 2012

OSX swapping ... or not

After my wife's repeated complaints about the performance of our family iMac, I had to do some housecleaning on it and found that it was spending an inordinate amount of time swapping. Looking into it in more detail, I also realized you can't control OSX swapping policies the way you can on Linux, so I did a number of mundane cleaning operations. The most effective seems to have been the following: forcing OSX to purge at regular intervals. It seems like the operating system ought to do a better job of this on its own, but this simple step has helped limit the amount of swap consumed. I ended up implementing it on all our macs, whether they showed signs of swap strain or not.

featherweight:~ arno$ sudo crontab -l
*/15 * * * * purge

November 7, 2012

changing motd on amazon EC2 linux

Having changed /etc/motd on my EC2 AMI only to find the original message upon the next spawn, I got annoyed and looked into making the change permanent, as I originally intended.

It turns out that the message of the day is generated by a script, namely /usr/sbin/update-motd. Further, the banner displayed is stored in /etc/update-motd.d/30-banner.

All I had to do now was edit this one file and let it be. Easy!

November 6, 2012

EC2 t1.micro overprovisioning

I was playing with a bunch of instances today and, trying to keep the budget low as I was pushing upwards of 20 EC2 instances, I spawned a t1.micro instance to generate some load on my servers.
This is what I got:

Roughly 87% steal!

Meaning the hypervisor is gently telling me to go to hell 9 times out of 10.
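As an aside, you don't need top's screen to read steal. On Linux it is one of the jiffy counters on the aggregate "cpu" line of /proc/stat; this little sketch (mine, not from the original post) pulls it out:

```shell
#!/bin/sh
# The "cpu" line of /proc/stat lists jiffies spent as:
#   user nice system idle iowait irq softirq steal ...
# so with the "cpu" label counted, steal is field 9. top's %st is derived from it.
steal=$(awk '/^cpu / { print $9 }' /proc/stat)
echo "steal jiffies since boot: $steal"
```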

There must be a bunch of these micro instances running... What I think is happening is that Amazon must be using the t1.micro format to try and fill the gaps on otherwise underutilized racks. In the end, the quest to maximize resources ends up with a massively underwhelming experience for this instance type.
Fortunately, the rest of the infrastructure is second to none.

July 19, 2012

note to self

Permanently increasing the file descriptor limit (on OSX 10.7 Lion)
edit or create /etc/launchd.conf
limit maxfiles 65535 65535

reboot. done. 

May 2, 2012

The "WireShark" logger

The UnboundID product stack has this very neat ability to extend the core functionality, both to meet needs that may be too specific to a customer to make sense to roll into the off-the-shelf products and to quickly address evolving needs.
One of our customers has most of their traffic wrapped in the shrouds of SSL encryption, which by design makes external observability impossible. They came to us asking to be able to replay actual production traffic against a staging environment where the data would be a snapshot of prod.
The idea is: 

  1. take a db snapshot from prod
  2. record prod traffic
  3. much later, restore prod DB
  4. replay traffic
While this isn't perfect in many respects, it's simple and ought to serve various purposes, like evaluating how much impact a change would actually have had in production on a given day. It may be a tuning, a configuration change, a change in hardware -CPU upgrade, memory increase, IO subsystem like moving to SSD or NAS, etc.- or exploring migration scenarios with actual traffic. For example, you can imagine comparing the production performance on day x WHILE UPGRADING THE INFRASTRUCTURE to make sure that the upgrade scenario would not impact the client applications, which in this case form a healthy 4,000-strong ecosystem.

So we did. We wrote a logger capable of dumping live traffic to a dedicated log, and extreme care was taken to reduce the overhead. When replayed, the tool can process traffic in "actual time" -at the same pace it happened in prod when recorded- and even honor the same number of connections used by clients. This is to ensure that the reproduced conditions are as close as possible to what happened in real life at the time of the recording.
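The "actual time" pacing idea can be illustrated with a toy sketch. Everything here is invented for illustration -the log format, the operation names, the replay function- since the real tool is an UnboundID server extension: each recorded operation carries an offset in seconds from the start of the capture, and the replayer waits until that offset has elapsed before re-issuing the operation.

```shell
#!/bin/sh
# Replay recorded operations at the pace they were captured.
# Input lines look like "<seconds-offset> <operation>".
replay() {
  start=$(date +%s)
  while read -r offset op ; do
    now=$(( $(date +%s) - start ))
    # Wait until the recorded offset before re-issuing the operation.
    if [ "$offset" -gt "$now" ] ; then
      sleep $(( offset - now ))
    fi
    echo "replaying at t+${offset}s: $op"
  done
}

# A 3-operation capture: BIND at t=0, SEARCH at t=1, UNBIND at t=2.
out=$(printf '0 BIND\n1 SEARCH\n2 UNBIND\n' | replay)
echo "$out"
```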

In a later post, I will describe how that works in more detail.

January 31, 2012

The audit log with reversible operations

Though auditing is eventually going to befall the enterprise, rarely do we see any of them take proactive action on the most precious of data: the identity store. It is a sad realization, if a matter-of-fact one, especially when you know there are some neat features in the UnboundID product line squarely aimed at making auditing not just easier but a valuable risk-hedging practice to deploy across the board. Here is the first step you can take to wrap your head around it:
  a) enable an audit log publisher, just so you can see what goes in the log
dsconfig create-log-publisher --publisher-name "File Based Audit Log" --type file-based-audit --set enabled:true --set suppress-internal-operations:true --set log-file:logs/audit --set "rotation-policy:24 Hours Time Limit Rotation Policy" --set "retention-policy:File Count Retention Policy" --set include-requester-ip-address:true --set use-reversible-form:true
  You can obviously also do this in the web console or the dsconfig interactive CLI. This just makes it an easy copy/paste.

   b) fire some operations
  we'll simply create an object here and delete it immediately afterwards, just so we get something in the new log...

# bin/ldapmodify -a -c

Arguments from tool properties file:  --hostname localhost --port 9389
--bindDN o=i --bindPasswordFile config/i.pwd

dn: cn=test.0,o=demo
objectclass: person
cn: test.0
sn: demo

# Processing ADD request for cn=test.0,o=demo
# ADD operation successful for DN cn=test.0,o=demo

# bin/ldapdelete cn=test.0,o=demo

Arguments from tool properties file:  --hostname localhost --port 9389

--bindDN o=i --bindPasswordFile config/i.pwd

Processing DELETE request for cn=test.0,o=demo
DELETE operation successful for DN cn=test.0,o=demo

  c) check out the audit log
# 21/Jan/2012:15:46:15.617 -0600; conn=10; op=1; clientIP=
dn: cn=test.0,o=demo
changetype: add
objectClass: person
objectClass: top
cn: test.0
sn: demo
ds-entry-unique-id:: IAa+rBsFTZi3P6lpUaFG3Q==
ds-update-time:: AAABNQI76kc=
ds-create-time:: AAABNQI76kc=
creatorsName: cn=Directory Manager,cn=Root DNs,cn=config
modifiersName: cn=Directory Manager,cn=Root DNs,cn=config

# 21/Jan/2012:15:48:03.092 -0600; conn=13; op=1; clientIP=
# Deleted entry attributes
# dn: cn=test.0,o=demo
# objectClass: top
# objectClass: person
# cn: test.0
# sn: demo
# ds-entry-unique-id:: IAa+rBsFTZi3P6lpUaFG3Q==
# ds-update-time:: AAABNQI76kc=
# ds-create-time:: AAABNQI76kc=
# creatorsName: cn=Directory Manager,cn=Root DNs,cn=config
# modifiersName: cn=Directory Manager,cn=Root DNs,cn=config
# subschemaSubentry: cn=schema
# entryUUID: 2006beac-1b05-4d98-b73f-a96951a146dd
# createTimestamp: 20120121214615.495Z
# modifyTimestamp: 20120121214615.495Z
# ds-entry-checksum: 3969919676
dn: cn=test.0,o=demo
changetype: delete

OK, so what do we have here? Both operations are reported with enough information to review at a later time. Note too that the delete operation doesn't merely contain the delete instruction, but also the contents of the object being deleted; in the event that this was an operator or application error, you would have a simple way to revert the change and restore the original entry without fear of losing precious information.
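To make the revert concrete, here is a hypothetical sketch (the file names and the sed invocation are mine): the deleted entry's attributes are logged as "# attr: value" comments, so stripping the comment markers rebuilds an LDIF add that ldapmodify -a could replay to restore the entry. In practice you would also drop operational attributes like entryUUID before replaying.

```shell
#!/bin/sh
# Demo data: an abridged delete record, shaped like the audit log excerpt above.
cat > /tmp/delete-record.txt <<'EOF'
# Deleted entry attributes
# dn: cn=test.0,o=demo
# objectClass: top
# objectClass: person
# cn: test.0
# sn: demo
dn: cn=test.0,o=demo
changetype: delete
EOF

# Keep only commented lines that look like "attr: value" and drop the "# ",
# turning the record back into an LDIF add.
sed -n 's/^# \(.*: .*\)/\1/p' /tmp/delete-record.txt > /tmp/restore.ldif
cat /tmp/restore.ldif
# The restore itself would then be: bin/ldapmodify -a -f /tmp/restore.ldif
```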

I invite you to get more familiar with this type of logger, as it can be tweaked to do a lot more. For example, you may restrict the logger to only log traffic satisfying certain criteria on the connection (IP, security and such), the type of request, and/or the results being fetched (a certain type of sensitive data, entry, ...).
It's a very powerful framework, give it a whirl some time.

January 26, 2012

Figuring out where the server spends time

The UnboundID products offer unique observability features to help understand how deployments behave and where to improve.
One of them is the sub-operation timer, a facility that lets you peek into processing times for operations of interest.
First, we need to enable it:
$ dsconfig set-global-configuration-prop --set enable-sub-operation-timer:true
$ dsconfig create-log-publisher --publisher-name "Operation Timing Access Log" --type operation-timing-access --set enabled:true --set log-file:logs/operation-timing --set "rotation-policy:24 Hours Time Limit Rotation Policy" --set "retention-policy:File Count Retention Policy"

Then we'll search the database so we have something to look at

# bin/ldapsearch -b o=demo "(uid=user.0)"

Arguments from tool properties file:  --hostname localhost --port 9389
--bindDN o=i --bindPasswordFile config/i.pwd

dn: uid=user.0,ou=People,o=demo
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
postalAddress: Aaren Atp$01251 Chestnut Street$Panama City, DE  50369
postalCode: 50369
description: This is the description for Aaren Atp.
uid: user.0
userPassword: {SSHA}Xbl7h2+tKwok2OQ2CsPcI1QmWpdZfyHkvGp+Ng==
employeeNumber: 0
initials: ASA
givenName: Aaren
pager: +1 779 041 6341
mobile: +1 010 154 3228
cn: Aaren Atp
sn: Atp
telephoneNumber: +1 685 622 6202
street: 01251 Chestnut Street
homePhone: +1 225 216 5900
l: Panama City
st: DE

Finally, we can look at how this operation was processed and where the server spent the time. It is important to note that the times here are in microseconds, except for "etime" (in milliseconds); the decimals are therefore nanoseconds, providing the most fine-grained processing-time information we can, to help you figure out where you will get the most bang for your buck when optimizing your service. Let's have a look:

# vi logs/operation-timing 


Some subtitles may help, so let's look at a few things:
etime stands for elapsed time. It totals 38 microseconds here. That is the total time between the server being handed the TCP packet(s) by the operating system, processing the request, and returning a response; in other words, the server-side perception of the elapsed time.

As you can see, the most time-consuming phase in the processing of this operation was actually putting the result back on the wire to return it to the client. It may seem like a small number, but put it in perspective: since the entire processing time was 38 microseconds, spending 19 of them sending the result to the client means we are effectively spending 50% of the time waiting for I/O, which is quite enormous.

So that is all, just one more thing to help get to the bottom of performance optimizations...

January 24, 2012

using tools properties for easy day-to-day operations

You can use the tool properties file in the config directory of OpenDS, OpenDJ, or any UnboundID product (DS, Proxy, or Sync) to make your life tremendously easier when dealing with your regular instance.
Here's how: simply enter your regular parameters like so:
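(The snippet didn't make it onto this page. Judging from the arguments the tools echo back, the properties file likely contained something along these lines; the key names are assumed to mirror the corresponding command-line arguments:)

```
hostname=localhost
port=9389
bindDN=o=i
bindPasswordFile=config/i.pwd
useNoSecurity=true
```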


then invoking dsconfig will be as easy as just typing the command, all the parameters will be automatically picked up from the properties file:

# bin/dsconfig
Arguments from tool properties file:  --useNoSecurity true --hostname
localhost --port 9389 --bindDN o=i --bindPasswordFile config/i.pwd

>>>> UnboundID Directory Server configuration console main menu

What do you want to configure?

    1)  Backend               7)   Log Retention Policy
    2)  Connection Handler    8)   Log Rotation Policy
    3)  Global Configuration  9)   Password Generator
    4)  Local DB Index        10)  Password Policy
    5)  Location              11)  Password Validator
    6)  Log Publisher         12)  Work Queue

    o)  'Basic' objects are shown - change this
    q)  quit

Enter choice: 


January 23, 2012

2 JeeNodes go to a bar ...

Well, we are 1/12th into 2012 now, and I have been dutifully implementing JCW's recommendation for the new year of wisely using my 5,000 waking hours. I am happy to report that my setup has been controlling my HVAC and reporting the temperatures on pachube for 5 months now (I even fixed some bugs in the jPachube library while I was at it, open source fun), despite some initial skepticism from some members of the Jee community, which I am happy to be part of, however small my participation and time allotment to it is.

The funny part is there have been some glitches; one of them was that the JeeLink totally disappeared from Linux, as if the USB port had nothing plugged into it. But that is the beauty of the internet: I was able to fix everything remotely after that...

Anyway, I also have been working on a very simple little GWT application to provide a very nice, though simple, UI for the off-the-shelf sketch loaded on the JeeNodes, to allow anyone to use them with point-and-click ease. I will share after having -slowly, mind you- spent time making it shareable.

January 21, 2012

Installing a modern DS in one fell swoop

for quick demo purposes ...
  a) installs and starts the UnboundID server with a 4g heap, startTLS and SSL enabled, serving a self-signed cert.

$./setup --cli --no-prompt --acceptLicense --ldapPort 9389 --ldapsPort 9636 --generateSelfSignedCertificate --enableStartTLS --baseDN o=demo --sampleData 9 --aggressiveJVMTuning --maxHeapSize 4g --rootUserDN o=i --rootUserPassword p

UnboundID Directory Server
Please wait while the setup program initializes...
Configuring Directory Server ..... Done
Configuring Certificates ..... Done
Importing Automatically-Generated Data (9 Entries) ....... Done
Starting Directory Server ........ Done

Warning: the collect-support-data tool is used to collect information about
your system for support purposes.  The following commands invoked by this tool
were not found in the system path.  You should consider installing them in the
event that you need support for this system:  patchadd

To see basic server configuration status and configuration you can launch

See /ds/arno/UnboundID-DS/logs/tools/setup.log for a detailed log of this

  b) installs and starts either the Oracle or ForgeRock server, startTLS and SSL enabled, serving a self-signed cert.
   b-1) ForgeRock:
$ ./setup --cli --no-prompt --ldapPort 9389 --ldapsPort 9636 --generateSelfSignedCertificate --enableStartTLS --baseDN o=demo --sampleData 9  --rootUserDN o=i --rootUserPassword p

OpenDJ 2.4.4
Please wait while the setup program initializes...

See /var/folders/f5/5bftn0fs2_7_8ct2bqf58h0h0000gn/T/opends-setup-3213346674147849266.log for a detailed log of this operation.

Configuring Directory Server ..... Done.
Configuring Certificates ..... Done.
Importing Automatically-Generated Data (9 Entries) ....... Done.
Starting Directory Server ....... Done.

To see basic server configuration status and configuration you can launch /Users/arno/Downloads/OpenDJ-2.4.4/bin/status

   b-2) Oracle:
$ ./setup --cli --no-prompt --ldapPort 9389 --ldapsPort 9636 --generateSelfSignedCertificate --enableStartTLS --baseDN o=demo --sampleData 9 --rootUserDN o=i --rootUserPassword p

OpenDS Directory Server 2.2.1
Please wait while the setup program initializes...

Configuring Directory Server ..... Done.
Configuring Certificates ..... Done.
Importing Automatically-Generated Data (9 Entries) ....... Done.
Starting Directory Server ....... Done.

See /var/folders/f5/5bftn0fs2_7_8ct2bqf58h0h0000gn/T/opends-setup-153139788483549661.log for a detailed log of this operation.

To see basic server configuration status and configuration you can launch /Users/arno/Downloads/OpenDS-2.2.1/bin/status