September 14, 2011

Directory and File System ... differences, similarities

As is commonly the case with many technologies, LDAP products seem to suffer from a really bad image. They are perceived as obsolete, convoluted servers that are inadequate as general-purpose stores and only fit certain "niche" use cases because their data layout is hierarchical and thus weird. File systems are almost always hierarchical too, but somehow they do not suffer from the same perception.

Let me try to draw a parallel between a file system - you know, this place where you trustingly organize all your work documents, your family pictures and your music files - and a good old LDAP Directory Server.

  • the structure is the same: it's a tree!

Above, Microsoft Windows folders (on NTFS)
Below, LDAP entries (UnboundID Directory Server)

  • items can be manipulated the same way!

  • in LDAP, every object can have children. That is, it's like every file could also be a folder.
  • in LDAP, every object is characterized by a class. It is like the file type, except that a class can inherit characteristics from a parent class. Imagine that a Word 2007 document inherited characteristics that are common to other documents, say, a revision number. Another document, say an Excel spreadsheet, could also have a revision number. Even though the contents of the Word and Excel documents are very different in nature, they share some characteristics that can be described in a common "structure". That is in essence what the hierarchy of object classes achieves.
  • In a file system, files can be journaled or revisioned. I don't know of any LDAP server supporting this as-is but LDAP servers usually have some sort of a changelog that can keep track of data changes for some time. This usually allows strong replication, resolution of conflicts and repairing most administrative errors with respect to data handling. Think of it as an integrated time machine.
  • LDAP offers mechanisms that a file system does not. A big one is a strong authentication system, which has effectively made Directory Servers prime candidates for ... operating system (and thus file system) authentication
  • LDAP supports grouping mechanisms
This is obviously not exhaustive, but at least it gives you an idea of the similarities between the contents of an LDAP server and those of a file system, and of how to manipulate them: pretty much the same thing, just named differently.
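
To make the object class idea a little more concrete, here is a minimal Python sketch of the Word/Excel analogy used above. It is only an analogy, not an LDAP schema: the class names (Document, WordDocument, Spreadsheet) and their fields are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical parent class: traits shared by every kind of document."""
    name: str
    revision: int = 1


@dataclass
class WordDocument(Document):
    """Inherits name and revision, adds its own content-specific field."""
    body_text: str = ""


@dataclass
class Spreadsheet(Document):
    """Very different content, same inherited revision number."""
    sheet_count: int = 1


report = WordDocument(name="report.docx", revision=3, body_text="Q3 summary")
budget = Spreadsheet(name="budget.xlsx", revision=7, sheet_count=4)

# Code that only cares about the shared "structure" can treat both alike.
revisions = {doc.name: doc.revision for doc in (report, budget)}
print(revisions)
```

In a real directory the same pattern shows up in the standard schema: inetOrgPerson, for example, extends organizationalPerson, which in turn extends person.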

September 12, 2011

Counters done right.

Terry has posted two articles on assertions and "increment" that I thought were good stepping stones to show you how a Directory Server may solve some tough problems better than you think: how to concurrently keep track of counters in your applications.

Let's take a simple application:
Get a value, decrement it, save it back. Simple. It works. Until ....
Shoot. We lost 1. This type of concurrency issue is well known to developers who have to concurrently keep track of sessions, for example. The proposed solution is locking. Here's the issue with locking:

Why the warning sign? Well, locking works. Fine actually. But the client waits. In a lot of cases you do not even have much of a choice. But for counters, you do, and most likely you must. Here's an alternative:
OK, I know, the server-side decrement isn't exactly prime news and has been around for a while. No surprise! It's way easier from the application side and leaves the developer to solely focus on business "value-added" logic. So why is it not used more? I don't know. Maybe because not very many people know to ask the right question, so they never get the right answer?
With MySQL or other relational databases you usually have the ability to do so with a pseudo syntax like:
UPDATE plan SET minutes=minutes-1 WHERE subscriber_id='(555)123-4567'
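
If you want to try the server-side decrement without a database server at hand, here is a small sketch using Python's built-in sqlite3 module (SQLite standing in for MySQL). The table layout and the use_one_minute helper are assumptions made for the example, and the extra "minutes >= 1" guard is my addition so the counter cannot go negative.

```python
import sqlite3

# In-memory SQLite database standing in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plan (subscriber_id TEXT PRIMARY KEY, minutes INTEGER)")
conn.execute("INSERT INTO plan VALUES (?, ?)", ("(555)123-4567", 3))
conn.commit()


def use_one_minute(conn, subscriber_id):
    """Decrement in one statement; return True if a minute was available.

    Hypothetical helper: the read, the check and the write all happen
    inside the database, so the client never does a get/modify/save cycle.
    """
    cur = conn.execute(
        "UPDATE plan SET minutes = minutes - 1 "
        "WHERE subscriber_id = ? AND minutes >= 1",
        (subscriber_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means no minutes were left


results = [use_one_minute(conn, "(555)123-4567") for _ in range(4)]
remaining = conn.execute("SELECT minutes FROM plan").fetchone()[0]
print(results, remaining)
```

The fourth call touches no row, which is how the client learns the plan is depleted.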

Let's make that even better with some LDAP. First, let's take a look at how to decrement the "minutesLeft" counter stored in our user.19 profile:
dn: uid=user.19,ou=people,dc=example,dc=com
changetype: modify
increment: minutesLeft
minutesLeft: -1

So, functionally, our application will attempt to decrement the minutesLeft counter and, depending on whether or not it succeeds, it will proceed to let the user use his/her plan for a minute. Additionally, we'll make sure that there are indeed minutes left on the plan for the decrement to be successful. That's where the assertion comes into the picture. The --assertionFilter option can be used on the CLI tool to test it manually. In your client, the UnboundID LDAP SDK provides full programmatic control.

This is how it looks when it is successful (there ARE minutes left!)
C:\UnboundID-DS\bat>ldapmodify -a -c --assertionFilter "(minutesLeft>=1)" --postReadAttributes minutesLeft -f Decrement.ldif
# Processing MODIFY request for uid=user.19,ou=people,dc=example,dc=com
# MODIFY operation successful for DN uid=user.19,ou=people,dc=example,dc=com
# Target entry after the operation:
# dn: uid=user.19,ou=People,dc=example,dc=com
# minutesleft: 2

Note that you can request the value before and/or after (only after in this example) to use the write operation as a read as well.
When our subscriber has depleted his plan completely, the server will return:
C:\UnboundID-DS\bat>ldapmodify -a -c --assertionFilter "(minutesLeft>=1)" --postReadAttributes minutesLeft -f Decrement.ldif

# Processing MODIFY request for uid=user.19,ou=people,dc=example,dc=com
MODIFY operation failed
Result Code:  122 (Assertion Failed)
Diagnostic Message:  Entry uid=user.19,ou=people,dc=example,dc=com cannot be modified because the request contained an LDAP assertion control and the associated filter did not match the contents of the that entry

So there you have it, the most elegant, concurrent-friendly way to keep track of your counters and avoid server round-trips to keep user experience nice thanks to constant low-latency requests.
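
The behaviour of the two runs above can be modelled in a few lines of Python. This is a toy, in-memory stand-in and not the UnboundID server or SDK: ToyDirectory, decrement_if_available and run_clients are hypothetical names. The point it tries to show is that when the check (the assertion) and the decrement happen in one server-side step, concurrent clients can never be granted more minutes than the plan holds.

```python
import threading


class ToyDirectory:
    """Hypothetical stand-in for a server applying assert + increment atomically."""

    def __init__(self, minutes_left):
        self._minutes_left = minutes_left
        # Internal to the "server": clients never take or hold this lock.
        self._lock = threading.Lock()

    def decrement_if_available(self):
        """Mimic increment -1 guarded by the filter (minutesLeft>=1).

        Returns (succeeded, value_after), like a post-read of the attribute.
        """
        with self._lock:
            if self._minutes_left >= 1:
                self._minutes_left -= 1
                return True, self._minutes_left
            return False, self._minutes_left


def run_clients(server, n_clients):
    """Fire n_clients concurrent requests and count how many were granted."""
    granted = []

    def client():
        ok, _ = server.decrement_if_available()
        if ok:
            granted.append(1)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(granted)


server = ToyDirectory(minutes_left=3)
granted = run_clients(server, n_clients=10)
print(granted)
```

Each client makes exactly one request and gets a yes or a no back; there is no read followed by a write, so there is no window in which a value can be lost.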

September 9, 2011

Sync speed ... part 2

In the earlier post about Sync performance I had only tested on a small machine (namely, my trusty old laptop) simply to prove -or disprove- the experiment was valid.
I only recently got around to testing on something more realistic, although still not up to date: a Dell r610 with two sockets of Intel Xeon E5520 @ 2.27GHz.

But let's look at our test setup:
In essence, on the source side (left), the test extension to the server simulates a database that has 1,000 new changes every time the server polls it, and the server is configured to poll every 1ms. On the destination side, the test extension gets the changes and returns immediately without doing anything with them.

This setup has 2 main advantages:

  • it rules out network latency to isolate just the box the Synchronization Server runs on
  • it eliminates any latency due to either the source or the destination
That is, to date, the best way to test the absolute best performance the Synchronization Server (or any piece of software, really) can achieve on a particular rig.

I'm going to cut to the chase, since this is part 2 of the series, and show you 2 things:

and the less sexy capture:
[root@r610-02 UnboundID-Sync]# ./get-pipes-throughput
getting first measurement for all started sync pipes...
first measurement acquired. Reading: 3430978
Waiting for 10 seconds...
Getting second measurement...
Second measurement acquired. Reading: 4315672
884694 operations processed in 10 seconds
Throughput 88469 Sync/sec

So there you go: we have 8 physical cores on this platform, and the rule of thumb is that you can get about 10,000 transactions per second per core with Sync (11,058 in this case). Note too that Sync scales really well vertically and is able to take advantage of all CPUs on the machine: only 7.74% of the CPU was idle overall at the time these metrics were taken!
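
For reference, the arithmetic behind those numbers is just the difference between the two readings divided by the interval, then divided by the core count. A quick Python check, using the readings from the capture above (the throughput function name is mine, not part of the get-pipes-throughput script):

```python
def throughput(first_reading, second_reading, interval_seconds):
    """Operations per second between two cumulative counter readings."""
    return (second_reading - first_reading) / interval_seconds


# Readings taken from the capture above, 10 seconds apart.
ops_per_sec = throughput(3430978, 4315672, 10)

# 8 physical cores on the test machine.
per_core = ops_per_sec / 8

print(int(ops_per_sec), int(per_core))  # 88469 11058
```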

In reality, there are going to be a few things in the way of achieving such numbers:
  • Network latency, which can hit you either on the source side, the destination side or both if your Sync server is collocated with neither
  • Source latency. For example, if you query an RDBMS source, there will be an inherent lag to the source engine processing the request and serving the results to the sync engine.
  • Destination latency. Same as the source except the destination is actually written to, which can take an even longer amount of time.
In Part 3, I will get to how we deal with these hurdles and what you can tune to help keep the synchronization as fast as possible.

September 8, 2011

1 JeeLink and 2 JeeNodes go to a bar ...

I find myself in an interesting situation where I have one house near Denver and one apartment in Steamboat Springs. Our family is going to live in the apartment the whole year, with an occasional trip down to Denver. To be able to keep an eye on things and reduce the power footprint while retaining as much comfort as possible when we go back, I set out to automate a few things in the house. I had looked around for things and had tinkered with PIC MCUs in college, but I had never done much with them except a water fountain game for the kids located under our deck. More on that later.

One of the main issues is cost, of course. As with most tinkerers, this is not serious enough that I would want to invest in a commercial solution or an expensive electronics platform. I looked at the Arduino platform because it's obviously very popular and so modular that a 4-year-old could put something together without knowing anything about electronics. Unfortunately, I did not find anything "Arduino" that met another one of my requirements: wireless simplicity.

Enter JeeLabs.

I stumbled upon the work of Jean-Claude Wippler (widely known simply as jcw) and the JeeLabs community at large, which counts lots of very active and really helpful members. What I really like about this community is that they are not going to belittle the newcomer who knows little about the platform but, au contraire, they will teach you how to fish instead. This has been one of my most agreeable and educational experiences to date.

The merits of the JeeLabs platform are many but let me name a few that really helped me get things done instead of spending countless hours figuring things out:

  • It is somewhat "standard" as it builds on the Arduino's strengths and makes things even easier
  • Arduino libraries are functional right out of the box. You only need to know that pin numbers are shifted on your JeeNode compared to the Arduino
  • It has an ULTRA simple, fairly reliable and pretty good range radio module making your setup instantly accessible over wireless. THAT alone is awesome. The radio module it sports isn't as reliable as other more expensive solutions are (XBee comes to mind) but it is absolutely good enough for most applications around the house. To draw a parallel, I would use this one: over IP networks, the same difference exists between TCP and UDP. TCP is reliable. UDP is not. It does not mean that UDP is UNRELIABLE. Get the nuance?
  • It is rock bottom cheap. You can get a JeeNode on ModernDevice for $22. That's with radio. C'mon. How's that even possible...
  • JeeLabs makes lots of simple yet useful "plugs", so there is very little you will actually need to do yourself. It's more like Lego than electronics really. There is no glory in putting together something that works: they have already done all the work of making it easy. Literally plug and play.
  • There's also a nice JeeLink, which is nothing more than a JeeNode in a USB stick format. That thing is sweet! Stick it in your USB port, start talking wirelessly to the other nodes, or write your own Perl/Python script or program to drive all the nodes on your network.
Here's what I had in mind for my particular situation: I needed to be able to control the heater / cooler in my house, to be able to turn the heat up a couple of days before going home so we wouldn't find ourselves sleeping in a house at 5C (41F).

So I set out on an experiment and bought a JeeLink and JeeNode with 2 relay plugs to see if I could make a JeeNode turn the heat/cool/fan on remotely.
Unsurprisingly, the hardware part of it took all of 2 hours to solder the components on the boards and debug the lousy soldering points by resoldering a couple of times.
What was more surprising is that even though I had not written any C since 2004 (so 7 years give or take) it was very easy to find examples that I could tweak to do what I needed. So in about a day's worth of work, I had a way to remotely control the HVAC.

But I needed to make this whole contraption a wee bit smarter so it could replace the regular thermostat for good. I bought a second JeeNode that I rigged with 3 temperature sensors:
  1. indoor temperature: the sensor sits right on the JeeNode board. This temperature is used to trigger the HVAC in the appropriate mode.
  2. outside temperature: the sensor sits on a window sill outside. This temperature is used mostly for monitoring purposes, and it allows the central software to be a little smarter than the regular unit by avoiding, for example, turning the AC on in summer if the outside temperature drops below the target temperature, which frequently happens at night in summer in Colorado.
  3. duct temperature: the sensor is tucked in the air vent where it can measure the output air temperature of the HVAC unit. This is helpful as a feedback mechanism to make sure that the heater or cooler actually works when turned on. If it doesn't, the central software will send me an SMS with Twilio.
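
To give an idea of how the three readings can fit together, here is a rough Python sketch of the decision logic described in the list above. It is not the code running in my house: the function names (choose_mode, hvac_is_working), the 0.5C deadband and the 3C duct delta are all assumptions picked for illustration.

```python
def choose_mode(indoor_c, outdoor_c, target_c, deadband_c=0.5):
    """Pick an HVAC mode from the indoor and outdoor readings.

    deadband_c is a hypothetical tolerance that keeps the unit from
    toggling on and off right around the target temperature.
    """
    if indoor_c < target_c - deadband_c:
        return "heat"
    if indoor_c > target_c + deadband_c:
        # Skip the AC when the outside air is already cooler than the target.
        if outdoor_c < target_c:
            return "off"
        return "cool"
    return "off"


def hvac_is_working(mode, duct_c, indoor_c, min_delta_c=3.0):
    """Feedback check: the duct air should differ from the room air.

    min_delta_c is an assumed threshold; a False result is where the
    central software would raise an alert.
    """
    if mode == "heat":
        return duct_c >= indoor_c + min_delta_c
    if mode == "cool":
        return duct_c <= indoor_c - min_delta_c
    return True  # nothing to verify when the unit is off


print(choose_mode(indoor_c=15.0, outdoor_c=-2.0, target_c=20.0))
print(choose_mode(indoor_c=24.0, outdoor_c=16.0, target_c=21.0))
print(hvac_is_working("heat", duct_c=16.0, indoor_c=15.0))
```

The second call is the Colorado summer night case: the room is too warm, but the outside air is below the target, so the AC stays off.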
All in all, it took me an entire week, working on it off hours, after work or on the 2 weekends. Most of the time was actually spent on the software part: I had to contribute fixes to Java libraries for Pachube, where I put my temperature metrics, and figure out some timing conditions using RXTX. Overall it was a great learning experience.

So: what will you do with your JeeNodes?