bad ideas in usability

At my new company I unfortunately have to deal with Active Directory. I understand that AD is supposed to be the authoritative source for any information about users, groups, computers, and so on, but does the interface have to be so crammed with junk?

This has got to be the worst interface I’ve ever seen (Lotus Notes aside, but I’ve never had to administer Notes). It’s not clear where to find anything! Not only is the interface kludgy (multiple rows of tabs?) but the tab labels are totally non-intuitive. Why are there at least four tabs pertaining to e-mail (Microsoft Exchange)? What the heck is the Member Of tab for, and how does that differ from what I might find under Account?

I can’t imagine trying to administer hundreds of users with this kludgy tool. Thank God our company has fewer than 50 people.

home router replaced!

I finally decided to replace my home router, a Sun Ultra 10 running FreeBSD. There were several reasons for this:

  1. I was running FreeBSD 5.x, which meant that the keyboard wouldn’t work; I could only control the system remotely over SSH or through a serial console. This was fixed in later versions of FreeBSD 5.x, but I didn’t want to bother upgrading, since the box isn’t the fastest machine.
  2. Using a desktop workstation for routing and running ppp consumes more power than it’s worth, and makes a fair amount of noise.
  3. Using a 400 MHz UltraSPARC IIi-based workstation with 512 MB of ECC RAM for a simple firewall and router seemed like a bit of overkill 🙂
  4. I want to free up the Ultra 10 for testing out Solaris 10 and possibly upgrading my Solaris 9 SCSA designation.
  5. I want to (finally!) equip my home with wireless… yes, I’m a little late getting on the bandwagon.

The Design and Implementation of the NetBSD rc.d System

This is a moderately old paper, but I think it’s worth reading if you want to understand the rationale behind the NetBSD rc.d startup system. I think this is what is referred to on FreeBSD (which has adopted a similar mechanism) as rcNG.

The Design and Implementation of the NetBSD rc.d system

There are many things to like in this design, which is far better than the organic (to put it politely) way in which the system startup sequence of a given Linux box has evolved. For one, it has the following advantages (outlined in the paper, but I’ll detail them here if you don’t want to read it):

  • Independence from lexicographical ordering of filenames (no S90foo running before S91foo), which always struck me as having a sort of BASIC-style limitation (i.e. back in the day having to number your code lines in multiples of ten in case you wanted to insert code in between)
  • Use of dynamic dependency ordering (via a special header and the rcorder script)
  • No reliance upon a special platform-specific "function" library, as is the case in many Linuxes
  • Centralized system configuration via /etc/rc.conf — no bloated /etc/sysconfig nonsense as on many Linuxes (but this is a topic for another day)
  • Avoidance of mandatory runlevels, which I can never remember on a given Linux or Solaris machine. ("What is runlevel 5 again?")
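To make the dependency-ordering point concrete, here is a minimal sketch of the idea behind rcorder. It is not rcorder itself: it reads the `# PROVIDE:` and `# REQUIRE:` header lines from a few stand-in scripts and hands the resulting pairs to the standard tsort utility. The three script names and their dependencies are made up for illustration.

```shell
#!/bin/sh
# Sketch only: approximates rcorder's dependency ordering with tsort.
# The scripts below are hypothetical stand-ins for real rc.d scripts.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '# PROVIDE: network\n' > "$dir/network"
printf '# PROVIDE: syslogd\n# REQUIRE: network\n' > "$dir/syslogd"
printf '# PROVIDE: sshd\n# REQUIRE: network syslogd\n' > "$dir/sshd"

order_scripts() {
    # $1: directory holding the scripts
    for f in "$1"/*; do
        name=$(sed -n 's/^# PROVIDE: *//p' "$f")
        # A pair of identical items tells tsort the item exists
        echo "$name $name"
        # One "dependency dependent" pair per REQUIRE entry
        for req in $(sed -n 's/^# REQUIRE: *//p' "$f"); do
            echo "$req $name"
        done
    done | tsort
}

order_scripts "$dir"
```

The filenames play no part in the result: renaming a script changes nothing as long as its header stays the same, which is exactly what the S90foo/S91foo scheme cannot offer.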

I could go on, but I urge you to read the paper instead, where Luke demonstrates a solid design methodology and rationale and then executes on the same. This is more than can be said for Linux.

Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments

The title of this post is also the title of a fabulous paper published in the Journal of Personality and Social Psychology of the American Psychological Association (PDF). I mention this in the context of technology because the paper was first mentioned as a response to this post on The Daily WTF, a site exposing bad programming in a daily blog format.

First, with regard to the post: I can vouch for the fact that there is some really bad code out there, and much of that, I’m sure, comes from programmers with overinflated egos who don’t realize their own incompetence because

people who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it.

Still, part of the problem is a lack of proper management oversight, whether functional or technical. Indeed, in many cases bad programmers’ incompetence is rewarded because their products are seen as "business successes" that allegedly meet the functional requirements, never mind that the applications consume far too many resources on the system, crash all the time, and cause a huge maintenance burden for the operations staff. I can provide many examples, but I’m sure I would get in trouble 🙂

I considered printing out the APA paper and anonymously stuffing it in people’s mailboxes: not only those of the programmers who I feel are totally incompetent, but also those of their managers who still think they perform(ed) well. I decided against it, not because I think I would get in trouble (they’d have no way of detecting who was the culprit), but because their own incompetence would prevent them from detecting that the paper is targeted at them.

As Kruger and Dunning point out, the only way to resolve this dilemma is to remove the incompetence — train the bad programmers to be better programmers, and to recognize their own shortcomings. That can’t happen if bad management is preventing even the open discussion of the poor code quality.


Today’s my last day at CBC.ca. I’m moving on to a pure systems administration position with a much smaller e-business company in Toronto called Devlin e-Business Architects. I decided that working on content delivery projects like the Torino Olympics website is really not where I want to be strategically with my career, and I don’t think I ever fit into the big company mindset very well. I’ll be writing more about that once I’m not formally under the employ of said big company 🙂

In the meantime I wish you all happy holidays and a happy new year!

Java Virtual Machine Tuning under JVM 1.4.2

Here’s an article I wrote about tuning Sun Java JRE 1.4.2 some time ago. I’m only posting it now to save it from loss when I leave CBC.ca.

This page is intended to document some proposals and empirical data gathered while attempting to tune the JVM used for running web applications on CBC.ca’s Java servers.

Topics to be covered:

  • Impact of using different garbage collectors
  • Impact of tuning garbage collectors
  • Maximum and minimum heap size settings
  • [potentially] Impact of using JVMs other than the Sun JVM, for example compiling Java code into native OS code using gcj, among others.
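As a rough illustration of the first three topics, here is a sketch of how the collector and heap settings might be assembled into a command line. The heap sizes, the `build_java_opts` helper, and `app.jar` are hypothetical; the flags themselves are standard options of the Sun 1.4.2 JVM.

```shell
#!/bin/sh
# Sketch only: the heap sizes are placeholders, not recommendations.
HEAP_MIN=256m
HEAP_MAX=512m

build_java_opts() {
    # $1: collector to try: parallel | cms | anything else for the JVM default
    case "$1" in
        parallel) gc="-XX:+UseParallelGC" ;;
        cms)      gc="-XX:+UseConcMarkSweepGC" ;;
        *)        gc="" ;;
    esac
    # -verbose:gc and -XX:+PrintGCDetails log each collection for comparison
    echo "-Xms$HEAP_MIN -Xmx$HEAP_MAX ${gc:+$gc }-verbose:gc -XX:+PrintGCDetails"
}

# Print the command rather than run it, so the sketch works without a JVM
echo "java $(build_java_opts cms) -jar app.jar"
```

Setting -Xms and -Xmx to the same value is a common starting point when measuring, since it removes heap resizing from the picture.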

Broadcom NetXtreme issues part 2

Here’s a follow-up to my previous post about the Broadcom BCM570x Gig-E adapters on HP-DL380 servers. HP pointed us to the following advisory:

Advisory: Primary Port of Integrated NC7782 Gigabit Server Adapters with NFS protocol with Certain Firmware Versions Stops Transmitting under Linux, Resulting in Lost Network Connectivity

However, reading the advisory indicates that the problem only afflicts the primary port of the Ethernet adapter. We’ve been seeing problems on the secondary port, as well as an add-on card.

This has been raised with HP, so we’ll see what they say.

Broadcom NetXtreme Gigabit Ethernet adapter problems

Recently we’ve been seeing a lot of error messages while using the Broadcom BCM570x series of Gigabit Ethernet adapters under SUSE Linux Enterprise Server 9. The symptoms are that the interface will simply hang under high traffic and refuse to pass more packets, eventually giving the error:


Dec 1 01:17:46 dev03 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Dec 1 01:17:46 dev03 kernel: tg3: eth2: transmit timed out, resetting
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Dec 1 01:17:46 dev03 kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2

It’s become a very serious issue for us because we have Broadcom BCM570x controllers on board all of our HP-DL380 servers. The problem seems to occur more frequently now that we’ve upgraded to an SP2 (and later) SLES9 kernel, although we have had problems dating back several months with older kernels.
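To get a feel for how often each interface is affected, a small script can count the watchdog messages per interface. This is a minimal sketch: the `count_timeouts` helper is hypothetical, and the sample input is just two of the log lines quoted above; in practice you would point it at your own syslog file.

```shell
#!/bin/sh
# Sketch only: counts "NETDEV WATCHDOG ... transmit timed out" events
# per interface in a syslog-style file.

count_timeouts() {
    # $1: path to the log file
    sed -n 's/.*NETDEV WATCHDOG: \([^:]*\): transmit timed out.*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input taken from the messages quoted in this post
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 1 01:17:46 dev03 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Dec 1 01:17:46 dev03 kernel: tg3: eth2: transmit timed out, resetting
EOF

count_timeouts "$log"
rm -f "$log"
```

Only the watchdog line is counted, so the follow-up tg3 messages for the same event do not inflate the totals.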

Doing some research on the Internet, I’ve found that this is a very common problem out in the field. In a summary document I prepared for management, I wrote the following:

Other customers in the field have reported the same problems running RedHat Enterprise Server 3, Debian GNU Linux, FreeBSD/NetBSD and even Novell Netware (internal communication with Novell PSE). In many of the reported incidents, customers were running identical server hardware (HP/Compaq Proliant DL-3×0 series) to CBC.ca. [HP IT Resource Centre thread #898761 where customers have reported issues with a variety of HP hardware and operating systems.]

There are a number of root causes of the problem. One is Linux driver instability: the Tigon3 (tg3) driver was created by reverse-engineering the Broadcom bcm5700 driver due to the low quality of the latter. Another is manufacturing defects: defects in some Broadcom 5704 chips afflicted Sun’s initial customer shipment of Sun Fire V210 and V240 servers in 2003, leading to Sun Alert #55620. The impact of such defects beyond Sun is unclear because Broadcom refused to provide further details.

Right now, we’re awaiting feedback from HP and Novell on how they plan to resolve this issue. In the meantime, we’re going to stockpile some Intel Gigabit Ethernet cards.

operating systems that hold your hand too much…

I’m all in favour of making an operating system like Linux easy-to-use. Linux’s popularity means that for many users it is the only exposure to a UNIX-like operating system that they are likely to see, and that’s why it’s important to give them the best first impression of UNIX so that they’re not turned off by it. This includes being standards-compliant and introducing as few distribution-specific hacks as possible.

I bring this up in the context of shell aliases. Today I was alarmed to see the following set by default for all users on a SUSE Linux Enterprise Server 9 system:


alias +='pushd .'
alias -='popd'
alias ..='cd ..'
alias ...='cd ../..'
alias beep='echo -en "\007"'
alias dir='ls -l'
alias l='ls -alF'
alias la='ls -la'
alias ll='ls -l'
alias ls='/bin/ls $LS_OPTIONS'
alias ls-l='ls -l'
alias md='mkdir -p'
alias o='less'
alias rd='rmdir'
alias rehash='hash -r'
alias unmount='echo "Error: Try the command: umount" 1>&2; false'
alias which='type -p'
alias you='yast2 online_update'

I get very alarmed when I see default behaviour set like this. There are a number of issues with this:

  1. It misleads new users by making them believe the behaviour of “ls” and other commands differs from the actual default behaviour.
  2. It introduces a set of commands to the user (e.g. “rehash”) that don’t really exist in the shell, leading to confusion if the user goes to use another UNIX machine without these aliases.
  3. It misleads users into believing that some DOS commands also exist in the Bash shell (e.g. “rd” or “md”). Rather than encouraging them to learn the correct commands, these aliases provide a crutch that users are unlikely to discard. They may then pass on this incorrect information when describing procedures to other users. This would be particularly disastrous in an interview-type situation (e.g. “Q: What is the correct command to make a directory under UNIX?”)

All of these aliases are unnecessary and imply that the personal shell alias preferences of SUSE developers are being imposed upon all users.

I would like this to serve as a call to all distribution vendors, SUSE particularly, to not ship Linux with unnecessary customizations that only serve to confuse users and introduce disparity between Linux distributions where none originally existed.
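In the meantime, a user who wants the stock behaviour back can check which names are aliases and drop them all. The following is a minimal sketch: it recreates two of the aliases listed above so that it is self-contained, and the `is_alias` and `report` helpers are hypothetical names of my own.

```shell
#!/bin/sh
# Sketch only: recreate two of the DOS-style aliases in this shell
alias md='mkdir -p'
alias rd='rmdir'

# Succeeds if the given name is currently defined as an alias
is_alias() {
    alias "$1" >/dev/null 2>&1
}

report() {
    for cmd in "$@"; do
        if is_alias "$cmd"; then
            echo "$cmd: alias"
        else
            echo "$cmd: not an alias"
        fi
    done
}

report md rd mkdir rmdir

# Drop every alias defined in this shell, e.g. at the end of ~/.bashrc
unalias -a
report md rd
```

For a one-off, prefixing a command with a backslash (for example `\ls`) bypasses an alias without removing it.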

publishing free/busy information in Evolution

Ximian Evolution can publish Free/Busy information by using WebDAV, but this doesn’t seem to be documented anywhere I could find. Here’s what I did to set it up:

  • Set up a WebDAV-compliant webserver. I installed mod_dav for Apache 1.3.x.
  • Configure DAV properly, and make sure that the directory you are enabling DAV for is writable by the webserver user.
  • Configure Evolution. Select Tools > Settings, then Free/Busy Publishing. Click Add URL and in Publishing Location type in http://your-server-name/your-location/. Don’t forget to supply the username and password you set up for DAV.
  • You’ll get no diagnostics from Evolution when the publishing occurs, so you’ll have to check the webserver logs to see if it succeeded or failed.
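For the DAV step, a configuration along these lines is roughly what is needed. Treat it as a sketch rather than my exact setup: the lock database path, the password file, and the realm name are placeholders, and /your-location stands for whatever path you publish to.

```apache
# Sketch only: paths and names below are placeholders
DAVLockDB /var/lock/apache/DAVLock

<Location /your-location>
    DAV On
    AuthType Basic
    AuthName "Free/Busy publishing"
    AuthUserFile /etc/apache/dav.passwd
    # Anyone may read; only authenticated users may write
    <LimitExcept GET HEAD OPTIONS>
        Require valid-user
    </LimitExcept>
</Location>
```

The directory holding the lock database must also be writable by the webserver user, just like the published directory itself.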