A Plea to Software Vendors from Sysadmins - 10 Do's and Don'ts

What can software vendors do to make the lives of sysadmins a little easier?

Thomas A. Limoncelli, Google

A friend of mine is a grease monkey: the kind of auto enthusiast who rebuilds

engines for fun on a Saturday night. He explained to me that certain brands of

automobiles were designed in ways to make the mechanic's job easier. Others,

however, were designed as if the company had a pact with the aspirin industry

to make sure there are plenty of mechanics with headaches. He said those car

companies hate mechanics. I understood completely because, as a system

administrator, I can tell when software vendors hate me. It shows in their

products.

A panel discussion at CHIMIT (Computer-Human Interaction for Management of

Information Technology) 2009 discussed a number of do's and don'ts for software

vendors looking to make software that is easy to install, maintain, and

upgrade. This article highlights some of the issues uncovered. CHIMIT is a

conference that focuses on computer-human interaction for IT workers, the

opposite of most CHI research, which is about the users of the systems that IT

workers maintain. This panel turned the microscope around and gave system

administrators a forum to share how they felt about the speakers who were

analyzing them.

Here are some highlights:

1. DO have a "silent install" option. One panelist recounted automating the

installation of a software package on 2,000 desktop PCs, except for one point

in the installation when a window popped up and the user had to click OK. All

other interactions could be programmatically eliminated through a "defaults

file." Linux/Unix tools such as Puppet and Cfengine should be able to automate

not just installation, but also configuration. Deinstallation procedures should

not delete configuration data, but there should be a "leave no trace" option

that removes everything except user data.

2. DON'T make the administrative interface a GUI. System administrators need a

command-line tool for constructing repeatable processes. Procedures are best

documented by providing commands that we can copy and paste from the procedure

document to the command line. We cannot achieve the same repeatability when the

instructions are: "Checkmark the 3rd and 5th options, but not the 2nd option,

then click OK." Sysadmins do not want a GUI that requires 25 clicks for each

new user. We want to craft the commands to be executed in a text editor or

generate them via Perl, Python, or PowerShell.

3. DO create an API so that the system can be remotely administered. An API

gives us the ability to do things with your product you didn't think of. That's

a good thing. System administrators strive to automate, and automate to thrive.

The right API lets me provision a service automatically as part of the new

employee account creation system. The right API lets me write a chat bot that

hangs out in a chat room to make hourly announcements of system performance.

The right API lets me integrate your product with a USB-controlled toy missile

launcher. Your other customers may be satisfied with a "beep" to get their

attention; I like my way better (http://www.kleargear.com/5004.html).

4. DO have a configuration file that is an ASCII file, not a binary blob. This

way the files can be checked into a source-code control system. When the system

is misconfigured it becomes important to be able to "diff" against previous

versions. If the file can't be uploaded back into the system to re-create the

same configuration, then we can't trust that you're giving us all the data.

This prevents us from cloning configurations for mass deployment or disaster

recovery. If the file can be edited and uploaded back into the system, then we

can automate the creation of configurations. Archives of configuration backups

make for interesting historical analysis [1].

5. DO include a clearly defined method to restore all user data, a single

user's data, and individual items (for example, one e-mail message). The method

to make backups is a prerequisite, obviously, but we care primarily about the

restore procedures.

6. DO instrument the system so that we can monitor more than just, "Is it up or

down?" We need to be able to determine latency, capacity, and utilization, and

we need to be able to collect this data. Don't graph it yourself. Let us

collect and analyze the raw data so we can make the "pretty picture" graphs

that our nontechnical management will understand. If you aren't sure what to

instrument, imagine the system being completely overloaded and slow: what

parameters would we need to be able to find and fix the problem?

7. DO tell us about security issues. Announce them publicly. Put them in an RSS

feed. Tell us even if you don't have a fix yet; we need to manage risk. Your PR

department doesn't understand this, and that's OK. It is your job to tell them

to go away.

8. DO use the built-in system logging mechanism (Unix syslog or Windows Event

Logs). This allows us to leverage preexisting tools that collect, centralize,

and search the logs. Similarly, use the operating system's built-in

authentication system and standard I/O systems.

9. DON'T scribble all over the disk. Put binaries in one place, configuration

files in another, data someplace else. That's it. Don't hide a configuration

file in /etc and another one in /var. Don't hide things in \Windows. If

possible, let me choose the path prefix at install time.

10. DO publish documentation electronically on your Web site. It should be

available, linkable, and findable on the Web. If someone blogs about a solution

to a problem, they should be able to link directly to the relevant

documentation. Providing a PDF is painfully counterproductive. Keep all old

versions online. The disaster recovery procedure for a 5-year-old, unsupported,

pathetically outdated installation might hinge on being able to find the manual

for that version on the Web.
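
As one way to picture item 6, here is a minimal Python sketch of a service exposing raw latency, capacity, and utilization figures as plain text for an external collector. The `Metrics` class, its field names, and the "name value" line format are assumptions made for illustration; none of them come from the panel or from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Metrics:
    """Raw counters a product could expose (all names are hypothetical)."""
    capacity: int                  # maximum concurrent requests supported
    in_flight: int = 0             # requests currently being served
    latencies_ms: list = field(default_factory=list)

    def record(self, latency_ms: float) -> None:
        """Store the latency of one completed request."""
        self.latencies_ms.append(latency_ms)

    def export(self) -> str:
        """Return raw values as 'name value' lines, one metric per line."""
        count = len(self.latencies_ms)
        total = sum(self.latencies_ms)
        worst = max(self.latencies_ms, default=0.0)
        utilization = self.in_flight / self.capacity if self.capacity else 0.0
        lines = [
            f"requests_total {count}",
            f"latency_ms_sum {total:.1f}",
            f"latency_ms_max {worst:.1f}",
            f"capacity {self.capacity}",
            f"in_flight {self.in_flight}",
            f"utilization {utilization:.2f}",
        ]
        return "\n".join(lines)


metrics = Metrics(capacity=100, in_flight=25)
metrics.record(12.5)
metrics.record(40.0)
print(metrics.export())
```

The sketch hands over sums and counts rather than a finished chart, so the sysadmin's own tooling can compute averages, rates, and graphs over whatever time window it needs.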

Software is not just bits to us. It has a complicated life cycle: procurement,

installation, use, maintenance, upgrades, deinstallation. Often vendors think

only about the use (and some seem to think only about the procurement).

Features that make software more installable, maintainable, and upgradable are

usually afterthoughts. To be done right, these things need to be part of the

design from the beginning, not bolted on later.

Be good to the sysadmins of the world. As one panelist said, "The inability to

rapidly deploy your product affects my ability to rapidly purchase your

products."

I should point out that this topic wasn't the main point of the CHIMIT panel.

It was a very productive tangent. When I suggested that each panelist name his

or her single biggest "don't," I noticed that the entire audience literally

leaned forward in anticipation. I was pleasantly surprised to see software

developers and product managers alike take an interest. Maybe there's hope,

after all.

References

1. Plonka, D., Tack, A. J. 2009. An analysis of network configuration

artifacts. In Proceedings of the 23rd Large Installation System Administration

Conference (November): 79-91.

Acknowledgments

I would like to thank the members of the panel: Daniel Boyd, Google; Æleen

Frisch, Exponential Consulting and author; Joseph Kern, Delaware Department of

Education; and David Blank-Edelman, Northeastern University and author. I was

the panel organizer and moderator. I would also like to thank readers of my

blog, www.EverythingSysadmin.com, for contributing their suggestions.

Thomas A. Limoncelli is an author, speaker, and system administrator. His books

include The Practice of System and Network Administration (Addison-Wesley) and

Time Management for System Administrators (O'Reilly). He works at Google in New

York City and blogs at http://EverythingSysadmin.com.

© 2010 ACM 1542-7730/10/1200 $10.00

Originally published in Queue vol. 8, no. 12

The article author is also behind The Practice of System and Network

Administration, truly an excellent text on the practicalities of

work in IT.

11: Meaningful error messages (Score:5, Funny)

If you want to make a sysadmin's life easier (as if any programmer ever wants

to do that), you can start by making your error and status messages 1.)

plentiful and 2.) easy to understand. Also, provide several logging levels so

we can drill down as needed, and make sure the logging levels are meaningful.

Too many programmers put just two log levels: one which shows nothing useful,

and another that spews out indecipherable hex dumps of every call it makes.

Face up to the fact that no matter how awesome your software is, it's going to

fail. Not only that, but it's going to fail in ways you never thought possible

at the worst possible times. Make sure we have enough information to figure out

what happened. Otherwise, stuff like this happens:

Program: *crash for no apparent reason*

Sysadmin: Why did you crash?

Program: Because something went wrong.

Sysadmin: What went wrong?

Program: Something.

Sysadmin: I need more detail. Increasing log level.

Program: Something bad went wrong.

Sysadmin: I need more than that. Increasing log level again.

Program: Fuck you. Here's a 16GB hex dump of system memory. Figure it out

yourself jackass.

Sysadmin: *picks up a crowbar and goes off to find the programmer*
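
For contrast with the exchange above, here is a minimal sketch using Python's standard `logging` module, in which each level adds usable detail and the error line says what failed, on which object, and why. The logger name `exampled`, the helper names, and the spool-file scenario are hypothetical and only serve the illustration.

```python
import io
import logging


def make_logger(level: int, stream) -> logging.Logger:
    """Build a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger("exampled")  # hypothetical daemon name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def open_spool(logger: logging.Logger, path: str) -> bool:
    """Try to open a spool file, reporting the outcome at a sensible level."""
    logger.debug("opening spool file path=%s", path)
    try:
        with open(path, "rb"):
            pass
    except OSError as err:
        # State the operation, the object, and the operating system's reason.
        logger.error("cannot open spool file path=%s reason=%s",
                     path, err.strerror)
        return False
    logger.info("spool file opened path=%s", path)
    return True


buffer = io.StringIO()
log = make_logger(logging.INFO, buffer)
open_spool(log, "/nonexistent/dir/spool.db")
print(buffer.getvalue(), end="")
```

At the INFO level the failure is already diagnosable; raising the level to DEBUG adds the step that was being attempted rather than an undifferentiated dump.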

That's plain ASCII to you... (Score:5, Insightful)

DO have a configuration file that is an ASCII file, not a binary blob.

And by ASCII we mean something that can be edited by any editor.

XML is the equivalent of a binary blob when you are up to your ass in

alligators trying to get things working again with minimal tools available.
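
To illustrate why a line-oriented text configuration pays off, here is a small sketch that compares two versions of a configuration file with Python's standard `difflib` module. The file name `app.conf` and the settings shown are invented for the example.

```python
import difflib


def config_diff(old: str, new: str, name: str = "app.conf") -> str:
    """Return a unified diff between two versions of a text config file."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
    )
    return "".join(lines)


# Two hypothetical versions of the same file, e.g. pulled from source control.
previous = "listen_port = 8080\nmax_clients = 200\nlog_level = info\n"
current = "listen_port = 8080\nmax_clients = 20\nlog_level = info\n"

print(config_diff(previous, current), end="")
```

With one setting per line, the accidental change to `max_clients` stands out immediately. The same comparison against a binary blob would show only that the file changed.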

Amendment to #2 (Score:5, Insightful)

Feel free to make a GUI for the administrative interface, but not at the

expense of an underlying CLI.

There are two ways to do this: have your GUI call the CLI when necessary, or

use a common API behind both. Other methods will lead to bitrot in one of the

interfaces, most likely the CLI.

GUIs are fine and even enjoyable to a certain extent, but the author is right

that the CLI takes priority.
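
The "common API behind both" approach can be sketched as follows, assuming a hypothetical `add_user` operation and a command-line tool named `exampleadm`; the GUI handler is a stand-in function rather than real toolkit code.

```python
import argparse
import sys


def add_user(directory: dict, name: str, shell: str = "/bin/sh") -> dict:
    """Core operation: the one place where the real work happens."""
    if name in directory:
        raise ValueError(f"user already exists: {name}")
    directory[name] = {"shell": shell}
    return directory[name]


def cli(argv: list, directory: dict) -> int:
    """Thin command-line front end over add_user()."""
    parser = argparse.ArgumentParser(prog="exampleadm")
    sub = parser.add_subparsers(dest="command", required=True)
    add = sub.add_parser("add-user")
    add.add_argument("name")
    add.add_argument("--shell", default="/bin/sh")
    args = parser.parse_args(argv)
    try:
        add_user(directory, args.name, args.shell)
    except ValueError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    print(f"added {args.name}")
    return 0


def gui_on_add_clicked(directory: dict, form: dict) -> str:
    """Stand-in for a GUI button handler; it calls the same add_user()."""
    add_user(directory, form["name"], form.get("shell", "/bin/sh"))
    return f"added {form['name']}"


users = {}
cli(["add-user", "alice", "--shell", "/bin/bash"], users)
gui_on_add_clicked(users, {"name": "bob"})
```

Because both front ends are thin wrappers over one function, a feature added to the core is available to the CLI as soon as an argument is wired up, which is what keeps the CLI from rotting.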