This week’s column is about the nature of the software needed to go with the election-administration hardware laid out in last week’s column [Paul Murphy, “Using Tech To Fix Elections,” LinuxInsider, May 27, 2004]. In brief, the idea was to ignore political reality long enough to imagine a system in which:
- 17-inch Sunray 1g Smart Displays with custom printers are used in voting booths.
- Booths are installed in polling places consolidated to sites in, or within cable reach of, schools with Internet connections that work.
- The electoral commission pays to put (and support) Sun systems into schools and allows them to be used for educational computing (the Java Desktop System, for example) during nonelection periods — thus providing for capable on-site support and proving system connectivity in normal operations while saving schools money and showing school administrators how well systems can work.
- The voting support application runs on the local servers, but captured data is not stored there; it is passed straight through to a state-level data accumulator, which in turn feeds the national one.
Properly set up, such a network could operate partially over the public Internet and yet be self-authenticating in the narrow sense that no new devices could be added without deliberate action by management; missing devices would always be reported to management; communications from outside the group would be recognized as coming from outsiders; and all communications between elements on the network would be RSA-encrypted and inaccessible to anyone not authorized to view them.
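As a minimal sketch of the membership rules in that paragraph, assuming each device presents a stable identifier and management keeps an explicit list of approved ones, the check reduces to two set differences. The `Registry` class and device names are hypothetical, this is not how Sun Ray server software tracks its clients, and encryption and proof of identity are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical device registry kept by management (illustrative only)."""

    registered: set[str] = field(default_factory=set)

    def approve(self, device_id: str) -> None:
        # New devices join only through this deliberate management action.
        self.registered.add(device_id)

    def audit(self, seen: set[str]) -> dict[str, set[str]]:
        """Compare the devices seen on the network with the registry."""
        return {
            # Registered but silent: report to management.
            "missing": self.registered - seen,
            # Present but never approved: treat as outsiders.
            "outsiders": seen - self.registered,
        }
```

For example, after approving `booth-01` and `booth-02`, auditing the set `{"booth-01", "laptop-x"}` reports `booth-02` as missing and `laptop-x` as an outsider.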
Comparing list prices for the Sun gear to known costs for Diebold and competing products suggests that this architecture would cost about 60 percent less than the PC-based alternatives. Unlike dedicated voting stations, which just gather dust between elections, it would also remain useful during nonelection periods.
The big advantage of the Sunray for this application is that it is too simple to break. This is difficult to understand if you’re a PC user and assume that the desktop machine has to have an OS and some local software. That assumption is wrong; for all practical purposes, the Sunray doesn’t have any of those things. It’s an information toaster. You can rely on it to make electronic toast and do absolutely nothing else. You get hardware failures, of course, but these have no effect on information security because all of the information processing happens on the servers.
Elegance and Efficiency
Conversely, the big disadvantage of the PC as the host for an electronic vote-recording machine, no matter what OS it’s running or how it’s disguised, is that the auditor can never be sure what else the thing might be doing. Place a perfectly secure process on a PC running BSD, arguably the most secure PC operating system, and you still can’t be sure the machine isn’t doing something else: anyone can dream up scenarios, practical or not, in which it is secretly doing something, and there’s no way to prove them wrong. With the Sunray, you can prove it’s not doing anything else because there’s nothing there to do anything with.
From a software-design perspective, the overall process should be clearly defined but no single set of software components implementing that process should be anointed as “the one right way.” Instead, the parts should be separated by clearly defined, generally text-only interfaces so that operations are easy to understand and easy to audit while many different people can develop, test and deploy competing modules for each piece. Further, the design philosophy for each piece should be to have just enough code to do what is required — and absolutely nothing more. That provides for elegance and efficiency while ensuring that the code is easy to review and modify if necessary.
The preservation of openness in the process requires that many people be able to observe it in action, raise questions and contribute to solutions. A computer system that handled everything, no matter how wonderfully designed and implemented, would be fundamentally inappropriate because it would take away the voter’s ability to participate directly in both the voting and the control operation.
From the voter’s perspective, the overall process is:
- Eligibility confirmation concludes when the poll official hands the voter either a paper ballot or, if the voter’s eligibility cannot be immediately verified, a paper ballot and an envelope with a number on it.
- That ballot, when placed in the printer input hopper (upside down or backward won’t matter), triggers the Sunray session to accept and record exactly one set of ballot decisions.
- The voter folds the newly printed ballot and deposits it either in the ballot box or, if this is a provisional-eligibility vote, in the numbered envelope, which is then sealed and deposited in the provisional-ballot box. That same process, incidentally, can be applied to absentee voters for later counting and inclusion in totals.
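The “exactly one set of ballot decisions” rule can be sketched as a small gate, assuming each preprinted ballot carries a machine-readable serial (the MICR headers mentioned later in the column would be one candidate). The class and method names are hypothetical, and a real polling place would need the used-serial check shared across all of its booths rather than kept per booth as it is here.

```python
from typing import Optional


class BallotStation:
    """Hypothetical one-ballot, one-vote gate for a single booth session."""

    def __init__(self) -> None:
        self._used: set[str] = set()        # ballot serials already consumed
        self._loaded: Optional[str] = None  # serial of the ballot in the hopper

    def load(self, serial: str) -> None:
        """Called when a blank ballot is read in the printer hopper."""
        if self._loaded is not None:
            raise RuntimeError("a ballot is already loaded")
        if serial in self._used:
            raise ValueError(f"ballot {serial} has already been used")
        self._loaded = serial

    def record(self, choices: dict[str, str]) -> dict[str, str]:
        """Accept exactly one set of decisions for the loaded ballot."""
        if self._loaded is None:
            raise RuntimeError("no ballot loaded")
        self._used.add(self._loaded)
        self._loaded = None
        # The choices go on to the printer and the accumulator. Nothing
        # here links them to the serial, which protects ballot secrecy.
        return dict(choices)
```

Reinserting a ballot that has already been used raises an error, and so does a second `record` call without a new ballot.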
It’s easy to think of the electoral process as proven and well understood, but that’s true only at the local level. Nationally, there is no single standard approach to the key electoral elements: voter registration, balloting and vote counting. Instead, there are on the order of 3,000 local election organizations, and each is responsible for all of the steps, including managing absentee ballots (up to 8 percent nationwide in 2000), conducting recounts and audits, and deciding where to make the trade-off between keeping useful voting data and protecting the voter’s privacy.
Communication and Authentication
In general, however, the election officer’s view of the process starts with voter registration and ballot preparation. The registration side of this is particularly difficult because the traditional manual means of ensuring that each voter has exactly one opportunity to vote requires generating and delivering accurate voter lists for each poll location as well as voter attendance at the poll where he or she is listed. I don’t discuss that process further here, but it should be obvious that use of a statewide voter database could eliminate most of the risks, costs and complexities associated with eligibility verification while freeing voters from the need to attend a specified — rather than convenient — polling place on or before election day.
The processing behind ballot recognition, data collection and ballot printing takes place on the local server (not the Sunray), as does transmission to the state accumulator. Nowhere in this process, however, should the local server store that data. One obvious way to handle this is to use ordinary HTML pages with embedded JavaScript that submits the data directly, via HTTPS, to the state accumulator. That’s the approach I adopt for the purposes of this column, although in real life I’d be deeply tempted to avoid browser overheads and risks by using Tcl/Tk and the Solaris PKI facilities. Others, probably including most Sun staff, would recommend using XML with Java and the J2EE servers.
This approach does create a problem, however: a communications or authentication failure that exceeds the system’s built-in buffering capacity forces temporary local storage, which the design otherwise forbids. As a result, the affected precinct should default to hand-counting, and votes that reach the state accumulator after recovery should be marked as tainted and thus become subject to special review. That could be avoided if local servers normally acted as store-and-forward points.
But that would probably create vulnerabilities to insider attacks, for example by a school board employee with trusted access to multiple servers. In this design, therefore, I’m being conservative and assuming that local storage is not allowed. That reflects a set of concerns, by the way, that systems like Diebold’s gloss over, just as they gloss over the pointlessness of an audit trail that uses the same machine and the same software to store both the ballot results and the control data.
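The buffering rule can be sketched as follows, assuming a `send(vote, tainted)` function that returns True once the state accumulator acknowledges a vote. The class name, the callback and the default capacity of 50 are all hypothetical; the point is only the logic: votes wait in memory while the link is down, and once the limit is exceeded the precinct falls back to hand-counting and everything delivered afterward carries a taint flag.

```python
from collections import deque


class Forwarder:
    """Hypothetical sketch of the buffering rule for one precinct."""

    def __init__(self, send, capacity: int = 50) -> None:
        self.send = send          # send(vote, tainted) -> True if acknowledged
        self.capacity = capacity  # illustrative number, not from any spec
        self.pending = deque()    # in-memory only; nothing is written to disk
        self.overflowed = False   # True once the buffer limit has been exceeded

    def submit(self, vote: dict) -> None:
        self.pending.append(vote)
        self.flush()
        # Holding more than `capacity` votes is the forced local storage
        # the design forbids: signal a fall back to hand-counting.
        if len(self.pending) > self.capacity:
            self.overflowed = True

    def flush(self) -> None:
        """Deliver buffered votes in order, stopping at the first failure."""
        while self.pending:
            if not self.send(self.pending[0], self.overflowed):
                return  # link still down; keep the vote buffered
            self.pending.popleft()
```

With a capacity of 2 and the link down, the third vote trips the overflow flag, and when the link returns all three votes are delivered marked as tainted.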
Languages and Interfaces
Obviously, the ability to handle encrypted data transmission without permitting local access is a requirement the selected technology must meet. But two others, although less obvious, are equally important. First, any components that aren’t standard open-source products must be generated automatically using open-source scripting languages like Perl. Second, external interfaces must generally accept and produce plain text. Taken together, these requirements ensure that “anyone can play” because the whole system is auditable and replacement components are easily tested and integrated into the process.
In this case, let’s assume that three separate Perl scripts, presumably called from another script, process a ballot definition provided as plain text by election officials to create, in order, the pages a voter sees, the ballot printer script and the SQL code needed to name and define a table to store unique results from that ballot. All of these would, of course, ensure ballot uniformity and completeness by the automatic inclusion of standard items from larger jurisdictions — that is, from the state and federal levels. All would be available to local officials for test use, but production runs would happen at the state level, have internal authenticators added and then be electronically distributed to the authorized server network in the schools.
Again, several other approaches could work at least as well, but the key is that you can open up these things and see exactly what they do at each step. Thus, tools such as Microsoft Office can be used to generate the input text files but not to process them because there is no practical way to fully audit what goes on during processing.
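To show how small one of these generators can be, here is a sketch of the SQL step. The column describes Perl scripts; Python is used here only to keep all the examples in one language. The plain-text ballot format (`item_id | question | choice; choice`), the function names and the table name are hypothetical. Each ballot item becomes a column whose CHECK constraint admits only the listed choices; a NULL (no vote on that item) still passes, because SQL treats a NULL check result as not failing.

```python
import re

# Hypothetical ballot definition format, one item per line:
#   item_id | question text | choice; choice; ...
# Blank lines and lines starting with "#" are ignored.
IDENT = re.compile(r"[a-z][a-z0-9_]*$")


def parse_ballot(text: str) -> list[tuple[str, str, list[str]]]:
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        item_id, question, choices = (part.strip() for part in line.split("|"))
        if not IDENT.match(item_id):
            raise ValueError(f"bad item id: {item_id!r}")
        items.append((item_id, question, [c.strip() for c in choices.split(";")]))
    return items


def quote(s: str) -> str:
    """SQL string literal with embedded single quotes doubled."""
    return "'" + s.replace("'", "''") + "'"


def create_table_sql(table: str, items) -> str:
    """Emit CREATE TABLE text with one constrained column per ballot item."""
    if not IDENT.match(table):
        raise ValueError(f"bad table name: {table!r}")
    cols = [
        f"  {item_id} TEXT CHECK ({item_id} IN ({', '.join(quote(c) for c in choices)}))"
        for item_id, _question, choices in items
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"
```

The page and printer-script generators would read the same parsed items, so all three outputs stay consistent with one auditable text file.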
Breaking Down Barriers
A big part of the hidden complexity here arises because ballots can contain a large number of items, not all of them amenable to simple yes-no answer recording. This is relatively easy to handle on-screen because you just process each ballot question in turn, starting with the national and state includes, to produce appropriate HTML and any needed JavaScript. On the print side, however, there are space constraints, and it’s important to try to make hand-sorting as easy, and therefore error-free, as possible.
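The on-screen side can be sketched as a loop over the items in order, national first, then state, then local, emitting HTML for each. The item tuple layout and the two item kinds (`"pick"` for choose-one, `"rank"` for relative rankings) are hypothetical; the submission target and any JavaScript are omitted. Text is passed through the standard library’s `html.escape` so a candidate name can’t break the markup.

```python
from html import escape


def item_html(item_id, kind, question, choices):
    """Render one ballot item; `kind` is "pick" (choose one) or "rank"."""
    name = escape(item_id)
    out = [f"<fieldset><legend>{escape(question)}</legend>"]
    for i, choice in enumerate(choices):
        label = escape(choice)
        if kind == "pick":
            widget = f'<input type="radio" name="{name}" value="{label}">'
        elif kind == "rank":
            # One numeric rank per candidate, 1 = most preferred.
            widget = (f'<input type="number" name="{name}-{i}" '
                      f'min="1" max="{len(choices)}">')
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
        out.append(f"<label>{widget} {label}</label>")
    out.append("</fieldset>")
    return "\n".join(out)


def ballot_html(*sections):
    """Concatenate items in order: national, then state, then local."""
    body = [item_html(*item) for section in sections for item in section]
    return '<form method="post">\n' + "\n".join(body) + "\n</form>"
```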
What makes it possible to print the results of even very long and complicated ballots on one side of a sheet of paper no larger than 8.5 by 14 inches is that you don’t have to print all of the choices, just the ones the voter made. As a result, a two-column format with major results separated into blocks should accommodate something on the order of 160 choices or relative rankings. At the moment, that should more than suffice. In the past, the cost of running an election, combined with falling voter turnout, has pushed jurisdictions to pack more into each ballot, making ballots longer and more complex, but that should change. When extended to voter-registration management, the system proposed here would reduce barriers to voting, both from the government’s perspective in terms of costs and from the voter’s perspective in terms of convenience, thus allowing simpler, more frequent balloting.
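A sketch of that print layout follows. The figure of 80 lines per column is an assumption (roughly six lines per inch on a 14-inch sheet, less margins), not something the column specifies; two such columns give the 160 lines mentioned above, with each block heading costing one line. Blocks are kept whole so hand-sorters never have to follow a section across the column break.

```python
def layout(blocks, lines_per_column=80, width=38):
    """Print only the voter's choices, two columns, blocks kept whole.

    `blocks` is a list of (heading, [choice lines]) pairs, for example one
    block each for national, state and local items. Each block uses one line
    for its heading plus one per choice and never straddles the column break.
    """
    columns = [[], []]
    col = 0
    for heading, lines in blocks:
        needed = 1 + len(lines)
        if len(columns[col]) + needed > lines_per_column:
            col += 1  # start the right-hand column
            if col == len(columns) or needed > lines_per_column:
                raise ValueError("choices do not fit on one side of the sheet")
        columns[col].append(heading.upper())
        columns[col].extend(lines)
    height = max(len(c) for c in columns)
    left, right = (c + [""] * (height - len(c)) for c in columns)
    return "\n".join(
        f"{l[:width]:<{width}}  {r[:width]}".rstrip() for l, r in zip(left, right)
    )
```

With the default settings, two blocks of 79 choices each fill the sheet exactly: 158 choices plus two headings.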
SQL code generation for unique ballot items is more straightforward, although, of course, all choices have to be recorded. At this point, I’m not proposing changes to either the registration systems or the traditional means of vote counting; I’m just adding a faster electronic count and substituting a custom-printed ballot for the traditional record of voter decisions. Thus the most fundamental control on the electronic-voting side is the continuation of the traditional hand-counting approach.
Oddly enough, hand-counting, outside of court-ordered recounts, is usually done by machines. Jurisdictions wishing to continue that practice have several choices, including encoding the voter’s choices as a single number to be printed as a barcode on the ballot — very broadly part of what the people at votehere.net recommend — or just using a scanner and OCR to process the ballots. That process in itself offers opportunities for failure (induced or otherwise) that need to be examined but that are not a consequence of the electronic system because their existence is independent of it.
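One way to encode a voter’s choices “as a single number” is mixed-radix packing: each ballot item becomes one digit whose base is that item’s number of options, with option 0 reserved for “no vote.” This is an illustration of the general idea only, not necessarily what votehere.net proposes; the barcode symbology is out of scope, and a ranking item would need one digit per rank position rather than a single digit.

```python
def encode(selections, radices):
    """Pack one selection per ballot item into a single integer.

    radices[i] is the number of options for item i, with option 0 reserved
    for "no vote"; selections[i] must lie in range(radices[i]).
    """
    if len(selections) != len(radices):
        raise ValueError("need exactly one selection per item")
    value = 0
    # Horner's rule, last item first, so item 0 ends up in the lowest digit.
    for sel, radix in zip(reversed(selections), reversed(radices)):
        if not 0 <= sel < radix:
            raise ValueError(f"selection {sel} out of range for radix {radix}")
        value = value * radix + sel
    return value


def decode(value, radices):
    """Recover the selections from the integer, item 0 first."""
    selections = []
    for radix in radices:
        value, sel = divmod(value, radix)
        selections.append(sel)
    if value:
        raise ValueError("number is too large for this ballot layout")
    return selections
```

For a three-item ballot with 3, 4 and 2 options, `encode([2, 1, 1], [3, 4, 2])` returns 17, and `decode(17, [3, 4, 2])` returns `[2, 1, 1]`. Python integers are unbounded, so long ballots simply produce longer numbers.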
Digital and Analog
The reality of this particular kind of control, by the way, is that attacks aimed at taking control of the server or gaining access to the encrypted data stream have high costs and limited payoffs, because a meaningful attack has to subvert the electronic and the manual counts consistently. As a result, attacks intended to add or remove a significant number of votes would have to be aimed at the traditional weaknesses in the registration process rather than at the voting process. The second generation of this system can therefore be expected to include registration services, thereby further lowering election costs, greatly complicating things for would-be cheaters and probably making it much easier for busy people to vote.
I had thought that asking a sample of voters to verify their ballots between printing and deposit in the ballot box would make sense, but it doesn’t. The primary reason is voter remorse, in that some voters might see this as an opportunity to change their minds and then cause electoral chaos by loudly demanding their rights and denouncing the accuracy of the vote recording process.
The printed ballots will, therefore, be the primary control, itself reliant on the traditional controls in the processes through which ballots are preprinted (top and bottom Magnetic Ink Character Recognition headers on both sides), distributed to polls, given to voters and handled afterward. Fundamentally, the electronically recorded results have to match those first from spot hand-counts and then from any formal counts or recounts.
Process and Necessity
One of the less-considered keys to making this process work is that the technology has to work. As horror stories about clerks unable to boot machines or recover from power outages have made clear, even the simplest systems fall victim to human failures. In this proposal, however, the assumption is that the systems will be installed in schools for use by students and faculty and thus will be well tested in advance and knowledgeably supported during operation. The on-site people would, for example, be the natural choice for running the voter briefings (also known as “training”) required in each polling place while hanging around waiting for technical emergencies that should never happen.
On election days, the infrastructure can be centrally controlled. During normal school operations, however, it shouldn’t be wholly centrally controlled, so central control would apply only partially during the “make sure it works ahead of time” stage preceding the first advance polls.
Getting that control in place will not be easy. It will require extensive implementation assistance, along with administrator training and continued support. Advertising, ageism and blind faith to the contrary, the ability to “support” Windows does not constitute computer science training and more nearly guarantees failure with both Linux and Solaris than it does success.
For example, I’ve personally seen MCSEs do things like add PCs (and switches) to the hub serving a clutch of Sunrays; shove additional network cards in a Sparc server; try to reboot a Sparc server from a Windows 2000 CD; attach a Wintel rack-mount to Unix for document storage; and put an HP N-8400 running Oracle on a 24-hour power timer because they could find no other way to ensure an automatic daily reboot. Turn 70,000 or so local Sun Server-Display combinations over to people whose experience is Wintel, and thousands will not be functional and ready for use when the next election comes up two years later.
From Sun’s Perspective
From Sun’s perspective, however, the long-term value of the opportunity here doesn’t lie in the billion dollars in hardware sales, but in the downstream credibility that comes from doing enough training, hand-holding and management support to ensure success. That would positively influence generations of school kids, their local IT support people and the decision-makers who contribute to the nation’s school and electoral boards.
Thus, getting this right is the most critical piece of the puzzle. Fundamentally, everything else is either off-the-shelf or a few weeks’ work for a couple of qualified people. But Sun’s existing training and support infrastructure simply isn’t up to the job. As a result, a dedicated national operations and support center with maybe 40 to 50 people charged with direct server administration and local support and training would have to be part of the overall package. At about US$200 million for four years, this only adds about 10 percent to the total cost and should fall easily into the range of available discounts.
At the state and national levels, operations are reasonably straightforward: simply load the database using incoming data and issue appropriate reports. To make that happen securely, the state accumulator centers should contain at least two separately administered machines with redundant, independent Internet connections. Ensuring that both machines get all the data without imposing a single point of failure is fairly easy with enterprise application servers, whether from Sun or BEA, and not that hard (although kludgey) with Tomcat and related Apache technologies.
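The application servers named above would typically handle this on the receiving side. As a simpler alternative, here is a sketch that does it from the sending side, assuming each accumulator machine exposes a send function that returns True once it acknowledges receipt; the function and machine names are hypothetical. Because a vote may be resent after a lost acknowledgment, a real system would also need a per-message nonce so the receiver can discard duplicates.

```python
def deliver(vote, accumulators, attempts=3):
    """Send one vote to every state accumulator machine independently.

    `accumulators` maps a machine name to a hypothetical send function that
    returns True once that machine acknowledges receipt. Each machine is
    tried separately, so one machine (or its Internet link) being down never
    blocks delivery to the other. Returns the names still unacknowledged.
    """
    unacked = set(accumulators)
    for _ in range(attempts):
        for name in sorted(unacked):  # iterate over a copy; we discard below
            if accumulators[name](vote):
                unacked.discard(name)
        if not unacked:
            break
    return unacked
```

An empty result means both machines have the vote; anything else names the machine that needs follow-up.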
The database role (and the two machines probably should use different DBMS products) is simply to accumulate votes in tables created according to the various national and state include files and the locally unique ballots. Such transactions are trivial; decoding the incoming packets will use far more system resources than the database inserts will. Reporting, although far more resource-intensive, is equally straightforward: simple SQL scripts called from the shell can provide official summations, while running totals can be kept for immediate display and for use by news organizations seeking preliminary but unofficial results.
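A sketch of that division of labor, using Python’s built-in `sqlite3` module as a stand-in for whichever DBMS each machine runs. The table and item names are hypothetical. Each decoded ballot is one short insert, running totals are kept in memory for unofficial display, and the official summary is a plain `GROUP BY` query of the kind a shell-invoked SQL script would run.

```python
import sqlite3
from collections import Counter

ITEMS = ("governor", "bond_measure")  # hypothetical state-level ballot items

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE state_items (governor TEXT, bond_measure TEXT)")
running = Counter()  # (item, choice) -> count, for unofficial running totals


def add_vote(governor, bond_measure):
    """Store one decoded ballot and update the running totals."""
    with db:  # one short transaction per ballot
        db.execute("INSERT INTO state_items (governor, bond_measure) VALUES (?, ?)",
                   (governor, bond_measure))
    running.update({("governor", governor): 1, ("bond_measure", bond_measure): 1})


def official_summary(item):
    """The kind of query a shell-invoked SQL script would run."""
    if item not in ITEMS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown item: {item}")
    return db.execute(
        f"SELECT {item}, COUNT(*) FROM state_items GROUP BY {item} ORDER BY {item}"
    ).fetchall()
```

After three ballots, `official_summary("governor")` returns one `(choice, count)` row per candidate, and those counts should always equal the matching entries in `running`.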
Relative Simplicity
Because of this relative simplicity, almost any database product — including potentially no database product — can be used if timestamps can be reliably turned off rather than just suppressed, something I have never figured out how to do with Oracle. The point of turning this off (and potentially contravening some ill-conceived legislation) is that the combination of the time-stamped voting record with a video record of activity at the polling place could be used to reveal how at least some individuals voted, which threatens the protections offered by the secret ballot.
However, as long as unique ballot items are voted on in more than two fairly busy polls, the simple serialization imposed by the database will provide an adequate control while making it extremely difficult to reveal individual decisions by matching the order in which voters leave polls to the vote record.
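One way to get timestamps “reliably turned off rather than just suppressed” is to make sure no time value can reach storage in the first place: the schema has no time column and no `DEFAULT CURRENT_TIMESTAMP`, and the ingest step copies only a whitelist of ballot fields, dropping arrival time, poll ID and anything else that arrives with the message. SQLite again stands in for the real DBMS, and the field names are hypothetical. The only ordering left is insertion order, which is the serialization the paragraph above relies on.

```python
import sqlite3

BALLOT_FIELDS = ("governor", "bond_measure")  # hypothetical ballot items


def to_row(message: dict) -> tuple:
    """Keep only ballot choices; arrival time, poll ID and any other
    metadata are dropped before they can reach storage."""
    return tuple(message.get(name) for name in BALLOT_FIELDS)


def store(db: sqlite3.Connection, messages) -> None:
    with db:
        db.executemany(
            "INSERT INTO votes (governor, bond_measure) VALUES (?, ?)",
            (to_row(m) for m in messages),
        )


db = sqlite3.connect(":memory:")
# No time column and no timestamp default, so there is nothing to suppress.
db.execute("CREATE TABLE votes (governor TEXT, bond_measure TEXT)")
```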
The national system, of course, just accumulates from the state systems — probably by copying all applicable state- and national-level data and then generating its own reports.
The controls on this are obvious. Totals have to match across machines and between the electronic and eventual hand-counts.
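Both steps are simple enough to sketch in a few lines, assuming each system can export its totals as a mapping from `(item, choice)` to a count; that export format and the function names are hypothetical. The national figure is a straight sum over the state exports, and reconciliation reports every key on which any two sources (the two accumulator machines, or the electronic and hand counts) disagree.

```python
from collections import Counter


def national_totals(state_totals):
    """Sum per-(item, choice) totals copied from each state system."""
    total = Counter()
    for counts in state_totals.values():
        total.update(counts)
    return total


def discrepancies(*sources):
    """Return the keys on which the given tallies disagree, with each
    source's count (a key missing from a source counts as zero)."""
    keys = set().union(*sources)
    return {
        k: [s.get(k, 0) for s in sources]
        for k in sorted(keys)
        if len({s.get(k, 0) for s in sources}) > 1
    }
```

An empty result from `discrepancies(machine_a, machine_b, hand_count)` is the condition the column describes; anything else lists exactly which items and choices need review.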
So that brings us to the bottom line on this idea. This review has been extremely superficial, but it appears that there are lots of opportunities to do things right: no hardware or software showstoppers; the potential for cost savings in the 50 to 60 percent range relative to a Diebold-style alternative; opportunities to deliver increased reliability and ease of use; obvious and valuable future extensions; the ability to greatly increase confidence in the accuracy and timeliness of the reported results; and a chance to deliver giant freebies for the schools involved.
Cool, huh?
Paul Murphy, a LinuxInsider columnist, wrote and published The Unix Guide to Defenestration. Murphy is a 20-year veteran of the IT consulting industry, specializing in Unix and Unix-related management issues.