Amazon.com vs. INNOPAC


As part of my Master of Science course Library and Information Science: The Role of Research, I had to write a proposal for a research project. I didn't actually have to conduct the experiment, but it needed to be feasible. This paper was the result.

I presented this proposal to the class using this Freelance slideshow presentation, which visually compares the UIs and shows some of the heuristic calculations. If you want to see the presentation but don't have Lotus Freelance for Windows, download the free Mobile Screen Show Player. (If you have trouble downloading the player, contact me and I'll get you a copy.)

Please note, this article was written several years ago. I'm sure that in the intervening years, research on the subject has advanced.


Amazon.com vs. INNOPAC:
Which Interface is Easier to Search?

Elisabeth Riba

Simmons College
Graduate School of Library and Information Science
LIS 403
Professor James Baughman
Fall 2000

Abstract:

This study will compare the usability of simple searches between the web interfaces for Amazon.com and the INNOPAC web-based OPAC produced by Innovative Interfaces, Inc. Focusing on able-bodied adult users, the interfaces will be evaluated using two methods. Novice users will participate in direct testing of the software. Expert users will be evaluated through use of GOMS models to predict user behavior.

Introduction:

Amazon.com was founded in 1995 as an online bookseller. Over the last five years it has grown enormously. The company now claims to have over 20 million customer accounts in over 160 countries ( http://www.amazon.com/exec/obidos/subst/misc/company-info.html ), and offers "an aggregate of over 13 million titles in books, music and DVD/video" ( http://yahoo.marketguide.com/mgi/busidesc.asp?nss=yahoo&rt=busidesc&rn=A13EF ). Amazon claims to be "the leading online shopping site" and has become a business to be studied and emulated.

Most users access Amazon.com through a web browser, where they search or browse for books, music or videos in order to obtain more information and evaluate them for possible purchase. Library patrons use OPACs (online public access catalogs), which are also adopting web interfaces, for much the same reason. Although Amazon.com and OPACs serve the same basic function, there are some fundamental differences in their business and technological models, which may result in different user experiences.

Amazon.com makes much of its money through direct sales. Its users are its customers, which provides Amazon with direct monetary feedback on the success or failure of its interface. Because Amazon.com has centralized control over its web site, it can track user interactions and change its interface more dynamically. (For an amusing example of this, see http://www.amazon.com/exec/obidos/subst/home/all-stores-ballot.html )

Although OPACs are primarily used by library patrons, it is the library staff who make purchasing decisions. Library staff have different computing needs from patrons, and may have other priorities in selecting an OPAC beyond ease-of-use, such as support for existing legacy systems. Because OPACs are sold to client libraries, which then customize and maintain them locally, it is more difficult for the manufacturers to detect problems in use or to roll out user interface improvements in a timely manner.

One potential advantage OPAC interfaces may have over Amazon.com is accessibility. The American Library Association's Code of Ethics has long included a call for "equitable access" and libraries are public accommodations required to obey the Americans with Disabilities Act. Thus, OPACs must be usable by a diverse population, including the elderly and disabled. It is within Amazon.com's economic interests to reach those groups as well, but it has no legal requirement to do so. Matson & Sullivan (2000) found "a weak suggestion that there may be a fundamental relationship between content accessibility and overall usability" but theirs was a preliminary study with further research still needed.

Statement of the Problem:

The purpose of this study is to determine whether behind-the-scenes differences in business and technical models are reflected in the user experience. Is Amazon.com easier to use than a web-based OPAC? Because ease-of-use covers such a large area, this study will focus on searching behavior, which is one of the primary uses of both interfaces.

Innovative Interfaces' INNOPAC was chosen as the OPAC to compare with Amazon.com. Selection criteria for the OPAC included market share and currency. Testing with an OPAC that was popular but outdated wouldn't be equitable, but testing with the most recent system would be meaningless if few libraries use it.

According to the listing of web-based OPACs on Webcats ( http://www.lights.com/webcats ), more libraries, including Simmons, are using INNOPAC than any other web-based OPAC. The listing shows nearly twice as many libraries using INNOPAC as the runner-up. Also, the Innovative Interfaces web site ( http://www.iii.com ) focuses on new products and updates for web OPAC use.

Review of Related Research:

Because Amazon.com has been such a successful business, there have been countless studies of its interface, geared towards interface designers and other companies looking to follow the Amazon model.

There are fewer direct reviews of the INNOPAC interface. Some librarians have written about their own experiences adopting or evaluating the program, but there are no generalized reviews of the system.

A few people in the library science field have noted Amazon.com's success and written articles on lessons libraries can learn from Amazon. These include, but are not limited to, the interface design.

Jascó (1998) suggested that Amazon should sell "customized versions of the Amazon software" to libraries with conversion packages. Amazon could offer a "very competitive price" in exchange for a hotlinked Amazon logo that will repeat a patron's library search against the full sale site.

Coffman (1999) recommended enhancing catalog records with more information, such as "cover art, jacket blurbs, selections from the text, links to reviews, customer comments, author interviews and articles, and any other content that would help a person decide whether to request a particular book." He also thought librarians "would want to arrange the records into all kinds of different browsing categories" beyond the current subject index. But Coffman was far more interested in expanding interlibrary loans, offering self-service checkout, revamping collection development, and other ways libraries might learn from and emulate Amazon's success. It was an ambitious idea, and responses to this article can be found in many library journals.

However, as interesting as these ideas are, neither article touches on the actual search process, which is what this study will examine.

Research Design:

Hypothesis:

The primary hypothesis for this study is that Amazon.com is easier to use than an OPAC. Several sub-hypotheses will test this conjecture.

  1. Novice users can complete basic searches faster using Amazon.com than INNOPAC.
  2. Expert users can complete basic searches faster using Amazon.com than INNOPAC.
  3. Novice users will receive correct search results more often using Amazon.com than INNOPAC.
  4. Novice users will be satisfied with the search results more often using Amazon.com than INNOPAC.
  5. Novice users will express a subjective preference for the Amazon.com interface over the INNOPAC interface.

Methodology:

Testing will focus on able-bodied adult users and not deal with the additional problems faced by populations with special needs, such as children, disabled, illiterate, nonnative speakers, and similar groups. This user base will be divided into two groups -- novice and expert users. If similar results can be garnered across the extremes of user experience, this can hopefully be extended to users in the middle range. Different methods will be used to test each group.

Novice users: Novice users will participate in usability testing, which is a standard method used in the computer industry to evaluate user interfaces.

Users will be given a series of tasks they must perform using the two interfaces. Observers will take notes on the user's experience and time how long it takes to complete each task. Users will be encouraged to "think aloud" as they proceed. After each task and at the conclusion of the test, the user will be asked a series of questions to elicit their subjective reactions.

Usability testing is a professional field, and numerous books exist explaining how to conduct tests properly.

Participants will be screened via a questionnaire to determine their level of computer and library experience and ensure demographic diversity. Selected users will be those who do not use web browsers at home or as part of their jobs and who make only occasional use of libraries (monthly or less). Chisman, Diller, & Walbridge (1999) provide a sample screening questionnaire in their article which will be used as a model. From this, approximately a dozen subjects will be selected to take part in the full tests.

Although one dozen users may seem a small sample from a survey perspective, research has shown that it should be sufficient for usability testing. Nielsen & Landauer (1993) studied the number of problems found in usability tests as a function of the number of users testing.

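$$\text{Found}(n) = N \left(1 - (1 - L)^{n}\right)$$

Found(n) = number of usability problems found by testing n users

N = total number of usability problems

L = proportion of usability problems discovered while testing a single user

n = number of users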

Testing with a large number of users is not as effective, because even if all users find the same number of problems, later evaluations tend to have more overlap. Nielsen (2000) noted that if the average tester finds 31% of the usability problems, then testing with only five users will uncover 85% of the problems and ten users will find 97.5% of problems. Likewise, Chisman, Diller, & Walbridge wrote that "both literature and [OCLC Usability Lab Director Mike] Prasse indicated that eight participants would identify 80 percent of the problems users might have with the system." Although these papers examine usability testing to find problems in a piece of software, rather than comparing usability of two different programs, they do demonstrate that this study does not need as large a sample population as suggested by Powell (1997).
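As a rough check of the figures above, the model can be evaluated directly. The sketch below assumes the share of problems found is 1 - (1 - L)^n (the Nielsen & Landauer expression divided by N) and uses L = 0.31 from Nielsen's example. The function name is illustrative and not taken from any of the cited papers.

```python
def proportion_found(single_user_rate: float, users: int) -> float:
    """Expected share of all usability problems found after testing `users` people.

    single_user_rate is L, the share of problems one user uncovers.
    """
    if not 0.0 <= single_user_rate <= 1.0:
        raise ValueError("single_user_rate must be between 0 and 1")
    if users < 0:
        raise ValueError("users must be non-negative")
    # Each additional user misses a share (1 - L) of the problems.
    return 1.0 - (1.0 - single_user_rate) ** users


if __name__ == "__main__":
    # L = 0.31 is the example value quoted from Nielsen.
    for n in (1, 5, 8, 10, 12):
        print(f"{n:2d} users: {proportion_found(0.31, n):.1%}")
```

With L = 0.31 the curve flattens quickly, which is consistent with the claim that a dozen users should be enough for this kind of test.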

Selected users will be scheduled for private, one-hour tests in the usability lab (see "Institutional Resources" below).

A "facilitator" will greet all users with a standard opening statement explaining the users' role, and emphasizing that these tests are to evaluate the system, and not them. Users will be presented with a Tester's Bill of Rights based on Arlov (1997) and asked to sign a consent form to allow the test to be recorded. The facilitator will then instruct the users in how to think out loud, and will provide a brief introduction to the computer system and setup.

Before each test, the lab computer will already have a web browser running maximized and open to the main screen of the first interface to be tested. To minimize the risk of context errors, half the users will begin testing with the Amazon.com interface, and the other half will see the INNOPAC interface first.
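The split of starting interfaces could be drawn up before the sessions are scheduled. The sketch below is one possible way to do it, assuming random assignment (the proposal only requires an even split) and using hypothetical participant labels.

```python
import random

INTERFACES = ("Amazon.com", "INNOPAC")


def assign_starting_interface(participants, seed=None):
    """Randomly split participants so each interface is seen first by half of them."""
    if len(participants) % 2 != 0:
        raise ValueError("an even number of participants is needed for an equal split")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {}
    for index, person in enumerate(shuffled):
        # First half of the shuffled list starts with Amazon.com, the rest with INNOPAC.
        assignment[person] = INTERFACES[0] if index < half else INTERFACES[1]
    return assignment


if __name__ == "__main__":
    # Hypothetical labels for a dozen participants.
    people = [f"P{i}" for i in range(1, 13)]
    for person, first in sorted(assign_starting_interface(people, seed=2000).items()):
        print(person, "starts with", first)
```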

Every user will be given a folder with each task written on a separate piece of paper. Upon instruction from the facilitator, users will take out the first task, read it aloud, and then do whatever actions they think are necessary to complete the task, stopping when they believe they have succeeded. Reading the task aloud ensures that users understand what they are supposed to do and helps observers to time the task.

A sample task might say "Find information about the book titled X." [Specific authors, titles and subjects will be provided in the actual tests, based upon their availability in the selected library OPAC and Amazon databases.]

The facilitator will allow the user to work uninterrupted. If the user asks for help, the facilitator will only provide neutral responses. Observers will take notes on the user's progress, indicating areas of the process where the user has problems or areas which work particularly well. Users will be instructed to indicate when they think they have successfully completed the tasks.

The facilitator will ask the user questions to measure their satisfaction with the process and results. Bopp & Smith (1995) note that user satisfaction does not always depend upon the accuracy of the results, so both must be taken into account. The observers will decide whether the received result was correct; however, this will not be conveyed to the users. [Some possible outcomes include users mistakenly giving up too soon, finding the wrong record, not recognizing the correct record as the answer and continuing to search, and so on.]

Then, users will go on to the subsequent tasks in turn. These will include searches by author, subject, author and title, and sorting by date. Each task will be conducted in the same manner.

After five tasks have been completed, the users will switch to the other interface (to Amazon.com if they were previously using INNOPAC, to INNOPAC if they were using Amazon). They will then conduct another five tasks using this interface. These tasks will be identical in function to the previous tasks, but with different authors, titles and subjects.

Users will then return to the first interface for more tasks, reinforcing the existing skills and adding a few more complex elements, such as date limitations. Overall, users will be presented with a total of thirty tasks, split equally between the two interfaces.

After completing all the tasks, the facilitator will ask the user to comment on both interfaces in general, as opposed to looking at each specific task. Users will be asked whether they prefer one interface over the other, and if a preference exists, which one and why. Users will also be given the opportunity to ask questions of their own.

Expert users: Advanced users are more likely to be familiar with one or the other interface, but may not have equal experience with both. Without an unbiased population, user testing may not provide reliable results. Therefore, instead of testing expert users directly, their performance will be estimated using quantitative modeling techniques.

As users become more familiar with an interface, it can be assumed that they will discover faster ways to use that interface. Calculating the fastest and most efficient searches can simulate what a highly experienced user will encounter.

Card, Moran and Newell (1983) developed GOMS models (Goals, Operators, Methods for achieving the goals, Selection rules for choosing methods) to describe user behavior. This method has also proven reliable as a tool to predict the amount of time required for users to execute those tasks.

Given a task and method of performing that task, the Keystroke-Level Model can be used to predict "the time an expert user will take to execute the task using the system, providing he uses the method without error." In brief, the time it takes to perform a task is the sum of the times to perform the gestures that comprise that task. By means of laboratory experiments, Card et al. determined a set of timings and heuristic rules for different operations: pressing a key (K), pointing with a mouse (P), homing hand(s) on keyboard or mouse (H), mentally preparing for the next step (M), and waiting for system response (R).

Keystroke-level modeling will be applied to every task in the usability tests to determine how to conduct the fastest search in each interface.
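To illustrate how such an estimate is assembled, the sketch below sums operator times for one search. The timings are commonly cited Keystroke-Level Model values (K = 0.2 s, P = 1.1 s, H = 0.4 s, M = 1.35 s) and should be checked against Card et al. before use. The operator sequence and the two-second response time are hypothetical and were not measured from either interface.

```python
# Commonly cited KLM operator times in seconds (assumed; verify against Card et al.).
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key or mouse button
    "P": 1.1,   # point with the mouse
    "H": 0.4,   # home hands on keyboard or mouse
    "M": 1.35,  # mentally prepare for the next step
}


def klm_time(sequence: str, response_seconds: float = 0.0) -> float:
    """Sum the operator times for a sequence such as 'HMPK'.

    Each 'R' in the sequence adds response_seconds, since system
    response time depends on the system being tested.
    """
    total = 0.0
    for operator in sequence:
        if operator == "R":
            total += response_seconds
        elif operator in OPERATOR_SECONDS:
            total += OPERATOR_SECONDS[operator]
        else:
            raise ValueError(f"unknown operator: {operator!r}")
    return total


if __name__ == "__main__":
    # Hypothetical title search: home on the mouse, think, point at the search
    # box and click, home on the keyboard, think, type ten characters, think,
    # press Enter, then wait for the results page.
    search = "HMPK" + "HM" + "K" * 10 + "MK" + "R"
    print(f"Estimated time: {klm_time(search, response_seconds=2.0):.2f} s")
```

Running the same function over the fastest method for each task in each interface would give the paired times needed for the comparison.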

Treatment of the data:

Novice users: Once all testing is complete, every task will have been conducted in both interfaces by six users each. For each task/interface combination, researchers can calculate the average time needed for completion, the success ratio, and user satisfaction rating. These can then be compared with the numbers for that same task conducted against the other interface.
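The per-task figures described above could be tabulated along the following lines. This is a sketch only: the record layout, the field names, and the 1-to-5 satisfaction scale are assumptions, and the sample records are made up for illustration rather than drawn from any test.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Group trial records by (task, interface) and compute the three measures."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["task"], record["interface"])].append(record)

    summary = {}
    for key, trials in groups.items():
        summary[key] = {
            "mean_seconds": mean(t["seconds"] for t in trials),
            # True counts as 1, so the sum is the number of correct results.
            "success_ratio": sum(t["correct"] for t in trials) / len(trials),
            "mean_satisfaction": mean(t["satisfaction"] for t in trials),
        }
    return summary


# Made-up records for illustration only.
SAMPLE = [
    {"task": 1, "interface": "Amazon.com", "seconds": 40.0, "correct": True, "satisfaction": 4},
    {"task": 1, "interface": "Amazon.com", "seconds": 60.0, "correct": False, "satisfaction": 2},
    {"task": 1, "interface": "INNOPAC", "seconds": 50.0, "correct": True, "satisfaction": 3},
]

if __name__ == "__main__":
    for (task, interface), measures in sorted(summarize(SAMPLE).items()):
        print(task, interface, measures)
```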

Different tasks on the same interface can be compared across time, to judge the learning curve. The earliest tasks are expected to take longest, and there may be bias towards whichever interface the user encounters first, but it should be possible to compare how users' times and success rates improve.

At the end of each test session, users are asked to express an overall preference for one system over the other. This will be checked against the users' starting interface to see if there is a bias towards the interface encountered first.
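One simple way to run this check is to cross-tabulate each user's starting interface against the interface they preferred. The sketch below assumes a hypothetical session record with `started_with` and `preferred` fields, where `preferred` is `None` when the user expressed no preference. The sample sessions are invented.

```python
from collections import Counter


def preference_by_start(sessions):
    """Count (starting interface, preferred interface) pairs."""
    return Counter((s["started_with"], s["preferred"]) for s in sessions)


def share_preferring_first(sessions):
    """Fraction of users with a stated preference who chose the interface they saw first."""
    decided = [s for s in sessions if s["preferred"] is not None]
    if not decided:
        return None
    chose_first = sum(1 for s in decided if s["preferred"] == s["started_with"])
    return chose_first / len(decided)


# Invented sessions for illustration only.
SAMPLE_SESSIONS = [
    {"started_with": "Amazon.com", "preferred": "Amazon.com"},
    {"started_with": "Amazon.com", "preferred": "INNOPAC"},
    {"started_with": "INNOPAC", "preferred": "INNOPAC"},
    {"started_with": "INNOPAC", "preferred": None},
]

if __name__ == "__main__":
    for pair, count in sorted(preference_by_start(SAMPLE_SESSIONS).items(), key=str):
        print(pair, count)
    print("Share preferring the first interface seen:", share_preferring_first(SAMPLE_SESSIONS))
```

A share well above one half would hint at a bias towards the first interface, although with only a dozen users any such reading would be tentative.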

If a clear difference is shown between the two interfaces, observers can use their notes and go back to the videotapes to determine specific areas within the interface that may have helped or hindered users. This qualitative analysis can then be used as a basis for future interface improvements.

Expert users: Once keystroke-level modeling has established an optimal time for each task against each interface, these numbers can be compared to see if either interface is consistently faster than the other.

Institutional Resources:

Lotus Development Corporation, where I work, has a usability lab which would be used to conduct the tests. This room features computers with Internet connections, cameras trained on the user and screen to videotape the tests, and a separate observer's room behind a one-way mirror.

Limitations of the Study:

This study only compares Amazon.com with one Web-based OPAC. This same methodology could be used to compare other OPACs with Amazon.com or with other OPACs. Also, other tests could be conducted, further analyzing search through different means or focusing on other areas of the interface (browsing rather than searching, for example).

There is also room to study other populations besides able-bodied adults, such as children, elderly, disabled, and other groups.

Works Cited:

Arlov, L. (1997). GUI design for dummies. Foster City, CA : IDG Books.

Bopp, R. & Smith, L. (1995). Reference and information services (2nd ed.). Englewood, Colorado : Libraries Unlimited.

Card, S., Moran, T., & Newell, A. (1983). The Psychology of human-computer interaction. Hillsdale, NJ : Lawrence Erlbaum Associates.

Chisman, J., Diller, K., & Walbridge, S. (1999, November). Usability testing: a case study. College & research libraries, 60 (6), 552-569.

Coffman, S. (1999, March). Building Earth's LARGEST library: driving into the future. Searcher, 7 (3), 34-47.

Coffman, S. (1999, July/August). The Response to "Building Earth's largest library." Searcher, 7 (7), 28-32.

Hancock-Beaulieu, M., Robertson, S., & Neilson, C. (1990). Evaluation of online catalogues: an assessment of methods. West Yorkshire, UK : British Library Board.

Jascó, P. (1998, November). If I were Amazon's Jeff Bezos, I would... Information today, 15 (10), 28-29.

Matson, R. & Sullivan, T. (2000, November 16-17). Barriers to use: usability and content accessibility on the web's most popular sites. Proceedings of ACM Conference on Universal Usability. Washington : ACM. 139 - 144.

Nielsen, J., & Landauer, T. (1993, April 24-29). A Mathematical model of the finding of usability problems. Proceedings of ACM INTERCHI '93 conference. Amsterdam : ACM. 206-213.

Nielsen, J. (2000, March 19). Why you only need to test with 5 users. Retrieved from the World Wide Web: http://www.useit.com/alertbox/20000319.html

Powell, R. (1997). Basic research methods for librarians. London : Ablex Publishing.

Raskin, J. (2000). The Humane interface. Reading, MA : ACM Press.




Copyright © 2000 - 2004 Elisabeth Riba,
All Rights Reserved