Data Roundup
In 1994, the Patent and Trademark Office was dotted with what Chief Information Officer Dennis Shaw calls "islands of automation." PTO had data all over the place, and its data storage technology required 12 workers to spend most of their time checking the status of stored data and moving it among various systems and servers.
To lasso all that information and provide better access to patent and trademark records, PTO decided that year to move to a more modern and modular systems architecture. "This means that we manage the infrastructure separate from the applications and manage the data separate from the applications," Shaw says. "Each one of these components has a different life cycle. But the data never wears out."
Moving to a new technology architecture meant PTO had to get a handle on its data. So, in 1996, it decided to create a central data storehouse. But the move away from the older optical storage technology has not been easy, especially with almost 50 terabytes of data, more than the combined contents of every book in the Library of Congress.
As agencies do more business via their computer systems, they are creating ever-larger masses of information. And new forms of data, such as pictures and audio files, require more storage capacity than simple text or numbers. As a result, data storage requirements are growing exponentially.
IT managers must choose wisely among the different models for controlling their data. If they don't, agencies face being crushed by an information avalanche.
The New Data Center
There is some good news amid the data deluge. Unit costs for data storage products are dropping. The cost of a gigabyte of storage has dropped 25 percent since the beginning of 1999, according to Forrester Research Inc., a market research company in Cambridge, Mass. But the amount of data is growing faster than costs are dropping, so total storage expenditures are increasing rapidly. Forrester expects storage spending to grow from 4 percent of IT budgets this year to 17 percent by 2003.
Rising expenditures are accompanied by rising expectations. Users at hundreds or even thousands of individual PCs expect to retrieve data from agency servers instantly, 24 hours a day. Taking servers out of action to upgrade them or back them up can be tough. The problem is compounded by the lack of consistency in where data is stored. With today's fast data communications, a user can store data on his or her own PC, or on networked servers.
Storage accounts for 75 percent of server prices today, according to Forrester. But storage isn't necessarily what servers do best. IT managers have found that if they stop using servers for storage and leave them to do what they do best, which is processing, then managing and safeguarding the data becomes much easier. The next step is bringing all that data to a central storage device. This device should have features that protect important data even if internal components fail, and it needs high-speed connections so users can get data easily and quickly. Storage management software can be added to the mix as well.
Two new storage technologies are helping agencies get a handle on their heterogeneous data. Both are based on redundant array of independent disks (RAID) technology. RAID protects data by spreading it across multiple hard drives so that if any single drive fails, its data can be reconstructed from the remaining drives.
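To make the idea concrete, here is a minimal sketch of the parity scheme behind common RAID levels (the striping-with-parity approach used in RAID 4/5). The drive contents and stripe layout are invented for illustration; real arrays implement this in controller hardware or firmware, not in application code.

```python
# Toy illustration of RAID-style parity: data blocks are spread across
# drives plus one XOR parity block, so the contents of any single failed
# drive can be rebuilt from the survivors. Illustrative only; real RAID
# controllers do this in hardware or firmware.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def write_stripe(data_blocks):
    """Return what actually gets stored: the data blocks plus a parity block."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover_block(surviving_blocks):
    """Rebuild the one missing block by XORing everything that survived."""
    return xor_blocks(surviving_blocks)

# Example: a stripe across three data drives plus one parity drive.
stripe = write_stripe([b"PATENT  ", b"DRAWINGS", b"1790-99 "])
# Suppose the drive holding b"DRAWINGS" (index 1) fails.
rebuilt = recover_block([blk for i, blk in enumerate(stripe) if i != 1])
assert rebuilt == b"DRAWINGS"
```

Mirroring (RAID 1) achieves the same protection more simply, by keeping a full copy of every block on a second drive at the cost of usable capacity.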
The first hot storage technology that uses RAID is network-attached storage (NAS), which attaches to a local area network (LAN) or wide area network. NAS is more than a place to park data and forget about it: it gives users access to centrally stored data that is safeguarded through RAID, and unlike a typical server, the NAS unit doesn't have to be brought down whenever data needs to be backed up. Plus, NAS is easy to install.
"NAS is nothing more than another way of consolidating data," Reardon says. "NAS is really a file server approach to storage." Servers typically provide onboard data processing, and application processing and serving. The difference is that file serving is all NAS does.
Vendors offering NAS include EMC Corp., IBM Corp., Procom Technology Inc. and Quantum Corp. PTO's is one of the most distinctive NAS implementations in government because it switched storage technology from optical drives to a RAID with more than 50 terabytes of capacity.
PTO's information must be available at all times for patent and trademark examiners, who have research quotas and who research applications for patents and trademarks on various types of intellectual property. PTO stores 30 million patent documents on its computer system, including all U.S. patent documents back to 1790 as well as collections of European and Japanese patents. These files contain images, such as engineering drawings and chemical formulas.
Once PTO put its research data on NAS, "we saw the average search time go from 28 to 9 seconds," Shaw says. That translates into improved productivity for examiners. The move from optical storage to NAS also enabled Shaw to redirect about a dozen people to other IT duties. NAS system operation requires fewer, but more highly skilled, people, he says.
Right behind NAS in the race to replace old-fashioned data storage methods are storage area networks (SANs). Robert Gray, a storage systems analyst at International Data Corp., a market research company in Framingham, Mass., says the SAN market will grow from $3.7 billion this year to $11.4 billion in 2002.
A SAN features a central, free-standing set of RAID disk drives. This central unit, or cluster of units, is connected to network servers via Fibre Channel, a high-speed data connection.
A SAN uses software to monitor and allocate data. "The SAN concept removes the requirement of a storage device being dedicated to a particular environment or server," Reardon says. "The concept moves storage as an element of the network. The network now includes servers, clients and storage devices." SANs move data to a more prominent position in the enterprise.
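Reardon's point, that storage stops being dedicated to one server and becomes a shared element of the network, can be pictured as one pool of capacity carved into volumes on demand. The sketch below is a conceptual model only, with made-up server names and sizes; it is not a representation of any vendor's SAN management software.

```python
# Conceptual model of SAN-style pooling: one shared pool of disk capacity
# is carved into volumes and assigned to whichever server needs space,
# instead of each server owning its own dedicated drives.
# Names and numbers are illustrative only.

class StoragePool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # volume name -> (server, size_gb)

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(size for _, size in self.allocations.values())

    def allocate(self, volume: str, server: str, size_gb: int) -> None:
        """Carve a volume out of the shared pool for a given server."""
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.allocations[volume] = (server, size_gb)

    def reassign(self, volume: str, new_server: str) -> None:
        """Hand a volume to another server without touching the data."""
        _, size_gb = self.allocations[volume]
        self.allocations[volume] = (new_server, size_gb)

# Example: a 200 GB pool shared by an NT server and a Unix server.
pool = StoragePool(capacity_gb=200)
pool.allocate("reports", server="nt_server", size_gb=80)
pool.allocate("images", server="unix_server", size_gb=60)
pool.reassign("reports", new_server="unix_server")  # no data migration needed
print(pool.free_gb)  # 60 GB still unallocated
```

The reassign step captures the practical appeal: capacity can follow the workload instead of being stranded on whichever server happened to own the drives.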
Prices in the Fibre Channel and SAN market are dropping at 40 percent per year, says Gray. And as the technology has matured, prices have come down into the range where agencies are willing to take chances. SAN vendors with federal business include EMC Corp., IBM Corp. and Xiotech Corp.
While SANs are promising, there are concerns about this emerging technology. "I'm more concerned with making Fibre Channel work before I pursue a SAN," says PTO's Shaw. Standards are still being hashed out to enable Fibre Channel, SAN technology and storage management software to work with products from multiple vendors.
On top of those potential problems, SANs are not easy to install: connecting the SAN to the network and ensuring access to it from every node takes real work. "SANs are written about more than actually done," says Shaw.
Nevertheless, some experts are dazzled by SANs' promise. "IT executives have lost control of storage," says Gray. "SANs give them back a modicum of control." The government has a number of early SAN adopters.
One is the Library of Congress, which this fall began installing a SAN system produced by EMC Corp. The SAN should be up and running in the spring of 2000.
The library stores 5 million digitized items on the Web, including Civil War-era photographs and even a picture of the first telegraph message, sent by Samuel Morse, which said, "What hath GOD wrought?" Pictures and manuscripts are digitized using scanning technology. The library also has analog audio files that have been converted into digital format for storage and online access.
"Whenever you get into the digitizing of analog materials, you're talking about a massive amount of storage-the files get big really big," says Judy Stork Kittleman, deputy director of information technology at the library. "One of the major things that has driven our move to a SAN is the exponential growth of our storage requirements."
When Kittleman started working at the library 12 years ago, she recalls, "the entire computer room was jammed full of disk drives for old mainframe-attached storage devices. Altogether, it amounted to half a terabyte of storage." Now the library has more than 60 terabytes of RAID disk storage in its data center, and it takes up just 10 percent of the space the old mainframe storage occupied.
Until now, all the library's data storage had to be dedicated to a specific server, either a PC-style server running Microsoft Corp.'s Windows NT operating system or a Unix computer from IBM Corp. With data now being centralized in an EMC Corp. SAN, data from the Unix and NT computers can be saved and retrieved on the same storage system. "We're hoping for much more operational efficiency," says Kittleman.
"Data access speed is one main reason for doing this SAN [implementation]," says Dwight Beeson, chief of the library's Systems Engineering Group. "The centralized management of our storage that this SAN provides is also important." Beeson says the sharing of data between unlike servers is important.
Managing Data
Government IT managers and storage vendors stress the importance of managing data. Though PTO and the Library of Congress have taken different data-storage paths, the roads lead to the same place. Both have chosen to centralize data, thereby reducing the staffing needed to administer storage, providing better data redundancy and enabling uninterrupted user access to data.
PTO and the library have massive data management requirements, but even smaller agencies are centralizing their data.
The Agriculture Department's National Agricultural Statistics Service (NASS) had six servers, each with self-contained disk drives and a tape backup system. But the servers were 5 years old, and both memory and disk space were exhausted. To perform data backups, IT managers had to take the servers out of service, which meant the 350 NASS workers creating reports on agricultural production didn't always have access to their data.
So NASS bought two new servers for application processing and shifted all data over to a small SAN. "We needed a new solution, and one of our desires was to go to central storage," says Dave Losh, a NASS computer specialist. "We wanted data redundancy and the ability to expand as our needs changed." NASS now uses 200 gigabytes of storage but has room to grow.
NASS' SAN has also changed the computer staff. "We have less people and there is less maintenance," Losh says.
The Interior Department's Fish and Wildlife Service (FWS) consolidated its five servers into a single production server and a backup server and installed a SAN. "The servers' lives are three years, [but] the [SAN's] is five to 10 years," says Floyd Child, LAN manager at FWS.
FWS workers use computer-aided design and geographic information systems, which require substantial data. The information "has to be actively available," says Patricia Percy, an FWS computer specialist. "The SAN is the heart and soul of our 200-person office," Child says. "The data on the SAN is manipulated every day."
"We are trying to reduce administration costs," Child says. "We're going from five administrators to one or two."
Agencies are seeing the rebirth of the data center. But it's a smarter data center that takes up less room. While managing storage is getting easier and may require fewer people, the task will never go away.