The Knowledge Banks
Data warehouses turn nuggets of information into research gold mines, transforming scattered records into useful profiles and analyses. The system saves agencies time, staff and money, but the initial plunge comes with a price.
Like many other government organizations, the Transportation Department until recently found it difficult, if not impossible, to share quality information among the numerous offices and agencies that live beneath its umbrella. One reason is that subordinate agencies like the Federal Aviation Administration and the National Highway Traffic Safety Administration control their own operational systems, data and formatting standards. Although these individual systems were good at collecting, tracking and maintaining important details, they were generally incompatible with other systems and therefore inaccessible to personnel outside the agency.
Senior managers were unable to obtain the strategic, enterprise-wide information they needed to make important decisions. And even when data was available, it often arrived late and in an almost unusable form. For example, agencies usually relayed their monthly budget data in accounting documents whose different codes and numbers required a great deal of translation and analysis.
"At the beginning of this Administration, there was no real management reporting available for financial information," says Richard Gates, a financial specialist in Transportation's Office of the Assistant Secretary for Budgets and Programs. "You could do a lot of work pulling things together by hand and then having someone type that information into spreadsheets, but you could never get a full picture of what was going on in the department in any sort of timely manner."
The solution, DOT officials decided, was to build a data warehouse, a system that uses complicated hardware, software and user tools to pull (actually, borrow) strategic information from disparate operational databases into one large centralized "warehouse" that everyone can access. With all the hard facts in one place, specialized tools can manipulate the data to deliver not just quantitative but qualitative information (trends, reports, analysis, correlations and aberrations) to management.
Once set up, a data warehouse delivers tangible benefits almost immediately, including the ability to do more work with fewer employees, a reduction in paperwork and access to information that wasn't available previously.
DOT uses its data warehouse to track individual program budgets, determine how much money is obligated to each state and see what grants are issued to which people for what purposes. The data warehouse also generates monthly reports that summarize trends.
"What we've found so far in tracking our data is that there were some things that just came up in the course of a year where we were able to catch problems while they were still small and allocate funds accordingly or get people to cut back in various areas when necessary," Gates says.
Senior Transportation officials are sold on data warehousing, having already developed plans to pull in data from additional sources so agency performance can be measured and budgets formulated.
What is a Data Warehouse?
In many ways, the term "warehouse" in the context of this technology is a misnomer, implying that data is going to be stored away for long periods of time. In reality, it should be thought of more as a mall or mart, because information is transported from out-of-the-way and in some cases forgotten data sources to a place where it can actually be seen, browsed through and utilized. What's more, the shoppers in this store will find not just codes, numbers and raw data, but nicely packaged information.
"The ultimate goal here is to put information as close as possible to the person who needs it or who wants to make use of it," says George Wilson, a computer specialist helping to develop a data warehouse for the Health Care Financing Administration.
Data warehousing is analogous to landscaping a backyard, says Pat Garvey, deputy director of the information management division at the Environmental Protection Agency. "When you're figuring out what kind of look you want for your yard, you don't just look at one corner at a time, you take in the whole view."
As classically defined, a data warehouse involves pulling data from defined sources and integrating that data so that it is no longer simply operational but strategic information. One reason for such a narrow interpretation is that until recently, data warehousing was used almost exclusively in the business world, most notably in retail outlets like Wal-Mart. Corporate managers with an eye on the bottom line utilize the technology to detect consumer buying habits and formulate strategic marketing plans accordingly. For example, by scanning each sale into a data warehouse, grocery stores have determined that men in their 20s who purchase beer on Fridays after work are also likely to buy a pack of diapers. Thus, a display of Pampers or another brand might be set up in the beer aisle, or merchants will put one (but not both) of the products on sale on Friday evenings.
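The beer-and-diapers finding comes down to counting which items turn up in the same sale. What follows is only a minimal sketch of that idea in Python, not a description of any retailer's actual system; the baskets and the function names `pair_counts` and `confidence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanned sales: each set is one customer's basket.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("beer", "diapers") and ("diapers", "beer") share one key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, given, then):
    """Share of baskets containing `given` that also contain `then`."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return 0.0
    return sum(1 for b in with_given if then in b) / len(with_given)


counts = pair_counts(baskets)
beer_diaper_conf = confidence(baskets, "beer", "diapers")
```

In this made-up sample, two of the three baskets containing beer also contain diapers. A real warehouse would run the same kind of count over millions of receipts and would likely filter by time of day and customer profile as well.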
When it comes to the federal government, however, a data warehouse's function is harder to pigeonhole, mainly because the public sector's mission is a lot more diverse than collecting stacks of sales receipts. Certainly, a large number of agencies can benefit by using data warehousing to track trends and help make enterprise-wide decisions, as seen within the Transportation Department, but many agencies intend to use it for purposes that don't really fit the traditional terminology.
Federal Warehouses Grow
Most technology watchers believe data warehousing and government agencies are a perfect pair. That's because data warehouses work best for large organizations with mission-critical data scattered in a variety of incompatible systems, as is the case with most federal agencies. Moreover, because the technology allows distinct but related data to come together in one place, it promises to overcome the government's unflattering reputation, says one industry watcher, of "having a right hand that doesn't know what the left is doing." When Social Security checks continue to be mailed out to a recipient who is no longer living, it's because death certificates are recorded by one system and benefits paid out by another. And never the twain shall meet, until now.
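Once the two sets of records sit side by side, the Social Security problem becomes a simple matching exercise. The sketch below is illustrative only: the record layouts, the shared `person_id` key, the dates and the function name `payments_after_death` are all assumptions, not a description of how any agency's systems actually work.

```python
# Hypothetical extracts from two separate systems, keyed by a shared person ID.
death_records = [
    {"person_id": "A100", "date_of_death": "1996-03-14"},
    {"person_id": "C300", "date_of_death": "1996-07-02"},
]
benefit_payments = [
    {"person_id": "A100", "paid_on": "1996-04-01", "amount": 612.00},
    {"person_id": "B200", "paid_on": "1996-04-01", "amount": 540.00},
    {"person_id": "C300", "paid_on": "1996-06-01", "amount": 498.00},
]


def payments_after_death(deaths, payments):
    """Return payments dated after the recipient's recorded date of death."""
    death_by_id = {d["person_id"]: d["date_of_death"] for d in deaths}
    flagged = []
    for payment in payments:
        died = death_by_id.get(payment["person_id"])
        # ISO-format dates (YYYY-MM-DD) compare correctly as plain strings.
        if died is not None and payment["paid_on"] > died:
            flagged.append(payment)
    return flagged


flagged = payments_after_death(death_records, benefit_payments)
```

Here only the payment to A100 is flagged, because it is dated after that person's recorded death. The hard part in practice is not the comparison but getting both systems to agree on an identifier in the first place.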
Early on, the competitive business world kept a tight, almost proprietary hold on data warehousing, and for some time, the federal government was slow to jump on the bandwagon. Over the last year, however, agencies have begun developing full-blown data warehouses.
The Health Care Financing Administration, for example, plans to use the data warehouse it's developing to help find fraud and abuse in the Medicare program. The Defense Investigative Service uses a data warehouse to assign work to field offices and set priorities for personnel security investigations. The Education Department hopes to use data warehousing to create a one-stop shop for education statistics.
"We in the federal government don't have a bottom line per se," says Gloria Parker, director of the information resources management group at the Education Department. "A big part of our business is collecting and disseminating information and providing service to our constituents."
Parker notes that a prototype has been developed utilizing real data from the National Center for Education Statistics. If approved by senior officials, she says, the data warehouse will eventually be expanded to include 15 principal offices.
"We want to make it so that the public or anyone else that needs this information doesn't have to run around to a lot of different places looking for it," she says. "In this way, they can access through the Internet or some other source, look at the data, query it, download it, do with it whatever it is that they need to do, but at the same time, be confident that the information they're viewing is complete and correct."
The EPA is already reaping benefits from using a data warehouse, known as Envirofacts. It brings together all the agency's regulatory data in categories like water quality, air quality, solid wastes and hazardous wastes. Users can judge the environmental health of a facility or a location.
"Previously, the public had a hard time determining whether the corporation down the street was a good neighbor or not because the information on any violations were kept in different places," says Garvey. "Now you can go into the system, look under that corporation's name and see their smokestack release, see what chemicals they've put in their landfills, see what pollutants might be emitted in their water."
Setting Up a Data Warehouse
One reason the federal government has been cautious in creating data warehouses has to do with the cost and complexity of building them. Data warehouses cost anywhere from $50,000 to $4 million, depending on how robust the undertaking is. Also, human and political complications arise when agencies begin defining data sources and standardizing their data.
The biggest problem lies in the fact that each legacy, or established, system labels the same categories differently. Gender, for example, may be distinguished as "M" or "F," "0" or "1," or some other identifier. A department could be labeled Finance by one system and Department 6 by another. At EPA, "when it came to referring to a regulated entity, our Superfund folks used the word 'site,' our water folks used the word 'facility' and another system used the word 'company,' " says Garvey. "It complicates things considerably."
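One common way to reconcile such labels is a lookup table for each source system that translates its local codes into a single warehouse vocabulary. The following is a rough sketch under stated assumptions: the system names, the standard terms, the function name `scrub` and, in particular, which numeric code stands for which gender are all hypothetical.

```python
# Per-system lookup tables. Which code means what is an assumption here;
# in practice each mapping would come from the owning system's documentation.
GENDER_MAPS = {
    "system_a": {"M": "male", "F": "female"},
    "system_b": {"0": "male", "1": "female"},
}
DEPARTMENT_MAPS = {
    "system_a": {"Finance": "finance"},
    "system_b": {"Department 6": "finance"},
}


def scrub(record, source):
    """Return a copy of `record` with gender and department in warehouse terms.

    Codes missing from the lookup table become None so they can be
    reviewed by a person rather than guessed at.
    """
    gender_code = str(record["gender"]).strip()
    department_code = str(record["department"]).strip()
    return {
        **record,
        "gender": GENDER_MAPS[source].get(gender_code),
        "department": DEPARTMENT_MAPS[source].get(department_code),
        "source": source,
    }


a = scrub({"name": "Lee", "gender": "F", "department": "Finance"}, "system_a")
b = scrub({"name": "Kim", "gender": 1, "department": "Department 6"}, "system_b")
```

After scrubbing, both records carry the same gender and department values even though they started out in different notations. Keeping a separate table per source avoids guessing when two systems use the same code for different things.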
As with any project, setting up a data warehouse involves comprehensive planning. Steps that should be taken include the following:
- Commitment, understanding and input should be obtained from key people, especially those who run the databases and who might feel threatened by this move onto their turf.
- Consider such issues as: What is the business objective? Who will be served? What types of analysis tools will they need in using the data?
- The sources of data required need to be identified. What operational systems, external databases or flat files from some other data source are going to populate the data warehouse?
- The data within the various systems must be "scrubbed" so it is consistent. This is the most laborious and time-consuming aspect of building a data warehouse. Formatting issues, like how gender and other simple fields are going to be identified, are tough enough, but scrubbing also involves standardizing how statistics are computed and making sure decisions derived from the warehouse are based on the most appropriate data.
"The rule of thumb is that you've got to pick a data warehousing project small enough in size that you can accomplish something meaningful within a year and then slowly expand from there," says Will Workman, director of strategic marketing for decision support in the government division of Oracle Corp., which sells information management software. He says at least one unnamed government organization trying to set up an ambitious data warehouse has spent the last three years struggling with complicated data definitions and scrubbing.
Data warehousing promises a number of advantages to government organizations including decreased operating costs, increased sharing among internal offices and outside agencies, reduced burden on employees and citizens who research information, and faster turnaround for projects.
Those already setting up data warehouses, however, point to what may be the most critical benefit. "Data warehousing will help take up the slack as we continue to downsize our personnel," says William Hughes, deputy director of Investigations Control and Automation for the Defense Investigative Service. "People will have information readily accessible and they can do the job they were hired to do rather than spend their time making a bunch of phone calls and running around trying to dig the information up."