A team of Penn State and Iowa State researchers has tested and rated three “smart” classification methods capable of detecting the telltale patterns of entry and misuse left by the typical computer network intruder. They found that one, called “rough sets,” currently overlooked by the industry, is the best.
The researchers report that computer security breaches have risen significantly in the last three years. In February 2000, Yahoo, Amazon, E-Bay, Datek and E-Trade were shut down due to denial-of-service attacks on their web servers.
The U.S. General Accounting Office (GAO) reports that about 250,000 break-ins into Federal computer systems were attempted in one year and 64 percent were successful. The number of attacks is doubling every year and the GAO estimates that only one to four percent of these attacks will be detected and only about one percent will be reported.
Dr. Chao-Hsien Chu, associate professor of information sciences and technology and of management science and information systems at Penn State, began the study when he was on the faculty at Iowa State University.
The results were published in the current issue (Vol. 32, No. 4) of the journal Decision Sciences. His Iowa State co-authors are Dr. Dan Zhu, assistant professor of management information systems; Dr. G. Premkumar, associate professor of management information systems; and Xiaoning Zhang, Chu’s former master’s student.
“No network security system or firewall can ever be completely foolproof,” Chu says. “So there is always a need for a ‘watchdog’ to patrol the network and signal when an intrusion occurs. Commercially available ‘watchdog’ systems depend on traditional statistical techniques. However, the newer ‘smart’ methods promise to have a significant impact on accuracy.”
Even the cleverest intruder leaves electronic footprints when breaking into a secure computer data network, such as one holding bank, medical or credit records. The new “smart” methods can collect information from a variety of sources within the network, “learn” the patterns typical of a perpetrator trying to gain a level of control similar to that of the people who legitimately operate the network, and make a reasoned prediction about whether a given pattern represents an intrusion.
The team focused on three “smart” approaches, known as data mining techniques, namely: neural nets, inductive learning and rough sets. All three data mining techniques can collect information, “learn” and make reasoned predictions.
Neural nets and inductive learning have previously been used in intrusion detection, and research by others has found these methods to be effective. Chu notes that rough sets, a relatively new approach, had not previously been applied to intrusion detection.
The researchers say their study is the first to evaluate and compare multiple data mining methods, including rough sets, in the intrusion detection context.
The researchers report that the rough sets method does not require any preliminary or additional information about the data and can work with missing values and less expensive or alternative sets of measurements. The method can work with imprecise values where a pair of lower and upper approximations replaces imprecise or uncertain data.
It is also able to discover important facts hidden in the data and express them in the natural language of decision rules. A powerful method for characterizing complex multidimensional patterns, rough sets has been successfully applied in knowledge acquisition, forecasting and predictive modeling, and decision support.
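The idea of lower and upper approximations can be illustrated with a minimal Python sketch. It is not the researchers' implementation; the session records, attribute names and the `approximations` function are hypothetical and chosen only for illustration. Records with identical attribute values are treated as indistinguishable. The lower approximation holds the records that are certainly intrusions, and the upper approximation holds those that are possibly intrusions.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough-set approximations of `target`.

    objects:    dict mapping a record name to a dict of attribute values
    attributes: attribute names used to decide which records look alike
    target:     set of record names belonging to the concept of interest
    """
    # Records with identical values on the chosen attributes are
    # indiscernible, so they fall into the same equivalence class.
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # every record in the class is in the target
            lower |= block
        if block & target:    # at least one record in the class is in the target
            upper |= block
    return lower, upper


# Hypothetical toy data: five sessions described by two coarse attributes.
sessions = {
    "s1": {"failed_logins": "high", "root_shell": "yes"},
    "s2": {"failed_logins": "high", "root_shell": "yes"},
    "s3": {"failed_logins": "high", "root_shell": "no"},
    "s4": {"failed_logins": "high", "root_shell": "no"},
    "s5": {"failed_logins": "low", "root_shell": "no"},
}
intrusions = {"s1", "s2", "s3"}  # assumed labels, for illustration only

lower, upper = approximations(sessions, ["failed_logins", "root_shell"], intrusions)
boundary = upper - lower  # records the attributes cannot classify with certainty
```

In this toy data, sessions s3 and s4 look identical but only s3 is labeled an intrusion, so both land in the boundary region rather than in the lower approximation.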
In their study, the team used data from sendmail, a privileged program in use at virtually every Unix site that has email. They write, “The data includes both normal and abnormal traces. The normal trace is a trace of the sendmail daemon and several invocations of the sendmail program. During the period of collecting these traces, there are no intrusions or any suspicious activities happening. The abnormal traces contain several traces including intrusions that exploit well-known problems in Unix systems.”
The average classification accuracy rate for the three programs was as follows: rough sets 75.68 percent accurate; neural nets 69.78 percent accurate; and inductive learning 51.16 percent accurate.
In addition, the team found that training the programs on equal amounts of normal and abnormal sequences leads to better learning and more accurate classification. Whether the data was represented as binary values or as integers (neural nets cannot use both) did not significantly affect performance.
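One simple way to build a training set with equal amounts of normal and abnormal sequences is to downsample the larger class. The Python sketch below is an assumption about how such balancing could be done, not a description of the team's procedure; the `balance` function, the labels 0 and 1, and the example sequences are all hypothetical.

```python
import random


def balance(normal, abnormal, seed=0):
    """Return a shuffled list of (sequence, label) pairs with equal class counts.

    Label 0 marks a normal sequence and label 1 an abnormal one.
    The larger class is randomly downsampled to the size of the smaller.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = min(len(normal), len(abnormal))
    pairs = [(seq, 0) for seq in rng.sample(normal, n)]
    pairs += [(seq, 1) for seq in rng.sample(abnormal, n)]
    rng.shuffle(pairs)
    return pairs


# Hypothetical sequences of integer event codes, for illustration only.
normal_seqs = [(4, 2, 66, 66), (2, 66, 66, 4), (66, 66, 4, 138), (66, 4, 138, 66), (4, 138, 66, 5)]
abnormal_seqs = [(5, 5, 5, 105), (105, 104, 104, 106)]

training = balance(normal_seqs, abnormal_seqs)
n_normal = sum(1 for _, label in training if label == 0)
n_abnormal = sum(1 for _, label in training if label == 1)
```

Downsampling discards some normal data, so this is only one of several possible ways to reach an even split.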
They conclude, “The tremendous growth in the Internet and electronic commerce has created serious challenges to network security. Advances in data mining and knowledge discovery provide new approaches to network intrusion detection.”
[Contact: Dr. Chao-Hsien Chu, A'ndrea Elyse Messer]
07-Mar-2002