Tel: 706-542-3265 (ask for customer support)
Fax: 706-542-4807
*Status: product. Relatively inexpensive
*Updated: 1994
*Name: Knowledge Seeker
*Description: A data-mining tool that extracts multiple cause-and-effect
relationships from a data set and displays them interactively as a graphic
decision tree. It is designed for the analysis of industrial-strength,
"real-world" data sets. Price $899 (Windows) and $799 (DOS)
*Source: AI Expert, April 1994
*Platform(s): Windows, DOS
*Contact: Angoss Software, 430 King St., W., Suite 201, Toronto M5V 1J5,
Canada, (416) 593-1122, fax (416) 593-5077
*Status: product.
*Updated: 1994-03-21
*Name: MLC++
*Description: A Machine Learning Library in C++.
A library of C++ classes and tools for supervised classification
learning.
While MLC++ provides general learning algorithms that can be used by
end users, the main objective is to provide researchers and experts
with a wide variety of tools that can accelerate algorithm
development, increase software reliability, provide comparison tools,
and display information visually.
More than just a collection of existing algorithms, MLC++ is an
attempt to extract commonalities of algorithms and decompose them for
a unified view that is simple, coherent, and extensible.
*Induction algorithm: Decision trees, decision graphs, decision tables,
nearest-neighbor (instance-based methods), naive Bayes,
perceptron, winnow.
*Other tools: Accuracy estimation (holdout, cross-validation, bootstrap),
feature subset selection (wrapper),
discretization algorithms (binning, entropy, 1R).
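As an illustration of the accuracy-estimation tools listed above, here is a minimal Python sketch of k-fold cross-validation. It is not MLC++ code (MLC++ is a C++ library); the names k_fold_indices, cross_val_accuracy, and majority_learner are hypothetical, and the majority-class learner is only a stand-in for a real induction algorithm.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(train_fn, X, y, k=5, seed=0):
    """Fraction of examples classified correctly while held out.

    train_fn(X_train, y_train) must return a predict(x) callable.
    """
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = train_fn([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    return correct / len(X)

def majority_learner(X_train, y_train):
    """Hypothetical baseline: always predict the most common training label."""
    most_common = max(set(y_train), key=y_train.count)
    return lambda x: most_common
```

Each example is used for testing exactly once, so the estimate uses all of the data without ever testing on an example the model was trained on.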
*Source: Ron Kohavi.
*Platform(s): Unix, Sun (ObjectCenter C++). Could be ported to
other unix machines/compilers, but requires good template support.
*Contact:
Ronny Kohavi ronnyk@CS.Stanford.edu
*Status: public domain (object code), source code available to
selected sites.
*Updated by: Ron Kohavi on 1995-02-23
*Name: OC1
*Description: multivariate decision tree induction system
*Comments: OC1 (Oblique Classifier 1) is a multivariate decision tree
induction system designed for applications where the instances have
numeric feature values. OC1 builds decision trees that contain linear
combinations of one or more attributes at each internal node; these
trees then partition the space of examples with both oblique and
axis-parallel hyperplanes. OC1 has been used for classification of
data from several real world domains, such as astronomy and cancer
diagnosis. A technical description of the algorithm can be found in
the AAAI-93 paper by Sreerama K. Murthy, Simon Kasif, Steven Salzberg
and Richard Beigel. A postscript version of this paper is provided
with the package.
OC1 is written entirely in ANSI C. It incorporates a number of
features intended to support flexible experimentation on real and
artificial data sets. We have provided support for cross-validation
experiments, generation of artificial data, and graphical display of
data sets and decision trees. The OC1 software allows the user to
create both standard, axis-parallel decision trees and oblique
(multivariate) trees.
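The difference between the two kinds of tree can be sketched in a few lines of Python. This is an illustration only, not OC1 code (OC1 is written in C); the function names and the "<= 0" sign convention are assumptions made for the example.

```python
def axis_parallel_test(x, feature, threshold):
    """Univariate test: compare a single attribute with a threshold."""
    return x[feature] <= threshold

def oblique_test(x, weights, bias):
    """Multivariate test: which side of the hyperplane w.x + b = 0 is x on?"""
    return sum(w * xi for w, xi in zip(weights, x)) + bias <= 0

# An axis-parallel test is the special case of an oblique test with a single
# nonzero weight: x[0] <= 2.0 is the same as 1*x[0] + 0*x[1] - 2.0 <= 0.
point = (1.5, 4.0)
same = axis_parallel_test(point, 0, 2.0) == oblique_test(point, (1.0, 0.0), -2.0)
```

An internal node of an oblique tree stores the weights and bias; an internal node of an axis-parallel tree stores only one feature index and one threshold.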
*Platform(s):
*Contact:
TO OBTAIN OC1 BY ANONYMOUS FTP
The latest version of OC1 is available free of charge, and may be
obtained via anonymous FTP from the Department of Computer Science at
Johns Hopkins University.
To obtain a copy of OC1, type the following commands:
UNIX_prompt> ftp blaze.cs.jhu.edu
[Note: the Internet address of blaze.cs.jhu.edu is 128.220.13.50]
Name: anonymous
Password: [enter your email address]
ftp> bin
ftp> cd pub/oc1
ftp> get oc1.tar.Z
[This announcement is also contained in pub/oc1.]
ftp> bye
[Place the file oc1.tar.Z in a convenient subdirectory.]
UNIX_prompt> uncompress oc1.tar.Z
UNIX_prompt> tar -xf oc1.tar
[Read the file "README", to get cues to other documentation files, and
to run the programs.]
If you have any comments, questions or suggestions, please contact
Sreerama K. Murthy or
Steven Salzberg or
Simon Kasif
Department of Computer Science
The Johns Hopkins University
Baltimore, MD 21218
Email: murthy@cs.jhu.edu (primary contact)
salzberg@cs.jhu.edu
kasif@cs.jhu.edu
OC1 IS INTENDED FOR NON-COMMERCIAL PURPOSES ONLY. OC1 may be used,
copied, and modified freely for this purpose. Any commercial use of
OC1 is strictly prohibited without the express written consent of
Sreerama K. Murthy, Simon Kasif, and Steven Salzberg, at the
Department of Computer Science, Johns Hopkins University.
*Status: public domain, prototype
*Updated: Tue, 12 Oct 93 from salzberg@blaze.cs.jhu.EDU
*Name: SE-Learn
*Description: An SE-tree-based induction and classification tool.
Set Enumeration (SE) trees provide the basis for an induction and
classification framework which generalizes decision trees. In this
framework, called SE-Learn, rather than splitting according to a
single attribute, one recursively branches on all (or most) relevant
attributes. A single SE-tree economically embeds many decision trees,
supporting a more expressive representation. SE-Learn benefits from
many techniques developed for decision trees, e.g.,
attribute-selection and pruning measures. In particular, SE-Learn can
be tailored to start off with anyone's favorite decision tree, and
then improve upon it via further exploring the SE-tree. This
hill-climbing algorithm allows trading time/space for added accuracy.
Current studies show that SE-trees are particularly advantageous in
domains where (relatively) few examples are available for training,
and in noisy domains. Finally, SE-trees provide a unified framework
for combining induced knowledge with knowledge available from other
sources.
- Rymon, R.
(1993), An SE-tree-based Characterization of the Induction
Problem. In Proc. of the Tenth International Conference on Machine
Learning, Amherst MA, pp. 268-275.
- Rymon, R.
(1994), On Kernel Rules and Prime Implicants. Proc. of the Twelfth
National Conference on Artificial Intelligence, Seattle WA,
pp. 181-186.
- Rymon, R.
(unpublished), Where Do SE-trees Perform? (Part I).
A LISP implementation of SE-Learn, not to be used for any commercial
purpose, is freely available from Ron Rymon. It includes a choice of
exploration policies and resolution criteria, as well as hill-climbing
from common decision trees: Gini index (CART), Information Measure and
Gain Ratio (ID3, C4.5), and Chi Square statistic (CHAID). An enhanced
C version is currently under development by Modeling Labs.
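The set-enumeration idea behind SE-trees can be sketched as follows. This is a simplified Python illustration, not the LISP implementation: it enumerates subsets of a list of distinct attribute names, whereas SE-Learn itself works with attribute-value information and applies selection and pruning measures. The function name se_tree is hypothetical.

```python
def se_tree(items, node=()):
    """Yield every subset of `items` exactly once, in depth-first order.

    A node is extended only with items that come after its last member in
    the given ordering, so no subset is ever generated twice. `items` must
    contain distinct values.
    """
    yield node
    start = items.index(node[-1]) + 1 if node else 0
    for item in items[start:]:
        yield from se_tree(items, node + (item,))

subsets = list(se_tree(["a", "b", "c"]))
```

For three attributes the tree has 8 nodes, one per subset; a decision tree that tests "a", then "b", then "c" corresponds to a single root-to-leaf direction inside it.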
*Discovery methods: Classification
*Platform(s):
- LISP version available in source code. Successfully used with
Lucid and Allegro dialects.
- Enhanced C version will work on Unix and DOS/Windows platforms.
*Contact: Ron Rymon (Rymon@ISP.Pitt.edu)
*Name: XPERTRule
*Description: Inductive Rule Learning,
*Comments:
*Platform(s): Windows ?
*Contact: CINCOM Systems, 1-800-543-3010
*Status: product.
*Updated: 1994-03-15
*Name: @Brain
*Description: Neural network tool
*Platform(s): DOS, Windows ?
*Status: product.
*Contact: Talon Development, 414-962-7246
*Name: 4Thought
*Description: neural net based tool to make predictions in financial and
marketing environments. Intended for data-knowledgeable users, who know
little or nothing about neural nets.
*Comments: Price/performance ratio appears poor (6500 British pounds),
but the reviewer describes it as "fun to use". The only neural net used
is a basic MLP with one or two hidden layers, and either a small or a
large learning step size. No other parameters can be set, and also the
export of graphical / model information from 4Thought is very restricted.
*Source: Review of `4Thought' / by A. Harvey and S. Toulson. -
International Journal of Forecasting (Amsterdam) 10 (1994.06) nr.1
p.35-41 (7 refs)
*Platform(s): Windows, fast 486 PC, preferably math-coprocessor
*Contact: Right Information Systems Ltd, 9 Westminster Palace Gardens,
Artillery Row, London SW1P1RL, UK
*Status: product
*Updated by: Sandra Oudshoff on 1994-10-20
*Name: AIM
*Description: A modeling tool that uses abductive modeling technology to
learn relationships from a database of examples. Uses 1-, 2-, and
3-dimensional polynomials.
*Platform(s): Windows, DOS
*Contact: Abtech Corp., 508 Dale Ave, Charlottesville, VA,
22903, (804) 977-0686, fax (804) 977-9615
*Status: product.
*Updated: 1994-03-15
*Name: BrainMaker
*Description: tool for training backprop neural nets
*Discovery methods: Neural Networks (back-prop)
*Comments:
BrainMaker package includes:
The book Introduction to Neural Networks
BrainMaker Users Guide and reference manual
300 pages, fully indexed, with tutorials and sample networks
Netmaker
Netmaker makes building and training Neural Networks easy, by
importing and automatically creating BrainMaker's Neural Network
files. Netmaker imports Lotus, Excel, dBase, and ASCII files.
BrainMaker
Full menu and dialog box interface, runs Backprop at 750,000 cps
on a 33 MHz 486.
*Source: comp.ai.neural-nets FAQ
*Platform(s): DOS, Windows, Mac
*Contact:
Company: California Scientific Software
Address: 10024 Newtown rd, Nevada City, CA, 95959 USA
Phone: 800-284-8112 or 916 478 9040
Tech Support: 916 478 9035
Fax: 916 478 9041
Email: calsci!mittmann@gvgpsa.gvg.tek.com (flakey connection)
*Status: product.
*Updated: 1994-10-27 by GPS
*Name: MATLAB Neural Network Toolbox
*Description: a complete engineering environment for neural network
research, design, and simulation. Offers over fifteen proven network
architectures and learning rules.
*Discovery methods: Classification
*Comments: Includes backpropagation, perceptron, linear, recurrent,
associative, and self-organizing networks. Fully open and customizable.
*Source: Product ad in PC AI, Nov/Dec 1994
*Platform(s): PCs, Macs, and Workstations
*Contact: The Math Works, 24 Prime Park Way, Natick, MA 01760-1500
fax: 508-653-6284, e-mail: info@mathworks.com.
A very nice WWW page is at
http://www.mathworks.com
*Status: product
*Updated: 1994-12-29 by Gregory Piatetsky-Shapiro
*Name: ModelWare
*Description:
*Comments: proprietary modeling algorithm
*Platform(s): Windows, DOS, ?
*Contact: Teranet IA, 800-663-8611
*Status: product
*Updated: 1993
*Name: N-train
*Description: statistical and Neural network tool
*Comments:
*Platform(s): Windows, DOS, ?
*Contact: Scientific Consultant Services, 516-696-3333
*Status: product
*Updated: 1993
==== Classification: Rule Discovery Approach
*Name: Datalogic/R
*Description: Software for data mining and decision support
using a rough set-based system for knowledge discovery,
predictive modeling, and reasoning.
*Platform(s): MS-DOS
*Contact: Reduct Systems, Regina, Canada. (306) 586-9408,
fax (306) 586 9442.
*Status: product.
*Updated: 1994-03-15
*Name: Data Surveyor
*Description:
Data Surveyor
is a data mining tool for the discovery of
strategically relevant information from large databases.
*Discovery methods: Induction of classification rules
*Source: information by author
*Comment: uses a separate front and back-end. The front end directs the
mining process. The back-end is a fast, parallel, main memory
database server, which performs all massive data handling.
*Platform(s): Back-end currently runs on (parallel) Unix systems,
front-end runs on Unix workstations and MS-Windows.
*Contact:
Marcel Holsheimer
CWI, P.O. Box 94079
1090 GB Amsterdam
The Netherlands.
E-mail: marcel@cwi.nl
tel. +31-20-592 4134, fax +31-20-592 4199,
*Status: product
*Updated by: Marcel Holsheimer on 1994-12-20
*Name: IDIS
*Description:
IDIS is the Information Discovery System that analyzes databases by
itself and discovers patterns and rules. IDIS automatically decides
what to look at, generates hypotheses, discovers hidden and unexpected
patterns, rules of knowledge, graphs and anomalies. The results are
displayed within a hypermedia environment.
IDIS examines databases with a set of built-in data analysis
algorithms that automatically form hypotheses about what is relevant.
It then tests the hypotheses to generate interesting and unexpected
rules and graphs that characterize the database. The automatic
hypotheses formation and testing cycle continues until important rules
and patterns emerge.
IDIS pre-analyzes large databases to discover the important graphs to
be displayed. The analyses of IDIS may be focused by the user towards
a specific task or IDIS may be set to roam freely through the
database. The system outputs rules and graphs which characterize
data. Both numeric and non-numeric data values are shown in two and
three dimensional hypermedia graphs.
*Comments:
IDIS works on several databases such as Oracle, Sybase, etc. It works
both in client server and stand alone models. IDIS has discovered
more rules in more applications than any other program. Discoveries
have been made by IDIS in many areas such as point-of-sales data,
quality control, finance, banking, the petroleum industry,
agriculture, science, business forecasting, forest fire prevention,
chemical structure identification, securities trading, crime detection
and medical diagnosis, among others.
*Source: IntelligenceWare
*Platform(s): Windows, DOS, Unix, various SMP and MPP systems.
*Contact:
IntelligenceWare
5933 West Century Blvd
Los Angeles, CA 90045
tel. (310) 216-6177, fax (310) 417-8897
*Status: Commercial product since 1991. Previously the IXL system.
IDIS is the new system.
*Updated: 1994-07-22
*Name: PQ->R
*Description: A program for Computer Aided Induction of general rules
(if .. then) from cases. Functions: 1) automatic inductive classification of a
minimal chain of independent variables that would predict a user selected
dependent variable 2) interactive construction of hypotheses based on a
lookahead facility
*Comments: Can handle only up to 1200 cases and 50 variables with up to 15
attributes each
*Source: Sandra Oudshoff
*Platform(s): DOS, minimum 286 with 500K free program memory
*Contact: Finite Epistemics, Passeerderstraat 76, 1016 XZ
Amsterdam, Holland, tel +31 20 624 7137
*Status: product
*Updated by: Sandra Oudshoff on 1994-07-27
==== Classification: Genetic Algorithm approach
*Name: GAAF
*Description: GAAF is a genetic algorithm based tool for the approximation
of mathematical formulae out of raw data. These formulae then capture the
relationship within these data. GAAF overcomes the problems of neural
network or statistical approximation methods in several ways. It can
generate a symbolic representation for any kind of function, including
discontinuous ones. Overfitting is avoided by the ability to separate raw
data in a development and a validation sample and by implementing several
powerful statistical robustness tests. By representing the generated models
in a mathematical, simple to understand form the generated models are easy
to explain or analyse.
*Comments:
*Source: Product folder
*Platform(s): High performance IBM PC compatible machine under Windows 3.1
*Contact: Cap Volmac, Division Service Development, Dolderseweg 2, 3712 Huis
ter Heide, Holland, tel +31 3404 35411, fax +31 3404 31174
*Status: product
*Updated by: Sandra Oudshoff on 1994-07-27
*Name: FUGA
*Description: FUGA, Financial modelling Using Genetic Algorithms, is a
financial modelling tool based on the GAAF toolbox. FUGA allows for the
development of models in divergent financial domains such as credit scoring,
risk management and product marketing. Additional functionalities allow
financial operators to speed up the search process and it has reporting
facilities targeted at financial managers.
*Comments:
*Source: product folder
*Platform(s): High performance IBM PC compatible machine under Windows 3.1
*Contact: Cap Volmac, Division Service Development, Dolderseweg 2, 3712 Huis
ter Heide, Holland, tel +31 3404 35411, fax +31 3404 31174
*Status: product
*Updated by: Sandra Oudshoff on 1994-07-27
==== Classification: Nearest Neighbour
*Name: PEBLS
*Description: PEBLS is a nearest-neighbor learning system designed for
applications where the instances have symbolic feature values. PEBLS
has been applied to the prediction of protein secondary structure and
to the identification of DNA promoter sequences. A technical
description appears in the article by Cost and Salzberg, Machine
Learning journal 10:1 (1993).
*Comments:
Version 3.0 incorporates a
number of additions to version 2.1 (released in 1993) and to the
original PEBLS described in the paper:
S. Cost and S. Salzberg. A Weighted Nearest Neighbor
Algorithm for Learning with Symbolic Features,
Machine Learning, 10:1, 57-78 (1993).
PEBLS 3.0 now makes it possible to draw more comparisons between
nearest-neighbor and probabilistic approaches to machine learning, by
incorporating a capability for tracking statistics for Bayesian
inferences. The system can thus serve to show specifically where
nearest-neighbor and Bayesian methods differ. The system is also able
to perform tests using simple distance metrics (overlap, Euclidean,
Manhattan) for baseline comparisons. Research along these lines was
described in the following paper:
J. Rachlin, S. Kasif, S. Salzberg, and D. Aha. Towards a Better
Understanding of Memory-Based and Bayesian Classifiers.
Proceedings of the Eleventh International Conference on Machine
Learning (pp. 242-250). New Brunswick, NJ, July 1994, Morgan
Kaufmann Publishers.
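The three baseline metrics named above (overlap, Euclidean, Manhattan) can be sketched in Python as follows. This is an illustration, not PEBLS code (PEBLS is written in C), and it does not implement the weighted metric of the Cost and Salzberg paper; the helper nearest and the example data are hypothetical.

```python
import math

def overlap(a, b):
    """Count the positions where two symbolic feature vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def manhattan(a, b):
    """Sum of absolute differences between numeric feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, examples, distance):
    """Return the label of the stored (features, label) pair closest to query."""
    features, label = min(examples, key=lambda ex: distance(query, ex[0]))
    return label
```

Because the metric is passed in as a parameter, the same nearest-neighbor routine can be rerun with each metric to produce the kind of baseline comparison described above.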
*Source: ML list
*Platform(s): PEBLS 3.0 is written entirely in ANSI C.
It is thus capable of running on a wide range of platforms.
*Contact:
TO OBTAIN PEBLS BY ANONYMOUS FTP
________________________________
The latest version of PEBLS is available free of charge, and may
be obtained via anonymous FTP from the Johns Hopkins University
Computer Science Department.
To obtain a copy of PEBLS, type the following commands:
UNIX_prompt> ftp blaze.cs.jhu.edu
[Note: the Internet address of blaze.cs.jhu.edu is 128.220.13.50]
Name: anonymous
Password: [enter your email address]
ftp> bin
ftp> cd pub/pebls
ftp> get pebls.tar.Z
ftp> bye
[Place the file pebls.tar.Z in a convenient subdirectory.]
UNIX_prompt> uncompress pebls.tar.Z
UNIX_prompt> tar -xf pebls.tar
[Read the files "README" and "pebls_3.doc"]
For further information, contact:
Prof. Steven Salzberg
Department of Computer Science
Johns Hopkins University
Baltimore, Maryland 21218
Email: salzberg@cs.jhu.edu
*Status:
PEBLS 3.0 IS INTENDED FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY.
PEBLS 3.0 may be used, copied, and modified freely for this purpose.
Any commercial or for-profit use of PEBLS 3.0 is strictly prohibited
without the express written consent of Prof. Steven Salzberg,
Department of Computer Science, The Johns Hopkins University.
*Updated by: GPS on 1994-10-20
==== Classification: Other approaches
*Name: Clementine
*Description: Based on a visual programming interface which links data
access, manipulation and visualisation together with machine learning
(decision tree induction and neural networks).
Trained rules and networks can be exported as C source code.
Uses a graphical 'building block' approach to develop
applications. Underlying technologies include decision tree induction
and neural networks.
*Source: Product brochure.
*Platform(s): Sun, DEC, HP, SG
*Contact:
Colin Shearer (colin@isl.co.uk), Tom Khabaza (tomk@isl.co.uk)
Integral Solutions Ltd, 3, Campbell Court, Bramley,
Basingstoke RG26 5EG, UK
Phone: +44 1256 882028 Fax: +44 1256 882182
*Status: product
*Updated by: Tom Khabaza, 1994/09/20
*Name: DISCOVER-IT
*Description: ?
*Platform(s): DOS, Windows ?
*Contact: SourceCode Inc, 800-294-5840
*Status: product.
*Updated: 1993
*Name: HCV
*Description: HCV is a representative of the extension-matrix family of
attribute-based induction algorithms, originating with J.R. Hong's AE1.
By dividing the positive examples (PE) of a specific concept in a
given example set into intersecting groups and adopting a set of
strategies to find a heuristic conjunctive rule in each group which
covers all the group's positive examples and none of the negative
examples (NE), the HCV algorithm can find a rule in the form of
variable-valued logic for the concept based on PE against NE in
low-order polynomial time. If there exists at least one conjunctive
rule in a given training example set for PE against NE, the rule
produced by the HCV algorithm must be a conjunctive one. The rules in
variable-valued logic generated by the HCV algorithm have been shown
empirically to be more compact than the decision trees or their
equivalent decision rules produced by the ID3 algorithm (the
best-known induction algorithm to date) and its successors in terms of
the numbers of conjunctive rules and conjunctions.
The term ``HCV'' in this description indicates the current
implementation (Version 1.0) of the HCV algorithm in SICStus Prolog
which runs on SUN3, SPARC and DEC workstations. In this
implementation, HCV can classify more than 2 classes of examples by
incorporating the AQ technique developed in the
generalization-specialization strategy based family of induction
algorithms. It takes a set of pre-classified training examples
(vectors of attribute-values) as its input and produces a set of rules
as its output classifying the training examples. It also allows you to
evaluate the rules' accuracy in terms of a set of pre-classified
testing examples.
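The requirement that a conjunctive rule cover all positive examples of a group and none of the negative examples can be sketched in Python. This does not reproduce HCV's search strategy or its Prolog implementation; the rule representation (a dictionary mapping each attribute to a set of allowed values, in the spirit of variable-valued logic) and the function names are assumptions made for illustration.

```python
def covers(rule, example):
    """A conjunctive rule holds when every listed attribute takes an allowed value."""
    return all(example[attr] in allowed for attr, allowed in rule.items())

def is_consistent(rule, positives, negatives):
    """True when the rule covers every positive example and no negative one."""
    return (all(covers(rule, e) for e in positives)
            and not any(covers(rule, e) for e in negatives))

# Hypothetical data: [colour in {red, orange}] AND [size in {small}]
rule = {"colour": {"red", "orange"}, "size": {"small"}}
positives = [{"colour": "red", "size": "small"},
             {"colour": "orange", "size": "small"}]
negatives = [{"colour": "red", "size": "large"},
             {"colour": "blue", "size": "small"}]
```

The same covers test, applied to a set of pre-classified testing examples, gives the rule accuracy mentioned above.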
*Comments: To use the program, you must prepare your training and testing
examples in the form of ASCII files in a fixed format. During the
program execution, all you need to do is provide your file names. All
outputs and some intermediate results are given on the screen and
stored in a file that you specify.
References:
X. Wu, HCV User's Manual (Release 1.0 June 1992), DAI Technical Paper
No. 9, 30 pp., Department of Artificial Intelligence, University of
Edinburgh, 1992.
X. Wu, The HCV Induction Algorithm, Proceedings of the 21st ACM
Computer Science Conference, S.C. Kwasny and J.F. Buck (Eds.), ACM
Press, U.S.A., 1993, 168--175.
*Platform(s): It runs on SUN3, SPARC and DEC workstations under Unix or
Ultrix with SICStus Prolog, and PCs under the DOS environment.
*Contact: Xindong Wu, Dept of Computer Science, James Cook University,
Townsville, Australia Qld 4812
Email: xindong@cs.jcu.edu.au.
*Status: HCV (Version 1.0) is available electronically for academic use at no
cost, and for commercial use by arrangement with the author.
*Updated: 4/94
*Name: Information Harvesting
*Description: A tool for rule discovery in databases ?
*Comments: shown at AAAI-93, but was very expensive (~ $40,000)
*Platform(s): ?
*Contact: Ryan Corp., 53 Wall Street, Fifth Floor, New York, NY 10005
212-858-7730
*Status: product.
*Updated: 1993
*Name: NEXTRA
*Description: Tool for knowledge acquisition for an expert system.
Can synthesize rules from user preferences. Nice graphical abilities.
*Comments: ?
*Platform(s): Mac, ?
*Contact: Neuron Data, 156 University Ave., Palo Alto, CA 94301.
1-800-876-4900
*Status: product
*Updated: 1993
=== Deviation Detection
*Name: EXPLORA
*Description: An interactive system for discovery of interesting patterns
in databases.
*Comment: See also papers in KDD book, 1991, KDD-91 and KDD-93 proceedings.
*Platform(s): Mac
*Contact: Willi Kloesgen, GMD, D-53757 Sankt Augustin, e-mail: kloesgen@gmd.de
If you are interested in using the system, you can get it via anonymous
ftp from ftp.gmd.de in directory gmd/explora: Open a connection to
"ftp.gmd.de" or 129.26.8.90, and transfer the file "Explora.sit.hqx" from the
directory "gmd/explora". The file "README" describes the
installation of Explora. A user manual is included.
*Status: public domain, prototype
*Updated: 1993
=== Dependency Derivation
*Name: TETRAD II
*Description: A multi-module program that assists in the construction of
causal explanations for sample data and their use in prediction. With
continuous variables the program will aid in the search for "path
models" or "structural equation models;" with discrete data the program
will construct and update a Bayes network from sample data and user
knowledge of the domain; the program includes Monte Carlo facilities.
*Comment: Proofs of the asymptotic correctness of all but one of the
search modules are available in P. Spirtes, C. Glymour and R. Scheines,
Causation, Prediction and Search, Springer Lecture Notes in Statistics,
1993. Should be available as of September 1, 1994.
*Source: C. Glymour
*Platform(s): DOS, Unix
*Contact: Erlbaum Statistical Software.
The system is available along with a book
Richard Scheines, Peter Spirtes, Clark Glymour, and Christopher Meek.
TETRAD II: Tools for Discovery.
Lawrence Erlbaum Associates, Hillsdale, NJ, 1994.
*Status: product
*Updated: 1994-07-22
=== Clustering
*Name: AUTOCLASS
*Description: AutoClass is an unsupervised Bayesian classification
system for independent data. It seeks a maximum posterior probability
classification.
*Comment:
AUTOCLASS III - AUTOMATIC CLASS DISCOVERY FROM DATA
( NASA Ames Research Center )
The program AUTOCLASS III, Automatic Class Discovery from Data, uses
Bayesian probability theory to provide a simple and extensible approach to
problems such as classification and general mixture separation. Its
theoretical basis is free from ad hoc quantities, and in particular free of
any measures which alter the data to suit the needs of the program. As a
result, the elementary classification model used lends itself easily to
extensions.
The standard approach to classification in much of artificial
intelligence and statistical pattern recognition research involves
partitioning of the data into separate subsets, known as classes.
AUTOCLASS III uses the Bayesian approach in which classes are described by
probability distributions over the attributes of the objects, specified by
a model function and its parameters. The calculation of the probability of
each object's membership in each class provides a more intuitive
classification than absolute partitioning techniques.
AUTOCLASS III is applicable to most data sets consisting of independent
instances, each described by a fixed length vector of attribute values. An
attribute value may be a number, one of a set of attribute specific
symbols, or omitted. The user specifies a class probability distribution
function by associating attribute sets with supplied likelihood function
terms. AUTOCLASS then searches in the space of class numbers and parameters
for the maximally probable combination. It returns the set of class
probability function parameters, and the class membership probabilities for
each data instance.
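The class membership probabilities mentioned above follow from Bayes' rule once the class descriptions are fixed. The Python sketch below is an illustration only: it assumes one numeric attribute, Gaussian class distributions, and known parameters, and it omits AutoClass's search over class numbers and parameters. The function names and the example classes are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def membership_probabilities(x, classes):
    """Posterior P(class | x) for each (weight, mean, std) class description."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in classes]
    total = sum(joint)
    return [j / total for j in joint]

# Two equally weighted hypothetical classes centred at 0 and 4.
classes = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
halfway = membership_probabilities(2.0, classes)
```

An instance midway between the two classes belongs to each with equal probability rather than being forced into one subset, which is the contrast with absolute partitioning drawn in the text.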
LANGUAGE: LISP
MACHINE REQUIREMENTS: MACHINE INDEPENDENT
PROGRAM SIZE: APPROXIMATELY 18,256 SOURCE STATEMENTS
DISTRIBUTION MEDIA: .25 inch Tape Cartridge in TAR Format
PROGRAM NUMBER: ARC-13180
DOMESTIC - DOCUMENTATION PRICE: $21.00 PROGRAM PRICE: $900.00
INTERNATIONAL - DOCUMENTATION PRICE: $42.00 PROGRAM PRICE: $1800.00
REFERENCES
P. Cheeseman, et al. "Autoclass: A Bayesian Classification System",
Proceedings of the Fifth International Conference on Machine Learning,
pp. 54-64, Ann Arbor, MI. June 12-14 1988.
P. Cheeseman, et al. "Bayesian Classification", Proceedings of the
Seventh National Conference of Artificial Intelligence (AAAI-88),
pp. 607-611, St. Paul, MN. August 22-26, 1988.
J. Goebel, et al. "A Bayesian Classification of the IRAS LRS Atlas",
Astron. Astrophys. 222, L5-L8 (1989).
P. Cheeseman, et al. "Automatic Classification of Spectra from the Infrared
Astronomical Satellite (IRAS)", NASA Reference Publication 1217 (1989)
P. Cheeseman, "On Finding the Most Probable Model", Computational Models
of Discovery and Theory Formation, ed. by Jeff Shrager and Pat Langley,
Morgan Kaufmann, Palo Alto, 1990, pp. 73-96.
R. Hanson, J. Stutz, P. Cheeseman, "Bayesian Classification with
Correlation and Inheritance", Proceedings of 12th International Joint
Conference on Artificial Intelligence, Sydney, Australia. August 24-30,
1991.
*Platform(s): Common Lisp on Unix and Mac.
AUTOCLASS III, ARC-13180, is written in Common Lisp, and is
designed to be platform independent. This program has been
successfully run on Symbolics and Explorer Lisp machines. It has
been successfully used with the following implementations of
Common LISP on the Sun: Franz Allegro CL, Lucid Common Lisp, and
Austin Kyoto Common Lisp and similar UNIX platforms; under the
Lucid Common Lisp implementations on VAX/VMS v5.4, VAX/Ultrix
v4.1, and MIPS/Ultrix v4, rev. 179; and on the Macintosh personal
computer. The minimum Macintosh required is the IIci. This
program will not run under CMU Common Lisp or VAX/VMS DEC Common
Lisp. A minimum of 8Mb of RAM is required for Macintosh
platforms and 16Mb for workstations. The standard distribution
medium for this program is a .25 inch streaming magnetic tape
cartridge in UNIX tar format. It is also available on a 3.5 inch
diskette in UNIX tar format and a 3.5 inch diskette in Macintosh
format. An electronic copy of the documentation is included on
the distribution medium. Domestic pricing is $900 for the program,
and $21 for the documentation -- there is a 50% educational discount.
International pricing is $1800 for the program, and $42 for the
documentation -- there is *no* educational discount.
Sun is a trademark of Sun Microsystems, Inc. UNIX is a registered
trademark of AT&T Bell Laboratories. DEC, VAX, VMS, and ULTRIX are
trademarks of Digital Equipment Corporation. Macintosh is a trademark of Apple
Computer, Inc. Allegro CL is a registered trademark of Franz, Inc.
COSMIC, and the COSMIC logo are registered trademarks of the National
Aeronautics and Space Administration. All other brands and product names
are the trademarks of their respective holders.
*Contact:
AutoClass III is the official released implementation of AutoClass
available from COSMIC (NASA's software distribution agency):
COSMIC
University of Georgia
382 East Broad Street
Athens, GA 30602 USA
voice: (706) 542-3265 fax: (706) 542-4807
telex: 41- 190 UGA IRC ATHENS
e-mail: cosmic@uga.bitnet or service@cossack.cosmic.uga.edu
*Status: product
*Updated: 1994
*Name: COBWEB/3
*Description: A portable implementation of an algorithm for data
clustering and incremental concept formation.
*Source:
COSMIC's PROGRAM CATALOG.
*Platform(s): PC with 16 MB RAM
*Contact: cobweb@ptolemy.arc.nasa.gov or
COSMIC
University of Georgia
382 East Broad Street
Athens, GA 30602 USA
voice: (706) 542-3265 fax: (706) 542-4807
telex: 41- 190 UGA IRC ATHENS
e-mail: cosmic@uga.bitnet or service@cossack.cosmic.uga.edu
*Status: product
*Updated by: GPS, 1994-07
*Name: DataEngine
*Description: Uses fuzzy systems, neural nets and their combination.
Applications are developed by graphically linking together function blocks
(similar to Clementine). Includes:
a) fuzzy clustering methods (uses Fuzzy C-Means or FCM)
b) rule-based fuzzy methods
c) neural nets (backprop and Kohonen)
d) fuzzy neuro methods
e) signal processing module
f) basic module (stat and math functions, spreadsheet data editor)
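For readers unfamiliar with the Fuzzy C-Means method named in item a), the Python sketch below shows the standard FCM membership formula for a single one-dimensional point and fixed cluster centers. How DataEngine implements FCM is not described in the source; the function name and the default fuzzifier m = 2 are assumptions made for illustration.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership of one 1-D point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i = |point - center_i|.
    """
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:  # the point sits on a center: full membership there
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

u = fcm_memberships(1.0, [0.0, 4.0])
```

The memberships always sum to 1, and a point closer to one center receives a larger share of membership in that cluster instead of a hard assignment.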
*Source: AI Watch article, Feb 94
*Platform(s): PC with 16 MB RAM
*Contact: Management Intelligenter Technologien GmbH, Aachen, Germany,
Tel: +49 2408-94 580, Fax: +49 2408-94 582
Price: basic module (DM 2498), signal processing module (DM 598), other
module (DM 998), complete package (DM 7000)
*Status: product
*Updated by: Hing-Yan Lee, hingyan@iti.gov.sg, 1994-06-20
*Name: SDISCOVER
*Description: This tool discovers regular expression style
motifs in each family among a set of families of sequences.
This will soon be extended to trees.
We use the edit distance for sequences as the measure of similarity
and a variety of distance measures including edit
distance, alignment distance and top-down edit distance in the tree case.
On protein data (SWISS-PROT),
we have shown the motifs to be significant by using them successfully
as classifiers.
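The edit distance for sequences mentioned above can be computed with the usual dynamic-programming recurrence. The Python sketch below is an illustration, not SDISCOVER code; it assumes unit costs for insertion, deletion, and substitution, which the source does not specify.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory grows with the length of the second sequence rather than with the product of both lengths.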
*Discovery methods: Generate and Test, Clustering, Sampling.
*Comments: The tool works by encoding the sequences into
a suffix tree and traversing the tree to generate candidate motifs.
We then evaluate the activity of a candidate motif
by comparing it with the sequences in the family.
*Source: from the authors
*Platform(s): Windows, DOS and Unix.
*Contact:
Jason Tsong-Li Wang, Department of Computer and Information
Science, New Jersey Institute of Technology, University Heights,
Newark, NJ 07102, jason@vienna.njit.edu, phone: (201) 596-3396,
fax: (201) 596-5777.
Dennis Shasha, Department of Computer Science,
Courant Institute of Mathematical Sciences, New York University,
251 Mercer Street, New York, NY 10012, shasha@cs.nyu.edu,
phone: (212) 998-3086, fax: (212) 995-4122.
*Status: binary is freely available, prototype,
server available on the Internet.
*Updated by: Jason Wang on 1994-11-17
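The SDISCOVER entry mentions edit distance as the similarity measure and the evaluation of a candidate motif against the sequences of a family. The sketch below shows the standard dynamic-programming edit distance and one plausible way to count how many sequences contain a motif within a given number of edits. The function names and the counting rule are assumptions made for illustration; the entry does not describe SDISCOVER's actual matching procedure.

```python
def edit_distance(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def best_substring_distance(motif, seq):
    """Smallest edit distance between motif and any substring of seq."""
    # A first row of zeros lets a match start anywhere in seq at no cost.
    prev = [0] * (len(seq) + 1)
    for i, cm in enumerate(motif, start=1):
        cur = [i]
        for j, cs in enumerate(seq, start=1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (cm != cs)))
        prev = cur
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)


def occurrence_count(motif, family, max_dist):
    """Count the sequences in family that contain motif within max_dist edits."""
    return sum(best_substring_distance(motif, s) <= max_dist for s in family)
```

For example, `occurrence_count("ABC", ["XXABCXX", "ABDC", "QQQQ"], 1)` counts the first two sequences and rejects the third.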
=== Visualization
*Name: Data Desk
*Description: stat. package with excellent graphics
*Comments: In article <16023@lhdsy1.lahabra.chevron.com>, tgorb@rrc.chevron.com
(Joe Gorberg) writes:
On the mac side a good visualization tool I like and recommend is
Data Desk (you can get it from Egghead and MacWarehouse). It's a
stat. package with excellent graphics for x-y-z rotating plots,
histograms and much more. It really has helped me get value out
of neural nets and understanding the data.
*Platform(s): Mac
*Contact: Egghead and MacWarehouse
*Status: product
*Updated: 1993
*Name: NetMAP
*Description: data mining and visual relationship mapping
*Source: Computing (a UK magazine), 20 Jan 94
*Comments: ?
*Platform(s): ?
*Contact: Software AG
*Status: product
*Updated: 1994-02-01
*Name: PV-Wave
*Description: data visualization tool
*Platform(s): Unix, ?
*Contact: Visual Numerics, 5105 East 41st Avenue, Denver CO 80216-9952,
1-800-447-7147
*Status: product
*Updated: May 1994
*Name: WinViz
*Description: A data analysis tool based on visualization. It uses the
parallel coordinates technique to present multi-dimensional datasets and
provides an interactive visual query facility on the parallel
coordinates.
*Platform(s): Windows
*Contact: Information Technology Institute, 71 Science Park Drive,
Singapore 0511, Republic of Singapore.
*Status: product
*Updated by: Hing-Yan Lee, hingyan@iti.gov.sg, 1994-06-20
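The parallel coordinates technique mentioned in the WinViz entry draws one vertical axis per attribute and one polyline per record. The sketch below computes where each record crosses each axis and models a visual query as a set of brushed value ranges. The function names and the data layout (a list of dicts) are assumptions for illustration, not WinViz's interface.

```python
def axis_positions(rows, columns):
    """Height in [0, 1] at which each record crosses each parallel axis."""
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}

    def scale(c, v):
        span = hi[c] - lo[c]
        # A constant column is drawn at mid-height.
        return 0.5 if span == 0 else (v - lo[c]) / span

    return [[scale(c, r[c]) for c in columns] for r in rows]


def brush(rows, ranges):
    """Keep records whose values fall inside every brushed (low, high) range."""
    return [r for r in rows
            if all(low <= r[c] <= high for c, (low, high) in ranges.items())]
```

Brushing a range on one axis and redrawing only the surviving polylines is what makes relationships across many dimensions visible at once.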
=== Statistics
*Name: BBN Cornerstone
*Description: A user-friendly, integrated software package for
accessing, visualizing, analyzing, and presenting data. Can import
data from several popular databases, including Oracle, SYBASE, and Informix.
*Comments: Nice user interface
*Platform(s): HP, Sun (as of 7/94). Soon Windows NT and Windows 4.0
*Contact: James Fitzgerald, BBN, 150 CambridgePark Drive, Cambridge, MA 02140, tel: 617-873-8191, fax 617-873-4751, e-mail: jfitz@bbn.com
*Status: product
*Updated: 1994-07-14
*Name: PC-MARS
*Description: A software package for developing models of non-linear
multivariable processes from past input/output data.
*Comments: Useful for predicting future outputs. Advertised as an
alternative to neural networks; helps the user understand the process
being modelled. Provides graphical tools.
*Platform(s): IBM PC and compatibles.
*Contact: Data Patterns, 528 S. 45th street, Philadelphia, PA 19104,
(215) 387-1844. 495
*Status: product
*Updated: 12/1992
=== Dimensional Analysis
*Name: CrossTarget
*Description: This product provides a flexible way of looking at data and
"drilling down" into your data. It has a spreadsheet format
with some graphical tools.
*Platform(s): ?
*Contact: Dimensional Insight, Inc., 99 South Bedford Street,
Burlington, Mass. 01803, 617-229-9111
*Updated: 1992-12
*Name: Cross/Z
*Description:
*Discovery methods: Clustering, ?
*Comments: Uses fractal compression
techniques to compress huge datasets to manageable sets of non-linear
coefficients. Included are data mining tools, such as chaos-theory-based
cluster analysis.
*Source: Internet posting
*Platform(s): ?
*Contact:
Cross/Z International, Inc.
9 Park Place
Great Neck, NY 11021
516 482 6300
516 482 6463 fax
*Status: product
*Updated by: GPS on 1995-01-25
*Name: Essbase Multi-Dimensional OLAP Server
*Description: Essbase is a high-performance multi-dimensional analytical
engine for OLAP (On-Line Analytical Processing). It allows very rapid
analysis of extremely large data sets. Essbase is fully client/server,
32-bit, multithreaded, and SMP-enabled. Essbase supports an unlimited number of
dimensions, and an unlimited number of members per dimension. Essbase
provides an Open API for client access, and works with a number of popular
front-end tools.
*Discovery methods: N/A: Essbase acts as a server to a range of analytical
front-end tools. Essbase provides vastly superior performance as compared
with a typical relational database.
*Comments: See also the comp.databases.olap usenet newsgroup
for a discussion of
On-Line Analytical Processing (OLAP), a relatively new category of
analytical tools defined by Dr. EF Codd.
*Source: ddruker@arborsoft.com
*Platform(s): Windows, Mac, OS2, NT, Unix
*Contact:
Arbor Software Corporation
1325 Chesapeake Terrace
Sunnyvale, CA, 94089
1-800-858-1666
ddruker@arborsoft.com
*Status: commercial software product
*Updated by: Dan Druker, ddruker@netcom.com 12/2/94
=== Other methods
*Name: FOIL 6.0
*Description: Learns relations from data.
FOIL6.0 is a fairly comprehensive (and overdue) rewrite of FOIL5.2.
The code is now more compact, better documented, and faster for most
tasks. The language has changed to ANSI C.
*Platform(s): Unix
*Contact:
To get FOIL6.0 by anonymous ftp, connect to ftp.cs.su.oz.au (129.78.8.208).
Log in as anonymous with your email address as password. The file is
"~ftp/pub/foil6.sh" (a shar file).
Comments and bug reports are most welcome!
Ross Quinlan, Mike Cameron-Jones
*Updated: Fri, 29 Oct 1993 11:07:26 +1000
=== Multistrategy Tools
*Name: Darwin
*Description: Comprises 4 tools
a) StarMatch - uses memory-based reasoning technology to compare, in
parallel, the characteristics of one database record to all others to
find similar situations that can be used to predict outcomes
b) StarNet - uses neural network technology to define groups
c) StarTree - uses a parallel implementation of classification and
regression trees technology (CART) to develop segmentation rules that
define clusters
d) StarGene - uses simulated evolutionary techniques to optimize
forecasting algorithms
*Platform(s): Thinking Machines
*Contact: Thinking Machines, 245 First Street, Cambridge, MA 02142-1264,
Tel: (617) 234-1000, Fax: (617) 234-4444
*Status: product
*Updated by: Hing-Yan Lee, hingyan@iti.gov.sg, 1994-06-20
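The memory-based reasoning approach described for StarMatch (compare one record with all others, then predict from the most similar ones) can be sketched as a k-nearest-neighbour vote. This is a serial illustration under assumed names and a Euclidean similarity measure; Darwin's parallel implementation and its actual similarity function are not described in the entry.

```python
from collections import Counter


def predict_by_similar_records(records, outcomes, query, k=3):
    """Predict the outcome for query by majority vote of its k nearest records."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record, query))

    # Rank every stored record by similarity to the query record.
    ranked = sorted(range(len(records)),
                    key=lambda i: squared_distance(records[i]))
    votes = Counter(outcomes[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because every stored record is compared with the query independently, the distance computations are naturally parallel, which is the property a massively parallel machine can exploit.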
*Name: DataMariner
*Description: DataMariner combines classical statistical
techniques with inductive machine learning to
discover multivariate relationships in numerical and discrete data.
The product consists of a set of tools for KDD, including:
clustering algorithms, automatic formation of new attributes,
simplifying attributes, rule induction, incremental rule
induction, rule pruning, cross-validation, rule evaluation
and a graphical display of rules. The rule induction
algorithm has several unique features that differentiate it from
the ID3-based algorithms. These include the per-class nature
of the algorithm and a well-founded treatment of noise and
unknown values. The tools are integrated into
a desktop-style graphical user interface, but are also
available as independent command line programs.
*Discovery methods: clustering, visualization, classification
*Comments: See C. Bryant's paper in KDD Colloquium, 1-2 Feb 1995.
*Source: Logica UK Ltd
*Platform(s): Sun workstations, Solaris 1 (SunOS 4) or Solaris 2
*Status: product
*Contact:
Richard Dallaway
Logica UK Ltd
Stephenson House
75 Hampstead Road
London NW1 2PL
UK
Phone: +44 (0)171 637 9111
Fax: +44 (0)171 344 3621
Email: dallawayr@logica.com
*Updated by: Richard Dallaway 13 Feb 1995
*Name: DBlearn
*Description: An integrated system for finding characteristic and
classification rules
from data in relational databases. It applies an attribute-oriented
induction method and the system has been tested on several large databases
with good performance.
*Platform(s): Unix
*Contact: Jiawei Han (han@cs.sfu.ca), at Simon Fraser U., Canada.
*Status: research system
*Updated: 1994-06-20
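The attribute-oriented induction method named in the DBlearn entry generalizes attribute values through concept hierarchies and merges tuples that become identical. The sketch below shows a single generalization step. The function name, the one-level hierarchy dictionaries and the sample attributes are hypothetical and are not taken from DBlearn.

```python
from collections import Counter


def generalize(rows, hierarchies, drop=()):
    """Climb one level in each concept hierarchy and merge identical tuples.

    rows        -- list of dicts, one per database tuple
    hierarchies -- {attribute: {specific value: more general value}}
    drop        -- attributes removed outright (e.g. keys with no hierarchy)
    Returns a list of (generalized tuple as dict, number of merged tuples).
    """
    merged = Counter()
    for row in rows:
        general = tuple(
            (attr, hierarchies.get(attr, {}).get(value, value))
            for attr, value in sorted(row.items())
            if attr not in drop)
        merged[general] += 1
    return [(dict(key), count) for key, count in merged.items()]
```

With a hierarchy that maps "physics" and "chemistry" to "science", two student tuples that differ only in major collapse into one generalized tuple with a count of 2, which can then be read as a characteristic rule.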
*Name: EMERALD (version 2)
*Description: a system of machine learning and discovery tools
for education and research.
*Comments:
The Artificial Intelligence Center at George Mason University has developed
EMERALD (version 2), a system of machine learning and discovery tools
for education and research. It introduces users to five different
learning programs, explains how they work, and allows users to
experiment with them by designing their own problems, made up
from predefined objects. The system has a well-designed and attractive
interface that uses color graphics. Rules learned by the system are
automatically translated to English and spoken by a speech synthesizer.
The system has already been delivered to many universities, including many
in Europe, where the system was demonstrated at several Summer schools.
The system includes several learning systems integrated at the user's level:
1) for learning rules from examples,
2) for learning structural descriptions of objects,
3) for conceptually grouping objects or events,
4) for discovering rules characterizing sequences, and
5) for learning equations based on qualitative and quantitative data.
It is envisioned that users could add their own modules in the future
that represent other learning paradigms.
*Platform(s): EMERALD runs on a Sun Workstation with a color
monitor. Sun Common Lisp and OpenWindows (version 2 or higher) are
required. A Sun Pascal library is necessary to run the Pascal
applications. While not necessary, a DECtalk voice synthesis device is
highly recommended to enhance the presentation.
The system is delivered on a high-density 1.5" tape unless other arrangements
are made.
*Contact:
Dr. Janusz Wnek
Assistant Director for Research Management
Center for Artificial Intelligence
George Mason University
4400 University Dr.
Fairfax, VA 22030, USA
jwnek@aic.gmu.edu
tel. (703) 993-1717
fax. (703) 993-3729
*Status: public domain, prototype
*Updated: 1993
*Name: INSPECT
*Description: It is an MSDOS-based tool for the interpretation of data
(a lot of graphics, visualisation, PCA, MLR, neural networks, etc).
*Comments:
From: hlohning@email.tuwien.ac.at (Hans LOHNINGER)
Newsgroups: comp.ai.neural-nets
Subject: Re: RBF NN Good Function Estimator?
Date: 11 Nov 1993 14:00:37 GMT
I have been working with RBF Networks for some time and had no problem
in approximating whatever function I wanted. ... If you are interested
in another RBF implementation (in order to verify your results) feel free to
download INSPECT. It is an MSDOS-based tool for the interpretation of data
(a lot of graphics, visualisation, PCA, MLR, neural networks, etc).
It runs on IBM PCs (at least 286, 486 recommended), exhibits a graphical
user interface and provides some of the more important techniques of data
interpretation. This software is now available in an early test version via
anonymous ftp.
Of course, if you are seriously using INSPECT, I would appreciate
any bug reports or suggestions for further development.
The server address:
machine: ftp.tuwien.ac.at (128.130.66.22)
directory: Sources/NeuralNet/Inst-of-Chem.
files: i-prgXXX.exe, and i-docXXX.exe
The files in this directory are self-extracting MSDOS files and contain both
the program files (i-prgXXX.exe) and the documentation (i-docXXX.exe, which
is a bit fragmentary). The characters 'XXX' in the file-names stand for the
version number (currently around 067).
Installation:
1. Copy these files from the ftp-server to a dedicated directory on the PC
2. Run i-docXXX.exe (this extracts a PostScript file which contains the
documentation, approx. 1200 kByte, and some README file with the
latest information on INSPECT)
3. Run i-prgXXX.exe (this extracts the program files and some sample
data, approx. 800 kByte)
4. Read the documentation (Installation of INSPECT), install it (a very
simple task) and go.
*Source: posting 6025 of comp.ai.neural-nets
*Platform(s): IBM PC (at least 286, 486 recommended)
*Contact:
***********************************************
** Dr. Hans Lohninger **
** Institute of General Chemistry **
** Technical University Vienna **
** Lehargasse 4/152 **
** A-1060 Vienna, Austria **
** email: hlohning@email.tuwien.ac.at **
** fax: ++43-1-587-4835 **
** voice: ++43-1-58801-5048 **
***********************************************
*Status: public domain prototype
*Updated by: A.M.Oudshoff@research.ptt.nl on 1994-07-26
*Name: Mobal
*Description: Mobal 3.0 is an enhanced version
of the GMD knowledge acquisition and machine learning system for
first-order KBS development on Sparc workstations. Mobal is a multistrategy
learning system that integrates a manual knowledge acquisition and
inspection environment, a powerful first-order inference engine, and
various machine learning methods for automated knowledge acquisition,
structuring, and theory revision.
*Comments:
As the most visible change, the new release 3.0 no longer requires
Open Windows, but features an X11 graphical user interface built
using Tcl/Tk. This should make installation trouble-free for most
users, and through its networked client-server structure, allows easy
integration with other programs.
As a second change resulting from work in the ILP ESPRIT Basic
Research project, Mobal 3.0 now offers an "external tool" facility
that allows other (ILP) learning algorithms to be interfaced to the
system and used from within the same knowledge acquisition
environment. The current release of Mobal includes interfaces to
GOLEM by S. Muggleton and C. Feng (Oxford University), GRDT by V.
Klingspor (Univ. Dortmund) and FOIL 6.1 by R. Quinlan and
M. Cameron-Jones (Sydney Univ.).
*Platform(s): Unix
*Contact:
GMD grants a cost-free license to use Mobal for academic purposes.
The system can be obtained by anonymous ftp from ftp.gmd.de,
directory /ml-archive/GMD/software/Mobal
(login anonymous, password your E-Mail address).
For details about the scientific background of Mobal, see
the book "Knowledge Acquisition and Machine Learning", by K. Morik,
S. Wrobel, J.-U. Kietz and W. Emde (Academic Press, 1993). A user
guide is available via FTP.
*Status: public domain
*Updated: 1994-07-06
*Name: Recon
*Description: provides data analysts and decision-makers with a suite of
data mining services. It combines:
a) Top-down data mining. Analysts propose relationships that may hold
among the data based on their knowledge and models. Recon validates the
relationships against the data and helps analysts refine them.
b) Bottom-up data mining. Recon automatically extracts relationships
from data, and analysts use them to augment their models.
Recon's data mining methods include: deductive database, rule
induction, clustering, visualization, neural networks, and nearest
neighbor.
*Comments: Recon's open architecture allows it to interface with
a variety of data sources, including relational databases
(Oracle, Sybase, DB2, etc),
spreadsheets, proprietary databases and domain-specific databases, and
ASCII files.
Recon is also expensive. Data analysis by Lockheed engineers is included, as
is customization of the selected data mining algorithm for the customer's
operation environment, plus training of the customer's data analysts on the
use of the system. The simplest contract is $30,000; the most extensive
contract is priced at over $250,000.
*Platform(s): Unix
*Contact:
Dr. Evangelos Simoudis (simoudis@aic.lockheed.com)
Lockheed AI Center,
3251 Hanover Street, Palo Alto CA 94304
Voice: (415) 354-5271 Fax: 415-424-3425
*Status: Lockheed product
*Updated by: Sandra Oudshoff on 1994-07-27
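The top-down mode described for Recon, in which an analyst proposes a relationship and the system validates it against the data, can be illustrated by scoring an "if condition then conclusion" rule. The function name and the two measures (coverage and confidence) are assumptions for illustration; the entry does not say how Recon scores a proposed relationship.

```python
def validate_rule(records, condition, conclusion):
    """Score a proposed rule 'if condition then conclusion' against the data.

    Returns (coverage, confidence):
      coverage   -- fraction of all records the condition applies to
      confidence -- fraction of those records for which the conclusion holds
    """
    covered = [r for r in records if condition(r)]
    correct = [r for r in covered if conclusion(r)]
    coverage = len(covered) / len(records) if records else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return coverage, confidence
```

A low confidence suggests the analyst should refine the condition, for example by adding another attribute test, and validate again.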