Dataset generator
Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algor…
DELVE - Data for Evaluating Learning in Valid Experiments
Data for Evaluating Learning in Valid Experiments: A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical dat…
HS3D - Homo Sapiens Splice Sites Dataset
HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo sapiens exon, intron, and splice regions extracted from GenBank Rel. 123 primate sequences. The aim of this data set …
National Space Science Data Center
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data a…
Reuters-21578 Text Categorization Corpus
A classic benchmark for text categorization algorithms.
The StatLib Datasets Archive
A repository of datasets used in statistics and machine learning.
Web->KB dataset
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.