Dataset generator
Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algor…
DELVE - Data for Evaluating Learning in Valid Experiments
Data for Evaluating Learning in Valid Experiments: A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical dat…
HS3D - Homo Sapiens Splice Sites Dataset
HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo sapiens exon, intron, and splice regions extracted from GenBank primate sequences Rel. 123. The aim of this data set …
National Space Science Data Center
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data a…
Reuters-21578 Text Categorization Corpus
A classic benchmark for text categorization algorithms.
The StatLib Datasets Archive
A repository of datasets used in statistics and machine learning.
Web->KB dataset
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.