Baseball Salaries Dataset
NAME: Pay for Play: Are Baseball Salaries Based on Performance?
SIZE: 337 observations, 18 variables
The article associated with this dataset appears in the Journal of Statistics Education, Volume 6, Number 2 (July 1998): http://www.amstat.org/publications/jse/v6n2/datasets.watnik.html
Mitchell R. Watnik
Department of Mathematics and Statistics
University of Missouri-Rolla
Rolla, MO 65409-0020
Further notes: http://www.amstat.org/publications/jse/datasets/baseball.txt
Variable descriptions (column positions, then variable):
1 - 4 Salary (in thousands of dollars)
6 - 10 Batting average
12 - 16 On-base percentage (OBP)
18 - 20 Number of runs
22 - 24 Number of hits
26 - 27 Number of doubles
29 - 30 Number of triples
32 - 33 Number of home runs
35 - 37 Number of runs batted in (RBI)
39 - 41 Number of walks
43 - 45 Number of strike-outs
47 - 48 Number of stolen bases
50 - 51 Number of errors
53 Indicator of "free agency eligibility"
55 Indicator of "free agent in 1991/2"
57 Indicator of "arbitration eligibility"
59 Indicator of "arbitration in 1991/2"
61 - 79 Player's name (in quotation marks)
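The column positions above describe a fixed-width text file, so each record can be split by slicing. The following is a minimal sketch in Python, assuming every line follows that layout exactly and that the positions are 1-indexed and inclusive. The field names (salary, batting_avg, and so on) and the function parse_record are hypothetical labels chosen for illustration; they do not come from the dataset documentation.

```python
# Column layout copied from the variable list above:
# (field name, first column, last column, type).
# Columns are 1-indexed and inclusive. Field names are hypothetical labels.
COLUMNS = [
    ("salary", 1, 4, int),            # in thousands of dollars
    ("batting_avg", 6, 10, float),
    ("obp", 12, 16, float),
    ("runs", 18, 20, int),
    ("hits", 22, 24, int),
    ("doubles", 26, 27, int),
    ("triples", 29, 30, int),
    ("home_runs", 32, 33, int),
    ("rbi", 35, 37, int),
    ("walks", 39, 41, int),
    ("strikeouts", 43, 45, int),
    ("stolen_bases", 47, 48, int),
    ("errors", 50, 51, int),
    ("fa_eligible", 53, 53, int),     # free agency eligibility indicator
    ("fa_1991_2", 55, 55, int),       # free agent in 1991/2 indicator
    ("arb_eligible", 57, 57, int),    # arbitration eligibility indicator
    ("arb_1991_2", 59, 59, int),      # arbitration in 1991/2 indicator
    ("name", 61, 79, str),            # quoted in the file
]


def parse_record(line):
    """Split one fixed-width line into a dict keyed by field name."""
    record = {}
    for name, start, end, cast in COLUMNS:
        # Convert the 1-indexed inclusive range to a Python slice.
        raw = line[start - 1:end].strip()
        if cast is str:
            # Drop the surrounding quotation marks from the player's name.
            record[name] = raw.strip('"').strip()
        else:
            record[name] = cast(raw)
    return record
```

Applied to each line of the data file, parse_record yields one dictionary with 18 entries, matching the 18 variables listed in the header.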
Pima Indian Diabetes Dataset
Abstract: From National Institute of Diabetes and Digestive and Kidney Diseases; Includes cost data (donated by Peter Turney)
Original owners:
National Institute of Diabetes and Digestive and Kidney Diseases
Donor of database:
Vincent Sigillito (vgs '@' aplcen.apl.jhu.edu)
Research Center, RMI Group Leader
Applied Physics Laboratory
The Johns Hopkins University
Johns Hopkins Road
Laurel, MD 20707
Data Set Information:
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.
Attribute Information:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
** UPDATE: Until 02/28/2011 this web page indicated that there were no missing values in the dataset. As pointed out by a repository user, this cannot be true: there are zeros in places where they are biologically impossible, such as the blood pressure attribute. It seems very likely that zero values encode missing data. However, since the dataset donors made no such statement, we encourage you to use your best judgement and state your assumptions.
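One way to act on the note above is to recode zeros as missing when loading the data, and to record which attributes were treated that way. The sketch below is written in Python and rests on several assumptions that are not stated by the dataset donors: that the file is comma-separated with no header row and the nine attributes in the order listed, and that a zero is impossible for attributes 2 through 6 (the note itself names only blood pressure). The field names, the helper functions, and the two sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical field names for attributes 1-9, in the order listed above.
FIELDS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]

# ASSUMPTION: a zero in these attributes encodes a missing value.
# Zero remains valid for pregnancies and for the class variable.
ZERO_MEANS_MISSING = {
    "glucose", "blood_pressure", "skin_thickness", "insulin", "bmi",
}


def load_rows(text):
    """Parse comma-separated records, replacing suspicious zeros with None."""
    rows = []
    for values in csv.reader(io.StringIO(text)):
        if not values:
            continue  # skip blank lines
        row = {}
        for name, raw in zip(FIELDS, values):
            value = float(raw)
            if name in ZERO_MEANS_MISSING and value == 0:
                value = None
            row[name] = value
        rows.append(row)
    return rows


def count_missing(rows):
    """Count, per attribute, how many rows were recoded as missing."""
    return {name: sum(1 for r in rows if r[name] is None) for name in FIELDS}


# Made-up sample records, not taken from the dataset.
SAMPLE = "2,120,70,30,100,30.5,0.4,35,0\n0,110,0,0,0,28.1,0.2,25,0\n"

if __name__ == "__main__":
    print(count_missing(load_rows(SAMPLE)))
```

Reporting the per-attribute counts alongside any analysis makes the missing-data assumption explicit, as the note recommends.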
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.