Baseball Salaries Dataset
From http://www.amstat.org/publications/jse/jse_data_archive.htm

NAME: Pay for Play: Are Baseball Salaries Based on Performance?
TYPE: Census
SIZE: 337 observations, 18 variables

The article associated with this dataset appears in the Journal of Statistics Education, Volume 6, Number 2 (July 1998).
(http://www.amstat.org/publications/jse/v6n2/datasets.watnik.html)

SUBMITTED BY:
Mitchell R. Watnik
Department of Mathematics and Statistics
University of Missouri-Rolla
Rolla, MO 65409-0020

Further notes: http://www.amstat.org/publications/jse/datasets/baseball.txt

Columns
 1 -  4   Salary (in thousands of dollars)
 6 - 10   Batting average
12 - 16   On-base percentage (OBP)
18 - 20   Number of runs
22 - 24   Number of hits
26 - 27   Number of doubles
29 - 30   Number of triples
32 - 33   Number of home runs
35 - 37   Number of runs batted in (RBI)
39 - 41   Number of walks
43 - 45   Number of strike-outs
47 - 48   Number of stolen bases
50 - 51   Number of errors
53        Indicator of "free agency eligibility"
55        Indicator of "free agent in 1991/2"
57        Indicator of "arbitration eligibility"
59        Indicator of "arbitration in 1991/2"
61 - 79   Player's name (in quotation marks)

----------

Pima Indian Diabetes Dataset
http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

Abstract: From National Institute of Diabetes and Digestive and Kidney Diseases; Includes cost data (donated by Peter Turney)

Original Owners: National Institute of Diabetes and Digestive and Kidney Diseases

Donor of database:
Vincent Sigillito (vgs '@' aplcen.apl.jhu.edu)
Research Center, RMI Group Leader
Applied Physics Laboratory
The Johns Hopkins University
Johns Hopkins Road
Laurel, MD 20707
(301) 953-6231

Data Set Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.

Attribute Information:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

** UPDATE: Until 02/28/2011 this web page indicated that there were no missing values in the dataset. As pointed out by a repository user, this cannot be true: there are zeros in places where they are biologically impossible, such as the blood pressure attribute. It seems very likely that zero values encode missing data. However, since the dataset donors made no such statement we encourage you to use your best judgement and state your assumptions.

Relevant Papers:
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.
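
The UPDATE note above asks users to state their assumptions about zero values. The following Python sketch shows one way to make such an assumption explicit when loading the Pima data. It is only an illustration: the column labels are our own names for the nine attributes, the choice of which columns treat zero as missing is an assumption (the note names only blood pressure as an example), the input is assumed to be comma-separated with no header row, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io

# Our own labels for the nine attributes, in the order listed above.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]

# ASSUMPTION: a zero in these columns is biologically implausible and is
# treated as missing. The dataset donors made no such statement; adjust
# this set to match the assumption you decide to state.
ZERO_MEANS_MISSING = {
    "glucose", "blood_pressure", "skin_thickness", "insulin", "bmi",
}


def parse_row(fields):
    """Convert one CSV row into a dict, mapping suspicious zeros to None."""
    record = {}
    for name, raw in zip(COLUMNS, fields):
        value = float(raw)
        if name in ZERO_MEANS_MISSING and value == 0:
            record[name] = None
        else:
            record[name] = value
    return record


def load_records(text):
    """Parse comma-separated text (no header row) into a list of records."""
    reader = csv.reader(io.StringIO(text))
    return [parse_row(row) for row in reader if row]


def count_missing(records):
    """Count, per column, how many records were flagged as missing."""
    return {name: sum(r[name] is None for r in records) for name in COLUMNS}


# Made-up rows for illustration only; NOT taken from the dataset.
SAMPLE = """\
2,120,70,30,0,28.5,0.45,35,0
0,0,0,0,0,0.0,0.20,22,0
4,150,0,25,130,33.1,0.60,47,1
"""

if __name__ == "__main__":
    records = load_records(SAMPLE)
    for name, n in count_missing(records).items():
        print(f"{name}: {n} missing")
```

Zero is left as a valid value for the number of pregnancies and for the class variable, since both can legitimately be zero. Reporting the per-column counts alongside any analysis is a simple way to document the assumption the note asks for.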