Publication Rate of Eprints
How many eprints from the various archives are published? How has this number changed over time?
The plot below shows the fraction of non-conference eprints from the three major HEP arXives that have been published, from each archive's inception onward. "Published" here means appearing in a peer-reviewed journal; conference proceedings do not count. The data themselves are somewhat imprecise on this point, but this should affect only a small fraction (1-2%) of the papers. Papers identified as conference proceedings are counted neither as published nor as unpublished: anything submitted to a conference is excluded from the study altogether. Roughly 10% of hep-th, 25% of hep-ph, and 50% of hep-ex papers are conference-affiliated, and some of these are misidentified in SPIRES. This misidentification is another source of error in the numbers, smaller in magnitude than the one above.
Conference papers were excluded because, while they are published in the sense that they are made public, they are not generally considered part of the published literature. For these papers, journal publication is effectively not an option for the author. This graph should therefore be read as a measure of the published literature in the arXives compared with literature that exists solely as an arXiv eprint. It may provide an interesting handle on how eprints are used and on the decision to publish them.
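To make the definition concrete, here is a minimal sketch of how such a fraction could be computed, assuming each eprint has already been reduced to a record with an archive name, a submission year, a conference flag, and a journal-publication flag. The `Eprint` record and the `published_fraction` function are hypothetical illustrations, not the format of the SPIRES data behind the plot. The key point is that conference papers are dropped from both the numerator and the denominator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Eprint:
    # Hypothetical record layout; the real raw data may differ.
    archive: str      # e.g. "hep-th", "hep-ph", "hep-ex"
    year: int         # year the eprint was submitted
    conference: bool  # flagged as conference-affiliated
    published: bool   # appeared in a peer-reviewed journal


def published_fraction(eprints):
    """Return {(archive, year): fraction published}, ignoring conference papers."""
    counts = defaultdict(lambda: [0, 0])  # (archive, year) -> [published, total]
    for e in eprints:
        if e.conference:
            continue  # excluded from numerator AND denominator
        key = (e.archive, e.year)
        counts[key][1] += 1
        if e.published:
            counts[key][0] += 1
    return {key: pub / total for key, (pub, total) in counts.items()}


sample = [
    Eprint("hep-th", 1999, False, True),
    Eprint("hep-th", 1999, False, False),
    Eprint("hep-th", 1999, True, True),  # conference paper: not counted at all
]
print(published_fraction(sample))  # {('hep-th', 1999): 0.5}
```

Excluding conference papers entirely, rather than counting them as unpublished, keeps them from dragging down the fraction for hep-ex, where about half the papers are conference-affiliated.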
Note that the x-axis sits at 50%, not 0, so the vertical scale starts at 50%. Also remember that the various arXives started at different times. The final year's data (2003 eprints) appear to show a decline, but this is probably due to the time lag between submission and publication.
The three upper lines show the fraction of (non-conference) eprints with 50 or more citations (SPIRES topcites) that have been published. As one might expect, these more important, or more visible, papers are more likely to be published. Here there may be a slight downward trend. Also note that, of the three arXives, hep-ex papers are the most likely to be published; again, this is probably not unexpected.
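The upper curves apply the same published-fraction calculation to a subset: non-conference eprints at or above the 50-citation topcites threshold. The sketch below illustrates that filtering step under stated assumptions. The `Eprint` record and the `topcite_published_fraction` function are hypothetical, and the per-year grouping used in the plot is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Eprint:
    # Hypothetical record layout; the real raw data may differ.
    archive: str      # e.g. "hep-ex"
    citations: int    # citation count as recorded in SPIRES
    conference: bool  # flagged as conference-affiliated
    published: bool   # appeared in a peer-reviewed journal


def topcite_published_fraction(eprints, archive, min_citations=50):
    """Published fraction among non-conference eprints in `archive`
    with at least `min_citations` citations; None if none qualify."""
    pool = [
        e for e in eprints
        if e.archive == archive and not e.conference and e.citations >= min_citations
    ]
    if not pool:
        return None  # no qualifying papers, so the fraction is undefined
    return sum(e.published for e in pool) / len(pool)


sample = [
    Eprint("hep-ex", 120, False, True),
    Eprint("hep-ex", 75, False, True),
    Eprint("hep-ex", 60, False, False),
    Eprint("hep-ex", 10, False, False),  # below the citation threshold
    Eprint("hep-ex", 300, True, False),  # conference paper: excluded
]
print(topcite_published_fraction(sample, "hep-ex"))  # 2 of 3 qualifying papers published
```

Returning `None` for an empty pool avoids a division by zero in early years, when an archive may have had no papers above the threshold.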
The raw data used for this plot is available here.