Interesting People mailing list archives
IP: Frequency of top 1,000 USENET words
From: Dave Farber <farber () cis upenn edu>
Date: Sun, 27 Dec 1998 03:23:21 -0500
From: Mike Radow <mradow () inx inx net>

It is hoped that this will be useful to others...

In building "word-to-token" compressed files of technical text, we've had good experience with this file. We've used it for several years, and its word distribution is a good fit for that of our text.

Unlike other "general text" frequencies, this list was generated from USENET traffic.

My sincere thanks to Lee Maixner for locating this URL:

  Linkname: top1000.use
  URL: http://wiretap.spies.com/Gopher/Library/Article/Language/top1000.use

/\/\...snipped...

Date: Tue, 19 Jan 1993 20:43:44 GMT
Subject: Re: Top 1000 English words

...

Top 1000 English words

Culled from one year of USENET traffic, here is my list of the top 1000 words, along with percentage of occurrence (this is from a database of 343945617 total scanned words).

-- Rick Walker

4.01838 the
2.43805 to
2.05957 of
1.95582 a
1.70176 I
1.68549 and
1.32531 is
1.23345 in
1.14749 that
0.811128 it
..
0.0109892 science
0.0109852 interface
0.010977 Americans
0.0109578 action
0.0109552 entire
0.0109494 below
0.0109288 Has

\/\/

Mike
-
Mike Radow <---> mradow () inx net
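
For readers who want to try the idea, here is a minimal sketch of how a frequency list in this "percentage word" layout might drive a word-to-token encoding. The message does not describe the actual scheme used, so everything here is an assumption: the function names (parse_frequency_list, build_token_table, encode, decode, coverage) are hypothetical, tokens are simply integer ranks, matching is case-sensitive (the list itself contains entries such as "I", "Americans" and "Has"), and the sample data is only the first ten entries quoted above, not the full file.

```python
from typing import Dict, List, Tuple, Union

# First ten entries quoted in the message; the full list has 1000 lines.
SAMPLE = """\
4.01838 the
2.43805 to
2.05957 of
1.95582 a
1.70176 I
1.68549 and
1.32531 is
1.23345 in
1.14749 that
0.811128 it
"""

Token = Union[int, str]  # int = rank of a listed word, str = literal word


def parse_frequency_list(text: str) -> List[Tuple[str, float]]:
    """Parse 'percentage word' lines, skipping anything that does not fit."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            pct = float(parts[0])
        except ValueError:
            continue
        entries.append((parts[1], pct))
    return entries


def build_token_table(entries: List[Tuple[str, float]]) -> Dict[str, int]:
    """Assumed scheme: the most frequent word gets token 0, the next 1, etc."""
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked)}


def encode(words: List[str], table: Dict[str, int]) -> List[Token]:
    """Replace listed words with their integer token; keep others as-is."""
    return [table.get(w, w) for w in words]


def decode(tokens: List[Token], table: Dict[str, int]) -> List[str]:
    """Invert encode(): map integer tokens back to words."""
    reverse = {tok: word for word, tok in table.items()}
    return [reverse[t] if isinstance(t, int) else t for t in tokens]


def coverage(entries: List[Tuple[str, float]]) -> float:
    """Total percentage of scanned words accounted for by the listed entries."""
    return sum(pct for _, pct in entries)


if __name__ == "__main__":
    entries = parse_frequency_list(SAMPLE)
    table = build_token_table(entries)
    words = "the science of USENET is in the list".split()
    packed = encode(words, table)
    print(packed)
    print(decode(packed, table) == words)
```

Assigning the smallest token values to the most frequent words is what lets a later variable-length or byte-oriented packing step spend the fewest bits on the commonest words; that packing step is left out of this sketch.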