From: walker@hpl-opus.hpl.hp.com (Rick Walker)
Date: Tue, 19 Jan 1993 20:43:44 GMT
Subject: Re: Top 1000 English words
Message-ID: <63140014@hpl-opus.hpl.hp.com>
Organization: HP Labs, High Speed Electronics Dept., Palo Alto, CA
Newsgroups: comp.sources.wanted
Lines: 1014

In comp.sources.wanted, zzassgl@uts.mcc.ac.uk (Geoff Lane) writes:

> I'm looking for an ftp'able list of the 1000 most common English words in
> non-technical sources (otherwise I'd just scan the man pages :-)

Culled from one year of USENET traffic, here is my list of the top 1000
words, along with each word's percentage of occurrence (this is from a
database of 343945617 total scanned words).

-- Rick Walker

4.01838 the
2.43805 to
2.05957 of
1.95582 a
1.70176 I
1.68549 and
1.32531 is
1.23345 in
1.14749 that
0.811128 it
0.809861 for
0.713653 you
0.608371 on
0.607637 be
0.572971 have
0.550857 are
0.537898 with
0.516607 not
0.495937 this
0.492865 The
0.453028 or
0.450606 as
0.428827 was
0.36647 but
0.333821 at
0.323635 In
0.319617 from
0.318724 by
0.296894 an
0.293426 if
0.284022 they
0.278568 about
0.274626 would
0.271224 can
0.26783 one
0.267568 my
0.260243 will
0.258019 all
0.257 X
0.247188 article
0.243054 do
0.235852 edu
0.232097 has
0.213221 like
0.212694 there
0.2122 me
0.211624 writes
0.210243 out
0.209659 your
0.207384 what
0.205958 which
0.202603 UUCP
0.201669 some
0.200472 so
0.192512 we
0.191887 more
0.182256 who
0.18066 any
0.180247 don't
0.17788 up
0.173927 get
0.172152 am
0.171357 A
0.170564 If
0.170116 just
0.167853 he
0.167345 no
0.16423 other
0.163039 people
0.158253 know
0.155878 only
0.155658 their
0.155111 than
0.152515 This
0.152292 It
0.151887 think
0.151685 when
0.151365 them
0.149513 been
0.147283 time
0.14672 had
0.140304 were
0.132269 And
0.131215 Note
0.12924 C
0.129151 COM
0.128763 his
0.123732 should
0.119261 N
0.117737 m
0.116341 S
0.11565 use
0.115605 R
0.113397 P
0.112006 then
0.110857 also
0.109158 good
0.108499 how
0.108498 B
0.108333 could
0.10553 way
0.105457 T
0.104614 very
0.104014 W
0.103779 into
0.101045 E
0.0991447 com
0.0991244 much
0.0982094 M
0.0980431 make
0.09755 because
0.095987 these
0.0945879 does
0.0936212 see
0.0934654 may
0.0930397 O
0.0922899 As
0.0896299 Page
0.0894723 pm
0.0891153 even
0.089067 You
0.0890048 two
0.088464 want
0.0851347 it's
0.0847974 L
0.082975 most
0.082905 new
0.0828994 many
0.082898 well
0.0824665 s
0.0822866 such
0.0821202 system
0.0820755 really
0.0820668 first
0.0812326 HP
0.0803764 same
0.0796571 those
0.0793701 Response
0.0786967 our
0.0786668 now
0.0785711 say
0.0785665 work
0.0785514 being
0.0783275 used
0.0777565 Oct
0.0775259 EDU
0.0775201 U
0.0756343 too
0.0755451 anyone
0.0752037 here
0.0748964 where
0.0747072 over
0.0746071 What
0.0736154 right
0.0735061 But
0.0723516 problem
0.0721922 did
0.0705678 something
0.0704021 go
0.0673641 There
0.0671432 its
0.0671045 her
0.0667504 back
0.0665373 file
0.0661927 We
0.0652429 D
0.0648393 i
0.0646422 still
0.0630114 need
0.0622991 said
0.0622799 find
0.0620444 years
0.0615077 off
0.061089 things
0.0607174 ve
0.0607049 him
0.0604052 after
0.0603767 point
0.0596905 before
0.0596487 etc
0.0594036 cs
0.0593248 take
0.0591858 us
0.0591079 going
0.0587689 They
0.0584691 might
0.0576827 mail
0.0574533 since
0.0574283 never
0.0569497 better
0.0561673 read
0.0561156 name
0.0558126 got
0.0556262 long
0.0553567 someone
0.0550901 she
0.0550023 can't
0.0546895 why
0.0546485 last
0.0545013 few
0.0543528 All
0.0541775 My
0.0541118 number
0.0532997 must
0.0531747 using
0.0528424 own
0.0527156 little
0.0527057 doesn't
0.052645 made
0.0517547 down
0.051682 believe
0.0516814 d
0.0511659 He
0.0511566 net
0.0510409 So
0.050062 while
0.0500474 line
0.0500012 both
0.0499713 around
0.0499297 another
0.0495322 through
0.0494651 For
0.0488467 thing
0.0486359 AT
0.0485498 without
0.0481492 case
0.0481082 Also
0.0478078 No
0.0477389 between
0.047659 year
0.0475011 set
0.0469141 sure
0.0463751 probably
0.0462047 seems
0.045984 Jan
0.0458253 University
0.04575 enough
0.0455557 didn't
0.0455226 e
0.0453653 different
0.0452851 least
0.0451914 J
0.0450065 group
0.0445518 program
0.0443544 else
0.044255 BITNET
0.0441331 put
0.0440875 F
0.0438642 lot
0.0436729 DIRECT
0.0434138 John
0.043313 each
0.0432295 V
0.0431565 It's
0.0430463 information
0.0429042 ATT
0.0426422 part
0.042316 How
0.0423087 Any
0.0423015 question
0.0422421 old
0.0421189 real
0.0419372 course
0.0419139 anything
0.0416958 fact
0.0410928 H
0.0407198 When
0.0406695 best
0.0403119 call
0.0402939 c
0.0402872 end
0.0397944 give
0.039726 help
0.0396595 DEMAND
0.0394621 uunet
0.039426 At
0.0391236 Is
0.0389291 come
0.0387951 called
0.0386855 person
0.0386259 either
0.0384776 under
0.0384511 run
0.0382811 try
0.0381839 done
0.0381415 American
0.0380845 Mar
0.0380168 though
0.0375745 always
0.0375522 list
0.0374344 uucp
0.0373818 look
0.0372809 news
0.0370582 world
0.036948 thought
0.0367012 far
0.0366023 again
0.0365096 rec
0.0361479 available
0.0361237 seen
0.0359987 quite
0.0358661 rather
0.0358626 To
0.0358557 less
0.0357996 life
0.035651 One
0.0356385 day
0.0356341 problems
0.0353434 Aug
0.0353219 great
0.0352093 software
0.0350933 found
0.0350582 tell
0.0350006 women
0.0349558 every
0.0348759 ARPA
0.0347113 code
0.0344874 ever
0.0344022 against
0.034381 bit
0.0340132 place
0.0339821 version
0.0339734 After
0.0339507 general
0.0339356 data
0.0338289 support
0.0337402 Apple
0.033689 having
0.0336364 mean
0.0335995 above
0.0334873 heard
0.0332651 Thanks
0.0332622 doing
0.0330506 able
0.0329508 high
0.0328726 From
0.03287 next
0.0328546 state
0.0328395 change
0.0328322 G
0.0325924 book
0.0325194 Now
0.0324484 talk
0.0322513 Well
0.0322173 K
0.0321208 New
0.0320242 possible
0.0319638 please
0.031903 bad
0.0318925 Does
0.0318042 seem
0.0317838 US
0.0316672 man
0.0315538 Berkeley
0.0314951 following
0.031448 send
0.031432 example
0.0314198 several
0.0313986 isn't
0.0313922 Computer
0.0313349 reason
0.0312421 That
0.0311532 trying
0.0311282 getting
0.0309331 you're
0.0309322 true
0.0309151 feel
0.0308075 wrong
0.0307822 type
0.0307688 let
0.0307418 stuff
0.0307348 keep
0.0307083 n
0.030606 hard
0.0305048 left
0.0304981 idea
0.0304836 show
0.0302597 post
0.0302248 says
0.0300417 power
0.0300306 remember
0.0298428 looking
0.0298213 Why
0.0297855 until
0.0297768 game
0.0296652 local
0.0296428 David
0.0295771 non
0.0295262 ago
0.0294893 May
0.0293895 ll
0.029375 others
0.0292927 car
0.029129 control
0.029038 Are
0.02899 hp
0.0289235 actually
0.0288322 posting
0.0287659 Apr
0.0287615 that's
0.0287577 three
0.0285813 yet
0.028522 message
0.0283937 o
0.028303 x
0.02821 away
0.0281414 computer
0.0279189 machine
0.0278782 makes
0.0277375 interested
0.0276709 files
0.0275997 kind
0.0275608 Sep
0.0274956 large
0.0273863 sun
0.0272561 current
0.0272287 already
0.0272255 order
0.0271898 small
0.0271619 means
0.027131 times
0.0271153 government
0.0271145 Feb
0.027056 space
0.0270258 free
0.0270188 systems
0.027004 running
0.0268609 second
0.0267862 Q
0.0267502 Y
0.0267057 However
0.0265626 money
0.0264341 nothing
0.0264091 home
0.0263207 level
0.026317 music
0.0261309 CA
0.0261222 start
0.0259922 issue
0.0259509 men
0.0259428 An
0.0257596 whether
0.0256988 given
0.0256384 test
0.0256296 user
0.0256055 big
0.0255017 pretty
0.0254831 based
0.0254488 Please
0.0254409 Sun
0.025108 On
0.0250112 address
0.02489 once
0.0245928 misc
0.024423 agree
0.0243803 area
0.0243509 Systems
0.0243483 include
0.024073 write
0.0240346 mind
0.0240195 rutgers
0.0239616 comp
0.0239552 experience
0.0237895 memory
0.0237761 original
0.023775 Of
0.0236421 discussion
0.0235473 DAILY
0.0235098 Z
0.0234901 word
0.0233746 God
0.0233525 understand
0.0233237 UNIX
0.0233179 uk
0.0232473 matter
0.0232386 Not
0.02319 THE
0.0231574 during
0.0231548 play
0.0230691 won't
0.0229039 standard
0.0228981 making
0.0228475 hand
0.0227638 drive
0.0227213 days
0.0226076 copy
0.0225562 whole
0.0224829 Do
0.0223684 human
0.0223341 works
0.0223314 PC
0.0223134 Steve
0.0222846 interesting
0.0222742 System
0.0222454 Dec
0.022241 Just
0.0222242 cannot
0.0221096 Yes
0.0220791 often
0.0219997 disk
0.0219715 side
0.0219651 maybe
0.0218843 These
0.0217694 nice
0.0217505 came
0.0216901 public
0.0216871 Some
0.0216566 Mr
0.0215267 source
0.0214964 Dave
0.0214845 guess
0.0214833 HOURLY
0.021481 open
0.0214089 NOT
0.0213039 almost
0.0212906 full
0.0211414 h
0.0211371 buy
0.0210783 important
0.0210295 response
0.0209908 ask
0.0209728 return
0.0209266 simply
0.0208859 Mark
0.0207222 went
0.020682 hope
0.0205096 told
0.0204811 tried
0.0204512 wanted
0.0204181 story
0.0203864 process
0.0203381 saying
0.0203381 form
0.0203148 Another
0.0202837 p
0.0202762 love
0.0202634 couple
0.0201796 gets
0.0201692 law
0.0201686 answer
0.02015 live
0.02015 city
0.0200988 Since
0.0200476 comes
0.0200401 working
0.0199779 AND
0.0198386 goes
0.0198369 country
0.0197642 sort
0.0196894 major
0.0196694 Mike
0.0196665 per
0.0195854 haven't
0.0195711 Inc
0.0195138 everyone
0.0195008 Bell
0.0194467 cost
0.0194284 command
0.0194173 Michael
0.019367 care
0.0193345 words
0.0193068 usually
0.0193062 company
0.0192719 posted
0.0192091 Bob
0.019201 water
0.0191734 b
0.0191321 groups
0.0191059 opinions
0.0190888 Nov
0.0190597 reading
0.0190562 Actually
0.0190332 instead
0.0188966 job
0.0188771 written
0.0188364 size
0.0188091 Or
0.018805 single
0.0187963 wouldn't
0.0187445 sense
0.0187393 pay
0.0187187 value
0.0186768 programs
0.0186122 language
0.0186053 short
0.0185762 lines
0.0185733 soc
0.0184896 att
0.0184829 Then
0.0184358 questions
0.018423 San
0.0183948 Jul
0.0183291 certainly
0.0183256 Jim
0.0183183 mod
0.0182994 later
0.0182142 Anyone
0.018202 note
0.0181718 speed
0.0181697 saw
0.0181072 similar
0.0180226 week
0.0180014 character
0.017979 Can
0.0179674 light
0.0179633 Paul
0.0179496 friend
0.0179206 certain
0.0179028 difference
0.0178674 including
0.0178618 info
0.0178043 myself
0.0177531 ac
0.0177426 access
0.0177319 responses
0.0177127 hear
0.0176316 within
0.0176098 however
0.0176048 add
0.0175839 g
0.017572 Bill
0.017515 correct
0.0174676 Science
0.0174455 become
0.0173647 text
0.0172516 Center
0.0172231 top
0.0171716 asked
0.0171687 error
0.0171489 known
0.017101 IBM
0.0170568 perhaps
0.0170498 consider
0.0170323 sound
0.0170262 easy
0.0170164 price
0.0170105 started
0.0169306 especially
0.0168724 rights
0.0168666 stop
0.0168105 rest
0.0167948 everything
0.0167774 games
0.0167768 talking
0.016773 LOCAL
0.0167372 recently
0.0167268 whatever
0.0167189 particular
0.0165826 half
0.0165692 low
0.0165628 simple
0.016525 sex
0.016516 define
0.0165061 network
0.0164977 subject
0.0164953 except
0.0164122 ones
0.0163837 provide
0.0163683 class
0.0163648 fine
0.0163613 Chinese
0.0163314 check
0.0163273 woman
0.0163197 took
0.0163125 months
0.0162994 II
0.0162784 interest
0.0162758 along
0.0162587 She
0.0162287 turn
0.0162136 America
0.0161345 due
0.0161188 clear
0.016118 Amiga
0.0161177 close
0.016113 past
0.0161104 mit
0.0160898 children
0.0160636 By
0.016036 That's
0.0159325 via
0.0159165 points
0.0158676 team
0.0158641 phone
0.0158612 argument
0.0158243 various
0.0158092 result
0.015797 although
0.0157912 school
0.015783 opinion
0.0157731 worth
0.0157298 deal
0.0157051 Oh
0.015692 books
0.0156897 mode
0.0156632 service
0.0156539 Don't
0.015565 together
0.0154891 MT
0.0154745 f
0.0154452 mine
0.0154306 there's
0.0154045 night
0.0153955 tape
0.0153821 cause
0.0153423 guy
0.0153297 wasn't
0.0153074 common
0.0152972 int
0.0152966 effect
0.0152873 Jun
0.0151588 position
0.0151405 Maybe
0.0150887 series
0.0150858 head
0.0150442 r
0.0150352 likely
0.0150268 needs
0.0149916 itself
0.0149559 t
0.0149457 characters
0.0149402 situation
0.0149236 comments
0.0149233 device
0.0149064 DEDICATED
0.014896 key
0.0148826 unless
0.0148396 special
0.0148215 move
0.0148064 window
0.0147779 users
0.0147759 request
0.0147608 leave
0.0147215 allow
0.0146971 Box
0.0146759 hplabs
0.0145886 Anyway
0.0145814 yes
0.0145593 sent
0.0145575 personal
0.0145479 Software
0.0145337 aren't
0.0145322 self
0.0145288 mentioned
0.0144936 OF
0.0144851 ucbvax
0.0144514 char
0.0144479 BTW
0.0144072 Tom
0.0144011 claim
0.0143848 taken
0.0143529 Robert
0.0143511 Scott
0.0143389 record
0.0143092 future
0.0143052 function
0.0143043 takes
0.0142976 uses
0.0142412 child
0.0142174 Because
0.014197 field
0.0141633 exactly
0.0141537 longer
0.0141511 view
0.0141496 four
0.0141485 Most
0.0141325 themselves
0.0141299 Unix
0.0141284 happen
0.0141043 unix
0.0141005 expect
0.014099 Internet
0.0140516 students
0.0139656 room
0.0139621 Peter
0.0139508 w
0.0139502 changed
0.0139391 front
0.0139284 today
0.013915 rate
0.0138757 society
0.0138574 BA
0.0138502 business
0.0138374 they're
0.0138295 recent
0.0138272 With
0.0137853 movie
0.0137781 sources
0.013744 numbers
0.0137208 SPW
0.0137083 main
0.0136897 needed
0.0136586 screen
0.0136475 California
0.0136434 wrote
0.0136385 anyway
0.0136254 early
0.0136135 product
0.0136045 friends
0.0135431 ca
0.0135362 Alan
0.0135347 issues
0.0135344 performance
0.0135248 machines
0.0135207 Your
0.0135132 Research
0.0135114 board
0.0134658 lost
0.0134646 anybody
0.0134257 page
0.0134085 looks
0.0133978 amount
0.0133943 house
0.0133867 articles
0.0133739 wants
0.0133739 Who
0.013369 BSD
0.0133193 Re
0.0133097 First
0.0132847 results
0.0132634 CD
0.0132387 newsgroup
0.0132326 hit
0.0132146 wish
0.0132012 cc
0.0131695 gun
0.0131617 knows
0.0131608 root
0.0131349 market
0.013132 ST
0.0131282 statement
0.0131198 necessary
0.013116 fun
0.0131149 design
0.0130957 month
0.0130916 USA
0.0130867 thinking
0.0130817 date
0.0130692 history
0.0130538 happened
0.0130518 ALL
0.0130329 term
0.0130044 hours
0.0129991 State
0.0129922 soon
0.0129808 break
0.0129363 death
0.0129305 bitnet
0.0129189 card
0.0129157 names
0.0129102 Richard
0.012907 MA
0.0128686 lots
0.0128663 legal
0.0128613 choice
0.012861 evidence
0.0128596 minutes
0.0128404 war
0.0128401 St
0.0128285 body
0.0127921 taking
0.0127767 Even
0.0127668 ideas
0.0127564 research
0.0127471 yourself
0.0127372 Perhaps
0.0127119 release
0.0126869 involved
0.0126828 format
0.0126241 useful
0.0125947 Joe
0.0125799 server
0.0125732 Although
0.0125415 writing
0.0125363 chance
0.0125357 While
0.0125311 black
0.0125197 BERKELEY
0.0125031 he's
0.012502 assume
0.0124947 cmu
0.0124787 Dr
0.0124732 upon
0.0124552 kill
0.0124528 received
0.0124502 required
0.0124462 gov
0.0124275 playing
0.0124084 output
0.0124057 sounds
0.0123679 weeks
0.0123615 cup
0.0123226 air
0.0123171 radio
0.0123104 willing
0.0123002 couldn't
0.0122938 Mac
0.0122918 MIT
0.0122915 changes
0.0122828 near
0.0122784 film
0.0122781 complete
0.0122476 Here
0.0122365 reasons
0.0122159 played
0.0122112 vote
0.0122043 present
0.0121653 related
0.0121633 cases
0.0121609 TV
0.0121493 political
0.012092 quality
0.012072 currently
0.0120327 environment
0.0120295 string
0.0119975 learn
0.011944 paper
0.0119339 color
0.0119243 parts
0.0119173 hold
0.0119039 advance
0.0118932 postmaster
0.011892 AIDS
0.0118879 OK
0.0118871 fast
0.0118862 model
0.0118859 force
0.0118757 rules
0.0118748 China
0.0118699 hardware
0.011865 York
0.0118417 cut
0.0118301 considered
0.0117995 directory
0.0117978 object
0.0117929 Department
0.0117774 sometimes
0.0117681 difficult
0.0117513 outside
0.0117216 album
0.0117158 save
0.0116995 specific
0.0116812 completely
0.0116803 doubt
0.0116588 laws
0.0116373 DEC
0.0116071 food
0.0116059 calls
0.0116024 folks
0.0115919 total
0.0115504 usr
0.0115405 re
0.0115309 contact
0.0115239 James
0.011519 domain
0.0114972 higher
0.011496 April
0.011485 site
0.0114698 shows
0.0114666 normal
0.0114422 Andrew
0.0114419 directly
0.0114256 TO
0.0114219 white
0.0114158 among
0.0113974 coming
0.0113582 Jeff
0.0113341 English
0.011332 family
0.0113306 sci
0.0113285 religion
0.0113277 supposed
0.011321 UX
0.0113003 sys
0.0112622 solution
0.0112617 Barry
0.0112608 culture
0.011259 dead
0.0112527 development
0.0112367 reasonable
0.0112247 decwrl
0.0112224 create
0.011209 decided
0.0112076 appropriate
0.0111634 knowledge
0.0111573 behind
0.0111489 DOS
0.011066 CS
0.0110622 berkeley
0.0110387 exist
0.0110326 BBS
0.0110035 suggest
0.0110023 buffer
0.0109892 science
0.0109852 interface
0.010977 Americans
0.0109578 action
0.0109552 entire
0.0109494 below
0.0109288 Has