This is an old bug list about the old Web Access Gateway, which is no longer maintained, having been largely replaced by my stylesheets for low vision and Web Adjuster. This page is now for historical interest only.
The following is a list of some outstanding gateway bugs, in no particular order. It is mostly in terse note form. The numbering is subject to change.
Redirect to ssl version when TYPE https url - do it with a Location directive (if CAN_SWITCH_SSL is defined in platform.h)
Also, getting images over non-SSL (in an SSL page) is a potential privacy compromise if an unauthorised person is snooping the net (& someone could compromise the _integrity_ of SSL pages by changing the character images) - document or fix
(but no big problem because the browser should warn anyway)
protocol://user:pass@host:port - the user:pass bit might sometimes be incorrectly handled (might matter if someone encodes their links like that)
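A minimal sketch of splitting the authority part explicitly, so that user:pass is separated from host:port before a URL is rewritten. The `Authority` struct and `parseAuthority` function are hypothetical names, not gateway code. IPv6 literals in brackets and %-escaped user names are not handled.

```cpp
#include <string>

// Hypothetical result type for illustration.
struct Authority {
    std::string user, password, host, port;
};

// Split scheme://user:pass@host:port/path into its authority fields.
// Only the text between "://" and the next '/', '?' or '#' is examined,
// so an '@' later in the path is not taken as the userinfo separator.
Authority parseAuthority(const std::string& url) {
    Authority a;
    std::string::size_type start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    std::string::size_type end = url.find_first_of("/?#", start);
    std::string rest = url.substr(
        start, end == std::string::npos ? std::string::npos : end - start);

    std::string::size_type at = rest.rfind('@');
    if (at != std::string::npos) {
        std::string userinfo = rest.substr(0, at);
        rest = rest.substr(at + 1);
        std::string::size_type colon = userinfo.find(':');
        a.user = userinfo.substr(0, colon);  // whole string if no ':'
        if (colon != std::string::npos) a.password = userinfo.substr(colon + 1);
    }
    std::string::size_type colon = rest.rfind(':');
    a.host = rest.substr(0, colon);
    if (colon != std::string::npos) a.port = rest.substr(colon + 1);
    return a;
}
```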
e.g. on http://www.jython.org/cgi-bin/faqw.py?req=index (try to follow a link; it doesn’t work until you press OK) (should probably re-write them somewhere)
The gateway does not recognise the ISO designators for Cyrillic, Esc - L and Esc - A. This is because I don’t know which ISO designator goes with which code page.
fread /other/nobackup/*.count (and *.freqtbl) into an array of ints; find max, max-1 etc; top N in reverse order (to 4d?)
(tbl’s: try using .py prototype to get them into text 1st. vice versa?)
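Assuming a .count file has already been fread() into a `std::vector<int>` indexed by character code (the file layout is an assumption), picking the top N is a partial sort over indices. `topN` is a hypothetical helper, not existing code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Indices (character codes) of the n largest counts, largest first.
// Ties are broken by the lower character code.
std::vector<std::size_t> topN(const std::vector<int>& counts, std::size_t n) {
    std::vector<std::size_t> idx(counts.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    n = std::min(n, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + n, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          if (counts[a] != counts[b]) return counts[a] > counts[b];
                          return a < b;
                      });
    idx.resize(n);
    return idx;
}
```

If "reverse order" means smallest of the top N first, `std::reverse` on the result gives that.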
M.T. sent these URLs http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html http://web.kyoto-inet.or.jp/people/tomoko-y/biwa/wnn/iso-2022.html
- Get a frequency table for Cyrillic - Improve auto-detect code (maximise chars that fall within the highest-frequency range?) - Rename “DOS Russian” - add Cp866 - IBM - Cp855 - KOI8-R - some errors in the table; see http://koi8.pp.ru/utf-8.koi8-r.htmlu - koi8.pp.ru/koi8-r_unicode.txt - What I will try to do is get the mapping tables into a human-editable form. Then if you like you can edit them. But it may be some time before I can do that. - charset= stuff (need alias table) (modify .tbl files? or do it separately) - [ CU Slavonic & East European Society ] ; [ [CU Yugoslav Society] (about 40 members on soc-cuyu) ]
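One reading of "maximise chars that fall within the highest-frequency range": decode the bytes with each candidate table, count how many of the resulting characters are in the language's most frequent set, and keep the best. This is a sketch only; `frequencyScore`, `bestCandidate` and passing the frequent set as a `std::set` are assumptions, and a real detector might normalise by length, since different tables can produce different numbers of characters from the same bytes.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Number of decoded characters that are among the language's most frequent.
std::size_t frequencyScore(const std::vector<char32_t>& decoded,
                           const std::set<char32_t>& commonChars) {
    std::size_t score = 0;
    for (char32_t c : decoded)
        if (commonChars.count(c)) ++score;
    return score;
}

// Name of the candidate table whose decoding scores highest
// (the alphabetically first one wins a tie; "" if there are no candidates).
std::string bestCandidate(
        const std::map<std::string, std::vector<char32_t>>& decodings,
        const std::set<char32_t>& commonChars) {
    std::string best;
    std::size_t bestScore = 0;
    bool first = true;
    for (const auto& entry : decodings) {
        std::size_t s = frequencyScore(entry.second, commonChars);
        if (first || s > bestScore) {
            best = entry.first;
            bestScore = s;
            first = false;
        }
    }
    return best;
}
```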
/usr/share/i18n/charmaps could be useful
This file (gateway.bugs) is translated to HTML & updated by the website update script; it should be done by the gateway update script (to pageroot) like the help file is.
Maybe have “The latest version is N.N.N” at top of access.html (use htp.def?) and rsync it (or Makefile - rsync won’t work due to date stamp problems)
If you put a non-standard colour in the URL and then select the “colours” button, it gets lost because it is not one of the options. Maybe if none of the options match the current value, add a new one that does (quoting the HTML figure or something).
Maybe add something to onMouseOver and onMouseOut
html2xhtml ok but script problem (do *after* proc) (ok for now...) (or just hack it - “write out the comments inside script, *maybe* w/out <!-- -->”) (lower pri: put <html> </html> in if not already there) (do we get the ?xml? thing, + this, into mytest itself?) Also it would be nice to upgrade the HTML spec to 4 (esp. tables) (lower pri: integrate it with the C++ HTML filter, & remove the code that’s made redundant by it)
(Apart from all those links that say “here” - if a blind person, or a mobile phone user, is trying to get a summary of a page by getting the computer to just output the links, they get “here, here, here”. Not to worry - one of these days I’ll add an option to my web mediator to handle them.)
Meaningless ALT tags (“Click here!”) - http://www.fujitsu.co.jp/hypertext/hdd/drive/disk_e.html
Gateway: Need to do something about entire sentences being in capitals (make them title case instead) (but leave acronyms etc alone)
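A sketch of the title-case idea, assuming a caller-supplied acronym list (the kind of acronyms dictionary suggested further down) decides what to leave alone. ASCII only; `isAllCaps` and `titleCaseShouting` are hypothetical names, and runs of whitespace are collapsed to single spaces.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// True if the text has at least one letter and no lower-case letters.
bool isAllCaps(const std::string& s) {
    bool sawLetter = false;
    for (unsigned char c : s) {
        if (std::islower(c)) return false;
        if (std::isupper(c)) sawLetter = true;
    }
    return sawLetter;
}

// Turn an all-capitals sentence into title case, leaving any word found in
// `acronyms` untouched. Sentences with any lower case are returned as-is.
std::string titleCaseShouting(const std::string& sentence,
                              const std::set<std::string>& acronyms) {
    if (!isAllCaps(sentence)) return sentence;
    std::istringstream in(sentence);
    std::string word, out;
    while (in >> word) {
        if (!acronyms.count(word)) {
            bool firstLetter = true;
            for (char& ch : word) {
                unsigned char u = static_cast<unsigned char>(ch);
                if (!std::isalpha(u)) continue;
                ch = static_cast<char>(firstLetter ? std::toupper(u) : std::tolower(u));
                firstLetter = false;
            }
        }
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}
```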
Option to strip width & height from embed? <embed src="x.mid" width=2 height=0 autostart=true loop=true>
tracttext.cc, libz (zlib1g-dev) Chris Lightfoot (saved in MiscStuff) non-HTML and plug-in sort out; strings; swf thing (sep CGI? system command??); http://www.flashgallery.co.uk/ source
swf: As predicted the content was minimal, but I could extract the links to the other subpages which was sufficient to obtain the required information.
Without naming the guilty parties, I’ve encountered a fair number of such sites on the University Societies webserver...
Lois: If you just want to get the bare text out of a Word doc, the Unix ‘strings’ command can be useful. At least on the few that I’ve tried.
natwest.com sortout (collapse newlines option) : <p><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> <p>If you have remained on this page ...
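A "collapse newlines" option could be as simple as replacing any run of two or more <br> tags with a single one. A sketch using `std::regex`; `collapseBreaks` is a hypothetical name, and it does not skip comments or scripts.

```cpp
#include <regex>
#include <string>

// Replace any run of two or more <br> tags (any case, optional "/",
// whitespace between them) with a single <br>.
std::string collapseBreaks(const std::string& html) {
    static const std::regex run("(<br\\s*/?>\\s*){2,}", std::regex::icase);
    return std::regex_replace(html, run, "<br>");
}
```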
Access gateway bug (embedded stylesheets need URL redirection!) (plus mention it in the presentation)
Feb 9 09:45:24 ssb22 /usr/sbin/imgserver: Error 404 on URL “head”
pc358.nmus.pwf.cam.ac.uk - - [09/Feb/2001:09:45:22 +0000] “GET /cgi-bin/access?Ac=A&Au=http://perch.tripod.co.jp/ HTTP/1.1” 200 11888 “http://ssb22.joh.cam.ac.uk/cgi-bin/access?Ac=A&Au=http://www.nsknet.or.jp/~m-saito/index.htm” “Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)”
<ul style="line-height: 150%; list-style-image: url('images/headline01.gif'); margin-left: 35%">
cssa: can we pick up on these and get down to one word? news/news about/about activity/activity
lynx -source http://www.embjapan.org.uk/viewer.html|grep japan2001 was wrong (emailed the webmaster Feb9)
Might be message drift - check carefully (incl. help.htm)
This hit the “invalid ISO designator?” thing but it was a space encoded in ISO-7 or something 27 44 65 32 27 40 66 Esc , A space Esc ( B
Also: “2000-11-22: NEEDATTENTION” (stuff that may break Shift-JIS & UTF-8)
Before “alternative base URL”: “Preferred image style” (default, Simplified, Traditional, Korean) (if MULTIPLE_STYLES_SUPPORTED) ENV_PREFERRED_STYLE needs documenting and adding to the UI For Traditional, Aeus=t NB a blank value is OK (default)
ALLOW_USER_PROXY_SETTING maybe ?
Bug: body onLoad doesn’t get executed on “enable scripts” (since body is replaced) See also NEEDATTENTION in access.c++ / “BODY” re colour override (mixing author’s and user’s)
asahi.com: Images without HEIGHT & WIDTH cause Netscape to load *all* images before displaying any of the page
Check monash & japan2001 img server stats from time to time
imgserver Might have been an alarm clock - socket had been registered to listen, so OS accepted it, but waiting for it to get back to select() Blocking write etc?
watch the japan2001 imgserver
Your home directory was unavailable (due to a server upgrade), hence all the messages. You invoke a cron job once every minute, so your home directory was probably inaccessible to Nexus for 142 minutes.
Your “cron” job on nexus ./isitup localhost || (pkill imgserver ; ulimit -n 1024; ./imgserver)
produced the following output:
Alarm Clock Terminated
(is it “isitup” that does this?) yes, 10sec timeout (Does it get stuck anywhere? gdb???)
Sometimes runs out of quota Compress the data file ? (zcat) (careful...) (or include portable decompression source..)
server.c++ HTTP/1.1 pipelining (o/p buf retry, watch max size [but could just drop connection when sent current lot], etc) (How many browsers/proxies/etc implement this anyway?) (IE *might*)
DONE added expires and last-modified (does make a difference!)
Ignore net/khttpd (buggy & kernel crash!)
Got “ab” - Apache HTTP server benchmarking tool
/usr/sbin/ab -k -t 60 -c 10
Image server: Requests per second: 249.74; Transfer rate: 65.45 kb/s received
Apache: Requests per second: 1045.35; Transfer rate: 3452.46 kb/s received
50 times faster !? Get a profile !
/usr/sbin/ab -k -t 60 -c 10 http://ssb22.joh.cam.ac.uk:7080/t/6211.gif
/usr/sbin/ab -k -t 60 -c 10 http://ssb22.joh.cam.ac.uk/
From flevit:
Image server: Requests per second: 74.82; Transfer rate: 19.62 kb/s received
Apache: Requests per second: 53.47; Transfer rate: 176.63 kb/s received
Still 10 times higher transfer rate, but requests/sec not much higher (other thing could be a localhost thing)
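The "50 times" compares transfer rates, but the two benchmarks fetched different URLs (a small GIF from the image server, the front page from Apache), so the response sizes differ. Dividing transfer rate by requests per second separates the two effects; a rough check using only the figures above, in the units ab reported:

```latex
\text{bytes per request} = \frac{\text{transfer rate}}{\text{requests per second}}

\text{local: } \frac{65.45}{249.74} \approx 0.262 \ \text{(image server)}, \qquad
\frac{3452.46}{1045.35} \approx 3.30 \ \text{(Apache)}

\text{flevit: } \frac{19.62}{74.82} \approx 0.262, \qquad
\frac{176.63}{53.47} \approx 3.30

\frac{3452.46}{65.45} \approx 52.7 \approx
\underbrace{\frac{1045.35}{249.74}}_{\approx 4.19} \times
\underbrace{\frac{3.30}{0.262}}_{\approx 12.6}
```

On these figures most of the transfer-rate gap is response size; locally Apache served about 4.2 times as many requests per second, while from flevit the image server served more (74.82 against 53.47).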
(add other gifs 1st; get through Cam proxy; transformations; remember decompress)
See “Unicode” section re getting them
Unicode imgs - they’re proportional!
zcat -f /var/log/syslog*|grep "Error 404"|sed -e "s/.*URL \"//" -e "s/\"//"|sort|uniq
Chinese stuff: COULD get it from TeX, if I can find a way of auto-cropping the PostScript & converting it to a bitmap format
Unicode has now gone beyond 16-bits (slides need update)
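Beyond U+FFFF, a 16-bit (UTF-16) representation needs two code units, a surrogate pair. A sketch of the standard encoding, assuming the input is a valid scalar value (at most 0x10FFFF and not itself a surrogate); `toUtf16` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode scalar value as UTF-16 code units.
// Values above U+FFFF become a high/low surrogate pair.
std::vector<std::uint16_t> toUtf16(char32_t cp) {
    if (cp < 0x10000) return {static_cast<std::uint16_t>(cp)};
    std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;  // 20 bits
    return {static_cast<std::uint16_t>(0xD800 + (v >> 10)),      // high 10 bits
            static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))};   // low 10 bits
}
```

For example U+20000, the first character of CJK Extension B, becomes D840 DC00.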
gateway & unicode (multiple “spellings” of accent-add etc) “Filesystem case-sensitivity (was Re: Picking up hermes mail)” on ucam.comp.linux
[but might be post-Unicode 3.0]
20000..2A719 : 42,778 : CJK Unified Ideographs, Extension B (These constitute all remaining unencoded ideographs from the Kangxi Dictionary, the Han Yu Da Zidian, a set of 6356 characters from Japan, 908 Hong Kong government characters, 169 characters from Korea, 29,794 characters from TCA in Taiwan, and 4050 characters from Vietnam.) : 00-Feb-02 Accepted : 00-Sep-25
etc
About the Online Code Charts
These charts are provided as a convenient online reference to the character contents of the Unicode Standard, Version 3.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. Proper Unicode support requires considerably more than providing glyphs for characters, and requires consulting the Unicode Standard and the Unicode Technical Reports.
You may freely use these code charts for personal or internal business uses only. You may not incorporate them into any product or publication, or otherwise distribute or archive them without express written permission from the Unicode Consortium.
The information on these pages may be updated from time to time. The Unicode Consortium is not liable for errors or omissions in these charts or the standard itself.
Blocks
The Unicode Standard divides its codespace into a number of blocks.
The chart index contains a table of most of the blocks; missing are blocks of unassigned characters, and blocks of characters with no visual representation such as the surrogate blocks and private use area. You can also go to a full character chart for each block (except for the Han ideographs and Hangul syllables).
Fonts
The fonts used in these charts were provided to the Unicode Consortium by a number of different font designers. Note that the glyphs in these charts are only representative; there can be wide variation in the glyphs used to represent any particular character, as discussed in the standard.
SOME mapping tables (Windows): http://oss.software.ibm.com/icu/charset/ Also ftp://ftp.unicode.org/Public/MAPPINGS/
You may embed references to the glyph images on the Unicode site in your own web pages. For example, to display a Euro sign (U+20AC) you can use the following HTML:
<IMG SRC="http://charts.unicode.org/Glyphs/20/U20AC.gif">
The subdirectory to use within the Glyphs/ directory is the first two hexadecimal digits of the Unicode code point. The set of glyphs available covers all of Unicode 3.0 with the exception of Han ideographs and Hangul syllables. However, you should only make occasional use of these glyphs. If there is too much web traffic the Unicode Consortium may be forced to discontinue this service.
(see source of http://www.unicode.org/charts/web.html for codepoints)
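The naming rule for those glyph URLs is mechanical. A sketch that reproduces it for code points in the BMP (the only range the quoted text covers); `glyphUrl` is a hypothetical helper, and nothing here assumes the service is still running.

```cpp
#include <cstdio>
#include <string>

// Glyph URL for a BMP code point: the subdirectory is the first two hex
// digits of the four-digit code point, e.g. U+20AC -> Glyphs/20/U20AC.gif
std::string glyphUrl(unsigned cp) {
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "http://charts.unicode.org/Glyphs/%02X/U%04X.gif",
                  (cp >> 8) & 0xFF, cp & 0xFFFF);
    return buf;
}
```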
http://charts.unicode.org/unihan/unihan.acgi$0x4E95 (generates links to cached images; not permanent; but URLs quite regular so hit the main page first and then get the cached images, only if haven’t already got the image) 3400-9FFF and F900-FAFF ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt
http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html xmbdfed (package installed) o Export of XBM files from glyph bitmap editors. (well, can export to HEX, which can probably be converted) but mgk25’s fonts are wrong sizes http://czyborra.com/unifont/
Oh dear, this leaves the gateway: http://www.askntl.com/adverts/adverts.asp?url=/telephone/great-value-calls/default.asp&image=/adverts/468by60/phone-bill.gif
(Can we start with a HEAD request if we’re using own code? What about the overhead of having to re-connect if no keep-alive? etc)
kingston.com mirrors navigation:
<select name="site" size=1 onChange="javascript:formHandler()"> <option selected value="">Worldwide sites <option value="http://www.kingston.com/sproot/"><font size="1" face="verdana, arial">Argentina</a></option> <option value="http://www.kingston.com/germany/"><font size="1" face="verdana, arial">Austria <option value="http://www.kingston.com.br/"><font size="1" face="verdana, arial">Brazil</a></option> <option value="http://www.kingston.com/sproot/"><font size="1" face="verdana, arial">Chile</a></option> <option value="http://www.kingston.com/denmark/"><font size="1" face="verdana, arial">Denmark <option value="http://www.kingston.com/europe/"><font size="1" face="verdana, arial">Europe <option value="http://www.kingston.com/finland/"><font size="1" face="verdana, arial">Finland <option value="http://www.kingston.fr/"><font size="1" face="verdana, arial">France</a> <option value="http://www.kingston.com/germany/"><font size="1" face="verdana, arial">Germany</a></option> <option value="http://www.kingston.com/ukroot/"><font size="1" face="verdana, arial">Ireland</a></option> <option value="http://www.kingston.com/israel/"><font size="1" face="verdana, arial">Israel</a></option> <option value="http://www.kingston.com/italy/"><font size="1" face="verdana, arial">Italy</a></option> <option value="http://www.kingston.co.jp/"><font size="1" face="verdana, arial">Japan</a></option> <option value="http://kingston.softbank.co.kr/"><font size="1" face="verdana, arial">Korea</a></option> <option value="http://www.kingston.com/sproot/"><font size="1" face="verdana, arial">Latin America</a></option> <option value="http://www.kingston.com/sproot/"><font size="1" face="verdana, arial">Mexico</a></option> <option value="http://www.kingston.com/nl/"><font size="1" face="verdana, arial">Netherlands</a></option> <option value="http://www.kingston.com/norway/"><font size="1" face="verdana, arial">Norway</a></option> <option value="http://www.kingston.com/spain/"><font size="1" face="verdana, arial">Spain</a></option> <option value="http://www.kingston.com/sweden/"><font size="1" face="verdana, arial">Sweden</a></option> <option value="http://www.kingston.com/germany/"><font size="1" face="verdana, arial">Switzerland</a></option> <option value="http://www.kingston.com/ukroot/"><font size="1" face="verdana, arial">United Kingdom</option> <option value="http://www.kingston.com/sproot/"><font size="1" face="verdana, arial">Uruguay</a></option> </select>
line spacing etc (stylesheets? gateway “spacing” button?? with text explaining it’s only CSS-aware browsers) P {word-spacing: 10px} P {letter-spacing: 5px} P {line-height: 12pt}
“Access” oops: [Access Systems America] - http://www.access-us-inc.com/ Provider of a microbrowser which is used in many I-Mode devices.
Showcase of Japanese Keitai Culture http://ssb22.joh.cam.ac.uk/cgi-bin/access?Ac=@&Aeck=PREF%3DID%3D4ce132f816f47144:TM%3D982953159:LM%3D982953159%2C.google.com&Au=http://nooper.co.jp/showcase/%3Fl%3Den
The HTTP User-Agent: header identifies an i-Mode browser with a string something like DoCoMo/1.0/F501i for the older 501 models, and something like DoCoMo/2.0/F502i/c10 for the newer models. The first part of the string says DoCoMo, indicating that it is an i-Mode client. The next part indicates the supported HTML version number. The third part indicates the device model number. The fourth part, only available on certain 502 models, indicates the current cache size. As with WAP devices, an i-Mode device can only accept a certain amount of data in one go. The number is in kilobytes, and the default size is 5KB. [Does this include the images??] Screen size is very small, usually no more than 16 (English) characters by 6-8 lines.
<HTML> <HEAD> <TITLE>Main MENU</TITLE> </HEAD> <BODY> <FONT COLOR=RED>Main MENU</FONT> <BR> <IMG SRC=ad_small.gif ALIGN=RIGHT> <A HREF=new.tcl ACCESSKEY="1">News</a> <BR> <A HREF=addr.tcl ACCESSKEY="2">Directory</a> </BODY> </HTML>
The ACCESSKEY attribute of a hyperlink provides one-key access to select and follow the URL from the phone’s numeric keypad. [but is the number included?] [Don’t have to include this - the recommended implementation does it by default]
The i-Mode phone terminals do not support HTTP Cookies at this time.
Note that on i-Mode phones, the password field in the HTTP authentication dialog box which pops up only supports entry of numeric passwords. Authorization: header is present. If not, it issues a WWW-Authenticate challenge: WWW-Authenticate: Basic realm="ACS_iMode", return 401, "text/html; charset=shift_jis", "please login" (or "incorrect login")
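A sketch of classifying User-Agent strings of the form described above. The struct and function names are hypothetical; the field meanings and the 5KB default are taken from the note, and a malformed cache field is simply ignored.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

struct IModeAgent {            // hypothetical result type
    bool isIMode = false;
    std::string version;       // second field, e.g. "1.0" or "2.0"
    std::string model;         // third field, e.g. "F502i"
    int cacheKB = 5;           // 5KB default when no cache field is sent
};

// Parse User-Agent strings such as "DoCoMo/2.0/F502i/c10".
IModeAgent parseIModeAgent(const std::string& ua) {
    IModeAgent a;
    std::vector<std::string> fields;
    std::istringstream in(ua);
    for (std::string f; std::getline(in, f, '/');) fields.push_back(f);
    if (fields.size() < 3 || fields[0] != "DoCoMo") return a;
    a.isIMode = true;
    a.version = fields[1];
    a.model = fields[2];
    if (fields.size() >= 4) {
        const std::string& c = fields[3];  // e.g. "c10" = 10KB cache
        bool valid = c.size() > 1 && c.size() <= 6 && c[0] == 'c' &&
            std::all_of(c.begin() + 1, c.end(),
                        [](unsigned char ch) { return std::isdigit(ch) != 0; });
        if (valid) a.cacheKB = std::stoi(c.substr(1));
    }
    return a;
}
```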
For compressing HTML (& removing unwanted tags), see http://www.w3.org/TR/1998/NOTE-compactHTML-1998 (table in Appendix A of supported tags & attribs) Images can be nightmarish (esp. large ones; transferred & scaled down) Please ensure each page uses less than 5KB of data volume. (Depending on the tags being used, some pages cannot be displayed even though they contain less than 5KB of data.) We recommend a data volume per page of less than 2KB. The maximum length of a character string is 200 bytes after URL encoding. The maximum length of a URL that can be input directly is 100 bytes. The maximum length of a URL that can be added to the bookmark list is 100 bytes. The maximum length of the title of a page/bookmark is 24 bytes.
i-mode users are responding to banner ads and e-mail advertising to a far greater extent than standard Web users. I-mode - the “i” is for information
There is a basic data charge per packet, 0.3 YEN (approx. US-cent 0.3) per data packet transmitted of 128 byte. As an example, looking at the basic imode-Menu, the standard DoCoMo welcome screen or user interface, will set you back about 2.7 YEN (i.e. approx. US-cent 2.7). There are no connection time charges for imode. In addition there are other charges for using email and for premium subscription services.
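Combining this charge with the page-size guidance above gives a feel for the cost per page. A rough worked example, assuming 1 KB = 1024 bytes, full 128-byte packets and no protocol overhead (all three are assumptions, not from the source):

```latex
\frac{2.7\ \text{yen}}{0.3\ \text{yen/packet}} = 9\ \text{packets}
\;\Rightarrow\; \le 9 \times 128 = 1152\ \text{bytes for the basic menu}

5\,\text{KB} = 5120\ \text{bytes}, \quad \frac{5120}{128} = 40\ \text{packets}
\;\Rightarrow\; 40 \times 0.3 = 12\ \text{yen}

2\,\text{KB} = 2048\ \text{bytes}, \quad \frac{2048}{128} = 16\ \text{packets}
\;\Rightarrow\; 16 \times 0.3 = 4.8\ \text{yen}
```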
imode emails have to be shorter than 250 Kanji (double byte characters), or shorter than 500 Roman Characters (single byte characters) The default email address of imode users is 090xxxxxxxx@docomo.ne.jp, where “090xxxxxxxx” is the mobile telephone number.
For example  is an icon of a sun shining.
SJIS+imgs; Remove all images (ALT?); disable status line scripts; don’t add “end of web page”; don’t put [ ]; don’t show date stamp. Also: Don’t add TITLE= to any HR; don’t add META tags; compact space; ’ to ‘; compress the options; MAYBE compress Au= in some other way as well (besides removing http://); remove things like <b> <i> etc that are not supported (don’t use colours instead - it will drive the size up)
geometry is really 16x7, but lynx margins take up 4 more lines xterm -geometry 16x11 -e lynx -nocolor -nopause -noreverse -nounderline (formatting can be bad) Some phones have 20x8 (10 kanji) xterm -geometry 20x12 -e lynx -nocolor -nopause -noreverse -nounderline (formatting can be wrong, e.g. centre etc)
gateway.bugs: <DIV ID="incoming" STYLE="display:none"> (means don’t display; stripping the STYLE will cause it to do so. + don’t strip content if JavaScript enabled.) (Do we really want to take out this text though? But at least count it as a banner? Option???)
lynx -trace: outputs stuff to a file called Lynx.trace
---------------------------827779986791670271271312593
-----------------------------827779986791670271271312593
in cgilib.c++ CGIEnvironment::tryDecodingMultipart() See all **** stuff esp. boundary
CONTENT_TYPE=multipart/form-data; boundary=---------------------------10617267281005157210847669114
CONTENT_LENGTH=4339
Input:
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="iconid"
5
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="message"
test
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="A1attachment"; filename="codepoints.html"
Content-Type: text/html
Or: Content-Disposition: form-data; name="A1attachment"; filename="random_seed"
P....\n
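The extra leading dashes in the body delimiters come from the multipart format (RFC 2046): each part is introduced by "--" followed by the boundary given in CONTENT_TYPE, and the final delimiter also has "--" after it. A sketch with hypothetical helper names (not the code in cgilib.c++); parameter-name matching is case-sensitive for brevity.

```cpp
#include <string>

// Boundary parameter from a CONTENT_TYPE such as
// "multipart/form-data; boundary=----abc". Quoted values are unquoted.
std::string multipartBoundary(const std::string& contentType) {
    std::string::size_type p = contentType.find("boundary=");
    if (p == std::string::npos) return "";
    std::string b = contentType.substr(p + 9);  // 9 == length of "boundary="
    if (!b.empty() && b[0] == '"') {
        std::string::size_type q = b.find('"', 1);
        return b.substr(1, q == std::string::npos ? std::string::npos : q - 1);
    }
    return b.substr(0, b.find_first_of("; \t"));
}

// Each body part starts after "--" + boundary; the last one is followed
// by "--" + boundary + "--".
std::string partDelimiter(const std::string& boundary) { return "--" + boundary; }
std::string closeDelimiter(const std::string& boundary) { return "--" + boundary + "--"; }
```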
aftr don’t store remote session IDs, have “store remote session IDs even across servers” (default No)
Cookies: Need to default the domain (not to everything!) when setting (although this would increase the size of the URLs...)
Note: %26 (&) and %3D (=) seem to occur a lot in the cookie - better % compression system ? (%m for aMpersand and %q for eQuals ? Somehow code all ASCII that must be %-escaped?) (watch we don’t send this fancy stuff to remote servers!) How did the cookies get so big anyway? The problem seems to be unique to Yahoo
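A sketch of the suggested %m / %q shorthand. It is reversible because, in valid %-encoding, '%' is always followed by two hex digits, and 'm' and 'q' are not hex digits. The function names are hypothetical; as the note warns, the result must be expanded again before anything goes to a remote server. A lower-case "%3d" comes back as "%3D", which is equivalent.

```cpp
#include <string>

// Shorten a %-encoded string: "%26" -> "%m", "%3D"/"%3d" -> "%q".
std::string shrinkEscapes(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            std::string hex = in.substr(i + 1, 2);
            if (hex == "26") { out += "%m"; i += 2; continue; }
            if (hex == "3D" || hex == "3d") { out += "%q"; i += 2; continue; }
        }
        out += in[i];
    }
    return out;
}

// Undo shrinkEscapes: "%m" -> "%26", "%q" -> "%3D".
std::string expandEscapes(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 1 < in.size()) {
            if (in[i + 1] == 'm') { out += "%26"; ++i; continue; }
            if (in[i + 1] == 'q') { out += "%3D"; ++i; continue; }
        }
        out += in[i];
    }
    return out;
}
```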
Edit the cookies on the form?? Gateway cookies: Should be OK, because not getting *image* cookies. 4K URL limit is a worry! (when user not supporting cookies) (Temp: clear all cookies when reaches maximum size? cut down?) (really carry cookies when no longer browsing their source domain? e.g. search engine cookies)
All FORMS: METHOD=”post” (careful; some browsers put warnings up)
border=0 css ? (can it be done? Which browsers need border=0, do they all have CSS support, etc) check “s
(see localusr.c++) NB insecure etc (unless using SSL, and even then, watch Location box, cache, history, etc)
gateway proper framesets (gateway.bugs? fair amount of coding) (but jp etc) If a certain var is present, instead of charset, URL box, date stamp, etc (or rewind once done), have “[Expand this frame]” (no BR) Or just call the string [Options] It links to the page with the var clear & target=_top Put var in when doing a FRAMESET Keep it when doing a link iff name is not _top (& it’s already present) (may still fail if new NAMEs for new windows, but not to worry - can still get an “expand this frame”)
Plugin: file.swf [Enable plugins] [Extract text] (& links; use swf code if necessary) <P>(plugin: jsb.mid [download] [activate plugins] [hide plugins])</P>
Old notes (might no longer need them) -
<EMBED src="$FILENAME$" width=$WIDTH$ height=$HEIGHT$ type="application/x-Sibelius-Score" alt="$FILENAME$" codebase="http://www.sibelius.com/cgi/plugin.pl" pluginspage="http://www.sibelius.com/cgi/plugin.pl">
codebase and pluginspage should now be substituted Netscape: Takes codebase and goes ?application/whatever, ignores pluginspage, changes msg to “click here after installing”. How do you get (or prevent) the adverts window?
Chinese table problem Chinese table in Japanese?
Have commented out the pinchATable("Cp33722","IBM eucJP/5050",f,0); - REALLY returns max bytes =3 (in EUC) Need better decompilation Cp964 (AIX TW) really is 4 bytes max; need to sort out & comment back in
Need to sort out //if(neverBelow127) throw(new IOException("Didn't expect neverBelow127 to be true here"));
TEContainer.h: Implement void setAutoDetectMimeCharsetIfCharsetIsAppropriateToLanguage(const char* mimeDesignator) {};
LOG the charsets (and the detected results) as people use web pages?
Need to find official list, really
Some of these may be MIME charsets: iso-8859-1 Shift_JIS big5 gb2312 euc-kr euc-jp windows-1250 windows-1251 windows-1253 iso-8859-9 utf-8 x-mac-roman x-mac-ce ks_c_5601-1987 ? x-gb2312-11 x-euc-tw x-cns11643-1 x-x-big5 ...
HZ-GB-2312
o iso-2022-jp (see Section 3.1.3)
o iso-2022-jp-2 (see Section 3.1.3)
o iso-2022-kr (see Section 3.1.4)
o iso-2022-cn (see Section 3.1.5)
o iso-2022-cn-ext (see Section 3.1.5)
o iso-8859-1
ISO- ? -[0-9]?
- UCS-2 0x6F22 0x5B57
- UCS-4 0x00006F22 0x00005B57
UCS-2: FEFF, also escape sequences (Level 3 = supports all characters)
UCS-2 Level 1 <ESC> % / @ 0x1B252F40 162
UCS-2 Level 2 <ESC> % / C 0x1B252F43 174
UCS-2 Level 3 <ESC> % / E 0x1B252F45 176
JIS X 0221-1995 == ISO 10646-1:1993 (based on Unicode 1.1)
See ftp://ftp.tiu.ac.jp/jis/ re JIS X 0213-199X etc Also get ISO sequences for all the other encodings (+ MIME charset etc)
MISSING: <OPTION VALUE="Cp1125">Ukraine: IBM PC</OPTION>
EXTRA: pinchATable(new CharToByteCp856());
Cp33722 and Cp942 need Yen substitution
<OPTION VALUE="Cp037">Misc: CP 037</OPTION>
<OPTION VALUE="Cp437">Misc: DOS 437</OPTION>
<OPTION VALUE="Cp850">Misc: DOS Latin-1</OPTION>
<OPTION VALUE="Cp500">Misc: EBCDIC 500V1</OPTION>
<OPTION VALUE="Cp1046">Misc: IBM EBCDIC</OPTION>
<OPTION VALUE="Cp285">Misc: IBM UK</OPTION>
<OPTION VALUE="8859_1">Misc: ISO 8859-1</OPTION>
<OPTION VALUE="8859_2">Misc: ISO 8859-2</OPTION>
<OPTION VALUE="8859_3">Misc: ISO 8859-3</OPTION>
<OPTION VALUE="8859_4">Misc: ISO 8859-4</OPTION>
<OPTION VALUE="8859_9">Misc: ISO 8859-9</OPTION>
<OPTION VALUE="MacDingbat">Misc: Macintosh Dingbat</OPTION>
<OPTION VALUE="MacRoman">Misc: Macintosh Roman</OPTION>
<OPTION VALUE="MacSymbol">Misc: Macintosh Symbol</OPTION>
<OPTION VALUE="Cp1252">Misc: Windows Latin-1</OPTION>
// Need to add more encodings
// See ftp://unicode.org/pub/MappingTables/
Cp936 is GB2312 with some corrections
charset=HZ-GB-2312 (NB One class may have several charsets)
/* Two-byte ISO codes: JIS X 0208-1990: Esc & @ before JIS X 0208-1983 @ = JIS C 6226-1978 DONE A = GB2312 DONE B = JIS X 0208-1983 DONE C = KSC5601 D = JIS X 0212-1990 E = ISO-IR-165:1992 (Lunde: “ISO-IR-165:1992 can be considered a superset of GB 2312-80, GB 6345.1-86, and GB 8565.2-88” [and more - ssb]) DONE G-M = planes for CNS11643 (1-7)
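A sketch of recognising the two-byte designations listed above and mapping the final byte to a name. The function names and returned strings are illustrative. Only G0/G1 forms are handled (not ESC $ * F or ESC $ + F), and the sketch accepts the short ESC $ F form for any final byte, which is more lenient than the standard (where the short form is only for '@', 'A' and 'B').

```cpp
#include <cstddef>
#include <string>

// Standard ISO 2022 final bytes used here lie in 04/00..07/14.
static bool isFinalByte(char c) { return c >= 0x40 && c <= 0x7E; }

// If s[i] starts ESC $ F, ESC $ ( F or ESC $ ) F, return F; otherwise 0.
char twoByteFinal(const std::string& s, std::size_t i) {
    if (i + 2 >= s.size() || s[i] != '\x1b' || s[i + 1] != '$') return 0;
    char c = s[i + 2];
    if ((c == '(' || c == ')') && i + 3 < s.size()) c = s[i + 3];
    return isFinalByte(c) ? c : 0;
}

// Final-byte assignments as listed in the note above ("" if unknown).
std::string twoByteCharset(char finalByte) {
    switch (finalByte) {
        case '@': return "JIS C 6226-1978";
        case 'A': return "GB 2312";
        case 'B': return "JIS X 0208-1983";  // -1990 if preceded by ESC & @
        case 'C': return "KS C 5601";
        case 'D': return "JIS X 0212-1990";
        case 'E': return "ISO-IR-165:1992";
        default:
            if (finalByte >= 'G' && finalByte <= 'M')  // CNS 11643 planes 1-7
                return "CNS 11643 plane " + std::to_string(finalByte - 'G' + 1);
            return "";
    }
}
```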
Problem: ¦Y¹M¬+=¯ª+)W¡A´s¹Mªe´=¤s¤t¡A+ª¹M¸gº·¨s-y (is that GB2312?) - access needs to decode whole table!
Look at http://www.fontlab.com/download.htm (check legal permissions)
http://www.unicode.org/img/CJKtr8/UFA2D.gif
CJK images + readings: Use all the data on http://charts.unicode.org/unihan/unihan.acgi$0x4E95 and other stuff (takes a long time to get it) See the HTML & reverse-engineer?
Complete text file ftp://ftp.unicode.org/Public/2.0-Update/UNIHAN.TXT (but still need the different imgs)
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt has the “LATIN-CAPITAL-LETTER-D” etc text
Blocks index http://charts.unicode.org/Unicode.charts/normal/Unicode.html
See mapping tables checkConverter - expand to USE the other mappings
void setAutoDetectMimeCharsetIfCharsetIsAppropriateToLanguage(const char* mimeDesignator) {}; has not been implemented (te-container.h) Eg. WINDOWS-1251 (may not be exactly what’s stored) (And there’s a “not yet implemented anyway” in literals.h)
Unihan database also has definitions
Norway needs a different table from Denmark. Also Finland & Sweden
“Decoding prior to native decoding” thing: Problem if HTML sequences are *MEANT*! Decode only if chars in this range are always in HTML?
Have a “Language for messages” button?
NB Greek&Russian may be in most CJK, also Jp in C and K, also C in CJK; also UTF-8 etc (especially Chinese)
JIS no ‘escape’ thing? (ie. take $B as Esc $B etc)
Stuff to check: // NEEDATTENTION Check the following! if(!shiftOutRequired) isoEncodingInUse=-1; // So resets immediate ones at end of line
// *** Need to sort out misc folder! // Sort JIS folder
ssb22:/tmp/brian$ export RSYNC_RSH=ssh
ssb22:/tmp/brian$ rsync -v silas@brian.accu.org:access/platform.h . (needs password)
Packages rsync sftp
<EMBED SRC="jsb.mid" HIDDEN=true AUTOSTART=true>
Remove (or don’t) HIDDEN and AUTOSTART -> Allow background music to start automatically
Images: Give the button text as images before the button! Also need to write SELECT as RADIO
problemExtentions[] could be more elegant / less storage etc
Spam trap: Would it be better with sleep?
<HTML lang="fr"> <EM lang="ja">some Japanese</EM> <P lang="es">...Interpreted as Spanish... <P>...Interpreted as French again...
<ABBR title="Idaho">ID</ABBR> <ACRONYM title="World Wide Web">WWW</ACRONYM> (have an acronyms dictionary?)
Black = #000000
Green = #008000
Silver = #C0C0C0
Lime = #00FF00
Gray = #808080
Olive = #808000
White = #FFFFFF
Yellow = #FFFF00
Maroon = #800000
Navy = #000080
Red = #FF0000
Blue = #0000FF
Purple = #800080
Teal = #008080
Fuchsia = #FF00FF
Aqua = #00FFFF
In the near future, browsers will display grouped lists with expanding and collapsing levels of detail. To group items, use the OPTGROUP element (with the SELECT element). For example:
<FORM action="http://somesite.com/prog/someprog" method="post"> <P><SELECT name="ComOS"> <OPTGROUP label="Comm Servers"> <OPTGROUP label="PortMaster 3"> <OPTION label="3.7.1" value="pm3_3.7.1">PortMaster 3 with ComOS 3.7.1 <OPTION label="3.7" value="pm3_3.7">PortMaster 3 with ComOS 3.7 <OPTION label="3.5" value="pm3_3.5">PortMaster 3 with ComOS 3.5 </OPTGROUP> <OPTGROUP label="PortMaster 2"> <OPTION label="3.7" value="pm2_3.7">PortMaster 2 with ComOS 3.7 <OPTION label="3.5" value="pm2_3.5">PortMaster 2 with ComOS 3.5 </OPTGROUP> </OPTGROUP> <OPTGROUP label="Routers"> <OPTGROUP label="IRX"> <OPTION label="3.7R" value="IRX_3.7R">IRX with ComOS 3.7R <OPTION label="3.5R" value="IRX_3.5R">IRX with ComOS 3.5R </OPTGROUP> </OPTGROUP> </SELECT> </FORM>
The new FIELDSET element groups form controls while the LEGEND element labels each group. For example,
<FORM action="http://somesite.com/adduser" method="post"> <FIELDSET> <LEGEND>Personal information</LEGEND> <LABEL for="firstname">First name:</LABEL> <INPUT type="text" id="firstname" tabindex="1"> <LABEL for="lastname">Last name:</LABEL> <INPUT type="text" id="lastname" tabindex="2"> ...more personal information... </FIELDSET> <FIELDSET> <LEGEND>Medical History</LEGEND> ...medical history information... </FIELDSET> </FORM>
Give each frame a title
IFRAME as well as FRAME
Provide alternative text for all image submit buttons
<INPUT TYPE="image" SRC="bobbylogo.gif" ALT="The bobby logo" WIDTH=200 HEIGHT=200>
Option for button with show URL
Cache-Control: no-cache Pragma: no-cache Expires: 0
<META HTTP-EQUIV="Window-target" CONTENT="_top">
Options: Content-language: en-GB Window-target: _top
<META HTTP-EQUIV="Set-Cookie" CONTENT="cookievalue=xxx;expires=Friday, 31-Dec-99 23:59:59 GMT; path=/">
Can we put <PRE> around text/plain ?
export LS_COLORS=''
alias ls="ls --color=auto"
export PS1="\h:\W\\$ " (caps W for last part of dir only)
COMPILER_USES_LSB_MSB_INTS: Perhaps create alternative versions of data files for other compilers, add to installation instructions (plus a test) Haven’t tested that it produces the same output!
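Instead of alternative data files, one option is to read the files byte by byte so the host's byte order never matters. A sketch, assuming the files store 32-bit integers least-significant byte first (which is what the name COMPILER_USES_LSB_MSB_INTS suggests, but that is a guess); `decodeLE32` and `readLE32` are hypothetical names. For signed data, convert the result to `std::int32_t`.

```cpp
#include <cstdint>
#include <cstdio>

// Assemble a 32-bit value stored least-significant byte first.
// Gives the same result on any host, whatever its native byte order.
std::uint32_t decodeLE32(const unsigned char b[4]) {
    return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
           (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
}

// Read one such value from a data file; returns false on a short read.
bool readLE32(std::FILE* f, std::uint32_t& value) {
    unsigned char b[4];
    if (std::fread(b, 1, sizeof b, f) != sizeof b) return false;
    value = decodeLE32(b);
    return true;
}
```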
Sort L_NO_FREQTBL out
Japanese frequency table!
arrows consisting of dashes and greater-than signs --> etc
Korean: What about ISO-2022-KR and EUC-KR? And ISO646? QP? How do they relate to KS-C-5601?
Unreproducible bug report - Korean pages looking like Japanese - suspecting wrong language selection
The link to the gateway’s home page should probably go through the gateway (but what if the installation is not working?) Also help.htm links (and it needs more processing).
favicon.ico redirect to loc of original page ???
gateway: If <FORM> and </FORM> does not match in “banner”, DO NOT MOVE IT!!! (eg. http://access.adobe.com/simple_form.html)
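A sketch of the check, with hypothetical helper names: count FORM start and end tags case-insensitively and only move the banner if they balance. It does not skip comments or scripts.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Count case-insensitive occurrences of `tag` (given in lower case, e.g.
// "<form" or "</form") that are followed by '>', '/', whitespace or the end
// of the text, so "<formula>" is not mistaken for "<form".
std::size_t countTag(const std::string& html, const std::string& tag) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + tag.size() <= html.size(); ++i) {
        std::size_t k = 0;
        while (k < tag.size() &&
               std::tolower(static_cast<unsigned char>(html[i + k])) == tag[k]) ++k;
        if (k != tag.size()) continue;
        std::size_t after = i + k;
        if (after == html.size() || html[after] == '>' || html[after] == '/' ||
            std::isspace(static_cast<unsigned char>(html[after])))
            ++count;
    }
    return count;
}

// Only treat a banner as movable if its FORM start and end tags balance.
bool formsBalanced(const std::string& banner) {
    return countTag(banner, "<form") == countTag(banner, "</form");
}
```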
HTML4 forms can have “disabled” controls - option to remove them?
Document AecL (background hover colour) & link into options (NB say “(read help)”) uses css In some browsers (e.g. some versions of Konqueror), you have to also select “Don’t add status line code to links” (under the Options button) for this to work.
- with larger selection of colours? (how organised? rows?) or some sort of selector?
<p> lots of times is left intact
Error: Failed to find help text for option AeI
Error: Failed to find help text for option Aefn~ssb22/mytest
email providers PRE, NOBR (zh chars). Also TEXTAREA
“=on” in the checkbox options can just be “=” in the links, nothing in the cookies, and “value=1” in hidden form options
gateway bug: http://dmoz.org/cgi-bin/add.cgi?where=Computers/Multimedia/Music_and_Audio/Software/Composition/Fractal_and_Generative cuts the banner in the middle of a FORM, resulting in the SUBMIT button being invisible in Netscape
sometimes detects PC HK/TW rather than Big5 - no great problem (a few symbols don’t display, e.g. cdot (u+2022) sometimes rendered as u+2027 and image not available). Might want some kind of detection bias but it won’t be easy (really want a fuzzy logic system of some sort)
gateway sig-11 faults in strlen in HttpHeader::readHttpEquivs() when document HEAD has the following bogus tag:
<meta http-equiv=Content-typecontent="text/html; charset=utf-8">
(i.e. if there is a missing space before ‘content’)
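A defensive alternative is to parse the tag's attributes into a map and look up `content` only if it is present. This is a sketch with a hypothetical `parseAttributes` function, not the gateway's `HttpHeader::readHttpEquivs()`; it handles quoted and unquoted values but not character references.

```cpp
#include <cctype>
#include <map>
#include <string>

static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

static std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Parse the inside of a tag such as: meta http-equiv="..." content="..."
// into lower-case attribute names mapped to their values. Attributes that
// are absent simply don't appear; nothing dereferences a null pointer.
std::map<std::string, std::string> parseAttributes(const std::string& tag) {
    std::map<std::string, std::string> attrs;
    std::string::size_type i = 0, n = tag.size();
    while (i < n) {
        while (i < n && (isSpace(tag[i]) || tag[i] == '/')) ++i;
        std::string::size_type nameStart = i;
        while (i < n && !isSpace(tag[i]) && tag[i] != '=' && tag[i] != '>') ++i;
        std::string name = lower(tag.substr(nameStart, i - nameStart));
        if (name.empty()) { ++i; continue; }  // stray '=' or '>', or end
        std::string value;
        if (i < n && tag[i] == '=') {
            ++i;
            if (i < n && (tag[i] == '"' || tag[i] == '\'')) {
                char quote = tag[i++];
                std::string::size_type close = tag.find(quote, i);
                if (close == std::string::npos) close = n;  // unterminated
                value = tag.substr(i, close - i);
                i = (close < n) ? close + 1 : n;
            } else {
                std::string::size_type valueStart = i;
                while (i < n && !isSpace(tag[i]) && tag[i] != '>') ++i;
                value = tag.substr(valueStart, i - valueStart);
            }
        }
        attrs[name] = value;
    }
    return attrs;
}
```

With the bogus tag the map has no `content` key, so a caller that checks `attrs.count("content")` first can just skip the tag.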
All material © Silas S. Brown unless otherwise stated. Apache is a registered trademark of The Apache Software Foundation. Javascript is a trademark of Oracle Corporation in the US. Mozilla is a registered trademark of The Mozilla Foundation. PostScript is a registered trademark of Adobe Systems Inc. TeX is a trademark of the American Mathematical Society. Unicode is a registered trademark of Unicode, Inc. in the United States and other countries. Unix is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corp. Any other trademarks I mentioned without realising are trademarks of their respective holders.