Nmap Development mailing list archives

Non-ASCII user names on Windows; internationalization fun


From: David Fifield <david () bamsoftware com>
Date: Wed, 10 Sep 2008 14:29:22 -0600

On Mon, Sep 08, 2008 at 01:07:28PM -0600, David Fifield wrote:
> This morning after the 4.75 release my email contained a bunch of new
> automatic Zenmap crash reports, all for the same crash:
>
> 'utf8' codec can't decode byte 0x9c in position 28: unexpected code byte
> http://sourceforge.net/tracker/?group_id=142490&atid=752647&set=custom&_assigned_to=1897047&_status=1&_category=100&_group=100

This has been hitting a lot of people--there are 65 essentially
identical crash reports since Monday. Luckily I have access to Windows
now so I committed a series of changes that fixes it. What follows is a
summary of some related issues, which will be interesting to you if
you're into internationalization, character encodings, and that kind of
thing.

After I fixed the "can't decode byte" bug I found a related problem. In
my tests I created an account with the user name "Kurt Gödel". Zenmap
instructs Nmap to create a temporary XML file in a temporary directory.
For this account that directory was c:\docume~1\kurtgö~1\locals~1\temp\.
The XML written by Nmap therefore began:

<?xml version="1.0" ?>
<?xml-stylesheet href="nmap.xsl" type="text/xsl"?>
<!-- Nmap 4.75 scan initiated Wed Sep 10 11:16:58 2008 as: nmap -T4 -F -oX c:\docume~1\kurtgö~1\locals~1\temp\zenmap-bcbuy6.xml 192.168.0.1 -->
<nmaprun scanner="nmap" args="nmap -T4 -F -oX c:\docume~1\kurtgö~1\locals~1\temp\zenmap-bcbuy6.xml 192.168.0.1" 
start="1221067018" startstr="Wed Sep 10 11:16:58 2008" version="4.75" xmloutputversion="1.02">

Nmap doesn't declare an encoding in its XML output, so the encoding is
presumed to be UTF-8 (http://www.w3.org/TR/REC-xml/#NT-EncodingDecl).
But the ö character was written as a raw \xf6 byte, which is not UTF-8.
Both Zenmap and Internet Explorer refused to open the file, and rightly
so, because this is a fatal error in XML.
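A short Python 3 check shows the failure in isolation. With no encoding declaration the parser assumes UTF-8, and a lone 0xF6 byte is not valid UTF-8. This sketch uses the standard library's xml.etree.ElementTree, which is backed by expat. The attribute value is modeled on the path above, but the document itself is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The short path as Nmap wrote it: "ö" went out as the single cp1252/Latin-1
# byte 0xF6. In UTF-8 the same character is the two bytes C3 B6.
raw = b'<nmaprun args="nmap -oX c:\\docume~1\\kurtg\xf6~1\\locals~1\\temp\\x.xml"/>'

def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def parses_as_xml(data: bytes) -> bool:
    """True if the bytes are well-formed XML (no declaration, so UTF-8 is assumed)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(decodes_as_utf8(raw))  # False
print(parses_as_xml(raw))    # False
```

Replacing the 0xF6 byte with the UTF-8 sequence C3 B6 makes the same document parse.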

So I modified xml_convert to escape any character whose value is greater
than 0x7F. That guarantees the output is plain ASCII, which is also valid
UTF-8. I think that's the best we can do when handling strings that have
an unknown encoding, such as command line arguments.
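The idea behind the change can be sketched in Python. This is not Nmap's code, because xml_convert is part of Nmap's C++ source. The function name escape_for_xml is hypothetical. The sketch also assumes that each byte maps to the code point with the same value, so the unknown bytes are treated as Latin-1. Escaping the five XML special characters as well is part of this sketch, not a claim about xml_convert.

```python
def escape_for_xml(raw: bytes) -> str:
    """Hypothetical sketch: turn bytes of unknown encoding into ASCII-only XML text."""
    specials = {ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;",
                ord('"'): "&quot;", ord("'"): "&apos;"}
    out = []
    for b in raw:
        if b > 0x7F:
            # Non-ASCII byte: emit a numeric character reference whose value is
            # the byte value (i.e. interpret the byte as Latin-1).
            out.append("&#%d;" % b)
        elif b in specials:
            out.append(specials[b])
        else:
            out.append(chr(b))
    return "".join(out)

print(escape_for_xml(b"c:\\docume~1\\kurtg\xf6~1\\temp"))
# c:\docume~1\kurtg&#246;~1\temp
```

The one-to-one byte mapping is a guess. If the bytes were really cp1252 or UTF-8, the escaped text will decode to different characters. For example, byte 0x9C is "œ" in cp1252 but &#156; is a control character. Either way, the document stays well-formed.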

So now Kurt Gödel can start Zenmap, run scans, and save results to the
database as expected. I still get an error message (not a crash) when
trying to save results to C:\Documents and Settings\Kurt Gödel, because
gtk.FileChooserDialog returns file names in UTF-8 on Windows, not in
the file system encoding. I'm still looking into that.
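The mismatch is easy to reproduce without GTK. This example assumes the file system encoding is the Western European ANSI code page cp1252, which depends on the system locale. Under that assumption, the UTF-8 bytes for the folder name are read back as a different name. One common remedy in Python is to decode the dialog's UTF-8 string to Unicode and pass Unicode paths to the os functions. The helper name below is hypothetical, and the example uses Python 3.

```python
def chooser_bytes_to_path(name: bytes) -> str:
    """Hypothetical helper: decode a UTF-8 file name from the dialog into a str path."""
    return name.decode("utf-8")

utf8_name = "Kurt Gödel".encode("utf-8")   # b'Kurt G\xc3\xb6del'

# Misreading the UTF-8 bytes as cp1252 names a different folder.
print(utf8_name.decode("cp1252"))          # Kurt GÃ¶del
# Decoding them as UTF-8 recovers the real name.
print(chooser_bytes_to_path(utf8_name))    # Kurt Gödel
```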

I originally tried creating an account with the name "Paul Erdős".
That's an o-double-acute, a bit more exotic than an o-umlaut. Windows
created two home directories! "Paul Erdős" and "Paul Erdos". The two
directories were like mirrors of each other: if I created a file in one
it would appear in the other. My guess is that Windows represented the ő
with a combining character that sometimes gets ignored. Apart from this
oddity, Zenmap appeared to work fine.
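Python's unicodedata module can at least illustrate the combining-character guess. In Unicode, ő (U+0151) decomposes under NFD into a plain "o" followed by U+030B COMBINING DOUBLE ACUTE ACCENT. Dropping the combining mark leaves "Erdos". This only shows how such folding could produce the second name. It says nothing certain about what Windows actually does.

```python
import unicodedata

def strip_combining(name: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'ő' -> 'o' + U+030B -> 'o'."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print([hex(ord(c)) for c in unicodedata.normalize("NFD", "\u0151")])  # ['0x6f', '0x30b']
print(strip_combining("Paul Erd\u0151s"))                              # Paul Erdos
```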

I decided to press my luck and created an account with the user name
本因坊秀策. (Those are Japanese kanji. I hope they don't get mangled by
mail readers or the mailing list archiver.) With this account Zenmap
gives an error message at startup saying it can't create the user
configuration directory. The directory it's trying to create is
C:\Documents and Settings\?????\.zenmap. That's not your mail reader;
those are five literal question marks. That comes straight out of the
HOME environment variable, and I don't know what to do about it.
Python is perfectly capable of reading the home directory's file name:
os.listdir(u"C:\\Documents and Settings").
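One plausible source of the five question marks is a conversion of the Unicode user name to a legacy ANSI code page that has no kanji, with "?" substituted for each character it cannot represent. That is an assumption, not something verified on the test machine. Python's codecs reproduce the effect with errors="replace". By contrast, listing a directory with a str (Unicode) argument, as in the os.listdir call above, returns names as Unicode. The sketch uses Python 3 and creates a temporary directory instead of touching C:\Documents and Settings. The function names are hypothetical.

```python
import os
import tempfile

NAME = "本因坊秀策"  # the user name from the message

def lossy_ansi(name: str, codepage: str = "cp1252") -> bytes:
    """Encode to a legacy code page, replacing characters it lacks with '?'."""
    return name.encode(codepage, errors="replace")

def unicode_listing(name: str) -> list:
    """Create a directory called `name` in a temp dir and list it with a str path."""
    with tempfile.TemporaryDirectory() as parent:
        os.mkdir(os.path.join(parent, name))
        return os.listdir(parent)  # str argument, so str results

print(lossy_ansi(NAME))                  # b'?????'
print(unicode_listing(NAME) == [NAME])   # True on a file system that can store the name
```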

The Windows I'm trying this on is on a FAT32 partition. Does anyone know
whether using NTFS would make any difference? If someone wants to try it,
it's pretty easy. Install the East Asian language files (under Regional
and Language Options in the Control Panel), create an account, and copy
and paste something from Wikipedia into the user name field.

David Fifield

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org
