Opened 9 years ago

Last modified 3 years ago

#623 new defect

Shares not working properly with out-of-ASCII characters

Reported by: anonymous
Owned by: quinox
Priority: normal
Milestone: Release 1.3.0
Component: nicotine
Version: SVN
Keywords:
Cc:

Description

I have an interesting problem. I have a few folders/files containing out-of-ASCII characters in their names (e.g. Strålar). When I build my shares, I get errors about the files (or whole folders) being dropped as nonexistent, and the names in the console are displayed as undecoded UTF-8 (two characters for each double-byte character). The folders/files also don't show up when I browse my shares.
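One way the "two characters per double-byte character" display could arise is sketched below. This is only an illustration in Python 3, not output captured from nicotine: it assumes the name is held as UTF-8 bytes and then shown through an 8-bit codepage. cp1250 is used as an example; the actual console codepage may differ.

```python
# Assumption: the name exists as UTF-8 bytes and is displayed via cp1250.
name = "Strålar"

utf8_bytes = name.encode("utf-8")      # "å" becomes the two bytes C3 A5
garbled = utf8_bytes.decode("cp1250")  # each byte is shown as its own character

# ascii() keeps the output printable on consoles that cannot show these characters.
print(len(name), len(utf8_bytes), len(garbled))
print(ascii(garbled))

# The damage is reversible as long as no byte was dropped along the way.
restored = garbled.encode("cp1250").decode("utf-8")
print(restored == name)
```

The garbled string is one character longer than the original because the single "å" is rendered as two separate characters.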

Now, the interesting thing is that some people are actually able to see those files and even download them without any problems (and the names display correctly in the upload window).

System: Windows 7, Python 2.6.5, GTK 2.20.1, Pango 1.28, PyGTK 2.17.1, latest SVN version of nicotine, running from source

Change History (9)

comment:1 Changed 9 years ago by anonymous

I'm a bit fuzzy on the filename logic on Windows.

To get to the heart of the matter: your FS uses Latin-1/ISO 8859-1 and your files have characters from this codepage, like Strålar. Can you view these filenames without problems in Internet Explorer?

If so it sounds fixable, somewhat, somehow.

comment:2 Changed 9 years ago by anonymous

My system (Czech language) is using the windows-1250 codepage (Linux equivalent would be iso-8859-2, but they are not fully compatible). But NTFS uses Unicode (UTF-16) to store names of files on disk. So in filenames, I can use any Unicode character and it will display correctly in explorer/internet explorer/even firefox, provided that I have a font that supports it.
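A small Python 3 sketch of the two points above: windows-1250 and iso-8859-2 disagree on some characters, while UTF-16 can represent any of the names mentioned in this ticket. The sample strings are illustrative, and the sketch only uses Python's built-in codecs; it does not touch NTFS itself.

```python
# Illustration only: compare how two "Central European" codepages encode text.
same_in_both = "á"
differs = "š"

print(same_in_both.encode("cp1250") == same_in_both.encode("iso8859_2"))
print(differs.encode("cp1250") == differs.encode("iso8859_2"))

# UTF-16 (little-endian here) uses two bytes for each of these characters,
# whatever the script.
samples = ["Strålar", "日本語", "азбука"]
utf16_lengths = {}
for sample in samples:
    encoded = sample.encode("utf-16-le")
    utf16_lengths[sample] = len(encoded)
    assert encoded.decode("utf-16-le") == sample  # lossless round trip

print(sorted(utf16_lengths.values()))
```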

I actually found something about it, don't know if it's relevant, but I will post it anyway:

Windows NT/2000/XP always write filenames to the underlying filesystem as Unicode. So in theory, Unicode filenames should work flawlessly with Python.

Unfortunately, win32 actually provides two sets of APIs for interfacing with the filesystem. And in true Microsoft style, they are incompatible. The two APIs are:

  • A set of APIs for Unicode-aware applications that return the true Unicode names.
  • A set of APIs for non-Unicode-aware applications that return a locale-dependent coding of the true Unicode filenames.

Python (for better or worse) follows this convention on win32 platforms, so you end up with two incompatible ways of calling os.listdir() and open():

When you call os.listdir(), open(), etc. with a Unicode string, Python calls the Unicode version of the APIs, and you get the true Unicode filenames.

(This corresponds to the first set of APIs above).

When you call os.listdir(), open(), etc. with a non-Unicode string, Python calls the non-Unicode version of the APIs, and here is where the trouble creeps in. The non-Unicode APIs handle Unicode with a particular codec called MBCS.
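The two calling conventions can be sketched as follows. This is a Python 3 illustration rather than the Python 2.6 behaviour discussed in the ticket: in Python 3 the text type is str and the "non-Unicode string" is bytes, and since Python 3.6 bytes paths on Windows are handled as UTF-8 instead of MBCS. The helper name list_both_ways is hypothetical.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (text_names, byte_names) for the same directory."""
    # Text path in, text names out.
    text_names = sorted(os.listdir(directory))
    # Bytes path in, bytes names out (encoded with the filesystem encoding).
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file with an out-of-ASCII name.
    open(os.path.join(tmp, "Strålar.txt"), "w").close()
    text_names, byte_names = list_both_ways(tmp)

# ascii() avoids console encoding problems when printing the names.
print(ascii(text_names))
print(byte_names)
```

The type of the argument decides the type of the result, so a program that mixes both conventions ends up with two differently typed views of the same directory.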

MBCS is a lossy codec: Every MBCS name can be represented as Unicode, but not vice versa. MBCS coding also changes depending on the current locale. In other words, if I write a CD with a multibyte-character filename as MBCS on my English locale machine, then send the CD to Japan, the filename there may appear to contain completely different characters.
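The lossiness can be illustrated without Windows. Python's "mbcs" codec exists only on Windows, so this Python 3 sketch assumes cp1250 (the reporter's codepage) as a stand-in for the locale-dependent encoding, with cp1252 playing the role of a second machine's locale.

```python
name = "Strålar"

# "å" has no slot in cp1250, so a strict conversion fails outright.
try:
    name.encode("cp1250")
    representable = True
except UnicodeEncodeError:
    representable = False

# A lenient conversion succeeds but silently replaces the character.
lossy = name.encode("cp1250", errors="replace")

# The same bytes also mean different things under different locales:
# encode on a cp1252 machine, then read the bytes back on a cp1250 machine.
western_bytes = name.encode("cp1252")
reinterpreted = western_bytes.decode("cp1250")

print(representable)
print(lossy)
print(ascii(reinterpreted), reinterpreted == name)
```

Once the name has been through the lossy conversion, looking it up on disk can fail, which would be consistent with files being reported as nonexistent.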

comment:3 Changed 9 years ago by anonymous

Excellent, that should help me quite a bit. I might take a look at it this weekend

comment:4 Changed 9 years ago by diaspar

Seems like duplicate of #345

comment:5 Changed 9 years ago by anonymous

It does sound like it's a dupe. I'll keep both open, and once I've fixed this one I'll see if the other one can be closed as well.

comment:6 Changed 9 years ago by diaspar

Would be great. This has annoyed me since 1.2.10 came out.

comment:7 Changed 9 years ago by quinox

I managed to reproduce the problem on my Windows machine, and I think I've fixed it.

I've added another Windows-specific function with r1418 to deal with filelist building. It's a bit ugly since it converts strings to unicode and back, but I couldn't use unicode strings throughout because the rest of N+ isn't built for it, so this will have to do for now.

  • Do SVN up and perform a rescan. The out-of-ASCII directories shouldn't be dropped and they should show up in your filelist
  • If it still doesn't work please remove the *.db files from %APPDATA%/nicotine/ and try again. (Because of the dropping not all DB files contain information about the same files/directories)

The directory should stick around even after restarting N+ etc.
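The general idea of scanning with Unicode paths and converting back to byte strings might look roughly like the sketch below. This is not the code from r1418: the function names scan_shares and stored_path_exists are hypothetical, UTF-8 is assumed as the internal byte encoding, and the sketch is written for Python 3.

```python
import os
import tempfile


def scan_shares(root):
    """Walk root using text paths and return {dir_bytes: [file_bytes, ...]}.

    Hypothetical helper: names are read as text so the OS returns the true
    names, then stored as UTF-8 bytes for code that expects byte strings.
    """
    shares = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        shares[dirpath.encode("utf-8")] = sorted(
            filename.encode("utf-8") for filename in filenames
        )
    return shares


def stored_path_exists(stored_dir, stored_name):
    """Convert stored byte strings back to text before touching the disk."""
    path = os.path.join(stored_dir.decode("utf-8"), stored_name.decode("utf-8"))
    return os.path.exists(path)


with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "Strålar")
    os.mkdir(folder)
    open(os.path.join(folder, "track.mp3"), "w").close()

    demo_shares = scan_shares(root)
    demo_found = stored_path_exists(folder.encode("utf-8"), b"track.mp3")

print(demo_found)
```

Converting at the boundary keeps the rest of the program on byte strings, at the cost of an encode/decode step on every disk access.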

comment:8 Changed 9 years ago by anonymous

Confirming fixed: the files show up correctly (even the ones that use Japanese characters and Russian Cyrillic). The output in the console still doesn't display correctly (I mean that 2U. Adding ...), but I can live with that; it's probably caused by Microsoft's stupid decision to use different codepages for GUI and CLI.

But now I get this error with the new version:

Traceback (most recent call last):
  File "C:\Python26\nicotine\pynicotine\gtkgui\userbrowse.py", line 292, in OnFolderClicked
    self.OnDownloadDirectory(widget)
  File "C:\Python26\nicotine\pynicotine\gtkgui\userbrowse.py", line 508, in OnDownloadDirectory
    self.DownloadDirectory(self.selected_folder)
  File "C:\Python26\nicotine\pynicotine\gtkgui\userbrowse.py", line 597, in DownloadDirectory
    node = self.DirStore.on_get_iter(dir)
  File "C:\Python26\nicotine\pynicotine\gtkgui\uglytree.py", line 209, in on_get_iter
    for i in path:
TypeError: iteration over non-sequence

Is it caused by the changes, or should I create a new bug for it?

comment:9 Changed 3 years ago by gfarmerfr

Milestone: Release 1.2.16 → Release 1.3.0