Opened 9 years ago

Last modified 3 years ago

#667 new defect

Nicotine cannot read some files

Reported by: Frederick IV Barbarossa
Owned by: quinox
Priority: normal
Milestone: Release 1.3.0
Component: nicotine
Version: 1.2.12
Keywords:
Cc:

Description

Whenever Nicotine has to rescan my shares, there are 4 particular files for which it gives me one of the following two errors:

1) "Scanning File Error: 'utf16' codec can't decode bytes in position 10-11: illegal UTF-16 surrogate Path: /home/xxxxx/xxxxx/xxxxx/xxxxx.wmv" "Traceback: UnicodeDecodeError: 'utf16' codec can't decode bytes in position 10-11: illegal UTF-16 surrogate"

2) "Scanning File Error: 'utf16' codec can't decode bytes in position 12-13: illegal encoding Path: /home/xxxxx/xxxxx/xxxxx/xxxxx.wmv" "Traceback: UnicodeDecodeError: 'utf16' codec can't decode bytes in position 12-13: illegal encoding"

As a result, the files in question don't appear as shared and, consequently, people cannot download them from me. At first I suspected some symbols in the file names, but I've changed the names to very simple descriptions and the problem persists. Then I thought the files might have been corrupted while stored on my computer, so I downloaded two of them again from a person on the Internet, and the problem remains. So if it's a problem with the files, it must already have been present in the original files. I can read the files (watch the videos), and my file browser shows me all their properties with no problems, so the files seem to be OK. I searched on Google, and it seems to be some problem with Python not being able to read the files. Is this a problem with the files or a problem in Python?

Change History (3)

comment:1 Changed 9 years ago by quinox

It's a problem with N+.

File handling on Linux is a bit of a nasty business, since paths aren't strings but bytes: they can be written in any encoding. To display things to the user we need strings, so we need to convert paths from bytes to strings at some point.

To work around it for now: the content of the file doesn't matter, it's the path. So either the filename or the directory contains characters that aren't valid UTF-16 (is there a reason you use UTF-16 and not UTF-8 like most people?). If you rename the file/dir to either proper UTF-16 or ASCII, it should work again.
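
The bytes-to-string conversion described above can be sketched roughly as follows. This is only an illustration, assuming Python 3 and a UTF-8 default; `display_name` is a hypothetical helper and not a function from N+.

```python
def display_name(raw_path: bytes, encoding: str = "utf-8") -> str:
    """Turn a bytes path into a string for display without raising.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    try:
        # Strict decode first, so well-formed names come through unchanged.
        return raw_path.decode(encoding)
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of aborting the scan.
        return raw_path.decode(encoding, errors="replace")


# A name stored as UTF-8 decodes cleanly.
good = "vídeo/cnnsheen.wmv".encode("utf-8")
# The same name stored as Latin-1 is not valid UTF-8.
bad = "vídeo/cnnsheen.wmv".encode("latin-1")

print(display_name(good))
print(display_name(bad))
```

The fallback only affects what is shown to the user; the original bytes would still be needed to open the file.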

comment:2 Changed 9 years ago by Frederick IV Barbarossa

Thank you very much for your answer, quinox. As far as I know, I'm using the UTF-8 file naming system in my Ubuntu account, if that's what you mean. (When I type the command "locale" I can see everything in ".utf8" format in the results.) The folder in which the files are located is called "vídeo" - it has the Portuguese character "í" - but I've just checked and that's a valid character in UTF-8. I have quite a number of files in that folder and only these particular 4 files give me an error. And, as I said, I've tried changing their names to very simple ones - like "cnnsheen.wmv" - and still the problem remains.

Anyway, the problem cannot be with the path because (with the problematic files already given simple names), even if I change the folder name to just "video", the problem remains, and if I put one of the problematic files in another path with folders named only with "normal" characters, it still gives me the same error. (Only with these files and not with others in the same folder that have Portuguese characters.) So I have no clue as to what is happening. If I'm supposedly using UTF-8, why the hell is Python resorting to UTF-16?

Also, I've noticed that I'm getting a bunch of messages, together with the errors, that I don't remember Nicotine ever giving me before. For example, these were the last 5 lines I got when I did a rescan:

18:11:57 File "/usr/lib/pymodules/python2.6/mutagen/asf.py", line 114, in __init__ self.value = self.parse(data, **kwargs)

18:11:57 File "/usr/lib/pymodules/python2.6/mutagen/asf.py", line 156, in parse return data.decode("utf-16-le").strip("\x00")

18:11:57 File "/usr/lib/python2.6/encodings/utf_16_le.py", line 16, in decode return codecs.utf_16_le_decode(input, errors, True)

18:11:57 Scanning File Error: 'utf16' codec can't decode bytes in position 10-11: illegal UTF-16 surrogate Path: /home/xxxxx/xxxxx/xxxxx.wmv

18:12:01 Rescanning finished
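
The traceback above shows the failure inside mutagen's ASF (.wmv) metadata parsing, where a tag value is decoded as UTF-16-LE, rather than in the handling of the path. The sketch below, assuming Python 3, reproduces that kind of failure with an unpaired surrogate and shows one possible tolerant fallback. `decode_asf_text` is a hypothetical helper, not part of mutagen or Nicotine, and the sample bytes are invented for illustration.

```python
def decode_asf_text(data: bytes) -> str:
    """Decode a UTF-16-LE metadata value, tolerating malformed input.

    Hypothetical helper: the strict branch mirrors the call shown in the
    traceback, the fallback is an assumption about one possible workaround.
    """
    try:
        return data.decode("utf-16-le").strip("\x00")
    except UnicodeDecodeError:
        # Malformed code units become U+FFFD instead of raising.
        return data.decode("utf-16-le", errors="replace").strip("\x00")


# Well-formed value with a trailing NUL terminator.
good = "title\x00".encode("utf-16-le")

# Invented malformed value: 't', an unpaired high surrogate (0xD800), 'i'.
bad = b"t\x00" + b"\x00\xd8" + b"i\x00"

print(decode_asf_text(good))

try:
    bad.decode("utf-16-le")
except UnicodeDecodeError as exc:
    # The strict decode fails in the same way as in the log above.
    print("strict decode failed:", exc.reason)

print(repr(decode_asf_text(bad)))
```

With a fallback like this, a file whose tags are malformed could still be shared, at the cost of a damaged tag value in the displayed metadata.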

comment:3 Changed 3 years ago by gfarmerfr

Milestone: Release 1.3.0