Opened 10 years ago

Last modified 3 years ago

#351 new defect

"Error creating directory" when downloading into existing directory.

Reported by: anonymous
Owned by: quinox
Priority: minor
Milestone: Release 1.3.0
Component: nicotine
Version: SVN
Keywords:
Cc:

Description

Sometimes I use "Download to..." to grab some missing files so they aren't strewn about the base directory. Generally the suggested name (the name of the remote directory) is fine; it might even be the directory where the rest of the files already are. But that case refuses to work, and I must change the directory name just to fetch the remaining files. I would expect it to happily download into the existing directory and deal with duplicates in the normal way.

Actually, I don't like the normal way duplicate files are handled. Usually if I already have a file of exactly the same name, I don't want it; maybe the transfer should start paused or give some indication, especially if the local file is exactly the same size as the remote file. Of course, sometimes I am trying to replace a broken file...

I mention the second part because I have noticed a few downloaded duplicates recently. I think there is some other bug where aborted files become queued again, maybe after a crash or something. I will file another bug if I come to understand that behavior, but I added the second part because maybe nicotine should not be so ready to download dupes. Occasionally I want that, but not usually.

Change History (11)

comment:1 Changed 10 years ago by offhand

The problem, as far as I am aware, is that the server does not support hash lookups, for instance. Quinox might be able to explain it better.

comment:3 Changed 10 years ago by quinox

You raised three issues:

  • "Download to..." into an existing directory fails. This sounds like a bug indeed, we'll take a look at it and fix it
  • Duplicates should be detected. As Offhand said there is no hashing done on SLSK, so even if the file name and size is the same we don't know whether it is actually identical. Music files are generally the same size (2 to 8MB or so), and it's not uncommon to find identical file names with different albums (01 - Track.mp3 comes to mind). Not downloading files because you already got a similar one doesn't sound like a great idea to me. What we can do: download the file that might be a dupe, do a hash check once n+ downloaded it and don't save it in case it is actually the same file. We'll implement this
  • Files are queued twice. Do you accept uploads from users? If so you might want to turn it off - this is a known problem with SLSK, where cancelled files are requeued by the remote user.

comment:4 Changed 10 years ago by quinox

The first bullet point is fixed with r851. The saving behaviour is slightly different now, but it shouldn't cause any confusion.

comment:5 Changed 10 years ago by quinox

The second bullet is fixed with r852 (the commit message is wrong; I thought it was the third bullet point): if a newly downloaded file already exists under the desired name and the md5 digests of the two files are the same, the new copy will not be stored under a different name.

This works well for music files; if you try to download 600 MB dupes, n+ will probably hang for a bit while calculating the hash.

comment:6 Changed 10 years ago by quinox

Note that it doesn't do dupe detection across your complete share; it only kicks in when a download cannot be saved because a file with the same name already exists (i.e. when a numbered name like file.mp3.1 would otherwise be used).
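
For illustration, here is a minimal Python sketch of the kind of check described here; the helper names are hypothetical, and this is not the actual n+ code:

    import hashlib
    import os

    def md5_of(path, chunk_size=1 << 20):
        # Hash in chunks so a 600 MB file doesn't exhaust memory.
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def save_download(temp_path, target_path):
        # Only run the (potentially slow) hash when the target name is
        # already taken, matching the behaviour described above.
        if os.path.exists(target_path):
            if md5_of(temp_path) == md5_of(target_path):
                os.remove(temp_path)  # exact dupe: keep just one copy
                return target_path
            # Different content: fall back to a numbered name
            # (file.mp3.1, file.mp3.2, ...)
            suffix = 1
            while os.path.exists("%s.%d" % (target_path, suffix)):
                suffix += 1
            target_path = "%s.%d" % (target_path, suffix)
        os.rename(temp_path, target_path)
        return target_path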

comment:7 Changed 10 years ago by anonymous

I just noticed you have fixed the existing-directory thing, nice. Now I am especially sorry for submitting this combined bug report, because the 'aside' has become interesting, to me at least. Should it be split off, or is there somewhere better for this discussion?

First of all, thanks a lot for taking it on; the dupe detection you have added is definitely useful...

But my original comment had more to do with preventing unnecessary transfers. I have disabled the "accept uploads" option, but I still noticed one duplicate download... unfortunately, I did not take note of the particulars. Generic filenames might be common, but an identical file size is much less common. At least with my slow connection, and not necessarily attending to the client very regularly, I would rather confirm the download somehow if there is a high chance that I already have that file: for example, an identical filename (maybe ignoring case and some other simple variations), only for files over 5 MB, or comparing bitrates. You must admit false positives could be very rare if it's tuned right. It could be a nice optional feature since hashing isn't implemented on the server...

As for the dupe detection you added... very nice, but it should then be possible to generate hashes for your whole collection for more useful detection... Of course, for it to be truly useful it should be a hash of the audio frames only, and so be immune to retagging. In fact, for lossless files the hash of the PCM data should be used, which is embedded in FLAC files at least; then it would identify the same lossless audio regardless of codec and tagging.
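
For what it's worth, the PCM hash mentioned here really is stored in every FLAC file's header: the mandatory STREAMINFO block ends with an MD5 of the raw, unencoded audio, so reading it is cheap. A minimal Python sketch (my own parsing, not n+ code):

    import struct

    def flac_pcm_md5(path):
        # STREAMINFO is the first metadata block in every FLAC file;
        # its last 16 of 34 bytes are the MD5 of the unencoded PCM.
        with open(path, "rb") as handle:
            if handle.read(4) != b"fLaC":
                raise ValueError("not a FLAC file: %s" % path)
            header = handle.read(4)
            block_type = header[0] & 0x7F
            length = struct.unpack(">I", b"\x00" + header[1:])[0]
            if block_type != 0 or length != 34:
                raise ValueError("malformed STREAMINFO block")
            streaminfo = handle.read(34)
            return streaminfo[-16:].hex()

Two FLAC files encoded from the same rip should report the same digest here regardless of compression level or tags (assuming the encoder filled the field in; the spec allows it to be all zeros).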

comment:8 Changed 10 years ago by quinox

Of the 8347 files in my collection, 810 have conflicting file sizes (i.e. the same size as another file), and most are CBR160 or CBR320, so I'd say it's not just a theoretical concern :)

I think the current state of n+ (that is, no extended dupe checking) is good enough for most people. I really don't see it being of much value to most users, and hashing every file in your collection adds a lot of CPU strain (I know it's only needed once, but still).

I'm not completely unsympathetic to your situation though, so I propose to extend the plugin framework and build you a plugin that does the following:

  • Before-the-fact high-dupe-probability checks using filename, file size, and bitrate (with a threshold, since different libraries report slightly different rates for the same file), pausing such transfers (see the sketch after this list)
  • After-the-fact hash checking against the hashes of your complete share. Hashes based solely on audio frames sound useful indeed, but that's out of n+'s scope, and I suspect it's very difficult to do correctly, seeing how problematic just reading out metadata is! So unless you can point to a library that does precisely this, it will be based on a complete-file hash
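
A rough Python sketch of what the first bullet might look like; the share index and transfer objects are hypothetical stand-ins, not the real plugin API:

    def probably_have_already(remote, share_index,
                              min_size=5 * 1024 * 1024,
                              bitrate_tolerance=8):
        # Heuristic: same basename (ignoring case), same byte size, and
        # a bitrate within a small kbps tolerance, since different
        # libraries report slightly different rates for the same file.
        # `remote` and the share_index entries are assumed to expose
        # .name, .size and .bitrate; share_index maps lower-cased
        # basenames to lists of local files.
        if remote.size < min_size:
            return False  # small files collide too often to be useful
        for local in share_index.get(remote.name.lower(), []):
            if local.size == remote.size and \
               abs(local.bitrate - remote.bitrate) <= bitrate_tolerance:
                return True
        return False

    # A plugin hook could then pause suspicious transfers rather than
    # refuse them, leaving the final decision to the user:
    #
    #   if probably_have_already(transfer.remote, my_share_index):
    #       transfer.pause()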

comment:9 Changed 10 years ago by anonymous

Thanks a lot for taking another look at this now slightly obscure idea, which I agree is largely beyond the scope of n+. You have really been doing some great work here, and I have to say part 2 looks pretty low priority indeed. Extending the framework to make it possible as a plugin sounds worthwhile, though. I have been meaning to learn some Python for a while now, so maybe I will jump in here before too long...

I only mentioned part 2 out of frustration with an unexpected ordeal of integrating several eras of my music archives: doing definitive CD rips, finding bad or missing files, and removing dupes. Lossless files are easy enough to compare; the difference is either some offset or some bad bits... For MP3s, though... just for instance, I have just discovered that one MP3 can be exactly the same as another while being half the file size, so it's not even as 'easy' as stripping tags. I found 3 files with VBR bitrates of 180, 203, and 319 that have 100% identical audio content. The tool mp3packer (http://omion.dyndns.org/mp3packer/mp3packer.html) helped me figure this out, but it does not have a "compare" mode. Probably it would be better to use some audio fingerprinting scheme anyway, but that might not help with good vs. bad files (skipping, etc.). So... yeah, part 2 looks pretty hard, and is almost certainly beyond the scope...

As for part 1, it goes along with the idea that sometimes I want to confirm what I am downloading, which is part of #346, but... in this context, I must say that many times I have downloaded a file of identical size because I am looking for an uncorrupted version of a file I already have. In fact, I am opening a new request to add a preference for file-size units. A while back, all units were bytes; now bytes are used in some places and "human readable" units in others. The exact size is sometimes useful. (Opened #377.)

OK, sorry to ramble on in this semi-conscious fashion; I actually returned to this bug for a more specific reason: the primary bug does not seem completely fixed. Possibly it is partly related to the GNOME file picker, though. For one thing, if you now use 'download to...' and the directory does not already exist, it will no longer be 'suggested' for you; you must enter a directory name on your own. I found the default of the remote directory name to be useful.

When the directory does exist, I see some inconsistent behavior I don't understand. Sometimes the picker defaults to the existing directory, and that seems to work OK; but sometimes it shows the base directory with the existing directory selected, in which case it has sometimes, but not always, complained about the existing directory. Sorry I can't be more specific right now; I have seen this after the fix, but I'm not at the right machine to test at the moment.

OK, I opened another bug about the tangential stuff, so that this one can be closed when it's done. (#378)

comment:10 Changed 10 years ago by anonymous

I just noticed this again, and I was not specific enough before. These issues occur only in share browse, not in the search tab. That is, in search, if you select some files and do 'download to', the name of the remote directory of the first selected file will be the default, whereas in share browse no default name is presented. I'm pretty sure that is related to this fix, because previously the 'error creating' appeared when the default directory (same name as the remote one) already existed. It appears I did not notice before that this only happens in share browse.

comment:11 Changed 3 years ago by gfarmerfr

Milestone: Release 1.2.11 → Release 1.3.0