Opened 13 years ago

Last modified 3 years ago

#72 reopened defect

"Too many open files" causes nicotine to go insane

Reported by: Josselin Mouette <joss@…> Owned by: daelstorm
Priority: major Milestone: Release 1.3.0
Component: nicotine Version: 1.2.5.1
Keywords: Cc:

Description

forwarded from Debian bug#391409

I've had many problems with nicotine recently (crashes, all network connections being "stuck", etc.), but the worst is that sometimes after a crash the configuration (containing all pending downloads) is simply deleted. Sometimes this happens after just 10-20 minutes; sometimes nicotine runs for days first.

Starting nicotine from a shell instead of the menu, I found that it often complains about "Too many open files" and that it cannot save the config file. It seems that nicotine first deletes the old config file and then tries to write a new one. But if writing the new config fails, the old version is not recovered.

During normal operation nicotine seems to use 200-300 file descriptors. Right now I have a stuck nicotine instance that has open all 1024 descriptors that select() can handle, 801 of which are TCP connections in the CLOSE_WAIT state, i.e. they're already dead. "Stuck" right now means:

  • the UI still runs
  • no network traffic
  • the config file is missing (luckily I now have a copy of config.old that's about 20 minutes old)
  • "Disconnect" did nothing but disable the "Disconnect" entry in the menu (the downloads did not disappear from the "Downloads" tab, all network connections remained open, and "Connect" remains disabled in the menu)
  • even "Rescan shares" complains about "Too many open files"
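To confirm a state like the one described above from inside a Python process, a minimal sketch like the following compares the number of open descriptors with the RLIMIT_NOFILE soft limit. It assumes Linux (it reads /proc/self/fd) and the standard resource module; open_fd_count and fd_report are illustrative names, not part of nicotine.

```python
import os
import resource


def open_fd_count() -> int:
    """Count this process's open file descriptors (Linux-only: /proc/self/fd)."""
    # Listing the directory briefly opens one extra descriptor, which is
    # included in the count; that is fine for a diagnostic.
    return len(os.listdir("/proc/self/fd"))


def fd_report() -> str:
    """Summarize descriptor usage against the per-process RLIMIT_NOFILE limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    used = open_fd_count()
    return f"{used} of {soft} descriptors in use (hard limit {hard})"


if __name__ == "__main__":
    print(fd_report())
```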

So,

  • there appears to be a serious file descriptor leak; maybe nicotine defers the closing of dead connections for far too long?
  • writing the config file is unsafe with respect to I/O errors
  • slskproto.py should be using poll instead of select, so that it could handle file descriptors above 1024 (unfortunately I don't know much about Python, so I can't offer to create a patch)
  • if "Disconnect" worked (i.e. if it _really_ closed all network connections), nicotine would probably be able to recover
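The poll suggestion concerns select()'s FD_SETSIZE ceiling (commonly 1024), which limits which descriptor numbers select() can watch; it does not change how many files a process may open, so the per-process limit still applies (a point made later in this thread). On typical POSIX builds, Python's select.select() raises ValueError for descriptors at or above FD_SETSIZE. The standard selectors module picks epoll, kqueue or poll where available. A minimal sketch, not nicotine's slskproto.py code; wait_readable is a hypothetical helper:

```python
import selectors
import socket


def wait_readable(socks, timeout=1.0):
    """Return the sockets from `socks` that become readable within `timeout` seconds.

    DefaultSelector uses epoll/kqueue/poll where available, none of which
    share select()'s FD_SETSIZE limit on descriptor numbers.
    """
    sel = selectors.DefaultSelector()
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # select() returns (SelectorKey, events) pairs for ready objects.
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"x")  # make b readable
    print(wait_readable([b]) == [b])
    a.close()
    b.close()
```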

Change History (10)

comment:1 Changed 13 years ago by daelstorm

Status: new → assigned

Config Issue: The config file modifications in r88 should prevent that problem from being a major issue with the next release. It saves the config file pseudo-atomically (it's not perfect).
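The ticket doesn't show what r88 changed. One common pattern for saving a config file so that an I/O error (such as "Too many open files") can never leave the user without a config is write-to-temp-then-rename. The sketch below assumes a POSIX filesystem where os.replace() is atomic within a single directory; save_config_atomically is a hypothetical name, not nicotine's function.

```python
import os
import tempfile


def save_config_atomically(path: str, text: str) -> None:
    """Write `text` to `path` so readers see either the old or the new file.

    The new contents go to a temporary file in the same directory and are
    fsync'ed before os.replace() swaps them in. If any step fails (e.g.
    EMFILE while creating the temp file), the old config stays in place.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file; the original config is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Writing the temp file into the config's own directory (rather than /tmp) keeps both paths on the same filesystem, which is what makes the final rename atomic.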

The connection issue: This might be resolved in r102, where the Server Ping and Connection Closed timeouts were re-enabled. This needs testing.
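A TCP socket stays in CLOSE_WAIT after the peer has closed its side until the local program calls close(), so 801 such sockets suggest that hang-ups were not being noticed. As a sketch of the kind of timeout handling mentioned above (not nicotine's actual code; ConnectionTable and its methods are hypothetical), a connection table can close a socket as soon as recv() returns b"" and reap any socket idle longer than a timeout. Its close_all() is what the reporter expected "Disconnect" to do.

```python
import selectors
import socket
import time


class ConnectionTable:
    """Hypothetical bookkeeping for peer sockets with idle-timeout reaping."""

    def __init__(self, idle_timeout: float = 60.0) -> None:
        self.idle_timeout = idle_timeout
        self.sel = selectors.DefaultSelector()
        self.last_seen = {}  # socket -> time.monotonic() of last activity

    def add(self, sock: socket.socket) -> None:
        sock.setblocking(False)
        self.sel.register(sock, selectors.EVENT_READ)
        self.last_seen[sock] = time.monotonic()

    def close(self, sock: socket.socket) -> None:
        # Unregister before closing so the selector never sees a dead fd.
        self.sel.unregister(sock)
        del self.last_seen[sock]
        sock.close()

    def poll_once(self, timeout: float = 0.0) -> None:
        """Read available data; close peers that hung up or went idle."""
        for key, _events in self.sel.select(timeout):
            sock = key.fileobj
            try:
                data = sock.recv(4096)
            except (BlockingIOError, InterruptedError):
                continue
            except OSError:
                data = b""  # treat resets and similar errors as a hang-up
            if data:
                self.last_seen[sock] = time.monotonic()
                # ... pass `data` to the protocol layer here ...
            else:
                # b"" means the peer closed; closing here avoids CLOSE_WAIT.
                self.close(sock)
        now = time.monotonic()
        for sock, seen in list(self.last_seen.items()):
            if now - seen > self.idle_timeout:
                self.close(sock)

    def close_all(self) -> None:
        """Close every tracked socket, as a working 'Disconnect' would."""
        for sock in list(self.last_seen):
            self.close(sock)
```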

slskproto.py using poll vs. select: I don't get it. How would that help with regard to file handles?

The Disconnect issue is interesting. I'll look into it.

Generally speaking, one can use ulimit to increase the number of files a process is allowed to open. When many hundreds (rarely thousands) of search results are returned, more than 1024 descriptors are sometimes needed.
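ulimit -n is a shell builtin; a Python program can do the same for itself with resource.setrlimit(). A minimal sketch, assuming a Unix system, where an unprivileged process may raise its soft RLIMIT_NOFILE only up to the hard limit; raise_nofile_soft_limit is a hypothetical helper. Raising the limit does not lift select()'s FD_SETSIZE ceiling on descriptor numbers.

```python
import resource


def raise_nofile_soft_limit(wanted: int) -> int:
    """Raise the RLIMIT_NOFILE soft limit toward `wanted`, capped at the hard limit.

    Never lowers the current soft limit. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to raise
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```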

comment:2 Changed 13 years ago by joss@…

About the poll/select idea: I think the reporter is indeed wrong; this won't change anything. The only sane thing to do is to gracefully handle the case when the limit is reached, e.g. by refusing new connections and displaying a warning.
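A minimal sketch of that suggestion, assuming the connection code uses plain sockets: treat EMFILE (per-process limit) and ENFILE (system-wide limit) from accept() as a recoverable condition, log a warning, and refuse the connection instead of letting the error surface elsewhere (for instance in the config writer). accept_or_refuse and the logger name are hypothetical.

```python
import errno
import logging
import socket

log = logging.getLogger("fd-limit")  # hypothetical logger name


def accept_or_refuse(listener: socket.socket):
    """Accept one incoming connection, or refuse it gracefully at the fd limit.

    Returns the new socket, or None when the process (EMFILE) or the system
    (ENFILE) has run out of file descriptors. Other errors propagate.
    """
    try:
        conn, _addr = listener.accept()
        return conn
    except OSError as exc:
        if exc.errno in (errno.EMFILE, errno.ENFILE):
            log.warning("Too many open files; refusing new connection. "
                        "Close some transfers or raise the limit (ulimit -n).")
            return None
        raise
```

A refused connection remains in the listen backlog, so an event loop using this should also stop watching the listener for a while; otherwise the listener keeps being reported as readable.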

comment:3 Changed 12 years ago by jUrner@…

Oops, quite an old report. Anyway...

> ...search results are returned, more than 1024 [open files] are sometimes needed.

Then you should hit the MAX_OPEN_FILES per-process limit on many OSes quite reliably. What's wrong with detecting when this limit is hit and informing the user about it? Generally speaking, increasing a MAX_OPEN_FILES limit (if possible) is not a good idea. The limit is there, and there could be a good reason for that.

The most reliable way to crash my Nicotine is to search for "Roy Orbison". Given that search results are held in a non-virtual ListView, even if the file limit was not hit, no ListView on earth can handle that many items ;-)

Nicotine - 1.2.6

comment:4 Changed 12 years ago by renan_s2

It happens to me too. A reliable way for me to reproduce this behaviour is to search for "Rush" or "Genesis".

It did NOT happen in old (pre-1.2.8) versions of Nicotine, as far as I remember.

comment:5 Changed 11 years ago by offhand

Resolution: fixed
Status: assigned → closed

I think protection and fixes have been implemented so this should not happen anymore. If it does please file a new ticket.

comment:6 Changed 10 years ago by castrillon_carlos@…

I have the same problem: nicotine crashes after I see that config message in the log.

comment:7 Changed 10 years ago by quinox

castrillon: Please use a newer version of n+, v. 1.2.10 was released just a short while ago which has all sorts of stuff to handle this situation better. Please file a new bug report if you still have crashes with 1.2.10

comment:8 Changed 9 years ago by rewrhidge

Really.

comment:9 Changed 9 years ago by moonstruxx@…

Resolution: fixed
Status: closed → reopened

Hi! I'm hitting the connection issue too, with nicotine version 1.2.15. It tries to reduce connections (successfully), but this somehow stops all running transfers: they are still queued, but nicotine doesn't restart them. It's practically unusable after such a crash. Everything works fine after a restart.

cheers,

bjoern

comment:10 Changed 3 years ago by gfarmerfr

Milestone: Release 1.3.0