Alfresco CIFS – Windows 2008 leaking SMB sessions
feb 27 2012
Categories : Notes
One of the strengths of Alfresco is the many ways you can interact with the repository: you can get information in and out not only through the web client, but also through protocols such as IMAP, FTP, NFS, WebDAV, and CIFS. But take a protocol such as CIFS (known as SMB before 1996), which is a non-versioning network file system, and apply it to a versioning repository such as Alfresco, and you are in for a tough journey. I have dealt with CIFS for my clients more than I ever wanted to, but I understand clients who wish to use Alfresco via CIFS. Their end users can just continue editing documents the way they are used to and avoid the extra download/upload steps that a web client like Share requires.
The problem here is that although CIFS has a file locking mechanism, and Alfresco can check for it, detect a file save, and apply versioning in the background, there is no way to control how a client implements a file save. Many document editing clients use some sort of temporary file where the actual saving takes place. When you finally close the document, the client deletes the original document and replaces it with the temporary file, now renamed to the original name. Alfresco has to detect this to actually do versioning, and not just a plain delete followed by the creation of a new document. My intention here is not to explain the process exactly, but rather to give you a quick insight into what Alfresco has to deal with. And if you think that the Microsoft Office suite has one way of doing a file save, think again. Not only does the save process differ between the different Office products, it also differs between versions: Microsoft Word 2003 has a different save process than Word 2007. In version 4 of Alfresco this has finally been dealt with in a reliable way, and I feel comfortable telling my clients that they can use CIFS without file editing occasionally corrupting their files.
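To make the save pattern concrete, here is a minimal Python sketch of the kind of detection described above. It is not Alfresco's code: the `Event` type, the `classify_save` function, and the file names are all made up for illustration, and real clients produce far messier event sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One file operation seen by the server (hypothetical model)."""
    op: str              # "create", "write", "delete", or "rename"
    path: str
    new_path: str = ""   # only used for "rename"


def classify_save(events, original):
    """Classify what happened to `original` in a sequence of events.

    Returns "new_version" if the original was deleted and a file was then
    renamed to its name, "plain_delete" if it was deleted and never
    replaced, and "unchanged" otherwise.
    """
    deleted = False
    for ev in events:
        if ev.op == "delete" and ev.path == original:
            deleted = True
        elif ev.op == "rename" and ev.new_path == original and deleted:
            return "new_version"
    return "plain_delete" if deleted else "unchanged"


# Illustrative sequence: save to a temp file, delete the original,
# rename the temp file to the original name. File names are invented.
temp_file_save = [
    Event("create", "~WRD0001.tmp"),
    Event("write", "~WRD0001.tmp"),
    Event("delete", "report.doc"),
    Event("rename", "~WRD0001.tmp", "report.doc"),
]

print(classify_save(temp_file_save, "report.doc"))
```

The point of the sketch is only that the server has to look at a whole sequence of operations, not a single one, before it can decide that a "delete" was really a save.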
So for one of my clients we quickly upgraded to version 4 and deployed Alfresco with the Share client acting not only as a collaboration tool, but also as their intranet, with document editing via CIFS and Single Sign-On using Kerberos. To start with, all was well, until they started the bigger rollout to let more users edit via CIFS. Then some users got access denied and were prompted with a login, but no credentials they entered ever worked. These clients were all on a Windows 2008 Terminal Server. And so began an issue tracking journey that ended in a very odd workaround.
The reports of this error started coming in during the second week of January. From the end users' perspective it is Alfresco that is failing, but when researching issues like this the cause can be just about anything: the Windows OS, the network equipment, or Alfresco. You just have to keep an open mind about every possibility and not start a blame game between the Windows, network, and Alfresco administrators. Here are some of the things we tried collaboratively (not specifically in this order)
None of the above gave any insight, and we tried many more things not mentioned here. And if you have about 200 users relying on the Alfresco server being available, you cannot just restart Alfresco, or the terminal server, to try out configuration changes. It would probably have saved some time had we been on Alfresco Enterprise, where you can change configuration and log levels without a restart.
What finally gave us a clue about what was going on was setting the cifs.sessionDebug flags in alfresco-global.properties. Since we had no idea where the error could be, we set almost all of the available flags. That spits out lots of information, and in there we found
[SMB] Failed to allocate UID for virtual circuit, [0:-1, [:null,,,192.168.10.10,Normal],Tree=0,Searches=0]
It is not a very obvious error, and that it is the cause of the access denied problem is even less obvious if you have no clue what a virtual circuit is. Time to learn more about the CIFS protocol than you ever wanted to. A similar issue had already been filed, JLAN-142, which I have now amended.
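With session debugging turned on the log gets very noisy, so a small script can help pick out this one error. The following is a rough Python sketch, assuming the message text is exactly as shown above and that the client address appears as a plain IPv4 address somewhere on the same line. The function name and the second sample line are made up for illustration.

```python
import re
from collections import Counter

# Message text taken from the log line quoted in this post.
VC_ERROR = "Failed to allocate UID for virtual circuit"

# Assumption: the client address is a dotted IPv4 address on the same line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def count_vc_failures(lines):
    """Count virtual circuit allocation failures per client address."""
    hits = Counter()
    for line in lines:
        if VC_ERROR in line:
            match = IPV4.search(line)
            hits[match.group(0) if match else "unknown"] += 1
    return hits


sample = [
    "[SMB] Failed to allocate UID for virtual circuit, "
    "[0:-1, [:null,,,192.168.10.10,Normal],Tree=0,Searches=0]",
    "[SMB] some unrelated debug line",  # invented filler line
]

for address, count in count_vc_failures(sample).items():
    print(address, count)
```

If nearly all failures come from a single address, such as a terminal server that many users share, that is a strong hint about where to look next.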
My client currently has CIFS working. It turns out that the Windows 2008 Terminal Server is “eating” SMB sessions and not releasing them, which makes Alfresco run out of virtual circuits. We have deployed two things. One is a custom build of Alfresco (thank you for being open source) that increases the number of virtual circuits from the default 16 to 128. The other is a crazy one: the bug in Windows can be worked around if you never log out the first user that logs in to the terminal server, as discussed in this Microsoft forum discussion. Since we deployed both fixes at the same time I cannot say for sure which one is the actual solution, or whether they interact, but it looks like it is “keep user logged in”, deemed “the weirdest workaround ever deployed” by my client's IT administrators. If we can reach a definitive conclusion I will update this post, but after finally having a working system they are not too keen on more restarts.
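As a way to think about the first fix, here is a toy Python model of a fixed-size pool of virtual circuits. It is not Alfresco or JLAN code. The class and function names are hypothetical, and the model assumes that every logon takes exactly one slot and that a leaked session never gives its slot back.

```python
class VirtualCircuitPool:
    """Toy fixed-size pool of session slots (illustration only)."""

    def __init__(self, max_circuits):
        self.max_circuits = max_circuits
        self.in_use = set()
        self._next_id = 0

    def allocate(self):
        """Return a new circuit id, or -1 when every slot is taken."""
        if len(self.in_use) >= self.max_circuits:
            return -1
        self._next_id += 1
        self.in_use.add(self._next_id)
        return self._next_id

    def release(self, circuit_id):
        """Free a slot; releasing an unknown id is a no-op."""
        self.in_use.discard(circuit_id)


def logons_until_failure(max_circuits, leak, limit=1000):
    """Return the number of the first logon that fails, or None if no
    logon fails within `limit` attempts."""
    pool = VirtualCircuitPool(max_circuits)
    for attempt in range(1, limit + 1):
        circuit_id = pool.allocate()
        if circuit_id == -1:
            return attempt
        if not leak:
            pool.release(circuit_id)  # a well-behaved client logs off
    return None


print(logons_until_failure(16, leak=True))
print(logons_until_failure(128, leak=True))
print(logons_until_failure(16, leak=False))
```

In this model a larger pool only postpones the failure while sessions keep leaking, whereas a client that releases its sessions never hits the limit. That fits the suspicion that the workaround on the Windows side is the one doing the real work, although the model proves nothing about the actual system.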
There also seem to be some fixes for the SMB session leaks. Since my client has their Terminal Server in a working state, they haven't tried them. But I think I should mention them, and if you have any experience with them, let me know.