Hey, it seems like OpenEMS uses a custom patched Java-Websocket dependency.
We’ve observed a websocket error in an older OpenEMS version we’re still using, and we pinpointed it to the Java-Websocket dependency. Looking into the latest backend code, we see that it no longer uses the Maven dependency, but a custom jar in the wrapper package.
It seems that inside this jar at least org.java_websocket.WebSocketImpl was changed, such that the flushandclosestate field is volatile. So far I’ve not diff’ed it further.
The volatile keyword cannot be changed via reflection. @stefan.feilmeier can you confirm that the volatile is also part of that change and should be part of the PR in the Java-Websocket repo?
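For anyone who wants to check which variant of the jar is on their classpath: reflection cannot add the volatile modifier, but it can at least read it. Below is a minimal sketch of that check. The two nested classes are hypothetical stand-ins that only mirror the field name mentioned above; they are not the real WebSocketImpl, and the helper name isFieldVolatile is made up for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class VolatileCheck {

    // Hypothetical stand-in for the unpatched class: plain boolean field.
    static class PlainState {
        private boolean flushandclosestate = false;
    }

    // Hypothetical stand-in for the patched class: same field, declared volatile.
    static class PatchedState {
        private volatile boolean flushandclosestate = false;
    }

    // Returns true if the named declared field carries the volatile modifier.
    static boolean isFieldVolatile(Class<?> type, String fieldName) throws NoSuchFieldException {
        Field field = type.getDeclaredField(fieldName);
        return Modifier.isVolatile(field.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("plain:   " + isFieldVolatile(PlainState.class, "flushandclosestate"));
        System.out.println("patched: " + isFieldVolatile(PatchedState.class, "flushandclosestate"));
    }
}
```

To run the same check against the real class, one would pass the class loaded from the jar in question instead of the stand-ins, assuming the field name is unchanged in that version.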
I’ve also spent one day together with Kiro trying to reproduce the connection issue in a test. I tried a test within Java-Websocket directly and wasn’t able to trigger it, not even with direct thread manipulation. Since OpenEMS swallows most of the exceptions in such a case (catches them and does not log them), I’ve added more logging in our backend for now and will wait until it occurs again, hoping to see a bit more detail for that kind of error.
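The kind of extra logging meant here can be sketched roughly as follows. This is not the actual OpenEMS code, just an illustration of logging an exception with its stack trace instead of silently catching it; the class LoggingGuard, the interface ThrowingAction and the method runLogged are hypothetical names, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingGuard {

    private static final Logger LOG = Logger.getLogger(LoggingGuard.class.getName());

    // Hypothetical functional interface for an action that may throw.
    @FunctionalInterface
    interface ThrowingAction {
        void run() throws Exception;
    }

    // Runs the action; on failure logs the exception including its stack trace
    // and returns false instead of swallowing the error silently.
    static boolean runLogged(String context, ThrowingAction action) {
        try {
            action.run();
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING, "WebSocket action failed [" + context + "]", e);
            return false;
        }
    }

    public static void main(String[] args) {
        runLogged("send", () -> {
            throw new IllegalStateException("connection closed while sending");
        });
    }
}
```

Passing a context string (for example an edge ID or the name of the operation) makes it easier to correlate the log entry with a specific connection when the error finally shows up again.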
In the past we have seen problems during reconnection, which were very hard to reproduce. One of our colleagues at FENECON has now been able to identify a deadlock in the underlying C code in Java NIO that can happen if the connection drops while data is being sent. Similar to this:
We are currently testing a fix internally and I am pointing our devs to this thread.