OpenEMS custom Java-Websocket dependency usage

Hey, it seems like OpenEMS uses a custom patched Java-Websocket dependency.

We’ve observed a websocket error in an older OpenEMS version we’re still using, and pinpointed it to the Java-Websocket dependency. Looking at the latest backend code, we see that it no longer uses the Maven dependency, but a custom jar in the wrapper package.

It seems that inside this jar, at least org.java_websocket.WebSocketImpl was changed so that the flushandclosestate field is volatile. So far I’ve not diff’ed it further.
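For anyone wondering why a single volatile keyword could matter here: a minimal Java sketch, assuming the flag is written by one thread and polled by another. The class VolatileFlagSketch and its methods are made up for illustration and are not taken from Java-WebSocket; only the field name mirrors the one mentioned above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in, NOT the real org.java_websocket.WebSocketImpl.
public class VolatileFlagSketch {

    // With volatile, a write by one thread is guaranteed to become visible
    // to other threads. Without it, a polling loop may never observe the change.
    private volatile boolean flushandclosestate = false;

    // Called by the "closing" thread.
    public void markFlushAndClose() {
        flushandclosestate = true;
    }

    // Called by another thread: polls the flag until it is set or the timeout expires.
    public boolean awaitFlag(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!flushandclosestate) {
            if (System.nanoTime() - deadline > 0) {
                return false; // gave up: the flag was never observed
            }
        }
        return true;
    }

    // Starts a reader thread, sets the flag from the calling thread,
    // and reports whether the reader saw the update.
    public static boolean demo() {
        VolatileFlagSketch sketch = new VolatileFlagSketch();
        boolean[] seen = new boolean[1];
        Thread reader = new Thread(() -> seen[0] = sketch.awaitFlag(5000));
        reader.start();
        sketch.markFlushAndClose();
        try {
            reader.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reader", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader observed flag: " + demo());
    }
}
```

Removing volatile from the field does not make this fail reliably, which would fit with such issues being hard to reproduce in a test.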

In the issues and pull requests of TooTallNate/Java-WebSocket I can’t find this change. Is there any reason why it’s not addressed in the upstream project, but only as a custom patch in the OpenEMS repository? And am I correct to infer that the patch was related to backend UI websocket crashes?

Hi @paraplu,

I am aware of an older PR by @stefan.feilmeier in the Java-WebSocket repository.
“Add reconnectBlocking method with timeout” (PR #1251).

However, I don’t know how this relates to the changes in the bundled JAR or where it came from.

Ah thanks, seems like it may be indirectly related?! But this pull request does not change the volatile keyword.

The reconnect method from this PR is at least not used in the OpenEMS code; the equivalent seems to be done via reflection in the OpenEMS repo itself: openems/io.openems.common/src/io/openems/common/websocket/ClientReconnectorWorker.java at ae9582fc6758f986ddbcac386847f9fa3b68f424

And the volatile keyword can’t be changed via reflection. @stefan.feilmeier can you confirm that the volatile is also part of that change and should be part of the PR in the Java-Websocket repo?
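Reflection can’t add the modifier, but it can at least read it, which might be a quick way to check which variant of a class is on the classpath without diffing the jar. A minimal sketch: the nested classes Upstream and Patched are hypothetical stand-ins, and the helper isVolatile is made up for this post.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class VolatileCheck {

    // Hypothetical stand-ins for the two variants of the class under discussion.
    static class Upstream {
        private boolean flushandclosestate = false;
    }

    static class Patched {
        private volatile boolean flushandclosestate = false;
    }

    // Returns true if the named declared field carries the volatile modifier.
    // Reading modifiers does not require setAccessible(true).
    public static boolean isVolatile(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            return Modifier.isVolatile(field.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(
                    "No field '" + fieldName + "' in " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Upstream: " + isVolatile(Upstream.class, "flushandclosestate"));
        System.out.println("Patched:  " + isVolatile(Patched.class, "flushandclosestate"));
    }
}
```

Against the real library, the same helper could presumably be pointed at Class.forName("org.java_websocket.WebSocketImpl"), assuming the field is named as described above.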

I’ve also spent a day with Kiro trying to reproduce the connection issue in a test. I tried a test directly within Java-Websocket and wasn’t able to trigger it, even with direct thread manipulation. Since OpenEMS swallows most of the exceptions in this case (it catches them without logging), I’ve added more logging in our backend for now and will wait until it occurs again, hoping to see more details about this kind of error.

In the past we have seen problems during reconnection that were very hard to reproduce. One of our colleagues at FENECON has now been able to identify a deadlock in the underlying C code in Java NIO that can happen if the connection drops while data is being sent. Similar to this:

We are currently testing a fix internally and I am pointing our devs to this thread.
