Webconsole Healthchecks

Hi there,

For our application I was looking into a way to implement health checks for monitoring. Example use cases are load balancers or container orchestration tools that probe the application for health.

I found the following Felix package, but cannot seem to add it to the dependencies.
In particular, I was aiming to add the webconsole plugin and use that endpoint.

This might be more appropriate for a GitHub issue, but maybe somebody can already let me know here if this is at all possible?

Hi,

which application are you trying to monitor - OpenEMS Edge or Backend?

For Edge I would watch the _sum/State Channel. This value is available in several ways, e.g. via the Modbus-TCP-Api and the JSON/REST-Api. There is also ongoing development of an automated alerting mechanism based on this Channel (see “Implementation of SumState-Alerting” by da-Kai, Pull Request #2260 in OpenEMS/openems on GitHub).

For Backend, one approach has been suggested (although I am not 100 % sure it is the best way to do it): “Allow unauthenticated access to /live on Backend2Backend REST API” by phfeustel, Pull Request #2439 in OpenEMS/openems on GitHub. Your approach also sounds good, even though it might cover only part of the actual availability state.

Which error do you get when adding the healthcheck bundle?

Regards,
Stefan

Hi Stefan,

Thanks for your reply.

I am trying to monitor the OpenEMS Edge application. Since it’s using the Apache Felix framework, I went looking for pre-existing libraries to add some healthcheck utility for monitoring the application.

The use case is simply for infra / operational monitoring of the Docker container, to check for health or readiness of this particular service. Common practice here is to expose a /health endpoint for this.

I am now stuck at the following error. I had already added the dependency org.owasp.encoder after receiving similar complaints,
but I can’t seem to resolve this error in a similar way.

Unable to resolve org.apache.felix.healthcheck.webconsoleplugin version=2.2.0: missing requirement osgi.service;effective:=active;filter:='(objectClass=org.apache.felix.hc.api.execution.HealthCheckExecutor)'

Thanks for the references. While they are also interesting ideas, I am really looking for a mechanism of the osgi framework itself, rather than relying on a specific component.

Best,
Jaap

Hello Jaap,

Just out of curiosity: what do you want to monitor here?

The “Health” of what?

Because OpenEMS runs without any flaws or errors at Fenecon, I guess, and it is not necessary to monitor the code or the Java application itself, as it is robust and well coded.

Am I getting something wrong?

Greetings
John

Hi John,

So it’s not really for measuring internal state of the application, or monitoring specific parts of the application.

The idea here is simply to detect availability of the service. Load balancers, reverse proxies and container orchestrators often rely on such an endpoint to probe for readiness or health.

Many frameworks have boilerplate tools readily available. For Spring Boot there’s org.springframework.boot:spring-boot-starter-actuator, and after looking around for something similar for the Felix OSGi framework I came across this particular dependency.

I guess the root of my problem lies in my limited understanding of the OSGi framework and how it handles dependencies. I would be very keen to find some good learning resources for this.

Best,
Jaap

Hi Jaap,

some thoughts in this context:

  1. There is hardly any such thing as “mechanism of the osgi framework itself”. Every feature is just implemented as bundles that provide services at runtime. But of course the Apache Felix project is close to providing those standard features.

Without trying it out myself, I guess you just tried the wrong bundle. org.apache.felix.healthcheck.webconsoleplugin is a plugin for the Apache Felix Web-Console. Instead I believe this one would be what you are searching for:

https://felix.apache.org/documentation/subprojects/apache-felix-healthchecks.html

I suggest you try that. If it’s not working for you, please answer here and I’ll try. Dependencies are sometimes complicated in OSGi.

  2. The best documentation for OSGi is its specification. It takes some time to get used to it, but it is a very valuable source of information:
  3. We run OpenEMS Edge mainly on real hardware. For this we implemented a feature that triggers the operating system watchdog via systemd. Could that work for you?
  4. OpenEMS has a global “_sum/State” Channel that is always available, e.g. via JSON/REST or Modbus/TCP (if the respective bundles are configured). You could query that value to find out whether the application is running and whether everything is OK or there is an INFO, WARNING or FAULT:
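
A minimal sketch of how an external probe might read that Channel over JSON/REST and turn it into a healthy/unhealthy decision. Several things here are assumptions and not confirmed in this thread: that the response is a JSON object with a numeric "value" field, that the levels are numbered 0 = OK, 1 = INFO, 2 = WARNING, 3 = FAULT, and that the URL and credentials passed to fetch() match your REST-Api configuration. The class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenEMS.
class SumStateProbe {

    // Assumed numbering of the _sum/State levels: 0 = OK, 1 = INFO, 2 = WARNING, 3 = FAULT.
    enum Level { OK, INFO, WARNING, FAULT }

    // Matches a numeric "value" field in the (assumed) JSON response body.
    private static final Pattern VALUE = Pattern.compile("\"value\"\\s*:\\s*(\\d+)");

    // Extracts the level from a response body; throws if no usable value is present.
    static Level parseLevel(String json) {
        Matcher m = VALUE.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no numeric \"value\" field in response");
        }
        int value = Integer.parseInt(m.group(1));
        Level[] levels = Level.values();
        if (value >= levels.length) {
            throw new IllegalArgumentException("unknown state level: " + value);
        }
        return levels[value];
    }

    // Design choice of this sketch: OK and INFO count as healthy, WARNING and FAULT do not.
    static boolean isHealthy(Level level) {
        return level == Level.OK || level == Level.INFO;
    }

    // Fetches the Channel with HTTP basic auth; URL, user and password are supplied by the caller.
    static Level fetch(String url, String user, String password) throws Exception {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("unexpected HTTP status " + response.statusCode());
        }
        return parseLevel(response.body());
    }
}
```

A container health check could then call SumStateProbe.fetch(...) with the URL of the _sum/State Channel on your REST-Api and exit non-zero when isHealthy(...) returns false or an exception is thrown. Whether WARNING should count as unhealthy depends on how aggressively you want the orchestrator to restart the container.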

Regards,
Stefan

For Edge, there are several layers of health checks. One is infrastructure, which covers the framework, its bundles and OSGi Declarative Services; this is the part where the Felix webconsole health checks help out of the box. However, there is a second layer, which is application-specific. This will not come “out of the box”. Luckily, the health checks project defines an API which you can use: HealthCheck and Result.
By implementing these two you can expose OpenEMS Edge internal state through the webconsole or through a dedicated HTTP endpoint that works independently of the webconsole itself (it’s an optional component in many cases).
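
To make the “dedicated HTTP endpoint” idea concrete, here is a rough sketch that deliberately does not use the Felix Health Checks API: it swaps in the JDK’s built-in com.sun.net.httpserver.HttpServer so that it runs without any OSGi bundles. The class name HealthEndpoint, the /health path and the 200/503 convention are assumptions for illustration; in a real setup the check would be an implementation of the HealthCheck interface mentioned above, registered as an OSGi service.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical stand-alone endpoint, not part of OpenEMS or Apache Felix.
class HealthEndpoint {

    // Starts an HTTP server that answers requests to /health with 200 or 503,
    // depending on what the supplied check reports at request time.
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(int port, BooleanSupplier check) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            boolean healthy;
            try {
                healthy = check.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false; // a check that fails to run counts as unhealthy
            }
            byte[] body = (healthy ? "OK" : "UNAVAILABLE").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The check is passed in as a BooleanSupplier so the endpoint stays independent of where the state comes from, for example the application-specific layer described above. Call stop(0) on the returned server to shut it down.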

Another question is whether a silent lack of communication (i.e. stale data) can or should trigger the watchdog. In some projects I’ve seen, that’s the case: Eclipse Kura, for example, has a concept of a critical component which, in case of internal failure, should trigger the watchdog and a restart of the software or even the operating system.