The backoff during backend overload is driven by exceptions thrown when writing to InfluxDB. However, the query limit counter also increases when InfluxDB returns a type exception. As you know, once InfluxDB has written a value to a field, that field's type is fixed. Therefore, in a long-running OpenEMS deployment with various Edge versions, data types can become inconsistent.
On the backend side, there is code that casts types when a type exception occurs, and a filter that converts the data to the type required by InfluxDB. In this kind of environment, when the backend is restarted, registering the type casts in the filter can, in an unlucky case, repeatedly increase the query limit counter and eventually lead to a service outage.
To solve this issue, I have created the following PR, but I am not very confident about it. How do most people in the community approach this problem?
In our case, when we recently changed MeasuringEVCS to ElectricityMeter for EVCS, the OCPP /Frequency value changed from float to int. As a result, we had to rely heavily on this backend functionality to handle data from multiple Edge devices.
Also, it would be interesting if there was a way to show that Community members are members of the association. Maybe a badge could be granted, etc. Need to do some research…
Back to topic: I asked @michaelgrill to review the PR as he has been working on that code recently.
Hi ikegam,
I am not sure that changing how the query limit counter increases will solve the problem (note that I did not have a close look at your code).
We once had probably the same issue. When we restarted the Backend, we lost thousands of datapoints within the first hour after the restart. The reason was hard to detect because our logfile looked good, except for a few Typecast-Exceptions.
In our case the problem was that every time a Typecast-Exception occurred, all measuring points transmitted together with the failing one were lost. It turns out that the OpenEMS backend collects more than 1000 measuring points and transmits them to the database in one piece. So every time we got a Typecast-Exception, we lost all of those 1000+ measuring points. On a typical backend startup we saw maybe 50 Typecast-Exceptions in the backend log, which in the end led to more than 50,000 lost measuring points.
We solved this by fixing our code and updating all Edges. We still have a very few "unsolvable" Typecast-Exceptions, but for them we added some hardcoded, predefined typecast handlers.
Now, any newly seen typecast exception sets the alarm bells ringing, and we respond with immediate action. We are also trying to improve our internal development process to catch this kind of problem in the review phase already.
So I would suggest fixing all Typecast-Exceptions as early as possible. In the end, this may help you more than fixing the query limit counter mechanism.
You're absolutely right to keep consistent types on your Edge. That's definitely the right approach!
I also want to resolve the type issues from the edge side, but unfortunately, the data in our InfluxDB is already inconsistent. For example, evcs[1...4]/Frequency is stored as float, while evcs[5...10]/Frequency is int. And newer Edges send this as int.
One option would be to rebuild the InfluxDB entirely, but that would be quite a heavy task. Alternatively, we could shift to a new field, but that would increase the number of columns.
I'll think about it more.
Either way, thanks a lot for sharing your experience. It was super helpful!
Mar 14 18:24:01 XXX java[3588521]: 2025-03-14T18:24:01,845 [thread-1] INFO [.debugcycle.DebugCycleExecutor] [Timedata.InfluxDB] timedata0 [monitor] Pool: 8/10, Pending: 0, Completed: 8, Active: 0, MergePointsWorker[Default: 0/1000000], Limit:0.000, RejectedExecutions:0
Mar 14 18:24:02 XXX java[3588521]: 2025-03-14T18:24:02,073 [fluxDB-8] WARN [red.influxdb.MergePointsWorker] Unable to write to InfluxDB. BadRequestException: HTTP status code: 400; Message: partial write: field type conflict: input field "evcs3/Frequency" on measurement "data" is type integer, already exists as type float dropped=25
Mar 14 18:24:06 XXX java[3588521]: 2025-03-14T18:24:06,845 [thread-1] INFO [.debugcycle.DebugCycleExecutor] [Timedata.InfluxDB] timedata0 [monitor] Pool: 9/10, Pending: 0, Completed: 9, Active: 0, MergePointsWorker[Default: 0/1000000], Limit:0.100, RejectedExecutions:0
…
Mar 14 18:28:16 XXX java[3588521]: 2025-03-14T18:28:16,844 [thread-1] INFO [.debugcycle.DebugCycleExecutor] [Timedata.InfluxDB] timedata0 [monitor] Pool: 10/10, Pending: 0, Completed: 34, Active: 0, MergePointsWorker[Default: 0/1000000], Limit:0.950, RejectedExecutions:0
This is the log when it happens. With the patch above, it changes as follows.
Mar 16 16:44:42 XXX java[1367630]: 2025-03-16T16:44:42,478 [fluxDB-0] WARN [red.influxdb.MergePointsWorker] Unable to write to InfluxDB. BadRequestException: HTTP status code: 400; Message: partial write: field type conflict: input field "evcs25/Frequency" on measurement "data" is type integer, already exists as type float dropped=1
Mar 16 16:44:42 XXX java[1367630]: 2025-03-16T16:44:42,479 [fluxDB-0] INFO [nflux.FieldTypeConflictHandler] [Timedata.InfluxDB] Add handler for [evcs25/Frequency] from [integer] to [float]
Mar 16 16:44:42 XXX java[1367630]: Add predefined FieldTypeConflictHandler: this.createAndAddHandler("evcs25/Frequency", RequiredType.FLOAT);
Mar 16 16:44:42 XXX java[1367630]: 2025-03-16T16:44:42,749 [thread-1] INFO [.debugcycle.DebugCycleExecutor] [Timedata.InfluxDB] timedata0 [monitor] Pool: 10/10, Pending: 0, Completed: 311, Active: 0, MergePointsWorker[Default: 2/1000000], Limit:0.000, RejectedExecutions:0
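For readers who have not looked at the handler code, here is a rough Python sketch of what the "Add handler for ... from [integer] to [float]" step in the log could look like. It is only an illustration, not the OpenEMS Java implementation: the names `parse_conflict` and `ConflictHandlers` are made up, and the regex assumes the message uses plain double quotes around the field and measurement names.

```python
import re

# Pattern for the InfluxDB "field type conflict" message seen in the log.
# Assumption: field and measurement names are wrapped in plain double quotes.
CONFLICT = re.compile(
    r'field type conflict: input field "(?P<field>[^"]+)" on measurement '
    r'"(?P<measurement>[^"]+)" is type (?P<sent>\w+), '
    r'already exists as type (?P<stored>\w+)'
)

# Cast functions keyed by the type name InfluxDB reports as already stored.
CASTS = {"float": float, "integer": int, "string": str}


def parse_conflict(message):
    """Return (field, stored_type) or None if this is not a type conflict."""
    match = CONFLICT.search(message)
    if match is None:
        return None
    return match.group("field"), match.group("stored")


class ConflictHandlers:
    """Hypothetical registry: one cast per field that hit a conflict."""

    def __init__(self):
        self._casts = {}

    def add_from_message(self, message):
        """Register a cast for the field named in the message, if any."""
        parsed = parse_conflict(message)
        if parsed is None:
            return False
        field, stored = parsed
        cast = CASTS.get(stored)
        if cast is None:
            return False
        self._casts[field] = cast
        return True

    def apply(self, point):
        """Return a copy of the point with registered fields cast."""
        result = {}
        for key, value in point.items():
            cast = self._casts.get(key)
            # Leave missing values and unregistered fields untouched.
            result[key] = cast(value) if cast and value is not None else value
        return result
```

Once a handler is registered for `evcs25/Frequency`, later points carrying an integer for that field would be cast to float before the write, so the same conflict should not come back for that field. Note that an int cast truncates, so casting in the other direction loses the fractional part.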
I know, tidying up is a thankless task. But if you don't do it, you will regularly lose up to 1000 measuring points every time a new evcs[5...10]/Frequency value is received.
We had the same issue with evcsX/Frequency. Our solution in that case was to kick all evcsX/Frequency out. We do not need it. If I want to work with the frequency, I use the grid meter frequency. I can't imagine a situation where I need the frequency of an EVCS. We also had the same issue with FIRMWARE (I think), which was sometimes an Integer, a Float or a String, depending on the EVCS. We cleaned it up, renamed it to FIRMWARE_VERSION (I think) and made it always a String. You are right, this increases the number of columns in the DB. But I found that Influx handles this really well. I don't think that a few more columns will become an issue in the future.
So far, we have managed to deal with this problem well. But I'm a bit scared of the day when we can't get any further.
The way we solved it is quite similar. Maybe this patch will also work well in your backend.
I guess the exception happens naturally with InfluxDB and OpenEMS, because the backend already has a function to handle typecast errors.
Currently, the overload detection works by increasing the querylimit counter whenever an exception occurs during a write. My patch simply changes this behavior so that the counter is not increased when the exception is caused by a type error, which is not overload.
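To make the idea concrete, here is a minimal Python sketch of the behavior described above. It is not the actual patch (the backend is Java): `QueryLimit`, `on_write_failure`, the step size of 0.1 and the cap of 0.95 are all assumptions for illustration, and the type error is detected simply by looking for the "field type conflict" text that InfluxDB puts in the message.

```python
FIELD_TYPE_CONFLICT = "field type conflict"


class QueryLimit:
    """Hypothetical stand-in for the backend's overload counter."""

    def __init__(self, step=0.1, max_limit=0.95):
        # step and max_limit are assumed values, not taken from OpenEMS.
        self.limit = 0.0
        self.step = step
        self.max_limit = max_limit

    def increase(self):
        """Back off a little more after a failed write."""
        self.limit = min(self.max_limit, self.limit + self.step)


def is_type_conflict(message):
    """True if the write failed because of a field type conflict."""
    return FIELD_TYPE_CONFLICT in message


def on_write_failure(limit, message):
    """Count the failure as overload unless it is a type conflict.

    Returns True if the counter was increased.
    """
    if is_type_conflict(message):
        # A type error says nothing about database load, so skip the backoff.
        return False
    limit.increase()
    return True
```

With this split, a burst of type conflicts right after a restart leaves the limit where it was, while timeouts and other write failures still raise it.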
Yes, I also feel that /Frequency is not needed for EVCS. That's also one of the possible ways I'm considering to solve this.
Actually, it's when we get data for the old evcs[0...4] that the problem happens. The new Edges send /Frequency as an int, but InfluxDB expects a float.
This approach is a much better way to handle a single controller across all channels, as it greatly simplifies maintaining type integrity.
Perhaps we should consider deploying this aggregated InfluxDB and migrating to the new server.
However, since many users, as shown in this thread, still depend on raw InfluxDB, it might be beneficial to implement a backend solution to address this issue.