Random "HTTP 503 Unavailable"

I recently built a Nuki app for a home automation controller that also uses the Bridge API, and I’m seeing the same behaviour. Many of the endpoints occasionally return a 503 error. The Bridge seems slow to respond to requests and unable to handle more than one request at a time; the combination of these two issues causes the 503 errors. For instance, if you have the Nuki smartphone app open (without being connected via Bluetooth) and try to use the Bridge API, you often end up with 503 errors.

I have the old Bridge hardware. The Bridge is about 1 m away from the Nuki lock and the connection always looks good (green). I access the Bridge API using node-fetch, a Node.js module modelled on window.fetch.

I would appreciate it if Nuki would investigate this and improve the performance of the Bridge to avoid this issue.

Yes, this is a known issue: the Bridge can only really handle one request at a time. We are working on improvements from different angles, but resources are limited, so this will still take a noticeable amount of time.
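
Since the Bridge can only handle one request at a time, a client can at least avoid competing with itself. The following is a minimal sketch, not an official client: it sends requests one after another and retries on 503 with a growing delay. The class name, the `fetch` callable (which is assumed to perform the GET and return the HTTP status code), and the attempt and delay values are all hypothetical.

```python
import threading
import time
from typing import Callable


class SerializedBridgeClient:
    """Lets only one request reach the Bridge at a time and retries on 503."""

    def __init__(self, fetch: Callable[[str], int], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch              # hypothetical: does the GET, returns the status code
        self._max_attempts = max_attempts
        self._base_delay = base_delay    # assumed value, not a documented figure
        self._sleep = sleep
        self._lock = threading.Lock()

    def get(self, path: str) -> int:
        # The lock is held for the whole call, including the waits, so other
        # threads of this program do not hit the Bridge while it is busy.
        with self._lock:
            status = 503
            for attempt in range(self._max_attempts):
                status = self._fetch(path)
                if status != 503:
                    return status
                if attempt < self._max_attempts - 1:
                    # Wait 1x, 2x, 4x ... the base delay before trying again.
                    self._sleep(self._base_delay * 2 ** attempt)
            return status
```

This only serializes requests coming from one program; the smartphone app or other integrations can still keep the Bridge busy. Blind retries like this are only reasonable for read-only calls such as /list or /info, because repeating a lockAction could trigger the lock twice.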

Hi,

I am experiencing the same issue.
In my case, after the Bridge has executed an action (lockAction), it needs some “recovery time” (about a minute) during which it returns HTTP 503; after that, the Bridge responds with HTTP 200 again. Is this normal behaviour?

Nuki Bridge (new HW) with firmware 2.1.37

13:55:43.076839 IP CLIENT.55277 > NUKI BRIDGE.8080: Flags [P.], seq 1:200, ack 1, win 29200, length 199: HTTP: GET /lockAction?nukiId=NUKIID&action=2&token=GEHEIM HTTP/1.1
13:55:49.527871 IP NUKI BRIDGE.8080 > CLIENT.55277: Flags [P.], seq 1:148, ack 200, win 5545, length 147: HTTP: HTTP/1.1 200 OK
13:55:54.190946 IP CLIENT.55304 > NUKI BRIDGE.8080: Flags [P.], seq 1:168, ack 1, win 29200, length 167: HTTP: GET /list?token=GEHEIM HTTP/1.1
13:55:55.964556 IP NUKI BRIDGE.8080 > CLIENT.55304: Flags [P.], seq 1:96, ack 168, win 5577, length 95: HTTP: HTTP/1.1 503 Service Unavailable
13:56:09.190994 IP CLIENT.55333 > NUKI BRIDGE.8080: Flags [P.], seq 1:168, ack 1, win 29200, length 167: HTTP: GET /list?token=GEHEIM HTTP/1.1
13:56:09.287919 IP NUKI BRIDGE.8080 > CLIENT.55333: Flags [P.], seq 1:96, ack 168, win 5577, length 95: HTTP: HTTP/1.1 503 Service Unavailable
13:56:24.194952 IP CLIENT.55369 > NUKI BRIDGE.8080: Flags [P.], seq 1:168, ack 1, win 29200, length 167: HTTP: GET /list?token=GEHEIM HTTP/1.1
13:56:24.291353 IP NUKI BRIDGE.8080 > CLIENT.55369: Flags [P.], seq 1:96, ack 168, win 5577, length 95: HTTP: HTTP/1.1 503 Service Unavailable
13:56:39.192597 IP CLIENT.55410 > NUKI BRIDGE.8080: Flags [P.], seq 1:168, ack 1, win 29200, length 167: HTTP: GET /list?token=GEHEIM HTTP/1.1
13:56:39.324788 IP NUKI BRIDGE.8080 > CLIENT.55410: Flags [P.], seq 1:119, ack 168, win 5577, length 118: HTTP: HTTP/1.1 200 OK
13:56:39.444491 IP NUKI BRIDGE.8080 > CLIENT.55410: Flags [P.], seq 119:299, ack 168, win 5577, length 180: HTTP
13:56:54.193883 IP CLIENT.55445 > NUKI BRIDGE.8080: Flags [P.], seq 1:168, ack 1, win 29200, length 167: HTTP: GET /list?token=GEHEIM HTTP/1.1
13:56:54.318407 IP NUKI BRIDGE.8080 > CLIENT.55445: Flags [P.], seq 1:119, ack 168, win 5577, length 118: HTTP: HTTP/1.1 200 OK
[…]
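
To put a number on the “recovery time” in the capture above, the timestamps can simply be subtracted. This is a small sketch using only the times shown in the log; the helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Difference between two tcpdump-style timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the capture above.
lock_action_sent = "13:55:43.076839"   # GET /lockAction
lock_action_done = "13:55:49.527871"   # 200 OK for /lockAction
first_good_list = "13:56:39.324788"    # first 200 OK for /list afterwards

action_duration = seconds_between(lock_action_sent, lock_action_done)
recovery = seconds_between(lock_action_done, first_good_list)

print(f"lockAction took {action_duration:.1f} s")    # lockAction took 6.5 s
print(f"/list worked again after {recovery:.1f} s")  # /list worked again after 49.8 s
```

Because /list was only polled every 15 seconds, the Bridge may have recovered at any point between the last 503 (13:56:24) and the first 200 (13:56:39), so roughly 35 to 50 seconds after the lock action completed.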

A minute is definitely not normal behaviour.

Only /lockState calls the Smart Lock directly, so it cannot be polled at arbitrarily short intervals; for everything else, delays of more than a few seconds should not happen.
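
One way to keep /lockState polling in check is to cache the last answer and only ask the Bridge again after a minimum interval. The sketch below assumes a caller-supplied `read` function that performs the actual request; the class name and the 30-second default are illustrative assumptions, since no concrete interval for /lockState is given here.

```python
import time
from typing import Any, Callable, Optional


class ThrottledReader:
    """Returns a cached value unless at least `min_interval` seconds have passed."""

    def __init__(self, read: Callable[[], Any], min_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._read = read                  # hypothetical: performs the /lockState call
        self._min_interval = min_interval  # assumed value, tune for your setup
        self._clock = clock
        self._last_time: Optional[float] = None
        self._value: Any = None

    def get(self) -> Any:
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            # Only this branch actually talks to the Bridge (and thus the lock).
            self._value = self._read()
            self._last_time = now
        return self._value
```

Callers can then call `get()` as often as they like; only one real request per interval reaches the Smart Lock.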

Do you have any other integrations running or callbacks which the Bridge could be processing?

Hello,

I have the same problem here.
When I use curl while the locks and the Bridge are idle, it works in 9 of 10 tries.

But while a lock is performing any lock action (even Lock ’n’ Go, where it does nothing but wait for the first 20 s), it works in 0 of 10 tries. So I cannot query the locks for their current lock state while they are performing any action (closing, opening, Lock ’n’ Go).
I also get the 503 error.

Any idea?

It’s not doing nothing during Lock & Go. In fact, this is a tricky case for setting the activity log and triggering callbacks correctly.

What would be interesting: Do you notice any difference between /list and /lockState calls?

Hello,

Yes, I had already seen that @schlewitz mentioned that, so I tried it, but I couldn’t see any difference in behaviour.
But is it normal that the API becomes unavailable EVERY TIME the lock is performing any lock action (even Lock ’n’ Go)?

A short “busy time” after actions is normal because status updates are being pushed, but the described time frames seem far too long.
We are currently running some additional duration tests with different setups to see if we can reproduce the problems.
Any further details about your specific setup and usage behavior will help us narrow down the search.

Hello, I’ve also had problems with long “busy times” or “unavailable times”. Then I noticed that the Bridge is rebooting every few minutes (firmware 2.2.9; uptime never longer than 600 s).
Is that a known issue for that firmware version? What can cause a reboot of the Bridge? Too many requests at the same time?
Thanks!

I can’t reproduce this here with my Bridge. Even in our stress tests we never had such short uptimes. The only known problem is when your Bridge is offline (can’t reach our server) and therefore restarts regularly.

OK, that’s strange. And how long is the reboot cycle if it is not connected to the server?
Another strange thing is that I replaced the Bridge with another one that is also on firmware 2.2.9 and got the same reboots.
I also tested both in another network, and there were no reboots there?!

But in the same network that causes reboots with firmware 2.2.9, there is a Gen1 Bridge with firmware 1.12.6 that is running stably! Can this be related to the router?! It’s a UPC router in Vienna (UPC Connect Box).

This really sounds like the Bridge thinks it needs to restart the Wi-Fi module for some reason. (The older hardware revision with FW 1.x does not do this.)
It could be router-related, but I have a UPC Connect Box at home myself and don’t have these issues. Are you using any special settings or setups?
If you want, you can PM me with more details and we can check with our FW developers.

curl -s http://nuki.lan:8080/info?token=xxxx
works fine, as do list and lockState, but

curl -s http://nuki.lan:8080/log?token=xxxx
returns: HTTP 503 Unavailable

I tried from both a Raspberry Pi console and a Windows 10 console and got exactly the same result.

/log and /clearlog are not yet available;
see the Bridge HTTP-API endpoints.

Grazie (going by the last name; sorry if I assumed wrongly). I missed the comments about that. What a pity, as we were going to develop a log-to-MQTT Python interface. It will have to wait… 🙃

I have the same issue. I ran the following command for some hours:
watch "timeout 2 curl http://10.0.8.7:8080/callback/list\?token\=XXXYYYZZZ >> test.log"
That is, every 2 seconds I tried to get the list of callbacks, and about 200 of 5000 requests returned a 503 error.

But the actual problem is that my callback is sometimes not called, or is called with a delay. And sometimes I can’t open the door with the lockAction endpoint, because it also returns a 503.
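
For comparing numbers between setups, it helps to tally status codes directly rather than inspect a log by hand. The following is a rough sketch of such a measurement; `sample_status_codes`, `failure_rate` and the `fetch` callable (assumed to perform one request and return its HTTP status code) are hypothetical names, not part of any Nuki tooling.

```python
import time
from collections import Counter
from typing import Callable


def sample_status_codes(fetch: Callable[[], int], count: int,
                        interval: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep) -> Counter:
    """Calls `fetch` `count` times, `interval` seconds apart, and tallies the results."""
    tally: Counter = Counter()
    for i in range(count):
        tally[fetch()] += 1
        if i < count - 1:
            sleep(interval)
    return tally


def failure_rate(tally: Counter, failure: int = 503) -> float:
    """Share of requests that returned the given failure status."""
    total = sum(tally.values())
    return tally[failure] / total if total else 0.0


# The figures reported above: about 200 of 5000 requests returned 503.
reported = Counter({200: 4800, 503: 200})
print(f"{failure_rate(reported):.1%}")  # 4.0%
```

Running the same sampler at different intervals (for example 2, 5 and 10 seconds) would show whether the failure rate depends on how often the Bridge is called.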

Calling too often can lead to problems, as the Bridge can only handle one call at a time and also acts as a bridge between Smart Lock and server, pushes status updates, etc.
There should definitely still be at least 3+ seconds between calls.
The other question is whether there is a reason to call the callback list that often (or at all). Or did you just do this for testing purposes? (The call itself should take approx. 250-500 ms, during which it blocks other tasks.)

OK, so you want to say that this API is reliable? I can also try calling it only every 10 seconds, or with any delay you want.

The callback-list call was just an example. You can reproduce it with any other provided endpoint.

As I said, I just want my callback to be called “in time”, but (sometimes) that doesn’t happen.

I also want to be able to open or close the door through this API, but this is not reliably possible either. Sometimes the Bridge is not available at all (as if “not connected”, although it is connected!), and sometimes the endpoint returns a 503 error and then opens or closes the door “delayed”, which also makes a retry mechanism on my (app) side impossible, because it would trigger the open/close process twice.
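
One possible way around the double-trigger problem is to check the lock state before every retry and never resend while the state is unknown. The sketch below is only an illustration under assumptions: `send_action` is assumed to issue the lockAction call and return the HTTP status code, `read_state` to return the current state or `None` if it could not be read, and the `settle` wait of 10 seconds is a guess, not a documented value.

```python
import time
from typing import Callable, Hashable, Optional


def act_once(send_action: Callable[[], int],
             read_state: Callable[[], Optional[Hashable]],
             target_state: Hashable,
             attempts: int = 3,
             settle: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Tries to reach `target_state` without sending the action more often than needed."""
    sent = False
    for _ in range(attempts):
        state = read_state()
        if state == target_state:
            return True      # already there, possibly via a delayed earlier action
        if state is None and sent:
            sleep(settle)    # state unknown after a send: wait, do not resend
            continue
        status = send_action()
        sent = True
        if status == 200:
            return True
        sleep(settle)        # a 503 may still be executed late; give it time
    return read_state() == target_state
```

This only works for actions with a stable end state (locked, unlocked). For something like Lock ’n’ Go there is no single target state to compare against, so a state check cannot tell whether a delayed action is still pending.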

If my “test scenario” is not good enough to demonstrate that this API/device is not really reliable, then please provide me with a script or something similar that I can use to prove it.

I will PM you a test script I use, so we can directly compare numbers. That would surely help us.

It’s time to revive this thread. I am currently running version XXX on my Bridge and it’s giving me lots of HTTP 503 Unavailable responses, for list, callback and even info.

Also, it doesn’t seem to be performing callbacks anymore.