General Software Improvements (3rd June 2020 onwards)

Sub-forum for general software improvements threads
hamishmb
Posts: 1891
Joined: 16/05/2017, 16:41

Post by hamishmb »

I've implemented the sockets message queue feature now, and the changes are in the master branch.

I haven't yet written a system test for it, but I did manage to test manually that the code works. I'll write a proper test soon.
Hamish

Post by hamishmb »

Note for Patrick:

The bus can be used by prefixing the message to send with the site ID to send it to, for example:

*G4* turtle
So this could be sent to the NAS box, and as long as the NAS box has a socket that connects to G4, it will forward the message along. Note that there is currently no error returned if the routing wasn't possible (for example, if an incorrect site ID was used, or if the NAS box didn't have a socket connecting to the destination site).
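
As a rough illustration of the prefix routing described above, here is a minimal sketch. It is not the actual Sockets implementation: the function names, the exact prefix grammar (inferred only from the `*G4* turtle` example), the use of plain lists as stand-ins for sockets, and the choice to forward the body without its prefix are all assumptions. It also returns an "unroutable" result to show where an error could be reported, which the real code does not currently do.

```python
import re

# Assumed prefix format, inferred from the "*G4* turtle" example only.
_PREFIX = re.compile(r"^\*(?P<site>[^*\s]+)\*\s+(?P<body>.*)$", re.DOTALL)


def parse_bus_message(raw):
    """Split '*SITE* body' into (site_id, body).

    Returns (None, raw) when the message carries no site prefix.
    """
    match = _PREFIX.match(raw)
    if match is None:
        return None, raw
    return match.group("site"), match.group("body")


def route(raw, local_site_id, queues_by_site):
    """Decide what to do with a message arriving at local_site_id.

    queues_by_site maps a site ID to a list standing in for the socket
    that connects to that site (hypothetical structure).
    Returns a tuple of (action, site_id, body).
    """
    site_id, body = parse_bus_message(raw)
    if site_id is None or site_id == local_site_id:
        return ("deliver", local_site_id, body)
    if site_id in queues_by_site:
        queues_by_site[site_id].append(body)  # stand-in for a socket write
        return ("forward", site_id, body)
    # The real code silently drops this case; the sketch reports it instead.
    return ("unroutable", site_id, body)
```

For example, with `queues = {"G4": []}`, calling `route("*G4* turtle", "NAS", queues)` queues `"turtle"` for G4, while a message addressed to an unknown site ID comes back as `"unroutable"`.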

I'll document this properly soon, but first I might as well make the test for it.
Hamish
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

Post by PatrickW »

Thanks Hamish, that will be useful to know.

Post by hamishmb »

Okay, I have made some more improvements.
  • A bug in the sockets update from last week that caused parts of messages to be lost is now fixed.
  • I've written a standalone test for the message bus functionality.
    • This needed a couple of "dummy" sites to be added to the end of config.py.
    • This also fixes the other standalone sockets tests which were broken by last week's changes.
There aren't yet unit tests for the new sockets message bus code, but I shall write them in the fullness of time.

I have a feeling that merging all this code might be a bit of a pain because of the changes in directory structure and so on, but I guess we will see.

EDIT: Fixed the unit tests for the sockets code too, turns out I broke them and didn't realise.
Hamish

Post by hamishmb »

More improvements:
  • Improved argument validation in the constructor for the Sockets class.
  • Complete unit tests for the message bus/forwarding functionality.
Hamish

Post by hamishmb »

Good progress today. I've made and tested a number of changes:
  • The NAS box control logic is now partially implemented.
    • (temperature monitoring and setting the system tick)
    • (system tick is not yet restored from the database).
  • Basic NAS box temperature monitoring is now implemented (and data gets put in the system status table).
  • The old sumppi control logic now uses the database instead of sockets to control the gate valve.
    • (note that I haven't implemented the gate valve side of using the database yet).
  • The system tick is now used in monitortools.py (and hence in all Reading objects during normal use).
  • Fixed various other minor issues.
Notes/known issues:
  • All pis currently wait up to 60 seconds for a system tick on startup.
    • This is to account for the NAS box taking a while to start. We may need to delay things further.
  • The database connection timeout is now 30 seconds instead of 10.
    • I don't think this used to be a problem, but 10 seconds doesn't seem to be long enough right now for some reason. Any ideas why this might be? Perhaps DNS failing to resolve something?
  • There are synchronisation issues with the system tick right now.
    • Every 5-6 ticks or so, there's a reading which uses the same tick as the previous reading.
    • I think this is due to timing slowly slipping between the NAS box and my pi VMs.
    • I'm confident I can improve things at least a bit, but how much does this matter?
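
The "wait up to 60 seconds for a system tick on startup" behaviour noted above can be sketched as a simple poll with a deadline. This is only an illustration: `wait_for_tick`, its parameters, and the idea that a tick source returns `None` until a tick is available are assumptions, not the project's actual API.

```python
import time


def wait_for_tick(get_tick, timeout=60.0, poll_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_tick() until it returns a tick or the timeout elapses.

    get_tick is a hypothetical callable returning the current system
    tick, or None if no tick has been received yet.
    Returns the tick, or None if the deadline passed without one.
    """
    deadline = clock() + timeout
    while True:
        tick = get_tick()
        if tick is not None:
            return tick
        if clock() >= deadline:
            return None
        sleep(poll_interval)
```

Using a monotonic clock here means the wait is unaffected if the wall-clock time is adjusted while a pi is starting up.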
Hamish
TerryJC
Posts: 2616
Joined: 16/05/2017, 17:17

Post by TerryJC »

Regarding the System Tick, it's not critical but we could always make it a bit faster than the reading interval.
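
To illustrate why readings can share a tick and why a faster tick avoids it, here is a small simulation. The intervals are made-up, exaggerated numbers (the real reading and tick intervals are not stated in this thread), and the model assumes tick n simply becomes visible at n times the tick interval, with no network latency.

```python
def ticks_seen(reading_interval_ms, tick_interval_ms, n_readings):
    """Tick number each reading would pick up.

    Assumes reading k is taken at k * reading_interval_ms and that
    tick n becomes visible at n * tick_interval_ms.
    """
    return [(k * reading_interval_ms) // tick_interval_ms
            for k in range(n_readings)]


def count_repeats(ticks):
    """Count readings that reuse the tick of the previous reading."""
    return sum(1 for a, b in zip(ticks, ticks[1:]) if a == b)


# Tick effectively slower than the reading interval: periodic repeats.
slow_tick = ticks_seen(5000, 6000, 13)

# Tick faster than the reading interval: ticks are skipped, never reused.
fast_tick = ticks_seen(5000, 4000, 13)
```

In this model, whenever the tick interval is no longer than the reading interval, each reading sees a strictly larger tick than the one before, at the cost of some tick numbers never being used.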
Terry

Post by hamishmb »

Yeah, we could do that. I know there are optimisations I can make, so I'll go for some of them first and see how much it improves things.
Hamish

Post by hamishmb »

Update for changes this week:
  • Create Tools/logiccoretools.py to interface between the control logic and coretools.DatabaseConnection (discussed and agreed with Patrick at viewtopic.php?f=14&t=237&start=10)
  • Log in event log and local file logs when NAS box is getting hot.
  • Restore system tick from database when possible.
  • Gate valve stuff:
    • Implement generic gate valve control logic
    • Update old sumppi control logic to use correct request format for gate valves.
    • Update config.py to use new valve control logic function.
    • Add a new method to GateValve and ManageGateValve so we can get the most recent requested position for a valve, and no longer spam the logs each time we find a gate valve position.
  • Remove the old temporary code that ensured monitors synchronised on startup - no longer needed.
  • Log when one or more monitors stop working.
  • Unit test stuff:
    • Fix unit tests on master branch (broken by previous changes to i2c logic and others).
    • Fix a unit test hang on the use-database branch code.
    • Move .coveragerc to correct place in use-database branch so coverage checking is now working properly again.
  • Database stuff:
    • Fix hang on startup if database isn't available.
    • Implement extremely basic error handling for when the database goes down.
  • Cleaning up:
    • Fix docs for constructor for ManageGateValve.
    • Make config.py docstring more Sphinx-friendly and silence misleading pylint warnings.
    • Remove some old notes in the gate valve code as discussed at viewtopic.php?f=14&t=244
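
As an illustration of the gate valve change in the list above (remembering the most recent requested position and only logging when it changes), here is a minimal sketch. The class name, method names, and the assumption that positions are percentages from 0 to 100 are all hypothetical; this is not the actual GateValve or ManageGateValve code.

```python
import logging

logger = logging.getLogger("gate_valve_sketch")


class RequestedPositionTracker:
    """Remember the last requested valve position; log only on change."""

    def __init__(self):
        self._requested = None

    def set_requested_position(self, percentage):
        """Store a new request. Returns True if it differs from the last one."""
        if not 0 <= percentage <= 100:
            raise ValueError("position must be between 0 and 100")
        changed = percentage != self._requested
        if changed:
            # Logged once per change rather than on every poll.
            logger.info("New requested valve position: %d%%", percentage)
        self._requested = percentage
        return changed

    def get_requested_position(self):
        """Return the most recent requested position, or None if unset."""
        return self._requested
```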
In progress:
  • Stop everything from blowing up and smoke pouring out when the database goes down temporarily (like for a reboot). :lol:
  • General code clean up.
  • Updating old/missing/sub-optimal documentation.
We're getting there now guys! :)

There are still quite a few things to resolve, but it's getting pretty close. I'm going to try and tidy up as much stuff as possible pre-merge to make life as easy as possible. It's probably still going to be rather awkward merging because it's been so long, but I have high hopes that it will go okay in the end.

Once we've got the VPN going fully, I can start slowly deploying new code at WMT. I guess the thing I want to avoid is stuff going wrong in hard-to-debug ways as a result of merging and having not been able to test most of this code at WMT, but we shall see how it goes.
Hamish

Post by hamishmb »

This week's changes:
  • Fix database disconnection and reconnection.
    • Handle this gracefully.
    • Also improve detection reliability for database connection loss.
    • Reduce overall latency of attempted database queries after database goes down.
  • Add pi CPU and memory load to pi status column in database.
  • No longer require testing flag to be specified when running on NAS box.
  • Implement system tick table.
  • Fix exception during shutdown on NAS box.
  • Add initial support for new NASControl table to allow updating and system shutdown/reboot.
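
The graceful disconnection and reconnection handling in the list above could look roughly like the following. This is a generic sketch rather than the code in coretools.DatabaseConnection: the function name and parameters are hypothetical, and it assumes connection failures surface as Python's built-in `ConnectionError`, whereas a real database driver will usually raise its own exception types.

```python
import time


def run_query_with_reconnect(connect, query, retries=3, delay=1.0,
                             sleep=time.sleep):
    """Run query(conn), reconnecting if the connection is lost.

    connect: hypothetical callable returning a new connection object.
    query: hypothetical callable taking a connection and returning a result.
    Retries up to `retries` times with exponential backoff, then re-raises.
    """
    conn = None
    for attempt in range(retries + 1):
        try:
            if conn is None:
                conn = connect()
            return query(conn)
        except ConnectionError:
            conn = None  # force a fresh connection on the next attempt
            if attempt == retries:
                raise
            sleep(delay * (2 ** attempt))
```

Capping the number of retries keeps the latency of a failed query bounded, which matters if the control logic has to carry on while the database is down.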
We now have something that we could deploy, even though it's not quite feature complete. Once we've got the deployment plan, and slowly brought the pis up to date with master, this will probably be finished.
Hamish