NOC 19.1 is Released¶
The 19.1 release contains 272 bugfixes, optimisations and improvements.
19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray. The new flat theme is based on the Triton theme and uses NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.
Collections are a vital part of NOC. We gratefully appreciate any contributions. To make the contribution process easier, we have added a Share button right into the JSON preview. Enable collections sharing in config and create collection Merge Requests directly from the NOC interface with a single click.
The Alarm console was thoroughly reworked. Current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.
The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.
Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.
We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.
A new NBI service has been introduced. The nbi service is the host for the Northbound Interface API, which allows access to NOC's data from upper-level systems.
The objectmetrics API for requesting metrics has been introduced.
The DataStream service got a lot of improvements:
- alarm datastream for realtime alarm status streaming
- managedobject datastream got an asset part containing hardware inventory data
API Key ACL¶
API Keys got an additional ACL, which allows restricting source addresses for particular keys.
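The idea of a source-address ACL can be illustrated with a minimal sketch. This is not NOC's implementation: the function name `is_allowed`, the list-of-prefixes representation and the "empty ACL means no restriction" rule are all assumptions made for illustration.

```python
import ipaddress


def is_allowed(source_ip, acl_prefixes):
    """Return True if source_ip falls into any prefix of the ACL.

    Assumption of this sketch: an empty ACL means "no restriction".
    """
    if not acl_prefixes:
        return True
    addr = ipaddress.ip_address(source_ip)
    # Membership test against every configured prefix
    return any(addr in ipaddress.ip_network(prefix) for prefix in acl_prefixes)


# Hypothetical ACL attached to one API key
acl = ["10.0.0.0/8", "192.0.2.10/32"]
print(is_allowed("10.1.2.3", acl))      # inside 10.0.0.0/8
print(is_allowed("198.51.100.1", acl))  # outside every prefix
```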
Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:
- raising an alarm
- sending a notification
- calling handlers
The threshold closing condition can differ from the opening one, allowing hysteresis to suppress unnecessary flapping.
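A minimal sketch of how separate opening and closing conditions suppress flapping. The class name `HysteresisThreshold` and its parameters are hypothetical and do not reflect the actual Threshold Profile model.

```python
class HysteresisThreshold:
    """Toy threshold with separate opening and closing levels."""

    def __init__(self, open_level, close_level):
        if close_level > open_level:
            raise ValueError("close_level must not exceed open_level")
        self.open_level = open_level
        self.close_level = close_level
        self.active = False

    def update(self, value):
        """Feed a new measurement; return "open", "close" or None."""
        if not self.active and value >= self.open_level:
            self.active = True
            return "open"
        if self.active and value <= self.close_level:
            self.active = False
            return "close"
        return None


threshold = HysteresisThreshold(open_level=90, close_level=80)
# The value oscillates around 90, but only one open/close pair is produced
events = [threshold.update(v) for v in (85, 91, 89, 92, 79, 85)]
print(events)
```

With a single level of 90 for both conditions, the same series would open, close and reopen the violation; the gap between 80 and 90 absorbs the oscillation.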
Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When enabled, the syslogcollector service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, provides transparent compression and has very modest IOPS requirements, making it ideal for high-load storage.
Collected messages can be queried both through the BI interface and via direct SQL queries.
STP Topology metrics¶
Metrics for STP topology changes are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool to investigate short-term traffic disruption problems in large networks.
New platform detection policy¶
Behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can lead to headaches in particular cases. Now there are two options, configured in the Managed Object Profile:
- Create - preserve previous behavior and create new platform automatically (default)
- Alarm - raise umbrella alarm and stop discovery
Behavior on firmware policy violation also became configurable. ManagedObjectProfile allows configuring the following options:
- Ignore - do nothing (default)
- Ignore&Stop - Stop discovery
- Raise Alarm - Raise umbrella alarm
- Raise&Stop - Raise umbrella alarm and stop discovery
19.1 contains support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:
In addition, an NSM.TIMOS profile became available.
Performance, Scalability and optimisations¶
caps discovery used to collect all known capabilities for a platform. Sometimes this is not the desired behavior, so Caps Profiles are introduced. Caps Profiles allow enabling or disabling the checking of particular groups of capabilities. A group of capabilities can be explicitly enabled, disabled, or enabled only if required for the configured topology discovery.
time.perf_counter has been backported to Python 2.7.
perf_counter uses CPU counters to measure time intervals. It's about 2x faster than time.time and allows more granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping's RTT metrics.
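For illustration, here is how an interval is typically measured with `perf_counter`. The sketch uses the standard `time.perf_counter` from Python 3; the 2.7 backport shipped with NOC is not shown, and the helper name `timed` is hypothetical.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    # Only the difference between two readings is meaningful
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = timed(sum, range(100000))
print("sum=%d took %.6f s" % (result, elapsed))
```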
Pymongo connection pool tuning¶
Our investigations showed that the current pymongo connection pool implementation has a design flaw that leads to a pool connection poisoning problem under NOC's common workload: once opened, a mongo connection from discovery is never closed, leaving lots of connections after spikes of load. We have implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
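One plausible way to see why a LIFO policy helps is a toy model of a pool of idle connections. This is neither pymongo's code nor NOC's pool: `ToyPool` and `connections_touched` are hypothetical names. The model only counts how many distinct connections stay in use after a load spike, on the assumption that connections left untouched could later be closed by an idle timeout.

```python
from collections import deque


class ToyPool:
    """Toy pool of idle connections (illustration only)."""

    def __init__(self, lifo):
        self.lifo = lifo
        self.idle = deque()  # most recently released connection is on the right
        self.created = 0

    def acquire(self):
        if not self.idle:
            self.created += 1
            return "conn-%d" % self.created
        # LIFO reuses the most recently released connection, FIFO the oldest one
        return self.idle.pop() if self.lifo else self.idle.popleft()

    def release(self, conn):
        self.idle.append(conn)


def connections_touched(lifo, spike=5, steady_requests=20):
    """Count distinct connections used in steady state after a spike."""
    pool = ToyPool(lifo)
    # Load spike: several connections are in use at once, then all are returned
    conns = [pool.acquire() for _ in range(spike)]
    for conn in conns:
        pool.release(conn)
    # Steady state: one request at a time
    used = set()
    for _ in range(steady_requests):
        conn = pool.acquire()
        used.add(conn)
        pool.release(conn)
    return len(used)


print("FIFO keeps busy:", connections_touched(lifo=False))
print("LIFO keeps busy:", connections_touched(lifo=True))
```

In this model FIFO rotates through every connection opened during the spike, so none of them ever looks idle, while LIFO keeps reusing a single connection and leaves the rest untouched.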
ClickHouse table cleanup policy¶
ClickHouse table retention policy may be configured on a per-table basis. Partition dropping is automated and may be called manually or from cron.
Redis cache backend¶
Our investigations showed that memcached is prone to randomly forgetting keys while enough memory is available. This leads to random loss of discovery job states, which in turn resets the state of measured snmp counters, loses random metrics and leaves empty gaps in grafana dashboards. The problem is hard to diagnose, and the only cure is to restart the memcached process. The problem lies deep in memcached's internal architecture and is unlikely to be fixed.
So we have introduced support for a redis cache backend. We will decide whether to make it the default cache backend after a testing period.
SO_REUSEPORT & SO_FREEBIND for collectors¶
SO_REUSEPORT allows several collector processes to share a single port using in-kernel load balancing, greatly improving collectors' throughput.
In combination with the new Syslog Archive and ClickHouse table cleanup policy features, NOC can be turned into a high-performance syslog archiving solution.
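A minimal sketch of port sharing with SO_REUSEPORT, assuming a platform that supports the option (for example Linux 3.9 or later). It opens two UDP sockets on the same port within one process; real collectors would do this from separate processes. The helper name `open_collector_socket` is hypothetical.

```python
import socket


def open_collector_socket(host="127.0.0.1", port=0):
    """Open a UDP socket that is allowed to share its port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Must be set on every socket before bind()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


first = open_collector_socket()             # kernel picks a free port
port = first.getsockname()[1]
second = open_collector_socket(port=port)   # second bind to the same port succeeds
same_port = second.getsockname()[1] == port
print("both sockets share port %d: %s" % (port, same_port))
first.close()
second.close()
```

Without the option, the second `bind()` would fail with "Address already in use".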
GridVCS is NOC's high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.
Built-in compression: though Mongo's WiredTiger uses transparent compression at the storage level, explicit compression at the GridVCS level reduces both disk usage and database server traffic.
Previous releases used mercurial's mdiff to calculate config deltas. 19.1 uses BSDIFF4 format by default. During our tests BSDIFF4 showed better results in speed and delta size.
The ./noc gridvcs command got an additional compress subcommand, which applies both compression and BSDIFF4 deltas to already collected data. While it can take some time for large storages, it can free up significant disk space.
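To give a feel for why application-level compression pays off for device configs, here is a small sketch using zlib from the standard library. The choice of zlib and the sample config are assumptions; the release notes do not name the codec GridVCS uses.

```python
import zlib

# Hypothetical, highly repetitive device config
config = "\n".join(
    "interface GigabitEthernet0/%d\n description uplink\n no shutdown" % i
    for i in range(48)
)
raw = config.encode("utf-8")
packed = zlib.compress(raw)

# Compression is lossless: the original config is fully restored
restored = zlib.decompress(packed)
print("raw=%d bytes, packed=%d bytes" % (len(raw), len(packed)))
```

Because the data is compressed before it leaves the application, the smaller payload is also what travels to the database server.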
SA profiles used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons. Some features, like profile loading from custom, will not work with this layout. So starting with 19.1 it is recommended to place the profile's code into the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.
OIDRule: High-order scale functions¶
scale can be defined as higher-order functions, i.e. functions returning other functions. This greatly increases the flexibility of the scaling subsystem and allows external configuration of scaling processing.
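A minimal sketch of a higher-order scale function. The name `scale_by` and its signature are hypothetical and only illustrate the pattern; they are not taken from the OIDRule API.

```python
def scale_by(factor):
    """Return a scale function that multiplies a raw value by factor."""

    def scale(value):
        return value * factor

    return scale


# The factor comes from configuration, the returned function does the scaling
octets_to_bits = scale_by(8)
print(octets_to_bits(1500))
```

The outer call captures the configuration once, so the same factory can produce differently parameterised scale functions without writing a new function for each case.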
The seen signal can be configured to propagate up to the parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when a child element is seen by discovery.
A common usage pattern is to propagate seen to aggregate prefixes to get notified when an aggregate becomes used.
The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.