NOC 19.1 is Released
In accordance with our Release Policy, we are proud to present release 19.1.
Release 19.1 contains 272 bugfixes, optimisations and improvements.
Highlights
Usability
NOC Theme
19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray theme. The new flat theme is based on the Triton theme with NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.
Collection Sharing
Collections are a vital part of NOC, and we greatly appreciate any contributions. To make the contribution process easier, we have added a Share button directly to the JSON preview. Enable collection sharing in the config and create collection Merge Requests directly from the NOC interface with a single click.
New fm.alarm
The alarm console was thoroughly reworked. Current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.
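Keeping filter state in the URL is what makes a console view shareable: the same query string always restores the same filters. The sketch below shows the general round trip with the standard library. The filter names (`service`, `segment`, `subscriber`) are hypothetical and this is not the actual fm.alarm encoding.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode


def filters_to_query(filters: Dict[str, str]) -> str:
    """Serialize console filters into a URL query string."""
    # Sorting keys makes the same filter set always produce the same URL
    return urlencode(sorted(filters.items()))


def query_to_filters(query: str) -> Dict[str, str]:
    """Restore filters from a query string (last value wins for repeated keys)."""
    return {key: values[-1] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    query = filters_to_query({"service": "vpn-42", "segment": "core"})
    print(query)  # segment=core&service=vpn-42
    print(query_to_filters(query))
```

`urlencode` escapes reserved characters, so values containing spaces or `&` survive the round trip.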
New runcommands
The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.
Alarm acknowledgement
Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.
Integration
We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.
NBI
A new NBI service has been introduced. The nbi service hosts the Northbound Interface API, allowing upper-level systems to access NOC's data.
An objectmetrics API for requesting metrics has also been introduced.
DataStream
The DataStream service got a lot of improvements:
- alarm datastream for real-time alarm status streaming
- managedobject datastream got an asset section containing hardware inventory data
API Key ACL
API Keys got an additional ACL, allowing source addresses to be restricted for particular keys.
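A source-address ACL boils down to checking the client address against a list of networks. The following is a minimal sketch of that check using the standard `ipaddress` module. It assumes, hypothetically, that the ACL is a list of CIDR strings and that an empty ACL leaves the key unrestricted; NOC's actual ACL format may differ.

```python
import ipaddress
from typing import List


def is_source_allowed(source_ip: str, acl: List[str]) -> bool:
    """Return True if source_ip falls into any network listed in the key's ACL.

    Hypothetical model: an empty ACL means the key is not restricted.
    """
    if not acl:
        return True
    address = ipaddress.ip_address(source_ip)
    # strict=False tolerates host bits in entries such as "10.0.0.1/24";
    # an IPv4 address is simply "not in" an IPv6 network (no exception)
    return any(address in ipaddress.ip_network(prefix, strict=False) for prefix in acl)


if __name__ == "__main__":
    acl = ["10.0.0.0/8", "2001:db8::/32"]
    print(is_source_allowed("10.1.2.3", acl))
    print(is_source_allowed("192.168.1.1", acl))
```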
Threshold Profiles
Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:
- raising an alarm
- sending a notification
- calling handlers
The threshold closing condition can differ from the opening one, allowing hysteresis to suppress unnecessary flapping.
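To see why separate opening and closing conditions suppress flapping, consider a level that opens at 80 but closes only below 70: a value oscillating around 80 opens the violation once and keeps it open. The sketch below models an arbitrary list of levels, each with its own open/close predicates and action names. The class names, the action strings and the choice to track every level independently are assumptions made for illustration, not NOC's Threshold Profile implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# (event kind, level name, actions to run)
Event = Tuple[str, str, Tuple[str, ...]]


@dataclass
class Threshold:
    """One level of a hypothetical threshold profile."""
    name: str
    open_when: Callable[[float], bool]   # condition that starts a violation
    close_when: Callable[[float], bool]  # condition that ends it (may differ)
    actions: Tuple[str, ...] = ()        # e.g. "alarm", "notify", "handler"


class ThresholdProfile:
    def __init__(self, thresholds: List[Threshold]) -> None:
        self.thresholds = thresholds
        self.active: Set[str] = set()  # names of currently violated levels

    def feed(self, value: float) -> List[Event]:
        """Process one measurement and return open/close events."""
        events: List[Event] = []
        for t in self.thresholds:
            if t.name in self.active:
                # An open violation is cleared only by its own closing condition
                if t.close_when(value):
                    self.active.discard(t.name)
                    events.append(("close", t.name, ()))
            elif t.open_when(value):
                self.active.add(t.name)
                events.append(("open", t.name, t.actions))
        return events


if __name__ == "__main__":
    profile = ThresholdProfile([
        Threshold("high warning", lambda v: v >= 80, lambda v: v < 70, ("notify",)),
        Threshold("high error", lambda v: v >= 95, lambda v: v < 90, ("alarm", "handler")),
    ])
    for sample in (85, 79, 81, 75, 96, 69):
        print(sample, profile.feed(sample))
```

With these settings the samples 85, 79, 81 and 75 produce a single "open" event for the warning level instead of an open/close pair on every crossing of 80.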
Syslog archiving
Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When it is enabled, the syslogcollector service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, applies transparent compression and has very modest IOPS requirements, making it well suited for high-load storage.
Collected messages can be queried both through the BI interface and with direct SQL queries.
STP Topology metrics
STP topology change metrics are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool for investigating short-term traffic disruptions in large networks.
New platform detection policy
Behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can cause headaches in particular cases. Now two options can be configured in the Managed Object Profile:
- Create - preserve the previous behavior and create a new platform automatically (default)
- Alarm - raise an umbrella alarm and stop discovery
Firmware Policy
Behavior on firmware policy violation also became configurable. ManagedObjectProfile allows the following options to be configured:
- Ignore - do nothing (default)
- Ignore&Stop - stop discovery
- Raise Alarm - raise an umbrella alarm
- Raise&Stop - raise an umbrella alarm and stop discovery
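The four firmware options above are combinations of two independent decisions: whether to raise an umbrella alarm and whether to stop discovery. The sketch below encodes that reading as an enum of flag pairs. The enum, the `Reaction` tuple and the returned step strings are hypothetical names chosen for illustration; they are not NOC's internal representation.

```python
from enum import Enum
from typing import List, NamedTuple


class Reaction(NamedTuple):
    raise_alarm: bool
    stop_discovery: bool


class FirmwarePolicy(Enum):
    # Members mirror the ManagedObjectProfile options listed above
    IGNORE = Reaction(raise_alarm=False, stop_discovery=False)       # "Ignore"
    IGNORE_STOP = Reaction(raise_alarm=False, stop_discovery=True)   # "Ignore&Stop"
    RAISE_ALARM = Reaction(raise_alarm=True, stop_discovery=False)   # "Raise Alarm"
    RAISE_STOP = Reaction(raise_alarm=True, stop_discovery=True)     # "Raise&Stop"


def on_firmware_violation(policy: FirmwarePolicy) -> List[str]:
    """Return the steps a discovery job would take for the given policy."""
    reaction = policy.value
    steps = []
    if reaction.raise_alarm:
        steps.append("raise umbrella alarm")
    steps.append("stop discovery" if reaction.stop_discovery else "continue discovery")
    return steps


if __name__ == "__main__":
    for policy in FirmwarePolicy:
        print(policy.name, on_firmware_violation(policy))
```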
New Profiles
19.1 contains support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:
- IRE-Polus.Taros
- Vector.Lambda
In addition, the NSM.TIMOS profile became available.
Performance, Scalability and Optimisations
Caps Profile
The caps discovery used to collect all known capabilities for the platform. Sometimes this is not the desired behavior, so Caps Profiles were introduced. Caps Profiles allow particular groups of capability checks to be enabled or disabled. A group of capabilities can be explicitly enabled, disabled, or enabled only if required by the configured topology discovery.
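The three per-group settings can be read as a small decision rule. The sketch below computes which capability groups caps discovery would check. It assumes, hypothetically, that a group counts as "required" when its name matches an enabled topology discovery method (for example a group `lldp` for LLDP topology discovery); the enum and group names are illustrative only.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class CapsGroupPolicy(Enum):
    ENABLE = "enable"      # always check this group
    DISABLE = "disable"    # never check this group
    TOPOLOGY = "topology"  # check only if required by topology discovery


def groups_to_check(policies: Dict[str, CapsGroupPolicy],
                    topology_methods: Iterable[str]) -> Set[str]:
    """Return the capability groups caps discovery should check.

    Hypothetical rule: a TOPOLOGY group is required when a topology
    discovery method with the same name is enabled.
    """
    enabled_methods = set(topology_methods)
    result = set()
    for group, policy in policies.items():
        if policy is CapsGroupPolicy.ENABLE:
            result.add(group)
        elif policy is CapsGroupPolicy.TOPOLOGY and group in enabled_methods:
            result.add(group)
    return result


if __name__ == "__main__":
    profile = {
        "lldp": CapsGroupPolicy.TOPOLOGY,
        "cdp": CapsGroupPolicy.TOPOLOGY,
        "snmp": CapsGroupPolicy.ENABLE,
        "bfd": CapsGroupPolicy.DISABLE,
    }
    print(sorted(groups_to_check(profile, ["lldp"])))
```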
High-precision timers
19.1 contains a time.perf_counter backport to Python 2.7. perf_counter uses CPU counters to measure time intervals. It is about 2x faster than time.time and allows finer granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping's RTT metrics.
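On Python 3, `time.perf_counter` is part of the standard library, so the pattern the backport enables on 2.7 looks like the sketch below: take two readings around the measured operation and subtract. Unlike `time.time`, `perf_counter` is monotonic, so clock adjustments cannot produce negative intervals. The `measure` helper is a hypothetical name, not a NOC function.

```python
import time
from typing import Any, Callable, Tuple


def measure(func: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Call func(*args) and return (result, elapsed seconds) via perf_counter."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Compare the advertised properties of both clocks on this platform
    for name in ("time", "perf_counter"):
        info = time.get_clock_info(name)
        print(f"{name}: resolution={info.resolution}, monotonic={info.monotonic}")
    _, elapsed = measure(sum, range(100_000))
    print(f"elapsed: {elapsed * 1000:.3f} ms")
```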
Pymongo connection pool tuning
Our investigation showed that pymongo's current connection pool implementation has a design flaw that leads to the pool connection poisoning problem under NOC's common workload: once opened, a mongo connection from discovery is never closed, leaving lots of open connections after load spikes. We have implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
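The reason a LIFO policy helps can be shown with a toy pool, assuming idle connections are closed once they exceed a maximum idle time. With FIFO checkout, light traffic rotates through every pooled connection, so none ever looks idle and the pool stays at its peak size forever. With LIFO checkout, the most recently returned connection is reused and the surplus from a spike sinks to the bottom of the stack, where it ages out. This is an illustrative sketch, not pymongo's or NOC's pool code; `connect`, `max_idle` and the injectable `clock` are hypothetical parameters.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class LifoPool:
    """Toy pool that reuses the most recently returned connection first."""

    def __init__(self, connect: Callable[[], object], max_idle: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.connect = connect
        self.max_idle = max_idle
        self.clock = clock
        # (connection, returned_at); the right end is the top of the stack
        self.idle: Deque[Tuple[object, float]] = deque()
        self.closed = 0

    def _reap(self) -> None:
        now = self.clock()
        # The oldest returned connections sit at the bottom (left end)
        while self.idle and now - self.idle[0][1] > self.max_idle:
            self.idle.popleft()
            self.closed += 1  # a real pool would close the socket here

    def checkout(self) -> object:
        self._reap()
        if self.idle:
            conn, _ = self.idle.pop()  # LIFO: most recently used first
            return conn
        return self.connect()

    def checkin(self, conn: object) -> None:
        self.idle.append((conn, self.clock()))
```

In a simulation with a fake clock, a spike of five concurrent checkouts followed by one request per second leaves only one connection open after `max_idle` seconds; the other four are reaped because they are never touched again.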
ClickHouse table cleanup policy
The ClickHouse table retention policy can be configured on a per-table basis. Partition dropping is automated and can be run manually or from cron.
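A retention-driven cleanup amounts to comparing each partition against a cutoff and issuing `ALTER TABLE ... DROP PARTITION` for the expired ones. The sketch below assumes, hypothetically, monthly partitions created by `toYYYYMM(date)` (IDs such as `201901`) and a retention expressed in months including the current one; the table name `syslog` is illustrative. In a live system the partition list could be read from ClickHouse's `system.parts` table.

```python
from datetime import date
from typing import Iterable, List


def months_before(day: date, months: int) -> int:
    """Return YYYYMM of the month that lies `months` months before day's month."""
    index = day.year * 12 + (day.month - 1) - months
    return (index // 12) * 100 + index % 12 + 1


def drop_statements(table: str, partitions: Iterable[int],
                    today: date, keep_months: int) -> List[str]:
    """Build DROP PARTITION queries for partitions older than the retention.

    Assumes toYYYYMM partitioning; keep_months counts the current month.
    The table name is interpolated as-is, so it must come from trusted config.
    """
    if keep_months < 1:
        raise ValueError("keep_months must be at least 1")
    cutoff = months_before(today, keep_months - 1)
    return [f"ALTER TABLE {table} DROP PARTITION {p}"
            for p in sorted(partitions) if p < cutoff]


if __name__ == "__main__":
    for query in drop_statements("syslog", [201812, 201901, 201902, 201903],
                                 date(2019, 3, 15), keep_months=2):
        print(query)
```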
Redis cache backend
Our investigation showed that memcached is prone to randomly forgetting keys even while enough memory is available. This leads to random loss of discovery job state, which resets the state of measured SNMP counters, loses random metrics and leaves empty gaps in Grafana dashboards. The problem is hard to diagnose, and the only cure is to restart the memcached process. The problem lies deep in memcached's internal architecture and is unlikely to be fixed.
So we have introduced support for a Redis cache backend. We will decide whether to make it the default cache backend after a testing period.
SO_REUSEPORT & SO_FREEBIND for collectors
The syslogcollector and trapcollector services support the SO_REUSEPORT and SO_FREEBIND options for listeners.
SO_REUSEPORT allows a single port to be shared by several collector processes using in-kernel load balancing, greatly improving collector throughput.
SO_FREEBIND allows binding to an address that does not (yet) exist on the host, enabling floating virtual addresses for collectors (VRRP, CARP, etc.) and adding the necessary level of redundancy.
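The sketch below shows how a Linux UDP listener can set both options before `bind()`. At the socket API level the "freebind" behaviour is the `IP_FREEBIND` option at the `IPPROTO_IP` level; older Python versions lack the constant, so the sketch falls back to its Linux value (15). The function name and parameters are hypothetical, and this is not the collectors' actual code.

```python
import socket

# Linux value of IP_FREEBIND (linux/in.h); used if Python lacks the constant
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def open_udp_listener(address: str, port: int,
                      reuseport: bool = True, freebind: bool = False) -> socket.socket:
    """Open an IPv4 UDP listener (e.g. for syslog or SNMP traps). Linux only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if reuseport:
            # Every process sharing the port must set SO_REUSEPORT before bind();
            # the kernel then distributes incoming datagrams among them
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        if freebind:
            # Allow bind() to an address not configured on this host yet,
            # e.g. a VRRP/CARP virtual address currently held by a peer
            sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
    except OSError:
        sock.close()
        raise
    return sock
```

Starting several collector processes that each call `open_udp_listener("0.0.0.0", port)` lets the kernel balance datagrams between them. Note that ports below 1024 (such as 514 or 162) normally require elevated privileges.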
In combination with the new Syslog Archive and ClickHouse table cleanup policy features, NOC can be turned into a high-performance syslog archiving solution.
GridVCS
GridVCS is NOC's high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.
Built-in compression: although MongoDB's WiredTiger storage engine applies transparent compression at the storage level, explicit compression at the GridVCS level reduces both disk usage and database server traffic.
BSDIFF4 deltas: previous releases used Mercurial's mdiff to calculate config deltas, while 19.1 uses the BSDIFF4 format by default. In our tests BSDIFF4 showed better results in both speed and delta size.
The ./noc gridvcs command got an additional compress subcommand, allowing both compression and BSDIFF4 deltas to be applied to already collected data. While it can take a while for large storages, it can free up significant disk space.
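Device configurations are highly repetitive text, which is why compressing revisions before they reach the database pays off even on top of storage-level compression. The sketch below uses the standard `zlib` module purely as an illustrative codec; which codec GridVCS actually uses is not stated here, and BSDIFF4 delta computation (which needs the third-party `bsdiff4` package) is not shown. The `pack`/`unpack` names are hypothetical.

```python
import zlib


def pack(config: str, level: int = 6) -> bytes:
    """Compress a configuration revision before storing it."""
    return zlib.compress(config.encode("utf-8"), level)


def unpack(blob: bytes) -> str:
    """Restore a configuration revision from its stored form."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    config = "".join(
        f"interface Gi0/{i}\n description access\n switchport mode access\n!\n"
        for i in range(48)
    )
    blob = pack(config)
    print(len(config), "->", len(blob), "bytes")
```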
API improvements
profile.py
SA profiles used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons, and some features, like profile loading from custom, will not work with __init__.py anyway.
So starting with 19.1 it is recommended to place the profile's code into the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.
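The transition rule can be pictured as a lookup order: prefer profile.py and fall back to a legacy __init__.py. The sketch below is a hypothetical illustration of that order, not NOC's loader. It additionally assumes that an empty __init__.py is only a package marker and therefore does not count as profile code.

```python
import os
import tempfile
from typing import Optional


def find_profile_module(profile_dir: str) -> Optional[str]:
    """Return the file a profile would be loaded from, or None.

    Hypothetical order: profile.py first, then a non-empty legacy __init__.py.
    """
    for name in ("profile.py", "__init__.py"):
        path = os.path.join(profile_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "__init__.py"), "w") as f:
            f.write("class Profile:\n    pass\n")  # legacy layout
        print(find_profile_module(d))
        with open(os.path.join(d, "profile.py"), "w") as f:
            f.write("class Profile:\n    pass\n")  # recommended layout
        print(find_profile_module(d))
```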
OIDRule: High-order scale functions
An OIDRule's metric scale can be defined as a high-order function, i.e. a function returning another function. This greatly increases the flexibility of the scaling subsystem and allows scale processing to be configured externally.
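A high-order scale is a factory that captures its parameters and returns the function actually applied to each raw value. Because factories can be looked up by name, a chain of scaling steps can be described in configuration data. The sketch below illustrates the idea. `scale`, `offset`, `compose`, `FACTORIES` and `from_config` are hypothetical names, not the OIDRule API.

```python
from typing import Callable, Dict, Iterable, Tuple

Scale = Callable[[float], float]


def scale(factor: float) -> Scale:
    """Return a function multiplying a raw value by factor."""
    def apply(value: float) -> float:
        return value * factor
    return apply


def offset(delta: float) -> Scale:
    """Return a function adding delta to a value."""
    def apply(value: float) -> float:
        return value + delta
    return apply


def compose(*steps: Scale) -> Scale:
    """Return a function applying steps left to right."""
    def apply(value: float) -> float:
        for step in steps:
            value = step(value)
        return value
    return apply


# Hypothetical registry: lets scale chains be described as plain data
FACTORIES: Dict[str, Callable[[float], Scale]] = {"scale": scale, "offset": offset}


def from_config(spec: Iterable[Tuple[str, float]]) -> Scale:
    return compose(*(FACTORIES[name](arg) for name, arg in spec))


if __name__ == "__main__":
    octets_to_bits = scale(8)
    print(octets_to_bits(125))
    print(from_config([("scale", 0.5), ("offset", 10)])(100))
```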
IPAM seen propagation
The workflow's seen signal can be configured to propagate up to parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when discovery sees a child element.
A common usage pattern is to propagate seen to aggregate prefixes to get notified when an aggregate becomes used.
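The propagation rule can be sketched as a walk from the seen element up through its enclosing prefixes, nearest first, continuing only while the current element's policy allows notifying its parent. The class below is a toy model under that assumption. It stores a boolean "propagate seen" flag per prefix in place of the real profile setting, and all names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class PrefixTree:
    """Toy IPAM: each prefix carries a hypothetical 'propagate seen' flag."""

    def __init__(self) -> None:
        self.propagate: Dict[Network, bool] = {}

    def add(self, prefix: str, propagate: bool) -> None:
        self.propagate[ipaddress.ip_network(prefix)] = propagate

    def parents(self, net: Network) -> List[Network]:
        """Known enclosing prefixes, nearest (longest) first."""
        found = [p for p in self.propagate
                 if p != net and p.version == net.version and net.subnet_of(p)]
        return sorted(found, key=lambda p: p.prefixlen, reverse=True)

    def signal_seen(self, element: str, propagate: bool) -> List[str]:
        """Deliver 'seen' for an address or prefix; return notified parents."""
        net = ipaddress.ip_network(element)
        notified = []
        allow = propagate  # the seen element's own policy decides the first hop
        for parent in self.parents(net):
            if not allow:
                break
            notified.append(str(parent))
            allow = self.propagate[parent]  # each parent decides the next hop
        return notified


if __name__ == "__main__":
    tree = PrefixTree()
    tree.add("10.0.0.0/8", propagate=False)  # aggregate
    tree.add("10.0.0.0/16", propagate=True)
    tree.add("10.0.1.0/24", propagate=True)
    print(tree.signal_seen("10.0.1.5", propagate=True))
```

Here the aggregate 10.0.0.0/8 is notified as soon as any address below it is seen, which is the usage pattern described above.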
Phone workflow
The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.