NOC 19.1 is Released

In accordance with our Release Policy, we proudly present release 19.1.

The 19.1 release contains 272 bugfixes, optimisations and improvements.

Highlights

Usability

NOC Theme

19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray. The new flat theme is based on the Triton theme and uses NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.

Collection Sharing

Collections are a vital part of NOC, and we gratefully appreciate any contributions. To make the contribution process easier, we have added a Share button right in the JSON preview. Enable collection sharing in the config and create collection Merge Requests directly from the NOC interface with a single click.

New fm.alarm

The alarm console was thoroughly reworked. The current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.

New runcommands

The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.

Alarm acknowledgement

Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.

Integration

We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.

NBI

A new NBI service has been introduced. The nbi service hosts the Northbound Interface API, which allows upper-level systems to access NOC's data.

An objectmetrics API for requesting metrics has been introduced.

DataStream

The DataStream service received a lot of improvements.

API Key ACL

API Keys got an additional ACL, which allows restricting the source addresses for particular keys.
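A source-address ACL of this kind boils down to a prefix membership check. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `is_allowed`, the ACL representation as a list of prefix strings, and the treatment of an empty ACL are assumptions made for illustration, not NOC's actual data model.

```python
import ipaddress


def is_allowed(source_ip, acl_prefixes):
    """Return True if source_ip falls within any prefix of the ACL.

    An empty ACL is treated as "no restriction" (an assumption of this sketch).
    """
    if not acl_prefixes:
        return True
    addr = ipaddress.ip_address(source_ip)
    # Membership is False when the address and prefix differ in IP version.
    return any(addr in ipaddress.ip_network(p) for p in acl_prefixes)
```

With an ACL of `["10.0.0.0/24"]`, a request from `10.0.0.5` would pass while one from `192.168.1.1` would be rejected.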

Threshold Profiles

Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:

  • raising an alarm
  • sending a notification
  • calling handlers

The closing condition of a threshold can differ from the opening one, allowing hysteresis to suppress unnecessary flapping.
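To illustrate how separate opening and closing conditions suppress flapping, here is a minimal sketch of a single high threshold with hysteresis. The class name `HysteresisThreshold` and its parameters `open_at` and `close_at` are hypothetical and do not reflect the Threshold Profile model itself.

```python
class HysteresisThreshold:
    """High threshold that opens at `open_at` and closes at `close_at`."""

    def __init__(self, open_at, close_at):
        if close_at > open_at:
            raise ValueError("close_at must not exceed open_at")
        self.open_at = open_at
        self.close_at = close_at
        self.active = False

    def update(self, value):
        """Feed a new measurement; return "open", "close" or None."""
        if not self.active and value >= self.open_at:
            self.active = True
            return "open"
        if self.active and value <= self.close_at:
            self.active = False
            return "close"
        # Values between close_at and open_at never change the state.
        return None
```

With `open_at=90` and `close_at=80`, a metric oscillating between 85 and 92 opens the threshold once and keeps it open, instead of opening and closing on every sample.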

Syslog archiving

Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When enabled, the syslogcollector service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, applies transparent compression and has very modest IOPS requirements, making it ideal for high-load storage.

Collected messages can be queried both through the BI interface and with direct SQL queries.

STP Topology metrics

STP topology change metrics are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool for investigating short-term traffic disruption problems in large networks.

New platform detection policy

The behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can cause headaches in particular cases. Now there are two options, configured in the Managed Object Profile:

  • Create - preserve previous behavior and create new platform automatically (default)
  • Alarm - raise umbrella alarm and stop discovery

Firmware Policy

The behavior on firmware policy violation also became configurable. ManagedObjectProfile allows configuring the following options:

  • Ignore - do nothing (default)
  • Ignore&Stop - Stop discovery
  • Raise Alarm - Raise umbrella alarm
  • Raise&Stop - Raise umbrella alarm and stop discovery

New Profiles

19.1 contains support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:

  • IRE-Polus.Taros
  • Vector.Lambda

In addition, an NSM.TIMOS profile became available.

Performance, Scalability and optimisations

Caps Profile

caps discovery used to collect all known capabilities for a platform. Sometimes this is not the desired behavior, so Caps Profiles were introduced. A Caps Profile allows enabling or disabling checks for a particular group of capabilities. A group of capabilities can be explicitly enabled, disabled, or enabled only if required for the configured topology discovery.

High-precision timers

19.1 contains a time.perf_counter backport to Python 2.7. perf_counter uses CPU counters to measure time intervals. It is about 2x faster than time.time and allows finer granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping RTT metrics.
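As a rough illustration of interval measurement with perf_counter, the sketch below uses the standard `time.perf_counter` available in Python 3. The import path of NOC's Python 2.7 backport is not shown here, and the helper name `timed` is hypothetical.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    # perf_counter is monotonic, so the difference is never negative.
    elapsed = time.perf_counter() - t0
    return result, elapsed
```

Only the difference between two perf_counter readings is meaningful; the absolute value has no defined reference point.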

Pymongo connection pool tuning

Our investigation showed that pymongo's current connection pool implementation has a design flaw that leads to a connection pool poisoning problem under NOC's common workload: once opened, a mongo connection from discovery is never closed, leaving lots of connections open after load spikes. We implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
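The toy simulation below illustrates the general idea behind a LIFO policy; it is not pymongo's or NOC's actual code. It assumes idle connections are closed after a timeout, and compares a LIFO pool with one that does not prefer recently used connections (FIFO here, for simplicity). All names are hypothetical.

```python
from collections import deque


class Pool:
    """Toy connection pool; connections are plain integers here."""

    def __init__(self, lifo):
        self.lifo = lifo
        self.idle = deque()
        self.created = 0
        self.last_used = {}  # connection id -> logical time of last use

    def acquire(self, now):
        if self.idle:
            # LIFO takes the most recently released connection,
            # FIFO takes the one that has been idle the longest.
            conn = self.idle.pop() if self.lifo else self.idle.popleft()
        else:
            self.created += 1
            conn = self.created
        self.last_used[conn] = now
        return conn

    def release(self, conn):
        self.idle.append(conn)

    def reap(self, now, max_idle):
        """Drop connections unused for more than max_idle ticks."""
        self.idle = deque(
            c for c in self.idle if now - self.last_used[c] <= max_idle
        )


def simulate(lifo):
    """Return the number of connections left open after a load spike."""
    pool = Pool(lifo)
    # Spike: five requests in flight at once open five connections.
    conns = [pool.acquire(now=0) for _ in range(5)]
    for c in conns:
        pool.release(c)
    # Quiet period: one request at a time for ten ticks.
    for tick in range(1, 11):
        pool.release(pool.acquire(now=tick))
    pool.reap(now=10, max_idle=5)
    return len(pool.idle)
```

In this model the LIFO pool keeps reusing a single connection during the quiet period, so the other four go idle and are reaped. The FIFO pool cycles through all five, keeping every one of them fresh enough to survive.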

ClickHouse table cleanup policy

The ClickHouse table retention policy may be configured on a per-table basis. Partition dropping is automated and may be invoked manually or from cron.

Redis cache backend

Our investigation showed that memcached is prone to randomly forgetting keys even while enough memory is available. This leads to random loss of discovery job state, which resets the state of measured SNMP counters, loses random metrics and leaves empty gaps in Grafana dashboards. The problem is hard to diagnose, and the only cure is to restart the memcached process. It lies deep in memcached's internal architecture and is unlikely to be fixed.

So we have introduced support for a redis cache backend. We will decide whether to make it the default cache backend after a testing period.

SO_REUSEPORT & SO_FREEBIND for collectors

The syslogcollector and trapcollector services support the SO_REUSEPORT and SO_FREEBIND options for listeners.

SO_REUSEPORT allows several collector processes to share a single port using in-kernel load balancing, greatly improving collector throughput.

SO_FREEBIND allows binding to a non-existing address, which enables floating virtual addresses for collectors (VRRP, CARP, etc.) and adds the necessary level of redundancy.
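At the socket level, these options could be applied roughly as in the sketch below. It is an illustration, not the collectors' actual code: the function name `make_udp_listener` is hypothetical. On Linux the free-bind option is exposed as `IP_FREEBIND` at the `IPPROTO_IP` level, and the fallback value 15 is the constant from the Linux headers, used when Python's `socket` module does not export it.

```python
import socket
import sys

# Linux value of IP_FREEBIND; used when the socket module
# does not export the constant itself.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def make_udp_listener(address, port, reuseport=True, freebind=False):
    """Create a bound UDP socket with the requested options applied."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if reuseport and hasattr(socket, "SO_REUSEPORT"):
            # Several processes may bind the same address and port; the
            # kernel distributes incoming datagrams among them.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        if freebind and sys.platform.startswith("linux"):
            # Allow binding to an address not (yet) configured on this
            # host, e.g. a virtual address currently held by a peer.
            sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
    except Exception:
        sock.close()
        raise
    return sock
```

Both options must be set before `bind`. Every process sharing a port has to set SO_REUSEPORT on its own socket.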

In combination with the new Syslog Archive and ClickHouse table cleanup policy features, NOC can be turned into a high-performance syslog archiving solution.

GridVCS

GridVCS is NOC's high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.

  • Built-in compression - although MongoDB's WiredTiger uses transparent compression at the storage level, explicit compression at the GridVCS level reduces both disk usage and database server traffic.

  • Previous releases used mercurial's mdiff to calculate config deltas. 19.1 uses the BSDIFF4 format by default. In our tests BSDIFF4 showed better results in both speed and delta size.

  • The ./noc gridvcs command got an additional compress subcommand, which applies both compression and BSDIFF4 deltas to already collected data. While it can take some time on large storages, it can free up significant disk space.

API improvements

profile.py

SA profiles used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons. Some features, like profile loading from custom, will not work with __init__.py anyway.

So starting with 19.1 it is recommended to place the profile's code in the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.

OIDRule: High-order scale functions

Metric scales can be defined as high-order functions, i.e. functions returning other functions. This greatly increases the flexibility of the scaling subsystem and allows external configuration of scaling processing.
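The pattern itself can be sketched in a few lines of Python. The function names `scale` and `percent_of` and their parameters are hypothetical and do not represent the actual OIDRule API; the sketch only shows how a configured outer call yields the function that is later applied to each value.

```python
def scale(factor):
    """Return a function multiplying a raw value by a fixed factor."""
    def inner(value):
        return value * factor
    return inner


def percent_of(total):
    """Return a function converting a raw value into percent of total."""
    def inner(value):
        return 100.0 * value / total
    return inner


# The outer call happens once, at configuration time;
# the returned function is applied to every collected value.
to_bits = scale(8)
mem_usage = percent_of(4096)
```

Here `to_bits(125)` yields 1000 and `mem_usage(1024)` yields 25.0. The parameters 8 and 4096 come from configuration rather than being hard-coded into the scaling function.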

IPAM seen propagation

The workflow's seen signal can be configured to propagate up to the parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when a child element is seen by discovery.

A common usage pattern is to propagate seen to aggregate prefixes in order to get notified when an aggregate comes into use.

Phone workflow

The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.