
NOC 19.1

In accordance with our Release Policy, we proudly present release 19.1.

Release 19.1 contains 272 bugfixes, optimisations and improvements.

Highlights

Usability

NOC Theme

19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray. The new flat theme is based on the Triton theme and uses NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.

Collection Sharing

Collections are a vital part of NOC. We gratefully appreciate any contributions. To make the contribution process easier, we've added a Share button directly to the JSON preview. Enable collection sharing in the config and create collection Merge Requests directly from the NOC interface with a single click.

New fm.alarm

The alarm console was thoroughly reworked. Current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.

New runcommands

The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.

Alarm acknowledgement

Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.

Integration

We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.

NBI

A new nbi service has been introduced. The nbi service hosts the Northbound Interface API, which allows upper-level systems to access NOC's data.

The objectmetrics API <api-nbi-objectmetrics> for requesting metrics has been introduced.

DataStream

DataStream service <services-datastream> got a lot of improvements:

  • alarm datastream <api-datastream-alarm> for realtime alarm status streaming
  • managedobject datastream <api-datastream-managedobject> got an asset part containing hardware inventory data

API Key ACL

API Key <reference-apikey> got an additional ACL, which allows restricting source addresses for particular keys.
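The idea of a per-key source address ACL can be sketched as follows. This is only an illustration, not NOC's implementation: the function name `is_source_allowed`, the list-of-prefixes representation and the "empty ACL means no restriction" rule are all assumptions made for the example.

```python
import ipaddress


def is_source_allowed(source_ip, acl_prefixes):
    """Return True when source_ip falls into any prefix of the key's ACL.

    Assumption of this sketch: an empty ACL means "no restriction".
    """
    if not acl_prefixes:
        return True
    addr = ipaddress.ip_address(source_ip)
    # Membership is False when address and network versions differ (v4 vs v6)
    return any(addr in ipaddress.ip_network(prefix) for prefix in acl_prefixes)


# Hypothetical ACL attached to one API key
acl = ["10.0.0.0/8", "192.0.2.0/24"]
print(is_source_allowed("10.1.2.3", acl))
print(is_source_allowed("198.51.100.7", acl))
```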

Threshold Profiles

Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:

  • raising an alarm
  • sending a notification
  • calling handlers

The threshold closing condition can differ from the opening one, allowing hysteresis to suppress unnecessary flapping.
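A minimal sketch of how separate opening and closing conditions produce hysteresis is given below. The class name `Threshold` and its fields are hypothetical and do not reflect the actual Threshold Profile model; the example only shows why a value hovering around the opening level does not cause repeated open/close events.

```python
class Threshold:
    """One threshold level with distinct opening and closing values."""

    def __init__(self, name, open_at, close_at):
        self.name = name
        self.open_at = open_at    # violation opens when value >= open_at
        self.close_at = close_at  # violation closes when value < close_at
        self.active = False

    def update(self, value):
        """Feed a new measurement; return 'open', 'close' or None."""
        if not self.active and value >= self.open_at:
            self.active = True
            return "open"
        if self.active and value < self.close_at:
            self.active = False
            return "close"
        return None


cpu_high = Threshold("CPU high", open_at=90, close_at=80)
events = [cpu_high.update(v) for v in [85, 91, 89, 92, 85, 79, 85]]
print(events)
```

With an opening level of 90 and a closing level of 80, the samples 89, 92 and 85 do not generate events while the violation is open; only the drop to 79 closes it.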

Syslog archiving

Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When enabled, the syslogcollector <service-syslogcollector> service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, applies transparent compression and has very modest IOPS requirements, making it ideal for high-load storage.

Collected messages can be queried both through the BI interface and with direct SQL queries.

STP Topology metrics

STP topology change metrics are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool to investigate short-term traffic disruption problems in large networks.

New platform detection policy

Behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can lead to headaches in particular cases. Now there are two options, configured in the Managed Object Profile:

  • Create - preserve previous behavior and create new platform automatically (default)
  • Alarm - raise umbrella alarm and stop discovery

Firmware Policy

Behavior on firmware policy violation also became configurable. ManagedObjectProfile allows configuring the following options:

  • Ignore - do nothing (default)
  • Ignore&Stop - Stop discovery
  • Raise Alarm - Raise umbrella alarm
  • Raise&Stop - Raise umbrella alarm and stop discovery

New Profiles

19.1 contains support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:

  • IRE-Polus.Taros
  • Vector.Lambda

In addition, an NSN.TIMOS <profile-NSM.TIMOS> profile became available.

Performance, Scalability and optimisations

Caps Profile

caps discovery <discovery-box-caps> used to collect all known capabilities for a platform. Sometimes this is not the desired behavior, so Caps Profiles are introduced. Caps Profiles allow enabling or disabling checks for particular groups of capabilities. A group of capabilities can be explicitly enabled, disabled, or enabled only if required for the configured topology discovery.

High-precision timers

19.1 contains a time.perf_counter backport to Python 2.7. perf_counter uses CPU counters to measure time intervals. It is about 2x faster than time.time and allows finer granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping's RTT metrics.
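For illustration, interval measurement with perf_counter looks roughly like this. The sketch uses the standard library's time.perf_counter from Python 3 rather than the Python 2.7 backport shipped with NOC, and the helper name `timed` is hypothetical.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    perf_counter is a monotonic high-resolution clock, so the difference
    of two readings is suitable for short interval measurements.
    """
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    return result, elapsed


total, elapsed = timed(sum, range(1000))
print(total, "computed in", elapsed, "seconds")
```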

Pymongo connection pool tuning

Our investigations showed that the current pymongo connection pool implementation has a design flaw that leads to a pool connection poisoning problem under common NOC workloads: once opened, a mongo connection from discovery is never closed, leaving lots of connections after load spikes. We've implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
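The effect of a LIFO policy can be shown with a toy pool. This is not pymongo's or NOC's code: `LifoPool`, its methods and the explicit `now` arguments are hypothetical and exist only to make the behavior deterministic. Because the most recently returned connection is always reused first, connections opened during a spike sink to the bottom of the stack, stay idle and can be closed after a timeout.

```python
import collections


class LifoPool:
    """Toy pool that always reuses the most recently returned connection."""

    def __init__(self, factory, closer, max_idle):
        self._factory = factory    # callable creating a new connection
        self._closer = closer      # callable closing a connection
        self._max_idle = max_idle  # seconds a connection may stay unused
        # (connection, returned_at) pairs, oldest on the left
        self._idle = collections.deque()

    def acquire(self):
        if self._idle:
            conn, _ = self._idle.pop()  # LIFO: newest idle connection
            return conn
        return self._factory()

    def release(self, conn, now):
        self._idle.append((conn, now))

    def reap(self, now):
        """Close connections that have been idle longer than max_idle."""
        while self._idle and now - self._idle[0][1] > self._max_idle:
            conn, _ = self._idle.popleft()
            self._closer(conn)

    def idle_count(self):
        return len(self._idle)
```

With a FIFO policy every idle connection would be touched in turn under steady load, so none of them would ever look idle enough to close.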

ClickHouse table cleanup policy

ClickHouse table retention policy may be configured on a per-table basis. Partition dropping is automated and may be called manually or from cron.

Redis cache backend

Our investigations showed that memcached is prone to randomly forgetting keys even while enough memory is available. This leads to random loss of discovery job state, which resets the state of measured SNMP counters, loses random metrics and leaves empty gaps in Grafana dashboards. The problem is hard to diagnose, and the only cure is to restart the memcached process. The problem lies deep in memcached's internal architecture and is unlikely to be fixed.

So we've introduced support for a Redis cache backend. We'll decide whether to make it the default cache backend after a testing period.

SO_REUSEPORT & SO_FREEBIND for collectors

The syslogcollector <service-syslogcollector> and trapcollector <service-trapcollector> services support the SO_REUSEPORT and SO_FREEBIND options for listeners.

SO_REUSEPORT allows several collector processes to share a single port using in-kernel load balancing, greatly improving collector throughput.

SO_FREEBIND (the IP_FREEBIND socket option on Linux) allows binding to a non-existent address, opening support for floating virtual addresses for collectors (VRRP, CARP, etc.) and adding the necessary level of redundancy.
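At the socket level, the two options amount to setsockopt calls made before bind. The sketch below is an assumption-laden illustration for Linux, not the collectors' code: the helper name `make_udp_listener` is hypothetical, and the numeric fallback 15 is the value of IP_FREEBIND from the Linux <linux/in.h> header, used only when the socket module does not export the constant.

```python
import socket

# Linux value of IP_FREEBIND; used if the socket module lacks the constant.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def make_udp_listener(host, port, reuse_port=True, free_bind=False):
    """Create a UDP socket bound to (host, port) with optional socket options."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if reuse_port:
            # Several processes may bind the same port; the kernel
            # distributes incoming datagrams between them.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        if free_bind:
            # Allow binding to an address not (yet) configured on this host,
            # e.g. a floating virtual address.
            sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((host, port))
    except Exception:
        sock.close()
        raise
    return sock


# Two listeners sharing one port, as two collector processes would
first = make_udp_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = make_udp_listener("127.0.0.1", port)
print("both listeners bound to port", port)
first.close()
second.close()
```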

In combination with the new Syslog Archive <release-19.1-syslog-archive> and ClickHouse table cleanup policy <release-19.1-clickhouse-cleanup> features, NOC can be turned into a high-performance syslog archiving solution.

GridVCS

GridVCS is NOC's high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.

  • Built-in compression - though Mongo's WiredTiger uses transparent compression on the storage level, explicit compression on the GridVCS level reduces both disk usage and database server traffic.
  • Previous releases used mercurial's mdiff to calculate config deltas. 19.1 uses the BSDIFF4 format by default. In our tests BSDIFF4 showed better results in both speed and delta size.
  • The ./noc gridvcs <man-gridvcs> command got an additional compress subcommand, which applies both compression and BSDIFF4 deltas to already collected data. While it can take some time for large storages, it can free up significant disk space.

API improvements

profile.py

SA profiles <profiles> used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons. Some features, like loading profiles from custom, will not work with __init__.py anyway.

So starting with 19.1 it is recommended to place the profile's code into the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.

OIDRule: High-order scale functions

Metrics scale can be defined as a high-order function, i.e. a function returning another function. This greatly increases the flexibility of the scaling subsystem and allows external configuration of scaling.
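A high-order scale function can be sketched as a factory that receives its configuration and returns the function actually applied to each collected value. The names `scale` and `percent_of` below are hypothetical and are not taken from the OIDRule API.

```python
def scale(factor):
    """Return a function that multiplies a raw value by factor."""
    def _scale(value):
        return value * factor
    return _scale


def percent_of(total):
    """Return a function that converts a raw value to a percentage of total."""
    def _percent(value):
        return value * 100.0 / total
    return _percent


# The factory is configured once; the returned function is applied per sample.
octets_to_bits = scale(8)
used_percent = percent_of(200)
print(octets_to_bits(100), used_percent(50))
```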

IPAM seen propagation

Workflow's seen signal can be configured to propagate up to the parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when a child element is seen by discovery.

A common usage pattern is to propagate seen to aggregate prefixes to get notified when an aggregate becomes used.
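The propagation logic can be pictured with the following sketch. It is a simplification under stated assumptions, not the IPAM implementation: prefixes are kept in a plain dict mapping a canonical prefix string to a boolean "propagate seen to parent" flag, the signal is assumed to start at the most specific prefix enclosing the address, and the function name `seen_chain` is hypothetical.

```python
import ipaddress


def seen_chain(address, prefixes):
    """Return the prefixes notified when address is seen, most specific first.

    prefixes maps a canonical prefix string to True when that prefix
    propagates the seen signal to its parent.
    """
    addr = ipaddress.ip_address(address)
    enclosing = sorted(
        (ipaddress.ip_network(p) for p in prefixes
         if addr in ipaddress.ip_network(p)),
        key=lambda net: net.prefixlen,
        reverse=True,
    )
    notified = []
    for net in enclosing:
        notified.append(str(net))
        if not prefixes[str(net)]:
            break  # this prefix does not propagate further up
    return notified


prefixes = {"10.0.0.0/8": False, "10.1.0.0/16": True, "10.1.2.0/24": True}
print(seen_chain("10.1.2.3", prefixes))
```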

Phone workflow

The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.

Breaking Changes

Migration

New features

MR      Title
MR1515  Add estimate param to job command.
MR1525  Collection sharing
MR1498  DataStream: asset part of ManagedObject
MR1516  APIKey ACL
MR1518  Add export/import to ./noc beef command.
MR1514  Configurable behavior on new platforms and firmware policy violations
MR1512  new fm-alarm
MR1508  IRE-Polus.Taros profile
MR1507  Summary glyph display order
MR1501  Add Errors Out and Discards In for ddash
MR1595  Add periodic diagnostic to alarm diagnostic.
MR1460  ThresholdProfile: Flexible thresholds configuration
MR1497  Alarm acknowledge/unacknowledge
MR1491  network stp topology changes on graph
MR1476  GridVCS: bsdiff4 patches and zlib compression
MR1432  Add initial support for NSN.TIMOS profile
MR1475  High-precision timers
MR1458  Add Network | STP | Topology Changes metric.
MR1455  CapsProfile
MR1396  redis cache backend
MR1404  #794: IPAM seen propagation policy
MR1384  card: project card
MR1390  #942: Remove Root container
MR1352  #694 ClickHouse table cleaning policy
MR1363  Vector.Lambda profile
MR1283  NOC theme
MR1336  OIDRule: High-order scale functions
MR1338  #539 Syslog archiving
MR1255  nbi service
MR1345  #497 syslogcollector/trapcollector: SO_REUSEPORT and IP_FREEBIND support
MR1252  datastream: Alarm datastream
MR1226  #636 Phone Workflow integraton
MR1113  Profiles should be moved to profile.py

Improvements

MR      Title
MR1534  Set default loglevel on command to info.
MR1535  Update RU translation.
MR1527  FM Alarms localization
MR1529  Add full_name to PlatformApplication query fields.
MR1522  Update/report interface status3
MR1510  Update DLink.DxS profile
MR1556  Update Rotek.BT profile (get_version)
MR1539  Update settings by snmp requests for Dlink.DxS
MR1500  Update Juniper.JUNOS profile
MR1503  Speedup NetworkSegment Service Summary count.
MR1502  Update Report for Interfaces Status
MR1490  Generic.get_chassis_id disable Multicast MAC address check.
MR1494  SKS.SKS and BDCOM.IOS config volatile.
MR1488  Add platform to Linksys.SPS2xx profile.
MR1451  Unified loader interface
MR1485  Add caps profile to managedobject profile ETL loader.
MR1484  Add to Linksys.SPS24xx platform OID
MR1434  ./noc dnszone import: Parse complex $TTL directives
MR1452  Move methods from SegmentTopology to BaseTopology
MR1449  inv.networksegment: Bulk fields calculation
MR1454  Add to_python method to ClickHouse model.
MR1466  Add to Huawei.VRP profile get Serial Number attributes.
MR1453  ResourceGroup: TreeCombo
MR1461  Add config_volatile to Orion.NOS and SKS.SKS
MR1447  Increase query interval for core.pm.utils function.
MR1417  Extendable Generic.get_chassis_id script
MR1441  Add patern more to Huawei.MA5600T profile.
MR1440  Optimize reportalarmdetail and reportobjectdetail.
MR1439  Update/eltex mes execute snmp
MR1437  Delete aggregateinterface bi model
MR1420  Add dynamically loader BI models.
MR1418  RepoPreview MVVC
MR1427  Migrate Alstec.24xx.get_metrics to new model.
MR1414  networkx 2.2 and improvend spring layout implementation
MR1413  dns.dnsserver: Remove sync field
MR1400  requests 2.20.0
MR1392  Diverged permissions
MR1382  #961 Process All addresses and Loopback address syslog/trap source types
MR1408  Add Generic.get_vlans and get_switchport scripts.
MR1409  Add get_lldp_snmp capabilities for Cisco.IOS
MR1410  Change Iface Name OID for get_ifindexes Plante.WCDG profile
MR1374  migrate inv map to leafletjs
MR1381  #971 trapcollector: Gentler handling of BER decoding errors
MR1371  dnszone: Ignore addresses with missed FQDNs
MR1369  Add theme variable to login page render.
MR1368  Add "Up/10M" to reportcolumndatasource for report object detail.
MR1391  CODEOWNERS file
MR1353  #788 Try to determine VRF's for DHCP address discovery
MR1361  DataStream: Load from custom
MR1251  Customized PyMongo connection pool
MR1397  Juniper.junos
MR1398  auto logout remove msg
MR1385  Dead code cleanup
MR1284  runcommands refactoring
MR1375  Cleanup pyrule from classifier trigger.
MR1341  theme body padding for form
MR1362  Add convert ifname for MA4000
MR1349  Cleanup AlliedTelesis profiles.
MR1346  snmp: Try to negotiate broken error_index
MR1344  Add Interface packets dashboard in MO dash.
MR1318  Migrate ReportProfileCheck report to ReportStat Backend.
MR1228  Move numpy import to parse_table_header in lib/text.
MR1316  Additional LLDP constants and caps conversion functions
MR1324  Add TZ parameter to NBI query.
MR1126  #260 add password widget
MR1322  Add get_lldp_neighbors and get_capabilities for Qtech2500 profile
MR1264  Add clean to events command.
MR1307  Update Alcatel.OS62xx profile
MR1285  Hp.1910
MR1190  Update Rotek.RTBSv1 profile
MR1297  Add Rotek.RTBSv1.get_metrics script.
MR1296  add get_config script for Dlink.DVG profile
MR1291  Extend job command.
MR1276  Add clean_id_bson to alarm datastream.
MR1274  threadpool: Cleanup worker result just after setting future
MR1286  Add late_alarm metric to seflmon fm collector.
MR1249  Profile.cli_retries_super_password parameter
MR1250  perm: response layout
MR1229  ldap: Additional check of username format
MR1214  Add telemetry to MRT service.
MR1244  Add physical iface count metrics to selfmon.
MR1216  Add vv (very verbose parameter) to test command.

Bugfixes

MR      Title
MR1487  Use ch_escape function on syslogcollector.
MR1478  Fix Report Unknown Model Summary.
MR1477  Fix Generic.get_capabilities snmp_v1
MR1474  Fix load metric priority. Profile first, Generic second.
MR1473  Fix Radio and SLA graph template for CH use.
MR1481  Fix displaying platform in some Cisco Stackable switches
MR1479  Fix Rotek RTBSv1 Tx Power metric
MR1438  Fix Huawei.VRP.get_mac_address_table script
MR1422  Fix MikroTik.RouterOS.get_interface_status_ex script
MR1462  Fix heavy cpu load on show vlan command
MR1469  Fix Huawei.VRP.get_version SerialNumber rogue chart.
MR1467  Fix DLink.DxS profile
MR1463  Fix Extreme.XOS.get_interfaces script
MR1465  Fix PrefixBookmark import loop.
MR1464  Fix selfmon FM metric name.
MR1457  Fix getting single oid from multiple metrics.
MR1444  Fix Iskratel.MSAN profile
MR1450  Fix Orion.NOS.get_lldp_neighbors script
MR1433  Fix Cisco.IOSXR profile
MR1436  Fix Cisco.NXOS.get_arp script
MR1448  Fix c.id in card.base.f_object_location.
MR1445  login button width fixed
MR1459  Lambda fix metrics
MR1468  Huawei.VRP.get_version strip serial number.
MR1435  InfiNet fix init.py pattern_prompt
MR1426  inv.map fix performance
MR1443  Fix Object.get_coordinate_zoom method.
MR1428  Fix Huawei.MA5600T profile
MR1430  Fix Alstec.24xx metric name.
MR1289  Fix Juniper.JUNOS.get_lldp_neighbors Parameter 'remote_port' required.
MR1423  Fix managedobject and object card for delete Root.
MR1429  Fix avs Object.get_address_text method
MR1424  Fix getting container path in Alarm Web and Card.
MR1425  Fix typo in ManagedObject console UI.
MR1483  Fix Raisecom.ROS.get_lldp_neighbors script
MR1395  Fix container field type when remove Root.
MR1401  ip.ipam: Fix prefix style
MR1411  Fix Add Objects to Maintenance from SA !582
MR1386  fix error "Отсутствуют адреса линка" in dns.reportmissedp2p
MR1405  Fix Discovery Problem Detail report trace.
MR1394  Fix get_lldp_neighbors by SNMP
MR1407  Fix Plantet.WGSD Profile
MR1403  #976 Fix closing of already closed session
MR1406  Fix avs environments graph tmpl 148
MR1402  jsloader fixed
MR1399  Fix Ubiquiti profile and Generic.get_interfaces(get_bulk)
MR1389  Fix Report Discovery Poison
MR1378  Fix theme variable in desktop.html template.
MR1379  Fix etl managedobject resourcegroup
MR1367  Fix prompt in Rotek.RTBS.v1 profile.
MR1366  Fix workflow CH dictionary.
MR1365  Fix selfmon FM collector.
MR1364  Fix update operation for superuser on secret field.
MR1376  noc/noc#952 Fix metric path for Environment metric scope.
MR1310  #964 Fix SA sessions leaking
MR1357  Natex_fix_sn
MR1355  Cisco_fix_snmp
MR1370  Increase ManagedObject cache version for syslog archive field.
MR1356  Fix Interface name Eltex.MES
MR1354  Fix Interface name QSW2500
MR1335  Fix get_interfaces, add reth aenet
MR1343  Fix profilecheckdetail.
MR1342  Fix secret field.
MR1351  InfiNet-fix-get_version
MR1350  Fix get_interfaces for Telindus profile
MR1348  Fix stacked packets graph.
MR1360  Fix Interface name ROS
MR1326  Fix ch_state ch datasource.
MR1332  Fix Span Card view from ClickHouse data.
MR1331  Fix Huawei.MA5600T.get_cpe.
MR1328  Fix Cisco.IOS.get_lldp_neighbors regex
MR1327  Fix get_interfaces for Rotek.RTBSv1, add rule for platform RT-BS24
MR1325  Fix CLIPS engine in slots.
MR1320  Fix SNMP Trap OID Resolver
MR1323  Fix get_interfaces for QSW2500 (dowwn -> down)
MR1269  Fix Juniper.JUNOSe.get_interfaces script
MR1278  Fix Huawei.MA5600T.get_cpe ValueError.
MR1314  Fix Generic.get_chassis_id script
MR1306  Fix AlliedTelesis.AT8000S.get_interfaces script
MR1313  Fix Cisco.IOS.get_version for ME series
MR1262  Fix Raisecom.RCIOS password prompt matching
MR1238  Fix Juniper.JUNOS profile
MR1279  Fixes empty range list in discoveryid.
MR1305  Fix Rotek.RTBS profiles.
MR1304  Fix some attributes for Span in MRT serivce
MR1303  Fix selfmon escalator metrics.
MR1300  fm.eventclassificationrule: Fix creating from event
MR1295  Fix ./noc mib lookup
MR1298  Fix custom metrics path in Generic.get_metrics.
MR1290  Fix custom metrics.
MR1225  noc/noc#954 Fix Cisco.IOS.get_inventory script
MR1275  Fix InfiNet.WANFlexX.get_lldp_neighbors script
MR1281  Delete quit() in script
MR1280  Fit get_config
MR1277  Fix Zhone.Bitstorm.get_interfaces script
MR1254  Fix InfiNet.WANFlexX.get_interfaces script
MR1272  Fix vendor name in SAE script credentials.
MR1246  Fix Huawei.VRP pager
MR1268  Fix scheme migrations
MR1245  Fix Huawei.VRP3 prompt match
MR1259  fix_error_web
MR1258  Fix managed_object_platform migration.
MR1260  Fix pm.util.get_objects_metrics if object_profile metrics empty.
MR1253  Fix path in radius(services)
MR1203  Fix prompt pattern in Eltex.DSLAM profile
MR1247  Fix consul resolver index handling
MR1239  #911 consul: Fix faulty state caused by changes in consul timeout behavior
MR1237  #956 fix web scripts
MR1221  Fix Generic.get_lldp_neighbors script
MR1243  Fix now shift for selfmon task late.
MR1231  noc/noc#946 Fix ManagedObject web console.
MR1235  Fix futurize in SLA probe.
MR1234  Fix Huawei.MA5600T.get_cpe.
MR1220  Fix Generic.get_interfaces script
MR1204  Fix Raisecom.ROS.get_interfaces script
MR1215  Fix platform field in Platform Card.
MR1210  ManagedObject datastream: Fix links property. capabilities property
MR1212  Fix save empty metrics threshold in ManagedObjectProfile UI.
MR1211  Fix interface validation errors in Huawei.VRP, Siklu.EH, Zhone.Bitstorm.
MR1317  sa.managedobjectprofile: Fix text
MR1340  noc/noc#966
MR1294  selfmon typo in mo
MR1105  #856 Rack view fix
MR1208  #947 Fix MAC ranges optimization