NOC 19.1

18.01.2019

In accordance with our Release Policy, we are proud to present release 19.1.

Release 19.1 contains 272 bugfixes, optimisations and improvements.

Highlights

Usability

NOC Theme

19.1 introduces a genuine NOC theme intended to replace the venerable ExtJS gray. The new flat theme is based on the Triton theme and uses NOC-branded colors. The NOC theme can be activated via config on a per-installation basis. We expect to make it the default several releases later.

Collection Sharing

Collections are a vital part of NOC. We gratefully appreciate any contributions. To make the contribution process easier, we have added a Share button right in the JSON preview. Enable collection sharing in the config and create collection Merge Requests directly from the NOC interface with a single click.

New fm.alarm

The alarm console was thoroughly reworked. Current filter settings are stored in the URL and can be shared with other users. Additional filters on services and subscribers were also added.

New runcommands

The Run Commands interface was simplified. The left panel is now hidden and the working area was enlarged. The list of objects can be modified directly from the commands panel. A configurable command logging option was added to the mrt service.

Alarm acknowledgement

Alarms can be acknowledged by a user to show that the alarm has been seen and is now under investigation.

Integration

We continue to move towards better integration with external systems. Our first priority is to clean up and document the API used by external systems to communicate with NOC.

NBI

A new NBI service has been introduced. The nbi service is the host for the Northbound Interface API, allowing upper-level systems to access NOC’s data.

The objectmetrics API for requesting metrics has been introduced.

DataStream

The DataStream service got a lot of improvements.

API Key ACL

API Keys got an additional ACL, allowing source addresses to be restricted for particular keys.
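As an illustration of how a source-address ACL of this kind can be evaluated, here is a minimal sketch using Python's standard ipaddress module. The function name, the list-of-prefixes ACL format and the "empty ACL means no restriction" rule are assumptions made for the example, not NOC's actual implementation.

```python
import ipaddress


def is_allowed(source, acl):
    """Return True if `source` falls into any prefix listed in `acl`.

    Assumption for this sketch: an empty ACL means "no restriction".
    """
    if not acl:
        return True
    addr = ipaddress.ip_address(source)
    # Membership is False when address and network versions differ
    return any(addr in ipaddress.ip_network(prefix) for prefix in acl)


# A hypothetical key restricted to one management subnet
print(is_allowed("10.0.0.5", ["10.0.0.0/24"]))     # inside the prefix
print(is_allowed("192.168.1.1", ["10.0.0.0/24"]))  # outside the prefix
```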

Threshold Profiles

Threshold processing became more flexible. Instead of four fixed levels (low error, low warning, high warning and high error), an arbitrary number of levels can be configured via Threshold Profiles. Arbitrary actions can be set for each threshold violation, including:

  • raising an alarm
  • sending a notification
  • calling handlers

Threshold closing condition can differ from opening one, allowing hysteresis to suppress unnecessary flapping.
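The effect of separate opening and closing conditions can be sketched as follows. This is a generic hysteresis illustration, not NOC's Threshold Profile code; the class name, its attributes and the numeric levels are hypothetical.

```python
class HysteresisThreshold:
    """Opens when value >= open_at, closes only when value < close_at."""

    def __init__(self, open_at, close_at):
        self.open_at = open_at
        self.close_at = close_at
        self.active = False

    def update(self, value):
        """Return "open", "close" or None for each new measurement."""
        if not self.active and value >= self.open_at:
            self.active = True
            return "open"
        if self.active and value < self.close_at:
            self.active = False
            return "close"
        return None


# Values oscillating around 90 do not flap: the threshold stays open
# until the metric drops below 80.
t = HysteresisThreshold(open_at=90, close_at=80)
events = [t.update(v) for v in [85, 91, 89, 92, 85, 79]]
print(events)
```

With a single level of 90 for both opening and closing, the same series would open and close twice.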

Syslog archiving

Starting from 19.1, NOC can be used as a long-term syslog archive solution. ManagedObjectProfile got an additional Syslog Archive Policy setting. When enabled, the syslogcollector service mirrors all received syslog messages to the long-term analytic ClickHouse database. ClickHouse supports replication, enforces transparent compression and has very modest IOPS requirements, making it ideal for high-load storage.

Collected messages can be queried both through the BI interface and with direct SQL queries.

STP Topology metrics

STP topology change metrics are supported out of the box. Device dashboards can show topology changes on graphs, and further analytics can be applied. In combination with BI analytics, network operators get a valuable tool for investigating short-term traffic disruption problems in large networks.

New platform detection policy

Behavior on new platform detection became configurable. The previous behavior was to create the platform automatically, which can lead to headaches in particular cases. Now the following options can be configured in the Managed Object Profile:

  • Create - preserve previous behavior and create new platform automatically (default)
  • Alarm - raise umbrella alarm and stop discovery

Firmware Policy

Behavior on firmware policy violation also became configurable. ManagedObjectProfile allows the following options to be configured:

  • Ignore - do nothing (default)
  • Ignore&Stop - Stop discovery
  • Raise Alarm - Raise umbrella alarm
  • Raise&Stop - Raise umbrella alarm and stop discovery

New Profiles

19.1 contains support for TV optical-to-RF converters widely used in cable TV networks. Two profiles have been introduced:

  • IRE-Polus.Taros
  • Vector.Lambda

In addition, an NSN.TIMOS profile became available.

Performance, Scalability and optimisations

Caps Profile

caps discovery used to collect all known capabilities for a platform. Sometimes this is not the desired behavior, so Caps Profiles have been introduced. Caps Profiles allow checking of a particular group of capabilities to be enabled or disabled. A group of capabilities can be explicitly enabled, disabled, or enabled only if required for the configured topology discovery.

High-precision timers

19.1 contains a time.perf_counter backport to Python 2.7. perf_counter uses CPU counters to measure time intervals. It is about 2x faster than time.time and allows finer granularity in time interval measurements (time.time changes only ~64 times per second). This greatly increases the precision of span interval measurements and of ping’s RTT metrics.
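For readers unfamiliar with perf_counter, the following sketch shows the usual way to time an interval with it. It uses the standard library time.perf_counter from Python 3 rather than NOC's Python 2.7 backport, and the timed_span helper is a hypothetical name chosen for the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_span(result):
    """Store the elapsed time of the wrapped block in result["duration"]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # perf_counter is monotonic, so the difference is never negative
        result["duration"] = time.perf_counter() - start


span = {}
with timed_span(span):
    sum(range(100000))
print("span took %.6f s" % span["duration"])
```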

Pymongo connection pool tuning

Our investigations showed that the current pymongo connection pool implementation has a design flaw that leads to a pool connection poisoning problem under NOC’s common workload: once opened, a mongo connection from discovery is never closed, leaving lots of connections open after load spikes. We have implemented our own connection pool and submitted a pull request to the pymongo project (see LIFO connection pool policy).
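To see why the checkout order matters, consider the toy model below. It is neither pymongo's pool nor NOC's replacement, and all names and numbers are made up for the illustration. It assumes idle connections are closed after max_idle ticks without use and compares taking the most recently returned connection (LIFO) with taking the least recently returned one (FIFO).

```python
from collections import deque


class ToyPool:
    def __init__(self, lifo, max_idle):
        self.lifo = lifo
        self.max_idle = max_idle
        self.idle = deque()  # (conn_id, last_used); newest on the right
        self.next_id = 0

    def checkout(self):
        if not self.idle:
            self.next_id += 1  # open a new connection
            return self.next_id
        conn, _ = self.idle.pop() if self.lifo else self.idle.popleft()
        return conn

    def checkin(self, conn, now):
        self.idle.append((conn, now))

    def reap(self, now):
        """Close connections that stayed idle longer than max_idle ticks."""
        self.idle = deque(
            (c, t) for c, t in self.idle if now - t <= self.max_idle
        )


def simulate(lifo, spike=10, ticks=100, max_idle=20):
    pool = ToyPool(lifo, max_idle)
    # Load spike: `spike` connections are in use at once, then returned
    conns = [pool.checkout() for _ in range(spike)]
    for c in conns:
        pool.checkin(c, now=0)
    # Steady load afterwards: one short request per tick
    for now in range(1, ticks + 1):
        pool.checkin(pool.checkout(), now)
        pool.reap(now)
    return len(pool.idle)


print("LIFO connections left:", simulate(lifo=True))
print("FIFO connections left:", simulate(lifo=False))
```

In this model FIFO keeps touching every connection in turn, so none of them ever looks idle and the pool stays at its spike size, while LIFO reuses one connection and lets the rest age out.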

ClickHouse table cleanup policy

The ClickHouse table retention policy may be configured on a per-table basis. Partition dropping is automated and may be called manually or from cron.
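A retention policy of this kind boils down to selecting the partitions that are older than the configured period and dropping them. The sketch below assumes monthly partitions named YYYYMM and a retention period expressed in months; the table name, the function names and the partitioning scheme are assumptions for the example and may differ from what NOC actually uses.

```python
from datetime import date


def expired_partitions(partitions, today, keep_months):
    """Return YYYYMM partitions older than `keep_months` full months."""
    current = today.year * 12 + today.month - 1
    cutoff = current - keep_months
    expired = []
    for p in partitions:
        index = int(p[:4]) * 12 + int(p[4:6]) - 1
        if index < cutoff:
            expired.append(p)
    return sorted(expired)


def drop_statements(table, partitions):
    """Build ClickHouse ALTER TABLE ... DROP PARTITION statements."""
    return ["ALTER TABLE %s DROP PARTITION %s" % (table, p) for p in partitions]


existing = ["201808", "201809", "201810", "201811", "201812", "201901"]
old = expired_partitions(existing, today=date(2019, 1, 18), keep_months=3)
for sql in drop_statements("syslog", old):  # "syslog" is a hypothetical table
    print(sql)
```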

Redis cache backend

Our investigations showed that memcached is prone to randomly forgetting keys even while enough memory is available. This leads to random loss of discovery job state, which resets the state of measured SNMP counters, loses random metrics and leaves empty gaps in Grafana dashboards. The problem is hard to diagnose and the only cure is to restart the memcached process. The problem lies deep in memcached’s internal architecture and is unlikely to be fixed.

So we have introduced support for a redis cache backend. We will decide whether to make it the default cache backend after a testing period.

SO_REUSEPORT & SO_FREEBIND for collectors

The syslogcollector and trapcollector services support the SO_REUSEPORT and SO_FREEBIND options for listeners.

SO_REUSEPORT allows several collector processes to share a single port using in-kernel load balancing, greatly improving the collectors’ throughput.

SO_FREEBIND allows binding to a non-existent address, opening support for floating virtual addresses for collectors (VRRP, CARP, etc.) and adding the necessary level of redundancy.
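At the socket level, the two options look roughly like this. The sketch is a generic Linux example rather than the collectors' actual code, and the helper name is hypothetical. The Linux socket option behind free binding is IP_FREEBIND; since the Python socket module may not export it, the Linux value 15 is used as a fallback, which is an assumption that holds only on Linux.

```python
import socket

# Linux-only option; 15 is its value in <linux/in.h>, used when the
# socket module does not provide the constant.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def make_udp_listener(address, port, reuseport=True, freebind=False):
    """Create a UDP listener with optional SO_REUSEPORT / IP_FREEBIND."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if reuseport:
        # Several processes may bind the same address and port;
        # the kernel distributes incoming datagrams between them.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    if freebind:
        # Allow binding to an address not (yet) configured on the host,
        # e.g. a floating VRRP address.
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
    sock.bind((address, port))
    return sock


# Two listeners sharing one port, as two collector processes would
first = make_udp_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = make_udp_listener("127.0.0.1", port)
print("both listeners are bound to port", port)
first.close()
second.close()
```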

In combination with the new Syslog Archive and ClickHouse table cleanup policy features, NOC can be turned into a high-performance syslog archiving solution.

GridVCS

GridVCS is NOC’s high-performance redundant version control system used to store device configuration history. The 19.1 release introduces several improvements to the GridVCS subsystem.

  • Built-in compression - though Mongo’s WiredTiger uses transparent compression at the storage level, explicit compression at the GridVCS level reduces both disk usage and database server traffic.

  • Previous releases used mercurial’s mdiff to calculate config deltas. 19.1 uses the BSDIFF4 format by default. During our tests BSDIFF4 showed better results in both speed and delta size.

  • The ./noc gridvcs command got an additional compress subcommand, which applies both compression and BSDIFF4 deltas to already collected data. While it can take a while on large storages, it can free up significant disk space.
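The gain from compressing on the application side can be illustrated with the standard zlib module, which the feature list names as the compression used. The sketch covers only the compression half (BSDIFF4 deltas need a third-party library), and the pack/unpack helpers and the sample config are hypothetical.

```python
import zlib


def pack(config_text):
    """Compress a config revision before it is sent to the database."""
    return zlib.compress(config_text.encode("utf-8"))


def unpack(blob):
    """Restore the original config text from a stored blob."""
    return zlib.decompress(blob).decode("utf-8")


# Device configs are highly repetitive, so they compress well
config = "interface Gi0/1\n description uplink\n!\n" * 200
blob = pack(config)
print(len(config.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

Because the data is compressed before it leaves the application, the saving applies to traffic towards the database server as well as to disk usage.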

API improvements

profile.py

SA profiles used to live in the __init__.py file. Our code style advises keeping __init__.py empty for various reasons. Some features, like profile loading from custom, will not work with __init__.py anyway.

So starting with 19.1 it is recommended to place the profile’s code into the profile.py file. Loading from __init__.py is still supported, but it is a good time to plan the migration of custom profiles.

OIDRule: High-order scale functions

Metric scales can be defined as high-order functions, i.e. functions returning other functions. This greatly increases the flexibility of the scaling subsystem and allows external configuration of scaling processing.
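The idea of a high-order scale function can be sketched in a few lines. The function names and the registry below are hypothetical and do not reproduce NOC's OIDRule API; they only show how parameters supplied by configuration produce a ready-to-use scaling function.

```python
def scale(factor):
    """Return a function that multiplies a raw value by `factor`."""
    def inner(value):
        return value * factor
    return inner


def percent_of(total):
    """Return a function that converts a raw value to percent of `total`."""
    def inner(value):
        return value * 100.0 / total
    return inner


# Parameters come from configuration; the result is an ordinary callable
octets_to_bits = scale(8)
memory_usage = percent_of(2048)

print(octets_to_bits(100))
print(memory_usage(512))
```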

IPAM seen propagation

Workflow’s seen signal can be configured to propagate up to the parent prefixes. Address and Prefix profiles got a new Seen propagation policy setting, which determines whether the parent prefix is notified when a child element is seen by discovery.

A common usage pattern is to propagate seen to aggregate prefixes to get notified when an aggregate comes into use.
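One possible reading of this policy is sketched below: each element decides whether its parent is notified, so propagation climbs the prefix tree until it reaches an element whose policy stops it. The data layout, the function name and the exact semantics are assumptions made for the illustration and are not taken from NOC's code.

```python
import ipaddress


def propagate_seen(address, policies, address_propagates=True):
    """Return the prefixes notified when `address` is seen.

    `policies` maps a prefix to True if that prefix propagates the
    seen signal further up to its own parent.
    """
    addr = ipaddress.ip_address(address)
    nets = {ipaddress.ip_network(p): flag for p, flag in policies.items()}
    # Containing prefixes, most specific first
    chain = sorted(
        (n for n in nets if addr in n), key=lambda n: n.prefixlen, reverse=True
    )
    notified = []
    propagate = address_propagates
    for net in chain:
        if not propagate:
            break
        notified.append(str(net))
        propagate = nets[net]
    return notified


policies = {"10.0.0.0/8": True, "10.1.0.0/16": True, "10.1.2.0/24": True}
# The aggregate 10.0.0.0/8 learns that something inside it is in use
print(propagate_seen("10.1.2.3", policies))
```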

Phone workflow

The phone module got full-blown workflow support. Each phone number and phone range has its own state, which can be changed manually or via external signals.

Breaking Changes

Migration

New features

MR Title
!1515 Add estimate param to job command.
!1525 Collection sharing
!1498 DataStream: asset part of ManagedObject
!1516 APIKey ACL
!1518 Add export/import to ./noc beef command.
!1514 Configurable behavior on new platforms and firmware policy violations
!1512 new fm-alarm
!1508 IRE-Polus.Taros profile
!1507 Summary glyph display order
!1501 Add Errors Out and Discards In for ddash
!595 Add periodic diagnostic to alarmdiagnostic.
!1460 ThresholdProfile: Flexible thresholds configuration
!1497 Alarm acknowledge/unacknowledge
!1491 network stp topology changes on graph
!1476 GridVCS: bsdiff4 patches and zlib compression
!1432 Add initial support for NSN.TIMOS profile
!1475 High-precision timers
!1458 Add `Network
!1455 CapsProfile
!1396 redis cache backend
!1404 #794: IPAM seen propagation policy
!1384 card: project card
!1390 #942: Remove Root container
!1352 #694 ClickHouse table cleaning policy
!1363 Vector.Lambda profile
!1283 NOC theme
!1336 OIDRule: High-order scale functions
!1338 #539 Syslog archiving
!1255 nbi service
!1345 #497 syslogcollector/trapcollector: SO_REUSEPORT and IP_FREEBIND support
!1252 datastream: Alarm datastream
!1226 #636 Phone Workflow integration
!1113 Profiles should be moved to profile.py

Improvements

MR Title
!1534 Set default loglevel on command to info.
!1535 Update RU translation.
!1527 FM Alarms localization
!1529 Add full_name to PlatformApplication query fields.
!1522 Update/report interface status3
!1510 Update DLink.DxS profile
!1556 Update Rotek.BT profile (get_version)
!1539 Update settings by snmp requests for Dlink.DxS
!1500 Update Juniper.JUNOS profile
!1503 Speedup NetworkSegment Service Summary count.
!1502 Update Report for Interfaces Status
!1490 Generic.get_chassis_id disable Multicast MAC address check.
!1494 SKS.SKS and BDCOM.IOS config volatile.
!1488 Add platform to Linksys.SPS2xx profile.
!1451 Unified loader interface
!1485 Add caps profile to managedobject profile ETL loader.
!1484 Add to Linksys.SPS24xx platform OID
!1434 ./noc dnszone import: Parse complex $TTL directives
!1452 Move methods from SegmentTopology to BaseTopology
!1449 inv.networksegment: Bulk fields calculation
!1454 Add to_python method to ClickHouse model.
!1466 Add to Huawei.VRP profile get Serial Number attributes.
!1453 ResourceGroup: TreeCombo
!1461 Add config_volatile to Orion.NOS and SKS.SKS
!1447 Increase query interval for core.pm.utils function.
!1417 Extendable Generic.get_chassis_id script
!1441 Add patern more to Huawei.MA5600T profile.
!1440 Optimize reportalarmdetail and reportobjectdetail.
!1439 Update/eltex mes execute snmp
!1437 Delete aggregateinterface bi model
!1420 Add dynamically loader BI models.
!1418 RepoPreview MVVC
!1427 Migrate Alstec.24xx.get_metrics to new model.
!1414 networkx 2.2 and improved spring layout implementation
!1413 dns.dnsserver: Remove sync field
!1400 requests 2.20.0
!1392 Diverged permissions
!1382 #961 Process All addresses and Loopback address syslog/trap source types
!1408 Add Generic.get_vlans and get_switchport scripts.
!1409 Add get_lldp_snmp capabilities for Cisco.IOS
!1410 Change Iface Name OID for get_ifindexes Plante.WCDG profile
!1374 migrate inv map to leafletjs
!1381 #971 trapcollector: Gentler handling of BER decoding errors
!1371 dnszone: Ignore addresses with missed FQDNs
!1369 Add theme variable to login page render.
!1368 Add “Up/10M” to reportcolumndatasource for report object detail.
!1391 CODEOWNERS file
!1353 #788 Try to determine VRF’s for DHCP address discovery
!1361 DataStream: Load from custom
!1251 Customized PyMongo connection pool
!1397 Juniper.junos
!1398 auto logout remove msg
!1385 Dead code cleanup
!1284 runcommands refactoring
!1375 Cleanup pyrule from classifier trigger.
!1341 theme body padding for form
!1362 Add convert ifname for MA4000
!1349 Cleanup AlliedTelesis profiles.
!1346 snmp: Try to negotiate broken error_index
!1344 Add Interface packets dashboard in MO dash.
!1318 Migrate ReportProfileCheck report to ReportStat Backend.
!1228 Move numpy import to parse_table_header in lib/text.
!1316 Additional LLDP constants and caps conversion functions
!1324 Add TZ parameter to NBI query.
!1126 #260 add password widget
!1322 Add get_lldp_neighbors and get_capabilities for Qtech2500 profile
!1264 Add clean to events command.
!1307 Update Alcatel.OS62xx profile
!1285 Hp.1910
!1190 Update Rotek.RTBSv1 profile
!1297 Add Rotek.RTBSv1.get_metrics script.
!1296 add get_config script for Dlink.DVG profile
!1291 Extend job command.
!1276 Add clean_id_bson to alarm datastream.
!1274 threadpool: Cleanup worker result just after setting future
!1286 Add late_alarm metric to selfmon fm collector.
!1249 Profile.cli_retries_super_password parameter
!1250 perm: response layout
!1229 ldap: Additional check of username format
!1214 Add telemetry to MRT service.
!1244 Add physical iface count metrics to selfmon.
!1216 Add vv (very verbose parameter) to test command.

Bugfixes

MR Title
!1487 Use ch_escape function on syslogcollector.
!1478 Fix Report Unknown Model Summary.
!1477 Fix Generic.get_capabilities snmp_v1
!1474 Fix load metric priority. Profile first, Generic second.
!1473 Fix Radio and SLA graph template for CH use.
!1481 Fix displaying platform in some Cisco Stackable switches
!1479 Fix Rotek RTBSv1 Tx Power metric
!1438 Fix Huawei.VRP.get_mac_address_table script
!1422 Fix MikroTik.RouterOS.get_interface_status_ex script
!1462 Fix heavy cpu load on show vlan command
!1469 Fix Huawei.VRP.get_version SerialNumber rogue chart.
!1467 Fix DLink.DxS profile
!1463 Fix Extreme.XOS.get_interfaces script
!1465 Fix PrefixBookmark import loop.
!1464 Fix selfmon FM metric name.
!1457 Fix getting single oid from multiple metrics.
!1444 Fix Iskratel.MSAN profile
!1450 Fix Orion.NOS.get_lldp_neighbors script
!1433 Fix Cisco.IOSXR profile
!1436 Fix Cisco.NXOS.get_arp script
!1448 Fix c.id in card.base.f_object_location.
!1445 login button width fixed
!1459 Lambda fix metrics
!1468 Huawei.VRP.get_version strip serial number.
!1435 InfiNet fix init.py pattern_prompt
!1426 inv.map fix performance
!1443 Fix Object.get_coordinate_zoom method.
!1428 Fix Huawei.MA5600T profile
!1430 Fix Alstec.24xx metric name.
!1289 Fix Juniper.JUNOS.get_lldp_neighbors Parameter ‘remote_port’ required.
!1423 Fix managedobject and object card for delete Root.
!1429 Fix avs Object.get_address_text method
!1424 Fix getting container path in Alarm Web and Card.
!1425 Fix typo in ManagedObject console UI.
!1483 Fix Raisecom.ROS.get_lldp_neighbors script
!1395 Fix container field type when remove Root.
!1401 ip.ipam: Fix prefix style
!1411 Fix Add Objects to Maintenance from SA !582
!1386 fix error “Отсутствуют адреса линка” (“Link addresses are missing”) in dns.reportmissedp2p
!1405 Fix Discovery Problem Detail report trace.
!1394 Fix get_lldp_neighbors by SNMP
!1407 Fix Plantet.WGSD Profile
!1403 #976 Fix closing of already closed session
!1406 Fix avs environments graph tmpl 148
!1402 jsloader fixed
!1399 Fix Ubiquiti profile and Generic.get_interfaces(get_bulk)
!1389 Fix Report Discovery Poison
!1378 Fix theme variable in desktop.html template.
!1379 Fix etl managedobject resourcegroup
!1367 Fix prompt in Rotek.RTBS.v1 profile.
!1366 Fix workflow CH dictionary.
!1365 Fix selfmon FM collector.
!1364 Fix update operation for superuser on secret field.
!1376 noc/noc#952 Fix metric path for Environment metric scope.
!1310 #964 Fix SA sessions leaking
!1357 Natex_fix_sn
!1355 Cisco_fix_snmp
!1370 Increase ManagedObject cache version for syslog archive field.
!1356 Fix Interface name Eltex.MES
!1354 Fix Interface name QSW2500
!1335 Fix get_interfaces, add reth aenet
!1343 Fix profilecheckdetail.
!1342 Fix secret field.
!1351 InfiNet-fix-get_version
!1350 Fix get_interfaces for Telindus profile
!1348 Fix stacked packets graph.
!1360 Fix Interface name ROS
!1326 Fix ch_state ch datasource.
!1332 Fix Span Card view from ClickHouse data.
!1331 Fix Huawei.MA5600T.get_cpe.
!1328 Fix Cisco.IOS.get_lldp_neighbors regex
!1327 Fix get_interfaces for Rotek.RTBSv1, add rule for platform RT-BS24
!1325 Fix CLIPS engine in slots.
!1320 Fix SNMP Trap OID Resolver
!1323 Fix get_interfaces for QSW2500 (dowwn -> down)
!1269 Fix Juniper.JUNOSe.get_interfaces script
!1278 Fix Huawei.MA5600T.get_cpe ValueError.
!1314 Fix Generic.get_chassis_id script
!1306 Fix AlliedTelesis.AT8000S.get_interfaces script
!1313 Fix Cisco.IOS.get_version for ME series
!1262 Fix Raisecom.RCIOS password prompt matching
!1238 Fix Juniper.JUNOS profile
!1279 Fixes empty range list in discoveryid.
!1305 Fix Rotek.RTBS profiles.
!1304 Fix some attributes for Span in MRT service
!1303 Fix selfmon escalator metrics.
!1300 fm.eventclassificationrule: Fix creating from event
!1295 Fix ./noc mib lookup
!1298 Fix custom metrics path in Generic.get_metrics.
!1290 Fix custom metrics.
!1225 noc/noc#954 Fix Cisco.IOS.get_inventory script
!1275 Fix InfiNet.WANFlexX.get_lldp_neighbors script
!1281 Delete quit() in script
!1280 Fit get_config
!1277 Fix Zhone.Bitstorm.get_interfaces script
!1254 Fix InfiNet.WANFlexX.get_interfaces script
!1272 Fix vendor name in SAE script credentials.
!1246 Fix Huawei.VRP pager
!1268 Fix scheme migrations
!1245 Fix Huawei.VRP3 prompt match
!1259 fix_error_web
!1258 Fix managed_object_platform migration.
!1260 Fix pm.util.get_objects_metrics if object_profile metrics empty.
!1253 Fix path in radius(services)
!1203 Fix prompt pattern in Eltex.DSLAM profile
!1247 Fix consul resolver index handling
!1239 #911 consul: Fix faulty state caused by changes in consul timeout behavior
!1237 #956 fix web scripts
!1221 Fix Generic.get_lldp_neighbors script
!1243 Fix now shift for selfmon task late.
!1231 noc/noc#946 Fix ManagedObject web console.
!1235 Fix futurize in SLA probe.
!1234 Fix Huawei.MA5600T.get_cpe.
!1220 Fix Generic.get_interfaces script
!1204 Fix Raisecom.ROS.get_interfaces script
!1215 Fix platform field in Platform Card.
!1210 ManagedObject datastream: Fix links property. capabilities property
!1212 Fix save empty metrics threshold in ManagedObjectProfile UI.
!1211 Fix interface validation errors in Huawei.VRP, Siklu.EH, Zhone.Bitstorm.
!1317 sa.managedobjectprofile: Fix text
!1340 noc/noc#966
!1294 selfmon typo in mo
!1105 #856 Rack view fix
!1208 #947 Fix MAC ranges optimization