NOC 23.1

17.04.2023

NOC 23.1

23.1 release contains 274 bugfixes, optimisations and improvements.

Highlights

Topo service

With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo service tracks all topology-related changes and maintains an internal graph.

Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:

  • Flat networks without the segmentation.
  • Networks with implicit segmentation.
  • Segmented networks without explicit segment hierarchy.

Moreover, it was impossible to build uplinks for top-level root segments.

The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object’s role in the network. The service tracks changes and analyses all possible paths to exit points. In-memory graph reduces the imposed database load during massive topology changes.

!!! trivia

* `topo` stands for `topology`.
* `un topo` means `a mouse` in italian.

Migrate FM Events to Click House.

Before the 23.1 release, NOC stored the FM events in MongoDB. The limitation of storage became the bottleneck to the system’s scalability.

The lack of collection partitioning in MongoDB didn’t allow us to clean the obsolete data without impact on system operations. The speed of deletion may be lower than the speed of insertion, rendering the implicit deletion or TTL indexes useless. The collection size grew fast. The only working solution was to drop the collection to reclaim the space.

We are working hard on the system performance tuning. The limited MongoDB’s write performance became a stopper.

With the 23.1 release, we have moved the event storage to the ClickHouse and obtained the following benefits:

  • The table partitioning allows maintaining of predictable storage usage by dropping obsolete partitions.
  • ClickHouse has good write scalability.
  • ClickHouse greatly overperforms MongoDB on write operations ever on single-server configurations.
  • It is possible to analyze the events using built-in NOC BI.
  • It is possible to use third-party tools like Tableau for data digging.

Managed Object Workflows

Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.

Configurable Metric Collection Intervals

NOC 23.1 allows configuring different collection intervals for metrics. We also have implemented the collection sharding, which allows multiplexing high-cardinality metrics over time. Metrics collection from boxes with a huge amount of subinterfaces, like PON OLT or BRAS, now is possible. It’s also possible to split metrics depending on the cost of collection on equipment. The “cheap” metrics may be collected frequently, while we can still collect “expensive” metrics more rarely.

Internal Kafka-compatible Message Streaming

NOC now supports Kafka-compatible API for internal message streaming. It’s possible to choose between:

  • Liftbridge, for simple installations.
  • Redpanda for high-profile Linux installations.
  • Kafka for other systems.

NOC supports the deployment and tuning of Redpanda out of the box, and we’re planning to deprecate Liftbridge usage in the next releases.

We also have moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.

Customized Network Maps

We have reworked the network maps, and now it is possible to create customized maps with the arbitrary set of managed objects. We also have implemented “map generators” on the backend, allowing the auto-generation of custom maps.

New TT Adapter API

We have reworked our TT adapter API. Among the benefits are:

  • Full typing support.
  • Parts of the escalation scenario have been moved into the base classes of the adapter, allowing implementation of the customized scenarios.

Migration

FM users must run data conversion scripts manually:

./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages

New features

MR Title
!6805 noc/noc#1942 Customize map backend by loader.
!6883 Add ImageStore for Network Map background files
!6908 noc/noc#1968 Add User Configured Map.
!6942 Move FM events to clickhouse
!6981 noc/noc#1970 Add min_group_size settings for AlarmGroup.
!6990 Add MessageStreamClient for stream work.
!7012 noc/noc#1906 Add RedPandaClient to msgstream.
!7016 noc/noc#2023 New list managed objects
!7024 noc/noc#2022 Add ReportEngine.
!7031 noc/noc#2024 Add interval to Metric Settings.
!7035 noc/noc#2021 Add CPE initial collection and discovery.
!7051 New TTSystem adapter API
!7053 noc/noc#2022 Add Report model.
!7082 noc/noc#2022 Add ReportForm.
!7108 #1865 Add Workflow to ManagedObject.
!7125 topo service

Improvements

MR Title
!6676 Add script labels
!6728 noc/noc#1928 Correlator add downlink objects for detect ring RCA.
!6749 Add ctl/memtrace endpoint for tracemalloc run.
!6780 Set close escalation delay to reopens alarm control time.
!6781 Update version to 22.2
!6792 Docker add worker, metrics. nginx web volume
!6798 Bump Django version to 3.2.16
!6803 Refactor lib/database_storage module
!6804 Check metrics service active when collected metrics.
!6813 Add bulk mode to set interfacestatus state.
!6820 Use bi_id field as sharding key for Metric Stream.
!6830 Reset ManagedObject diagnostic when disabled Box.
!6831 Check can_update_alarms settings when raise diagnostic alarm.
!6855 Catch ModuleNotFoundError exception when import Windows pyximport library.
!6857 Add apply alarm_class components to raise alarm on correlator.
!6858 Update language translation file.
!6868 Set SNMPTRAP/SYSLOG diagnostics set.
!6888 Fix flake8 ‘l’ error in web service
!6902 Add How-To use hk for collect custom attributes.
!6903 Add NOC shell used examples to doc.
!6904 Add endpoint bulk_ping to activator service
!6911 Add lib to .gitignore and delete lib/init.py
!6913 Update links in welcome screen
!6918 Set SNMP check status on Profile Check.
!6923 Add diagnostic labels.
!6934 Add ObjectDiagnostic Docs.
!6939 noc/noc#1593 Add MapFiled for store BI Events vars.
!6945 Fix ResourceGroup check on alarmescalation.
!6946 Use polars library for Datasource.
!6953 noc/noc#1939 Add service based dcs check params.
!6957 Add sync_diagnostic_labels settings to global config.
!6969 Improve SNMPError description.
!6977 Add ERR_CLI_PASSWORD_TIMEOUT to Authentication Failed.
!6979 Move Stream Config to separate msgstream module.
!6986 Add custom TopologyGenerator settings to UI.
!6996 Increase Map offset for isolated nodes.
!6997 noc/noc#2005 Add selected custom map lookup
!7007 Add InterfaceValidationPolicy check to ConfDB on_delete.
!7007 Add InterfaceValidationPolicy check to ConfDB on_delete.
!7025 Fix EventClass Rules test form
!7027 Fix on_super_password in cli
!7034 Add noc.js to change-ip script path
!7036 Fix network-scan-docs link
!7038 Check pager first on on_prompt script expect.
!7047 Additional AlarmClass to link retention ttl-policy.
!7048 Bump clickhouse version inside docker-compose
!7049 Add DiscoveryIDCachePoison datasource.
!7054 Add site-url for sitemap generation
!7055 Add fm-reboots datasource
!7058 Move change handler to ChangeTracker.
!7063 Add inv-linkdetail datasource
!7064 Update codeowners
!7065 Combine python linters to a single CI task
!7069 Add interval migration.
!7070 Add NoSAProfileError error.
!7071 Catch ResolutionError to RPCNoService.
!7077 Update HP.Comware profile
!7084 Add ttsystemstatds datasource
!7088 Update help command to show custom commands
!7089 Fix create threshold alarms on SLAProbe.
!7093 Bump FastAPI version.
!7094 noc/noc#2045 Bump mongoengine to 0.27 and pymongo to 4.3.3.
!7099 Add rules to MetricConfig on Metrics Service for improve performance.
!7104 Add meta section to metric stream message.
!7105 translation fix
!7106 Make ruff checks visible in joblogs
!7111 Bump pyproj to 3.4.1.
!7112 noc/noc#2046 Bump cachetools to 5.3.0
!7113 noc/noc#2049 Add upload MIB docs.
!7119 noc/noc#2050 Add L2Domain to RemoteSystem model.
!7120 noc/noc#1728 Check labels in match rule when rename and remove
!7123 noc/noc#2022 Migrate Datasource-based tabled report.
!7138 Speedup interface classification.
!7144 #817 Add LAGs interface labels.
!7146 #1539 Set pool_active param default to 1.
!7150 noc/noc#2061 Add error when status is 500 to ManagedObject list
!7152 noc/noc#2060 Add protected field to ManagedObject form
!7153 noc/noc#2063 Add labels to WF Editior State inspector
!7154 noc/noc#2062 Add state combo in filter
!7157 Set Generic.Host as default SA Profile.
!7158 Catch Kafkasender Service connect producer errors.
!7159 Send reboot to BI directly
!7161 Add migrations for allowed_models to Workflow.
!7163 #816 Add inheritance interface profile to aggregate members.
!7164 Remove b" from crashinfo list
!7166 Set icontains to UI State filter condition.
!7170 skip http-exception if status <400
!7172 Add ManagedObject topology DataStream.
!7187 Refactor Diagnostic API.
!7195 Move calculate uplink to TopoService.
!7198 Add labels to setstatus request.
!7200 Cached MetricDiscovery interval.

Bugfixes

MR Title
!6747 Fix time_delta when processed discovery metrics.
!6748 Disable suggests in local profile on migration.
!6752 Fix typo on Address.get_collision query.
!6759 Watch escalation when reopen alarm.
!6760 Fix typo on caps discovery logging.
!6763 noc/noc#1936 Fix l2_domain filter on VLAN UI.
!6765 Add send_message method to stub service.
!6770 noc/noc#1937 Fix sender destination send params.
!6775 Fix changelog reorder when compact.
!6777 Split SNMP/CLI credential action on diagnostic discovery.
!6778 Fix check alarm close error on deescalation process.
!6787 noc/noc#1940 Revert Prefix import to Address.
!6789 Fix reorder metrics states on compact procedures.
!6793 noc/noc#1943 Remove vcfilter from NetworkSegment Application.
!6795 Fix partition num on ServiceStub.
!6815 Fix kafkasender stream settings.
!6818 Fix Threshold Profile migration for unique name.
!6822 noc/noc#1785 removed item_frequencies method in fm.reporteventsummary
!6823 noc/noc#1954 Fix wait datastream ready on mx services.
!6827 noc/noc#1955 Add port param to CLI protocol checker.
!6834 Fix allocation order on vlan.
!6845 fix Eltex.LTP get_version
!6849 Fix etl changed labels when object labels is None.
!6854 noc/noc#1956 Fix ZeroDivisionError when prefix usage calc.
!6861 noc/noc#1956 Fix detect address usage with included special addresses.
!6866 Fix send mx message on classifier and uptime reboot.
!6869 noc/noc#1959 Add bulk param to model_set_state.
!6870 Fix typo on NBI objectmetrics.
!6873 noc/noc#1960 Fix error on service without router.
!6882 Fix migration to OS.Linux profile.
!6892 Fix rebuild route chains when delete MessageRoute.
!6900 Fix calculate down_objects metric on Ping Service.
!6909 Fix “no stream jobs” upon collection sync
!6912 Fix OS.Linux profile migration if profile exists.
!6922 noc/noc#1969 Add datastream param to detect changes.
!6926 Add is_delta to _conversions key, for save unit conversation.
!6928 Fix ‘referenced before assignment’ on escalation notify.
!6931 Catch error when transmute processing on Route.
!6943 Fix save in ManagedObject set_caps method.
!6949 noc/noc#1985 Cleanup change commit typo.
!6951 Fix iter datastream typo.
!6954 Fix datastream send message when deleted.
!6962 Fix migrate bi table if previous exists.
!6972 Fix error when change mongoengine DictField.
!6980 noc/noc#1984 Add counter flag to cdag probe for check shift counter type.
!6988 Fix OS.Linux migration for ProfileCheckRule model.
!6989 Fix typo.
!6998 Fix getting slot name on stream config.
!6999 noc/noc#2006 Fix migration threshold profile without function.
!7001 #1998 Bump gufo-ping 0.2.4
!7006 Fix typo portal id on segment map generator.
!7009 fix(peer): issue #2007, as-set format validation and position
!7013 Fix MAC discovery policy filter settings typo.
!7050 Cleanup bad documents on Object Status collection.
!7056 Convert Event Vars to string.
!7080 noc/noc#2039 Fix stucked UI when close tab
!7087 Fix iter_row method on DataSource.
!7090 Fix collection sync for EmbeddedDocumentListField.
!7092 noc/noc#2041 Sync cursor after flush state on MetricServce.
!7098 Fix aoikafka requirements.
!7102 noc/noc#2047 fix me.up() is undefined
!7102 noc/noc#2047 fix me.up() is undefined
!7114 Fix typo on MessageRoute UI Form.
!7121 Fix wipe user command.
!7122 Fix Events log.
!7124 noc/noc#2054 Fix rebuild datastream on DNS Model.
!7129 Fix DNSZone datastream when IP address used on masters.
!7142 Fix classifier Event Message format for send to ch.events.
!7148 noc/noc#2059 Catch getting error for MAC Collection button
!7149 Slice activator script result publish for large result size.
!7151 Fix msgstream client for migrations.
!7168 Rebuild managedobject datastream when changed discovery id.
!7173 #2065 Place interface IP Addresses to object VRP if device not supported VRF.
!7183 Use Generic.Host profile for unknown peering point SA profile.
!7189 Fix liftbridge client alter stream.
!7194 Fix getting external stream partition on Router.
!7196 Fix error when getting datastream format message headers.
!7197 Fix csvutil processed import.
!7199 noc/noc#2068 Disable clean when collection sync for instances without uuid.

Code Cleanup

MR Title
!6800 Refactor lib/highlight module
!6801 Refactor lib/template module
!6802 Remove lib/datasource module
!6829 Move lib/app directory into services/web/base
!6987 Cleanup print on config class.
!7052 Ruff linter
!7062 Simplify mib expressions
!7072 devcontainer.json: Move settings and extensions into customizations.vscode
!7073 ruff: Enable W - pycodestyle warnings
!7074 ruff: Enable flake8-builtin (A) diagnostics
!7075 Ruff: Enable pylint (PLC, PLE) checks
!7078 ruff: Fix PLW0120 else clause on loop without a break statement
!7134 Catch git safe.directory error when getting version.

Profile Changes

Alsitec.24xx

MR Title
!6810 Alstec.24xx.get_metrics. Fix metric units.

Cisco.IOS

MR Title
!7117 noc/noc#1920 Cisco.IOS. Cleanup output SNMP CDP neighbors.

Cisco.IOSXR

MR Title
!7059 Cisco.IOSXR get_inventory error asr9k
MR Title
!7103 DLink.DxS.get_interfaces: Fix CLI returns wrong oper_status

Dahua.DH

MR Title
!7147 Add Dahua.DH profile to collection
!7147 Add Dahua.DH profile to collection

Eltex.MES

MR Title
!6915 Eltex.MES. Add retry authentication to pattern_more.
!6965 fix interface description Eltex.MES.get_interfaces
!6974 Eltex.MES. Add MES-3316F and MES-3348F oid.
!7004 fix Stack Members in get_capabilities Eltex.MES
!7026 Eltex.MES. Add MES-2348P to detect oid version.
!7041 fix get_inventory Eltex.MES. Serial fix
!7041 fix get_inventory Eltex.MES. Serial fix
!7066 inv.platforms: Eltex MES-2324FB
!7068 mes2324fb
!7097 fix portchannel Eltex.MES

Eltex.MES24xx

MR Title
!6842 Fix Eltex.MES24xx.get_version script

Generic

MR Title
!6746 Use Attribute capability for get_inventory scripts.
!6896 Generic.get_capabilities. Filter non-printable character on sysDescr.
!6959 Generic.get_interface_status_ex. Ignore unknown interface on interfaces param.
!6959 Generic.get_interface_status_ex. Ignore unknown interface on interfaces param.
!6964 add chunk_size to Generic.get_interfaces
!7155 noc/noc#1983 Add return script execution metrics on Activator.script.
!7165 Fix units on collecting SLA metrics on profiles.

Hikvision.DSKV8

MR Title
!7137 Hikvision.DSKV8. Fix NTP Server parse on ConfDB normalizer.
!7137 Hikvision.DSKV8. Fix NTP Server parse on ConfDB normalizer.

Huawei.MA5600T

MR Title
!6783 noc/noc#1926 Huawei.MA5600T. Fix allow_empty_response for pattern_more send.
!7037 noc/noc#2020 Huawei.MA5600T.get_inventory. Fix detect board.
!7100 Fix CPE discovery
!7192 #2056 Huawei.MA5600T.get_inventory. Fix duplicate chassis as motherboard on MA5801-GP16.

Huawei.VRP

MR Title
!6799 Fixed detect port and power supply number for new Huawei CloudEngine switches
!6895 noc/noc#1964 Huawei.VRP.get_interfaces. Add allow_empty_response for ‘display vlan’ on cloud_engine_switch.

Juniper.JUNOS

MR Title
!6833 Juniper.JUNOS.get_metrics. Fix units on ‘Memory | Heap’ metrics
!6850 Juniper.JUNOS.get_metrics. Fix labels format on slot generator.
!7101 Juniper.JUNOS.get_metrics. Fix collect SLA metrics.

MikroTik.RouterOS

MR Title
!7128 noc/noc#1914 MikroTik.RouterOS. Fix config normalizer when router destination is ifname.
!7128 noc/noc#1914 MikroTik.RouterOS. Fix config normalizer when router destination is ifname.

NAG.SNR

MR Title
!7060 fixing NAG.SNR.get_inventory

Raisecom.ROS

MR Title
!6767 Fix Raisecom.ROS.get_version script

ZTE.ZXA10

MR Title
!7115 noc/noc#1658 ZTE.ZXA10.get_interfaces. Add SFUL, GFGM card type.

rare

MR Title
!6769 Fix 3Com.SuperStack3_4500.get_interfaces script
!6807 DCN.DCWL.get_metrics. Convert to flot.
!6825 DCN.DCWL.get_metrics. Fix check ‘channel-util’ key in metrics.
!6825 DCN.DCWL.get_metrics. Fix check ‘channel-util’ key in metrics.
!6884 Fix Qtech.QSW.get_version script
!6897 ECI.HiFOCuS. Fix setup_script profile method for None user.
!6961 H3C.VRP.get_interface_status. Fix matchers typo.
!6976 Cambium.ePMP. Add SNMP support.
!6995 Eltex.WOP. Add SNMP support.
!7019 add get_lldp_neighbors Qtech.QOS
!7030 DLink_Industrial_cli Fix (config) prompt and autoanswer
!7086 fix Zyxel.DSLAM
!7130 noc/noc#2037 BDCOM.xPON.get_interfaces. Add Giga-Combo-FX-SFP interface type.
!7132 Fix P1 interfaces on port1 Qtech.QOS
!7132 Fix P1 interfaces on port1 Qtech.QOS
!7143 #2037 BDCOM.xPON.get_interfaces. Fix parse tagged vlans.
!7180 Расхождение коллекции

Collections Changes

MR Title
!6837 inv.platforms: Huawei Technologies Co. S6730-H24X6C
!6838 inv.platforms: Huawei Technologies Co. S6330-H48X6C
!6839 inv.platforms: Huawei Technologies Co. S6330-H24X6C
!6885 Fix calculate MetricType for delta type.
!6914 Fix ComboPorts on ObjectModels.
!6936 ping: Switch to direct dispose protocol
!6993 noc/noc#1958 Add bulk mode for update object statuses on dispose message.
!7040 add profilecheckrules SKS-16E1-IP-ES-L
!7042 noc/noc#1729 Replace AlarmClass default severity by AlarmRule and labels.
!7079 noc/noc#2013 Add buckets to iter_collected_metrics for discovery.
!7085 add profilecheckrules zyxel.dslam VES-1624FT-55A
!7109 #2022 Add report config

Deploy Changes

MR Title
!6541 Add redpanda role deploy
!6736 add lib yedit
!6877 Ansible tower add metrics check
!7061 Split requirements.txt
!7076 Ruff: Enable pylint (PLR) checks