NOC 23.1

The 23.1 release contains 274 bugfixes, optimisations, and improvements.

Highlights

Topo service

With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo service tracks all topology-related changes and maintains an internal graph.

Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:

  • Flat networks without segmentation.
  • Networks with implicit segmentation.
  • Segmented networks without explicit segment hierarchy.

Moreover, it was impossible to build uplinks for top-level root segments.

The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object's role in the network. The service tracks changes and analyses all possible paths to exit points. The in-memory graph reduces the database load during massive topology changes.
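As a rough illustration of level-based uplink calculation, the sketch below treats the objects with the highest level as exit points and marks a neighbour as an uplink when it is one hop closer to an exit point. This is a simplified assumption for illustration, not the topo service's actual algorithm; the function name `find_uplinks` and the sample object names are hypothetical.

```python
from collections import deque


def find_uplinks(links, levels):
    """Return {node: sorted list of neighbours one hop closer to an exit point}."""
    # Build an undirected adjacency map from the link list.
    neighbours = {node: set() for node in levels}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Assumption: exit points are the objects with the highest level.
    top = max(levels.values())
    exits = [node for node, level in levels.items() if level == top]
    # Multi-source BFS: hop distance from every node to its nearest exit point.
    dist = {node: 0 for node in exits}
    queue = deque(exits)
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    # A neighbour counts as an uplink if it is strictly closer to an exit point.
    return {
        node: sorted(p for p in neighbours[node] if p in dist and dist[p] < dist[node])
        for node in dist
    }


# Hypothetical four-node network: one core, two aggregation, one access object.
levels = {"core": 3, "agg1": 2, "agg2": 2, "acc1": 1}
links = [
    ("core", "agg1"),
    ("core", "agg2"),
    ("agg1", "acc1"),
    ("agg2", "acc1"),
    ("agg1", "agg2"),
]
uplinks = find_uplinks(links, levels)
```

In this toy network the access object gets both aggregation objects as uplinks, and no segment hierarchy is needed to work that out.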

Trivia

  • topo stands for topology.
  • un topo means "a mouse" in Italian.

Migrate FM Events to ClickHouse

Before the 23.1 release, NOC stored the FM events in MongoDB. The limitations of this storage became a bottleneck for the system's scalability.

The lack of collection partitioning in MongoDB didn't allow us to clean up obsolete data without affecting system operations. Deletion can be slower than insertion, which renders both implicit deletion and TTL indexes useless. The collection grew quickly, and the only working way to reclaim the space was to drop the collection.

We are working hard on system performance tuning, and MongoDB's limited write performance became a blocker.

With the 23.1 release, we have moved the event storage to ClickHouse and obtained the following benefits:

  • Table partitioning keeps storage usage predictable, because obsolete partitions can simply be dropped.
  • ClickHouse has good write scalability.
  • ClickHouse greatly outperforms MongoDB on write operations, even on single-server configurations.
  • It is possible to analyze the events using built-in NOC BI.
  • It is possible to use third-party tools like Tableau for data analysis.
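To illustrate the partition-based cleanup, here is a minimal sketch that selects monthly partitions outside a retention window and renders the matching `ALTER TABLE ... DROP PARTITION` statements. It assumes partitions are identified as `YYYYMM`; the table name `events`, the retention period, and both helper functions are hypothetical and do not describe NOC's actual schema or tooling.

```python
from datetime import date


def obsolete_partitions(partitions, today, keep_months):
    """Return the YYYYMM partition ids older than the retention window."""
    # Index of the current month, counted in months since year 0.
    current = today.year * 12 + (today.month - 1)
    oldest_kept = current - keep_months + 1
    result = []
    for part in sorted(partitions):
        year, month = int(part[:4]), int(part[4:])
        if year * 12 + (month - 1) < oldest_kept:
            result.append(part)
    return result


def drop_statements(table, partitions):
    """Render one ALTER TABLE ... DROP PARTITION statement per partition."""
    return [f"ALTER TABLE {table} DROP PARTITION {part}" for part in partitions]


# Hypothetical partition list; keep the current month and the two before it.
existing = ["202209", "202210", "202211", "202212", "202301", "202302"]
old = obsolete_partitions(existing, today=date(2023, 2, 15), keep_months=3)
statements = drop_statements("events", old)
```

Dropping a partition removes a whole month of data in one metadata operation, so cleanup cost does not depend on the number of stored events.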

Managed Object Workflows

Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.

Configurable Metric Collection Intervals

NOC 23.1 allows configuring different collection intervals for metrics. We have also implemented collection sharding, which spreads the collection of high-cardinality metrics over time. Metrics collection from boxes with a huge number of subinterfaces, like PON OLT or BRAS, is now possible. It's also possible to split metrics depending on the cost of collection on the equipment. The "cheap" metrics may be collected frequently, while the "expensive" metrics are collected less often.
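The idea of collection sharding can be sketched as follows: each subinterface is assigned to a stable shard, and every collection run polls only one shard, so the full set is covered over several runs. This is an illustrative assumption about how such sharding might work, not NOC's implementation; `shard_of`, `items_for_run`, and the interface names are hypothetical.

```python
import zlib


def shard_of(key, n_shards):
    """Map a key to a stable shard number (crc32 is deterministic across runs)."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


def items_for_run(keys, run, n_shards):
    """Return the keys to poll on the given run; each run handles one shard."""
    current = run % n_shards
    return [k for k in keys if shard_of(k, n_shards) == current]


# Hypothetical OLT with 1000 subinterfaces, polled in 4 shards.
subinterfaces = [f"gpon0/1/{i}" for i in range(1000)]
n_shards = 4
runs = [items_for_run(subinterfaces, run, n_shards) for run in range(n_shards)]
```

After `n_shards` consecutive runs every subinterface has been polled exactly once, while each individual run touches only part of the box.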

Internal Kafka-compatible Message Streaming

NOC now supports a Kafka-compatible API for internal message streaming. It's possible to choose between:

  • Liftbridge for simple installations.
  • Redpanda for high-profile Linux installations.
  • Kafka for other systems.

NOC supports the deployment and tuning of Redpanda out of the box, and we're planning to deprecate Liftbridge usage in the next releases.

We have also moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.

Customized Network Maps

We have reworked the network maps, and it is now possible to create customized maps with an arbitrary set of managed objects. We have also implemented "map generators" on the backend, allowing the auto-generation of custom maps.

New TT Adapter API

We have reworked our TT adapter API. Among the benefits are:

  • Full typing support.
  • Parts of the escalation scenario have been moved into the base classes of the adapter, allowing implementation of the customized scenarios.

Migration

FM users must run data conversion scripts manually:

./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages

New features

MR      Title
MR6805  noc/noc#1942 Customize map backend by loader.
MR6883  Add ImageStore for Network Map background files
MR6908  noc/noc#1968 Add User Configured Map.
MR6942  Move FM events to clickhouse
MR6981  noc/noc#1970 Add min_group_size settings for AlarmGroup.
MR6990  Add MessageStreamClient for stream work.
MR7012  noc/noc#1906 Add RedPandaClient to msgstream.
MR7016  noc/noc#2023 New list managed objects
MR7024  noc/noc#2022 Add ReportEngine.
MR7031  noc/noc#2024 Add interval to Metric Settings.
MR7035  noc/noc#2021 Add CPE initial collection and discovery.
MR7051  New TTSystem adapter API
MR7053  noc/noc#2022 Add Report model.
MR7082  noc/noc#2022 Add ReportForm.
MR7108  #1865 Add Workflow to ManagedObject.
MR7125  topo service

Improvements

MR      Title
MR6676  Add script labels
MR6728  noc/noc#1928 Correlator add downlink objects for detect ring RCA.
MR6749  Add ctl/memtrace endpoint for tracemalloc run.
MR6780  Set close escalation delay to reopens alarm control time.
MR6781  Update version to 22.2
MR6792  Docker add worker, metrics. nginx web volume
MR6798  Bump Django version to 3.2.16
MR6803  Refactor lib/database_storage module
MR6804  Check metrics service active when collected metrics.
MR6813  Add bulk mode to set interfacestatus state.
MR6820  Use bi_id field as sharding key for Metric Stream.
MR6830  Reset ManagedObject diagnostic when disabled Box.
MR6831  Check can_update_alarms settings when raise diagnostic alarm.
MR6855  Catch ModuleNotFoundError exception when import Windows pyximport library.
MR6857  Add apply alarm_class components to raise alarm on correlator.
MR6858  Update language translation file.
MR6868  Set SNMPTRAP/SYSLOG diagnostics set.
MR6888  Fix flake8 'l' error in web service
MR6902  Add How-To use hk for collect custom attributes.
MR6903  Add NOC shell used examples to doc.
MR6904  Add endpoint bulk_ping to activator service
MR6911  Add lib to .gitignore and delete lib/init.py
MR6913  Update links in welcome screen
MR6918  Set SNMP check status on Profile Check.
MR6923  Add diagnostic labels.
MR6934  Add ObjectDiagnostic Docs.
MR6939  noc/noc#1593 Add MapFiled for store BI Events vars.
MR6945  Fix ResourceGroup check on alarmescalation.
MR6946  Use polars library for Datasource.
MR6953  noc/noc#1939 Add service based dcs check params.
MR6957  Add sync_diagnostic_labels settings to global config.
MR6969  Improve SNMPError description.
MR6977  Add ERR_CLI_PASSWORD_TIMEOUT to Authentication Failed.
MR6979  Move Stream Config to separate msgstream module.
MR6986  Add custom TopologyGenerator settings to UI.
MR6996  Increase Map offset for isolated nodes.
MR6997  noc/noc#2005 Add selected custom map lookup
MR7007  Add InterfaceValidationPolicy check to ConfDB on_delete.
MR7025  Fix EventClass Rules test form
MR7027  Fix on_super_password in cli
MR7034  Add noc.js to change-ip script path
MR7036  Fix network-scan-docs link
MR7038  Check pager first on on_prompt script expect.
MR7047  Additional AlarmClass to link retention ttl-policy.
MR7048  Bump clickhouse version inside docker-compose
MR7049  Add DiscoveryIDCachePoison datasource.
MR7054  Add site-url for sitemap generation
MR7055  Add fm-reboots datasource
MR7058  Move change handler to ChangeTracker.
MR7063  Add inv-linkdetail datasource
MR7064  Update codeowners
MR7065  Combine python linters to a single CI task
MR7069  Add interval migration.
MR7070  Add NoSAProfileError error.
MR7071  Catch ResolutionError to RPCNoService.
MR7077  Update HP.Comware profile
MR7084  Add ttsystemstatds datasource
MR7088  Update help command to show custom commands
MR7089  Fix create threshold alarms on SLAProbe.
MR7093  Bump FastAPI version.
MR7094  noc/noc#2045 Bump mongoengine to 0.27 and pymongo to 4.3.3.
MR7099  Add rules to MetricConfig on Metrics Service for improve performance.
MR7104  Add meta section to metric stream message.
MR7105  translation fix
MR7106  Make ruff checks visible in joblogs
MR7111  Bump pyproj to 3.4.1.
MR7112  noc/noc#2046 Bump cachetools to 5.3.0
MR7113  noc/noc#2049 Add upload MIB docs.
MR7119  noc/noc#2050 Add L2Domain to RemoteSystem model.
MR7120  noc/noc#1728 Check labels in match rule when rename and remove
MR7123  noc/noc#2022 Migrate Datasource-based tabled report.
MR7138  Speedup interface classification.
MR7144  #817 Add LAGs interface labels.
MR7146  #1539 Set pool_active param default to 1.
MR7150  noc/noc#2061 Add error when status is 500 to ManagedObject list
MR7152  noc/noc#2060 Add protected field to ManagedObject form
MR7153  noc/noc#2063 Add labels to WF Editior State inspector
MR7154  noc/noc#2062 Add state combo in filter
MR7157  Set Generic.Host as default SA Profile.
MR7158  Catch Kafkasender Service connect producer errors.
MR7159  Send reboot to BI directly
MR7161  Add migrations for allowed_models to Workflow.
MR7163  #816 Add inheritance interface profile to aggregate members.
MR7164  Remove b" from crashinfo list
MR7166  Set icontains to UI State filter condition.
MR7170  skip http-exception if status <400
MR7172  Add ManagedObject topology DataStream.
MR7187  Refactor Diagnostic API.
MR7195  Move calculate uplink to TopoService.
MR7198  Add labels to setstatus request.
MR7200  Cached MetricDiscovery interval.

Bugfixes

MR      Title
MR6747  Fix time_delta when processed discovery metrics.
MR6748  Disable suggests in local profile on migration.
MR6752  Fix typo on Address.get_collision query.
MR6759  Watch escalation when reopen alarm.
MR6760  Fix typo on caps discovery logging.
MR6763  noc/noc#1936 Fix l2_domain filter on VLAN UI.
MR6765  Add send_message method to stub service.
MR6770  noc/noc#1937 Fix sender destination send params.
MR6775  Fix changelog reorder when compact.
MR6777  Split SNMP/CLI credential action on diagnostic discovery.
MR6778  Fix check alarm close error on deescalation process.
MR6787  noc/noc#1940 Revert Prefix import to Address.
MR6789  Fix reorder metrics states on compact procedures.
MR6793  noc/noc#1943 Remove vcfilter from NetworkSegment Application.
MR6795  Fix partition num on ServiceStub.
MR6815  Fix kafkasender stream settings.
MR6818  Fix Threshold Profile migration for unique name.
MR6822  noc/noc#1785 removed item_frequencies method in fm.reporteventsummary
MR6823  noc/noc#1954 Fix wait datastream ready on mx services.
MR6827  noc/noc#1955 Add port param to CLI protocol checker.
MR6834  Fix allocation order on vlan.
MR6845  fix Eltex.LTP get_version
MR6849  Fix etl changed labels when object labels is None.
MR6854  noc/noc#1956 Fix ZeroDivisionError when prefix usage calc.
MR6861  noc/noc#1956 Fix detect address usage with included special addresses.
MR6866  Fix send mx message on classifier and uptime reboot.
MR6869  noc/noc#1959 Add bulk param to model_set_state.
MR6870  Fix typo on NBI objectmetrics.
MR6873  noc/noc#1960 Fix error on service without router.
MR6882  Fix migration to OS.Linux profile.
MR6892  Fix rebuild route chains when delete MessageRoute.
MR6900  Fix calculate down_objects metric on Ping Service.
MR6909  Fix "no stream jobs" upon collection sync
MR6912  Fix OS.Linux profile migration if profile exists.
MR6922  noc/noc#1969 Add datastream param to detect changes.
MR6926  Add is_delta to _conversions key, for save unit conversation.
MR6928  Fix 'referenced before assignment' on escalation notify.
MR6931  Catch error when transmute processing on Route.
MR6943  Fix save in ManagedObject set_caps method.
MR6949  noc/noc#1985 Cleanup change commit typo.
MR6951  Fix iter datastream typo.
MR6954  Fix datastream send message when deleted.
MR6962  Fix migrate bi table if previous exists.
MR6972  Fix error when change mongoengine DictField.
MR6980  noc/noc#1984 Add counter flag to cdag probe for check shift counter type.
MR6988  Fix OS.Linux migration for ProfileCheckRule model.
MR6989  Fix typo.
MR6998  Fix getting slot name on stream config.
MR6999  noc/noc#2006 Fix migration threshold profile without function.
MR7001  #1998 Bump gufo-ping 0.2.4
MR7006  Fix typo portal id on segment map generator.
MR7009  fix(peer): issue #2007, as-set format validation and position
MR7013  Fix MAC discovery policy filter settings typo.
MR7050  Cleanup bad documents on Object Status collection.
MR7056  Convert Event Vars to string.
MR7080  noc/noc#2039 Fix stucked UI when close tab
MR7087  Fix iter_row method on DataSource.
MR7090  Fix collection sync for EmbeddedDocumentListField.
MR7092  noc/noc#2041 Sync cursor after flush state on MetricServce.
MR7098  Fix aoikafka requirements.
MR7102  noc/noc#2047 fix me.up() is undefined
MR7114  Fix typo on MessageRoute UI Form.
MR7121  Fix wipe user command.
MR7122  Fix Events log.
MR7124  noc/noc#2054 Fix rebuild datastream on DNS Model.
MR7129  Fix DNSZone datastream when IP address used on masters.
MR7142  Fix classifier Event Message format for send to ch.events.
MR7148  noc/noc#2059 Catch getting error for MAC Collection button
MR7149  Slice activator script result publish for large result size.
MR7151  Fix msgstream client for migrations.
MR7168  Rebuild managedobject datastream when changed discovery id.
MR7173  #2065 Place interface IP Addresses to object VRP if device not supported VRF.
MR7183  Use Generic.Host profile for unknown peering point SA profile.
MR7189  Fix liftbridge client alter stream.
MR7194  Fix getting external stream partition on Router.
MR7196  Fix error when getting datastream format message headers.
MR7197  Fix csvutil processed import.
MR7199  noc/noc#2068 Disable clean when collection sync for instances without uuid.

Code Cleanup

MR      Title
MR6800  Refactor lib/highlight module
MR6801  Refactor lib/template module
MR6802  Remove lib/datasource module
MR6829  Move lib/app directory into services/web/base
MR6987  Cleanup print on config class.
MR7052  Ruff linter
MR7062  Simplify mib expressions
MR7072  devcontainer.json: Move settings and extensions into customizations.vscode
MR7073  ruff: Enable W - pycodestyle warnings
MR7074  ruff: Enable flake8-builtin (A) diagnostics
MR7075  Ruff: Enable pylint (PLC, PLE) checks
MR7078  ruff: Fix PLW0120 else clause on loop without a break statement
MR7134  Catch git safe.directory error when getting version.

Profile Changes

Alstec.24xx

MR      Title
MR6810  Alstec.24xx.get_metrics. Fix metric units.

Cisco.IOS

MR      Title
MR7117  noc/noc#1920 Cisco.IOS. Cleanup output SNMP CDP neighbors.

Cisco.IOSXR

MR      Title
MR7059  Cisco.IOSXR get_inventory error asr9k

DLink.DxS

MR      Title
MR7103  DLink.DxS.get_interfaces: Fix CLI returns wrong oper_status

Dahua.DH

MR      Title
MR7147  Add Dahua.DH profile to collection

Eltex.MES

MR      Title
MR6915  Eltex.MES. Add retry authentication to pattern_more.
MR6965  fix interface description Eltex.MES.get_interfaces
MR6974  Eltex.MES. Add MES-3316F and MES-3348F oid.
MR7004  fix Stack Members in get_capabilities Eltex.MES
MR7026  Eltex.MES. Add MES-2348P to detect oid version.
MR7041  fix get_inventory Eltex.MES. Serial fix
MR7066  inv.platforms: Eltex MES-2324FB
MR7068  mes2324fb
MR7097  fix portchannel Eltex.MES

Eltex.MES24xx

MR      Title
MR6842  Fix Eltex.MES24xx.get_version script

Generic

MR      Title
MR6746  Use Attribute capability for get_inventory scripts.
MR6896  Generic.get_capabilities. Filter non-printable character on sysDescr.
MR6959  Generic.get_interface_status_ex. Ignore unknown interface on interfaces param.
MR6964  add chunk_size to Generic.get_interfaces
MR7155  noc/noc#1983 Add return script execution metrics on Activator.script.
MR7165  Fix units on collecting SLA metrics on profiles.

Hikvision.DSKV8

MR      Title
MR7137  Hikvision.DSKV8. Fix NTP Server parse on ConfDB normalizer.

Huawei.MA5600T

MR      Title
MR6783  noc/noc#1926 Huawei.MA5600T. Fix allow_empty_response for pattern_more send.
MR7037  noc/noc#2020 Huawei.MA5600T.get_inventory. Fix detect board.
MR7100  Fix CPE discovery
MR7192  #2056 Huawei.MA5600T.get_inventory. Fix duplicate chassis as motherboard on MA5801-GP16.

Huawei.VRP

MR      Title
MR6799  Fixed detect port and power supply number for new Huawei CloudEngine switches
MR6895  noc/noc#1964 Huawei.VRP.get_interfaces. Add allow_empty_response for 'display vlan' on cloud_engine_switch.

Juniper.JUNOS

MR      Title
MR6833  Juniper.JUNOS.get_metrics. Fix units on 'Memory | Heap' metrics
MR6850  Juniper.JUNOS.get_metrics. Fix labels format on slot generator.
MR7101  Juniper.JUNOS.get_metrics. Fix collect SLA metrics.

MikroTik.RouterOS

MR      Title
MR7128  noc/noc#1914 MikroTik.RouterOS. Fix config normalizer when router destination is ifname.

NAG.SNR

MR      Title
MR7060  fixing NAG.SNR.get_inventory

Raisecom.ROS

MR      Title
MR6767  Fix Raisecom.ROS.get_version script

ZTE.ZXA10

MR      Title
MR7115  noc/noc#1658 ZTE.ZXA10.get_interfaces. Add SFUL, GFGM card type.

rare

MR      Title
MR6769  Fix 3Com.SuperStack3_4500.get_interfaces script
MR6807  DCN.DCWL.get_metrics. Convert to flot.
MR6825  DCN.DCWL.get_metrics. Fix check 'channel-util' key in metrics.
MR6884  Fix Qtech.QSW.get_version script
MR6897  ECI.HiFOCuS. Fix setup_script profile method for None user.
MR6961  H3C.VRP.get_interface_status. Fix matchers typo.
MR6976  Cambium.ePMP. Add SNMP support.
MR6995  Eltex.WOP. Add SNMP support.
MR7019  add get_lldp_neighbors Qtech.QOS
MR7030  DLink_Industrial_cli Fix (config) prompt and autoanswer
MR7086  fix Zyxel.DSLAM
MR7130  noc/noc#2037 BDCOM.xPON.get_interfaces. Add Giga-Combo-FX-SFP interface type.
MR7132  Fix P1 interfaces on port1 Qtech.QOS
MR7143  #2037 BDCOM.xPON.get_interfaces. Fix parse tagged vlans.
MR7180  Collection divergence

Collections Changes

MR      Title
MR6837  inv.platforms: Huawei Technologies Co. S6730-H24X6C
MR6838  inv.platforms: Huawei Technologies Co. S6330-H48X6C
MR6839  inv.platforms: Huawei Technologies Co. S6330-H24X6C
MR6885  Fix calculate MetricType for delta type.
MR6914  Fix ComboPorts on ObjectModels.
MR6936  ping: Switch to direct dispose protocol
MR6993  noc/noc#1958 Add bulk mode for update object statuses on dispose message.
MR7040  add profilecheckrules SKS-16E1-IP-ES-L
MR7042  noc/noc#1729 Replace AlarmClass default severity by AlarmRule and labels.
MR7079  noc/noc#2013 Add buckets to iter_collected_metrics for discovery.
MR7085  add profilecheckrules zyxel.dslam VES-1624FT-55A
MR7109  #2022 Add report config

Deploy Changes

MR      Title
MR6541  Add redpanda role deploy
MR6736  add lib yedit
MR6877  Ansible tower add metrics check
MR7061  Split requirements.txt
MR7076  Ruff: Enable pylint (PLR) checks