NOC 23.1
The 23.1 release contains 274 bugfixes, optimisations and improvements.
Highlights
Topo service
With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo service tracks all topology-related changes and maintains an internal graph.
Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:
Flat networks without segmentation.
Networks with implicit segmentation.
Segmented networks without explicit segment hierarchy.
Moreover, it was impossible to build uplinks for top-level root segments.
The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object's role in the network. The service tracks changes and analyses all possible paths to exit points.
The in-memory graph reduces the database load imposed by massive topology changes.
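The level-based idea can be illustrated with a minimal sketch. It is not the topo service's actual algorithm: the function name, the input shapes, and the rule that the exit points are the objects with the highest level are all assumptions made for illustration. The sketch runs a multi-source breadth-first search from the exit points and treats a neighbour as an uplink when it is one hop closer to an exit point.

```python
from collections import deque


def calculate_uplinks(links, levels):
    """Return {node: sorted list of uplink neighbours}.

    links  -- iterable of (node_a, node_b) pairs (hypothetical input shape)
    levels -- {node: level}, where a higher level means closer to the core
    """
    # Build an undirected adjacency map from the link list.
    adjacency = {node: set() for node in levels}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Assumption: exit points are the objects with the highest level.
    top = max(levels.values())
    exits = [node for node, level in levels.items() if level == top]
    # Multi-source BFS: hop distance from each node to its nearest exit point.
    dist = {node: 0 for node in exits}
    queue = deque(exits)
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    # A neighbour is an uplink when it is closer to an exit point.
    uplinks = {}
    for node in levels:
        if node not in dist:
            uplinks[node] = []  # no path to any exit point
            continue
        uplinks[node] = sorted(
            n for n in adjacency[node] if n in dist and dist[n] < dist[node]
        )
    return uplinks


links = [("core", "agg1"), ("core", "agg2"), ("agg1", "acc"), ("agg2", "acc")]
levels = {"core": 3, "agg1": 2, "agg2": 2, "acc": 1}
print(calculate_uplinks(links, levels))
```

The sketch needs no segment information, so it behaves the same on a flat network as on a segmented one.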
Trivia
topo stands for topology. Un topo means a mouse in Italian.
Migrate FM Events to ClickHouse
Before the 23.1 release, NOC stored FM events in MongoDB. Storage limitations became a bottleneck for the system's scalability.
The lack of collection partitioning in MongoDB didn't allow us to clean up obsolete data without affecting system operations. The speed of deletion may be lower than the speed of insertion, rendering implicit deletion or TTL indexes useless. The collection size grew fast. The only working solution was to drop the collection to reclaim the space.
We are working hard on tuning system performance, and MongoDB's limited write performance became a blocker.
With the 23.1 release, we have moved the event storage to ClickHouse and obtained the following benefits:
Table partitioning keeps storage usage predictable by dropping obsolete partitions.
ClickHouse has good write scalability.
ClickHouse greatly outperforms MongoDB on write operations, even in single-server configurations.
It is possible to analyze the events using built-in NOC BI.
It is possible to use third-party tools like Tableau for data digging.
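To illustrate the partition-based cleanup, here is a minimal sketch that selects obsolete partitions and builds the matching `ALTER TABLE ... DROP PARTITION ID` statements. The table name `events`, the monthly `YYYYMM` partition identifiers, and both helper functions are assumptions made for the example; the notes above do not specify the real schema or retention policy.

```python
from datetime import date


def obsolete_partitions(partitions, today, keep_months):
    """Return partition ids (YYYYMM strings) outside the retention window.

    The current month and the `keep_months` months before it are kept.
    """
    # Index of the oldest month to keep, counted in months.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    result = []
    for partition in sorted(partitions):
        month_index = int(partition[:4]) * 12 + (int(partition[4:6]) - 1)
        if month_index < cutoff:
            result.append(partition)
    return result


def drop_statements(table, partitions):
    """Build one DROP PARTITION statement per obsolete partition."""
    return [f"ALTER TABLE {table} DROP PARTITION ID '{p}'" for p in partitions]


existing = ["202210", "202211", "202212", "202301", "202302", "202303"]
stale = obsolete_partitions(existing, date(2023, 3, 15), keep_months=2)
for statement in drop_statements("events", stale):  # "events" is hypothetical
    print(statement)
```

Dropping a partition removes a whole block of data at once, so the cost of cleanup does not depend on the number of rows, unlike row-by-row deletion.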
Managed Object Workflows
Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.
Configurable Metric Collection Intervals
NOC 23.1 allows configuring different collection intervals for metrics. We have also implemented collection sharding, which spreads the collection of high-cardinality metrics over time. This makes it possible to collect metrics from boxes with a huge number of subinterfaces, such as PON OLT or BRAS.
It's also possible to split metrics by the cost of collecting them from the equipment. "Cheap" metrics may be collected frequently, while "expensive" metrics are collected less often.
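The sharding idea can be sketched as follows. This is an illustration only, not NOC's implementation: the function names, the use of CRC32 as a stable hash, and the notion of a fixed polling period are assumptions. Each interface is assigned to one of several shards, and every polling run collects a single shard, so the whole set is covered once per collection interval.

```python
from zlib import crc32


def shard_count(interval, period):
    """Number of shards when metrics with `interval` are polled every `period` seconds."""
    return max(1, interval // period)


def shard_of(key, shards):
    """Stable shard number, so a key always lands in the same shard."""
    return crc32(key.encode()) % shards


def due_now(keys, run_number, shards):
    """Keys to collect on the given polling run."""
    slot = run_number % shards
    return [key for key in keys if shard_of(key, shards) == slot]


# Hypothetical box with 1000 subinterfaces, 300 s interval, polled every 60 s.
interfaces = [f"gpon-onu_1/1/{i}" for i in range(1000)]
shards = shard_count(interval=300, period=60)
for run in range(shards):
    print(run, len(due_now(interfaces, run, shards)))
```

With this scheme, an "expensive" metric can simply get a longer interval, which yields more shards and fewer requests per run.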
Internal Kafka-compatible Message Streaming
NOC now supports a Kafka-compatible API for internal message streaming. It's possible to choose between:
Liftbridge, for simple installations.
Redpanda, for high-profile Linux installations.
Kafka, for other systems.
NOC supports the deployment and tuning of Redpanda out of the box, and we're planning to deprecate Liftbridge usage in upcoming releases.
We have also moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.
Customized Network Maps
We have reworked the network maps, and it is now possible to create customized maps with an arbitrary set of managed objects. We have also implemented "map generators" on the backend, allowing the auto-generation of custom maps.
New TT Adapter API
We have reworked our TT adapter API. Among the benefits are:
Full typing support.
Parts of the escalation scenario have been moved into the base classes of the adapter, allowing the implementation of customized scenarios.
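The general pattern behind these two points can be shown with a minimal sketch: a typed base class owns the shared escalation flow, and a concrete adapter overrides only the system-specific steps. All class, method, and field names below are hypothetical and are not taken from the actual NOC TT adapter API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TicketRequest:
    """Hypothetical typed payload passed to the adapter."""

    subject: str
    body: str
    queue: str


class BaseAdapter:
    """Hypothetical base class that owns the common escalation flow."""

    def escalate(self, req: TicketRequest, ticket_id: Optional[str] = None) -> str:
        # Shared scenario: comment on an existing ticket, otherwise create one.
        if ticket_id is not None:
            self.add_comment(ticket_id, req.body)
            return ticket_id
        return self.create_ticket(req)

    def create_ticket(self, req: TicketRequest) -> str:
        raise NotImplementedError

    def add_comment(self, ticket_id: str, text: str) -> None:
        raise NotImplementedError


class InMemoryAdapter(BaseAdapter):
    """Toy adapter that keeps tickets in a dict instead of a real TT system."""

    def __init__(self) -> None:
        self.tickets: Dict[str, List[str]] = {}

    def create_ticket(self, req: TicketRequest) -> str:
        ticket_id = f"TT-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = [req.body]
        return ticket_id

    def add_comment(self, ticket_id: str, text: str) -> None:
        self.tickets[ticket_id].append(text)


adapter = InMemoryAdapter()
tt = adapter.escalate(TicketRequest("Link down", "first report", "noc"))
adapter.escalate(TicketRequest("Link down", "still down", "noc"), ticket_id=tt)
print(tt, adapter.tickets[tt])
```

A customized scenario would override `escalate` itself while reusing the same typed primitives.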
Migration
FM users must run data conversion scripts manually:
./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages
New features
MR
Title
MR6805
noc/noc#1942 Customize map backend by loader.
MR6883
Add ImageStore for Network Map background files
MR6908
noc/noc#1968 Add User Configured Map.
MR6942
Move FM events to clickhouse
MR6981
noc/noc#1970 Add min_group_size settings for AlarmGroup.
MR6990
Add MessageStreamClient for stream work.
MR7012
noc/noc#1906 Add RedPandaClient to msgstream.
MR7016
noc/noc#2023 New list managed objects
MR7024
noc/noc#2022 Add ReportEngine.
MR7031
noc/noc#2024 Add interval to Metric Settings.
MR7035
noc/noc#2021 Add CPE initial collection and discovery.
MR7051
New TTSystem adapter API
MR7053
noc/noc#2022 Add Report model.
MR7082
noc/noc#2022 Add ReportForm.
MR7108
#1865 Add Workflow to ManagedObject.
MR7125
topo service
Improvements
MR
Title
MR6676
Add script labels
MR6728
noc/noc#1928 Correlator add downlink objects for detect ring RCA.
MR6749
Add ctl/memtrace endpoint for tracemalloc run.
MR6780
Set close escalation delay to reopens alarm control time.
MR6781
Update version to 22.2
MR6792
Docker add worker, metrics. nginx web volume
MR6798
Bump Django version to 3.2.16
MR6803
Refactor lib/database_storage module
MR6804
Check metrics service active when collected metrics.
MR6813
Add bulk mode to set interfacestatus state.
MR6820
Use bi_id field as sharding key for Metric Stream.
MR6830
Reset ManagedObject diagnostic when disabled Box.
MR6831
Check can_update_alarms settings when raise diagnostic alarm.
MR6855
Catch ModuleNotFoundError exception when import Windows pyximport library.
MR6857
Add apply alarm_class components to raise alarm on correlator.
MR6858
Update language translation file.
MR6868
Set SNMPTRAP/SYSLOG diagnostics set.
MR6888
Fix flake8 'l' error in web service
MR6902
Add How-To use hk for collect custom attributes.
MR6903
Add NOC shell used examples to doc.
MR6904
Add endpoint bulk_ping to activator service
MR6911
Add lib to .gitignore and delete lib/__init__.py
MR6913
Update links in welcome screen
MR6918
Set SNMP check status on Profile Check.
MR6923
Add diagnostic labels.
MR6934
Add ObjectDiagnostic Docs.
MR6939
noc/noc#1593 Add MapField for store BI Events vars.
MR6945
Fix ResourceGroup check on alarmescalation.
MR6946
Use polars library for Datasource.
MR6953
noc/noc#1939 Add service based dcs check params.
MR6957
Add sync_diagnostic_labels settings to global config.
MR6969
Improve SNMPError description.
MR6977
Add ERR_CLI_PASSWORD_TIMEOUT to Authentication Failed.
MR6979
Move Stream Config to separate msgstream module.
MR6986
Add custom TopologyGenerator settings to UI.
MR6996
Increase Map offset for isolated nodes.
MR6997
noc/noc#2005 Add selected custom map lookup
MR7007
Add InterfaceValidationPolicy check to ConfDB on_delete.
MR7025
Fix EventClass Rules test form
MR7027
Fix on_super_password in cli
MR7034
Add noc.js to change-ip script path
MR7036
Fix network-scan-docs link
MR7038
Check pager first on on_prompt script expect.
MR7047
Additional AlarmClass to link retention ttl-policy.
MR7048
Bump clickhouse version inside docker-compose
MR7049
Add DiscoveryIDCachePoison datasource.
MR7054
Add site-url for sitemap generation
MR7055
Add fm-reboots datasource
MR7058
Move change handler to ChangeTracker.
MR7063
Add inv-linkdetail datasource
MR7064
Update codeowners
MR7065
Combine python linters to a single CI task
MR7069
Add interval migration.
MR7070
Add NoSAProfileError error.
MR7071
Catch ResolutionError to RPCNoService.
MR7077
Update HP.Comware profile
MR7084
Add ttsystemstatds datasource
MR7088
Update help command to show custom commands
MR7089
Fix create threshold alarms on SLAProbe.
MR7093
Bump FastAPI version.
MR7094
noc/noc#2045 Bump mongoengine to 0.27 and pymongo to 4.3.3.
MR7099
Add rules to MetricConfig on Metrics Service for improve performance.
MR7104
Add meta section to metric stream message.
MR7105
translation fix
MR7106
Make ruff checks visible in joblogs
MR7111
Bump pyproj to 3.4.1.
MR7112
noc/noc#2046 Bump cachetools to 5.3.0
MR7113
noc/noc#2049 Add upload MIB docs.
MR7119
noc/noc#2050 Add L2Domain to RemoteSystem model.
MR7120
noc/noc#1728 Check labels in match rule when rename and remove
MR7123
noc/noc#2022 Migrate Datasource-based tabled report.
MR7138
Speedup interface classification.
MR7144
#817 Add LAGs interface labels.
MR7146
#1539 Set pool_active param default to 1.
MR7150
noc/noc#2061 Add error when status is 500 to ManagedObject list
MR7152
noc/noc#2060 Add protected field to ManagedObject form
MR7153
noc/noc#2063 Add labels to WF Editor State inspector
MR7154
noc/noc#2062 Add state combo in filter
MR7157
Set Generic.Host as default SA Profile.
MR7158
Catch Kafkasender Service connect producer errors.
MR7159
Send reboot to BI directly
MR7161
Add migrations for allowed_models to Workflow.
MR7163
#816 Add inheritance interface profile to aggregate members.
MR7164
Remove b" from crashinfo list
MR7166
Set icontains to UI State filter condition.
MR7170
skip http-exception if status <400
MR7172
Add ManagedObject topology DataStream.
MR7187
Refactor Diagnostic API.
MR7195
Move calculate uplink to TopoService.
MR7198
Add labels to setstatus request.
MR7200
Cached MetricDiscovery interval.
Bugfixes
MR
Title
MR6747
Fix time_delta when processed discovery metrics.
MR6748
Disable suggests in local profile on migration.
MR6752
Fix typo on Address.get_collision query.
MR6759
Watch escalation when reopen alarm.
MR6760
Fix typo on caps discovery logging.
MR6763
noc/noc#1936 Fix l2_domain filter on VLAN UI.
MR6765
Add send_message method to stub service.
MR6770
noc/noc#1937 Fix sender destination send params.
MR6775
Fix changelog reorder when compact.
MR6777
Split SNMP/CLI credential action on diagnostic discovery.
MR6778
Fix check alarm close error on deescalation process.
MR6787
noc/noc#1940 Revert Prefix import to Address.
MR6789
Fix reorder metrics states on compact procedures.
MR6793
noc/noc#1943 Remove vcfilter from NetworkSegment Application.
MR6795
Fix partition num on ServiceStub.
MR6815
Fix kafkasender stream settings.
MR6818
Fix Threshold Profile migration for unique name.
MR6822
noc/noc#1785 removed item_frequencies method in fm.reporteventsummary
MR6823
noc/noc#1954 Fix wait datastream ready on mx services.
MR6827
noc/noc#1955 Add port param to CLI protocol checker.
MR6834
Fix allocation order on vlan.
MR6845
fix Eltex.LTP get_version
MR6849
Fix etl changed labels when object labels is None.
MR6854
noc/noc#1956 Fix ZeroDivisionError when prefix usage calc.
MR6861
noc/noc#1956 Fix detect address usage with included special addresses.
MR6866
Fix send mx message on classifier and uptime reboot.
MR6869
noc/noc#1959 Add bulk param to model_set_state.
MR6870
Fix typo on NBI objectmetrics.
MR6873
noc/noc#1960 Fix error on service without router.
MR6882
Fix migration to OS.Linux profile.
MR6892
Fix rebuild route chains when delete MessageRoute.
MR6900
Fix calculate down_objects metric on Ping Service.
MR6909
Fix "no stream jobs" upon collection sync
MR6912
Fix OS.Linux profile migration if profile exists.
MR6922
noc/noc#1969 Add datastream param to detect changes.
MR6926
Add is_delta to _conversions key, for save unit conversation.
MR6928
Fix 'referenced before assignment' on escalation notify.
MR6931
Catch error when transmute processing on Route.
MR6943
Fix save in ManagedObject set_caps method.
MR6949
noc/noc#1985 Cleanup change commit typo.
MR6951
Fix iter datastream typo.
MR6954
Fix datastream send message when deleted.
MR6962
Fix migrate bi table if previous exists.
MR6972
Fix error when change mongoengine DictField.
MR6980
noc/noc#1984 Add counter flag to cdag probe for check shift counter type.
MR6988
Fix OS.Linux migration for ProfileCheckRule model.
MR6989
Fix typo.
MR6998
Fix getting slot name on stream config.
MR6999
noc/noc#2006 Fix migration threshold profile without function.
MR7001
#1998 Bump gufo-ping 0.2.4
MR7006
Fix typo portal id on segment map generator.
MR7009
fix(peer): issue #2007, as-set format validation and position
MR7013
Fix MAC discovery policy filter settings typo.
MR7050
Cleanup bad documents on Object Status collection.
MR7056
Convert Event Vars to string.
MR7080
noc/noc#2039 Fix stucked UI when close tab
MR7087
Fix iter_row method on DataSource.
MR7090
Fix collection sync for EmbeddedDocumentListField.
MR7092
noc/noc#2041 Sync cursor after flush state on MetricService.
MR7098
Fix aiokafka requirements.
MR7102
noc/noc#2047 fix me.up() is undefined
MR7114
Fix typo on MessageRoute UI Form.
MR7121
Fix wipe user command.
MR7122
Fix Events log.
MR7124
noc/noc#2054 Fix rebuild datastream on DNS Model.
MR7129
Fix DNSZone datastream when IP address used on masters.
MR7142
Fix classifier Event Message format for send to ch.events.
MR7148
noc/noc#2059 Catch getting error for MAC Collection button
MR7149
Slice activator script result publish for large result size.
MR7151
Fix msgstream client for migrations.
MR7168
Rebuild managedobject datastream when changed discovery id.
MR7173
#2065 Place interface IP Addresses to object VRP if device not supported VRF.
MR7183
Use Generic.Host profile for unknown peering point SA profile.
MR7189
Fix liftbridge client alter stream.
MR7194
Fix getting external stream partition on Router.
MR7196
Fix error when getting datastream format message headers.
MR7197
Fix csvutil processed import.
MR7199
noc/noc#2068 Disable clean when collection sync for instances without uuid.
Code Cleanup
MR
Title
MR6800
Refactor lib/highlight module
MR6801
Refactor lib/template module
MR6802
Remove lib/datasource module
MR6829
Move lib/app directory into services/web/base
MR6987
Cleanup print on config class.
MR7052
Ruff linter
MR7062
Simplify mib expressions
MR7072
devcontainer.json: Move settings and extensions into customizations.vscode
MR7073
ruff: Enable W - pycodestyle warnings
MR7074
ruff: Enable flake8-builtin (A) diagnostics
MR7075
Ruff: Enable pylint (PLC, PLE) checks
MR7078
ruff: Fix PLW0120 else clause on loop without a break statement
MR7134
Catch git safe.directory error when getting version.
Profile Changes
Alstec.24xx
MR
Title
MR6810
Alstec.24xx.get_metrics. Fix metric units.
Cisco.IOS
MR
Title
MR7117
noc/noc#1920 Cisco.IOS. Cleanup output SNMP CDP neighbors.
Cisco.IOSXR
MR
Title
MR7059
Cisco.IOSXR get_inventory error asr9k
DLink.DxS
MR
Title
MR7103
DLink.DxS.get_interfaces: Fix CLI returns wrong oper_status
Dahua.DH
MR
Title
MR7147
Add Dahua.DH profile to collection
Eltex.MES
MR
Title
MR6915
Eltex.MES. Add retry authentication to pattern_more.
MR6965
fix interface description Eltex.MES.get_interfaces
MR6974
Eltex.MES. Add MES-3316F and MES-3348F oid.
MR7004
fix Stack Members in get_capabilities Eltex.MES
MR7026
Eltex.MES. Add MES-2348P to detect oid version.
MR7041
fix get_inventory Eltex.MES. Serial fix
MR7066
inv.platforms: Eltex MES-2324FB
MR7068
mes2324fb
MR7097
fix portchannel Eltex.MES
Eltex.MES24xx
MR
Title
MR6842
Fix Eltex.MES24xx.get_version script
Generic
MR
Title
MR6746
Use Attribute capability for get_inventory scripts.
MR6896
Generic.get_capabilities. Filter non-printable character on sysDescr.
MR6959
Generic.get_interface_status_ex. Ignore unknown interface on interfaces param.
MR6964
add chunk_size to Generic.get_interfaces
MR7155
noc/noc#1983 Add return script execution metrics on Activator.script.
MR7165
Fix units on collecting SLA metrics on profiles.
Hikvision.DSKV8
MR
Title
MR7137
Hikvision.DSKV8. Fix NTP Server parse on ConfDB normalizer.
Huawei.MA5600T
MR
Title
MR6783
noc/noc#1926 Huawei.MA5600T. Fix allow_empty_response for pattern_more send.
MR7037
noc/noc#2020 Huawei.MA5600T.get_inventory. Fix detect board.
MR7100
Fix CPE discovery
MR7192
#2056 Huawei.MA5600T.get_inventory. Fix duplicate chassis as motherboard on MA5801-GP16.
Huawei.VRP
MR
Title
MR6799
Fixed detect port and power supply number for new Huawei CloudEngine switches
MR6895
noc/noc#1964 Huawei.VRP.get_interfaces. Add allow_empty_response for 'display vlan' on cloud_engine_switch.
Juniper.JUNOS
MR
Title
MR6833
Juniper.JUNOS.get_metrics. Fix units on 'Memory | Heap' metrics
MR6850
Juniper.JUNOS.get_metrics. Fix labels format on slot generator.
MR7101
Juniper.JUNOS.get_metrics. Fix collect SLA metrics.
MikroTik.RouterOS
MR
Title
MR7128
noc/noc#1914 MikroTik.RouterOS. Fix config normalizer when router destination is ifname.
NAG.SNR
MR
Title
MR7060
fixing NAG.SNR.get_inventory
Raisecom.ROS
MR
Title
MR6767
Fix Raisecom.ROS.get_version script
ZTE.ZXA10
MR
Title
MR7115
noc/noc#1658 ZTE.ZXA10.get_interfaces. Add SFUL, GFGM card type.
rare
MR
Title
MR6769
Fix 3Com.SuperStack3_4500.get_interfaces script
MR6807
DCN.DCWL.get_metrics. Convert to float.
MR6825
DCN.DCWL.get_metrics. Fix check 'channel-util' key in metrics.
MR6884
Fix Qtech.QSW.get_version script
MR6897
ECI.HiFOCuS. Fix setup_script profile method for None user.
MR6961
H3C.VRP.get_interface_status. Fix matchers typo.
MR6976
Cambium.ePMP. Add SNMP support.
MR6995
Eltex.WOP. Add SNMP support.
MR7019
add get_lldp_neighbors Qtech.QOS
MR7030
DLink_Industrial_cli Fix (config) prompt and autoanswer
MR7086
fix Zyxel.DSLAM
MR7130
noc/noc#2037 BDCOM.xPON.get_interfaces. Add Giga-Combo-FX-SFP interface type.
MR7132
Fix P1 interfaces on port1 Qtech.QOS
MR7143
#2037 BDCOM.xPON.get_interfaces. Fix parse tagged vlans.
MR7180
Collection discrepancy
Collections Changes
MR
Title
MR6837
inv.platforms: Huawei Technologies Co. S6730-H24X6C
MR6838
inv.platforms: Huawei Technologies Co. S6330-H48X6C
MR6839
inv.platforms: Huawei Technologies Co. S6330-H24X6C
MR6885
Fix calculate MetricType for delta type.
MR6914
Fix ComboPorts on ObjectModels.
MR6936
ping: Switch to direct dispose protocol
MR6993
noc/noc#1958 Add bulk mode for update object statuses on dispose message.
MR7040
add profilecheckrules SKS-16E1-IP-ES-L
MR7042
noc/noc#1729 Replace AlarmClass default severity by AlarmRule and labels.
MR7079
noc/noc#2013 Add buckets to iter_collected_metrics for discovery.
MR7085
add profilecheckrules zyxel.dslam VES-1624FT-55A
MR7109
#2022 Add report config
Deploy Changes
MR
Title
MR6541
Add redpanda role deploy
MR6736
add lib yedit
MR6877
Ansible tower add metrics check
MR7061
Split requirements.txt
MR7076
Ruff: Enable pylint (PLR) checks