NOC 23.1

The 23.1 release contains 274 bugfixes, optimisations and improvements.

Highlights

Topo service

With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo service tracks all topology-related changes and maintains an internal graph.
Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:
- Flat networks without segmentation.
- Networks with implicit segmentation.
- Segmented networks without an explicit segment hierarchy.

Moreover, it was impossible to build uplinks for top-level root segments.
The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object's role in the network. The service tracks changes and analyses all possible paths to exit points. The in-memory graph reduces the database load during massive topology changes.
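As a rough illustration of the level-based approach, the sketch below computes uplinks on a small in-memory graph. It is not the topo service's actual code: the function name `find_uplinks`, the sample objects, and the rule that exit points are the objects with the highest level are all assumptions made for this example. An uplink here is any neighbor that is one hop closer to an exit point.

```python
from collections import deque


def find_uplinks(links, levels):
    """Return {object: sorted neighbors lying one hop closer to an exit point}.

    links  -- iterable of (a, b) pairs, treated as undirected links
    levels -- {object: level}; the highest level marks exit points (assumption)
    """
    # Build an undirected adjacency map that lives entirely in memory.
    adj = {name: set() for name in levels}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)

    # Assumption: objects with the highest level act as exit points.
    top = max(levels.values())
    exits = [name for name, level in levels.items() if level == top]

    # Multi-source BFS: hop distance from every object to its nearest exit.
    dist = {name: 0 for name in exits}
    queue = deque(exits)
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)

    # A neighbor is an uplink when it is strictly closer to an exit point.
    uplinks = {}
    for name in levels:
        if name not in dist:
            uplinks[name] = []  # no path to any exit point
            continue
        uplinks[name] = sorted(
            nb for nb in adj[name] if nb in dist and dist[nb] < dist[name]
        )
    return uplinks


# Hypothetical flat network: no segments, only levels.
levels = {"core": 30, "agg1": 20, "agg2": 20, "acc1": 10}
links = [
    ("core", "agg1"),
    ("core", "agg2"),
    ("agg1", "agg2"),
    ("agg1", "acc1"),
    ("agg2", "acc1"),
]
result = find_uplinks(links, levels)
print(result)
```

Note that the sketch needs no segment hierarchy at all, and the access switch ends up with two uplinks because both aggregation paths reach the exit point.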
Trivia: "topo" stands for topology. "Un topo" means "a mouse" in Italian.

Migrate FM Events to ClickHouse

Before the 23.1 release, NOC stored the FM events in MongoDB. The limitations of this storage became a bottleneck for the system's scalability.
The lack of collection partitioning in MongoDB didn't allow us to clean the obsolete data without impact on system operations. The speed of deletion may be lower than the speed of insertion, rendering the implicit deletion or TTL indexes useless. The collection size grew fast. The only working solution was to drop the collection to reclaim the space.
We are working hard on system performance tuning, and MongoDB's limited write performance became a blocker.
With the 23.1 release, we have moved the event storage to ClickHouse and obtained the following benefits:
- Table partitioning keeps storage usage predictable by dropping obsolete partitions.
- ClickHouse has good write scalability.
- ClickHouse greatly outperforms MongoDB on write operations, even in single-server configurations.
- It is possible to analyze the events using the built-in NOC BI.
- It is possible to use third-party tools like Tableau to explore the data.

Managed Object Workflows

Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.
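To make the idea of states defining the settings more concrete, here is a minimal sketch of a managed object workflow. It does not reflect NOC's real data model: the state names, the `StateSettings` fields, the transition table, and the `move` helper are all hypothetical and exist only to show how a state can carry the discovery, monitoring, and management switches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSettings:
    """Settings implied by a workflow state (hypothetical fields)."""

    discovery: bool
    monitoring: bool
    management: bool


# Hypothetical states; a real workflow would be configured by the operator.
STATES = {
    "Planned": StateSettings(discovery=False, monitoring=False, management=False),
    "Managed": StateSettings(discovery=True, monitoring=True, management=True),
    "Monitor Only": StateSettings(discovery=False, monitoring=True, management=False),
    "Decommissioned": StateSettings(discovery=False, monitoring=False, management=False),
}

# Allowed transitions between the hypothetical states.
TRANSITIONS = {
    "Planned": {"Managed", "Monitor Only"},
    "Managed": {"Monitor Only", "Decommissioned"},
    "Monitor Only": {"Managed", "Decommissioned"},
    "Decommissioned": set(),
}


def move(current, target):
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"transition {current!r} -> {target!r} is not allowed")
    return target


state = move("Planned", "Managed")
print(state, STATES[state])
```

In this sketch a business rule such as "monitor the device but never touch its configuration" becomes one more state rather than a set of per-object flags.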

Configurable Metric Collection Intervals

NOC 23.1 allows configuring different collection intervals for metrics. We have also implemented collection sharding, which allows multiplexing high-cardinality metrics over time. Metrics collection from boxes with a huge number of subinterfaces, like PON OLT or BRAS, is now possible. It's also possible to split metrics depending on the cost of collection on the equipment: the "cheap" metrics may be collected frequently, while the "expensive" metrics are collected less often.
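The effect of sharding can be sketched in a few lines. This is only an illustration of spreading collection over an interval, not the scheduler NOC uses: the functions `collection_offset` and `is_due`, the CRC32-based shard choice, and the interface names are assumptions for the example.

```python
import zlib


def collection_offset(key, interval, n_shards):
    """Second within the interval at which `key` is collected.

    The key is hashed into one of `n_shards` shards, and the shards are
    spaced evenly across the interval (hypothetical scheme).
    """
    shard = zlib.crc32(key.encode("utf-8")) % n_shards
    return shard * interval // n_shards


def is_due(key, now, interval, n_shards):
    """True when `key` should be collected at time `now` (in seconds)."""
    return (now - collection_offset(key, interval, n_shards)) % interval == 0


# Hypothetical BRAS with many subinterfaces and a 300-second interval.
interval, n_shards = 300, 10
keys = [f"ge-0/0/1.{unit}" for unit in range(1000)]

# Count how many subinterfaces are polled at each second of one interval.
load = {}
for key in keys:
    offset = collection_offset(key, interval, n_shards)
    load[offset] = load.get(offset, 0) + 1
print(sorted(load.items()))
```

Every subinterface is still collected once per interval, but the requests arrive in up to ten smaller batches instead of a single burst. A "cheap" metric would simply use a shorter `interval` than an "expensive" one.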

Internal Kafka-compatible Message Streaming

NOC now supports a Kafka-compatible API for internal message streaming. It's possible to choose between:
- Liftbridge, for simple installations.
- Redpanda, for high-profile Linux installations.
- Kafka, for other systems.

NOC supports the deployment and tuning of Redpanda out of the box, and we're planning to deprecate Liftbridge usage in the next releases.
We have also moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.

Customized Network Maps

We have reworked the network maps, and now it is possible to create customized maps with an arbitrary set of managed objects. We have also implemented "map generators" on the backend, allowing the auto-generation of custom maps.

New TT Adapter API

We have reworked our TT adapter API. Among the benefits are:
- Full typing support.
- Parts of the escalation scenario have been moved into the base classes of the adapter, allowing implementation of customized scenarios.

Migration

FM users must run data conversion scripts manually:
./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages

New features

MR      Title
MR6805  noc/noc#1942 Customize map backend by loader.
MR6883  Add ImageStore for Network Map background files
MR6908  noc/noc#1968 Add User Configured Map.
MR6942  Move FM events to clickhouse
MR6981  noc/noc#1970 Add min_group_size settings for AlarmGroup.
MR6990  Add MessageStreamClient for stream work.
MR7012  noc/noc#1906 Add RedPandaClient to msgstream.
MR7016  noc/noc#2023 New list managed objects
MR7024  noc/noc#2022 Add ReportEngine.
MR7031  noc/noc#2024 Add interval to Metric Settings.
MR7035  noc/noc#2021 Add CPE initial collection and discovery.
MR7051  New TTSystem adapter API
MR7053  noc/noc#2022 Add Report model.
MR7082  noc/noc#2022 Add ReportForm.
MR7108  #1865 Add Workflow to ManagedObject.
MR7125  topo service

Improvements

MR      Title
MR6676  Add script labels
MR6728  noc/noc#1928 Correlator add downlink objects for detect ring RCA.
MR6749  Add ctl/memtrace endpoint for tracemalloc run.
MR6780  Set close escalation delay to reopens alarm control time.
MR6781  Update version to 22.2
MR6792  Docker add worker, metrics. nginx web volume
MR6798  Bump Django version to 3.2.16
MR6803  Refactor lib/database_storage module
MR6804  Check metrics service active when collected metrics.
MR6813  Add bulk mode to set interfacestatus state.
MR6820  Use bi_id field as sharding key for Metric Stream.
MR6830  Reset ManagedObject diagnostic when disabled Box.
MR6831  Check can_update_alarms settings when raise diagnostic alarm.
MR6855  Catch ModuleNotFoundError exception when import Windows pyximport library.
MR6857  Add apply alarm_class components to raise alarm on correlator.
MR6858  Update language translation file.
MR6868  Set SNMPTRAP/SYSLOG diagnostics set.
MR6888  Fix flake8 'l' error in web service
MR6902  Add How-To use hk for collect custom attributes.
MR6903  Add NOC shell used examples to doc.
MR6904  Add endpoint bulk_ping to activator service
MR6911  Add lib to .gitignore and delete lib/__init__.py
MR6913  Update links in welcome screen
MR6918  Set SNMP check status on Profile Check.
MR6923  Add diagnostic labels.
MR6934  Add ObjectDiagnostic Docs.
MR6939  noc/noc#1593 Add MapFiled for store BI Events vars.
MR6945  Fix ResourceGroup check on alarmescalation.
MR6946  Use polars library for Datasource.
MR6953  noc/noc#1939 Add service based dcs check params.
MR6957  Add sync_diagnostic_labels settings to global config.
MR6969  Improve SNMPError description.
MR6977  Add ERR_CLI_PASSWORD_TIMEOUT to Authentication Failed.
MR6979  Move Stream Config to separate msgstream module.
MR6986  Add custom TopologyGenerator settings to UI.
MR6996  Increase Map offset for isolated nodes.
MR6997  noc/noc#2005 Add selected custom map lookup
MR7007  Add InterfaceValidationPolicy check to ConfDB on_delete.
MR7025  Fix EventClass Rules test form
MR7027  Fix on_super_password in cli
MR7034  Add noc.js to change-ip script path
MR7036  Fix network-scan-docs link
MR7038  Check pager first on on_prompt script expect.
MR7047  Additional AlarmClass to link retention ttl-policy.
MR7048  Bump clickhouse version inside docker-compose
MR7049  Add DiscoveryIDCachePoison datasource.
MR7054  Add site-url for sitemap generation
MR7055  Add fm-reboots datasource
MR7058  Move change handler to ChangeTracker.
MR7063  Add inv-linkdetail datasource
MR7064  Update codeowners
MR7065  Combine python linters to a single CI task
MR7069  Add interval migration.
MR7070  Add NoSAProfileError error.
MR7071  Catch ResolutionError to RPCNoService.
MR7077  Update HP.Comware profile
MR7084  Add ttsystemstatds datasource
MR7088  Update help command to show custom commands
MR7089  Fix create threshold alarms on SLAProbe.
MR7093  Bump FastAPI version.
MR7094  noc/noc#2045 Bump mongoengine to 0.27 and pymongo to 4.3.3.
MR7099  Add rules to MetricConfig on Metrics Service for improve performance.
MR7104  Add meta section to metric stream message.
MR7105  translation fix
MR7106  Make ruff checks visible in joblogs
MR7111  Bump pyproj to 3.4.1.
MR7112  noc/noc#2046 Bump cachetools to 5.3.0
MR7113  noc/noc#2049 Add upload MIB docs.
MR7119  noc/noc#2050 Add L2Domain to RemoteSystem model.
MR7120  noc/noc#1728 Check labels in match rule when rename and remove
MR7123  noc/noc#2022 Migrate Datasource-based tabled report.
MR7138  Speedup interface classification.
MR7144  #817 Add LAGs interface labels.
MR7146  #1539 Set pool_active param default to 1.
MR7150  noc/noc#2061 Add error when status is 500 to ManagedObject list
MR7152  noc/noc#2060 Add protected field to ManagedObject form
MR7153  noc/noc#2063 Add labels to WF Editior State inspector
MR7154  noc/noc#2062 Add state combo in filter
MR7157  Set Generic.Host as default SA Profile.
MR7158  Catch Kafkasender Service connect producer errors.
MR7159  Send reboot to BI directly
MR7161  Add migrations for allowed_models to Workflow.
MR7163  #816 Add inheritance interface profile to aggregate members.
MR7164  Remove b" from crashinfo list
MR7166  Set icontains to UI State filter condition.
MR7170  skip http-exception if status <400
MR7172  Add ManagedObject topology DataStream.
MR7187  Refactor Diagnostic API.
MR7195  Move calculate uplink to TopoService.
MR7198  Add labels to setstatus request.
MR7200  Cached MetricDiscovery interval.

Bugfixes

MR      Title
MR6747  Fix time_delta when processed discovery metrics.
MR6748  Disable suggests in local profile on migration.
MR6752  Fix typo on Address.get_collision query.
MR6759  Watch escalation when reopen alarm.
MR6760  Fix typo on caps discovery logging.
MR6763  noc/noc#1936 Fix l2_domain filter on VLAN UI.
MR6765  Add send_message method to stub service.
MR6770  noc/noc#1937 Fix sender destination send params.
MR6775  Fix changelog reorder when compact.
MR6777  Split SNMP/CLI credential action on diagnostic discovery.
MR6778  Fix check alarm close error on deescalation process.
MR6787  noc/noc#1940 Revert Prefix import to Address.
MR6789  Fix reorder metrics states on compact procedures.
MR6793  noc/noc#1943 Remove vcfilter from NetworkSegment Application.
MR6795  Fix partition num on ServiceStub.
MR6815  Fix kafkasender stream settings.
MR6818  Fix Threshold Profile migration for unique name.
MR6822  noc/noc#1785 removed item_frequencies method in fm.reporteventsummary
MR6823  noc/noc#1954 Fix wait datastream ready on mx services.
MR6827  noc/noc#1955 Add port param to CLI protocol checker.
MR6834  Fix allocation order on vlan.
MR6845  fix Eltex.LTP get_version
MR6849  Fix etl changed labels when object labels is None.
MR6854  noc/noc#1956 Fix ZeroDivisionError when prefix usage calc.
MR6861  noc/noc#1956 Fix detect address usage with included special addresses.
MR6866  Fix send mx message on classifier and uptime reboot.
MR6869  noc/noc#1959 Add bulk param to model_set_state.
MR6870  Fix typo on NBI objectmetrics.
MR6873  noc/noc#1960 Fix error on service without router.
MR6882  Fix migration to OS.Linux profile.
MR6892  Fix rebuild route chains when delete MessageRoute.
MR6900  Fix calculate down_objects metric on Ping Service.
MR6909  Fix "no stream jobs" upon collection sync
MR6912  Fix OS.Linux profile migration if profile exists.
MR6922  noc/noc#1969 Add datastream param to detect changes.
MR6926  Add is_delta to _conversions key, for save unit conversation.
MR6928  Fix 'referenced before assignment' on escalation notify.
MR6931  Catch error when transmute processing on Route.
MR6943  Fix save in ManagedObject set_caps method.
MR6949  noc/noc#1985 Cleanup change commit typo.
MR6951  Fix iter datastream typo.
MR6954  Fix datastream send message when deleted.
MR6962  Fix migrate bi table if previous exists.
MR6972  Fix error when change mongoengine DictField.
MR6980  noc/noc#1984 Add counter flag to cdag probe for check shift counter type.
MR6988  Fix OS.Linux migration for ProfileCheckRule model.
MR6989  Fix typo.
MR6998  Fix getting slot name on stream config.
MR6999  noc/noc#2006 Fix migration threshold profile without function.
MR7001  #1998 Bump gufo-ping 0.2.4
MR7006  Fix typo portal id on segment map generator.
MR7009  fix(peer): issue #2007, as-set format validation and position
MR7013  Fix MAC discovery policy filter settings typo.
MR7050  Cleanup bad documents on Object Status collection.
MR7056  Convert Event Vars to string.
MR7080  noc/noc#2039 Fix stucked UI when close tab
MR7087  Fix iter_row method on DataSource.
MR7090  Fix collection sync for EmbeddedDocumentListField.
MR7092  noc/noc#2041 Sync cursor after flush state on MetricServce.
MR7098  Fix aoikafka requirements.
MR7102  noc/noc#2047 fix me.up() is undefined
MR7114  Fix typo on MessageRoute UI Form.
MR7121  Fix wipe user command.
MR7122  Fix Events log.
MR7124  noc/noc#2054 Fix rebuild datastream on DNS Model.
MR7129  Fix DNSZone datastream when IP address used on masters.
MR7142  Fix classifier Event Message format for send to ch.events.
MR7148  noc/noc#2059 Catch getting error for MAC Collection button
MR7149  Slice activator script result publish for large result size.
MR7151  Fix msgstream client for migrations.
MR7168  Rebuild managedobject datastream when changed discovery id.
MR7173  #2065 Place interface IP Addresses to object VRP if device not supported VRF.
MR7183  Use Generic.Host profile for unknown peering point SA profile.
MR7189  Fix liftbridge client alter stream.
MR7194  Fix getting external stream partition on Router.
MR7196  Fix error when getting datastream format message headers.
MR7197  Fix csvutil processed import.
MR7199  noc/noc#2068 Disable clean when collection sync for instances without uuid.

Code Cleanup

MR      Title
MR6800  Refactor lib/highlight module
MR6801  Refactor lib/template module
MR6802  Remove lib/datasource module
MR6829  Move lib/app directory into services/web/base
MR6987  Cleanup print on config class.
MR7052  Ruff linter
MR7062  Simplify mib expressions
MR7072  devcontainer.json: Move settings and extensions into customizations.vscode
MR7073  ruff: Enable W - pycodestyle warnings
MR7074  ruff: Enable flake8-builtin (A) diagnostics
MR7075  Ruff: Enable pylint (PLC, PLE) checks
MR7078  ruff: Fix PLW0120 else clause on loop without a break statement
MR7134  Catch git safe.directory error when getting version.

Profile Changes

Alstec.24xx
MR      Title
MR6810  Alstec.24xx.get_metrics. Fix metric units.

Cisco.IOS
MR      Title
MR7117  noc/noc#1920 Cisco.IOS. Cleanup output SNMP CDP neighbors.

Cisco.IOSXR
MR      Title
MR7059  Cisco.IOSXR get_inventory error asr9k

DLink.DxS
MR      Title
MR7103  DLink.DxS.get_interfaces: Fix CLI returns wrong oper_status

Dahua.DH
MR      Title
MR7147  Add Dahua.DH profile to collection

Eltex.MES
MR      Title
MR6915  Eltex.MES. Add retry authentication to pattern_more.
MR6965  fix interface description Eltex.MES.get_interfaces
MR6974  Eltex.MES. Add MES-3316F and MES-3348F oid.
MR7004  fix Stack Members in get_capabilities Eltex.MES
MR7026  Eltex.MES. Add MES-2348P to detect oid version.
MR7041  fix get_inventory Eltex.MES. Serial fix
MR7066  inv.platforms: Eltex MES-2324FB
MR7068  mes2324fb
MR7097  fix portchannel Eltex.MES

Eltex.MES24xx
MR      Title
MR6842  Fix Eltex.MES24xx.get_version script

Generic
MR      Title
MR6746  Use Attribute capability for get_inventory scripts.
MR6896  Generic.get_capabilities. Filter non-printable character on sysDescr.
MR6959  Generic.get_interface_status_ex. Ignore unknown interface on interfaces param.
MR6964  add chunk_size to Generic.get_interfaces
MR7155  noc/noc#1983 Add return script execution metrics on Activator.script.
MR7165  Fix units on collecting SLA metrics on profiles.

Hikvision.DSKV8
MR      Title
MR7137  Hikvision.DSKV8. Fix NTP Server parse on ConfDB normalizer.

Huawei.MA5600T
MR      Title
MR6783  noc/noc#1926 Huawei.MA5600T. Fix allow_empty_response for pattern_more send.
MR7037  noc/noc#2020 Huawei.MA5600T.get_inventory. Fix detect board.
MR7100  Fix CPE discovery
MR7192  #2056 Huawei.MA5600T.get_inventory. Fix duplicate chassis as motherboard on MA5801-GP16.

Huawei.VRP
MR      Title
MR6799  Fixed detect port and power supply number for new Huawei CloudEngine switches
MR6895  noc/noc#1964 Huawei.VRP.get_interfaces. Add allow_empty_response for 'display vlan' on cloud_engine_switch.

Juniper.JUNOS
MR      Title
MR6833  Juniper.JUNOS.get_metrics. Fix units on 'Memory | Heap' metrics
MR6850  Juniper.JUNOS.get_metrics. Fix labels format on slot generator.
MR7101  Juniper.JUNOS.get_metrics. Fix collect SLA metrics.

MikroTik.RouterOS
MR      Title
MR7128  noc/noc#1914 MikroTik.RouterOS. Fix config normalizer when router destination is ifname.

NAG.SNR
MR      Title
MR7060  fixing NAG.SNR.get_inventory

Raisecom.ROS
MR      Title
MR6767  Fix Raisecom.ROS.get_version script

ZTE.ZXA10
MR      Title
MR7115  noc/noc#1658 ZTE.ZXA10.get_interfaces. Add SFUL, GFGM card type.

rare
MR      Title
MR6769  Fix 3Com.SuperStack3_4500.get_interfaces script
MR6807  DCN.DCWL.get_metrics. Convert to flot.
MR6825  DCN.DCWL.get_metrics. Fix check 'channel-util' key in metrics.
MR6884  Fix Qtech.QSW.get_version script
MR6897  ECI.HiFOCuS. Fix setup_script profile method for None user.
MR6961  H3C.VRP.get_interface_status. Fix matchers typo.
MR6976  Cambium.ePMP. Add SNMP support.
MR6995  Eltex.WOP. Add SNMP support.
MR7019  add get_lldp_neighbors Qtech.QOS
MR7030  DLink_Industrial_cli Fix (config) prompt and autoanswer
MR7086  fix Zyxel.DSLAM
MR7130  noc/noc#2037 BDCOM.xPON.get_interfaces. Add Giga-Combo-FX-SFP interface type.
MR7132  Fix P1 interfaces on port1 Qtech.QOS
MR7143  #2037 BDCOM.xPON.get_interfaces. Fix parse tagged vlans.
MR7180  Collection discrepancy

Collections Changes

MR      Title
MR6837  inv.platforms: Huawei Technologies Co. S6730-H24X6C
MR6838  inv.platforms: Huawei Technologies Co. S6330-H48X6C
MR6839  inv.platforms: Huawei Technologies Co. S6330-H24X6C
MR6885  Fix calculate MetricType for delta type.
MR6914  Fix ComboPorts on ObjectModels.
MR6936  ping: Switch to direct dispose protocol
MR6993  noc/noc#1958 Add bulk mode for update object statuses on dispose message.
MR7040  add profilecheckrules SKS-16E1-IP-ES-L
MR7042  noc/noc#1729 Replace AlarmClass default severity by AlarmRule and labels.
MR7079  noc/noc#2013 Add buckets to iter_collected_metrics for discovery.
MR7085  add profilecheckrules zyxel.dslam VES-1624FT-55A
MR7109  #2022 Add report config

Deploy Changes

MR      Title
MR6541  Add redpanda role deploy
MR6736  add lib yedit
MR6877  Ansible tower add metrics check
MR7061  Split requirements.txt
MR7076  Ruff: Enable pylint (PLR) checks