NOC 23.1¶
The 23.1 release contains 274 bugfixes, optimisations, and improvements.
Highlights¶
Topo service¶
With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo
service tracks all topology-related changes and maintains an internal graph.
Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:
- Flat networks without segmentation.
- Networks with implicit segmentation.
- Segmented networks without explicit segment hierarchy.
Moreover, it was impossible to build uplinks for top-level root segments.
The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object's role in the network. The service tracks changes and analyses all possible paths to exit points. The in-memory graph reduces the database load during massive topology changes.
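The level-based idea can be illustrated on a toy in-memory graph. This is a minimal sketch, not NOC's implementation: the function name `find_uplinks`, the input format, and the rules "exit points are the objects with the highest level" and "an uplink is a neighbour one hop closer to an exit point" are all assumptions made for the example.

```python
from collections import deque


def find_uplinks(links, levels):
    """Return {node: sorted uplink neighbours} for a toy topology.

    links  -- iterable of (a, b) pairs, undirected links (hypothetical format)
    levels -- {node: int}; a higher level means closer to the core (assumption)
    """
    # Build an undirected adjacency map kept entirely in memory.
    graph = {node: set() for node in levels}
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    # Assumption: exit points are the nodes with the highest level.
    top = max(levels.values())
    exits = [n for n, lvl in levels.items() if lvl == top]

    # Multi-source BFS: hop distance from every node to its nearest exit point.
    dist = {n: 0 for n in exits}
    queue = deque(exits)
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)

    # A neighbour is an uplink if it is one hop closer to an exit point.
    return {
        node: sorted(p for p in graph[node] if dist.get(p) == dist[node] - 1)
        for node in graph
        if node in dist and dist[node] > 0
    }


links = [("core", "agg1"), ("core", "agg2"), ("agg1", "acc1"), ("agg2", "acc1")]
levels = {"core": 30, "agg1": 20, "agg2": 20, "acc1": 10}
uplinks = find_uplinks(links, levels)
```

In this example no segment hierarchy is needed: the access switch `acc1` gets both aggregation switches as uplinks, and each aggregation switch gets `core`.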
Trivia
topo stands for topology. "Un topo" means "a mouse" in Italian.
Migrate FM Events to ClickHouse¶
Before the 23.1 release, NOC stored the FM events in MongoDB. The storage limitations became a bottleneck for the system's scalability.
The lack of collection partitioning in MongoDB didn't allow us to clean the obsolete data without impact on system operations. The speed of deletion may be lower than the speed of insertion, rendering the implicit deletion or TTL indexes useless. The collection size grew fast. The only working solution was to drop the collection to reclaim the space.
We are working hard on system performance tuning. MongoDB's limited write performance became a blocker.
With the 23.1 release, we have moved the event storage to ClickHouse and obtained the following benefits:
- Table partitioning keeps storage usage predictable: obsolete partitions are simply dropped.
- ClickHouse has good write scalability.
- ClickHouse greatly outperforms MongoDB on write operations, even on single-server configurations.
- It is possible to analyze the events using built-in NOC BI.
- It is possible to use third-party tools like Tableau for data digging.
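Partition-based cleanup can be sketched as follows. This is an illustration only: it assumes a table partitioned by month (ClickHouse `PARTITION BY toYYYYMM(...)`, which yields partition IDs such as `202301`), and the table name `events`, the retention of three months, and both helper functions are hypothetical.

```python
from datetime import date


def obsolete_partitions(partitions, today, keep_months):
    """Return partition IDs (YYYYMM strings) older than the retention window."""
    # Convert dates to a running month index so month arithmetic stays simple.
    current = today.year * 12 + (today.month - 1)
    cutoff = current - keep_months
    result = []
    for part in partitions:
        year, month = int(part[:4]), int(part[4:])
        if year * 12 + (month - 1) < cutoff:
            result.append(part)
    return sorted(result)


def drop_statements(table, partitions):
    """Render one ALTER TABLE ... DROP PARTITION statement per partition."""
    return [f"ALTER TABLE {table} DROP PARTITION {p}" for p in partitions]


existing = ["202209", "202210", "202211", "202212", "202301"]
old = obsolete_partitions(existing, today=date(2023, 1, 15), keep_months=3)
statements = drop_statements("events", old)
```

Dropping a partition removes a whole block of data at once, so the cleanup cost does not depend on the insertion rate, unlike row-by-row deletion.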
Managed Object Workflows¶
Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.
Configurable Metric Collection Intervals¶
NOC 23.1 allows configuring different collection intervals for metrics. We have also implemented collection sharding, which spreads the collection of high-cardinality metrics over time. Collecting metrics from boxes with a huge number of subinterfaces, like PON OLT or BRAS, is now possible. It's also possible to split metrics depending on the cost of collection on the equipment: "cheap" metrics may be collected frequently, while "expensive" metrics are collected less often.
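The sharding idea can be sketched in a few lines. This is a simplified illustration, not NOC's scheduler: the function `shard_for_run`, the round-robin assignment, and the interface names are assumptions. With a 300-second interval and 4 shards, a run fires every 75 seconds and polls a quarter of the subinterfaces, so every subinterface is still polled once per interval.

```python
def shard_for_run(items, shards, run):
    """Return the subset of items to poll on the given run number.

    Each item lands in exactly one shard, so after `shards` consecutive
    runs every item has been polled exactly once.
    """
    if shards < 1:
        raise ValueError("shards must be >= 1")
    return [item for index, item in enumerate(items) if index % shards == run % shards]


# Hypothetical OLT with 12 subinterfaces, a 300 s interval, and 4 shards.
interval, shards = 300, 4
subinterfaces = [f"gpon0/{i}" for i in range(12)]

# Map the start offset of each run (in seconds) to the items it polls.
schedule = {
    run * interval // shards: shard_for_run(subinterfaces, shards, run)
    for run in range(shards)
}
```

Each run touches only a fraction of the objects, which flattens the load peak on the device while keeping the effective per-metric interval unchanged.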
Internal Kafka-compatible Message Streaming¶
NOC now supports a Kafka-compatible API for internal message streaming. It's possible to choose between:
- Liftbridge, for simple installations.
- Redpanda, for high-profile Linux installations.
- Kafka, for other systems.
NOC supports the deployment and tuning of Redpanda out of the box, and we're planning to deprecate Liftbridge usage in the next releases.
We also have moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.
Customized Network Maps¶
We have reworked the network maps, and now it is possible to create customized maps with an arbitrary set of managed objects. We have also implemented "map generators" on the backend, allowing the auto-generation of custom maps.
New TT Adapter API¶
We have reworked our TT adapter API. Among the benefits are:
- Full typing support.
- Parts of the escalation scenario have been moved into the base classes of the adapter, allowing implementation of the customized scenarios.
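The two points above can be illustrated with a typed base class that owns the common scenario and exposes overridable steps. This is a hedged sketch of the pattern only: `EscalationRequest`, `BaseAdapter`, `InMemoryAdapter`, and all method names are hypothetical and do not reproduce NOC's actual TT adapter API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EscalationRequest:
    """Hypothetical typed payload passed to the adapter."""

    queue: str
    subject: str
    body: str


class BaseAdapter(ABC):
    """Hypothetical base class that owns the common escalation scenario."""

    def escalate(self, request: EscalationRequest) -> str:
        # Shared scenario: open a ticket, then attach the rendered comment.
        ticket_id = self.create_ticket(request)
        self.add_comment(ticket_id, self.render_comment(request))
        return ticket_id

    def render_comment(self, request: EscalationRequest) -> str:
        # Hook: subclasses override this step to customize the scenario.
        return request.body

    @abstractmethod
    def create_ticket(self, request: EscalationRequest) -> str:
        """Create a ticket in the external system and return its ID."""

    @abstractmethod
    def add_comment(self, ticket_id: str, text: str) -> None:
        """Append a comment to an existing ticket."""


class InMemoryAdapter(BaseAdapter):
    """Toy adapter that stores tickets in a dict instead of a real TT system."""

    def __init__(self) -> None:
        self.tickets: Dict[str, List[str]] = {}

    def create_ticket(self, request: EscalationRequest) -> str:
        ticket_id = f"{request.queue}-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = [request.subject]
        return ticket_id

    def add_comment(self, ticket_id: str, text: str) -> None:
        self.tickets[ticket_id].append(text)

    def render_comment(self, request: EscalationRequest) -> str:
        # Customized step: prefix automatically generated comments.
        return f"[auto] {request.body}"


adapter = InMemoryAdapter()
ticket = adapter.escalate(EscalationRequest("NOC", "Link down", "sw1 Gi0/1 is down"))
```

The subclass implements only the transport-specific calls and overrides a single step, while the order of the scenario stays in the base class.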
Migration¶
FM users must run data conversion scripts manually:
./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages
New features¶
MR | Title |
---|---|
MR6805 | noc/noc#1942 Customize map backend by loader. |
MR6883 | Add ImageStore for Network Map background files |
MR6908 | noc/noc#1968 Add User Configured Map. |
MR6942 | Move FM events to clickhouse |
MR6981 | noc/noc#1970 Add min_group_size settings for AlarmGroup. |
MR6990 | Add MessageStreamClient for stream work. |
MR7012 | noc/noc#1906 Add RedPandaClient to msgstream. |
MR7016 | noc/noc#2023 New list managed objects |
MR7024 | noc/noc#2022 Add ReportEngine. |
MR7031 | noc/noc#2024 Add interval to Metric Settings. |
MR7035 | noc/noc#2021 Add CPE initial collection and discovery. |
MR7051 | New TTSystem adapter API |
MR7053 | noc/noc#2022 Add Report model. |
MR7082 | noc/noc#2022 Add ReportForm. |
MR7108 | #1865 Add Workflow to ManagedObject. |
MR7125 | topo service |
Improvements¶
MR | Title |
---|---|
MR6676 | Add script labels |
MR6728 | noc/noc#1928 Correlator add downlink objects for detect ring RCA. |
MR6749 | Add ctl/memtrace endpoint for tracemalloc run. |
MR6780 | Set close escalation delay to reopens alarm control time. |
MR6781 | Update version to 22.2 |
MR6792 | Docker add worker, metrics. nginx web volume |
MR6798 | Bump Django version to 3.2.16 |
MR6803 | Refactor lib/database_storage module |
MR6804 | Check metrics service active when collected metrics. |
MR6813 | Add bulk mode to set interfacestatus state. |
MR6820 | Use bi_id field as sharding key for Metric Stream. |
MR6830 | Reset ManagedObject diagnostic when disabled Box. |
MR6831 | Check can_update_alarms settings when raise diagnostic alarm. |
MR6855 | Catch ModuleNotFoundError exception when import Windows pyximport library. |
MR6857 | Add apply alarm_class components to raise alarm on correlator. |
MR6858 | Update language translation file. |
MR6868 | Set SNMPTRAP/SYSLOG diagnostics set. |
MR6888 | Fix flake8 'l' error in web service |
MR6902 | Add How-To use hk for collect custom attributes. |
MR6903 | Add NOC shell used examples to doc. |
MR6904 | Add endpoint bulk_ping to activator service |
MR6911 | Add lib to .gitignore and delete lib/init.py |
MR6913 | Update links in welcome screen |
MR6918 | Set SNMP check status on Profile Check. |
MR6923 | Add diagnostic labels. |
MR6934 | Add ObjectDiagnostic Docs. |
MR6939 | noc/noc#1593 Add MapFiled for store BI Events vars. |
MR6945 | Fix ResourceGroup check on alarmescalation. |
MR6946 | Use polars library for Datasource. |
MR6953 | noc/noc#1939 Add service based dcs check params. |
MR6957 | Add sync_diagnostic_labels settings to global config. |
MR6969 | Improve SNMPError description. |
MR6977 | Add ERR_CLI_PASSWORD_TIMEOUT to Authentication Failed. |
MR6979 | Move Stream Config to separate msgstream module. |
MR6986 | Add custom TopologyGenerator settings to UI. |
MR6996 | Increase Map offset for isolated nodes. |
MR6997 | noc/noc#2005 Add selected custom map lookup |
MR7007 | Add InterfaceValidationPolicy check to ConfDB on_delete. |
MR7025 | Fix EventClass Rules test form |
MR7027 | Fix on_super_password in cli |
MR7034 | Add noc.js to change-ip script path |
MR7036 | Fix network-scan-docs link |
MR7038 | Check pager first on on_prompt script expect. |
MR7047 | Additional AlarmClass to link retention ttl-policy. |
MR7048 | Bump clickhouse version inside docker-compose |
MR7049 | Add DiscoveryIDCachePoison datasource. |
MR7054 | Add site-url for sitemap generation |
MR7055 | Add fm-reboots datasource |
MR7058 | Move change handler to ChangeTracker. |
MR7063 | Add inv-linkdetail datasource |
MR7064 | Update codeowners |
MR7065 | Combine python linters to a single CI task |
MR7069 | Add interval migration. |
MR7070 | Add NoSAProfileError error. |
MR7071 | Catch ResolutionError to RPCNoService. |
MR7077 | Update HP.Comware profile |
MR7084 | Add ttsystemstatds datasource |
MR7088 | Update help command to show custom commands |
MR7089 | Fix create threshold alarms on SLAProbe. |
MR7093 | Bump FastAPI version. |
MR7094 | noc/noc#2045 Bump mongoengine to 0.27 and pymongo to 4.3.3. |
MR7099 | Add rules to MetricConfig on Metrics Service for improve performance. |
MR7104 | Add meta section to metric stream message. |
MR7105 | translation fix |
MR7106 | Make ruff checks visible in joblogs |
MR7111 | Bump pyproj to 3.4.1. |
MR7112 | noc/noc#2046 Bump cachetools to 5.3.0 |
MR7113 | noc/noc#2049 Add upload MIB docs. |
MR7119 | noc/noc#2050 Add L2Domain to RemoteSystem model. |
MR7120 | noc/noc#1728 Check labels in match rule when rename and remove |
MR7123 | noc/noc#2022 Migrate Datasource-based tabled report. |
MR7138 | Speedup interface classification. |
MR7144 | #817 Add LAGs interface labels. |
MR7146 | #1539 Set pool_active param default to 1. |
MR7150 | noc/noc#2061 Add error when status is 500 to ManagedObject list |
MR7152 | noc/noc#2060 Add protected field to ManagedObject form |
MR7153 | noc/noc#2063 Add labels to WF Editor State inspector |
MR7154 | noc/noc#2062 Add state combo in filter |
MR7157 | Set Generic.Host as default SA Profile. |
MR7158 | Catch Kafkasender Service connect producer errors. |
MR7159 | Send reboot to BI directly |
MR7161 | Add migrations for allowed_models to Workflow. |
MR7163 | #816 Add inheritance interface profile to aggregate members. |
MR7164 | Remove b" from crashinfo list |
MR7166 | Set icontains to UI State filter condition. |
MR7170 | skip http-exception if status <400 |
MR7172 | Add ManagedObject topology DataStream. |
MR7187 | Refactor Diagnostic API. |
MR7195 | Move calculate uplink to TopoService. |
MR7198 | Add labels to setstatus request. |
MR7200 | Cached MetricDiscovery interval. |
Bugfixes¶
MR | Title |
---|---|
MR6747 | Fix time_delta when processed discovery metrics. |
MR6748 | Disable suggests in local profile on migration. |
MR6752 | Fix typo on Address.get_collision query. |
MR6759 | Watch escalation when reopen alarm. |
MR6760 | Fix typo on caps discovery logging. |
MR6763 | noc/noc#1936 Fix l2_domain filter on VLAN UI. |
MR6765 | Add send_message method to stub service. |
MR6770 | noc/noc#1937 Fix sender destination send params. |
MR6775 | Fix changelog reorder when compact. |
MR6777 | Split SNMP/CLI credential action on diagnostic discovery. |
MR6778 | Fix check alarm close error on deescalation process. |
MR6787 | noc/noc#1940 Revert Prefix import to Address. |
MR6789 | Fix reorder metrics states on compact procedures. |
MR6793 | noc/noc#1943 Remove vcfilter from NetworkSegment Application. |
MR6795 | Fix partition num on ServiceStub. |
MR6815 | Fix kafkasender stream settings. |
MR6818 | Fix Threshold Profile migration for unique name. |
MR6822 | noc/noc#1785 removed item_frequencies method in fm.reporteventsummary |
MR6823 | noc/noc#1954 Fix wait datastream ready on mx services. |
MR6827 | noc/noc#1955 Add port param to CLI protocol checker. |
MR6834 | Fix allocation order on vlan. |
MR6845 | fix Eltex.LTP get_version |
MR6849 | Fix etl changed labels when object labels is None. |
MR6854 | noc/noc#1956 Fix ZeroDivisionError when prefix usage calc. |
MR6861 | noc/noc#1956 Fix detect address usage with included special addresses. |
MR6866 | Fix send mx message on classifier and uptime reboot. |
MR6869 | noc/noc#1959 Add bulk param to model_set_state. |
MR6870 | Fix typo on NBI objectmetrics. |
MR6873 | noc/noc#1960 Fix error on service without router. |
MR6882 | Fix migration to OS.Linux profile. |
MR6892 | Fix rebuild route chains when delete MessageRoute. |
MR6900 | Fix calculate down_objects metric on Ping Service. |
MR6909 | Fix "no stream jobs" upon collection sync |
MR6912 | Fix OS.Linux profile migration if profile exists. |
MR6922 | noc/noc#1969 Add datastream param to detect changes. |
MR6926 | Add is_delta to _conversions key, for save unit conversion. |
MR6928 | Fix 'referenced before assignment' on escalation notify. |
MR6931 | Catch error when transmute processing on Route. |
MR6943 | Fix save in ManagedObject set_caps method. |
MR6949 | noc/noc#1985 Cleanup change commit typo. |
MR6951 | Fix iter datastream typo. |
MR6954 | Fix datastream send message when deleted. |
MR6962 | Fix migrate bi table if previous exists. |
MR6972 | Fix error when change mongoengine DictField. |
MR6980 | noc/noc#1984 Add counter flag to cdag probe for check shift counter type. |
MR6988 | Fix OS.Linux migration for ProfileCheckRule model. |
MR6989 | Fix typo. |
MR6998 | Fix getting slot name on stream config. |
MR6999 | noc/noc#2006 Fix migration threshold profile without function. |
MR7001 | #1998 Bump gufo-ping 0.2.4 |
MR7006 | Fix typo portal id on segment map generator. |
MR7009 | fix(peer): issue #2007, as-set format validation and position |
MR7013 | Fix MAC discovery policy filter settings typo. |
MR7050 | Cleanup bad documents on Object Status collection. |
MR7056 | Convert Event Vars to string. |
MR7080 | noc/noc#2039 Fix stucked UI when close tab |
MR7087 | Fix iter_row method on DataSource. |
MR7090 | Fix collection sync for EmbeddedDocumentListField. |
MR7092 | noc/noc#2041 Sync cursor after flush state on MetricService. |
MR7098 | Fix aiokafka requirements. |
MR7102 | noc/noc#2047 fix me.up() is undefined |
MR7114 | Fix typo on MessageRoute UI Form. |
MR7121 | Fix wipe user command. |
MR7122 | Fix Events log. |
MR7124 | noc/noc#2054 Fix rebuild datastream on DNS Model. |
MR7129 | Fix DNSZone datastream when IP address used on masters. |
MR7142 | Fix classifier Event Message format for send to ch.events. |
MR7148 | noc/noc#2059 Catch getting error for MAC Collection button |
MR7149 | Slice activator script result publish for large result size. |
MR7151 | Fix msgstream client for migrations. |
MR7168 | Rebuild managedobject datastream when changed discovery id. |
MR7173 | #2065 Place interface IP Addresses to object VRP if device not supported VRF. |
MR7183 | Use Generic.Host profile for unknown peering point SA profile. |
MR7189 | Fix liftbridge client alter stream. |
MR7194 | Fix getting external stream partition on Router. |
MR7196 | Fix error when getting datastream format message headers. |
MR7197 | Fix csvutil processed import. |
MR7199 | noc/noc#2068 Disable clean when collection sync for instances without uuid. |
Code Cleanup¶
MR | Title |
---|---|
MR6800 | Refactor lib/highlight module |
MR6801 | Refactor lib/template module |
MR6802 | Remove lib/datasource module |
MR6829 | Move lib/app directory into services/web/base |
MR6987 | Cleanup print on config class. |
MR7052 | Ruff linter |
MR7062 | Simplify mib expressions |
MR7072 | devcontainer.json: Move settings and extensions into customizations.vscode |
MR7073 | ruff: Enable W - pycodestyle warnings |
MR7074 | ruff: Enable flake8-builtin (A) diagnostics |
MR7075 | Ruff: Enable pylint (PLC, PLE) checks |
MR7078 | ruff: Fix PLW0120 else clause on loop without a break statement |
MR7134 | Catch git safe.directory error when getting version. |
Profile Changes¶
Alstec.24xx¶
MR | Title |
---|---|
MR6810 | Alstec.24xx.get_metrics. Fix metric units. |
Cisco.IOS¶
MR | Title |
---|---|
MR7117 | noc/noc#1920 Cisco.IOS. Cleanup output SNMP CDP neighbors. |
Cisco.IOSXR¶
MR | Title |
---|---|
MR7059 | Cisco.IOSXR get_inventory error asr9k |
DLink.DxS¶
MR | Title |
---|---|
MR7103 | DLink.DxS.get_interfaces: Fix CLI returns wrong oper_status |
Dahua.DH¶
Eltex.MES¶
MR | Title |
---|---|
MR6915 | Eltex.MES. Add retry authentication to pattern_more. |
MR6965 | fix interface description Eltex.MES.get_interfaces |
MR6974 | Eltex.MES. Add MES-3316F and MES-3348F oid. |
MR7004 | fix Stack Members in get_capabilities Eltex.MES |
MR7026 | Eltex.MES. Add MES-2348P to detect oid version. |
MR7041 | fix get_inventory Eltex.MES. Serial fix |
MR7066 | inv.platforms: Eltex MES-2324FB |
MR7068 | mes2324fb |
MR7097 | fix portchannel Eltex.MES |
Eltex.MES24xx¶
MR | Title |
---|---|
MR6842 | Fix Eltex.MES24xx.get_version script |
Generic¶
MR | Title |
---|---|
MR6746 | Use Attribute capability for get_inventory scripts. |
MR6896 | Generic.get_capabilities. Filter non-printable character on sysDescr. |
MR6959 | Generic.get_interface_status_ex. Ignore unknown interface on interfaces param. |
MR6964 | add chunk_size to Generic.get_interfaces |
MR7155 | noc/noc#1983 Add return script execution metrics on Activator.script. |
MR7165 | Fix units on collecting SLA metrics on profiles. |
Hikvision.DSKV8¶
Huawei.MA5600T¶
Huawei.VRP¶
Juniper.JUNOS¶
MikroTik.RouterOS¶
NAG.SNR¶
MR | Title |
---|---|
MR7060 | fixing NAG.SNR.get_inventory |
Raisecom.ROS¶
MR | Title |
---|---|
MR6767 | Fix Raisecom.ROS.get_version script |
ZTE.ZXA10¶
MR | Title |
---|---|
MR7115 | noc/noc#1658 ZTE.ZXA10.get_interfaces. Add SFUL, GFGM card type. |
rare¶
MR | Title |
---|---|
MR6769 | Fix 3Com.SuperStack3_4500.get_interfaces script |
MR6807 | DCN.DCWL.get_metrics. Convert to float. |
MR6825 | DCN.DCWL.get_metrics. Fix check 'channel-util' key in metrics. |
MR6884 | Fix Qtech.QSW.get_version script |
MR6897 | ECI.HiFOCuS. Fix setup_script profile method for None user. |
MR6961 | H3C.VRP.get_interface_status. Fix matchers typo. |
MR6976 | Cambium.ePMP. Add SNMP support. |
MR6995 | Eltex.WOP. Add SNMP support. |
MR7019 | add get_lldp_neighbors Qtech.QOS |
MR7030 | DLink_Industrial_cli Fix (config) prompt and autoanswer |
MR7086 | fix Zyxel.DSLAM |
MR7130 | noc/noc#2037 BDCOM.xPON.get_interfaces. Add Giga-Combo-FX-SFP interface type. |
MR7132 | Fix P1 interfaces on port1 Qtech.QOS |
MR7143 | #2037 BDCOM.xPON.get_interfaces. Fix parse tagged vlans. |
MR7180 | Collection discrepancy |
Collections Changes¶
MR | Title |
---|---|
MR6837 | inv.platforms: Huawei Technologies Co. S6730-H24X6C |
MR6838 | inv.platforms: Huawei Technologies Co. S6330-H48X6C |
MR6839 | inv.platforms: Huawei Technologies Co. S6330-H24X6C |
MR6885 | Fix calculate MetricType for delta type. |
MR6914 | Fix ComboPorts on ObjectModels. |
MR6936 | ping: Switch to direct dispose protocol |
MR6993 | noc/noc#1958 Add bulk mode for update object statuses on dispose message. |
MR7040 | add profilecheckrules SKS-16E1-IP-ES-L |
MR7042 | noc/noc#1729 Replace AlarmClass default severity by AlarmRule and labels. |
MR7079 | noc/noc#2013 Add buckets to iter_collected_metrics for discovery. |
MR7085 | add profilecheckrules zyxel.dslam VES-1624FT-55A |
MR7109 | #2022 Add report config |