Перейти к содержанию

NOC 23.1 is Released

23.1 release contains 274 bugfixes, optimisations and improvements.

Highlights

Topo service

With the 23.1 release, NOC got a new dedicated service for topology-related calculations. The topo service tracks all topology-related changes and maintains an internal graph.

Before the 23.1 release, NOC relied on proper segmentation to calculate uplinks. The uplinks are necessary for topology-based root-cause analysis. We have found that a segment-based approach is hard to implement on specific kinds of networks:

  • Flat networks without the segmentation.
  • Networks with implicit segmentation.
  • Segmented networks without explicit segment hierarchy.

Moreover, it was impossible to build uplinks for top-level root segments.

The new approach analyses the whole network and relies only on managed object levels. The levels are organic and reflect the object's role in the network. The service tracks changes and analyses all possible paths to exit points. In-memory graph reduces the imposed database load during massive topology changes.

Trivia

  • topo stands for topology.
  • un topo means a mouse in italian.

Migrate FM Events to Click House.

Before the 23.1 release, NOC stored the FM events in MongoDB. The limitation of storage became the bottleneck to the system's scalability.

The lack of collection partitioning in MongoDB didn't allow us to clean the obsolete data without impact on system operations. The speed of deletion may be lower than the speed of insertion, rendering the implicit deletion or TTL indexes useless. The collection size grew fast. The only working solution was to drop the collection to reclaim the space.

We are working hard on the system performance tuning. The limited MongoDB's write performance became a stopper.

With the 23.1 release, we have moved the event storage to the ClickHouse and obtained the following benefits:

  • The table partitioning allows maintaining of predictable storage usage by dropping obsolete partitions.
  • ClickHouse has good write scalability.
  • ClickHouse greatly overperforms MongoDB on write operations ever on single-server configurations.
  • It is possible to analyze the events using built-in NOC BI.
  • It is possible to use third-party tools like Tableau for data digging.

Managed Object Workflows

Managed Objects got full workflow integration like other resources. Now the workflow states define the discovery, monitoring, and management settings. The new approach allows greater flexibility and fits well with complex business scenarios.

Configurable Metric Collection Intervals

NOC 23.1 allows configuring different collection intervals for metrics. We also have implemented the collection sharding, which allows multiplexing high-cardinality metrics over time. Metrics collection from boxes with a huge amount of subinterfaces, like PON OLT or BRAS, now is possible. It's also possible to split metrics depending on the cost of collection on equipment. The "cheap" metrics may be collected frequently, while we can still collect "expensive" metrics more rarely.

Internal Kafka-compatible Message Streaming

NOC now supports Kafka-compatible API for internal message streaming. It's possible to choose between:

  • Liftbridge, for simple installations.
  • Redpanda for high-profile Linux installations.
  • Kafka for other systems.

NOC supports the deployment and tuning of Redpanda out of the box, and we're planning to deprecate Liftbridge usage in the next releases.

We also have moved our own Liftbridge client implementation into the standalone Gufo Liftbridge package.

Customized Network Maps

We have reworked the network maps, and now it is possible to create customized maps with the arbitrary set of managed objects. We also have implemented "map generators" on the backend, allowing the auto-generation of custom maps.

New TT Adapter API

We have reworked our TT adapter API. Among the benefits are:

  • Full typing support.
  • Parts of the escalation scenario have been moved into the base classes of the adapter, allowing implementation of the customized scenarios.

Migration

FM users must run data conversion scripts manually:

./noc fix apply convert_fm_events
./noc fix apply convert_fm_outages