NOC 20.4¶
The 20.4 release contains [225](https://code.getnoc.com/noc/noc/merge_requests?scope=all&state=merged&milestone_title=20.4) bugfixes, optimisations and improvements.
Highlights¶
Generic Message Exchange¶
NOC can send notifications by email or Telegram via Notification Groups on alarms and configuration changes. Notifications are useful for drawing human attention to a possible problem. To push data to external systems, NOC uses the DataStream approach: external systems have to pull changes and process them according to their own logic.
NOC 20.4 generalises all data pushed to external systems into the concept of messages. A message is a piece of data that can be passed from NOC to the outside. Messages can be of different types:
- alarms
- object inventory data
- configuration
- configuration change
- reboot
- new object
- system login
- etc.
NOC can generate messages on certain conditions. Both humans and soulless robots can be interested in messages, so we need some kind of routing.
NOC 20.4 introduces a new service called Message Exchanger, or mx. Like a mail server, mx receives a message, processes its headers and decides where to route it. mx relies on a family of sender processes. Each kind of sender delivers messages outside of the system and supports a particular exchange protocol, hiding delivery details from mx. mx can transform messages or apply templates to convert a delivered message to the desired format. Together, mx, the senders, message generation and the transport conventions form a part of NOC called Generalized Message Exchange, or GMX.
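The routing idea can be sketched in a few lines of Python. This is an illustration only, not the mx implementation: the `Message` and `Router` classes, the `message-type` header name and the sender callables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    # Hypothetical message shape: routing headers plus an opaque payload
    headers: Dict[str, str]
    data: bytes


# A sender hides the delivery protocol behind a plain callable
Sender = Callable[[Message], None]


@dataclass
class Router:
    # Maps a message type (taken from a header) to the interested senders
    routes: Dict[str, List[Sender]] = field(default_factory=dict)

    def add_route(self, message_type: str, sender: Sender) -> None:
        self.routes.setdefault(message_type, []).append(sender)

    def route(self, msg: Message) -> int:
        """Deliver msg to every matching sender, return the delivery count."""
        senders = self.routes.get(msg.headers.get("message-type", ""), [])
        for send in senders:
            send(msg)
        return len(senders)


delivered: List[str] = []  # stands in for Kafka, mail and other transports
router = Router()
router.add_route("alarm", lambda m: delivered.append("kafka:" + m.data.decode()))
router.add_route("alarm", lambda m: delivered.append("mail:" + m.data.decode()))
router.add_route("reboot", lambda m: delivered.append("kafka:" + m.data.decode()))

router.route(Message({"message-type": "alarm"}, b"link down"))
router.route(Message({"message-type": "unknown"}, b"ignored"))
print(delivered)
```

In this sketch a message with an unknown type is silently dropped; a real router would also need a policy for unroutable messages.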
NOC 20.4 introduces the kafkasender service, used to push data to a Kafka message bus. We are planning to convert the other senders (mailsender, tgsender, etc.) to GMX in NOC 21.1.
Kafka Integration¶
NOC 20.4 introduces the kafkasender service, a part of GMX. Kafka has become the mainstream message bus in telecom operations, and NOC can now push all data available via DataStream to Kafka for further routing and processing, reducing the number of mutual system-to-system integrations.
Biosegmentation¶
Biosegmentation was introduced in NOC 20.3 as an ad-hoc segmentation process. The process relies on a series of trials. Each trial can lead to merging or fixing the structure of the segment tree. The current implementation relies on inter-segment links, but sometimes the segment hierarchy must be established before the linking process.
NOC 20.4 introduces an additional MAC-based biosegmentation approach, called Vacuum Bulling, which builds the segment hierarchy based on MAC addresses collected on interfaces.
Ordered Message Queue¶
NOC uses NSQ as its internal message queue. This lightweight and high-performance solution usually shows good results, but over time its architectural corner cases have become more and more visible:
- NSQ is designed as an always-on-dial solution. nsqd runs on every host and communicates with the publisher via the localhost loopback. In the modern container world this is a bug, not a feature: relying on an absolutely reliable connection between publisher and broker has become unacceptable.
- Subscribers have to communicate with the nsqlookupd service to find the hosts containing data. Then they have to establish direct connections with them. The official Python NSQ client uses up to 5 TCP connections, so the number of connections grows quickly as the number of producers and subscribers grows.
- The official Python NSQ client's error handling is far from ideal. The code base is old, obscure and hard to maintain. No asyncio version is available.
- No fault tolerance. A failed nsqd leads to lost messages. There is no message replication at all.
- Out-of-order messages. Message order may change due to the internal nsqd implementation and to client logic. Applications like fault management rely on message order: closing events must follow opening ones, otherwise hanging alarms will pollute the system.
During our research we decided that we need a messaging system with a commit-log approach. Though Kafka is the industry standard, its dependency on the JVM and ZooKeeper may be a burden. We settled on Liftbridge, a clean and simple implementation of the proven Kafka storage and replication algorithms.
We have ported the events topics to Liftbridge, fixing the critical event ordering problem. GMX topics use Liftbridge too. The next release (21.1) will address the remaining topics.
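The commit-log approach can be illustrated with a toy in-memory log. This is a sketch of the general idea only, not the Liftbridge API: the `CommitLog` class and its methods are hypothetical. Every message gets a monotonically increasing offset on append, and a subscriber reads strictly by offset, so an alarm's opening event is always seen before its closing event.

```python
from typing import List, Tuple


class CommitLog:
    """Toy append-only log: messages are stored and read in append order."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, value: bytes) -> int:
        """Store the message and return its offset."""
        self._entries.append(value)
        return len(self._entries) - 1

    def read(self, start: int = 0) -> List[Tuple[int, bytes]]:
        """Return (offset, message) pairs starting from the given offset."""
        return [(i, v) for i, v in enumerate(self._entries[start:], start)]


log = CommitLog()
log.append(b"alarm-open:42")
log.append(b"alarm-close:42")

# A subscriber remembers the last processed offset and resumes after it
cursor = 0
for offset, value in log.read(cursor):
    print(offset, value.decode())
    cursor = offset + 1
```

Because the subscriber only stores a cursor, it can restart and continue from the same position without reordering or losing messages, as long as the log itself is retained.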
FastAPI¶
We have started migrating from Tornado to FastAPI. The main motivations are:
- Tornado brought generator-based asynchronous programming to Python 2. Python 3 introduced native asynchronous programming along with the asyncio library. Later Tornado versions are simple wrappers atop asyncio.
- FastAPI uses Pydantic for request and response validation. We found Pydantic very useful during our ETL refactoring.
- FastAPI generates an OpenAPI/Swagger schema, improving integration capabilities.
- FastAPI is fast.
We have ported the login service to FastAPI. JWT has replaced Tornado's signed cookies. We have also implemented a set of OAuth2-based endpoints for our next-generation UI.
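To show what a JWT carries compared with an opaque signed cookie, here is a minimal HS256 sketch built on the standard library only. It is not NOC's login code: the function names and the demo secret are hypothetical, and the sketch skips expiry, audience and algorithm checks that a maintained JWT library would perform.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64(signature)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature is valid, otherwise None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None  # not in header.payload.signature form
    signing_input = f"{header}.{payload}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison of the expected and presented signatures
    if not hmac.compare_digest(_b64(expected).encode(), signature.encode()):
        return None
    return json.loads(_unb64(payload))


token = sign_token({"sub": "admin"}, b"demo-secret")
print(verify_token(token, b"demo-secret"))
print(verify_token(token, b"wrong-secret"))
```

The token is self-describing: any service that knows the secret can validate it and read the claims without a session lookup.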
ETL Improvements¶
ETL used to rely on the CSV format to store extracted data. Though it is simple and wraps SQL responses in an obvious way, it has some limitations:
- Metadata of the extracted fields is stored outside of the extractor, in the loader.
- Field order is hardcoded in the loader.
- Fields have no type information, leading to leaky validation.
- No native way to pass complex data structures, like lists and nested documents.
- Extractors must return empty data for long-deprecated fields.
NOC 20.4 introduces a new extractor API. Instead of lists passed to CSV, the extractor returns Pydantic model instances. Pydantic models are defined in separate modules and reused by both extractors and loaders, so the interface between extractor and loader becomes well-defined. Models perform data validation on both the extraction and load stages, so an error in an extractor leads to an informative error message and stops the process.
ETL now uses the JSON Lines format (jsonl): one JSON structure per row, separated by newlines. This makes it possible to store structures of arbitrary complexity. We have even provided a tool to convert legacy extracted data to the new format.
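The shared-model idea and the JSON Lines layout can be sketched as follows. To keep the example dependency-free it uses a standard-library dataclass with hand-written checks as a stand-in for a Pydantic model; the `ObjectRecord` model, its fields and the helper functions are hypothetical and do not come from the NOC code base.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from typing import Iterator, List, TextIO


@dataclass
class ObjectRecord:
    # Hypothetical model shared by the extractor and the loader
    id: str
    name: str
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Hand-written checks; Pydantic would derive them from the type hints
        if not isinstance(self.id, str) or not self.id:
            raise ValueError("id must be a non-empty string")
        if not isinstance(self.name, str):
            raise ValueError("name must be a string")
        if not all(isinstance(t, str) for t in self.tags):
            raise ValueError("tags must be a list of strings")


def dump_jsonl(records: List[ObjectRecord], stream: TextIO) -> None:
    """Extractor side: write one JSON document per line."""
    for record in records:
        stream.write(json.dumps(asdict(record)) + "\n")


def load_jsonl(stream: TextIO) -> Iterator[ObjectRecord]:
    """Loader side: parse and validate each line with the same model."""
    for line in stream:
        if line.strip():
            yield ObjectRecord(**json.loads(line))


buffer = io.StringIO()
dump_jsonl(
    [ObjectRecord("1", "sw-core", ["core", "dc1"]), ObjectRecord("2", "sw-access")],
    buffer,
)
buffer.seek(0)
loaded = list(load_jsonl(buffer))
print(loaded)
```

Note how the `tags` list survives the round trip as a real list, which the flat CSV layout could not express natively.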
SNMP Rate Limiting¶
NOC 20.4 allows limiting the rate of SNMP requests based on profile or platform settings. This reduces the impact on platforms with a weak CPU or a slow control-to-dataplane bus.
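One simple way to limit a request rate is to enforce a minimum interval between requests. The sketch below illustrates that technique only; it is not NOC's implementation, and the `RateLimiter` class and the 50 requests per second value are hypothetical.

```python
import time
from typing import Callable


class RateLimiter:
    """Spaces calls so that at most `rate` of them start per second."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent, return the delay applied."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        # Schedule the next slot one interval after this request
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay


limiter = RateLimiter(rate=50)  # hypothetical per-profile limit: 50 requests/s
for oid in ("1.3.6.1.2.1.1.1.0", "1.3.6.1.2.1.1.5.0"):
    limiter.wait()
    print("GET", oid)  # a real poller would send the SNMP request here
```

The clock and sleep functions are injectable so the limiter can be tested without real delays.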
orjson¶
orjson is used instead of ujson for JSON serialization/deserialization.
New profiles¶
- KUB Nano
- Qtech.QFC
Migration¶
Tower Upgrade¶
Please upgrade Tower to 1.0.0 or later before continuing the NOC installation/upgrade process. See [Tower upgrade process documentation](https://code.getnoc.com/noc/tower/-/blob/master/UPDATING.md) for more details.
Older versions of Tower will stop the deploy with an error message.
Liftbridge/NATS¶
NOC 20.4 introduces the Liftbridge service for the ordered message queue. You should deploy at least one Liftbridge and one NATS service instance. See Tower's service configuration section for more details.
ETL¶
Run the fix after the upgrade:
` $ ./noc fix apply fix_etl_jsonl`
New features¶
| MR | Title | 
|---|---|
| MR1668 | Added function get alarms for controllers and devices for periodic job. |
| MR4223 | FastAPI login service | 
| MR4256 | Add Project to ETL | 
| MR4274 | New profile Qtech.QFC | 
| MR4290 | Liftbridge client | 
| MR4361 | #1363 ifdesc: Interface autocreation | 
| MR4388 | Add new controller profile KUB Nano | 
| MR4398 | mx service | 
| MR4403 | kafkasender service | 
| MR4473 | #1368 Model Interface scopes | 
| MR4488 | #892 ETL JSON format | 
| MR4519 | noc/noc#1356 SNMP Rate Limit | 
| MR4538 | Configurable LDAP server policies | 
| MR4567 | Biosegmentation: Vacuum bulling | 
Improvements¶
| MR | Title | 
|---|---|
| MR4225 | Fix ddash refid | 
| MR4233 | Allow alternative locations for binary speedup modules | 
| MR4236 | Catch when sentry-sdk module enabled but not installed. | 
| MR4246 | Fix Qtech.BFC profile | 
| MR4261 | noc/noc#1304 Replace ujson with orjson | 
| MR4264 | runtime optimization ReportMaxMetrics | 
| MR4275 | ElectronR.KO01M profile scripts | 
| MR4278 | noc/noc#1383 Add IfPath collator to confdb | 
| MR4280 | noc/noc#1381 Add alarm_consequence_policy to TTSystem settings. | 
| MR4281 | #1384 Add source-ip aaa hints. | 
| MR4287 | Add round argument to metric scale function | 
| MR4293 | Debian-based docker image | 
| MR4296 | Change python to python3 when use ./noc | 
| MR4314 | Update Card for Sensor Controller | 
| MR4320 | Fill capabilities for beef. | 
| MR4338 | New Grafana dashboards | 
| MR4344 | Profile fix controllers | 
| MR4348 | exp_decaywindow function | 
| MR4349 | Controller/fix2 | 
| MR4354 | add_interface-type_Juniper_JUNOSe | 
| MR4358 | Fix Qtech.BFC profile | 
| MR4364 | LiftBridgeClient: Proper handling of message headers | 
| MR4369 | LiftBridgeClient: fetch_metadata() `stream` and `wait_for_stream` parameters |
| MR4380 | Add to_json for thresholdprofile | 
| MR4383 | Update threshold handler | 
| MR4384 | Add collators to some profiles. | 
| MR4389 | Electron fix profile | 
| MR4391 | add new metric Qtech.BFC | 
| MR4394 | fix some controllers ddash/metrics | 
| MR4396 | Fix inerfaces name Qtech.BFC | 
| MR4399 | Up report MAX_ITERATOR to 800 000. | 
| MR4402 | mx: Use FastAPIService | 
| MR4405 | liftbridge cursor persistence api | 
| MR4407 | add_columns_total_reportmaxmetrics | 
| MR4416 | Add csv+zip format to ReportDetails. | 
| MR4417 | Add Long Alarm Archive options to ReportAlarm, from Clickhouse table. |
| MR4428 | Add available_only options to ReportDiscoveryTopologyProblem. | 
| MR4432 | Reset NetworkSegment TTL cache after remove. | 
| MR4433 | Change is_uplink criterias priority on segment MAC discovery. | 
| MR4439 | fix_reportmaxmetrics | 
| MR4447 | Add octets_in_sum and octets_out_sum columns to ReportMetrics. | 
| MR4453 | ConfDB syslog | 
| MR4455 | Fix controllers profiles, ddash | 
| MR4457 | Fix get_iface_metrics | 
| MR4462 | noc/noc#1392 Add search port by contains ifdescription token to ifdecr discovery. | 
| MR4464 | LiftBridge client: Connection pooling | 
| MR4470 | Add ReportMovedMacApplication application. | 
| MR4475 | Add sorted to tags application. | 
| MR4477 | noc/noc#1416 Extend ConfDB meta section. | 
| MR4479 | Add get_confdb_query method to ManagedObjectSelector and MatchPrefix ConfDB function. | 
| MR4480 | Add csv_zip file format to MetricsDetail Report. | 
| MR4483 | noc/noc#1397 Additional biosegtrial criteria to policy. | 
| MR4486 | Add migrate_ts field to ReportMovedMac. | 
| MR4501 | noc/noc#1428 Add InterfaceDiscoveryApplicator for fill ConfDB info from interface discovery. | 
| MR4508 | add_csvzip_reportmaxmetrics | 
| MR4511 | Fix ./noc discovery for LB | 
| MR4515 | noc/noc#1432 lb client: Configurable message size limit | 
| MR4516 | fix csv_import view | 
| MR4517 | Additional options to segment command | 
| MR4535 | Bump networkx/numpy requirements | 
| MR4539 | lb client: increased resilience | 
| MR4547 | Add JOB_CLASS param to core.defer util. | 
| MR4549 | ETL model Reference | 
| MR4551 | add column reboots in fm.reportalarmdetail | 
| MR4553 | fix processing trunk port vlan for HP A3100-24 (v5.20.99) | 
| MR4565 | Add ttl-policy argument to link command. | 
| MR4571 | Filter Multicast MACs on Moved MAC report. | 
| MR4573 | Add api_unlimited_row_limit param | 
| MR4579 | liftBridge: publish_async waits for all the acks | 
| MR4582 | noc/noc#1371 Add schedule_discovery_config handler to events.discovery. | 
| MR4592 | noc/noc#1400 Migrate InterfaceClassification to ConfDB. | 
| MR4602 | Add MatchAllVLAN and MatchAnyVLAN function to ConfDB. | 
| MR4607 | Bump pytest version | 
| MR4624 | add metrics Subscribers \| Summary Alcatel.TIMOS |
| MR4629 | noc/noc#1440 Use all macs on 'Discovery ID cache poison' report. | 
| MR4630 | Convert limit from dcs to int. | 
| MR4632 | Add Telephony SIP metrics graph. | 
| MR4633 | Always uplinks calculate. | 
Bugfixes¶
| MR | Title | 
|---|---|
| MR4249 | Fix card MO | 
| MR4251 | Fix status RNR | 
| MR4258 | Change field_num on ReportObjectStat | 
| MR4269 | noc/noc#1374 Fix typo on datastream format check. | 
| MR4285 | Fix Profile Check Summary typo. | 
| MR4303 | #1335 ConfDB: Fix `and` inside `or` combination |
| MR4310 | Fix RNR affected AD | 
| MR4319 | Add err_status to beef snmp_getbulk_response method. | 
| MR4321 | Convert oid on snmp raw_varbinds. | 
| MR4322 | Fix event clean | 
| MR4327 | Convert set to list on orjson dumps. | 
| MR4328 | Add xmac discovery to ReportDiscoveryResult. | 
| MR4363 | ./noc migrate-liftbridge: Do not create streams for disabled services | 
| MR4368 | Fix hash_int() | 
| MR4373 | Fix typo on Calcify Biosegmentation policy. | 
| MR4409 | Add get_pool_partitions method to TrapCollectorService. | 
| MR4418 | Add id field to project etl loader. | 
| MR4419 | Fix multiple segment args on discovery command. | 
| MR4423 | noc/noc#1399 Delete Permissions and Favorites on wipe user. | 
| MR4424 | noc/noc#1375 Fix DEFAULT_STENCIL use on SegmentTopology. | 
| MR4425 | noc/noc#1396 AlarmEscalation. Use item delay for consequence escalation. | 
| MR4426 | Fix extapp group regex splitter to non-greedy. | 
| MR4430 | Fix ManagedObject _reset_caches key for _id_cache. | 
| MR4452 | noc/noc#1406 Use system username for JWT. | 
| MR4461 | noc/noc#1229 Fix user cleanup Django Admin Log. | 
| MR4472 | Add audience param to is_logged jwt.decode. | 
| MR4474 | Add 120 sec to out_of_order escalation time. | 
| MR4485 | noc/noc#688 Fix invalidate l1 cache for ManagedObject. | 
| MR4492 | Skipping files if already compressed on destination. | 
| MR4497 | noc/noc#1427 Fix whois ARIN url. | 
| MR4498 | Fix object data use. | 
| MR4502 | Move orjson defaults to jsonutils. | 
| MR4505 | Bump ssh2-python to 0.23. | 
| MR4506 | pm/utils -> Fix dict | 
| MR4507 | Some etl loader fixes. | 
| MR4513 | noc/noc#1423 Convert pubkey to bytes. | 
| MR4514 | Convert empty object data to list on 0020 migration. | 
| MR4518 | Fix vendors and handlers migrations | 
| MR4522 | Fix typo on ifdescr discovery. | 
| MR4524 | #1312 Consistent VPN ID generation | 
| MR4540 | Fix customfields for mongoengine. | 
| MR4555 | Revert uvicorn to 0.12.1. | 
| MR4561 | Fix typo on interfaceprofile UI Application. | 
| MR4564 | Fix trace when execute other script that command on MRT. | 
| MR4569 | Fix typo on MRT service. | 
| MR4575 | Add static_service_groups and static_client_groups clean_map to managedobject etl loader. | 
| MR4590 | Fix login cookie ttl | 
| MR4594 | Fix ETL loader change. | 
| MR4595 | Fix extra filter when set extra order. | 
| MR4598 | Fix datetime field on Service ETL model. | 
| MR4614 | Fix SNMP_GET_OIDS on get_chassis_id scripts to list. | 
| MR4627 | noc/noc#1439 Fix tag contains query for non latin symbol. | 
Code Cleanup¶
| MR | Title | 
|---|---|
| MR4254 | Cleanup flake. | 
| MR4301 | Fix vendor docs test | 
| MR4317 | Updated .dockerignore | 
| MR4360 | Remove unused dependencies: tornadis, mistune | 
| MR4362 | Update blinker, bsdiff, cachetools, crontab, progressbar2, psycopg2, python-dateutil versions |
| MR4465 | Remove legacy scripts/ci-run | 
| MR4496 | Fix formatting | 
| MR4533 | Bump requirements | 
| MR4587 | Fix collect beef for orjson. | 
| MR4589 | Fix some lint errors | 
| MR4622 | Fix Service etl model. | 
Profile Changes¶
Cisco.IOS¶
| MR | Title | 
|---|---|
| MR4316 | Update Cisco.IOS profile to support more physical interfaces |
Cisco.IOSXR¶
| MR | Title | 
|---|---|
| MR4408 | added interfacetypes for IOSXR platform | 
DLink.DxS¶
| MR | Title | 
|---|---|
| MR4355 | DLink.DxS.get_metrics. Fix SNMP Error when 'CPU Usage' metric. |
| MR4434 | Fix Dlink.DxS profile. | 
EdgeCore.ES¶
| MR | Title | 
|---|---|
| MR4556 | EdgeCore.ES.get_spanning_tree. Fix getting port_id for Trunk interface. |
Eltex.MES¶
| MR | Title | 
|---|---|
| MR4217 | test tacacs1.yml crashed. AssertionError: assert \[\] == \[(right syntax)\] | 
| MR4262 | Eltex.MES.get_capabilities. Fix detect stack mode by SNMP. | 
| MR4523 | Eltex.MES.get_vlans. Use Generic script. | 
| MR4615 | Eltex.MES. Add 1.3.6.1.4.1.89.53.4.1.7.1 to display_snmp. | 
Eltex.MES24xx¶
| MR | Title | 
|---|---|
| MR4381 | Fix Eltex.MES24xx.get_interfaces script | 
Extreme.XOS¶
| MR | Title | 
|---|---|
| MR4404 | Fix Extreme.XOS.get_lldp_neighbors script | 
Generic¶
| MR | Title | 
|---|---|
| MR4239 | Generic.get_capabilities add SNMP \| OID \| EnterpriseID len check. |
| MR4342 | Generic.get_arp. Cleanup snmp for py3 | 
| MR4613 | Generic.get_chassis_id. Add 'LLDP-MIB::lldpLocChassisId' oid to display_hints. | 
Huawei.MA5600T¶
| MR | Title | 
|---|---|
| MR4611 | Huawei.MA5600T.get_spanning_tree. Fix waited command. |
Huawei.VRP¶
| MR | Title | 
|---|---|
| MR4422 | Huawei.VRP. Add NE8000 version detect. | 
| MR4550 | Huawei.VRP fix normalize_enable_stp | 
| MR4557 | Huawei.VRP. Check nexthop type on ConfDB route normalizer. | 
Juniper.JUNOS¶
| MR | Title | 
|---|---|
| MR4324 | Fix Juniper.JUNOS.get_chassis_id script | 
| MR4377 | Fix Juniper.JUNOS.get_interfaces script | 
NAG.SNR¶
| MR | Title | 
|---|---|
| MR4351 | Fix NAG.SNR.get_interfaces script | 
| MR4481 | Fix NAG.SNR.get_lldp_neighbors script | 
Qtech.QSW¶
| MR | Title | 
|---|---|
| MR4576 | Fix Qtech.QSW profile | 
Qtech.QSW2800¶
| MR | Title | 
|---|---|
| MR4444 | Qtech.QSW2800. Add sdiag prompt. | 
| MR4542 | Fix Qtech.QSW2800.get_version script | 
Ubiquiti.AirOS¶
| MR | Title | 
|---|---|
| MR4240 | Ubiquiti.AirOS.get_version. Cleanup for py3. | 
rare¶
| MR | Title | 
|---|---|
| MR4214 | ConfDB tests profile Raisecom.RCIOS. | 
| MR4241 | Alstec.MSPU.get_version. Fix HappyBaby platform regex. | 
| MR4265 | Fix ZTE.ZXA10 profile | 
| MR4272 | Eltex.WOPLR. Add get_interface_type method to profile. | 
| MR4279 | Update Rotek.BT profile | 
| MR4288 | Add Enterasys.EOS profile | 
| MR4295 | Fix metric name | 
| MR4302 | add snmp in profile Juniper.JUNOSe | 
| MR4313 | Rotek.BT fix get_metrics | 
| MR4335 | add snmp in profile Alcatel.TIMOS | 
| MR4353 | Update ZTE.ZXA10 profile to support C610 | 
| MR4365 | Fix prompt matching in Fortinet.Fortigate profile | 
| MR4371 | Alcatel.OS62xx.get_version. Set always_prefer to S for better platform detect. | 
| MR4376 | fix_get_lldp_neighbors_NSN.TIMOS | 
| MR4406 | Add AcmePacket.NetNet profile. | 
| MR4431 | noc/noc#1391 Cisco.WLC. Add get_interface_type method. | 
| MR4536 | add_bras_metrics_Juniper_JUNOSe | 
| MR4570 | Fix h3c get_switchport | 
| MR4578 | Eltex.ESR add snmp support | 
| MR4583 | Update DCN.DCWS profile.py | 
| MR4585 | Update sa/profiles/DCN/DCWS/get_config.py | 
| MR4586 | Ericsson.SEOS.get_interfaces. Migrate to Generic SNMP. | 
| MR4596 | Fix DLink.DxS_Smart profile | 
| MR4600 | Huawei.VRP3.get_interface_status_ex. Fix return in/out speed as kbit/sec. | 
| MR4610 | Huawei.VRP3.get_interface_status_ex. Fix trace when SNMP Timeout. | 
| MR4617 | NSN.TIMOS.get_interfaces. Fix empty MAC on output. | 
Collections Changes¶
| MR | Title | 
|---|---|
| MR4277 | Add more Juniper part number | 
| MR4282 | Add new caps - Sensor \| Controller |
| MR4294 | New Environment metrics | 
| MR4305 | Fix bad json on collection. | 
| MR4307 | Cleanup HP fm.eventclassificationrule. | 
| MR4337 | Fix get metrics script for controller | 
| MR4345 | Fix dev.specs SNMP chassis for Huawei and Generic. | 
| MR4411 | Add some Juniper models | 
| MR4451 | Add some Juniper models | 
| MR4460 | noc/noc#1411 Add PhonePeer MetricScope. | 
| MR4499 | Fix default username BI dashboard. | 
| MR4520 | sa.profilecheckrules: Eltex \| MES \| MES5448 sysObjectID.0 |
| MR4625 | Add AcmePacket Vendor. | 
Deploy Changes¶
| MR | Title | 
|---|---|
| MR4478 | noc/noc#1241 Merge ansible deploy to master repo | 
| MR4623 | Add liftbridge deployflow | 
| MR4637 | Fix auth path redirect | 
| MR4640 | Catch trace on etl loader when delete lost mapping. | 
| MR4643 | Change start condition |