Out-of-band network for utility company

In 2018, my employer was contracted by a national utility company to roll out a new MPLS network core across the country. The chosen hardware vendor was Juniper (including JUNOS SPACE for all provisioning), and I was tasked with producing a design for the out-of-band solution.

Requirements

In addition to the expected requirements of an out-of-band solution, the following were stipulated:

  • All communications across the OOB network were to be encrypted. This included syslog, SNMP, even ICMP.
  • The OOB network was to prefer in-band connectivity when available, and fail over to 4G mobile data when in-band connectivity was lost.
  • In the event of a failover to 4G, the operational process for support staff was to remain the same - i.e., access to remote nodes (even disconnected ones) was to remain unchanged, including IP addressing, monitoring, and alarming.

Design

The solution employed an OpenGear ACM5004 unit (4 Ethernet ports, 4 console ports, dual SIMs) at each site. Each ACM was connected to a “management” VRF on the MPLS core.

A dedicated hardware appliance of the OpenGear “Lighthouse” platform (basically the OpenGear OS on a Dell 1U host) was deployed in a datacenter, connected to both the management VRF and an internet-facing firewall. The Lighthouse host was configured as an OpenVPN server, and the ACM units were configured as OpenVPN clients.

The ACM units would use either the management VRF or 4G mobile data to establish VPN connections back to the VPN server. OpenVPN was configured such that each client (ACM unit) received a static IP, and advertised a unique /24 subnet back to the OpenVPN server.
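
The per-client addressing lends itself to being generated rather than hand-written. The following is a minimal sketch, not the configuration used in the deployment: the tunnel range, the site supernet, and the function names `site_plan` and `render_ccd` are all assumptions made for illustration. It renders the text of an OpenVPN client-specific override using the `ifconfig-push` and `iroute` directives, assuming the server runs with `topology subnet`.

```python
import ipaddress
from itertools import islice

# Assumed ranges for illustration only; not the addressing used in the deployment.
TUNNEL_NET = ipaddress.ip_network("10.255.0.0/22")    # hypothetical VPN tunnel range
SITE_SUPERNET = ipaddress.ip_network("10.64.0.0/12")  # hypothetical pool of per-site /24s


def site_plan(site_index):
    """Return (static tunnel IP, site /24) for the Nth site, counting from 0."""
    host_offset = site_index + 2  # skip .0 (network) and .1 (assumed server address)
    if site_index < 0 or host_offset >= TUNNEL_NET.num_addresses - 1:
        raise ValueError("site_index outside the tunnel range")
    tunnel_ip = TUNNEL_NET[host_offset]
    # Walk the supernet's /24s lazily and take the one at position site_index.
    site_subnet = next(islice(SITE_SUPERNET.subnets(new_prefix=24), site_index, None))
    return tunnel_ip, site_subnet


def render_ccd(site_index):
    """Render a client-specific override: fixed tunnel IP plus the site's /24."""
    tunnel_ip, site_subnet = site_plan(site_index)
    return "\n".join([
        # Pin this client's tunnel address (netmask form, as used with topology subnet).
        f"ifconfig-push {tunnel_ip} {TUNNEL_NET.netmask}",
        # Tell the OpenVPN server that this /24 lives behind this client.
        f"iroute {site_subnet.network_address} {site_subnet.netmask}",
    ])


if __name__ == "__main__":
    print(render_ccd(0))
```

An `iroute` entry only controls routing inside OpenVPN, so the server configuration would also need a matching `route` directive for each site subnet.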

A second Lighthouse hardware appliance was deployed to a dedicated “northbound” subnet, which housed the JUNOS SPACE components. This appliance was configured with a fully operational instance of OpenGear Lighthouse (for managing remote console access to all nodes), as well as an OpenVPN client configuration. The SPACE Lighthouse appliance advertised the northbound subnet range into the OpenVPN, establishing connectivity between SPACE and the nodes.

Connectivity to every element of OOB was provided over the OpenVPN, meeting the requirement for total communications encryption. Consistent access to all the nodes was also achieved, since all the management addressing was established by OpenVPN, independent of whether the connection ran over the carrier VRF or mobile data.

Challenges

One of the surprising challenges encountered during the rollout was that the OOB network was so resilient that we had to work harder than anticipated to detect outages. Since every element was monitored on an OOB-protected IP address, the elements would still appear to be available even in the event of a total site connectivity failure.

This challenge was addressed by additionally monitoring each ACM’s point-to-point in-band IP on the management VRF. We were ultimately able to alert on either (1) entire site down, often due to power, or (2) site links down (these were rural sites, so this happened fairly often).
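
The alerting logic comes down to combining two reachability checks per site. Here is a minimal sketch of that decision, assuming each check has already been reduced to a boolean by whatever monitoring system is in use. The names `SiteState` and `classify_site` are hypothetical, and the fourth state (in-band reachable but OOB address not) is my own addition for completeness rather than something described above.

```python
from enum import Enum


class SiteState(Enum):
    HEALTHY = "healthy"
    LINKS_DOWN = "site links down, OOB running over 4G"
    SITE_DOWN = "entire site down"
    OOB_FAULT = "in-band reachable but OOB address is not"  # assumed extra state


def classify_site(oob_reachable, inband_reachable):
    """Combine the OOB (VPN) probe and the in-band point-to-point probe."""
    if oob_reachable and inband_reachable:
        return SiteState.HEALTHY
    if oob_reachable:
        # The VPN address still answers, so the ACM must have failed over to 4G.
        return SiteState.LINKS_DOWN
    if inband_reachable:
        # The ACM answers on the VRF but its VPN address does not.
        return SiteState.OOB_FAULT
    # Nothing answers on either path: most likely a power failure.
    return SiteState.SITE_DOWN


if __name__ == "__main__":
    print(classify_site(oob_reachable=True, inband_reachable=False).value)
```

Without the in-band probe, the first two states are indistinguishable, which is exactly why outages were hard to spot at first.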

Process

The customer had an aggressive schedule of rolling out 3 sites per week (we managed the first 50 or so). To meet this pace, we automated the rollout of new sites using a combination of Ansible (my contribution) and JUNOS Space templating (other engineers). After we’d optimized the process, we could completely establish a site from scratch within 4 hours. At a high level, the process went like this:

  1. Site tech (customer) arrives on site with factory-fresh hardware (router, switch, firewall) and a pre-provisioned ACM unit.
  2. Tech plugs in ACM unit (establishing VPN over 4G), and proceeds with assembling hardware.
  3. Our engineer (remotely) executes an Ansible script against the ACM, which:
    1. Temporarily enables the FTP server on the ACM.
    2. Instructs the Juniper ACX (no SSH on factory-fresh models) to upgrade its firmware via FTP from the local ACM, since pulling a 1GB firmware image over a rural mobile data connection took too long.
    3. Disables FTP server on ACM.
    4. Applies bootstrap config to local switch, router, firewall.
  4. Tech uses internal templating tool to deploy base config to devices, establishing connectivity.
  5. Tech enrols devices in JUNOS Space, concludes deployment.
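
The automation itself was written in Ansible, so the following Python is only an illustration of the ordering in step 3, not the actual script. It is a sketch under the assumption that each action can be passed in as a callable (all four names are hypothetical), and it shows one way to guarantee the temporary FTP server is switched off again even if the firmware upgrade fails.

```python
from contextlib import contextmanager


@contextmanager
def temporary_ftp(enable_ftp, disable_ftp):
    """Keep the ACM's FTP server enabled only for the duration of the block."""
    enable_ftp()
    try:
        yield
    finally:
        # Runs on success and on failure, so FTP is never left open.
        disable_ftp()


def provision_site(enable_ftp, upgrade_firmware, disable_ftp, apply_bootstrap):
    """Run the step-3 actions in order; the callables are hypothetical stand-ins."""
    with temporary_ftp(enable_ftp, disable_ftp):
        upgrade_firmware()  # the router pulls its image from the local ACM
    apply_bootstrap()       # only reached if the upgrade succeeded


if __name__ == "__main__":
    log = []
    provision_site(
        lambda: log.append("enable_ftp"),
        lambda: log.append("upgrade_firmware"),
        lambda: log.append("disable_ftp"),
        lambda: log.append("apply_bootstrap"),
    )
    print(log)
```

In Ansible, the same guarantee is usually expressed with a `block` and an `always` section.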

Result

The design and deployment were highly successful, and the OOB network remains a critical part of the customer’s infrastructure.

Image courtesy of Matthew Henry on Unsplash


© 2019. All rights reserved.