The financial markets often experience bouts of volatility, but market activity since the start of the pandemic in 2020 has been unprecedented. The economic turmoil sparked by the global pandemic created new processing challenges throughout the trading lifecycle, pressuring market participants’ capabilities in the post-trade space. DTCC’s white paper, “Managing Through a Pandemic: The Impact of COVID-19 on Capital Markets Operations,” found that during, and in the immediate aftermath of, the COVID-19 pandemic, the industry remained resilient. However, opportunities remain to further optimize post-trade processes across the capital markets.
Harold Watler, DTCC Managing Director, Enterprise Platform Engineering, and Mark Cucarese, DTCC Executive Director, Enterprise Production Services, met with DTCC Connection to discuss how IT stays on top of spikes in volatility to ensure our systems remain stable and operational as the financial markets meet new challenges.
DC: What factors have led to the spikes in trading activity over the past two years?
MC: A combination of factors fueled remarkable spikes in trading activity, including an influx of new market investors and the unexpected resurgence of COVID-19, which hampered economic recovery efforts. With millions of new investors and the rapid rebound after the March 2020 sell-off, market volumes reached historic levels.
DC: How does DTCC help clients handle market volatility?
MC: Our Enterprise Production Services team provides 24/7 support for the entire DTCC global infrastructure, including all mainframe, distributed and cloud-based systems, with a central focus on production reliability for the financial markets. We’ve created a virtual command center that monitors our production systems, detects any disruptions and responds to alerts in real time. We have hundreds of technical support staff around the globe constantly engaging with colleagues and clients to proactively investigate any market anomalies.
Our team builds tools to support systems monitoring and event response. For instance, as part of ongoing IT modernization, we created a tool to help us understand Universal Trade Capture performance, giving us visibility into peak activity and into where a trade is in the processing flow. During a period of heightened market volatility, we worked closely with one of our clients and used this tool to understand their transaction patterns and identify how to improve their performance.
DC: What is Platform Engineering’s role in the process?
HW: Within our support structure, Enterprise Production Services handles the first-level response, and even the second-level triage, of any issues, and then escalates them to Engineering. We also join forces on other production environment issues or problems.
We design, plan and maintain DTCC’s mainframe and private cloud footprint, as well as our network capabilities. Capacity is very important, and we’re always keeping an eye on trends. At the end of 2019, we started to see trading volumes increase. Then COVID-19 hit. It became clear that this wasn’t an anomaly, so we marshaled all the horsepower we could to keep things running smoothly.
DC: How did Platform Engineering adjust its strategy to account for extreme events?
HW: As 2021 began, we expected the heightened market volume and volatility to continue. We examined the resources we would need to sustain that level and beyond, and the cost would have been astronomical. So we accelerated our 2022 mainframe refresh, which included increasing our processing power by 35%.
We now have infrastructure whose performance gives us a lot more capacity headroom. Even on very busy days, we’ve been able to scale up as needed with the capacity we have immediately available. I’ve given up thinking anything will ‘calm down’ anytime soon.
DC: How does auto-scaling ensure we can respond to market volatility?
HW: Auto-scaling is the ability to dynamically add or remove resources as needed. It’s a prepackaged capability that we utilize to handle spikes for some of our products.
From my perspective, the benefit of capacity on demand is that the additional resources are already on-premises and readily available, so we don’t have to acquire that capability when a spike hits.
Self-healing is where we develop this capability ourselves: we have jobs and scripts that say, ‘If these conditions are met, add another engine.’
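The kind of rule Watler describes can be sketched in a few lines. This is a hypothetical illustration, not DTCC’s actual scripts: the metric names (cpu_utilization, queue_depth), the thresholds and the MAX_ENGINES ceiling are all assumptions chosen for the example, and an “engine” is treated simply as one unit of processing capacity.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    # Hypothetical health metrics a monitoring job might sample.
    cpu_utilization: float  # fraction of current capacity in use, 0.0 to 1.0
    queue_depth: int        # work items waiting to be processed


# Hypothetical thresholds; real values would come from capacity planning.
CPU_THRESHOLD = 0.85
QUEUE_THRESHOLD = 10_000
MAX_ENGINES = 12  # assumed ceiling on capacity that is readily available


def engines_needed(current_engines: int, m: Metrics) -> int:
    """Apply one 'if these conditions are met, add another engine' rule."""
    conditions_met = (
        m.cpu_utilization >= CPU_THRESHOLD and m.queue_depth >= QUEUE_THRESHOLD
    )
    if conditions_met and current_engines < MAX_ENGINES:
        return current_engines + 1
    # Conditions not met, or no spare capacity left: leave the count unchanged.
    return current_engines


if __name__ == "__main__":
    print(engines_needed(8, Metrics(cpu_utilization=0.91, queue_depth=25_000)))  # 9
    print(engines_needed(8, Metrics(cpu_utilization=0.60, queue_depth=25_000)))  # 8
```

Requiring both conditions keeps a brief CPU blip from triggering a change on its own; a production job would typically also add a cooldown period and a matching rule for removing capacity once the spike passes.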
DC: What is being done to enhance production monitoring for the future?
MC: We have a robust monitoring program, and we’re aggressively accelerating automation efforts, building enhanced health checks, dashboards and tools. We’re eliminating manual touch points and building self-healing systems that automatically respond to alerts before an incident can occur.
This year, we launched the first implementation of our artificial intelligence operations tool for incident prevention. It uses predictive analysis based on historical trends to evaluate alerts and predict whether something is going to become an issue. The team can then take preventive measures to avoid a production outage. Think of it like pilots in a cockpit, always managing the gauges.