Our guest is Manos Schizas — Lead in Regulation and RegTech at the Cambridge Center for Alternative Finance at the University of Cambridge. We discuss how regulatory change is accelerating so fast that people alone can’t deal with it, and what a technological solution to the problem looks like. Can technology solve this problem at scale? How much innovation are we seeing thanks to machine learning? We also discuss the Regulatory Genome Project, a recently launched long-term project that aims to sequence the world’s (financial) regulation, allowing developers and firms to build their own applications on top of the platform. Before joining the Cambridge Center for Alternative Finance, Manos also served as a regulator with the UK’s FCA.
Ben: Manos, thank you very much for coming on the Structural Shifts podcast.
Manos: Thanks for having me on the show, Ben.
[00:01:22.05] Ben: Maybe let’s start by you talking about your background because I think it’s useful for our listeners to know that you’ve seen this interplay of finance, tech, and regulation from many different angles. So, if you don’t mind, Manos, just tell us kind of, you know, how you started off in this world?
Manos: Sure. So, I first got involved with writing and reading about regulation back in 2008. At the time I was a very, very junior lobbyist at an association for accountants — the ACCA. And because I had their access to finance brief, inevitably, around that time, I had to feed into the discussion around Basel III, and the implications for financing of small businesses. But before long, I was talking and writing primarily about FinTech and regulation. At some point, I made the jump over to, I guess, what I thought at the time was the dark side. So, I joined the FCA — the UK regulator — I spent some time there leading their work, at the working level, on things like crowdfunding or their approach to small businesses, surprisingly political and fraught topics. And then, I moved on to a London-based RegTech startup, where I was their Head of Regulatory Content Operations and also had the product brief for a short period of time. And then, of course, the rest is history. I joined the Cambridge Center for Alternative Finance, where I lead their thought leadership practices, as well as their applied research program on RegTech and machine-readable regulation.
[00:02:53.10] Ben: We’re going to come back to the Regulatory Genome — the project that you’re working on — but before we get there, I think we should zoom out and talk a bit about the whole terrain of regulatory compliance and why it faces so many challenges. So maybe let’s start from the point of view of a regulated financial institution. Why is it so time-consuming and expensive for banks and other financial institutions to comply with regulations?
Manos: Well, alright, let’s start from the top line if you will. It costs something in the order of 4% of turnover for a major financial institution to comply with regulation. Again, that’s turnover. That’s not, you know, breaking margins, that’s not profit. It’s colossal amounts of money on a global scale. And why does it cost so much? Well, I guess, there hasn’t been a time in very recent memory when financial services weren’t heavily regulated. But since the financial crisis, in particular, there’s been an explosion in regulation, that has seen the amount of regulatory notifications rise, I think about seven or eightfold between 2008 and 2018. So, I guess the key point is, the cost is driven primarily by how demanding the regulatory framework is and the pace of change. Now, it’s not the same for every part of the regulated sector. So, a tier-one bank will probably recognize the pace of change as I describe it, whereas let’s say, you know, a smaller asset manager might not, but by and large, there’s been an explosion in regulatory requirements. At the same time, there’s also been an explosion in the sheer amount of data that firms hold, not just the ones that they have to hold for regulatory purposes, but the ones they hold for commercial purposes. You know, only recently — I think it was HSBC — one of the major banks was creating a data lake that was in size exactly the same size as the entire internet had been four years earlier. It gives you a sense of perspective of what we’re talking about. The pace of change and the volume of data has really long outstripped the ability of firms to just throw humans at the problem — human brains and human bodies.
Manos: There’s also other elements related to the way you manage institutions like that. So, you know, many of these major firms are matrix organizations where it’s actually, in the time of change, quite easy to lose visibility as a senior manager of why you’re complying the way you’re complying, what exactly the outcomes you’re achieving are, and so on and so forth. And at the same time, regulators are hardening their stance on the personal responsibility of senior managers. You know, you’ve got senior managers regimes in the UK, in Singapore, in Australia, in Hong Kong, and in an increasing number of jurisdictions. So you’re in this kind of the opposite of a sweet spot, if you will, or the sweet spot for vendors, where the key decision-makers are facing increasing scrutiny on a personal level, and at the same time, are losing visibility. So if you’re a vendor, this is a good time to come in and try to sell them technology.
[00:06:12.08] Ben: What about if we look at it from the point of view of regulators because it sounds a bit like, you know, listening to you, the regulators are really driving the agenda here — which I guess is true to an extent — but the regulator doesn’t control the pace of technology change, which is driving innovation; and the regulator also only can really affect its jurisdiction. And I think one of the things that’s become more apparent over recent years is there’s a lot of competition between jurisdictions to attract new financial institutions and also new FinTech companies. And so, does the regulator also see the need to do things differently in this space?
Manos: Sure, I guess there’s two types of regulations depending on where they come from. So, there are rules that are fundamentally quite harmonized across the globe. AML, for example, prudential requirements — at least in banking and insurance. And for those, the rules come down from Mount Olympus, from the G20. They cascade through the standard-setting bodies and then finally into national regulators. Now, if you are a regulator working in that kind of subject matter area, then your key concern is, am I fundamentally compliant with international standards? And have I found the most efficient way to comply with them? AML is the usual example here because if you’re not compliant, that’s a big problem. The whole country can get graylisted or blacklisted, and you just don’t want to be there as a regulator. But you know, even when the stakes aren’t that high, regulators want to know that they are compliant with international standards. Then there are other areas of regulation which are closer to the matter of technological change that you mentioned earlier, where good practices are bubbling up from the bottom. So areas like, I don’t know, cybersecurity, data protection — you know, there is no single unifying force or no single cascade of standards from the top. But everyone wants to know how they compare to the jurisdictions that they see as competitors. So, if you’re in Malaysia, you’re the Securities Commission, you will look at what MAS is doing in Singapore. If you are in the UK, you’ll be looking at what the Europeans are doing post-Brexit. Pre-Brexit, obviously, you just had to comply. So this process of regulatory benchmarking is actually one of the factors driving regulatory change internationally. When we at the CCAF surveyed regulators from 111 jurisdictions around the world, they told us that nearly every exercise of review of regulation in relation to FinTech had involved some benchmarking exercise. And, in more than half of these circumstances, it was the benchmarking exercise that had prompted regulators to change how they do things.
[00:09:11.06] Ben: What about COVID? Has that had much of an impact on the pace of regulatory change?
Manos: Well, that’s what our research tells us. So, we have just come out of a significant project to basically carry out a rapid impact assessment of COVID on the FinTech and RegTech industries, as well as the regulators responsible for them. And obviously, what you hear from regulators is that COVID fundamentally changed the way they approach some areas of their work — not just their rulemaking, but also their hands-on supervision. But I guess what regulators tend to see here is some megatrends that have accelerated — so trends towards you know, more or less material financial services, more online banking, more app-based financial services and so on and so forth, but also greater demand on their resources, so that they can do more with fewer touchpoints with industry. And then, of course, COVID also came with some of its own, if you will, pathologies. So, regulators told us, for instance, that they were much more aware and worried about fraud in a COVID environment where a lot of things have had to be put on the cloud or have had to be done remotely at relatively short notice, or where firms have had to deal with stuff that previously were very closely held in-house on a remote basis. So, of course, the focus of regulators has had to change.
[00:10:48.17] Ben: So, Manos, if we were to try to summarize what you’ve told me, you’re saying that the pace of regulatory change is accelerating to the point where financial institutions can no longer just throw, you know, human resources at this problem because it’s an exponentially changing situation so it requires a technology solution to it. But would you also argue that the regulators need to be putting more technology at play here? Because presumably, they also want to know how regulations are changing and being implemented, and they want to make use of the data to make sure that they’ll keep up with the potential rates of innovation, put that to good use in terms of financial inclusion and everything else. So would you say that the need for new technology applies to both the regulated and the regulators?
Manos: Yeah. I mean, if anything, regulators are under more pressure. So when we say something like, you know, the pace of regulatory change has increased sevenfold since the financial crisis — well, you know, firms’ compliance budgets have not increased sevenfold. But regulators’ budgets have not increased at all, not in real terms anyway. And so, regulators find themselves in these very interesting challenges wherever there’s this use of data involved. Like, to give you a simple example, the first touchpoint with technology around regulation and compliance for most regulators is reporting. And if you talk to an emerging market regulator — not the poorest countries in the world, necessarily; just, you know, significant emerging markets — they will say, “You know, firms report data to us and by the time we’ve validated the data and made sure it’s not garbage, it’s three months old.” Now, let’s go back to that COVID discussion we just had. If you had three-month-old data on the robustness, the financial stability of firms, as a regulator, it would be useless. It’s a snapshot from a completely different world. So you can see how COVID can really create an issue for regulators there and awaken some of them to the challenges. But even if you think of more normal times, you know, the FinTech revolution has created a very big fringe of very small, very marginal firms that fly sometimes under the radar of regulators, and sometimes just above. And so, for instance, when the FCA took over payments, the population of firms that they were supposed to supervise more than doubled overnight. Now, their resources did not increase at all. So, what exactly do you do when faced with a situation like that? You have to find some way of prioritizing your human resources. And the only way, really, to get to a point where you can do that is to invest in technology that allows you to prioritize better by getting insights more cheaply, more efficiently, where the risks are proportionately smaller.
[00:13:49.05] Ben: That’s happening, is that not? So, we are getting thousands of new entrants into this space, new technology companies, new RegTech companies are entering this space to solve these challenges that regulated companies have, and regulators have. I was reading before this podcast that I think collectively, over $10 billion of new venture capital has gone into this space in the last 10 years. So, are we solving this problem at scale?
Manos: Well, it’s interesting. I mean, obviously, throwing more firms at the problem doesn’t necessarily solve anything. It is a good indicator of how valuable the prize is, I guess, for whoever wins the race. Just to be clear, just the number of RegTechs really depends on how you define this sector. So, you know, you will hear estimates from 800 all the way to the 2000 number that you quoted, but the amount raised is almost always estimated the same way because most of the fundraising is concentrated in a handful of large firms. So, this is one of the first things I think we need to keep in mind in the context of this discussion. You will hear about RegTech growing very fast as a sector, and all of the success stories, but the typical firm in the RegTech sector — we did our own research on this — has raised somewhere in the order of $1.5 million. Now, it sounds like a lot of money if you give it to me to buy a car or a house even. But how much runway does it buy a technology company? Like, less than a year. And to put it into further context, how long does it take from the moment, let’s say someone at the bank shakes your hand and says — well, they can’t shake your hand anymore, but you know, looks you in the eye virtually, and says “I love your product, we will definitely buy it” and the moment when you first see any money from them? Usually about 18 months. So, you have to put these two numbers together, like, how much runway do they have versus how long it takes for them to actually convert prospects to paying customers. So, most of this sector isn’t particularly successful financially. And so, the sector is kind of ripe for consolidation. Quite a few of these people are competing in very, very crowded segments. Also, of course, in our own research, what we’ve seen is that there was a golden era of new market entry between let’s say, 2013 and 2017. And the pace of market entry has slowed since then, quite significantly. 
So, this sector is now growing more from the center than from the margins — so, big firms getting bigger, as opposed to new firms joining.
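The runway point above is simple arithmetic, and a rough Python sketch makes the gap explicit. The $1.5 million raise and the 18-month sales cycle are the figures quoted in the conversation; the monthly burn rate is a purely hypothetical assumption, chosen only so that the runway comes out at "less than a year".

```python
def runway_months(funds_raised: float, monthly_burn: float) -> float:
    """Months a firm can operate before its cash runs out."""
    return funds_raised / monthly_burn


FUNDS_RAISED = 1_500_000        # typical raise quoted in the conversation (USD)
SALES_CYCLE_MONTHS = 18         # handshake-to-first-payment time quoted in the conversation
ASSUMED_MONTHLY_BURN = 150_000  # hypothetical assumption, not a figure from the interview

runway = runway_months(FUNDS_RAISED, ASSUMED_MONTHLY_BURN)
funding_gap_months = SALES_CYCLE_MONTHS - runway

print(f"Runway: {runway:.0f} months")
print(f"Months still to cover before the first payment arrives: {funding_gap_months:.0f}")
```

Under that assumed burn rate the firm runs out of cash well before a deal agreed on day one starts paying, which is the squeeze being described.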
Manos: Now, to your question, though, the actual question was, you know, are they solving this problem? I think the first thing to bear in mind is that the sector has been around for like 20, 30 years, depending on how you define it. So, you know, you had regulatory intelligence applications 20 years ago, you had BPM and GRC applications 20 years ago; they’ve evolved since then, yes, but the fundamental kind of offerings were already being imagined at the time. What firms are now much better able to do, I would say, is, first of all, they can scale a lot faster and deal with smaller institutions because their services can be delivered through the cloud and by APIs. It’s much easier for them to work together, so, hooking up different applications via APIs is now much more realistic than it used to be. And so, what that means is that ideally — and we’ll have to come back to this point — you know, no one firm has to build everything, end to end your entire kind of compliance factory. So, that obviously helps. But there are areas where RegTech has yet to make a significant impact. If you try to map where most of the effort has gone — AML, reporting, risk particularly on the prudential side — between those three areas you’ve probably captured 80–90% of the activity that we’ve seen; probably a lot more if you count it by funds raised. And then there are other areas, notably on conduct, for instance, that are kind of less tangible and quantitative areas of compliance, where, you know, you don’t see the same level of success. And, of course, even where the RegTech sector is making inroads — good on them — you still have to ask yourself, how much success do we have to show for it? So, in the AML space, every year there’s a new estimate of what percentage of the illegal flows of funds are actually intercepted by AML controls. And it’s usually in the low single digits. So, you know, you have to keep wondering, like, is this really the best we can do?
[00:19:02.21] Ben: And listening to you, it sounds a bit like, you know, even though lots of money has gone into this space, and accepting that, you know, most of it has flowed to a few big firms, rather than the long tail of smaller suppliers, it sounds like there’s still a lot of duplication of activities in this space, and also potentially, like, there’s not complete coverage of the regulatory space, i.e. people keep shooting, I guess, for the areas with the largest addressable market. So, would you say that those are two of the challenges that still persist, that the RegTech community is still duplicating a lot of its own efforts, as well as, you know, perhaps doesn’t have complete coverage yet of all the areas of regulatory compliance?
Manos: Absolutely. And I’m not sure that any one firm has a particularly good overview of its entire competitive environment, just because so many people are trying this and many of them are still under the radar unless they’ve done two or three funding rounds and you start seeing kind of headlines about them. But I think it’s also important to say that compliance, in general, involves a colossal duplication of effort. If you think about it, the regulations are the regulations. They are what they are. But there’s thousands of financial services firms, each developing their own mapping of rules, you know, against their own internal systems. And you think, “Well, how much of that is duplicating effort? And is there really a business reason to duplicate this for each firm to do it on its own?” Because compliance in itself does not confer a competitive advantage. Being able to manage risk better does. Being able to understand customers better does, of course, so there are some things that firms will always want to keep close to their chest. But compliance in itself does not. So the duplication is quite substantial and not very rational.
[00:20:54.08] Ben: In terms of technology change, you mentioned cloud, you mentioned APIs. What about AI? Because it seems to me that one big area of potential improvement here is to train models… You know, you can imagine this particularly in the case of financial crime, for example, where, you know, many actors contribute information about financial crime and one provider can train the best models and can give the best predictive analysis about where financial crime might arise, or stop financial crime based on patterns seen in the past. So, are we seeing much innovation and headway being made thanks to AI in this space?
Manos: We are. And I guess we’d better because the amount of processing power we can leverage these days is colossal. So, you know, in the first AI spring, in the ’50s and ’60s — I’m not reminiscing, I wasn’t there — back then it would take about seven minutes for a computer to parse one sentence or one paragraph worth of text. And now we can do, like, billions of them in the same amount of time. You know, obviously, that helps. Having said this, applications of AI mostly end up with a trade-off. So, think of it a little bit like an industrial process, where, because at the end of the day, most of the applications of AI that you’ll see in compliance come down to statistical models. You’ve got error rates, you’ve got false positives, you’ve got false negatives. And the whole kind of quality assurance process is around saying, “Well, how many false positives and false negatives can we tolerate?” And particularly, like, “How many false negatives can we tolerate?” Because that’s where you get fined or put in jail. And so, usually, what happens is firms, certainly in compliance, are very, very reluctant to accept that there will be a consistent level of errors in a compliance process, particularly around things like AML. And so, you know, many will seek a level of certainty that is just not possible. Some of them will tolerate redundancies and duplication, just to make sure that they are covered. And particularly in the larger firms, often you will have a duplication internally. If you’re a tier-one bank, there is actually a decent chance that you’ve licensed software that duplicates things you’ve built in-house, that you have licensed software from two different people that overlap. So, the strategy around incorporating AI in this area is still not fully fleshed out.
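The trade-off between false positives and false negatives described above can be made concrete with a small Python sketch. Everything here is hypothetical: the scores, the labels, and the thresholds are invented purely to show how moving an alert threshold shifts errors from one type to the other. It is not a description of any real AML system.

```python
# Hypothetical output of a screening model: (risk score, truly suspicious?).
# All values are invented for illustration.
SCORED_CASES = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]


def error_counts(cases, threshold):
    """Return (false_positives, false_negatives) when flagging score >= threshold."""
    false_positives = sum(
        1 for score, suspicious in cases if score >= threshold and not suspicious
    )
    false_negatives = sum(
        1 for score, suspicious in cases if score < threshold and suspicious
    )
    return false_positives, false_negatives


for threshold in (0.3, 0.5, 0.7):
    fp, fn = error_counts(SCORED_CASES, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With this toy data, lowering the threshold removes the missed cases but adds false alarms for humans to review, and raising it does the opposite. No setting gives zero of both, which is the "level of certainty that is just not possible".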
[00:23:41.02] Ben: What about this whole area of machine-executable regulation? So, you know, certainly, I’ve been reading about a lot of companies that are working on, you know, basically turning regulation into code, which can then be executed by the machine. And this seems, you know, at least prima facie, like, this is the most elegant solution to this problem, right? Because if regulators can put out very precise regulations, and they can be turned into code, not only can that code then be executed immediately, but it will be executed exactly as the regulator intended to be executed. So that seems like the holy grail here, would you agree? And do you believe that this is realistic and that we’re making progress in this direction?
Manos: I mean, it is the holy grail. And it’s interesting because it’s one area where software developers and lawyers kind of meet in the middle. Both sides think like machines. They want very precise and consistently worded inputs and outputs. But in reality, most regulation doesn’t work that way. So, the hype around machine-readable, machine-executable regulation is what it is because some of the earliest use cases for RegTech and SupTech are around reporting. And reporting use cases involve heavily standardized data — I say heavily standardized, but if you see them upfront in their raw form, they’re not always that good, but they involve much more standardized and much more quantitative data, more structured data as well, than most other RegTech use cases. So, if you’re only really interested in reporting and adjacent use cases, actually machine-readable and machine-executable regulation will happen. You know, it’s already happening in some domains, and it will happen in most others. Enormous amounts of money, enormous amounts of attention, and standard-setting effort have gone into those. But then there is a lot of regulation where this level of standardization, of quantification and of structure just doesn’t exist, partly because that’s not how it’s been designed and it’s very expensive to redesign it from scratch, but partly because regulators want it that way, or legislators want it that way.
Manos: So, to give you an example that’s close to my experience: let’s say consumer credit regulations in the UK do not include any indication of what criteria somebody should meet in order to get a loan. Not because they couldn’t come up with, you know, a good sense of what creditworthiness looks like, but because legislators and regulators want firms to have the flexibility to come up with their own answer to the question. In other cases, the point isn’t flexibility, but responsibility. So, very often, what the regulator wants is for the onus to be firmly on the firm to find a way to reassure the regulator that the outcomes are as the regulator expects. And so, you can imagine a situation at the limit of this road towards machine-readable, machine-executable regulation where the regulator just releases their code and they say, “Okay, plug this in, connect it to your data lakes, and out will come compliant outcomes.” If something goes wrong, who’s to blame? The only person left to blame now is the regulator. That’s not a very comfortable place to be, certainly not if you’re an independent regulator. Like, if you become a sandwich between industry and government, that’s the sort of thing that would end up with the regulator being crushed. So, there will be a natural resistance in some areas of regulation against this level of mechanization. But even in reporting where this is supposed to work well, you know, if you hear the noises coming out of some of the kind of leading regulators in the world — not least the FCA here in the UK — what you will hear is that there’s enormous amounts of data standardization that needs to be done before the promise of even that use case — which is the most promising RegTech use case of all — can be fulfilled. So I’m skeptical about the pace at which we can move towards machine-readable and machine-executable regulation, where we treat regulation as code.
Manos: Now the opposite, which does work, but is more human in the way that it does work, is treating regulation as content where we say the regulatory language is what it is and the job of RegTech isn’t really to turn it into push-button executable code, but rather to turn it into workflows and business rules. And so, the idea is that to get from the messy regulatory language to something that humans can work with, you have to have some kind of mental map of what regulations are out there, a kind of taxonomy of regulatory obligations and concepts. That’s one side. And you have to have a corresponding mental map of what the firm looks like — what matters to the firm. So a firm doesn’t see itself as a collection of compliance obligations. It sees itself as a collection of products and functions and locations, and yes, even processes and controls and policies, and so on, and so forth. So, you have to have both of those maps, and then get them to talk to each other — so create linkages between the two sides of the equation. If you’ve done that, then effectively you can get either one application or multiple applications talking to each other by APIs to do this interesting kind of relay of regulatory content. So regulatory content comes in, it gets labeled according to where it has to go, what it’s related to, and then it’s passed on to the appropriate application, to the appropriate subject matter owner with an instruction that implies what kind of workflow is expected afterward. So, that’s messier, it’s more human, but for the same reasons, it’s bulletproof. Eventually, someone will make sure that the system works. Whereas end-to-end machine-readable and machine-executable regulation will usually break down.
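The "two maps plus linkages" idea described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the taxonomy entries, team names, and workflow text are all hypothetical, and simple keyword matching stands in for the NLP classifiers a real regulatory intelligence tool would use.

```python
from dataclasses import dataclass

# Map 1 (hypothetical): a tiny taxonomy of regulatory topics and trigger phrases.
REGULATORY_TAXONOMY = {
    "aml": ["money laundering", "customer due diligence"],
    "reporting": ["regulatory return", "transaction report"],
}

# Map 2 (hypothetical): how the firm sees itself, here as owners of functions.
# Keying both maps by topic is the "linkage" between the two sides.
FIRM_OWNERS = {
    "aml": "Financial Crime team",
    "reporting": "Regulatory Reporting team",
}


@dataclass
class RoutedItem:
    topic: str     # label from the regulatory taxonomy
    owner: str     # subject matter owner inside the firm
    workflow: str  # instruction implying what should happen next


def label(text: str) -> list[str]:
    """Label incoming regulatory text with taxonomy topics (keyword stand-in)."""
    lowered = text.lower()
    return [
        topic
        for topic, phrases in REGULATORY_TAXONOMY.items()
        if any(phrase in lowered for phrase in phrases)
    ]


def route(text: str) -> list[RoutedItem]:
    """Pass labeled content to the matching owner with an expected workflow."""
    return [
        RoutedItem(topic, FIRM_OWNERS[topic], "review and update controls")
        for topic in label(text)
    ]


update = "Firms must apply customer due diligence before onboarding."
for item in route(update):
    print(f"{item.topic} -> {item.owner}: {item.workflow}")
```

Text that matches no topic comes back as an empty list. In the spirit of the "messier, more human" approach, that is where a person would pick the item up rather than the system guessing.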
[00:30:19.28] Ben: You know, if we think about the idea of machine-executable regulation as being… You know, if we were to be on the Gartner Hype Cycle, it would probably say machine-executable regulation in brackets for reporting, right? And then it would be somewhere quite early in the hype cycle, because, you know, this is probably being hyped, and we’re going to go to the trough of disillusionment. Where are we with the alternative approach, which is, you know, using, I guess, AI and classifiers, and so on, to be able to classify regulatory text at scale, and to serve it up, as you said, into workflows. So this seems like the more promising approach and where are we in the hype cycle with that kind of bridge?
Manos: Just before we move on from machine-executable regulation, I think the key moment in the hype cycle for that would probably have been the FCA and Bank of England’s digital regulatory reporting pilot. So that was definitely a hype point in the hype cycle. And if you’ve read all of their lessons-learned reports, you actually feel yourself sliding down the hype cycle. It’s hard to read those and think, “Oh, this was a slam dunk.” But then you look at things like, you know, ISDA’s Common Domain Model that basically gives you a way of making both machine-readable and machine-executable a lot of the contract terms around derivatives. And you think, “Well, that’s quiet there. But actually, that seems to be working reasonably well.” And the whole kind of cause of machine-readable and executable regulation has been given a new lease of life with the Saudi-led G20 sandbox, which really is focused on these types of applications. So, you know, I think we’ve still got some time of hype left in the machine-executable side of things.
Manos: But as you said, I think there’s a lot more to be said for regulation as content and the other side or the less ambitious kind of side of RegTech. And there, I guess, the level of maturity is very good. So, when we looked at the market last — you can probably name something in the order of 25 to 30 platforms or tools that are in the regulatory intelligence space, that are really making significant headway in organizing regulation, according to their themes and topics and using things like natural language processing and machine learning to automate that so that they can read rule books at scale. Now, where you want to go eventually is that there’s one kind of virtual front end to every rule book in the world. We’re not there yet. But equally, I think, as long as you’re thinking of private standards only, we’re not that far either. I mean, there’s very significant work done and you can already name three or four firms that are way out ahead of anyone else — I won’t name them here. Now, what you don’t have, though, is some way of reconciling all these proprietary standards into one language of regulation. And that’s quite hard for someone on the purchasing side because what it means is, if you’ve done a lot of work to onboard one of these suppliers and mapped all of your internal systems and controls and processes to their dictionaries and their map of compliance, what then happens if you want to change the supplier? You know, or what has to happen if you want to onboard some other compliance application that needs to talk to that first one, but just doesn’t know the language? That’s the bit that we don’t yet have a very good answer for and there’s no clear kind of commercial incentive for firms to create that.
[00:34:18.08] Ben: Which is the segue into the Regulatory Genome Project, because that is at least partly a public good, right? And it’s aimed at solving exactly this problem of creating common standards and interoperability, right? At the level below commercial applications.
Manos: That’s correct. So let’s start with a little bit of background on the Regulatory Genome Project. So, at the CCAF, we were approached in 2017 by what is now Flourish Ventures and was then part of the Omidyar Network with a very specific use case. So these guys were impact investors, they invested in FinTechs mostly in emerging and frontier markets, that were kind of mission-driven to improve financial inclusion. And what they said was, “Look, our portfolio is doing quite well. But one of the things that usually gets in the way of growth, and manifests itself as a kind of growth plateau at a time that is not really helpful for our firms, is that if you want to grow beyond a certain point, then you have to expand at least on a regional basis.” So let’s say you start off in Kenya and you want to cover all of East Africa. Very reasonable. So, when the firms reach that stage in their development, it’s actually quite hard for them to grow because different markets, even within the same region, even if there’s a certain level of integration, have different rules. And so, a lot of time and money, and lawyers’ fees, have to go into making sure that you get market entry just right from a compliance basis. And there’s no obligation for regulators to be consistent with each other or to make life easy for you.
Manos: So, they came to us with that question, saying, “You know, you have access to resources at the university, you know, cutting-edge research on NLP, you know, machine learning engineers. Isn’t there something that you could build, that would parse regulation across jurisdictions and make it comparable?” And we thought at the time, well, look, this is a nice applied research program. Of course, we would be interested in looking into this. But what we found as we went along and created a pilot application and tested it, and saw that it worked reasonably well, was that we’d only covered one domain in this area. We came up with an AML model. We’d only covered one domain and anyone we tried to take this to as a potential user would say, “Well, what about this other area of application?” So they might say, “Okay, AML good. What about cyber? Or payments, great. But what about insurance?” And it seemed to us that we were going down this rabbit hole of mapping out all the regulations in the world in order to create this one product.
Manos: Obviously, there was also a kind of existential question — you know, the university isn’t really a RegTech vendor, we didn’t want to be permanently in the business of building applications. And it’s a busy space out there, right? Other people have done this longer, and they know this better. So, we thought, what is it that we feel is really needed? Is there a public good that our research can produce? Now, that is consistent with the mission of the university. And so, we thought of an analogy to, I guess, the life sciences. And, at the time, because we were dealing with people who had been involved in the Human Genome Project, it kind of triggered this thinking of, is what we’re trying to build really kind of parallel to the Human Genome Project? And is this pilot application we built something analogous to an application like 23andMe? And that kind of thinking became the genesis of what we now call the Regulatory Genome Project.
Manos: So, we basically thought we need to find a way to fund and resource and guide a long-term project that maps all regulation. And then, to make sure that it’s available to people truly as a public good, we have to not only make the marked-up rules, I should say — the classified rules — as open data or as near as open as we can make it, but also, we need to find a way to release some of the pent-up innovation out there, by allowing developers and firms to work on this map of regulation, this global map of regulation, and build their own applications. And that way, we don’t have to be, you know, the guys who build everything. We can tap into the creativity and technical skills out there.
Manos: I think what’s really important also, just to bear in mind is the skill sets on the two ends of this journey are just very different. So, building a map of regulation requires a certain amount of technical expertise in the areas of regulation, it requires very strong ties with regulators — which the university has. Whereas, building applications on what we call ‘the right-hand side’ of this journey requires very different skills and a deeper understanding of how the institutions work internally as organizations. So, what does it mean to keep the machine kind of running? And so, to expect somebody to cover all of that is actually quite hard. That means that most people who have innovative ideas in RegTech, either coming from one end or the other end, can’t really deliver the whole thing. So, I guess this is a long way of saying that the key principles behind the Genome Project are, first of all, regulations should be available in machine-readable form as a public good. This is stuff that firms are required to know, by law. They’re made with public money. There is no reason for it to not be open data in a machine-readable format. That’s principle number one. Principle number two is, all of this information must be available to developers in such a way that people can build applications around it. And finally — and this is a key point — both the representation of regulation and the resulting application need to be interoperable. You need to have one common language of regulation. It’s true, different jurisdictions regulate in different ways so, you’ll never get to the point where you say, “Well, this requirement in Brazil is exactly equivalent to that requirement in Mongolia.” But what you do have in the middle is a kind of regulatory Rosetta Stone that can map regulations from any given country against a common framework. Think about, I don’t know, the Dewey Decimal System, right? If you go into a library and you’re a librarian from anywhere in the world, of course, the books are going to be different, but you know that nonfiction is going to be there and you know that life sciences are going to be there. So, that’s the level of interoperability we want to get to.
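To make the "Rosetta Stone" idea a little more concrete, here is a minimal Python sketch of mapping local rules onto a shared set of codes, in the spirit of the Dewey Decimal analogy. Everything in it is an assumption made for illustration: the rule references, the codes such as `AML.CDD`, and the function names are hypothetical and are not taken from the Regulatory Genome Project.

```python
from collections import defaultdict

# (jurisdiction, local rule reference) -> common framework code.
# All entries are made-up placeholders, not real citations.
LOCAL_TO_COMMON = {
    ("BR", "Rule 12"): "AML.CDD",
    ("MN", "Article 5"): "AML.CDD",
    ("BR", "Rule 30"): "AML.REPORTING",
    ("MN", "Article 9"): "CYBER.INCIDENTS",
}


def by_common_code(mapping):
    """Index local rule references by common code, then by jurisdiction."""
    index = defaultdict(dict)
    for (jurisdiction, ref), code in mapping.items():
        index[code].setdefault(jurisdiction, []).append(ref)
    return index


def counterparts(mapping, jurisdiction, ref, other_jurisdiction):
    """Return rules in other_jurisdiction filed under the same common code.

    The rules are not claimed to be equivalent, only to sit on the same shelf.
    """
    code = mapping[(jurisdiction, ref)]
    return by_common_code(mapping)[code].get(other_jurisdiction, [])


print(counterparts(LOCAL_TO_COMMON, "BR", "Rule 12", "MN"))
print(counterparts(LOCAL_TO_COMMON, "BR", "Rule 30", "MN"))
```

The second lookup comes back empty, which reflects the caveat in the conversation: the common framework makes rules comparable where counterparts exist, but it does not guarantee that every jurisdiction has one.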
[00:41:27.18] Ben: And how do you get there? How do you sequence the genome of regulatory information?
Manos: So, let’s get as practical as we can. So, it starts with a paper exercise — I mean Excel exercise — whereby you create almost a hierarchical list of regulatory concepts and obligations. You usually do it by domain. So, you might say, “Here’s my taxonomy of AML concepts and obligations, here’s my taxonomy of cybersecurity, and so on and so forth.” And you know, some of these taxonomies are what you might call horizontal — they cut across the entire financial services industry, so the two examples I gave just now — some of these are vertical. So you might have payments, for instance, insurance, crowdfunding, which was one of the areas of the Center’s particular attention and expertise. And what you do is you create these hierarchical lists of obligations. So for instance, you might say, I don’t know, let’s say you’re dealing with investments, right? You might have client categorization and within that, the definition of an accredited or professional counterparty. You know, perhaps not the best example, but the point is that you always move from a higher level, more general obligations or families of obligations, to more specific ones. Now, at the end of each of these branches, if you will, you will have an end node. You will have the most detailed level of classification of regulations that the genome can manage.
Manos: Now, in theory, there is no limit. You can keep making them more specific, and more specific, and more specific. But remember, the genome as a public good is about making regulations comparable across jurisdictions. So there is a natural stopping rule. You want to stop at the point where the regulatory requirements at the end node are still comparable internationally. So, for instance, client categorization, yes, that’s comparable. You know, the distinction between professional slash accredited investors and more ordinary retail investors, yes, that’s comparable. But if you go all the way to saying, you know, ‘treatment of local authorities for the purposes of client categorization’, you are now getting so far into the weeds that you’re going to draw blanks for most jurisdictions. And then for everyone who’s subject to MiFID, you will just have this note that says, actually, in most cases, these people are retail clients. So you can guess what the stopping rule is. You go as many levels down as you can until you reach a point where international comparability is compromised. So, that’s how you build that.
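The hierarchy and stopping rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the jurisdiction codes, the coverage sets, and the 50% cutoff are all invented for the example, and the real project may well judge comparability in a different way.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Jurisdictions with a comparable rule at this level (illustrative values).
    jurisdictions: set = field(default_factory=set)
    children: list = field(default_factory=list)


def prune(node, all_jurisdictions, min_share=0.5):
    """Copy the tree, dropping children that too few jurisdictions can match."""
    kept = [
        prune(child, all_jurisdictions, min_share)
        for child in node.children
        if len(child.jurisdictions & all_jurisdictions) / len(all_jurisdictions) >= min_share
    ]
    return Node(node.name, set(node.jurisdictions), kept)


def end_nodes(node, path=()):
    """List the most detailed classifications as 'A > B > C' strings."""
    path = path + (node.name,)
    if not node.children:
        return [" > ".join(path)]
    found = []
    for child in node.children:
        found.extend(end_nodes(child, path))
    return found


ALL_JURISDICTIONS = {"UK", "EU", "US", "KE", "BR"}

tree = Node("Investments", set(ALL_JURISDICTIONS), [
    Node("Client categorization", set(ALL_JURISDICTIONS), [
        Node("Professional or accredited investor definition", {"UK", "EU", "US", "KE"}),
        # Too specific: only two of the five sample jurisdictions have a match.
        Node("Treatment of local authorities", {"UK", "EU"}),
    ]),
])

print(end_nodes(prune(tree, ALL_JURISDICTIONS)))
```

With these made-up coverage sets, the "treatment of local authorities" branch falls below the cutoff and is dropped, so the taxonomy stops at the investor definition.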
Manos: Up to this point, you’re still kind of in the paper world. You can still be doing that in Excel. But then, once you’re happy with the structure you have created, then you can start using machine learning. And machine learning relies basically on collecting large amounts of data from a diverse sample and teaching the machine that a specific example corresponds to a specific node. So, for instance, let’s say you have rules around credit worthiness assessments of consumer borrowers in different jurisdictions. You basically say to the machine, “This is a credit worthiness assessment-related obligation. This is as well. This is as well. This isn’t.” You repeat that over and over and over again until you can train basically a statistical model — which lives as code and we call ‘a classifier’ — so that model can now take in unfamiliar text, and take a stab at what category it fits into. So the next time around you feed regulatory text that you’ve never seen before to the same classifier, and it can say what the probability is that it is about credit worthiness, and you set yourself a cutoff and you say, “Well, if it’s above, let’s say, 70%, 80%, we’ll mark that as a one.” And so, what that does is, if you try to imagine now the machine-readable version of the same regulatory document, that paragraph or that piece of text now carries a tag, an electronic tag that says, “This corresponds to this type of obligation.” And any other application that knows the universe of tags that you’re working with — your taxonomy — can now read this and say, “Oh, okay. I know that this paragraph now is about this.” And that’s how you might be able, for instance, to run queries via an API; you might say, “Can you bring me all the text that’s tagged as credit worthiness assessment?”
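As a rough illustration of the train, score, tag and query loop described above, here is a self-contained Python sketch. It uses a tiny bag-of-words naive Bayes model as a stand-in; the interview only says "a statistical model", so the choice of algorithm, the training sentences, the tag name `creditworthiness_assessment`, and the 0.7 cutoff are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayesTagger:
    """Two-class bag-of-words naive Bayes with add-one smoothing."""

    def __init__(self, positives, negatives):
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        total = len(positives) + len(negatives)
        self.log_prior = {
            True: math.log(len(positives) / total),
            False: math.log(len(negatives) / total),
        }

    def _log_score(self, label, tokens):
        counts = self.counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        # Words never seen in training are ignored.
        return self.log_prior[label] + sum(
            math.log((counts[t] + 1) / denom) for t in tokens if t in self.vocab
        )

    def probability(self, text):
        """Estimated probability that the text belongs to the positive class."""
        tokens = tokenize(text)
        diff = self._log_score(False, tokens) - self._log_score(True, tokens)
        return 1 / (1 + math.exp(diff))


TAG = "creditworthiness_assessment"  # hypothetical taxonomy node
THRESHOLD = 0.7                      # cutoff chosen for illustration


def tag_paragraphs(paragraphs, tagger, tag=TAG, threshold=THRESHOLD):
    """Attach the tag to every paragraph scoring at or above the cutoff."""
    return [
        {"text": p, "tags": [tag] if tagger.probability(p) >= threshold else []}
        for p in paragraphs
    ]


def query(tagged, tag):
    """Return the text of every paragraph carrying the given tag."""
    return [item["text"] for item in tagged if tag in item["tags"]]


tagger = NaiveBayesTagger(
    positives=[
        "lender must assess the creditworthiness of the borrower before granting credit",
        "firms shall assess whether the consumer can afford to repay the loan",
        "a creditworthiness assessment is required before any credit agreement",
    ],
    negatives=[
        "firms must report suspicious transactions to the financial intelligence unit",
        "customer identity must be verified before opening an account",
        "records of transactions shall be kept for five years",
    ],
)

paragraphs = [
    "the lender shall assess the creditworthiness of the consumer",
    "suspicious transactions must be reported to the financial intelligence unit",
]
tagged = tag_paragraphs(paragraphs, tagger)
print(query(tagged, TAG))
```

Six training sentences are far too few for real use; the point is only to show how a probability, a cutoff, and a tag combine so that another application can ask for "all the text tagged as creditworthiness assessment".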
[00:46:13.17] Ben: How difficult is the tech there? It sounds almost like, you know, provided you train the classifiers with enough data, then the results will get better and better and better. So, would you say it’s more of a challenge to get the data than it is to get the tech, or am I oversimplifying?
Manos: It’s a good question. I mean, I don’t want to downplay how difficult it is to get the tech. Like, the colleagues who we have working on this are obviously at the top of their game. Having said that, the technology comes with its own significant challenges. What do I mean by that? You know, there isn’t an enormous amount of regulatory text out there. Now, this may sound really funny bearing in mind what I said earlier.
Ben: Yeah, the sevenfold increase you mentioned earlier. Yeah.
Manos: That’s true. But, you know, from a machine learning point of view, if you look at what kind of corpora people are working with to train machine learning models, they will usually use, you know, all of Twitter for the last three years, or, you know, the entire text of Wikipedia, or the entire internet if it comes to that. So, you know, in comparison to things like that, the amount of regulatory text out there is not enormous. And so, a lot of the challenge is around making sure you have enough samples to actually build good models. The other thing I guess, which people need to appreciate is that the returns to just having more samples start to diminish reasonably early. So, you know, the models don’t get exponentially better as you double or triple the amount of data you have access to.
Manos: Where this becomes really challenging is, first of all, when you look at really new or niche areas. So, let’s say tomorrow, you know, one of our regulators came up with a very, very specific type of obligation in relation to making, let’s say, AI auditable. So it says, “If you implement any AI applications as a firm, you have to make sure that they are auditable by a regulator” — whatever that means. You know, in the early days, only one regulator will have any references to that. So your sample is going to be tiny, right? That is a problem because it means your model runs the risk of having blind spots and you have to find ways of bootstrapping the small sample that you do have, in order to make sure that the classifiers work. I’m not saying that’s not possible, and obviously, my colleagues are working on things like that, but it is challenging. And it’s also challenging when you look at non-English texts because if you create a classifier for AML obligations written in English, that’s going to be completely useless if you’re reading documents in Spanish. But the problem is, if you want to replicate that process in Spanish, your corpus of documents now becomes a lot smaller. And Spanish is, you know, a major global language. Try doing that in Japanese, try doing that in less widely-used languages that are not the language of business for many people. That is another major issue in that area. But I guess the final issue, as always with these things — and I’ve mentioned it once already — is that, at the end of the day, there will be errors. And there’s a question of, you know, how much liability should the parties accept for these errors, and who does it sit with?
[00:49:45.07] Ben: If we move beyond the tech and the data — although I think this is a bit related to the data — to this idea of the chicken and egg problem because it’s not difficult to foresee a time when the genome exists and therefore if you’re a RegTech provider, you would build any new RegTech application on the genome because you then don’t need to do all of the mapping of taxonomies yourself. You can just query the public good, right? But between now and then, you’ve basically got to convince software providers to build on the genome, you’ve got to convince regulators to work with you, you’ve got to convince commercial users to use it. So, how do you go about building that ecosystem around the genome to make it successful in the first place? Or, in other words, how do you solve that chicken and egg problem?
Manos: So it’s a fair question. I mean, there is a place you can start, obviously, and it depends on where your relative strengths are. So if you look at other initiatives that have tried to kind of force some level of convergence within industry, they would usually have some strength in one area or the other. Now, if you’re talking about the university’s areas of expertise, obviously, because of our work in capacity building with financial regulators, that for us is the obvious place to start. So we’ve got very strong links to financial regulators around the world and we also know that they have a very strong use case around regulatory benchmarking. So, remember what we said earlier in this podcast that regulators are always checking their homework against the guy who sits next to them. And so, these benchmarking exercises are big painstaking things — expensive, very slow. I remember one regulator saying, “You know, if I had a tool that could do this, I would have nine months of my life back on just the last project.” Which was quite intense but I sympathize with that.
Manos: So the first people to reach out to are regulators. But regulators being involved gives confidence to financial services firms. And not just confidence in the quality of the taxonomies and the classifiers because frankly, regulators will never pull out a big rubber stamp and say, “I approve of this.” But what a firm can see is that if this is good enough for the regulator to use for their own use cases, then, you know, maybe this is good enough for us as well. I think — you know, as far as industry is concerned — this standard-setting process is also an opportunity to influence in the direction of the common good, in the sense that, of course, you know, no regulator is going to go to a consortium of firms and say, “How should I write my AML rules?” But giving them the tools to compare against their peers will usually give you, as a result, better regulation, because people will now have an evidence base on which to say, what is common practice? What is good practice? How do different things correlate with market outcomes or consumer outcomes? So, from an industry perspective, even though you can’t just lobby these people in a crude way, they have been given tools whereby, internally, they can come up with better outcomes for things that you care about. So that’s another reason why industry really, you know, ought to care about creating something like this.
Manos: And then, once you’ve got a few major banks, a few major fund managers, a few major insurers on board, as well as a developer platform through which you can access these assets, then, as a developer, it becomes quite reassuring to know that you can build on this standard because you’ve got the sense that whatever else happens, there are some people who are already on board, and will use applications or will build applications against that standard. So, your investment, your one-off investment in mapping all of your internal systems to this common denominator set will not be wasted. And, as a developer, that can be quite attractive, because the alternative is that every time you onboard a new major client, you have to do all sorts of ad-hoc fixes, so that your systems talk to theirs, which is, you know, expensive work that you’re not always going to get paid for because the client, as far as they’re concerned, pays for the actual result, not for the path you have to walk in order to make sure you can service them.
[00:54:16.15] Ben: So you’ve just launched the Genome Project, and you just started to try to recruit new members, new consortium members — the private sector, the regulated users of the genome. First of all, how is that going? And secondly, if I were a large financial institution, and I had, you know, significant resources to invest in RegTech, and as you say, already had many, many existing RegTech applications and suppliers, what would be the case you would make to join the consortium?
Manos: It is true. We have been in conversation with a number of major financial institutions starting with some of the larger ones, as you might imagine, for obvious reasons, which are now starting to yield results in the form of potential collaborations. Now, that activity is not going to end anytime soon, because, at the end of the day, you want as much of the industry onboard the consortium as possible. But once the first step of recruiting firms is significantly under way, then the work begins to build out the rest of the genome, and also to recruit developers and make sure that you raise awareness of the benefits of your platform and to build the kind of tools that will help developers build applications against the genome. So, there’s a significant kind of technology roadmap, there’s a significant business development roadmap, as well as, of course, the semantic roadmap whereby we’re actually creating the genome itself. So this is just the beginning. But we’re already seeing some of the first successes. Similarly, on the regulatory engagement side. So, you know, we’ve had our first few workshops with individuals from the regulatory community who are willing to dedicate their time to review and make suggestions to improve the various taxonomies. And so, you know, I’m quite confident that if we’re speaking again this time, next year, a significant percentage of financial regulation will have been mapped — and come 2022 we’ll be in a position where people can actually start building applications.
[00:56:31.28] Ben: If I’m a bank and I want to make this case internally — because I presume there’s a price point to join the consortium — how would you convince me, practically, that it makes sense?
Manos: Yeah. I guess it’s always a very different conversation when you’re dealing with a major financial institution that actually has done a fair amount of work in the RegTech space — and pretty much all of them have. If you speak to a tier-one bank, they have been bombarded with proposals from RegTechs, and even from potential consortiums as well. And so, I guess the way people will usually respond is, you know, “Why do I really need this sort of thing? I’ve already got fairly mature solutions in-house that I’m reasonably happy with. So where is the real kind of long-term strategic value?” And I guess there’s three layers to this. The first one has to do with how procurement works effectively. It’s great that you’ve got the supplier that you’re happy with. That’s amazing. However, what it also does is it locks you in because you’ve invested a significant amount adjusting your internal systems to fit with theirs, and particularly adjusting at the semantic level — so, making sure that all of your other applications speak the same language as the vendor and can map to the same taxonomies. Now, that’s usually a significant sunk cost. And so, a firm that wants to move away from a supplier relationship doesn’t actually have a lot of good options, because they’ll have to take on the cost of doing this all over again if they onboard somebody new. And it’s very unlikely that they’ll be able to get a startup, for instance, to do that work because the startup just doesn’t have the cash and the runway with which to do it. So you end up in a situation where you’ve got a significant supplier lock-in. And it shouldn’t really be the way that a major financial institution runs compliance technology. So, that’s one part of the answer.
Manos: The other part of the answer is that usually, even when you do have really good applications, they tend to be limited in scope. So they will either be limited to a few domains that they were originally built on. So let’s say, you know, anywhere in Europe or anywhere in firms that deal with Europe in any way, people will have built ad-hoc systems to deal with MiFID compliance, for instance. You can’t then repurpose that to deal with some new type of securities law that comes in 10 years down the line. If you’re lucky, maybe you have architected that way but most people will not have. So the benefit is that dealing with a kind of de facto standard, like the genome, as and when it becomes available, builds some longevity into the applications that you do build. And obviously, it’s not just scalability across domains. It’s also, are you able to serve jurisdictions that are not in the magic circle of jurisdictions that suppliers usually target? So if you think about what most applications can deal with, they can deal with EU, UK, US and Canada, Australia, Hong Kong, Singapore — that’s your magic circle. Beyond that, you know, here be dragons in many cases. So being able to have that same level of scalability and functionality beyond those core jurisdictions is a huge benefit.
Manos: And then finally — and I think this is where interoperability really comes into its own — is when you deal with suppliers or partners to whom you have to cascade regulatory obligations, or with which you are tied together in a compliance pipeline. So I’m thinking of things like, for instance, product governance, where the producer of a financial product and the distributor of a financial product are tied together in a set of obligations around, for instance, identifying what the target market of a product is, identifying any applicable risks, understanding what kind of uses the clients are supposed to have for these products, reporting on whether it is sold and distributed in the way that was envisaged. Now, all of that requires that information flows between two very different firms — you know, the distributor might be a huge bank or it might be an IFA; the producer will usually be a very substantial financial institution — but they can be very different is what I’m saying. Similar things happen, for instance, when you cascade obligations in the area of cybersecurity or cyber resilience, where the two organizations — the supplier, or vendor, and the buyer — are actually very different organizations. So, if you need their systems to talk to each other, you need some common denominator to map them against each other. Otherwise, you risk, again, that kind of lock-in that we talked about earlier with regards to suppliers. So, I think the bottom line here is even if you’ve already gone quite a way and had a lot of success in implementing RegTech within the organization, the appeal of interoperable applications and open standards, I think, should be quite significant.
[01:02:03.05] Ben: Let’s assume that you build this, it gets wide usage, you overcome the chicken and egg problem, then we can imagine the network effects — the flywheel of network effects — will really start to kick in. And you know, then you’ll be able to level the playing field between regulators, regulators will get better feedback to make better regulations, there’ll be fewer barriers to entry for new RegTech companies. And so, you’ll see this unleashing of new RegTech innovation. Firms will be able to comply with regulation more cost-effectively, more quickly. Would you describe that as the end state, the kind of collective good that will be created, or is there anything I’ve missed?
Manos: So, no, I think you’re mostly there. I mean, what I would expect to see if this whole thing works properly, is that in the end, there is a marketplace where firms can engage developers to work on the genome — you know, they don’t need to involve any of us in any way. But also, regulators can start writing regulation that is as machine-readable as possible. So, for instance, right now, there are standards like Akoma Ntoso for writing machine-readable documents at the document level. You know, you can do a lot better than that if you have a common standard for what is in an AML document or what might be in a cybersecurity document. At some point, once you’ve reached critical mass, you’ll start to penetrate a lot more deeply into how regulators do their work, and also a lot more deeply into how people build applications. And that, to me, is what success will really look like — that people start considering your standards at the outset of building their tools and applications.
Ben: Manos, thank you so much for coming on the show. It’s been great!
Manos: Thanks for having me! A real pleasure!