New York Summer Data Summit 2024
Stopping Data Leakage Before It Happens? Now, That’s What I Call ‘Poka-yoke’
Data leakage happens all too often. Many even see it as a ‘necessary evil’ in financial services, where so much information changes hands between so many people. But is that the case? Instead of mitigating this problem with error-prone, manual processes, why don’t we work toward preventing it in the first place?
Those are the questions that led Mizuho’s Japan Equities COO, Ren Kuroda, to build the Poka-yoke Bot, shown live at Symphony Innovate Asia 2019. Read on to see how the bot works to prevent data leakage and enable compliance teams.
Ren Kuroda, Japan Equities COO, Mizuho:
So: Poka-yoke versus information leakage. First, a self-introduction. I’ve spent more than half of my life in Japan. You can tell by the accent that I am actually American, but I have hardly lived in that country. Most of my career has been in finance. I actually started with the Japanese government, and I thought working for the government would teach me something about bureaucracy. Now I work at a Japanese bank and I’m learning a whole bunch of new things. I’ve spent most of my time in IT and operations. I started at Morgan Stanley, where I was a software engineer, and I only mention that because I wrote all of the software I’m going to demo today myself. So if it all works, it’s because I’m really good. And if it all fails, it’s because I haven’t done this professionally in many years. After that I was at Deutsche Bank, in the business management and COO team.
After that was Saxo Bank, where I was the COO of the Japan office, building out the OTC derivatives and listed foreign securities and foreign equities businesses. So I have a lot of experience across global markets. I also spent a few years on the global banking side doing equity capital markets and investment banking. So I have a very, very broad and very shallow knowledge of just about everything in sales and trading and investment banking. And again, my entire career has been in Japan. So I have very deep knowledge about the financial instruments and securities business in Japan, and I don’t know anything about anywhere else in the world.
So, Poka-yoke. Okay, there’s a long definition up on the screen, but basically I like to define Poka-yoke as making the correct inevitable and the wrong impossible. And if you think about how our industry operates right now, think about how trivially easy it is to mis-send an email, how simple it is to do absolutely the wrong thing, to mistype a price into a trading application. We have an entire industry that makes doing the wrong thing easy and doing the right thing really, really hard.
And this is not a good way to run a business that is as heavily regulated as securities. I’ve been dealing with this, as many of you have as well, for many, many years. Every time I bring up this topic, everybody understands how obvious it is, but nobody can really figure out how to solve it. And so, as an example, everybody asks, “Well, why are we doing this? Do we want to spark joy? Does this make us happy?” No, we want to make money. That’s it, right? We’re not going to mess around here.
You can either save costs, increase revenue, or control things better: better governance, reduced risk. And those all translate directly into making more money. So I don’t preach this because I think it’s a wonderful thing to do and it’s morally correct. I do this because we’re in the business of making money, and this is how you make money. So I have this graph, and I’ll tell you a little story about it. I’m not going to explain the details, but this is the number of data leakage emails per month for a year. I come from a Lean Six Sigma background, so I like statistics. I like data. I like to make decisions based on facts and reality, not on what I think is good, but on what I know is correct. So after I had been at Mizuho for about six months, I sat down with my compliance department.
And I said, “Okay, but statistically, you have a bell-curve distribution of errors per month, and you have a location, which is the middle of the bell curve. The location for our errors is 95, and the standard deviation is five.” So in statistical terms, anything between roughly 90 and 100 is the same thing. It could just be noise. We could’ve just miscounted.
So I said, “Actually, the difference between 97 and 94 is statistically insignificant. You can’t actually say that things got better.” And he sort of sat back in his chair and breathed through his teeth, as old Japanese men tend to do. Then he leaned back and put his elbows on the table. He said, “But Kuroda-san, 94 is smaller than 97.” Well, mathematically it is; statistically it isn’t. We had this conversation for an hour. Needless to say, I failed.
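The argument about 97 versus 94 can be checked with a short calculation. The sketch below is illustrative only: it assumes each monthly count is an independent draw with the standard deviation of 5 quoted above, and the helper names `z_score_of_difference` and `is_significant` are hypothetical, not part of any tool used at Mizuho.

```python
import math


def z_score_of_difference(a: float, b: float, sd: float) -> float:
    """z-score of the gap between two independent observations with the same standard deviation."""
    # The difference of two independent draws has variance sd**2 + sd**2,
    # so its standard deviation is sd * sqrt(2).
    sd_of_difference = math.sqrt(2) * sd
    return (a - b) / sd_of_difference


def is_significant(a: float, b: float, sd: float, z_critical: float = 1.96) -> bool:
    """Two-sided test at roughly the 5% level (normal approximation)."""
    return abs(z_score_of_difference(a, b, sd)) > z_critical


z = z_score_of_difference(97, 94, sd=5)
print(f"z = {z:.2f}, significant: {is_significant(97, 94, sd=5)}")
# prints: z = 0.42, significant: False
```

Under the same assumptions, a month-to-month drop would need to exceed about 1.96 × 5 × √2 ≈ 13.9 errors before it could reasonably be called an improvement rather than noise.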
But the point is that over the past couple of years we have implemented a number of measures to prevent data leakage. Think of the amount of pain and suffering all of the employees at Mizuho have to go through: doing online training, learning the internal rules; if you send an email that crosses a firewall, you have to put this in the subject line; and if you’re going to attach a file, you have to encrypt it with a password like this, and all these things.
They don’t work. They don’t work. Now, to his credit, the number of data leakage emails per month has not gone up, but it also hasn’t gone down. And I am in the business of control. I want better governance. I want this number to go down, and it’s just flat.
So when you talk about the complexity of working for a Japanese bank, one of the reasons this is so hard, and I think this is possibly unique to Japan, is this Rubik’s cube, and I give credit again for making this image. We have different functions across the bank: sales and trading, investment banking. We have different geographies: we’re located in Japan, and we have offices in Singapore, the US, and London. And then we have the different entities: an asset management entity, a corporate retail bank entity, a securities entity. So in the Japan region, I work in the securities entity, in the sales and trading function. And somebody who works in the US in investment banking, in equity capital markets, is not only across an entity firewall from me but across a Chinese wall as well, because they’re the insider and I’m on the public side.
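The “Rubik’s cube” can be thought of as coordinates along function, geography, and entity, plus which side of the Chinese wall a person sits on. The sketch below is a hypothetical model under that assumption; the `Position` type, the `barriers_crossed` helper, and the entity labels are illustrative names, and only the two barriers named in the talk (the entity firewall and the Chinese wall) are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Where a person sits in the cube (all labels are hypothetical)."""
    function: str       # e.g. "sales_and_trading", "investment_banking"
    geography: str      # e.g. "Japan", "Singapore", "US", "London"
    entity: str         # e.g. "securities", "bank", "asset_management"
    private_side: bool  # True for insiders, e.g. an ECM deal team


def barriers_crossed(sender: Position, receiver: Position) -> list[str]:
    """List the information barriers between two people."""
    crossed = []
    if sender.entity != receiver.entity:
        crossed.append("entity firewall")
    if sender.private_side != receiver.private_side:
        crossed.append("Chinese wall")
    return crossed


# The example from the talk; "us_entity" is a placeholder label.
japan_sales_trader = Position("sales_and_trading", "Japan", "securities", private_side=False)
us_ecm_banker = Position("investment_banking", "US", "us_entity", private_side=True)

print(barriers_crossed(japan_sales_trader, us_ecm_banker))
# prints: ['entity firewall', 'Chinese wall']
```

Function and geography are kept as coordinates but not treated as barriers here, since the talk does not say they are; a real policy might add rules for them.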
So crossing these information barriers is very complicated, and let’s think about the words we use to talk about this. It’s an information barrier. It is there to prevent the flow of information. This is not how you operate a modern sales and trading infrastructure or a securities firm. You need to communicate. We need to share information. All day today we have seen all kinds of really cool RFQ systems, all based on one thing: sharing information. What is your bid on this? Oh, here’s a quote. If we cannot communicate, we cannot run our business. And the entire compliance department of pretty much every bank represented in this room is about building walls, building walls. Let’s stop building walls. Let’s build pipes. Let’s not make information barriers. Let’s make information conduits. How do we connect the right people? So instead of saying, “I’ll stop you from communicating with the wrong person,” why don’t we say, “I’ll make sure you can talk to her, because you need to communicate”?
So this is about changing the way we think about how we manage information and what data leakage means. Because I’ve actually had a conversation with a person in compliance who said, “If we never trade, we’ll never make an error.” Factual, right? 100% correct. Utterly useless. That is not how we govern our business. That is not good control. Having huge walls and never communicating does not help.
So as an example of how easy it is to do the wrong thing and how hard it is to do the right thing, here is a horrendous UI from the mid-nineties, which I have obfuscated. This is what it takes to send an email attachment outside of my securities firm. You attach the Excel file or whatever in Outlook, and you get a pop-up. It takes nine clicks. I counted them. Nine clicks to go through all these pop-up screens. There are four options that I must check. And by the way, if I uncheck them, it’s irrelevant: the options are ignored, which makes them not options, and that’s a whole UI issue we would have to get into. So: four non-optional options, nine clicks, and a UI literally from last century, designed in 1995, just to do the right thing.
This is just to send an email attachment properly, with the correct encryption and password protection, so that I don’t mis-send it and cause data leakage. This is not Poka-yoke. This is not making the right thing inevitable. This is completely wrong. And by the way, every time we screw this up, compliance says, “Oh well, no, everybody should do the online training again to remember to do the nine clicks with the four options.” And it’s a 25-minute training with a 20-minute quiz at the end, on which you have to get 100% of the questions correct or you don’t pass.
And once we’ve done the training, we go back to the regulators and say, “Everybody did the training, so we all get the rule. It won’t happen again.” Yes, it will. It happens every month. The numbers never change. So when we talk about data loss prevention, we have this dichotomy between detection and prevention, and between low efficiency and high efficiency. And right now, at my bank, we live in this world down here, where we do post facto detection very, very inefficiently. What does that mean?
It means that every day we take a batch of all communications from three days ago, because it takes three days to batch everything into a compressed journal file, which gets uploaded into an ancient system. And by ancient, I mean it’s a system older than me, and that system uses very, very bad regex grepping for one and only one keyword at a time.
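To illustrate why single-keyword grepping is weak, here is a sketch comparing a hypothetical stand-in for the legacy approach (a separate case-sensitive substring scan per keyword) with a single case-insensitive pass over one combined pattern with word boundaries. Neither function describes the bank’s actual system; the watch list and messages are made-up examples, and this is still keyword matching, not the natural language understanding the talk goes on to call for.

```python
import re


def one_keyword_per_pass(messages: list[str], keywords: list[str]) -> list[tuple[int, str]]:
    """Hypothetical legacy style: a separate case-sensitive substring scan for each keyword."""
    hits = []
    for keyword in keywords:
        pattern = re.compile(re.escape(keyword))
        for index, message in enumerate(messages):
            if pattern.search(message):
                hits.append((index, keyword))
    return hits


def single_pass(messages: list[str], keywords: list[str]) -> list[tuple[int, str]]:
    """One combined, case-insensitive pattern with word boundaries, applied once per message."""
    if not keywords:
        return []
    # Longest names first, so a longer watch-list name wins over a shorter prefix of it.
    alternation = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    hits = []
    for index, message in enumerate(messages):
        for match in pattern.finditer(message):
            hits.append((index, match.group(0)))
    return hits


watch_list = ["Sony"]
journal = [
    "Here is that secret Sony info",
    "Lunch with Sonya tomorrow?",
    "SONY results look strong",
]

print(one_keyword_per_pass(journal, watch_list))  # [(0, 'Sony'), (1, 'Sony')]
print(single_pass(journal, watch_list))           # [(0, 'Sony'), (2, 'SONY')]
```

The legacy-style scan flags “Sonya” as a false positive and misses the upper-case “SONY”; the single pass avoids both and scans each message once, however long the watch list is.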
And then we detect that three days ago you mis-sent an email with some information to the wrong person. This is not ideal. So what we want to do is move up and to the right. Everything, all good charts, go up and to the right. We want to go from low efficiency to high efficiency. We want to be able to do better searches. We want to be able to use natural language understanding. We want strong search using real-time data, not batch data. So we want to move up in efficiency. We also want to move from detection to prevention. We don’t want to know we made a mistake after we made it. We want to not make the mistake in the first place. So when you sense or detect somebody about to do the wrong thing, stop it.
And this is very obvious. I put this up and ask, “Where do you want to be on this chart?” Everybody wants to go up and to the right. It’s obvious. So how come none of us do this? How come every solution we have is, “Well, I’ll take all the data I’ve got, all my trades from today, after the market closes and all the trades are done. I’ll put them all together and then I’ll reconcile them to see if I made any mistakes.” Why don’t you check the prices and do the confirmation as you’re matching the bid and the offer, in real time? As we’ve seen today, a lot of these RFQ systems do just that. So we’re moving up and to the right, trying to get to strong search on live messages.
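Checking at entry time rather than reconciling after the close can be as simple as a price-band check before an order is accepted, which would also catch the mistyped price mentioned earlier. This is a hypothetical sketch: the `Order` type, the 5% band, and the use of a single reference price (for example, the last traded price) are assumptions, not rules from the talk or from any exchange.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    price: float
    quantity: int


class PriceBandError(ValueError):
    """Raised when an order price is too far from the reference price."""


def within_band(price: float, reference_price: float, band: float = 0.05) -> bool:
    """True if price is within +/- band (as a fraction) of the reference price."""
    return abs(price - reference_price) / reference_price <= band


def accept_order(order: Order, reference_price: float, band: float = 0.05) -> Order:
    """Reject a suspicious order at entry time instead of finding it at end-of-day reconciliation."""
    if not within_band(order.price, reference_price, band):
        raise PriceBandError(
            f"{order.symbol}: price {order.price} is outside +/-{band:.0%} of {reference_price}"
        )
    return order


accept_order(Order("XYZ", 12500.0, 100), reference_price=12480.0)  # accepted
try:
    accept_order(Order("XYZ", 125000.0, 100), reference_price=12480.0)  # extra zero typed
except PriceBandError as err:
    print(err)
```

The design choice is the Poka-yoke one: the wrong price is stopped before it exists as a trade, rather than discovered later in a batch.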
So I’ve built a proof of concept that tries to do this on messaging, and the goal is: do not allow forbidden messages.
What is a forbidden message? It’s very easy to define: a forbidden message includes information that shouldn’t be shared with the person to whom you’re sending it. So the easiest definition is that we have a watch list. It comes out of the compliance department, and it is generated from all of the names for which we have insider information. So if we’re doing an underwriting in capital markets for Sony, Sony is on the watch list. The basic idea is that if somebody attempts to send information where it doesn’t belong, the bot stops them. So we check the contents of the message against the restricted list. If the message is benign and there’s no problem, let it go. There’s nothing to do. You don’t need to approve every message. And if the message is suspect, hold it for some additional action, which in this case is pending compliance approval: asking compliance to check the message.
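The check-and-route rule described above can be sketched as follows. This is a minimal illustration, not the Poka-yoke Bot’s actual code, which the talk does not show: it assumes the watch list is a list of issuer names, that attachments are checked only by file name, and that `forward` and `hold_for_compliance` are callbacks supplied by the messaging layer. All type and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    BENIGN = "benign"
    SUSPECT = "suspect"


@dataclass
class Message:
    sender: str
    recipient: str
    text: str
    attachment_names: list[str] = field(default_factory=list)


def classify(message: Message, watch_list: list[str]) -> Verdict:
    """SUSPECT if the text or any attachment name mentions a watch-listed name."""
    if not watch_list:
        return Verdict.BENIGN
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in watch_list) + r")\b",
        re.IGNORECASE,
    )
    # Treat underscores in file names as separators, since \b counts "_" as part of a word.
    texts = [message.text] + [name.replace("_", " ") for name in message.attachment_names]
    if any(pattern.search(t) for t in texts):
        return Verdict.SUSPECT
    return Verdict.BENIGN


def route(
    message: Message,
    watch_list: list[str],
    forward: Callable[[Message], None],
    hold_for_compliance: Callable[[Message], None],
) -> Verdict:
    """Benign messages go straight through; suspect ones wait for compliance."""
    verdict = classify(message, watch_list)
    if verdict is Verdict.BENIGN:
        forward(message)
    else:
        hold_for_compliance(message)
    return verdict


watch_list = ["Sony"]
lunch = Message("Susan", "Roland", "Hi Roland, how are you? Lunch tomorrow?")
leak = Message("Susan", "Roland", "Here is that secret Sony info", ["secret_sony_info.pdf"])

forwarded, held = [], []
route(lunch, watch_list, forwarded.append, held.append)
route(leak, watch_list, forwarded.append, held.append)
print(len(forwarded), len(held))  # 1 1
```

Benign traffic is untouched, which matches the goal of no impact on workflow; only the suspect message costs anyone time.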
So let’s see what that actually looks like. If we could switch to my laptop, we’re going to try this live. I’m supposed to… No, that’s all right. Sorry. Yeah. So my laptop is asleep. Please wake up.
So the way I’m going to demonstrate this is with three users: Chris in compliance, Susan the sender, and Roland the receiver. Susan is going to send a message to Roland, but she’s not allowed to send it directly to him. So she’s going to send it through the Poka-yoke bot to Roland. There we go. And she’s going to say, “Hi Roland, how are you? Lunch tomorrow?”
And the bot died because I lost my internet connection. So I’ll restart the bot as I talk through this. Sorry about that. This is what happens when we demo live. Like I said, I wrote all this software myself, and it’s running on a Google Cloud instance, and, as the speaker an hour ago suggested, I probably should’ve put it on Amazon Web Services and let it run better. That’s how it goes.
But let me just connect here, and then you can watch as I fire this thing up in real time. And this is actually one of the really cool things about these bots: you can see in real time what’s going on. So I’m just going to pop over here and fire this thing up. Oops. This is the fun part, where you get to watch me type how I start up my bot. It’s written in Python, by the way. So I’m going to activate this guy, and then I’m going to fire it up: Python 3, Demo Bot. And I’m going to run it in debug mode, because I don’t actually trust it to work. There we go. Now the bot’s up. So I’m going to go back here and send a message to Roland.
Come on, Roland, where are you? Come on. Thank you. “Hi, how about lunch tomorrow?” So again, a totally benign message. There should be no problem with it, and the message is forwarded. So if we go over here to Roland, we can see that the Poka-yoke bot has forwarded him a message from Susan: “How about lunch tomorrow?” So what happened here? A totally benign message. The bot said, “Message looks fine, send it along.” So no impact on workflow, no impact on messaging.
What if we send something that isn’t okay? So again, I’m going to send a message to Roland, and I’m going to say, “Here is that secret Sony info.” Just to make it really obvious, I’m going to attach the secret Sony info, so here’s the PDF of it. I’m going to attach that to the message and shoot it off to the bot.
The bot gets the message and says, “Yeah, that’s pending approval, because I suspected something.” So as a user I can ask, well, what is the status of my messages? I can see that my latest message, sent right here, is pending approval. And if I go over to Chris in compliance, we see that he has a “please approve the pending message.” Okay, well, what is the pending message? Show me the pending messages. Oh, there’s number 19, just sent. Let’s view that message and see what’s going on. So it looks like Susan is trying to send that secret Sony info to Roland, with the attachment. Yeah, I’ll allow that. We’ll go ahead and approve it, because it makes for a more interesting demo. So I’m going to go ahead and approve that. And the message is approved and has been forwarded.
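The compliance side of the demo (checking a message’s status, listing pending messages, approving one) can be modeled as a small queue. This sketch is an assumption about how such a queue might work, not the bot’s implementation; `ApprovalQueue`, its methods, and the sequential integer IDs are hypothetical, and a real system would persist the queue and record who approved what.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending approval"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class HeldMessage:
    msg_id: int
    sender: str
    recipient: str
    text: str
    status: Status = Status.PENDING


class ApprovalQueue:
    def __init__(self, forward: Callable[[HeldMessage], None]) -> None:
        self._forward = forward  # delivers an approved message to its recipient
        self._messages: dict[int, HeldMessage] = {}
        self._next_id = itertools.count(1)

    def hold(self, sender: str, recipient: str, text: str) -> int:
        """Park a suspect message and return its ID."""
        msg_id = next(self._next_id)
        self._messages[msg_id] = HeldMessage(msg_id, sender, recipient, text)
        return msg_id

    def status(self, msg_id: int) -> Status:
        """What the sender sees when asking about a message."""
        return self._messages[msg_id].status

    def pending(self) -> list[HeldMessage]:
        """What compliance sees when asking for pending messages."""
        return [m for m in self._messages.values() if m.status is Status.PENDING]

    def _decide(self, msg_id: int, new_status: Status) -> HeldMessage:
        message = self._messages[msg_id]
        if message.status is not Status.PENDING:
            raise ValueError(f"message {msg_id} is already {message.status.value}")
        message.status = new_status
        return message

    def approve(self, msg_id: int) -> None:
        self._forward(self._decide(msg_id, Status.APPROVED))

    def reject(self, msg_id: int) -> None:
        self._decide(msg_id, Status.REJECTED)


delivered: list[HeldMessage] = []
queue = ApprovalQueue(forward=delivered.append)
msg_id = queue.hold("Susan", "Roland", "Here is that secret Sony info")
print(queue.status(msg_id).value)  # pending approval
queue.approve(msg_id)
print(queue.status(msg_id).value)  # approved
print(delivered[0].recipient)      # Roland
```

Delivery happens only inside `approve`, so a held message cannot reach the receiver by any other path in this sketch.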
We pop back over to the receiver, Roland, and we see that there’s the message, and we can even check the PDF. Go ahead and open the attachment and, hey, look at that, we’ve got our secret Sony stuff.
So it’s a very simplistic demonstration, and I’ve spent a lot of time talking to Symphony directly about getting better data leakage protection embedded into the system. And clearly the goal is that my bot will become deprecated and useless, because this will be native to Symphony. But again, if we think about how we take down barriers and build pipes, we can make things better. And if we could go back to the slides: what’s next? Obviously, natural language understanding is better than natural language processing. I’m not going to get into the details there, but we’ve seen a lot of really cool uses of natural language today, and those things are going to get better, especially in multiple languages.
I work for a Japanese bank, and I am bilingual. I do most of my job in Japanese all day. This is the first time I’ve given a speech in English in many years, and I’m really nervous. But we have a bunch of equity derivatives traders who speak French, because all equity derivatives traders are French, even in Japan. And I actually had a compliance guy go to one of my traders and say, “Can you please stop speaking French?” And his answer was, “I quit.” You can’t do your job trading equity derivatives if you can’t speak French. But the reason for the request was that we don’t have anybody in compliance who speaks French, so they couldn’t check any of his emails. This is not a problem for bots.
The other thing that I’d like to do, and I’m really hoping to get this from Symphony: right now, there’s a relatively easy way to pull all of the data out of Symphony via a batch process.
You can download all of the messages in a batch, every hour or every day. You can also use the API to pull in real time: you can use the “fire hose” to get a stream of data. But even the fire hose is post message sending, so that’s still detection. To do prevention, we want to actually get into the message flow itself: every time Symphony sends a message from counterparty A to B, before it gets delivered to the receiver, it goes out through whatever your data leakage protection system is, and the message is only delivered after you’ve okayed it.
And then, if we really want to get cool and think pie in the sky: speech to text. So if you’re on a video call or a phone call on Symphony, or you’re using an IP phone, convert that conversation to text in real time, run that text through all of these processes as if it’s just content, and then you can do things like put a five-second delay on the phone or on the video. And when you say the wrong thing, because the data leakage engine or heuristic has discovered you’ve just said the wrong thing, like cursing during the Super Bowl halftime show, bleep it out.
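The difference between consuming the fire hose and sitting in the delivery path can be sketched as two functions that use the same check. This is a schematic illustration under stated assumptions, not Symphony’s API: the `check` callback, the message strings, and both function names are hypothetical, and the substring check stands in for whatever data leakage protection engine is plugged in.

```python
from typing import Callable, Iterable

Check = Callable[[str], bool]  # returns True if the content may be delivered


def firehose_monitor(already_delivered: Iterable[str], check: Check) -> list[str]:
    """Detection: the messages have already reached the receiver, so all we can do is report them."""
    return [m for m in already_delivered if not check(m)]


def gated_delivery(outgoing: Iterable[str], check: Check, deliver: Callable[[str], None]) -> list[str]:
    """Prevention: each message is checked before delivery; blocked ones never reach the receiver."""
    blocked = []
    for message in outgoing:
        if check(message):
            deliver(message)
        else:
            blocked.append(message)
    return blocked


def no_watch_list_names(text: str) -> bool:
    """Stand-in check; a real engine would be far richer than a substring test."""
    return "sony" not in text.lower()


outgoing = ["Hi, how about lunch tomorrow?", "Here is that secret Sony info"]

inbox: list[str] = []
blocked = gated_delivery(outgoing, no_watch_list_names, inbox.append)
print(inbox)    # ['Hi, how about lunch tomorrow?']
print(blocked)  # ['Here is that secret Sony info']
```

Both functions find the same problem message; only the gate finds it before the receiver does. The same gate could in principle sit in front of speech-to-text segments held in a short buffer, such as the five-second delay described above, releasing a segment only if the check passes; that extension is speculative, as the talk itself says.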
All these things are technically possible. They all make a lot of sense. Nobody will get in trouble for implementing them, and it just requires us to change our thinking from walls to pipes. It’s about connecting and communicating. And that is the definition of Poka-yoke: making the right thing as easy as possible, and making the bad thing as hard as possible. Thank you very much.