The GDPR ransomware attack on Components
An attack on our MongoDB ends with another lesson in tech journalism's obsequious position
 
A ransomware attacker hit Components' MongoDB database, which stores the data used in our analyses, in mid-December. While checking the status of a migration from one server to another, I found that everything on the new server was suddenly gone except a single newly created database titled READ__ME_TO_RECOVER_YOUR_DATA, containing the following note (typos included):
All your data was backed up from your server. You need to email us at [EMAIL]@onionmail.org to recover your data. If you dont contact us we will reach the General Data Protection Regulation, GDPR, and notify them that you store user data in an open form that is not safe. Under the rules of the law, you face a heavy fine or arrest and your database dump will be deleted from our server forever!
As it turns out, this specific attack has been ongoing since at least 2020, when it is estimated to have hit about 23,000 MongoDB instances, nearly half of all those exposed online at the time. The added twist to this particular attack is the way the note leverages the fuzzy boundaries of Europe's data protection law, a set of regulations so vast that it can often be difficult to know whether or not one has legitimately run afoul of it.
I know I'm not the only one who reacted to this note with dread, and despite this very attack hitting many tens of thousands of servers, I have come across very little actionable information answering questions about how it happened and whether attackers actually got any data. Instead, nearly everything I've discovered has come either in the form of vapid tech journalism that does little more than acknowledge the attack's existence, or armchair lawyers and infosec larpers confidently issuing bald misinformation on Reddit.
So I'd like to do three things here: Provide some answers to potentially exigent questions for anyone who has just faced such an attack, explain how this happened to me and likely how it happened to a good portion of other MongoDB administrators, and once again take aim at the perennially useless institution of tech journalism.
Did the attackers get any data?
The single useful piece of information I found on this attack absolutely anywhere came from Samuel Clay, who experienced the same attack in June 2021 when doing maintenance on his news reader, NewsBlur. I can't thank him enough for providing the commands needed to sift through the massive Mongo log files so I could arrive at the same conclusion he did: The attackers almost certainly made off with nothing. (Also, this post inadvertently follows the header structure of his own somewhat closely.)
As it did for Sam, searching the logs for logins to MongoDB itself from unknown IP addresses revealed a connection from a Tor exit node, a `dropDatabase` command about one second later, and a disconnection one second after that. That is far less time than would be necessary to download the dozens of gigabytes stored in the database.
From the standpoint of the threat actor, such efficiency makes sense. First, it's important to keep in mind that hitting tens of thousands of servers requires an automated script that scans the internet for IPs linked to Mongo databases, and is therefore totally agnostic about what kind of data it's hitting. Actually downloading entire databases would slow down an attack whose aim is to hit as many instances as possible; it adds risk and decreases efficiency, with nearly no upside. Second, it would require the attack model to include resources to actually store that data. And third, it would increase the attacker's connection time and make them more vulnerable to detection, which could cut them off before they delete what's there and leave their bluff note.
Of course, that data is gone, and unless you've backed it up, you're not getting it back. But the specter of it winding up in malicious hands or under European review (an already risible idea when one considers threat actors dutifully filling out GDPR complaint forms to make good on their threats, something I genuinely considered in my delirium) is almost certainly a nonissue.
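For anyone who wants to repeat the log check described above, here is a minimal Python sketch. It is not the set of commands Sam published. It assumes the structured JSON log format that MongoDB 4.4 and later write by default (one JSON object per line, with "Connection accepted" and "Connection ended" messages carrying `remote` and `connectionId` attributes); older versions log plain text and would need a different parser. The function name, the sample log lines, and the IP addresses are all made up for illustration.

```python
import json
from datetime import datetime

# Synthetic log lines in the assumed MongoDB 4.4+ structured format.
# The addresses come from reserved documentation ranges, not real hosts.
SAMPLE_LOG = [
    '{"t":{"$date":"2021-12-15T03:09:00.000+00:00"},"c":"NETWORK","ctx":"listener",'
    '"msg":"Connection accepted","attr":{"remote":"198.51.100.20:50000","connectionId":40}}',
    '{"t":{"$date":"2021-12-15T03:10:01.000+00:00"},"c":"NETWORK","ctx":"listener",'
    '"msg":"Connection accepted","attr":{"remote":"203.0.113.7:40112","connectionId":41}}',
    '{"t":{"$date":"2021-12-15T03:10:02.000+00:00"},"c":"COMMAND","ctx":"conn41",'
    '"msg":"dropDatabase - starting","attr":{"db":"analysis"}}',
    '{"t":{"$date":"2021-12-15T03:10:03.000+00:00"},"c":"NETWORK","ctx":"conn41",'
    '"msg":"Connection ended","attr":{"remote":"203.0.113.7:40112","connectionId":41}}',
]


def _timestamp(entry):
    # The time lives under t.$date; normalise a trailing "Z" so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(entry["t"]["$date"].replace("Z", "+00:00"))


def summarize_connections(log_lines, known_ips=()):
    """Summarise every connection whose source IP is not in known_ips."""
    conns = {}  # connectionId -> summary dict
    for line in log_lines:
        try:
            entry = json.loads(line)
            when = _timestamp(entry)
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # not a structured log line this sketch understands
        msg = entry.get("msg", "")
        attr = entry.get("attr") or {}
        ctx = str(entry.get("ctx", ""))
        if msg == "Connection accepted":
            ip = str(attr.get("remote", "")).rsplit(":", 1)[0]
            if ip not in known_ips:
                conns[attr.get("connectionId")] = {
                    "ip": ip,
                    "opened": when,
                    "seconds_open": None,  # filled in when the connection ends
                    "dropped_database": False,
                }
        elif msg == "Connection ended":
            conn = conns.get(attr.get("connectionId"))
            if conn:
                conn["seconds_open"] = (when - conn["opened"]).total_seconds()
        elif "dropDatabase" in line and ctx.startswith("conn") and ctx[4:].isdigit():
            # Commands are logged with ctx "conn<N>", tying them to a connection.
            conn = conns.get(int(ctx[4:]))
            if conn:
                conn["dropped_database"] = True
    return list(conns.values())
```

Calling `summarize_connections(SAMPLE_LOG, known_ips={"198.51.100.20"})` leaves only the unknown address, open for two seconds with a database drop in between. For scale, and assuming a 1 Gbit/s link purely for illustration, copying 24 GB (192 gigabits) would take at least 192 seconds, so a two-second session is nowhere near long enough to exfiltrate a database of that size.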
How did this happen?
What perplexed me was how the attackers were actually able to access the server, a question that also perplexed one user on Hacker News in April 2022:
This week I got my mongodb instances hacked two times. Since it is a test database there is no damage. But my question is how it possible? I've setup a unique password for my db? If the hacker can figure out my password, is it not possible they can hack every other accounts on the internet too? I've hosted my db on AWS. And my port is open to connect from anywhere. I know it shouldn't be kept like that. I am only doing it since it is a test db. My biggest question is how easy is to a hacker to break passwords?
One responder suggested the user had a weak password. Another, who worked for MongoDB, said the proper response was to pay for the company's own managed (and expensive) solution.
But I had the same question, since I had used a 24-character password that a brute force attack shouldn't have cracked within a few days of launching the server. In looking into this, I found two things:
First, password authentication on MongoDB is **turned off by default**. Even more confusing, there is not even a line in the MongoDB config file with authorization explicitly disabled. Instead, the database administrator must open the `/etc/mongod.conf` config file, find the commented-out `#security:` section, uncomment it, and then manually add the parameter `authorization: "enabled"` underneath. This is not obvious, and you have to rifle through the MongoDB docs to figure it out.
Second, Linode, the cloud provider we use, **did not turn authorization on** when the server was created through their one-click MongoDB instantiation process (which lets you create a MongoDB server in a GUI without having to download, install and configure it yourself), despite having users like me create passwords for the server admin account. 
To illustrate why users would wrongly believe they've created a password-protected server, this is what Linode's MongoDB creation process looks like:

A reasonable person would believe that the password set under "Mongo Password" is the password needed to access their database. In fact, at the time I created my server, it wasn't — as it turned out, the config file still had no line for authorization. The password was set, but it wasn't actually used for anything.
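Whether a given config file actually turns authorization on can be checked mechanically rather than taken on faith. The sketch below is a deliberately small line-based check, not a real YAML parser: it assumes the usual block-style layout of `mongod.conf` (a top-level `security:` key with an indented `authorization:` line beneath it) and will miss other valid YAML spellings. The function name and both sample configs are made up for illustration and trimmed to a few lines.

```python
# Trimmed, illustrative configs; a real mongod.conf has more sections.
DEFAULT_STYLE_CONF = """\
net:
  port: 27017
#security:
"""

HARDENED_CONF = """\
net:
  port: 27017
security:
  authorization: "enabled"
"""


def authorization_enabled(conf_text):
    """Return True only if security.authorization is set to enabled."""
    in_security = False
    for raw in conf_text.splitlines():
        # Drop comments, so a commented-out "#security:" counts as absent.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        if line[0] not in " \t":
            # A new top-level key starts; remember whether it is "security:".
            in_security = line.strip() == "security:"
        elif in_security:
            key, _, value = line.strip().partition(":")
            if key == "authorization":
                return value.strip().strip("\"'") == "enabled"
    # No security.authorization line at all means the default: disabled.
    return False
```

Here `authorization_enabled(DEFAULT_STYLE_CONF)` is false and `authorization_enabled(HARDENED_CONF)` is true. The point of returning false when the line is missing is that absence is exactly the dangerous case: a config with no authorization line behaves the same as one that disables it outright.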
When I brought this up to Linode after the attack, this was their response:
Hello,
I took some time to evaluate the information that you provided. I would like to thank you for the effort that you put into investigating this, and I apologize for the hardship that you have encountered over this issue.
I took more time to evaluate the documentation from MongoDB, specifically the MongoDB Authentication. Reading through the documentation I was able to verify what you stated. My assumption was that our Marketplace App had enabled this during installation. I created a new Linode with the MongoDB Marketplace App that you used. I followed our guide, as you likely did as well.
I had hoped to find that /etc/mongod.conf would show security: authorization: "enabled".
Like you stated, it instead shows security: authorization: "disabled".
I would like to thank you for bringing this to our attention. We value security at Linode, and you have done us a service by letting us know about this. I have escalated this issue to our Marketplace Apps Team.
There are a few items of note here. First, Linode deserves credit for taking the issue seriously and actually responding with at least a pledge to action. Second, they themselves were still under the misunderstanding that there is a specific config line for authorization that is disabled by default, rather than there being no config line at all. Third, and most importantly, Linode's admission is an important wrench thrown into the interpretation offered by the dismissive figures at MongoDB and their mouthpieces in the tech press — that this attack is easily attributed to the catchall diagnosis of "user error".
Among the top results I found when searching for any information on this attack was an article on the tech news site ZDNet by Catalin Cimpanu, whose bio says he was "security reporter" for the site from 2018 to 2021. From what I can gather from his archived articles, Catalin's work was the same public relations wingmanship that constitutes nearly all mainstream tech journalism, as examined in our report on the consumer tech review industry. Catalin's last five articles published before he decamped from ZDNET were: a rewritten press release from Google about Chrome, a rewritten press release from infosec company Intezer about the Go language, a rewritten press release from infosec company Proofpoint, and a republished content marketing infographic created by CrowdStrike.
True to form, Catalin's 2020 article on the Mongo attack seems to be the result of a single conversation with a security researcher who handed him the news, peppered with some background information on Mongo's yearslong history of its servers being hacked by the tens of thousands. Rather than use this new attack to interrogate that history and question if there's maybe something about Mongo's design that makes it susceptible to recurrent crisis, Catalin deflected the question entirely using official MongoDB PR:
[T]hese "MongoDB wiping & ransom" attacks aren't new, per-se. The attacks Gevers spotted today are just the latest phase of a series of attacks that started back in December 2016…More than 28,000 servers were ransomed in a series of attacks in January 2017, another 26,000 in September 2017, and then another 3,000 in February 2019.
Back in 2017, David Ottenheimer, Senior Director of Product Security at MongoDB, Inc., blamed the attacks — **and rightfully so** — on database owners who failed to set a password for their databases, and then left their servers exposed online without a firewall.
Almost three years later, nothing appears to have changed. From the 60,000 MongoDB servers left exposed online in early 2017, the needle has barely moved to 48,000 exposed servers today, most of which have no authentication enabled. (Emphasis added)
As we've seen, setting a password isn't enough, despite the assessment of the senior MongoDB figure Catalin quotes, the very person responsible for ensuring that this stuff doesn't happen. That more than a hundred thousand MongoDB users would be capable of instantiating databases but would leave them online without password protection doesn't seem all that plausible.
These kinds of discrepancies hold little interest for Catalin. It would be bad enough had he simply parroted a MongoDB upper-manager's dismissal of this entire problem as user error and moved on. It's the willing addition of explicit support for the company line, his bootlicking "rightfully so", that points to a deeper neurosis.
Generally speaking, deferral to user error allows product designers both to avoid solving a problem and to stand superior to the users getting wrecked by their product: a winning stroke where one garners all the entitlement and none of the responsibility. Here, Catalin's "rightfully so" is a lame gesture to command expertise by proxy, in this case that of the finger-wagging MongoDB manager. For people like Catalin, whose careers are built on stenographic transmission of corporate messages, siding with a dismissive Director puts them on the Director's level, above incompetent users, allowing them to signal expert knowledge without having to know much at all.
This genius-through-mimesis is everywhere in the field, and it's something PR managers work to cultivate. One example that has stuck with me over the years comes from a 2019 *Verge* article on Apple's then-new A13 chip. The title, "Apple says its new A13 Bionic chip brings hours of extra battery life to new iPhones," is already a press release by itself, and one can sense the warmth of engineering knowledge the writer, Sean Hollister, felt he was cozying up to when he wrote this line:
That efficiency boost is despite fitting a record 8.5 billion transistors inside and upping performance by roughly 20 percent across the board — and Apple says the chip now has the most machine learning (ML) performance, too, with an eight-core neural engine that adds “6x faster matrix multiplication.” Overall, the chip’s capable of one trillion operations per second, according to Apple.
Whether Sean knows what matrix multiplication actually is or how it's related to machine learning is doubtful given that he stuck it in quotes, and it's irrelevant anyway. What mattered for Sean when he typed those words in his Google Doc was the privilege he felt to write "matrix multiplication," and for a moment *feel* like a person who knew what it was, as one might repeat intelligible syllables in Japanese without understanding what they're saying.
I've personally known people in this field like Sean and Catalin. None of them has run a Linux command in their entire lives, and they would be unable to investigate this MongoDB issue or compare the speed at which different Apple chip generations find dot products even if they wanted to. When faux tech journalists have to cover a topic as existential as information security (a specialty where expertise is routinely battle tested in a way that milkier disciplines like user experience aren't), the chasm between their putative role and their actual execution reaches its starkest and most embarrassing contrast.
This isn't a new critique of journalism in the age of content, nor of the broader phenomenon of absolute, worshipful supplication to brands that has become unimaginably worse over the past 15 years. But even if the ignominious field of tech journalism has no prospects of self-improvement and public relations is baked into its DNA, there is always at least the temporary relief of disgracing its actors, as well as the figures they serve, by doing their jobs better than them.