Anyone who runs a web site knows that it’s constantly under attack. You only have to look at your log files to see that hackers running site-scanners are constantly hitting your servers, looking for unpatched vulnerabilities to exploit.
One of the servers I wrote for Guild Wars 1 — named AcctHttpSrv — was designed to behave like a web server. When you received an email from the Guild Wars account management system and clicked on the link it contained, AcctHttpSrv would convert that click into a database transaction to update your account information. Pretty standard stuff.
For debugging purposes I configured AcctHttpSrv to log all unknown requests it received for later review. It turned out the number of errors logged was larger by far than the number of valid transactions. Peeking at the error log provided a view into an entirely different universe, a firsthand look at the dark side of the Internet, each line an exploit that had likely been successful on other sites in the past.
Just the other day one of my readers pointed out that this site had started misbehaving: the articles’ RSS feed stopped functioning. Since I hadn’t logged in for several weeks, it likely wasn’t a change I had made. My first guess was a new WordPress hack, as there have been several recently, so I ran some security scans (nothing found) and rolled the site back to a previous backup, which fixed the bug. My thanks to Steven Y. for pointing out the problem.
So I thought I’d use the occasion to share a story about a web deploy that went awry.
At a previous company I worked for, the web team was getting ready to roll out a completely new game site. The old site was in need of a revamp, but rather than reuse the crufty and difficult-to-use web content-management system (CMS) that had been built by an external web agency, our team chose a new CMS that would make it easy for non-developers to post game news and updates with no programming required.
The web team was under the gun to deliver the site on a deadline because the game’s platform and services were cross-linked with it, and consequently those services could not be rolled out until the main web site went live. There were several missed deadlines, late nights, heroic efforts and so forth, but eventually the site content was ready.
The web team pushed the content to the servers and started forwarding traffic from the old servers to the new ones. We were all happy to see the results of their efforts live at last, with everything running smoothly.
Then one of the web developers made a startling announcement: “Greg” (not his real name) was seeing an “adult” image displayed on the site. Ack! We’ve only been live a few minutes and it has been hacked already?!?
The web team, operations team, and security team jumped on the problem. Let’s verify this — RIGHT NOW!
Oddly, nobody besides Greg could reproduce the problem. Should we switch back to the old servers? Brazen it out and call it a feature?
While it wouldn’t have been impossible for the site to be hacked, it would have been challenging. See, we weren’t actually using the CMS yet — it wasn’t ready in time. Instead we were hosting the entire site as static files: rather than running the site on a smart web server capable of dynamically generating content, we ran it on the equivalent of a dumb file server.
As we approached the deadline it became clear that, while the site’s content was complete, there were some CMS issues that prevented it from being deployed. But we HAD to deploy the site! So I hacked together a trivial solution that scraped all the web pages from a QA test server, and we posted those files on the public web server. For the technically inclined:
wget --input-file list-of-urls-to-scrape.txt --mirror
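(Here --mirror turns on recursive retrieval with timestamping, so the scrape can be re-run to pick up changes, and --input-file supplies the list of seed URLs to start from. Depending on the site you may also need --page-requisites, to pull in CSS and images, and --convert-links, to rewrite links so the mirrored copy works when served from another host.)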
Since we weren’t running a dynamic language on the web server, the attack surface was small enough that we could verify the configuration was correct pretty quickly. Hmmm… what are we missing? The clock is ticking!
After a few minutes, Greg sheepishly announced that he had discovered the problem. It turned out that the web browser he was using on his mobile phone (Opera) sometimes rendered pages incorrectly, leaving images from a previously viewed page visible while downloading a new page. Well … you can guess the kind of pages he was checking out at lunch just before the site was deployed.
I have to give him props, though. A lot of folks would not have said anything and we all would have been held in suspense for months. He owned up and saved us all a lot of grief, and we didn’t have to roll back to the old site. Problem solved!
This reminds me of a time when I was testing new real-time functionality in our web system. Real-time web features leave a connection open and respond only when the server has to notify the client. So I deployed the code and decided to perform a poor man’s load test before declaring it ready for QA: I opened 20 tabs in the browser just to see what would happen. The first tabs opened just fine, but the next set of tabs couldn’t load resources like images, and the last set of tabs wouldn’t open at all. I spent the next day in a performance monitor on the server checking various stats, and I couldn’t find a reason why 20 simultaneous users could bring the site down; no resource was a bottleneck. Then I realized that browsers have a limit on simultaneous connections to a single domain. Since our new real-time feature left a connection hanging in every tab, some tabs were left with no connections to work with. It was the browser throttling the connections, not the server. Sometimes a bug really is not a bug :)
In case anybody ever wants to do something like this (successfully): in Firefox, go to about:config and multiply network.http.max-persistent-connections-per-server and network.http.max-connections (and now network.websocket.max-connections and dom.workers.maxPerDomain, if those are relevant to you) by the number of tabs you’re going to try opening.
You might also want to set browser.cache.memory.enable and browser.cache.disk.enable to false.
(Needless to say, make sure you set them back when you’re done, and don’t try this on anybody else’s server.)
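If you’d rather not click through about:config by hand, the same prefs can be set from a user.js file in your Firefox profile directory. A minimal sketch; the values are illustrative multiples for a 20-tab test, not recommendations:

user_pref("network.http.max-persistent-connections-per-server", 120); // per-host limit; the stock default is small
user_pref("network.http.max-connections", 2000);                      // global cap; scale it up too
user_pref("network.websocket.max-connections", 4000);                 // only if the feature under test uses WebSockets
user_pref("dom.workers.maxPerDomain", 400);                           // only if your pages spawn workers
user_pref("browser.cache.memory.enable", false);                      // force every tab to actually hit the server
user_pref("browser.cache.disk.enable", false);

Keep in mind that user.js re-applies these prefs on every launch, so delete the file when the test is over.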
I’ve always been wary of the rise in popularity of WordPress, since it makes sites an easy target for hacking. While not quite as user-friendly, we use Jekyll to generate static pages that we push out to our web server, which I guess is also good for performance.
The performance benefits of systems like Jekyll can easily be duplicated by many web frameworks via output caching: the framework caches the whole string generated for a response and serves it as-is for the next second or two. Of course, the real motivation for using Jekyll is security.
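To make that concrete, here is a toy sketch of an output cache (my illustration; the names are hypothetical and it mimics no particular framework’s API). Rendered bodies live in a map keyed by URL and are served until a short TTL expires:

#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

class OutputCache {
    struct Entry {
        std::string body;
        std::chrono::steady_clock::time_point renderedAt;
    };
    std::unordered_map<std::string, Entry> entries_;
    std::mutex mu_;
    std::chrono::seconds ttl_{2};  // "a second or two", as above

public:
    // `render` is whatever expensive page-generation function the app has.
    std::string Get(const std::string& url,
                    const std::function<std::string(const std::string&)>& render) {
        std::lock_guard<std::mutex> lock(mu_);
        auto now = std::chrono::steady_clock::now();
        auto it = entries_.find(url);
        if (it != entries_.end() && now - it->second.renderedAt < ttl_)
            return it->second.body;          // fresh: skip regeneration entirely
        std::string body = render(url);      // missing or stale: regenerate
        entries_[url] = {body, now};
        return body;
    }
};

Holding the lock while rendering is the naive choice; a production cache would render outside the lock, or deduplicate concurrent renders of the same URL.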
XD
That’s a funny story.
He must have been embarrassed.
“I have to give him props, though.”
Was he fired anyway?
Why would you fire someone over that? People make mistakes; mistakes aren’t firing offenses.
Why? That guy is a sure keeper. He conducted himself professionally.
Interesting read! I can actually relate: it reminds me of an incident from when I was a volunteer GM at an MMO company six years ago. From my experience, the company was very experienced with security and had a well-built system to prevent attacks. The incident happened when one of the “paid” GMs working from the offices decided to register at a fansite that had been created by one of the volunteers. Long story short, he used the same password as for his GM account, and ended up causing the company to do a major security overhaul because they weren’t sure how a random “volunteer” had somehow gained access to their GM tools (which I don’t know why were client-side in the first place). The volunteer even claimed that he had hacked the system, causing more questioning.
Was this AcctHttpSrv server a completely in-house solution, or did you use a certain framework/base server?
And as always: very good article!
Custom written: the socket library we used was built on top of Windows I/O completion ports (IOCP), since we’d had lots of good experiences with them for battle.net, but that meant we had to write our own code for *everything* because — in those days — most libraries were not written to handle the “inversion of control” that comes with using IOCP.
These days — assuming I was using IOCP — I’d use something like this: http://twimgs.com/ddj/images/article/2013/0113/fire.cpp
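For readers who haven’t worked with IOCP, here is a minimal sketch of the shape of that inversion of control (my illustration, not the actual AcctHttpSrv code; accept logic and most error handling omitted). You never block in recv() per connection; you post a read and a worker thread is later handed whichever connection completed:

#include <winsock2.h>
#include <windows.h>

struct Connection {
    SOCKET        sock;
    WSAOVERLAPPED ov;        // per-operation state IOCP requires
    char          buf[4096];
};

static void PostRead(Connection* conn) {
    WSABUF wb = { sizeof(conn->buf), conn->buf };
    DWORD flags = 0;
    ZeroMemory(&conn->ov, sizeof(conn->ov));
    // Returns immediately; the completion is delivered to the port later.
    WSARecv(conn->sock, &wb, 1, nullptr, &flags, &conn->ov, nullptr);
}

static void WorkerLoop(HANDLE iocp) {
    for (;;) {
        DWORD bytes = 0; ULONG_PTR key = 0; OVERLAPPED* ov = nullptr;
        // Blocks until *some* connection's pending operation completes.
        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);
        if (ov == nullptr) continue;   // port-level error; no connection involved
        Connection* conn = reinterpret_cast<Connection*>(key);
        if (!ok || bytes == 0) {       // failed I/O or peer closed
            closesocket(conn->sock);
            delete conn;
            continue;
        }
        // ... parse the request in conn->buf[0..bytes) and send a response ...
        PostRead(conn);                // then queue the next read on this connection
    }
}

// Setup, once per accepted socket (the Connection* doubles as the completion key):
//   HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
//   CreateIoCompletionPort((HANDLE)conn->sock, iocp, (ULONG_PTR)conn, 0);
//   PostRead(conn);

The inversion is that no library ever calls back into per-socket blocking code: your threads pull completions from the port, which is why the blocking-style socket libraries of that era were such a poor fit.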
As someone who regularly does security work for a big (vBulletin-based) gaming forum, I can’t tell you how many times a day we get attacked. Normally it’s simple SQL-injection attacks that don’t work anymore, but I have had to fight hackers who got through our system using a 0-day exploit. In my experience it’s always been fun to try to outsmart hackers.
Haha! Nice. Back in 1995 one of our hosted services stopped responding to requests after I tripped over the power lead, which was stretched through the middle of the office.
Perhaps I shouldn’t be admitting to this in public.