The confession of Dr. Conspiracy–Part 3

In Part 2 I talked about identifying sock puppets at Birther Report by exploiting information in its avatar system. The underlying email addresses are obscured by a cryptographic digest, but is there a way around that?

Troll Hunter

It started two years ago with my article, “Troll hunter.” It’s about a Swedish group’s attempt to expose individuals who posted at a right-wing web site, taking advantage of a poor security model in the popular Disqus commenting system. Disqus provided, through a public interface (API), a cryptographic digest or hash of a commenter’s email address. Troll Hunter’s approach was to collect a huge number of email addresses (around 200 million), compute their cryptographic hashes and match them to commenters on the right-wing web site. Whenever a computed hash matches a commenter’s hash, that commenter’s email address is exposed.
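The matching step is essentially a dictionary join: hash every collected address once, then look up each commenter’s published hash. Here is a minimal Python sketch. It assumes the published value is an unsalted MD5 hex digest of the trimmed, lowercased address, which is the Gravatar convention. The article doesn’t say exactly what Disqus exposed. The function names and sample data are hypothetical.

```python
import hashlib


def email_md5(email: str) -> str:
    """MD5 hex digest of an address after trimming whitespace and lowercasing."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def build_index(emails):
    """Hash every collected address once: digest -> normalized address."""
    return {email_md5(e): e.strip().lower() for e in emails}


def unmask(commenter_hashes, index):
    """Return {screen_name: email} for every commenter whose hash is in the index."""
    return {name: index[h] for name, h in commenter_hashes.items() if h in index}


# Hypothetical sample data.
collected = ["alice@example.com", " Bob@Example.org "]
index = build_index(collected)
commenters = {"PatriotBob": email_md5("bob@example.org"), "Anon": "0" * 32}
print(unmask(commenters, index))  # {'PatriotBob': 'bob@example.org'}
```

With hundreds of millions of addresses, the in-memory dictionary would become a database table with an indexed hash column, but the logic stays the same.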

Disqus subsequently changed its API, and the specific approach used by Troll Hunter no longer works, but I wondered if a similar approach would work at Birther Report. There were two initial goals: one was to determine if any prominent person was secretly a birther, and the second was to figure out the identity of the BR commenter named ★FALCON★. BR uses the IntenseDebate plug-in, and to my knowledge it has no public API. It does, however, leak user email MD5 hashes for commenters who use avatars supplied by gravatar.com, which is most of them.

In order to display the avatar (unless the user signs in with Facebook), IntenseDebate generates a URL, for example this one for me:

http://gravatar.com/avatar/561bb74e93a2400ed235cd5d3fc5fa43?s=86&d=identicon

The bit between the slash and the question mark (“56 1b b7 4e 93 a2 40 0e d2 35 cd 5d 3f c5 fa 43”) is the MD5 hash of my email address here at obamaconspiracy.org. All it takes to get that URL is to right-click on the avatar and select “Copy image address” (in Chrome) from the context menu. Even some generic-looking avatars may have an MD5 hash behind them, sometimes even those of users named “Guest.” Without an API, harvesting these gravatar MD5 hashes and entering them into a database is a tedious and time-consuming manual task, but I did it over several months and collected 711 of them from BR (not counting a huge number of sock puppets I discovered and discarded).

While my focus was Birther Report, it was not the only web site I looked at and found leaking MD5 hashes. CDR Kerchner, Fellowship of the Minds, drkatesview, Citizen Wells, Western Journalism, JAG Hunter, Impeach Obama Campaign and wtpotus were some others. Fortunately, not all websites required manual right-clicking, copying and pasting. Some could be scanned with automation that read the site’s HTML and navigated from page to page. All in all, I recorded 4,308 screen names and 4,162 distinct email hashes from 27 sites (not all of the harvested email hashes belonged to birthers, and not all sites were exclusively birther sites).
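Recovering the hash from an avatar URL just means taking the last path segment. The sketch below uses only the Python standard library. It pulls the 32-hex-digit digest out of a URL like the one above and, going the other way, builds that kind of URL for a given address. It assumes Gravatar’s convention of hashing the trimmed, lowercased address with MD5. The `s=86&d=identicon` parameters are copied from the example URL, and the helper names are hypothetical.

```python
import hashlib
import re
from urllib.parse import urlparse

HEX_MD5 = re.compile(r"[0-9a-f]{32}")


def hash_from_avatar_url(url: str):
    """Return the MD5 digest in a .../avatar/<hash>[.ext] URL, or None."""
    segment = urlparse(url).path.rstrip("/").split("/")[-1]
    segment = segment.split(".")[0].lower()   # drop an optional ".jpg"/".png"
    return segment if HEX_MD5.fullmatch(segment) else None


def avatar_url_for(email: str, size: int = 86, default: str = "identicon") -> str:
    """Build an avatar URL shaped like the example above for a given address."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"http://gravatar.com/avatar/{digest}?s={size}&d={default}"


url = "http://gravatar.com/avatar/561bb74e93a2400ed235cd5d3fc5fa43?s=86&d=identicon"
print(hash_from_avatar_url(url))  # 561bb74e93a2400ed235cd5d3fc5fa43
```

This is why the avatar leaks the address hash: anyone holding a candidate address can recompute the same URL and compare.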

The next step was to collect lots of email addresses. While that process was largely automated, it also took months. Various Internet web sites contain bulk lists of emails in various formats, typically a hundred or two per page. Some sites accidentally leave lists lying around. I found a magazine’s subscriber list. I found lists of results from hacking attacks posted on the Internet at dazzlepod.com. I used email addresses listed in birther lawsuits, sloppy redaction by Orly Taitz, and, amazingly, an XML export of all the comments from a prominent birther website that was just lying around for Google to find. (I notified the site owner that the file existed, and I believe it has since been deleted.) Eventually, I collected 146 million email addresses in my Microsoft SQL Server database, far more than I ever expected. I would let scanning and scraping programs run for days to get the email addresses from tens of thousands of pages of email listings. Some sites figured out what I was doing and blocked my IP address; for those, I went to the Google cache. I could not have done this without my programming background and some 10-hour days of coding.
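Scraping pages of bulk email listings mostly comes down to two steps. First, pull address-shaped strings out of arbitrary text. Second, de-duplicate them before they go into the database. A rough Python sketch follows. The regular expression is a deliberately simple assumption: it will miss some valid addresses and accept some junk. The in-memory set is a stand-in for the SQL Server table described above.

```python
import re

# Simplified pattern (an assumption): local part, "@", dotted domain, 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_text: str, seen: set) -> list:
    """Return addresses in page_text not already in `seen`, adding them to it."""
    new = []
    for match in EMAIL_RE.findall(page_text):
        address = match.strip().lower()
        if address not in seen:
            seen.add(address)
            new.append(address)
    return new


seen = set()
print(harvest("Contact: Jane.Doe@Example.com, jane.doe@example.com; x@y.org", seen))
# ['jane.doe@example.com', 'x@y.org']
```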

In none of this was anyone “hacked” nor any web site penetrated. No passwords were guessed. No malware was employed. No social engineering was used. All of the collected information, both hashes and email addresses, was freely available on the Internet. I just looked really hard and really long and really smart. Long story short, many emails were identified, but not Falcon’s.

OARPA

Now we enter the second phase of the project, which became known as OARPA (Obot Advanced Research Projects Administration). I dropped hints about OARPA, but they were largely misdirection. OARPA started out as software to generate email addresses: birther1@aol.com, birther2@aol.com, birther3@aol.com …. I collected huge lists of first and last names, lists of common words and uncommon words, and I assembled them in various ways into trial email addresses. I hashed about a trillion combinations of words, prefixes, suffixes, special characters and digits. ★FALCON★’s email address was low-hanging fruit for this approach because it consisted of a common word plus some digits at a popular email domain. In the end, however, it was not the email address that gave ★FALCON★ away, but his own rambling self-disclosures on various web sites (more on that later).
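The generate-and-test idea can be sketched with `itertools.product`. Combine words, suffixes and domains into candidate addresses, hash each one, and keep any candidate whose digest appears in the set of unknown hashes. The word lists, digit range and domains below are placeholders, not OARPA’s actual inputs.

```python
import hashlib
from itertools import product


def md5_hex(address: str) -> str:
    return hashlib.md5(address.encode("utf-8")).hexdigest()


def guess(words, suffixes, domains, unknown_hashes):
    """Yield (address, digest) for generated addresses whose MD5 is a target."""
    for word, suffix, domain in product(words, suffixes, domains):
        address = f"{word}{suffix}@{domain}"
        digest = md5_hex(address)
        if digest in unknown_hashes:
            yield address, digest


# Hypothetical inputs: a few words, no suffix or 0-99, three common domains.
words = ["birther", "patriot", "eagle"]
suffixes = [""] + [str(n) for n in range(100)]
domains = ["aol.com", "yahoo.com", "gmail.com"]
targets = {md5_hex("patriot76@yahoo.com")}   # stands in for a harvested hash

for address, digest in guess(words, suffixes, domains, targets):
    print(address, digest)
```

These inputs yield only 909 candidates. A pure-Python loop like this one would be far too slow for the trillion combinations described, so treat it as an illustration of the logic rather than of the speed.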

This brute force approach was very productive, but I still had other nuts to crack. Barry Soetoro Esq. was still unidentified. I asked for help.

OARPANET

OARPANET was a distributed processing framework: remote computers could connect to the central OARPA server and check out a range of email guesses to scan, along with a list of unknown hashes. Each range was a subset of all possible strings of letters, numbers and special symbols of a specific length (AAAAAAA, AAAAAAB, AAAAAAC …). The software was really pretty cool, including web services and multi-processing (users could specify how many of their computers’ cores to dedicate to OARPA). A tremendous amount of effort went into optimizing the process for speed. I could check network progress remotely from my smartphone.
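Handing out ranges like AAAAAAA, AAAAAAB, AAAAAAC … is easiest if every string of a given length has an integer index. A work unit can then be just a (start, count) pair. Below is a minimal sketch of that bookkeeping. It assumes a hypothetical 36-symbol lowercase-and-digits alphabet. The article doesn’t describe OARPANET’s real alphabet, protocol or unit sizes.

```python
import hashlib
import string

# Assumed alphabet: 26 lowercase letters + 10 digits (OARPA's real set isn't specified).
ALPHABET = string.ascii_lowercase + string.digits


def index_to_string(i: int, length: int, alphabet: str = ALPHABET) -> str:
    """Map 0 .. len(alphabet)**length - 1 to a fixed-length string, like base-N digits."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        i, r = divmod(i, base)
        chars.append(alphabet[r])
    return "".join(reversed(chars))


def work_units(length: int, unit_size: int, alphabet: str = ALPHABET):
    """Split the keyspace of one string length into (start, count) ranges."""
    total = len(alphabet) ** length
    for start in range(0, total, unit_size):
        yield start, min(unit_size, total - start)


def scan_unit(start: int, count: int, length: int, domain: str, unknown: set) -> list:
    """What a volunteer machine would do with one checked-out range."""
    hits = []
    for i in range(start, start + count):
        address = f"{index_to_string(i, length)}@{domain}"
        if hashlib.md5(address.encode("utf-8")).hexdigest() in unknown:
            hits.append(address)
    return hits


print(index_to_string(0, 3), index_to_string(1, 3), index_to_string(36, 3))  # aaa aab aba
print(list(work_units(2, 500)))  # [(0, 500), (500, 500), (1000, 296)]
```

One plausible design choice: because a unit is fully described by two integers, the server only has to track which units are checked out and which are finished. It can reissue any unit whose volunteer goes quiet.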

I collected over 3.6 million domain names, far too many to pair with all the generated random user names. Only a small list of common domains was used for most scans, but one particularly fruitful technique was to take known email names and try them combined with the full list of domains.
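The “known name, every domain” technique reuses local parts that have already been cracked. In the sketch below, the domain list is the outer loop and may be any lazy iterable, such as an open file with one domain per line. That way 3.6 million domains never need to sit in memory at once, while the much smaller set of cracked names does. The function name and sample data are hypothetical.

```python
import hashlib


def cross_domains(known_emails, domains, unknown):
    """Try each cracked local part against every domain; return new matches."""
    known = {e.lower() for e in known_emails}
    local_parts = {e.split("@", 1)[0] for e in known}
    hits = []
    for domain in domains:                  # may be a lazy iterator, e.g. a file object
        domain = domain.strip().lower()
        if not domain:
            continue
        for local in local_parts:
            address = f"{local}@{domain}"
            if address in known:            # already matched; nothing new to learn
                continue
            if hashlib.md5(address.encode("utf-8")).hexdigest() in unknown:
                hits.append(address)
    return hits


unknown = {hashlib.md5(b"eagle1776@comcast.net").hexdigest()}
print(cross_domains(["eagle1776@aol.com"], ["aol.com", "Comcast.net\n", "att.net"], unknown))
# ['eagle1776@comcast.net']
```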

OBOT volunteers installed the software, and the network hummed along for months, generating and testing several million random email addresses per second but finding perhaps one new email address match on a good day. Still, it paid off, and we did finally guess Barry Soetoro Esq.’s address in a scan of random 7-character strings at a common email domain. A computer that I bought primarily as a dedicated OARPA scanner got the trophy for nailing Barry and 57 others in the final round.

By mid-2015, OARPANET was shut down due to diminishing returns. Generated email account names were getting longer and hits scarcer. There is no way that every possible email domain could be searched against trillions (yes, trillions) of sequentially generated email addresses. Unstructured user names consisting of letters, numbers and special characters were exhaustively searched up to 8 characters in length. From all sources we matched 2,098 screen names to email addresses (among them Dr. Deb, two Joe Mannixes and furtive), matching 68.6% of the screen names taken from Birther Report. In total, 1,961 distinct email addresses were uncovered.
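Some quick arithmetic shows why exhaustive search stopped at around 8 characters. Assume, purely for illustration, a 36-symbol alphabet of lowercase letters and digits. The real set also included special characters, which only makes the numbers larger. Under that assumption, the count of user names of length 1 through 8 is a geometric series:

$$
\sum_{k=1}^{8} 36^k = \frac{36^9 - 36}{35} = 2{,}901{,}713{,}047{,}668 \approx 2.9 \times 10^{12}
$$

Assume a rate of 5 million hashes per second. One pass over that keyspace for a single domain then takes about $2.9 \times 10^{12} / (5 \times 10^{6}) \approx 5.8 \times 10^{5}$ seconds, or roughly 6.7 days. Each additional character multiplies the work by 36, and each additional domain repeats the whole pass. That is why most scans used only a small list of common domains.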

Some birthers were more careful than others. Anyone taking even moderate precautions would never have told my project anything. A few of the 2,226 forum names we didn’t crack include:

  • 4zoltan6
  • AmazingGrace6
  • Barack_D_Fraud
  • Birther1 (this is Mike Volin, no secret, but his email address at BR is unknown)
  • charlesmountain (two addresses)
  • EWO
  • Fast Falcon
  • Grand Birther
  • Guest (several)
  • John Gault
  • Logical Patriot
  • Miki Booth
  • NaturalBornCitizen
  • NoKidding
  • Orly Taitz
  • Reagans_Ghost
  • Reality Checker
  • TANGENT 01

The final phase of the OARPA project was to match email addresses to actual people. This is something of an art. My commercial experience in record matching helped me understand how easy it is to make a false match; the usual culprit is confirmation bias. I wrote about the difficulty with false positives in my article, “Confirmation v. prediction.” While a little automation was developed to reduce the manual effort of searching and recording information, that step involved nothing particularly innovative. It’s all Google. Here’s a screenshot of my Information Manager for Dr. Deb. “BF115” in the “Source” column refers to the 115th run of the Brute Force scanner.

[Screenshot: Information Manager entry for Dr. Deb]

Each item recorded has an estimated confidence number along with it. For some people we found a lot of information; for others, nothing. For Barry Soetoro, Esq., I was able to connect his address to a Facebook page, but that page appeared to be under a fake name. A particularly information-rich scenario was when someone had registered a domain under their real name, using the same email address on the registration that they used to comment.

[Update: BSE was eventually identified in the fall of 2017 by matching his email address to a website that contained his real name and location. His name is not one you would recognize.]

If a birther commented here or at the birther site that leaked the exported comment file, then I also had an IP address (an IP address may lead to a geographic location, although this is not 100% reliable). I also found a few LinkedIn profiles, Facebook pages (e.g., for Dr. Deb), resumes, work addresses, domain registrations and miscellaneous stuff. I decided not to record any phone numbers, although some were available.
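Each recorded item carries an estimated confidence, and false matches are the main risk. One way to model such records is a small record type plus two checks. The first is a confidence threshold. The second asks whether more than one independent source backs the same value. This is a hypothetical schema for illustration only. The article shows just a screenshot, not the actual table layout, and the 0-to-1 confidence scale is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    screen_name: str
    kind: str          # e.g. "email", "name", "domain registration"
    value: str
    source: str        # e.g. "BF115" for the 115th Brute Force run
    confidence: float  # assumed scale: 0.0 (hunch) to 1.0 (confirmed)


def confident(findings, threshold: float = 0.8):
    """Findings at or above the threshold, highest confidence first."""
    return sorted((f for f in findings if f.confidence >= threshold),
                  key=lambda f: f.confidence, reverse=True)


def corroborated(findings, screen_name: str, kind: str) -> bool:
    """True if at least two distinct sources report the same value for this name and kind."""
    sources_by_value = {}
    for f in findings:
        if f.screen_name == screen_name and f.kind == kind:
            sources_by_value.setdefault(f.value, set()).add(f.source)
    return any(len(sources) >= 2 for sources in sources_by_value.values())
```

Requiring corroboration is a guard against the confirmation-bias trap mentioned above: a single plausible-looking hit is not treated as an identification.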

During the entire process, no prominent individual was found commenting at any birther website.

The final product was a huge HTML file of everything. There are three copies in three locations with three custodians, so if anything happens to me …

A related project involved software and a database to collect IntenseDebate comments for around 70 selected individuals (including myself), both to prevent their loss if deleted and, more importantly, to make them easier to search. I can add someone new and the software will load all previous comments from their IntenseDebate profile page. The software, if run soon after a comment is made, captures a more accurate time stamp than IntenseDebate makes available later. Another big advantage is its ability to follow an IntenseDebate user across web sites, which was particularly helpful in assembling the bread crumbs to his identity that ★FALCON★ left across several sites.
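The archive’s two advantages are the more accurate time stamp and the cross-site view. Both come from storing each comment once, under a stable ID, with the time it was first seen. Below is a minimal sketch using Python’s built-in `sqlite3`. It assumes each comment has a stable ID. The table layout is hypothetical, and no IntenseDebate API is implied, since to my knowledge it doesn’t have a public one.

```python
import sqlite3
from datetime import datetime, timezone


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS comments (
               comment_id TEXT PRIMARY KEY,
               author     TEXT NOT NULL,
               site       TEXT NOT NULL,
               body       TEXT NOT NULL,
               first_seen TEXT NOT NULL)"""
    )
    return db


def record(db, comment_id, author, site, body, now=None):
    """Store a comment once; a later sighting never overwrites first_seen or body."""
    now = now or datetime.now(timezone.utc).isoformat()
    db.execute(
        "INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?, ?)",
        (comment_id, author, site, body, now),
    )
    db.commit()


def sites_for(db, author):
    """Distinct sites where an author's comments were captured."""
    rows = db.execute(
        "SELECT DISTINCT site FROM comments WHERE author = ? ORDER BY site", (author,)
    )
    return [row[0] for row in rows]
```

`INSERT OR IGNORE` preserves both the earliest capture time and the original text, so later edits or deletions on the site don’t erase what was archived.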


In the final analysis I have to ask myself why I went to all that trouble to gather information that will never be released. Part of the answer, and I think probably the main answer, is the challenge. It was one last big project for a retired software developer. It was a hard problem. It forced me to learn new things. It also proved that I’m smarter than the average birther.

As for the Birthers, they still don’t know who RC is, and you’re not going to find out from me.

This concludes the Confession of Dr. Conspiracy.

About Dr. Conspiracy

I'm not a real doctor, but I have a master's degree.
This entry was posted in Dr. C. Comments.

26 Responses to The confession of Dr. Conspiracy–Part 3

  1. Cody Judy says:

    Wow! That’s pretty amazing, Doc! 146M email addresses, 320M Americans: any idea how many are USA-originating, or is there any way to discern that?

    With your Emails and my Campaign we might have stopped Trump. Lol 😎

    Hey, did you hear?

    https://m.facebook.com/story.php?story_fbid=1406102356091210&id=510896692278452

    Birther GOLDEN GLOBE AWARDS Meryl Streep goes mad mocks U.S. Constitution Qualification for President at Golden Globe Awards referencing Obama’s Qualification Fraud as if it was not a Bullying Tactic

    [ Where are their birth certificates?’ she asked]

    http://www.dailymail.co.uk/news/article-4100774/Meryl-Streep-slams-Donald-Trump-Golden-Globes-acceptance-speech.html

    What part of Fraud is not a Bully Tactic Ms. Streep? Have you seen the evidence? #MerylStreep

    https://youtu.be/BGEMHOEil5c

  2. Pathetic. Judy pimping his own poop, again.

  3. bob says:

    Ex-con Judy is too blinded by his own hatred and stupidity to see what everyone else sees: Streep was mocking only Trump. Bullying requires superior strength or influence; surely ex-con Judy is not suggesting that Streep is stronger or more influential than Trump.

    And — yet again — Judy pre-emptively dumps on a very interesting topic.

  4. bob says:

    Question for Doc: What would have been the results if a birther had been smart enough to use misdirection in an email address, e.g., mynameisdonaldtrump@yahoo.com? Were there processes (human or computer) to root out those false positives?

  5. Northland10 says:

    Cody Judy: Wow! That’s pretty amazing, Doc! 146M email addresses, 320M Americans: any idea how many are USA-originating, or is there any way to discern that?

    I have 5 or 6 active email addresses right now. One or two exist for some site registrations so I can keep spam away from my regular IDs and not use emails that actually have my name.

  6. @ bob
    I think Intense Debate requires that you have a valid email address to sign up. They send a verification email. Most comment systems work like that. Of course one could use one of the services providing temporary email addresses like 10minutemail.com. Don’t ask me how I know.

    The OARPA project was amazing. That was certainly beyond my programming capabilities.

  7. Andrew Vrba, PmG. says:

    Judy has this confused with the open thread. Then again, he confuses a lot of things, like fantasy and reality.

  8. The computer matches were 100% accurate matching forum names to email addresses. Matching an email address to a person was all done by hand (and large numbers of email addresses were never examined).

    I didn’t see anything that set off an impersonation flag. Your example goes to the heart of why it didn’t happen. Very rarely was the name ultimately linked at BR the name of someone notable, the exceptions being someone like Kerchner or Booth. The problem with impersonating someone’s email address is that you have to know it in advance. I certainly wouldn’t conclude that an address like firstname.lastname@gmail.com was the real address of the named person; when it came to linking an email address to a person, I’d want confirmation for that association somewhere else. Most of the email addresses I harvested were pretty obscure.

    What I saw more often was simply made-up email addresses like this one:

    123@yahoo.com

    or a@b.com, abc@comcast.net, anon@yahoo.com, anonymous@verizon.net and info@hell.com.

    If there were any attempts at impersonation through the given email address, I didn’t spot them, unless one was such an obvious spoof that it was never considered. I am assuming that the Orly Taitz poster (who used no known Orly Taitz email address) was a fake.

    But the direct answer is that if a birther were clever and knowledgeable enough, they could impersonate someone, and I couldn’t tell except in some cases by IP address.

    bob:
    Question for Doc: What would have been the results if a birther had been smart enough to use misdirection in an email address, e.g., mynameisdonaldtrump@yahoo.com? Were there processes (human or computer) to root out those false positives?

  9. bob says:

    Reality Check:
    @ bob
    I think Intense Debate requires that you have a valid email address to sign up. They send a verification email.

    Yes, but verification only verifies that there was a request to open an email account. It doesn’t verify, for example, the identity of the person who requested to open barackobama1961@yahoo.com.

    The OARPA project was amazing. That was certainly beyond my programming capabilities.

    It is impressive. So impressive I’m curious whether such efforts have been replicated for a commercial/industrial use. And, if not, is Doc going to share (or sell?) his program. Because it sounds like it has uses beyond birther hunting.

  10. Verification only verifies that the email address is valid and that the person signing up can open the verification email. That’s all it means; it doesn’t say anything about the identity of the owner. I believe verification is primarily used to filter out automated spammers in comment streams.

    bob: Yes, but verification only verifies that there was a request to open an email account. It doesn’t verify, for example, the identity of the person who requested to open barackobama1961@yahoo.com.

  11. Pete says:

    Interesting what a few technically savvy volunteers can do.

    Now think of what the NSA knows.

  12. Rickey says:

    That is fascinating stuff.

    Doc, I recall that you once had an e-mail address for BSE but my databases were unable to match it to a name. Without giving the details, were you ever able to identify him?

  13. Sluffy1 says:

    Doc, You da man!

    I’ll bet Ramo Ike is shitting himself …

  14. I should mention that I am not taking credit for finding out Falcon’s name. I found his email address, and through that some of his web sites. It was the Intense Debate comment tool that located comments that pointed in the right direction. At the end several people were racing along the threads that I had found, and that others were finding. The final coup de grâce was found at a public library, not on the Internet.

  15. Arthur B. says:

    Dr. Conspiracy: The final co=u

    Uh-oh! They got to Doc!

  16. He, he. I was wondering what folks would think of that. Windows decided that it needed to reboot to install an update, and at the same time I was interrupted IRL, so I just hit “save.” I fixed it after the reboot.

    Arthur B.: Uh-oh! They got to Doc!

  17. I wonder if WND or the birther press will pick up on these articles. I can imagine the spin.

  18. H. Keith says:

    Man, I’m gonna miss this place

  19. One bit that I omitted from the article is that the database has a “private” flag attached to emails. Private entries don’t go into the HTML file kept by the custodians. These are essentially OBOT commenters.

    The 27 websites where I harvested commenters do not include the one that misplaced the comment export file. It’s not in the database. That would have added some 1,300 forum names.

  20. The original Troll Hunter project was designed to embarrass government officials who were posting anonymously on right-wing forums and saying racist things. My project didn’t uncover anyone like that. I guess officials are too good for Birther Report.

  21. I did find one media person who posted at JAG Hunter, but it was under her own name. I haven’t been able to go back and find the comment.

    I found this neat file with about 4,400 new emails, so I loaded it. While I was looking around, I ran across one of my previous matches for RacerJim. I found some new confirmation about what his name is. Address and phone number found, but not recorded.

  22. The Magic M (not logged in) says:

    A few of the 2245 forum names we didn’t crack include:

    […] Logical Patriot

    That was me. 🙂

  23. The Magic M (not logged in) says:

    Dr. Conspiracy: At the end several people were racing along the threads that I had found, and that others were finding.

    I like your investigators who can’t believe what they’re finding. 😉

  24. Well, what goes around comes around.

    I got an email from Have I Been Pwned:

    You’ve been pwned!

    You signed up for notifications when your account was pwned in a data breach and unfortunately, it’s happened. Here’s what’s known about the breach:

    In October 2020, a security researcher published a technique for scraping large volumes of data from Gravatar, the service for providing globally unique avatars. 167 million names, usernames and MD5 hashes of email addresses used to reference users’ avatars were subsequently scraped and distributed within the hacking community. 114 million of the MD5 hashes were cracked and distributed alongside the source hash, thus disclosing the original email address and accompanying data.

    Yup, they did it to me.

  25. Hmmm, mine was listed in the Gravatar breach as well as in one at Disqus.

    Dr. Conspiracy: Well, what goes around comes around.

    I got an email from Have I Been Pwned:
