All posts by Zanzibar McFate

Wednesday Linkfest

ETHICS!

  • Columnist Dan Savage asks the entirely reasonable question, “Why are people horrified at Gawker for outing one cheating dude, yet gleeful over hackers outing 37 million of them?”

Security

  • After Hieu Minh Ngo was convicted of a massive series of identity thefts, a class-action suit has been instituted against credit bureau Experian, which is accused of violating the Fair Credit Reporting Act, among a variety of other things. The plaintiffs want to force Experian to contact anyone who may have been affected by Ngo’s activities, to offer them a full year of free credit monitoring, to disgorge any profits Experian may have realized from Ngo’s scheme and to establish a fund to reimburse people affected by Ngo’s activities.
  • AshleyMadison CEO Noel Biderman made an effort to pitch Robert Scoble on how incredibly serious they were about security. You’d think they might’ve considered encrypting their databases.
  • A possible breach at PNI Digital Media, provider of a widely used online photo management platform, has led CVS, Rite Aid, Costco and a number of others to shut down their photo-processing services.
  • The Federal Trade Commission is taking action against putative identity-protection firm LifeLock for lying about its services, a charge it has faced in the past. Additionally, the FTC has charged that LifeLock failed to implement a meaningful security program (STOP ME IF YOU’VE HEARD THIS ONE BEFORE!), falsely claimed that it offered consumers protection comparable with that of major financial institutions with regard to their own data, and failed to meet the record-keeping requirements of the company’s $12 million, 2010 settlement with the Commission and 35 states’ attorneys general.

Technology

Drones, Drones, Drones

Society & Culture

  • If you talk with your co-workers about your salaries at Google — discovering all sorts of untoward things in the process, evidently — your manager will give you a hard time about it, in spite of the fact that doing so is completely illegal in California.
  • Breitbart chucklehead and Donald Trump impersonator Milo Yiannopoulos doesn’t believe women should be involved in tech. No one’s got time for that kind of stupid, especially not Margaret Hamilton, who led software development for the moon landing and coined the term “software engineering”.
  • Speaking of “Breitbart chuckleheads”, editor Ben Shapiro has filed assault charges against transgender reporter Zoey Tur after she put her hand on his neck and called him a “little man”. Why so serious, Ben? Feeling…inadequate…?
  • A young iOS developer hurls herself to her death from a 20th-floor rooftop bar in Manhattan’s Flatiron district. Other patrons, attending a “corporate event”, are unperturbed and just keep on drinking.
  • A study by researchers at the University of New South Wales and the University of Florida has found that the worse a guy is at games the more likely he is to make negative comments toward women gamers. U JELLY BRO?
  • Remember how people used to “run away to the Big City” to make their fortune? Got a median income? Here are all the big cities you can’t afford to live in, and when they became unaffordable. San Francisco crossed that line in 1982.
  • A 6-foot 4-inch 260 pound South Carolina construction worker has been arrested for slapping a waitress (on whom he had 120 pounds and 13 inches) when she took issue with his racially harassing a black family while they were trying to have dinner. Reportedly, he was under the impression that the family “didn’t mind” being abused.
  • Been bitten by a rattlesnake? Expect an enormous hospital bill. ObamaCare has improved things, but the American health care system is very broken.

Other Stuff (and #CannibalismInTheNews!)

  • A burglar manages to take a selfie by accident while stealing an iPhone from the apartment he’s broken into. Venice, California police are requesting help in identifying this dolt. Guys like this are the reason “crime doesn’t pay” — they bring down the average.
  • “Pot polish” — the rounding of broken bone edges when they’ve been cooking in a pot — as well as cut marks on the bones show pretty conclusively that the doomed Franklin Expedition of 1845 did, indeed, resort to cannibalism.
    Relatedly, Dan Simmons’ book The Terror is a terrific fantasy novelization of the privations and demise of John Franklin and his two ships full of hapless, doomed explorers.
  • Missed yesterday’s Festival of Links? Check it out here.

    The Internet is Broken: The Trolls

    Back in the day on USENET, we had trolls, although they were relatively mild (for the most part; there are distinct exceptions, and you’ll get to meet one in the next installment). We also had a reasonably effective way of dealing with them: we sent them to alt.flame, USENET’s own little “basement”. It worked reasonably well, mostly through force of tradition and peer pressure; there were certainly no technical measures to enforce it, nor could you “throw someone off USENET”. It wasn’t a “site”; it was a distributed system of servers which synchronized with one another, and there was no notion of “membership”: you simply posted things to a group.

    Mostly, trolling amounted to name-calling. Some of it was clever, some of it was dopey, but it was a rare case that ever went beyond that.

    (And just to demonstrate how far South things have gone, a Google search on “alt.flame” turns up numerous references to something called “alt.flame.niggers”.)

    Today, we hear—from folks like the denizens of ChanLand and its territories, like #GamerGate—that people who complain about online harassment are just “getting their jimmies rustled” over people “saying mean things on the Internet”.

    Anyone who believes that needs a swat upside the head with a clue-by-four, and then to read this story, once they’ve regained consciousness. A cabal of anonymous trolls literally drove a man almost to suicide and terrorized his family in Virginia. If that’s not enough, read how Nazi harasser Andrew “weev” Auernheimer (in our “featured image”) drove Kathy Sierra off the web.

    If you need more evidence of how out-of-control this can get, I strongly recommend Danielle Keats Citron’s Hate Crimes in Cyberspace, which is a pretty chilling read—at least if you couldn’t have written a bunch of it yourself. Here’s a very partial extract of some of the harassment experienced by a grad student Citron refers to by the pseudonym “Anna Mayer”.

    Over the next year, the attacks grew more gruesome and numerous. Sites appeared with names like “Anna Mayer’s Fat Ass Chronicles” and “Anna Mayer Keeps Ho’ing It Up”. Posts warned that “guys who might be thinking of nailing” her should know about her “untreated herpes”. A post said, “Just be DAMN SURE you put on TWO rubbers before ass raping Anna Mayer’s ST diseased pooper!” Posters claimed she had bipolar disorder and a criminal record for exposing herself in public. Racist comments she never made were attributed to her. Posts listed her professors’ email addresses, instructing readers to tell them about Mayer’s “sickening racist rants”. Someone set up a Twitter account in Mayer’s name that claimed she fantasized about rape and rough sex. Hundreds of posts were devoted to attacking her.

    I want you to keep this in mind, especially when we get to the next installment, which will go over my experiences having an online stalker for (so far) over a decade. You’re going to see some similarities.

    However, I want to relate the specifics of why I’ve “gotten off” Twitter—actually my account is now protected, and I’m limiting the people who have access to it. As I’ve mentioned, I’ve got some strong opinions about GamerGate, and I haven’t been terribly shy about expressing them, forcefully. This has been going on for several months, but in the last few weeks, the response from GamerGate went from name-calling to actual harassment.

    One form this harassment took was the posting of photographs of members of my family by a GamerGate-r, @rustBeltExpat. I reported them to Twitter, and learned a couple of things about how “seriously” Twitter takes harassment. First, it seems to have taken them about three days to get to dealing with my report; at least that’s how long it took to get a response on it. Second, I was informed that if the harassing user takes down obviously privacy-invading material before Twitter looks at it, it “doesn’t count” as harassment.

    Another form was the creation, and subsequent deletion—over a period of maybe ten or fifteen minutes apiece—of numerous new Twitter IDs impersonating various members of my family, again with photos. These were used to bring my attention to their existence by favoriting or retweeting various defamatory posts I was mentioned in. The bottom line here is that Twitter is a great deal less than serious about its commitment to dealing with harassment on its platform.

    When I say “defamatory”, I’ve been accused of being an arsonist, a Satanist, a blackmailer, a “revenge pornographer”, an attempted murderer, and a pedophile. That’s fine, I don’t worry too much about stuff like that, particularly when the “evidence” is nothing but anonymous comments somewhere that link to other anonymous comments somewhere to provide a façade of “support” for the claims.

    In spite of this apparently-lengthy criminal record, I’ve never heard from an actual representative of law enforcement on any of these very serious charges. Go figure.

    When uninvolved third-parties who have absolutely no horse in the race get dragged into things to be used as a blunt object, that’s a strong sign that someone out there is valuing their viewpoint a little too highly. And since Twitter only offers lip service to its “concern” about its users being attacked (and attacked and attacked), this represents my “strategic retreat” to “higher ground”. We’ll see how things proceed.

    In the interests of fairness and balance, I need to point out that trolling (at least of the milder, name-calling sort) is not limited to #GamerGate partisans. At around the same time this was all going on, I had gotten involved in the usual sort of heated #GamerGate discussion in which one of the other participants was #GamerGate critic Sarah Nyberg. It should be noted that Nyberg has herself been subjected to harassment by #GamerGate over the past six months, much of it in the form of accusations that she’s a “pedophile” and a “dog-fucker”. (In a similar vein, ggblocklist creator and OAPI executive director Randi Harper has been accused of selling her child for methamphetamine.)

    Nyberg effectively issued me an order that I untag not her, but an unspecified “us”, from the conversation at one point. I pointed out to her that she wasn’t the boss of me, and that if she wanted something from me, she could ask nicely and say “please”. Her response to this was to block me and start tweeting about how I was “the archetype of a problematical male ally”, along with the help of about a half-dozen of her minions.

    Of course, #GamerGate happily picked this up, and has been broadcasting the news that the (actually non-existent) “aGG” — the monolithic block of “Social Justice Warriors” they’re crusading against — had “excommunicated” me.

    No worries, I’ve been declared a heretic before by much more impressive groups.

    In the next installment, I’ll talk about my own personal stalker, a fellow with more names than most people have housekeys and a very sad excuse for a human being who’s managed to be a pothole on the Information Highway for two decades now.

    His given name is Jason Christopher Hughes.

    This is the second of a series of articles; the previous installment is here.

    UPDATE, 7/22/15: Apparently Twitter went back and took a closer look at @rustBeltExpat; the account has now been suspended for “abusive behavior”.

    Tuesday Linkfest

    What I’m hearing from people is that they’re going to miss the raft of links I typically would post on Facebook about the news and other stories that had caught my interest. Here’s a first stab at providing something as a replacement.

    Discussion in the comments. (Comments are moderated, get over it.)

    Security

    • AshleyMadison.com, a site which purports to enable one to have a “discreet affair”, got hacked, and badly. As many as 37 million users are at risk of having their personal data, including their “kinky sexual fantasies”, posted online. A group calling itself “Impact Team” is taking responsibility, and it all seems to be over an issue of not scrubbing data after hitting up customers for $19 to do so.
      What astounds me here is the moralizing in the comments. A site with your medical or financial records, or lists of purchases you’ve made that you might not want your boss to know about, can be hacked just as easily as a cheating-on-your-wife site you despise. The blue-nosed “YOU DESERVE THIS!” tone of the attackers’ taunts suggests to me that Chansters are behind this somewhere. They’re all about ethics, you know.
    • From the “Stop Me If You’ve Heard This One Before” desk, Android is again being called out as a security risk owing to the tremendous amount of fragmentation in the platform. This time, security researchers have written a paper on it (link in the article).
    • After discovering that one of the zero-day exploits they sold to Hacking Team had in turn been sold to human-rights-violating regimes, Netragard has shut down its controversial Exploit Acquisition Program. Again.
    • Completely aside from the potential impact on jobs and the economy, there’s a side of those “self-driving cars” that’s not getting discussed in all the excitement. Here’s a story relating a demonstration of hacking a Jeep on the highway, and taking it over. We’ve already seen drones spoofed with fake GPS; while robot cars may well reduce accidental traffic fatalities, I’d bet any amount of money we’ll see some deliberate ones. You know: for the “lulz”.

    Technology

    Society & Culture

    • Yesterday was the 46th anniversary of the first landing on the moon. I remember sitting in my dad’s apartment, watching it on a tiny black-and-white TV. Here’s a story of someone the same age as I, who was watching it while I was, and how it affected his life.
    • According to this story in Salon, instant rāmen noodles are an environmental disaster. Instant rāmen noodles are fried to create “holes” in the noodles that let them cook in three minutes; the frying is done with palm oil. Lots of palm oil. So much palm oil that it’s destroying entire habitats and endangering orangutan populations in Borneo and Sumatra.
    • Astoundingly, it’s being reported that Google’s targeted advertisement algorithms are doing things like showing higher-paying jobs to men than they do to women. Further detail at the MIT Technology Review.
    • Robert W. Gibson, who wrote a number of issues of “Captain Harlock”, has passed away at age 55.

    Other Stuff (and #WhiskeyTangoFoxtrot!?)

    • Ever wondered how you’d portray a mute on stage, or write a blind character in a story, or any number of things? This wonderful reference site is an objet trouvé from the wonderful Mordant Carnival.
    • And from the “Tech Geniuses of Silicon Valley” desk, a “wiccan witch” (?) has apparently gotten a business going protecting companies’ computers and networks through sorcery.

    The Internet is Broken: An Introduction

    This is a departure from my usual content.

    I’m getting off Twitter and Facebook. I want to talk about why. It’s going to take a while, I’m afraid, but I’ll be doing it in step-by-step chunks, even if they aren’t always easily digestible.

    I’ve felt for a long time that there were some real problems with what’s happening with the Internet, and they manifest most obviously in “social media”—which I believe, thanks to a combination of misdesign, mistaken beliefs, and the active exploitation of those factors by, roughly, the Worst People In The Entire World, is actually “anti-social media” more and more frequently.

    I refer you to Reddit, where the female CEO was just thrown under a bus for a decision that was forced on her, and in terms that were misogynistic, racist and vile beyond belief; the featured image (depicting a Chinese-American woman in what looks like a North Korean uniform in front of a World War 2 Japanese battle flag) is a very mild example. In an effort to reform the site, the new CEO has come up with a scheme whereby Nazis, racists, and trolls of all persuasions will be provided with an ad-free, subsidized platform, a “basement”, if you will.

    As Chuq von Rospach has observed, running an online community is like running a sports bar. If a pile of bikers come in and start getting rowdy in the basement, you’ve got two basic choices before you: you can either toss the bikers out, or you’re running a biker bar.

    I’ve written about data-mining the GamerGate “controversy”, and I have some distinct opinions about it. Whatever this “movement” or “hashtag” or “consumer revolt” or “independent individuals” claims to be about, it’s associated with a lot of harassment, and in fact, that’s documented in a study by Women, Action, and the Media.

    The study collected a number of harassment reports, and among other things, checked the allegedly harassing IDs to see if they happened to be on a “GamerGate blocklist” created by Randi Harper. The blocklist is totally simple-minded: if an ID follows more than two of a small set of IDs involved in GamerGate and known to have harassed people—one example was Slade Villena, who was banned from Twitter for making death threats (but who turned around and immediately set up a new ID, despite this)—you’re on the blocklist.
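
The rule described above is simple enough to sketch in a few lines of Python 3. This is only an illustration of the "follows more than two seed IDs" idea, not Randi Harper's actual code; the function name, the account names, and the assumption that everyone's follow lists have already been fetched are all hypothetical.

```python
def build_blocklist(following, seed_ids, threshold=2):
    """Return the IDs that follow more than `threshold` of the seed IDs."""
    seeds = set(seed_ids)
    return sorted(user for user, followed in following.items()
                  if len(seeds & set(followed)) > threshold)

# Hypothetical data: each ID mapped to the IDs it follows
following = {
    "alice": ["seed_a", "seed_b", "seed_c", "news"],
    "bob": ["seed_a", "seed_b"],
    "carol": ["news", "cats"],
}
# Hypothetical seed set of IDs known to have harassed people
seed_ids = ["seed_a", "seed_b", "seed_c", "seed_d"]

blocklist = build_blocklist(following, seed_ids)
print(blocklist)
```

Here only "alice" follows more than two of the seed IDs, so only she lands on the list. It also shows why the brush is so broad: nothing about what an ID actually posts is considered.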

    (This is only one possible approach, but it seems to work surprisingly well. A different approach to generating a blocklist is described here, with some Python to fool around with.)

    Despite this, close to one in eight, 12%, of the harassment reports in the study were linked to IDs on the blocklist. Why is this significant? As I’ve said, the blocklist is a very broad brush. GamerGate, en masse, can’t possibly represent more than 0.01%, one-ten-thousandth, of the population of Twitter. The biggest estimate I’ve seen from anyone involved was a frankly absurd 170,000. That’s still only about 0.07%, roughly one user in 1,500, of Twitter’s active user population of about a quarter-billion users. But let’s go with an estimate ten times larger again, 1.7 million GamerGaters on Twitter: roughly 0.7% of the user base, or about one user in 150.

    Let’s further assume that the WAM! report overstates the involvement of GamerGate in harassment in general by an order of magnitude. That would mean that GamerGate was involved in, not twelve percent, but only about 1% of overall harassment reports.

    This winds us up in a situation where roughly one user in 150 is generating about one harassment report out of every hundred. Even after inflating GamerGate’s numbers tenfold and discounting its involvement tenfold, it still shows up in harassment reports somewhat more often than its share of the population would predict, all other things being equal. And that’s giving GamerGate the benefit of significant doubt, both in their numbers and their involvement.

    (For contrast, drop both of those allowances: take the unrealistic-but-smaller 170,000 estimate of GamerGate’s size and stick with the 12% incidence of involvement, and you have about 1/1,500th of the user base generating almost an eighth of the harassment reports, meaning they’re creating something like 175 times as many harassment reports as you’d expect. That is two orders of magnitude out-of-kilter.)
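
The comparison in the last few paragraphs boils down to dividing a group's share of harassment reports by its share of the user base. Here is a minimal sketch of that arithmetic in Python 3, using only the figures quoted above (170,000 or 1.7 million GamerGaters, about a quarter-billion active users, 12% of reports or a tenth of that). The function name `overrepresentation` is mine, not the study's.

```python
# Figures quoted in the text; none of them are new measurements.
TWITTER_USERS = 250000000   # "about a quarter-billion" active users

def overrepresentation(group_size, total_users, share_of_reports):
    """How many times more often a group appears in reports than its
    share of the user base would predict (1.0 means proportional)."""
    share_of_users = group_size / total_users
    return share_of_reports / share_of_users

# Inflated head count, involvement discounted by an order of magnitude
generous = overrepresentation(1700000, TWITTER_USERS, 0.012)
# Largest claimed head count, involvement as reported in the study
worst_case = overrepresentation(170000, TWITTER_USERS, 0.12)

print("1.7 million users, 1.2%% of reports: %.1fx" % generous)
print("170,000 users, 12%% of reports: %.0fx" % worst_case)
```

Plugging in a different guess at GamerGate's head count is a one-line change, which makes it easy to see how sensitive the conclusion is to that number.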

    Houston, we have a big problem here. The WAM! study understates their findings here by noting that 88% of the reports they studied were not associated with GamerGate—a factoid you’ll hear GamerGaters happily quoting—but not noting the disparity between likely population size and likelihood to be tied to a report of harassment. You can download a copy of the full WAM! study from here.

    The Internet is broken, the trolls are in control, and GamerGate is only the most recent manifestation. The facts pretty much make it clear that, rather than having anything to do with ethics, journalistic or otherwise, GamerGate is a Chan “op” that was simply a lot more viral and successful than their previous attempts, and it built on their successful “#EndFathersDay” troll.

    So, back to where we started: I’m getting off Twitter because GamerGate finally decided to make the harassment personal. I’ll get to the specifics of that in the next installment.

    WordPress Security Tip: NEVER Use Your Admin ID Visibly

    WordPress is the most-hacked platform out there, and the most common attack on it involves trying to get access to your administrative ID. For this reason, you should:

    • NEVER use an ID like “admin”, “administrator”, or the name of your domain for your admin ID — Pick an admin ID that will not be guessed!
    • NEVER use your admin ID to make postings or comments on your WordPress blog
    • NEVER publish a posting unless you’ve checked a preview to be sure you’re not exposing anything you don’t want to. Save drafts, use preview.

    Here are some simple better practices to keep your admin ID safe:

    • Only use your admin ID for administrative tasks that actually require it.

      This includes things like updating themes and plugins — which you should be doing via SFTP — and so forth.

    • Create a second “public” persona and give it “author” or “editor” privileges only. Be sure to use this ID for anything that will be visible on your site.

      On this site, “Zanzibar McFate” has no administrative privileges. (He has a 30-character, randomly-generated password, in spite of that.)
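
A password like that doesn't require a special tool. Here is a minimal sketch using the `secrets` module from Python's standard library (Python 3.6 or later); the function name and the choice of alphabet are my own assumptions, and some sites reject certain punctuation characters, so trim the alphabet if yours does.

```python
import secrets
import string

def random_password(length=30):
    """Return a random password drawn from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice uses the operating system's cryptographic random source
    return "".join(secrets.choice(alphabet) for _ in range(length))

password = random_password()
print(password)
```

Paste the result straight into a password manager; nobody should be typing one of these from memory.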

    Tip 1: Use the “Author” Pop-up Menu on the “Edit Post” page

    If you click on the “Screen Options” button toward the top-righthand corner of the “Edit Post” page, it will drop down and present you with a number of things you can show or hide. Check the box next to “Author”.

    Now, there will be a pop-up menu section below the editing panel on the edit post page which will let you select your “public persona” as the post’s author instead of your admin ID.

    Tip 2: Use the WP Masquerade Plugin

    Comments are a bit tougher to manage, since they’re published immediately. You can log out of your administrative ID and log in as your persona, but fortunately, there’s an easier way.

    A nice tool to facilitate better operational security here is the WP Masquerade plugin. I can verify that it works with my WP 4.2.2 install. Download the plugin and activate it. Now, when you go to your user list as an administrator, there will be a hover-visible link beneath every user saying “Masquerade”, as shown.

    FirefoxDeveloperEditionScreenSnapz083

    Click on that link, and you’ll be effectively logged in as that user, and any comments you make will be posted under that ID. To close the Masquerade session, click on the link in the banner at the bottom of the page.

    Alternatively, you can use the “Masquerade as…” drop-down which the plugin adds to the Admin toolbar.

    Summary

    Your WordPress administrative ID is — assuming you’ve installed and secured your WordPress correctly — the weakest link in your security chain. If you expose it publicly, you’ve greatly increased the susceptibility of your site to being hacked. Practice good opsec, and use the tools available to keep your admin ID under wraps.

    Data-mining Twitter for GamerGate—Visualization

    In the previous posting, I went over how to connect to Twitter’s streaming API using a connector app and the Tweepy Python library, as well as a quick overview of how to construct a Pandas dataframe from the tweets we’ve collected.

    In this posting, we’ll extract all of the information we’ll need to use NetworkX to create a directed graph that we can visualize in Gephi of who’s retweeting whom, keeping track of the age in days and the number of followers that each user has so we can filter on those factors if we like.

    First, if you don’t have NetworkX, install it with pip, and download and install Gephi.

    Again, we’ll assume that our tweets are collected in a text file, “gamergate.txt”. Let’s pull the data out of the text file into a new data frame.

    import json
    import pandas as pd
    from time import gmtime, mktime, strptime

    # Path to the file of captured tweets, one JSON object per line
    tweets_data_path = "gamergate.txt"
    # Format of Twitter's created_at timestamps (always UTC)
    TWITTER_TIME = "%a %b %d %H:%M:%S +0000 %Y"
    SECONDS_PER_DAY = 60 * 60 * 24

    tweets_data = []
    with open(tweets_data_path, "r") as tweets_file:
        for line in tweets_file:
            try:
                tweets_data.append(json.loads(line))
            except ValueError:
                # Skip blank or truncated lines
                continue
    #
    # Clean out limit messages, etc. Build a new list rather than removing
    # items from the list while iterating over it, which skips entries.
    #
    tweets_data = [tweet for tweet in tweets_data
                   if isinstance(tweet, dict) and 'user' in tweet and 'text' in tweet]

    #
    # See how many we wound up with
    #
    print(len(tweets_data))

    def age_in_days(created_at, now):
        # Age, in whole days, of an ID created at the given Twitter timestamp
        return int(now - mktime(strptime(created_at, TWITTER_TIME))) // SECONDS_PER_DAY

    #
    # Pull the data we're interested in out of the Twitter data we captured
    #
    rows_list = []
    now = mktime(gmtime())
    for tweet in tweets_data:
        user = tweet['user']
        rtauthor = ""
        rtage = rtfollowers = 0
        #
        # If it was a retweet, save the original author's screen name,
        # follower count and age
        #
        original = tweet.get('retweeted_status')
        if original:
            rtauthor = original['user']['screen_name']
            rtage = age_in_days(original['user']['created_at'], now)
            rtfollowers = original['user']['followers_count']
        #
        # If this was a reply, save the screen name being replied to
        #
        reply_to = tweet.get('in_reply_to_screen_name') or ""
        #
        # Construct a row with this ID's age in days, its follower count and
        # the text of the tweet, and add it to our list
        #
        rows_list.append({
            'author': user['screen_name'],
            'reply_to': reply_to,
            'age': age_in_days(user['created_at'], now),
            'followers': user['followers_count'],
            'retweet_of': rtauthor,
            'rtfollowers': rtfollowers,
            'rtage': rtage,
            'text': tweet['text'],
        })

    #
    # When we've processed all the tweets, build the DataFrame from the rows
    # we've collected
    #
    tweets = pd.DataFrame(rows_list)
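
The account-age calculation in the script above can be tried out on its own. This is a small sketch in Python 3 with one deliberate swap: it uses `calendar.timegm` rather than `mktime`, so the result doesn't depend on the local time zone of the machine running it. The function name and the two timestamps are made up for illustration.

```python
import calendar
from time import strptime

# Format of Twitter's created_at timestamps (always UTC)
TWITTER_TIME = "%a %b %d %H:%M:%S +0000 %Y"

def account_age_days(created_at, now):
    """Whole days between a Twitter created_at timestamp and `now`
    (seconds since the epoch, UTC)."""
    created = calendar.timegm(strptime(created_at, TWITTER_TIME))
    return int(now - created) // (60 * 60 * 24)

# Hypothetical example: an account created exactly ten days before "now"
now = calendar.timegm(strptime("Sat Jul 11 12:00:00 +0000 2015", TWITTER_TIME))
age = account_age_days("Wed Jul 01 12:00:00 +0000 2015", now)
print(age)
```

Because the division rounds down, an account created nine days and 23 hours ago counts as nine days old.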
    

    Here’s a script that will iterate through the dataframe, row by row, and construct a directed graph of who’s retweeting whom. Each directed edge runs from the original author to the retweeter and represents the relationship “is retweeted by”; the higher the weight of an edge from person A to person B, the more often A is getting retweeted by B. Each node represents an individual ID on Twitter, and has attributes to track the number of followers and the age of the ID in days.

    import networkx as nx

    #
    # Create a new directed graph
    #
    J = nx.DiGraph()
    #
    # Iterate through the rows of our dataframe
    #
    for index, row in tweets.iterrows():
        #
        # Gather the data out of the row
        #
        this_user_id = row['author']
        author = row['retweet_of']
        #
        # Is the sender of this tweet in our network? Attributes are passed
        # as keyword arguments (the attr_dict parameter is gone from newer
        # NetworkX releases) and converted to plain ints for the GEXF writer.
        #
        if this_user_id not in J:
            J.add_node(this_user_id,
                       followers=int(row['followers']),
                       age=int(row['age']))
        #
        # If this is a retweet, is the original author a node?
        #
        if author != "" and author not in J:
            J.add_node(author,
                       followers=int(row['rtfollowers']),
                       age=int(row['rtage']))
        #
        # If this is a retweet, add an edge from the original author to the
        # retweeter, or bump the weight of the edge if it already exists.
        #
        if author != "":
            if J.has_edge(author, this_user_id):
                J[author][this_user_id]['weight'] += 1
            else:
                J.add_edge(author, this_user_id, weight=1.0)

    nx.write_gexf(J, 'ggrtages.gexf')
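
To see what the graph-building loop produces without installing NetworkX, here is a small stand-in in Python 3 using only the standard library. It counts the same weighted "is retweeted by" edges with a `Counter`; the screen names and rows are invented, and this is an illustration of the edge bookkeeping, not a replacement for the script above.

```python
from collections import Counter

# Hypothetical (author, retweet_of) pairs, in the same shape as the
# 'author' and 'retweet_of' columns of the dataframe
rows = [
    ("bob", "alice"),    # bob retweeted alice
    ("bob", "alice"),    # ...and did it again
    ("carol", "alice"),  # carol retweeted alice once
    ("alice", ""),       # an original tweet, not a retweet
]

# Each edge runs from the original author to the retweeter;
# rows with an empty retweet_of are not retweets and add no edge
edge_weights = Counter(
    (retweet_of, author) for author, retweet_of in rows if retweet_of != ""
)

for (source, target), weight in sorted(edge_weights.items()):
    print("%s -> %s (weight %d)" % (source, target, weight))
```

Heavily retweeted IDs end up with many outgoing edges, which is what makes them stand out once Gephi lays the network out.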

    The last thing we did was to save out a GEXF file we can then read into Gephi. Start Gephi up, and open our file; we called ours “ggrtages.gexf”.

    You’ll get a dialog telling you how many nodes and edges there are in the graph, whether it’s directed or not, and other information, warnings, etc. Click “OK”.

    Gephi will import the GEXF file. You can now look at the information it contains by clicking on the “Data Laboratory” button at the top.

    Click on the “Overview” button to start working with the network. At first, it doesn’t look like anything, since we haven’t actually run a visualization on it. Before we do, we can use some of the node attributes to color nodes a darker blue based on their age.

    We can use the “Ranking” settings to color our nodes. Click on the “Select attribute” popup, and choose “age”.

    You can choose different color schemes, change the spline curve used to apply color, etc., from here as well.

    Click on the “Apply” button to apply the ranking to the network. The nodes will now be colored rather than gray.

    Now, we’re ready to run a visualization on our data. From the “Layout” section, let’s choose “ForceAtlas 2″—it’s fast and good at showing relationships in a network.

    Press the “Run” button, and let it go for a bit. A network this size—about 10K nodes and 30K edges—settled down on my MacBook Pro within five minutes or less. When you feel it’s stabilized into something interesting, press the “Stop” button, and then click on the “Preview” button at the top.

    The preview panel won’t show anything at first. Click the “Refresh” button.

    Gephi will render your visualization. You can use the mouse to drag it around, and you can zoom in and out with a scroll-wheel or with the “+” and “-” buttons below.

    Mining Twitter for #GamerGate: A How-To

    I’ve gotten interested in the #GamerGate “controversy” (I’m pretty completely persuaded that any talk about “ethics” is a façade for a lot of reactionary nonsense, as well as abundant harassment and misogyny), and it occurred to me that it represented an interesting data set to mine using Python. This is a quick guide for how to get started, but it could be adapted to any effort to datamine Twitter.

    Setting Up to Connect to Twitter

    First, you’re going to need to set up a Twitter app that you can use for authentication. You can do this at apps.twitter.com/app/new. You’ll need to have a valid Twitter account with an authenticated phone number.

    Enter a name, description and web site URL for your application. You won’t need a callback URL.

    FirefoxDeveloperEditionScreenSnapz068

    Check “Yes, I agree” at the bottom of the Developer Agreement, and click the “Create your Twitter application” button.

    FirefoxDeveloperEditionScreenSnapz069

    Your application will be created. To use Tweepy to capture tweets, we’ll need the Consumer Key and Consumer Secret, and we’ll also need to set up an access token. Click on the “manage keys and access tokens” link next to your “Consumer Key (API Key)” in the “Application Settings” section.

    FirefoxDeveloperEditionScreenSnapz070

    This will take you to the “Keys and Access Tokens” tab. Note your “Consumer Key” and “Consumer Secret” (greyed out here).

    FirefoxDeveloperEditionScreenSnapz071

    In the “Your Access Token” section at the bottom of the page, click on “Create my access token”.

    FirefoxDeveloperEditionScreenSnapz072

    An “Access Token” and an “Access Token Secret” (again, greyed out here) will be generated; you’ll need these as well.

    FirefoxDeveloperEditionScreenSnapz073

    Install the Python Prerequisites

    For this project, we’re going to need the Tweepy, Pandas, and matplotlib libraries:

    pip install tweepy pandas matplotlib

    Here’s a simple-minded Python script using Tweepy to collect tweets mentioning “gamergate” from the Twitter streaming API:

    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    
    access_token = "YOUR ACCESS TOKEN GOES HERE"
    access_token_secret = "YOUR ACCESS TOKEN SECRET GOES HERE"
    consumer_key = "YOUR CONSUMER KEY GOES HERE"
    consumer_secret = "YOUR CONSUMER SECRET GOES HERE"
    
    class StdOutListener(StreamListener):
    
        # Called with the raw JSON text of each message from the stream
        def on_data(self, data):
            print(data)
            return True
    
        # Called with the HTTP status code when Twitter reports an error
        def on_error(self, status):
            print(status)
    
    if __name__ == '__main__':
    
        listener = StdOutListener()
        auth_handler = OAuthHandler(consumer_key, consumer_secret)
        auth_handler.set_access_token(access_token, access_token_secret)
        stream = Stream(auth_handler, listener)
    
        stream.filter(track=['gamergate'])
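
    Pasting the four credentials into the script works, but it makes the file awkward to share. As a rough sketch of one alternative, the helper below reads them from environment variables instead. The variable names and the `load_credentials` function are made up for this example; nothing in Tweepy requires them.

```python
import os

# Hypothetical variable names; use whatever names suit your setup
CREDENTIAL_VARIABLES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four Twitter credentials as a dict, or raise if any is unset."""
    missing = [name for name in CREDENTIAL_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_VARIABLES}
```

    The values in the returned dict can then be handed to OAuthHandler and set_access_token in place of the four string constants.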
    

    UPDATE

    The script, as it stands, times out on a read every once in a while, so there’s a minor improvement to be had here by embedding the collection in a while loop with a try and an except to keep it from crashing back to the shell prompt occasionally:

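        while True:
            try:
                stream.filter(track=['gamergate'])
            except Exception:
                # Read timeouts and the like: reconnect and carry on.
                # Catching Exception (not a bare except) lets Ctrl-C still stop the script.
                continue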
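
    A loop like this reconnects immediately, which is fine for the odd timeout but unkind to the API if the connection keeps failing. Here is a minimal sketch of a gentler wrapper that waits a little longer after each failure. The name `run_with_retries` and its parameters are my own invention for illustration, not part of Tweepy.

```python
import time


def run_with_retries(task, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call task() until it returns, retrying on ordinary exceptions.

    Hypothetical helper (not part of Tweepy). The wait doubles after each
    failure: base_delay, 2*base_delay, 4*base_delay, ...
    KeyboardInterrupt is not an Exception subclass, so Ctrl-C still works.
    """
    failures = 0
    while True:
        try:
            return task()
        except Exception as error:
            failures += 1
            if failures > max_retries:
                raise
            delay = base_delay * 2 ** (failures - 1)
            print("stream failed (%s); retrying in %.1f seconds" % (error, delay))
            sleep(delay)
```

    With the script above, the call would look like run_with_retries(lambda: stream.filter(track=['gamergate'])). The failure count never resets, so for a collection that runs for days you would want a generous max_retries.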
    

    All this script does is print out every tweet which is captured by Tweepy, in JSON format. If you run it, the output will look something like this — this is a single tweet in JSON notation:

    {u'contributors': None, u'truncated': False, u'text': u'RT @CommissarOfGG: Anti taking pride that nobody can tell the difference between them and someone pretending to be retarded.\n\n#GamerGate ht\u2026', 'retweet': True, u'in_reply_to_status_id': None, u'id': 584828601125773314, u'favorite_count': 0, u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', u'retweeted': False, u'coordinates': None, u'timestamp_ms': u'1428268978755', u'entities': {u'symbols': [], u'media': [{u'source_status_id_str': u'584828243808661504', u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'source_status_id': 584828243808661504, u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [139, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}], u'hashtags': [{u'indices': [126, 136], u'text': u'GamerGate'}], u'user_mentions': [{u'id': 2729513808, u'indices': [3, 17], u'id_str': u'2729513808', u'screen_name': u'CommissarOfGG', u'name': u'Comrade Commissar'}], u'trends': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': u'584828601125773314', u'retweet_count': 0, u'in_reply_to_user_id': None, u'favorited': False, u'retweeted_status': {u'contributors': None, u'truncated': False, u'text': u'Anti taking pride that nobody can tell the difference between them and someone pretending to be retarded.\n\n#GamerGate http://t.co/CS3Kb2Bkcm', u'in_reply_to_status_id': None, u'id': 584828243808661504, u'favorite_count': 2, u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web 
Client</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'media': [{u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [118, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}], u'hashtags': [{u'indices': [107, 117], u'text': u'GamerGate'}], u'user_mentions': [], u'trends': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': u'584828243808661504', u'retweet_count': 4, u'in_reply_to_user_id': None, u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'id': 2729513808, u'verified': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/533572934799876096/DYR05LI4_normal.png', u'profile_sidebar_fill_color': u'000000', u'profile_text_color': u'000000', u'followers_count': 2047, u'profile_sidebar_border_color': u'000000', u'id_str': u'2729513808', u'profile_background_color': u'000000', u'listed_count': 26, u'profile_background_image_url_https': u'https://abs.twimg.com/images/themes/theme1/bg.png', u'utc_offset': -14400, u'statuses_count': 4940, u'description': u'#GamerGate #OpSKYNET', u'friends_count': 1584, u'location': u'Moscow', u'profile_link_color': u'DD2E44', u'profile_image_url': u'http://pbs.twimg.com/profile_images/533572934799876096/DYR05LI4_normal.png', u'following': None, u'geo_enabled': False, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/2729513808/1407939361', 
u'profile_background_image_url': u'http://abs.twimg.com/images/themes/theme1/bg.png', u'name': u'Comrade Commissar', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 1980, u'screen_name': u'CommissarOfGG', u'notifications': None, u'url': u'http://www.facebook.com/commissarofgamergate', u'created_at': u'Wed Aug 13 14:10:24 +0000 2014', u'contributors_enabled': False, u'time_zone': u'Eastern Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Sun Apr 05 21:21:33 +0000 2015', u'filter_level': u'low', u'in_reply_to_status_id_str': None, u'place': None, u'extended_entities': {u'media': [{u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [118, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}]}}, u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'id': 2784597626, u'verified': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/532401111823822848/KSIxqiLe_normal.jpeg', u'profile_sidebar_fill_color': u'000000', u'profile_text_color': u'000000', u'followers_count': 986, u'profile_sidebar_border_color': u'000000', u'id_str': u'2784597626', u'profile_background_color': u'000000', u'listed_count': 35, u'profile_background_image_url_https': u'https://abs.twimg.com/images/themes/theme1/bg.png', 
u'utc_offset': -18000, u'statuses_count': 25217, u'description': u"I wasn't born with enough middle fingers for perpetually outraged hipster douchebags compensating for their mediocrity with shelves of participation trophies.", u'friends_count': 785, u'location': u'Parts Unknown', u'profile_link_color': u'4A913C', u'profile_image_url': u'http://pbs.twimg.com/profile_images/532401111823822848/KSIxqiLe_normal.jpeg', u'following': None, u'geo_enabled': False, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/2784597626/1425335831', u'profile_background_image_url': u'http://abs.twimg.com/images/themes/theme1/bg.png', u'name': u'Unnecessary Robness', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 15912, u'screen_name': u'aDouScheiBler', u'notifications': None, u'url': None, u'created_at': u'Mon Sep 01 19:36:40 +0000 2014', u'contributors_enabled': False, u'time_zone': u'Central Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Sun Apr 05 21:22:58 +0000 2015', u'filter_level': u'low', u'in_reply_to_status_id_str': None, u'place': None, u'extended_entities': {u'media': [{u'source_status_id_str': u'584828243808661504', u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'source_status_id': 584828243808661504, u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [139, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': 
u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}]}}

    Set a terminal running the script above for as long as you like. I left mine going for 42 hours, and collected about 65,000 tweets in a text file of about 300MB.

    python tweetminer.py >> gamergate.txt
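
    Before building anything on the capture, it’s worth checking how much of it is usable. This is a small sketch, assuming the file holds one JSON object per line; `summarize_capture` is a name made up for this example.

```python
import json


def summarize_capture(path):
    """Count usable tweets in a capture file with one JSON object per line.

    Returns (tweets, other_messages, unparseable_lines). "Other messages"
    are valid JSON with no 'user' key, such as limit notices.
    """
    tweets = other = bad = 0
    with open(path, "r") as capture:
        for line in capture:
            line = line.strip()
            if not line:
                continue  # blank lines carry no data
            try:
                message = json.loads(line)
            except ValueError:
                bad += 1
                continue
            if isinstance(message, dict) and "user" in message:
                tweets += 1
            else:
                other += 1
    return tweets, other, bad
```

    Calling summarize_capture('gamergate.txt') gives a quick read on whether the stream was mostly tweets or mostly noise.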

    When you’ve collected your data, here’s some Python to set up a sample pandas DataFrame containing information of interest: who tweeted, how many days old their account is, how many followers they have, who it was a retweet of (if it was one) and to whom it was a reply (if it was one).

    That should give you plenty of grist for analysis.

    import json
    import pandas as pd
    import matplotlib.pyplot as plt
    from time import gmtime, mktime, strptime
    
    tweets_data_path = 'gamergate.txt'
    
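    tweets_data = []
    with open(tweets_data_path, "r") as tweets_file:
        for line in tweets_file:
            try:
                tweets_data.append(json.loads(line))
            except ValueError:
                # Skip blank lines and anything else that isn't valid JSON
                continue
    #
    # Clean out limit messages, etc. Build a new list rather than removing
    # items from the list being looped over, which would skip entries.
    #
    tweets_data = [tweet for tweet in tweets_data
                   if isinstance(tweet, dict) and 'user' in tweet]
    
    print(len(tweets_data))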
    
    #
    # Pull the data we're interested in out of the Twitter data we captured
    #
    rows_list = []
    now = mktime(gmtime())
    for tweet in tweets_data:
        # Skip anything that doesn't have a user attached
        try:
            author = tweet['user']['screen_name']
        except (KeyError, TypeError):
            continue
    
        # If it was a retweet, record whose tweet was retweeted
        try:
            rtauthor = tweet['retweeted_status']['user']['screen_name']
        except (KeyError, TypeError):
            rtauthor = ""
    
        # If it was a reply, record who was being replied to
        reply_to = tweet.get('in_reply_to_screen_name') or ""
    
        # Account age in days: subtract first, then convert seconds to days
        created = mktime(strptime(tweet['user']['created_at'], "%a %b %d %H:%M:%S +0000 %Y"))
        age = int((now - created) / (60*60*24))
        followers = tweet['user']['followers_count']
        rows_list.append({'author': author, 'retweet_of': rtauthor, 'reply_to': reply_to,
                          'age': age, 'followers': followers})
    
    tweets = pd.DataFrame(rows_list)
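
    The “age” column depends on turning Twitter’s created_at string into a number of days. As a sketch of that one step in isolation, the helper below parses the timestamp as UTC with calendar.timegm rather than mktime. That is a substitution made here to keep local time zones out of the arithmetic, and the name `account_age_days` is made up for this example.

```python
import calendar
import time

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def account_age_days(created_at, now=None):
    """Return whole days between a Twitter created_at string and now.

    created_at looks like 'Wed Aug 13 14:10:24 +0000 2014' and is in UTC.
    now is seconds since the epoch; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    created = calendar.timegm(time.strptime(created_at, TWITTER_TIME_FORMAT))
    # Subtract first, then convert seconds to days
    return int((now - created) // (60 * 60 * 24))
```

    Passing a fixed value for now makes the result repeatable, which is handy if you rebuild the DataFrame from the same capture later.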
    

    The resulting DataFrame will look something like this—note that rows 0-4 are retweets, and row 6 is a reply; “age” is days since the Twitter ID was created:

            age           author  followers     reply_to       retweet_of
    0       137      Maskgamer64        428                  CultOfVivian
    1       231   Smackfacemcgee       1304                  Daddy_Warpig
    2      2240      LenFirewood       1658                   RSG_VILLENA
    3       171     8bitsofsound        650                 CommissarOfGG
    4       102    devilstwosome          9                   atlasnodded
    5        24       tophatdril         34                              
    6        11   TheRalphRetart         63     Dr_Louse                 
    7...    ...              ...        ...          ...              ...
    64531    65     4EverPlayer2        614                        mombot
    64532   143  EnwroughtDreams        222                thewtfmagazine
    64533  1996          _icze4r      22689                       dauthaz
    64534  1581  __DavidFlanagan       8315                   Spacekatgal
    64535   872         jtdg_b8z        621               GamingAndPandas
    64536  2238        hanytimeh        914                thewtfmagazine
    

    At this point you could easily find out the most-retweeted IDs in the DataFrame, for example:

    In [146]: tweets['retweet_of'].value_counts()
    Out[146]: 
                       17974
    Sargon_of_Akkad     1574
    ItalyGG             1516
    TheRalphRetort      1064
    Blaugast             910
    mylittlepwnies3      899
    thewtfmagazine       823
    Nero                 721
    srhbutts             706
    Daddy_Warpig         705
    randomfox            627
    atlasnodded          592
    full_mcintosh        586
    whenindoubtdo        584
    ToKnowIsToBe         569
    ...
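
    The first row of that output, the one with no name, is the count of tweets that weren’t retweets at all. Here is a minimal sketch for leaving those out, assuming the retweet_of column holds an empty string for non-retweets as in the DataFrame above. The four-row DataFrame is made-up data so the example runs on its own, and `top_retweeted` is a hypothetical name.

```python
import pandas as pd


def top_retweeted(tweets, n=10):
    """Return the n most-retweeted screen names, ignoring non-retweets."""
    retweets = tweets[tweets["retweet_of"] != ""]
    return retweets["retweet_of"].value_counts().head(n)


# Made-up rows standing in for the real capture
sample = pd.DataFrame({
    "author": ["a", "b", "c", "d"],
    "retweet_of": ["x", "", "x", "y"],
})
print(top_retweeted(sample))
```

    On the real data you would call top_retweeted(tweets) instead of using the sample.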
    

    Check out the follow-on posting to see how to use NetworkX and Gephi to make visualizations of the data.

    Got Music on Your iPod You Need to Get Back Into iTunes? Here’s How.

    I set up a new MacBook Pro several months back, but in moving things around, I managed to misplace some music. Most of my stuff is on a 160GB iPod Classic, and I wanted to synchronize the contents of the iPod with my iTunes library, recovering any tracks that were on the iPod but not in the library.

    Out of the box, Apple doesn’t make this especially easy for you. Syncing within iTunes is strictly a one-way affair: what’s in your library replaces what’s on the iPod, precisely the opposite of what I want. Mounting the iPod on the desktop doesn’t help — you only get access to a shared file area, and the music is organized in a way calculated to keep you from figuring out where anything is.

    Fortunately, there’s an easy solution, but it takes a few pieces to put together. This presumes that you’ve got your music in one folder on your computer, and you’re letting iTunes organize it for you (which it will do by artist).

    First, you’re going to need FUSE for OS X, which allows you to create user-space file systems on OS X. Download the .dmg file — current version is 2.7.4 — and install it.

    That sets you up to install iTunesFS. Download the iTunesFS 1.3.4 installer disk image from the downloads page, mount the .dmg and drag the iTunesFS app to your Applications folder.

    Connect your iPod to an available USB port on your Mac, but don’t launch iTunes. Instead, launch the iTunesFS application. An “iTunesFS” icon will appear on your desktop, and opening it will reveal several folders, “Albums”, “Artists”, etc. You want the “Artists” folder.

    Open another window and browse to your iTunes library folder; you can find out where this is under the “Advanced” tab in iTunes’ Preferences. You want the “Music” folder. (You’ll see why we want these windows open in a moment.)

    Now, open a terminal window. We’re going to use the rsync command to copy any files that exist in the “Artists” folder on the “iTunesFS” volume (a synthetic folder constructed by iTunesFS which mirrors the organization of the “Music” folder in iTunes’ library folder) but are missing from the “Music” folder itself. Type the beginning of the command:

    rsync -av --dry-run

    Leave a trailing space at the end. Now, go to the “iTunesFS” window in Finder and drag the “Artists” folder on top of the Terminal window, and “drop” it there. The Terminal app will fill in the path to the folder you dropped, and the command will now look something like this:

    rsync -av --dry-run /Volumes/iTunesFS/Artists

    Add a trailing slash and a space at the end, so it looks like this:

    rsync -av --dry-run /Volumes/iTunesFS/Artists/

    Now, go back to Finder, and from the window that’s open to your iTunes library folder, drag and drop the “Music” folder onto the Terminal window. The command should now look something like this (it will vary depending on the names of your volumes, etc., obviously):

    rsync -av --dry-run /Volumes/iTunesFS/Artists/ /Volumes/Vibranium/iTunes/Music

    The --dry-run flag tells rsync to just simulate copying the files; this will give you a chance to check that things look reasonable first. The -av flags tell rsync to use “archive” mode, which means it will walk down the directories within the folder you specify recursively (preserving timestamps and permissions as it goes), and to provide “verbose” output. Execute the command, and it will provide a list of all the directories the command would create and the files it would copy. If you’re satisfied with what you see (and you probably actually want to review it!), use the up-arrow to bring back your last command, edit out the --dry-run flag, and execute it again.

    Pretty easy.
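
    If the dry-run listing is too long to eyeball, a few lines of Python can give you a second opinion. This is only a sketch: it lists files that exist under one folder but not under the other, and unlike rsync it doesn’t compare sizes or modification times. The function name `missing_from_library` is made up for this example.

```python
import os


def missing_from_library(source_root, library_root):
    """List files under source_root that have no counterpart in library_root.

    Paths are returned relative to source_root, sorted. Only existence is
    checked; unlike rsync, sizes and modification times are not compared.
    """
    missing = []
    for folder, _subfolders, filenames in os.walk(source_root):
        relative_folder = os.path.relpath(folder, source_root)
        for filename in filenames:
            relative_path = os.path.normpath(os.path.join(relative_folder, filename))
            if not os.path.exists(os.path.join(library_root, relative_path)):
                missing.append(relative_path)
    return sorted(missing)
```

    With the paths from the example above, the call would be missing_from_library("/Volumes/iTunesFS/Artists", "/Volumes/Vibranium/iTunes/Music"); your volume names will differ.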