Tuesday Linkfest

What I’m hearing from people is that they’re going to miss the raft of links I typically would post on Facebook about the news and other stories that had caught my interest. Here’s a first stab at providing something as a replacement.

Discussion in the comments. (Comments are moderated, get over it.)

Security

  • AshleyMadison.com, a site which purports to enable one to have a “discreet affair”, got hacked, and badly. As many as 37 million users are at risk of having their personal data, including their “kinky sexual fantasies”, posted online. A group calling itself “Impact Team” is taking responsibility, and it all seems to be over the site’s failure to scrub customers’ data after charging them $19 to do so.
    What astounds me here is the moralizing in the comments. A site with your medical or financial records, or lists of purchases you’ve made that you might not want your boss to know about, can be hacked just as easily as a cheating-on-your-wife site you despise. The blue-nosed “YOU DESERVE THIS!” tone of the attackers’ taunts suggests to me that Chansters are behind this somewhere. They’re all about ethics, you know.
  • From the “Stop Me If You’ve Hurt This One Before” desk, Android is again being called out as a security risk owing to the tremendous amount of fragmentation in the platform. This time, security researchers have written a paper on it (link in the article).
  • After discovering that one of the zero-day exploits it sold to Hacking Team had in turn been sold to human-rights-violating regimes, Netragard has shut down its controversial Exploit Acquisition Program. Again.
  • Completely aside from the potential impact on jobs and the economy, there’s a side of those “self-driving cars” that’s not getting discussed in all the excitement. Here’s a story relating a demonstration of hacking a Jeep on the highway, and taking it over. We’ve already seen drones spoofed with fake GPS; while robot cars may well reduce accidental traffic fatalities, I’d bet any amount of money we’ll see some deliberate ones. You know: for the “lulz”.

Technology

Society & Culture

  • Yesterday was the 46th anniversary of the first landing on the moon. I remember sitting in my dad’s apartment, watching it on a tiny black-and-white TV. Here’s a story of someone the same age as I, who was watching it while I was, and how it affected his life.
  • According to this story in Salon, instant rāmen noodles are an environmental disaster. Instant rāmen noodles are fried to create “holes” in the noodles that let them cook in three minutes; the frying is done with palm oil. Lots of palm oil. So much palm oil that it’s destroying entire habitats and endangering orangutan populations in Borneo and Sumatra.
  • Astoundingly, it’s being reported that Google’s targeted advertisement algorithms are doing things like showing higher-paying jobs to men than they do to women. Further detail at the MIT Technology Review.
  • Robert W. Gibson, who wrote a number of issues of “Captain Harlock”, has passed away at age 55.

Other Stuff (and #WhiskeyTangoFoxtrot!?)

  • Ever wondered how you’d portray a mute on stage, or write a blind character in a story, or any number of things? This wonderful reference site is an objet trouvé from the wonderful Mordant Carnival.
  • And from the “Tech Geniuses of Silicon Valley” desk, a “wiccan witch” (?) has apparently gotten a business going protecting companies’ computers and networks through sorcery.

WordPress Security Tip: NEVER Use Your Admin ID Visibly

WordPress is the most-hacked platform out there, and the most common attempt at hacking it involves attempting to get access to your administrative ID. For this reason, you should:

  • NEVER use an ID like “admin”, “administrator”, or the name of your domain for your admin ID — Pick an admin ID that will not be guessed!
  • NEVER use your admin ID to make postings or comments on your WordPress blog
  • NEVER publish a posting unless you’ve checked a preview to be sure you’re not exposing anything you don’t want to. Save drafts, use preview.

Here are some simple better practices to keep your admin ID safe:

  • Only use your admin ID for administrative tasks that actually require it.

    This includes things like updating themes and plugins — which you should be doing via SFTP — and so forth.

  • Create a second “public” persona and give it “author” or “editor” privileges only. Be sure to use this ID for anything that will be visible on your site.

    On this site, “Zanzibar McFate” has no administrative privileges. (He has a 30-character, randomly-generated password, in spite of that.)
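
If you don't have a password manager handy to produce a password like that, here is a minimal sketch of one way to generate it, assuming Python 3.6 or later. The function name random_password and the choice of alphabet are mine, purely for illustration; adjust the alphabet if your site rejects some punctuation.

```python
import secrets
import string


def random_password(length=30):
    """Return a random password drawn from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from the operating system's cryptographic random source
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_password())
```

The secrets module is used rather than random because random is not meant for security-sensitive values.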

Tip 1: Use the “Author” Pop-up Menu on the “Edit Post” page

If you click on the “Screen Options” button toward the top-righthand corner of the “Edit Post” page, it will drop down and present you with a number of things you can show or hide. Check the box next to “Author”.

FirefoxDeveloperEditionScreenSnapz084

Now, there will be a pop-up menu section below the editing panel on the edit post page which will let you select your “public persona” as the post’s author instead of your admin ID.

FirefoxDeveloperEditionScreenSnapz085

Tip 2: Use the WP Masquerade Plugin

Comments are a bit tougher to manage, since they’re published immediately. You can log out of your administrative ID and log in as your persona, but fortunately, there’s an easier way.

A nice tool to facilitate better operational security here is the WP Masquerade plugin. I can verify that it works with my WP 4.2.2 install. Download the plugin and activate it. Now, when you go to your user list as an administrator, there will be a hover-visible link beneath every user saying “Masquerade”, as shown.

FirefoxDeveloperEditionScreenSnapz083

Click on that link, and you’ll be effectively logged in as that user, and any comments you make will be posted under that ID. To close the Masquerade session, click on the link in the banner at the bottom of the page.

FirefoxDeveloperEditionScreenSnapz088

Alternatively, you can use the “Masquerade as…” drop-down which the plugin adds to the Admin toolbar.

FirefoxDeveloperEditionScreenSnapz087

Summary

Your WordPress administrative ID is — assuming you’ve installed and secured your WordPress correctly — the weakest link in your security chain. If you expose it publicly, you’ve greatly increased the susceptibility of your site to being hacked. Practice good opsec, and use the tools available to keep your admin ID under wraps.

Data-mining Twitter for GamerGate—Visualization

In the previous posting, I went over how to connect to Twitter’s streaming API using a connector app and the Tweepy Python library, as well as a quick overview of how to construct a Pandas dataframe from the tweets we’ve collected.

In this posting, we’ll extract all of the information we’ll need to use NetworkX to create a directed graph that we can visualize in Gephi of who’s retweeting whom, keeping track of the age in days and the number of followers that each user has so we can filter on those factors if we like.
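
As a rough picture of the structure we're aiming for, here is a toy sketch in plain Python, with no NetworkX involved. The screen names and numbers are invented; the point is only that each edge runs from the original author to the retweeter, its weight counts the retweets, and each node carries a follower count and an age.

```python
from collections import Counter

# Toy (retweeter, original_author) pairs; the names are made up.
retweets = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("bob", "alice"),
    ("alice", "dave"),
]

# Edge (original_author, retweeter) -> number of retweets,
# i.e. the relationship "is retweeted by".
edge_weights = Counter(
    (original, retweeter) for retweeter, original in retweets
)

# Per-node attributes we want to carry along so we can filter on them later.
nodes = {
    "alice": {"followers": 1200, "age": 900},
    "bob": {"followers": 40, "age": 12},
    "carol": {"followers": 310, "age": 150},
    "dave": {"followers": 5000, "age": 2000},
}

for (original, retweeter), weight in sorted(edge_weights.items()):
    print("%s is retweeted by %s (weight %d)" % (original, retweeter, weight))
```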

First, if you don’t have NetworkX, install it with pip, and download and install Gephi.

Again, we’ll assume that our tweets are collected in a text file, “gamergate.txt”. Let’s pull the data out of the text file into a new data frame.

import json
import re
import pandas as pd
from time import gmtime, mktime, strptime

tweets_data_path = 'gamergate.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
#
# Clean out limit messages, etc.: keep only records that have both a
# 'user' and a 'text' field. (Removing items from a list while looping
# over it skips elements, so build a filtered list instead.)
#
tweets_data = [tweet for tweet in tweets_data
               if isinstance(tweet, dict) and 'user' in tweet and 'text' in tweet]

#
# See how many we wound up with
#
print(len(tweets_data))

#
# Pull the data we're interested in out of the Twitter data we captured
#
rows_list = []
now = mktime(gmtime())
for tweet in tweets_data:
    author = ""
    rtauthor = ""
    age = rtage = followers = rtfollowers = 0
#
# If it was a retweet, get both the original author and the retweeter, save the original author's
# follower count and age
#
    try:
        author = tweet['user']['screen_name']
        rtauthor = tweet['retweeted_status']['user']['screen_name']
        rtage = int(now - mktime(strptime(tweet['retweeted_status']['user']['created_at'], "%a %b %d %H:%M:%S +0000 %Y"))) // (60*60*24)
        rtfollowers = tweet['retweeted_status']['user']['followers_count']
    except:
#
# Otherwise, just get the original author
#
        try:
            author = tweet['user']['screen_name']
        except:
            continue
#
# If this was a reply, save the screen name being replied to
#
    reply_to = ""
    if (tweet['in_reply_to_screen_name'] != None):
        reply_to = tweet['in_reply_to_screen_name']
#
# Calculate the age, in days, of this Twitter ID
#
    age = int(now - mktime(strptime(tweet['user']['created_at'], "%a %b %d %H:%M:%S +0000 %Y"))) // (60*60*24)
#
# Grab this ID's follower count and the text of the tweet
#
    followers = tweet['user']['followers_count']
    text = tweet['text']
    dict1 = {}
#
# Construct a row, add it to our list
#
    dict1.update({'author': author, 'reply_to': reply_to, 'age': age, 'followers': followers, 'retweet_of': rtauthor, 'rtfollowers': rtfollowers, 'rtage': rtage, 'text': text})
    rows_list.append(dict1)

#
# When we've processed all the tweets, build the DataFrame from the rows
# we've collected
#
tweets = pd.DataFrame(rows_list)
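
The age calculation is the fiddliest part of the script above, so here is a small standalone sketch of the same idea using datetime instead of mktime. The helper name account_age_days is hypothetical, and it assumes Python 3 and that created_at is always in UTC with a "+0000" offset, as in the sample data.

```python
from datetime import datetime, timezone

# Format of Twitter's created_at field, e.g. "Wed Aug 13 14:10:24 +0000 2014"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def account_age_days(created_at, now=None):
    """Whole days between a created_at string (UTC) and now (UTC)."""
    created = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    if now is None:
        # Current time in UTC, with the timezone stripped so both values are naive
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return (now - created).days


# Age of a sample account as of the moment a tweet was captured
print(account_age_days("Wed Aug 13 14:10:24 +0000 2014",
                       datetime(2015, 4, 5, 21, 22, 58)))
```

Note that strptime's day and month names depend on the locale, so this assumes an English or "C" locale.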

Here’s a script that will iterate through the dataframe, row by row, and construct a directed graph of who’s retweeting whom. Each directed edge runs from the original author to the retweeter and represents the relationship “is retweeted by”; the higher the weight of an edge from person A to person B, the more often A is being retweeted by B. Each node represents an individual ID on Twitter, and has attributes to track the number of followers and the age of the ID in days.

import networkx as nx

#
# Create a new directed graph
#
J = nx.DiGraph()
#
# Iterate through the rows of our dataframe
#
for index, row in tweets.iterrows():
#
# Gather the data out of the row
#
    this_user_id = row['author']
    author = row['retweet_of']
    followers = row['followers']
    age = row['age']
    rtfollowers = row['rtfollowers']
    rtage = row['rtage']
#
# Is the sender of this tweet in our network?
# (Attributes are passed as keyword arguments, which works in both
# NetworkX 1.x and 2.x; int() converts pandas' numeric types to plain ints.)
#
    if this_user_id not in J:
        J.add_node(this_user_id, followers=int(followers), age=int(age))
#
# If this is a retweet, is the original author a node?
#
    if author != "" and author not in J:
        J.add_node(author, followers=int(rtfollowers), age=int(rtage))
#
# If this is a retweet, add an edge between the two nodes.
#
    if author != "":
        if J.has_edge(author, this_user_id):
            J[author][this_user_id]['weight'] += 1
        else:
            J.add_weighted_edges_from([(author, this_user_id, 1.0)])

nx.write_gexf(J, 'ggrtages.gexf')

The last thing we did was to save out a GEXF file we can then read into Gephi. Start Gephi up, and open our file; we called ours “ggrtages.gexf”.

gephiScreenSnapz013

You’ll get a dialog telling you how many nodes and edges there are in the graph, whether it’s directed or not, and other information, warnings, etc. Click “OK”.

gephiScreenSnapz014

Gephi will import the GEXF file. You can now look at the information it contains by clicking on the “Data Laboratory” button at the top.

gephiScreenSnapz015

Click on the “Overview” button to start working with the network. At first, it doesn’t look like anything, since we haven’t actually run a visualization on it. Before we do, we can use some of the node attributes to color nodes a darker blue based on their age.

gephiScreenSnapz016

We can use the “Ranking” settings to color our nodes. Click on the “Select attribute” popup, and choose “age”.

gephiScreenSnapz017

You can choose different color schemes, change the spline curve used to apply color, etc., from here as well.

gephiScreenSnapz018

Click on the “Apply” button to apply the ranking to the network. The nodes will now be colored rather than gray.

gephiScreenSnapz019

Now, we’re ready to run a visualization on our data. From the “Layout” section, let’s choose “ForceAtlas 2”; it’s fast and good at showing relationships in a network.

gephiScreenSnapz020

Press the “Run” button, and let it go for a bit. A network this size (about 10K nodes and 30K edges) settled down on my MacBook Pro in five minutes or less. When you feel it’s stabilized into something interesting, press the “Stop” button, and then click on the “Preview” button at the top.

gephiScreenSnapz022

The preview panel won’t show anything at first. Click the “Refresh” button.

gephiScreenSnapz023

Gephi will render your visualization. You can use the mouse to drag it around, and you can zoom in and out with a scroll-wheel or with the “+” and “-” buttons below.

gephiScreenSnapz024

gephiScreenSnapz025

gephiScreenSnapz026

Mining Twitter for #GamerGate: A How-To

I’ve gotten interested in the #GamerGate “controversy” (I’m pretty completely persuaded that any talk about “ethics” is a façade for a lot of reactionary nonsense, as well as abundant harassment and misogyny), and it occurred to me that it represented an interesting data set to mine using Python. This is a quick guide for how to get started, but it could be adapted to any effort to datamine Twitter.

Setting Up to Connect to Twitter

First, you’re going to need to set up a Twitter app that you can use for authentication. You can do this at apps.twitter.com/app/new. You’ll need to have a valid Twitter account with an authenticated phone number.

Enter a name, description and web site URL for your application. You won’t need a callback URL.

FirefoxDeveloperEditionScreenSnapz068

Check “Yes, I agree” at the bottom of the Developer Agreement, and click the “Create your Twitter application” button.

FirefoxDeveloperEditionScreenSnapz069

Your application will be created. To use Tweepy to capture tweets, we’ll need the Consumer Key and Consumer Secret, and we’ll also need to set up an access token. Click on the “manage keys and access tokens” link next to your “Consumer Key (API Key)” in the “Application Settings” section.

FirefoxDeveloperEditionScreenSnapz070

This will take you to the “Keys and Access Tokens” tab. Note your “Consumer Key” and “Consumer Secret” (greyed out here).

FirefoxDeveloperEditionScreenSnapz071

In the “Your Access Token” section at the bottom of the page, click on “Create my access token”.

FirefoxDeveloperEditionScreenSnapz072

An “Access Token” and an “Access Token Secret” (again, greyed out here) will be generated; you’ll need these as well.

FirefoxDeveloperEditionScreenSnapz073

Install the Python Prerequisites

For this project, we’re going to need the Tweepy, Pandas, and matplotlib libraries:

pip install tweepy pandas matplotlib

Here’s a simple-minded Python script using Tweepy to collect tweets mentioning “gamergate” from the Twitter streaming API:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

access_token = "YOUR ACCESS TOKEN GOES HERE"
access_token_secret = "YOUR ACCESS TOKEN SECRET GOES HERE"
consumer_key = "YOUR CONSUMER KEY GOES HERE"
consumer_secret = "YOUR CONSUMER KEY SECRET GOES HERE"

class StdOutListener(StreamListener):

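    def on_data(self, data):
        # Each message from the stream arrives as a JSON string
        print(data)
        return True

    def on_error(self, status):
        # Print the HTTP status code when Twitter reports an error
        print(status)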

if __name__ == '__main__':

    listener = StdOutListener()
    auth_handler = OAuthHandler(consumer_key, consumer_secret)
    auth_handler.set_access_token(access_token, access_token_secret)
    stream = Stream(auth_handler, listener)

    stream.filter(track=['gamergate'])

UPDATE

The script, as it stands, times out on a read every once in a while, so there’s a minor improvement to be had here by embedding the collection in a while loop with a try and an except to keep it from crashing back to the shell prompt occasionally:

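    while True:
        try:
            stream.filter(track=['gamergate'])
        except Exception:
            # Catching Exception rather than using a bare "except" means
            # Ctrl-C (KeyboardInterrupt) can still stop the script
            continue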
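
Reconnecting immediately in a tight loop can hammer the API if the failure persists, so a gentler variation is to wait a little longer after each consecutive failure. This is only a sketch of that idea, assuming Python 3: run_with_retries is a hypothetical helper of my own, not part of Tweepy, and the delays are arbitrary.

```python
import time


def run_with_retries(task, max_delay=60, sleep=time.sleep):
    """Call task() repeatedly, doubling the wait after each failure.

    Returns whatever task() returns the first time it finishes normally.
    KeyboardInterrupt is not caught, so Ctrl-C still stops the program.
    """
    delay = 1
    while True:
        try:
            return task()
        except Exception as err:
            print("stream failed (%s); retrying in %d s" % (err, delay))
            sleep(delay)
            delay = min(delay * 2, max_delay)


# In the collection script, the call would look something like:
# run_with_retries(lambda: stream.filter(track=['gamergate']))
```

The sleep parameter exists only so the waiting can be swapped out when trying the function without real delays.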

All this script does is print out every tweet which is captured by Tweepy, in JSON format. If you run it, the output will look something like this — this is a single tweet in JSON notation:

{u'contributors': None, u'truncated': False, u'text': u'RT @CommissarOfGG: Anti taking pride that nobody can tell the difference between them and someone pretending to be retarded.\n\n#GamerGate ht\u2026', 'retweet': True, u'in_reply_to_status_id': None, u'id': 584828601125773314, u'favorite_count': 0, u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', u'retweeted': False, u'coordinates': None, u'timestamp_ms': u'1428268978755', u'entities': {u'symbols': [], u'media': [{u'source_status_id_str': u'584828243808661504', u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'source_status_id': 584828243808661504, u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [139, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}], u'hashtags': [{u'indices': [126, 136], u'text': u'GamerGate'}], u'user_mentions': [{u'id': 2729513808, u'indices': [3, 17], u'id_str': u'2729513808', u'screen_name': u'CommissarOfGG', u'name': u'Comrade Commissar'}], u'trends': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': u'584828601125773314', u'retweet_count': 0, u'in_reply_to_user_id': None, u'favorited': False, u'retweeted_status': {u'contributors': None, u'truncated': False, u'text': u'Anti taking pride that nobody can tell the difference between them and someone pretending to be retarded.\n\n#GamerGate http://t.co/CS3Kb2Bkcm', u'in_reply_to_status_id': None, u'id': 584828243808661504, u'favorite_count': 2, u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web 
Client</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'media': [{u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [118, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}], u'hashtags': [{u'indices': [107, 117], u'text': u'GamerGate'}], u'user_mentions': [], u'trends': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': u'584828243808661504', u'retweet_count': 4, u'in_reply_to_user_id': None, u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'id': 2729513808, u'verified': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/533572934799876096/DYR05LI4_normal.png', u'profile_sidebar_fill_color': u'000000', u'profile_text_color': u'000000', u'followers_count': 2047, u'profile_sidebar_border_color': u'000000', u'id_str': u'2729513808', u'profile_background_color': u'000000', u'listed_count': 26, u'profile_background_image_url_https': u'https://abs.twimg.com/images/themes/theme1/bg.png', u'utc_offset': -14400, u'statuses_count': 4940, u'description': u'#GamerGate #OpSKYNET', u'friends_count': 1584, u'location': u'Moscow', u'profile_link_color': u'DD2E44', u'profile_image_url': u'http://pbs.twimg.com/profile_images/533572934799876096/DYR05LI4_normal.png', u'following': None, u'geo_enabled': False, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/2729513808/1407939361', 
u'profile_background_image_url': u'http://abs.twimg.com/images/themes/theme1/bg.png', u'name': u'Comrade Commissar', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 1980, u'screen_name': u'CommissarOfGG', u'notifications': None, u'url': u'http://www.facebook.com/commissarofgamergate', u'created_at': u'Wed Aug 13 14:10:24 +0000 2014', u'contributors_enabled': False, u'time_zone': u'Eastern Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Sun Apr 05 21:21:33 +0000 2015', u'filter_level': u'low', u'in_reply_to_status_id_str': None, u'place': None, u'extended_entities': {u'media': [{u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [118, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}]}}, u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'default_profile_image': False, u'id': 2784597626, u'verified': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/532401111823822848/KSIxqiLe_normal.jpeg', u'profile_sidebar_fill_color': u'000000', u'profile_text_color': u'000000', u'followers_count': 986, u'profile_sidebar_border_color': u'000000', u'id_str': u'2784597626', u'profile_background_color': u'000000', u'listed_count': 35, u'profile_background_image_url_https': u'https://abs.twimg.com/images/themes/theme1/bg.png', 
u'utc_offset': -18000, u'statuses_count': 25217, u'description': u"I wasn't born with enough middle fingers for perpetually outraged hipster douchebags compensating for their mediocrity with shelves of participation trophies.", u'friends_count': 785, u'location': u'Parts Unknown', u'profile_link_color': u'4A913C', u'profile_image_url': u'http://pbs.twimg.com/profile_images/532401111823822848/KSIxqiLe_normal.jpeg', u'following': None, u'geo_enabled': False, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/2784597626/1425335831', u'profile_background_image_url': u'http://abs.twimg.com/images/themes/theme1/bg.png', u'name': u'Unnecessary Robness', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 15912, u'screen_name': u'aDouScheiBler', u'notifications': None, u'url': None, u'created_at': u'Mon Sep 01 19:36:40 +0000 2014', u'contributors_enabled': False, u'time_zone': u'Central Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Sun Apr 05 21:22:58 +0000 2015', u'filter_level': u'low', u'in_reply_to_status_id_str': None, u'place': None, u'extended_entities': {u'media': [{u'source_status_id_str': u'584828243808661504', u'expanded_url': u'http://twitter.com/CommissarOfGG/status/584828243808661504/photo/1', u'display_url': u'pic.twitter.com/CS3Kb2Bkcm', u'url': u'http://t.co/CS3Kb2Bkcm', u'media_url_https': u'https://pbs.twimg.com/media/CB26L3HWAAIVSKF.png', u'source_status_id': 584828243808661504, u'id_str': u'584828239564111874', u'sizes': {u'small': {u'h': 351, u'resize': u'fit', u'w': 340}, u'large': {u'h': 607, u'resize': u'fit', u'w': 587}, u'medium': {u'h': 607, u'resize': u'fit', u'w': 587}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [139, 140], u'type': u'photo', u'id': 584828239564111874, u'media_url': 
u'http://pbs.twimg.com/media/CB26L3HWAAIVSKF.png'}]}}

Set a terminal running the script above for as long as you like. I left mine going for 42 hours, and collected about 65000 tweets in a text file about 300MB long.

python tweetminer.py >> gamergate.txt

When you’ve collected your data, here’s some Python to set up a sample pandas DataFrame containing information of interest: who tweeted, how many days old their account is, how many followers they have, who it was a retweet of (if it was one) and to whom it was a reply (if it was one).

That should give you plenty of grist for analysis.

import json
import pandas as pd
import matplotlib.pyplot as plt
from time import gmtime, mktime, strptime

tweets_data_path = 'gamergate.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
#
# Clean out limit messages, etc.
#
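# Removing items from a list while looping over it skips elements,
# so build a filtered list instead.
tweets_data = [tweet for tweet in tweets_data
               if isinstance(tweet, dict) and 'user' in tweet]

print(len(tweets_data))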

#
# Pull the data we're interested in out of the Twitter data we captured
#
rows_list = []
now=mktime(gmtime())
for tweet in tweets_data:
    author = ""
    rtauthor = ""
#
# If it was a retweet, get both the original author and the retweeter
#
    try:
        author = tweet['user']['screen_name']
        rtauthor = tweet['retweeted_status']['user']['screen_name']
    except:
#
# Otherwise, just get the original author
#
        try:
            author = tweet['user']['screen_name']
        except:
            continue

    reply_to = ""
    if (tweet['in_reply_to_screen_name'] != None):
        reply_to = tweet['in_reply_to_screen_name']
    
    # Difference in seconds first, then convert to whole days
    age = int(now - mktime(strptime(tweet['user']['created_at'], "%a %b %d %H:%M:%S +0000 %Y"))) // (60*60*24)
    followers = tweet['user']['followers_count']
    dict1 = {}
    dict1.update({'author': author, 'retweet_of': rtauthor, 'reply_to': reply_to, 'age': age, 'followers': followers})
    rows_list.append(dict1)

tweets = pd.DataFrame(rows_list)

The resulting DataFrame will look something like this—note that rows 0-4 are retweets, and row 6 is a reply; “age” is days since the Twitter ID was created:

        age           author  followers     reply_to       retweet_of
0       137      Maskgamer64        428                  CultOfVivian
1       231   Smackfacemcgee       1304                  Daddy_Warpig
2      2240      LenFirewood       1658                   RSG_VILLENA
3       171     8bitsofsound        650                 CommissarOfGG
4       102    devilstwosome          9                   atlasnodded
5        24       tophatdril         34                              
6        11   TheRalphRetart         63     Dr_Louse                 
7...    ...              ...        ...          ...              ...
64531    65     4EverPlayer2        614                        mombot
64532   143  EnwroughtDreams        222                thewtfmagazine
64533  1996          _icze4r      22689                       dauthaz
64534  1581  __DavidFlanagan       8315                   Spacekatgal
64535   872         jtdg_b8z        621               GamingAndPandas
64536  2238        hanytimeh        914                thewtfmagazine

At this point you could easily find out the most-retweeted IDs in the DataFrame, for example:

In [146]: tweets['retweet_of'].value_counts()
Out[146]: 
                   17974
Sargon_of_Akkad     1574
ItalyGG             1516
TheRalphRetort      1064
Blaugast             910
mylittlepwnies3      899
thewtfmagazine       823
Nero                 721
srhbutts             706
Daddy_Warpig         705
randomfox            627
atlasnodded          592
full_mcintosh        586
whenindoubtdo        584
ToKnowIsToBe         569
...
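
The blank name at the top of that list is simply every tweet that wasn't a retweet. Here is a small sketch of the same tally done without pandas, skipping those blanks. It assumes Python 3, and the rows are invented stand-ins for the dicts collected in rows_list.

```python
from collections import Counter

# A few rows in the same shape as the dicts in rows_list;
# the values here are made up for illustration.
rows = [
    {"author": "user_a", "retweet_of": "user_x"},
    {"author": "user_b", "retweet_of": "user_x"},
    {"author": "user_c", "retweet_of": ""},
    {"author": "user_d", "retweet_of": "user_y"},
]

# Count only real retweets, skipping rows where retweet_of is empty.
retweet_counts = Counter(
    row["retweet_of"] for row in rows if row["retweet_of"]
)

for name, count in retweet_counts.most_common():
    print(name, count)
```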

Check out the follow-on posting to see how to use NetworkX and Gephi to make visualizations of the data.

Got Music on Your iPod You Need to Get Back Into iTunes? Here’s How.

I set up a new MacBook Pro several months back, but in moving things around, I managed to misplace some music. Most of my stuff is on a 160GB iPod Classic, and I wanted to synchronize the contents of the iPod with my iTunes library, recovering any tracks that were on the iPod but not in the library.

Out of the box, Apple doesn’t make this especially easy for you. Syncing within iTunes is strictly a one-way affair: what’s in your library replaces what’s on the iPod, precisely the opposite of what I want. Mounting the iPod on the desktop doesn’t help — you only get access to a shared file area, and the music is organized in a way calculated to keep you from figuring out where anything is.

Fortunately, there’s an easy solution, but it takes a few pieces to put together. This presumes that you’ve got your music in one folder on your computer, and you’re letting iTunes organize it for you (which it will do by artist).

First, you’re going to need FUSE for OS X, which allows you to create user-space file systems on OS X. Download the .dmg file — current version is 2.7.4 — and install it.

That sets you up to install iTunesFS. Download the iTunesFS 1.3.4 installer disk image from the downloads page, mount the .dmg and drag the iTunesFS app to your Applications folder.

Connect your iPod to an available USB port on your Mac, but don’t launch iTunes. Instead, launch the iTunesFS application. An “iTunesFS” icon will appear on your desktop, and opening it will reveal several folders, “Albums”, “Artists”, etc. You want the “Artists” folder.

Open another window and browse to your iTunes library folder — you can find out where this is under the “Advanced” tab in iTunes’ Preferences. You want the “Music” folder. (You’ll see why we want these windows open in a moment.)

Now, open a terminal window. We’re going to use the rsync command to copy any files that exist in the “Artists” folder on the “iTunesFS” volume (a synthetic folder constructed by iTunesFS which mirrors the organization of the “Music” folder in iTunes’ library folder) into that “Music” folder. Type the beginning of the command:

rsync -av --dry-run

Leave a trailing space at the end. Now, go to the “iTunesFS” window in Finder and drag the “Artists” folder on top of the Terminal window, and “drop” it there. The Terminal app will fill in the path to the folder you dropped, and the command will now look something like this:

rsync -av --dry-run /Volumes/iTunesFS/Artists

Add a trailing slash and a space at the end, so it looks like this:

rsync -av --dry-run /Volumes/iTunesFS/Artists/

Now, go back to Finder, and from the window that’s open to your iTunes library folder, drag and drop the “Music” folder onto the Terminal window. The command should now look something like this (it will vary depending on the names of your volumes, etc., obviously):

rsync -av --dry-run /Volumes/iTunesFS/Artists/ /Volumes/Vibranium/iTunes/Music

The --dry-run flag tells rsync to just simulate copying the files; this will give you a chance to check that things look reasonable first. The -av flags tell rsync to use “archive” mode, which means it will walk down the directories within the folder you specify recursively, and to provide “verbose” output. Execute the command, and it will provide a list of all the directories the command would create and the files it would copy. If you’re satisfied with what you see (and you probably actually want to review it!), use the up-arrow to bring back your last command, edit out the --dry-run flag, and execute it again.
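
If you'd like a second opinion on what's about to be copied, here is a rough Python sketch that lists files present under one folder but absent from another. It assumes Python 3, the function name missing_files is my own, and it is only an approximation of the dry run: it checks whether a file exists at the same relative path, whereas rsync also compares size and modification time.

```python
import os


def missing_files(source, destination):
    """Return relative paths of files found under source but not under destination."""
    missing = []
    for folder, _subdirs, filenames in os.walk(source):
        for filename in filenames:
            # Path of this file relative to the top of the source tree
            relative = os.path.relpath(os.path.join(folder, filename), source)
            if not os.path.exists(os.path.join(destination, relative)):
                missing.append(relative)
    return sorted(missing)


# With the example paths from above, the call would look like:
# missing_files("/Volumes/iTunesFS/Artists", "/Volumes/Vibranium/iTunes/Music")
```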

Pretty easy.

Stupid Problems Deserve Stupid Solutions

I recently got handed a pretty decent Vista-era Lenovo laptop, which on inspection, turned out to have no hard disk in it. I invested $30 in a 250GB SATA disk, and decided I’d have a go at setting up Kali Linux just to check it out.

I’m very impressed. It’s well-put-together and well-documented, there’s a good IRC channel on freenode, and it’s got most of what I want in a development workstation.

I did run into one little problem setting some things up, though. Apparently, prior to version 1.0.8, lsb_release reported a codename of “n/a”, which causes the nodesetup.sh script to barf. That was reported as fixed, but it seems to have regressed in the latest version, 1.0.9a.

After fooling around with a couple of candidates to fix the problem — and does anyone know the right way to fix this…? Ugly details below — I broke down and just hacked the setup script.

The core problem is that lsb_release -c -s is returning “n/a”, and causing the script to decide that this isn’t a distro it knows how to support. Here’s the (stupid) fix.

First, pull down the setup script from NodeSource and save it to a file so we can patch it:

curl -sL https://deb.nodesource.com/setup > nodesetup.sh

Next, open nodesetup.sh in an editor and look for the following section:

check_alt "Linux Mint" "rebecca" "Ubuntu" "trusty"
check_alt "Linux Mint" "qiana" "Ubuntu" "trusty"
check_alt "Linux Mint" "maya" "Ubuntu" "precise"
check_alt "elementaryOS" "luna" "Ubuntu" "precise"
check_alt "elementaryOS" "freya" "Ubuntu" "trusty"

Edit it to add a line at the end, as follows:

check_alt "Linux Mint" "rebecca" "Ubuntu" "trusty"
check_alt "Linux Mint" "qiana" "Ubuntu" "trusty"
check_alt "Linux Mint" "maya" "Ubuntu" "precise"
check_alt "elementaryOS" "luna" "Ubuntu" "precise"
check_alt "elementaryOS" "freya" "Ubuntu" "trusty"
check_alt "Kali" "n/a" "Debian" "wheezy"
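
If you’d rather not edit by hand, the same one-line change can be scripted. A rough sketch with awk follows. The function name add_kali_alias and the output file name are invented for the example, and it assumes the “freya” line appears in your copy of the script exactly as shown above.

```shell
# add_kali_alias FILE
# Hypothetical helper: prints FILE with a Kali check_alt line added
# right after the elementaryOS "freya" line. FILE itself is not modified.
add_kali_alias() {
    awk '
        { print }
        /check_alt "elementaryOS" "freya"/ {
            print "check_alt \"Kali\" \"n/a\" \"Debian\" \"wheezy\""
        }
    ' "$1"
}
```

For example, add_kali_alias nodesetup.sh > nodesetup-kali.sh leaves the original untouched, so you can diff the two files before running anything.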

Now, pass the script to bash as intended. This will set up the repos correctly to get node.js and its dependencies as though you were running on a stock Debian “wheezy” system.

cat nodesetup.sh | bash -

It should complete without a problem. Now, you can install nodejs and npm:

apt-get install nodejs

Gory Details

I’m unclear on where the code name reported by lsb_release gets set up. A couple of candidates which didn’t work: attempting various edits to /etc/os-release didn’t help, nor did modifications to /etc/dpkg/origins/default, which is an alias for /etc/dpkg/origins/kali. Anyway, it worked, node and npm are in and check out.

Building a Customized PirateBox, Part 1: Build a PirateBox

I wanted to have something fun and portable that I could use to promote the software literacy effort I’m spinning up here, and I thought it would be great to make a PirateBox that I could “re-brand” and use to share files and a microsite about what I’m doing and why it’s important.

The PirateBox project has spawned a number of customized versions, like LibraryBox, BibleBox, and others. I’m calling the one I’ve put together “nanonet”, and you can see a version of the web site it runs here.

What You’ll Need

  • A TP-Link TL-MR3040
  • A small USB drive — you only need the room for the base PirateBox install, under twenty megabytes, plus whatever space you’ll need for the web site you want to serve plus any files you want to share.
  • A Debian 7 “Wheezy”-like system with an Ethernet port that you can dedicate to this for a little while

I did my build on Linux Mint 17, but this will probably work on any Linux command line.

Considerations

A “vanilla” PirateBox offers a chat window, a forum board, and the ability to upload and download files, in addition to its single-page HTML site. For my application, I wasn’t really interested in having people upload to the device, I just wanted to make material available for people to download.

The only “administrative” access to the PirateBox is via the Ethernet port, so there aren’t really any security issues to worry about on a device like this.

Overview

The TL-MR3040 is, in today’s terms, a tiny device, although in the mid-to-late ’90s, it would have been more or less the equal of a reasonably powerful PC. Its biggest limitation is internal storage: it’s got 32MB of RAM, and only 4MB of flash memory to play around with.

That’s going to limit what we’ll be able to put on the device. Aside from what’s needed to make PirateBox itself run, we’re going to limit ourselves to an index page and the stuff needed to support it (e.g. minified jQuery). Everything else can go onto the USB drive, in a “Shared” directory.

We’re going to take things in the following order:

  1. Reflash the TL-MR3040 with the PirateBox version of OpenWrt 12.09 “Attitude Adjustment”.
  2. Do the first-time OpenWrt set-up.
  3. Install PirateBox v1.0

That much is going to be a lengthy posting in and of itself. In part 2, we’ll

  1. Extract an image of the on-board PirateBox web site from the install (this is the only portion where I actually used Linux; everything else was done on OS X)
  2. Throw away the index page, and replace it with one of our own
  3. Implement the rest of the web site in the shared directory on the USB drive
  4. Customize things like the SSID of the WiFi network, the favicon, etc., to reflect the new “branding”

Installing OpenWrt “Attitude Adjustment” (PirateBox Edition)

Check which hardware version of the MR3040 you’ve got. There are two different firmware versions, and you can brick your device if you try to reflash it with the wrong one.

To do this, take off the back (you’ll need to do this to install the battery anyway) and look at the barcoded label underneath where the battery goes. This photo shows the label on a v2.2 device, as indicated. For this one, we’d want v2 firmware. Do not guess at this. Be sure to check.

[Photo: the version label on a v2.2 device]

For a version 2 device, you will want http://stable.openwrt.piratebox.de/auto/openwrt-ar71xx-generic-tl-mr3040-v2-squashfs-factory.bin

For a version 1 device, you will want http://stable.openwrt.piratebox.de/auto/openwrt-ar71xx-generic-tl-mr3040-v1-squashfs-factory.bin

In either case, get a copy of http://stable.openwrt.piratebox.de/auto/install_piratebox.zip, and expand it. Put the resulting folder, called install, onto your FAT-formatted USB drive. Do not insert the USB drive into the MR3040 at this time.

Again, be sure you’ve gotten the correct bin file for your device!
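
To make the mapping from label to download explicit, here’s a small sketch. The function name firmware_url is invented, and it assumes (as with the v2.2 example above) that only the major number on the label matters. The URLs are the two given above.

```shell
# firmware_url HWVERSION
# Hypothetical helper: prints the PirateBox factory image URL for a
# hardware version read off the label, such as "1.0" or "2.2".
firmware_url() {
    base="http://stable.openwrt.piratebox.de/auto"
    case "$1" in
        1|1.*) echo "$base/openwrt-ar71xx-generic-tl-mr3040-v1-squashfs-factory.bin" ;;
        2|2.*) echo "$base/openwrt-ar71xx-generic-tl-mr3040-v2-squashfs-factory.bin" ;;
        *)     echo "unknown hardware version: $1" >&2; return 1 ;;
    esac
}
```

Something like curl -O "$(firmware_url 2.2)" would then fetch the matching image. Anything other than a 1.x or 2.x label makes the function fail rather than guess.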

Start by installing the battery and plugging the MR3040 into a powered USB port via its own mini-USB port to charge it up. Both of mine have arrived with flat batteries. This will take an hour or two.

Configure the Ethernet port on your system. The MR3040’s factory configuration uses the private IP address 192.168.0.1 on its Ethernet port, so we want our system to have an address on the same network that won’t collide. 192.168.0.2 is fine.
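
What “on the same network” means here can be shown with a tiny check. This sketch assumes the usual 255.255.255.0 netmask, which I haven’t verified on the device, and the function name same_net24 is made up.

```shell
# same_net24 ADDR1 ADDR2
# Hypothetical helper: succeeds when two IPv4 addresses share their
# first three octets, i.e. sit on the same network under a /24 netmask.
same_net24() {
    [ "${1%.*}" = "${2%.*}" ]
}

same_net24 192.168.0.2 192.168.0.1 && echo "can talk to the factory firmware"
same_net24 192.168.0.2 192.168.1.1 || echo "cannot talk to a 192.168.1.x device"
```

The second check fails, which is why your own address has to change once the device comes back up on OpenWrt’s 192.168.1.1.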

To set it up on OS X, choose “System Preferences” from the Apple menu, then double-click on “Network”.

In the Network control panel, select your Ethernet port in the sidebar and then choose “Using DHCP with manual address” from the pop-up menu. Enter “192.168.0.2” in the box below the pop-up and click “Apply”.

[Screenshot: the Network pane in System Preferences]

Connect the MR3040 to your computer’s Ethernet port — they include a short cable in the box — slide the 3-way switch on the side to “3G/4G”, and turn it on. It takes a little while to boot, but you can check to see when it’s up with the command

ping 192.168.0.1

at a Terminal command line. When you start getting responses to the ping, open up a browser window and enter the URL http://192.168.0.1

The browser will ask you for a user name and password. For a new TL-MR3040, these are admin and admin. This will get you to the Administration app for the TP-Link software.

Click on “System Tools” in the sidebar menu, then on “Firmware Upgrade” when the accordion menu opens. That will get you to this screen. Note that the factory firmware version for the device is shown here. In this case, it’s v1. More than likely, yours will be v2.

Be very sure that the version of the firmware that you’ve gotten from the PirateBox site is the same as the version shown here. You can brick your device if you mix them up, and you’re a lot better off bricking your device after you’ve successfully installed OpenWrt than you are before, for reasons we’ll discuss another time.

[Screenshot: the TP-Link “Firmware Upgrade” screen]

Click the “Select” (or “Parcourir”) button to the right, and choose the bin file you downloaded earlier.

A progress bar will appear showing how far along the firmware update is. DO NOT INTERRUPT POWER ON THE DEVICE WHILE THIS IS GOING ON. While you’re waiting, go back to your Ethernet preferences panel and change the address you’re using from 192.168.0.2 to 192.168.1.2, since a device freshly flashed with OpenWrt comes up on the default address 192.168.1.1.

Bring up a terminal window, and start pinging that address with the command

ping 192.168.1.1

When you start getting a response to your pings — and it can take a while, maybe ten or fifteen minutes — we can proceed to the next step. (If you’re not getting ping responses after twenty minutes, the most likely problem is that you haven’t reconfigured your own system’s Ethernet port correctly.)
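
If you’d rather not sit and watch ping, the wait can be scripted. This is a sketch, not part of the PirateBox tooling: the name wait_for_host, its defaults (120 tries, 10 seconds apart, so roughly the twenty minutes mentioned above) and the PING override variable are all invented for the example. The -c 1 option sends a single ping.

```shell
# wait_for_host HOST [TRIES] [DELAY]
# Hypothetical helper: pings HOST once per attempt until it answers
# or TRIES attempts have failed, sleeping DELAY seconds in between.
# PING may be set to substitute another command for ping (illustrative only).
wait_for_host() {
    host="$1"
    tries="${2:-120}"
    delay="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        if "${PING:-ping}" -c 1 "$host" >/dev/null 2>&1; then
            echo "$host is up"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "gave up waiting for $host" >&2
    return 1
}
```

Calling wait_for_host 192.168.1.1 after the flash returns as soon as the device answers. If it gives up, check your own Ethernet settings first.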

First Login to OpenWrt

Initially, OpenWrt only allows telnet access and has a single user, root, with no password set, so the very first thing we want to do is telnet in and set that password.

telnet 192.168.1.1

When prompted to log in, enter root and hit return. You should see this:

[Screenshot: the OpenWrt banner and command prompt after logging in over telnet]

Once you see this screen, you’re talking to ash, the OpenWrt command shell. Change the root password:

passwd

You’ll be prompted to enter your new root password twice. Choose something you won’t forget! Assigning a password to root will “activate” OpenWrt, and once we’ve closed the telnet connection, we’ll only be able to access the system via SSH from that point on, so you’ll need to use the command

ssh root@192.168.1.1

When you’re prompted, enter your password, and you should be back at the OpenWrt command prompt shown above.

We’re now ready to complete our PirateBox installation. Get out of the telnet (or ssh) connection by typing

exit

and power off the MR3040.

Install PirateBox

Finally, we’re ready to install PirateBox v1.0 onto our newly-configured TL-MR3040. This part is easy.

With the power still off, insert your prepared USB drive — FAT-formatted, with the install folder you unpacked from the file install_piratebox.zip you downloaded earlier — into the USB port of the MR3040.

Turn the power on. Go do something else for fifteen minutes or so.

What you’ll observe, if you watch it, is a lot of blinking and flashing, occasionally interrupted by short periods when only the power light is lit. That indicates that the MR3040 is rebooting, which it does three or four times before the installation is completed.

After about the third reboot, if you scan your local Wifi networks, you’ll see the network “Piratebox — Share Freely!” appear. You can connect to that network, but it may go away if the install hasn’t actually completed yet.

Once you’ve been able to get onto the WiFi network that the MR3040 is creating, open a new browser page, and navigate to any address you haven’t browsed recently (caching can confuse things), or just go to http://piratebox.lan, the name the PirateBox uses to refer to itself. You should see this.

[Screenshot: the PirateBox home page]

Congratulations! You’ve just created a PirateBox, a tiny self-contained web server based on a tiny version of Linux.

Explore and have fun: you can upload and download files, put up messages on a forum, or leave brief notes on the home page. It turns out that, when you’re out with a crowd, a PirateBox can be an easier way to share things like photos than emailing them around.

In part two, we’ll dig a little deeper into the internals of PirateBox and show how you can put pretty much any sort of a web site onto one.

If you follow this tutorial, please let me know how things worked out for you in the comments. If you run into snags, I’ll do what I can to help out.

Easy Build for OpenWrt On Ubuntu/Debian

I just got a TP-Link TL-MR3040 a few days ago, and successfully set it up as a PirateBox, which involved reflashing the firmware with OpenWrt rather than a stock image. This is actually a pretty cool little device for the $35. It’ll run Linux, and with OpenWrt, not only can it function as a router, it can act as a tiny server running off a file system attached via USB.

I’ve tried this build on multiple platforms, and documented some of that in a previous version of this posting. While I’ve successfully gotten the core of OpenWrt to build on OS X, and a number of things to build on CentOS, I’ve only gotten consistent and reliable results overall on Debian-like systems, so that’s what I’m going to be sticking to here.

In particular, I’ve had nothing but trouble trying to build an OpenWrt image for a Raspberry Pi anywhere other than on Debian or a Debian derivative. I have verified working builds for both the TL-MR3040 and the Raspberry Pi on Mint 17.

The instructions for building a firmware image on the OpenWrt wiki are a version or more out of date — they’re for building Attitude Adjustment, rather than Barrier Breaker.

The procedure for building top-of-trunk for OpenWrt developers is better documented than the Attitude Adjustment build seems to have been, but still a little bit scattered.

Additionally, the guide for setting up a build environment on OS X is similarly outdated and relies on MacPorts, where I prefer Homebrew. There’s a wrinkle or two along the way, so I figured I should document what I’ve done. I’m not recommending, at this point, that you try building this stuff directly on OS X. My recommendation is to use a VM running Debian or Mint instead.

Set Up the Prerequisites

On Ubuntu 14.04 LTS “Trusty Tahr”/Debian 7.7.0/Mint 17

sudo apt-get install subversion build-essential libncurses5-dev zlib1g-dev gawk git ccache gettext libssl-dev xsltproc zip

On OS X 10.10 “Yosemite”

On OS X, we’ll want to specifically set up a case-sensitive file system to work on. We can create a .dmg file that we can use for our development with the following commands. Twenty gig is plenty of space.

hdiutil create -size 20g -fs "Case-sensitive HFS+" -volname OpenWrt OpenWrt.dmg
hdiutil attach OpenWrt.dmg

Getting the build environment set up right here is a little more ornate. If you don’t have Homebrew (and you should), you’ll need to get that installed first. You’ll also need to install Xcode and the Xcode Command Line Tools.

brew update
brew upgrade
brew install coreutils e2fsprogs ossp-uuid asciidoc binutils fastjar gtk+ gnu-getopt gnu-tar intltool openssl subversion rsync sdcc gawk wget findutils

When brew installs the GNU toolset, it doesn’t automatically link it into your path, and the build wants to use GNU-compatible tools. However, brew does create an auxiliary directory of GNU-compatible aliases at /usr/local/opt/coreutils/libexec/gnubin, and for the purposes of the build, we can set our path to prefer those tools temporarily.

ln -s /usr/local/Cellar/gnu-getopt/1.1.5/bin/getopt /usr/local/opt/coreutils/libexec/gnubin/getopt
ln -s /usr/local/bin/gtar /usr/local/opt/coreutils/libexec/gnubin/tar
export PATH=/usr/local/opt/coreutils/libexec/gnubin:$PATH

Get the Sources

Get the Barrier Breaker sources from the upstream repo to build the current stable release:

git clone git://git.openwrt.org/14.07/openwrt.git

Or pull down the latest OpenWrt “Chaos Calmer” sources to build the “bleeding edge” top-of-trunk version:

git clone git://git.openwrt.org/openwrt.git

Prepare For the Build

Change to the source directory, and update and install all the feeds. These represent the build schemes for all of the optional components that you can add to your OpenWrt system.

cd ~/openwrt
./scripts/feeds update -a
./scripts/feeds install -a

Configure the build.

make prereq

This sets up prerequisites for the build and then takes you into menuconfig, a screen-driven configuration utility based on the one used to set up builds for the Linux kernel.

For starters, you simply want to pick an appropriate “Target system” and “Target profile”. For the TP-Link TL-MR3040, the target system is “Atheros AR7xxx/AR9xxx”, subtarget “generic”. For a Raspberry Pi, the target system is “Broadcom BCM2708/BCM2835”, and the only profile is “Raspberry Pi”.

[Screenshot: menuconfig with the target system and profile selected]

For an initial build, I’d suggest simply picking the correct target and leaving it at that. You can start adding other options once you’ve verified that you can produce a working build and have an idea how much free space you’ve got to play with on the system. You want to start out minimal, since the MR3040 only has 4MB (!) of available flash memory.

When you’re done here, select “Exit”, and save your configuration file as .config when prompted to do so.

Build that sucker!

All it takes is a make at this point. I like to use make V=s because I like to watch it do its thing.

make
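
A full build takes a long while, and make’s standard -j option will spread the work across CPU cores. Here’s a sketch of working out a job count on either Linux or OS X. The function name build_jobs is made up, and whether a parallel build behaves on your setup is something to try rather than take on faith.

```shell
# build_jobs
# Hypothetical helper: prints the number of CPU cores, falling back to 1.
# nproc is part of GNU coreutils; hw.ncpu is the OS X sysctl name.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif command -v sysctl >/dev/null 2>&1; then
        sysctl -n hw.ncpu 2>/dev/null || echo 1
    else
        echo 1
    fi
}
```

You’d use it as make -j"$(build_jobs)". If a parallel build fails, the errors get interleaved and hard to read, so go back to a plain make V=s to see what actually went wrong.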

Results will be in the bin/ folder, in a subfolder corresponding to the architecture you’ve built for — in my case “ar71xx”.