Recent Blog Posts

  • Engineering Builds a Twitter Bot

    April 16, 2009 Posted by danieltoshea 0 Comments

    Twitter is the new hotness. It seems every site you visit has some kind of "follow us on Twitter" link or built-in Twitter functionality. People are hooking up cat doors and office chairs to the Twitter API, so it's only natural that Sparkart Engineering wants to play with this "exploding" new technology.

    The idea was simple. As anyone in support will tell you, our fans have lots of followers (stalkers), but with Twitter our clients can now stalk their fans right back! Twitter has built-in search from its acquisition of Summize, which allows any Tom, Dick, or Harry bot to search tweets for specific terms. If you haven't already, check out the advanced search features, which let you search for terms with positive or negative attitudes, near places, between dates, or in tweets that ask a question. Somewhat arbitrarily, we built our bot to search for tweets containing the terms sparkart or UFC, and in a couple of hours our first bot was born.

    Since new technology deserves more new technology, our Twitter bot can't be written in Ruby; that would be too easy for team engineering. To make the project more interesting and fun, we decided to write the bot in Erlang. Erlang, according to Wikipedia, is:

    A general-purpose concurrent programming language and runtime
    system. The sequential subset of Erlang is a functional language, with
    strict evaluation, single assignment, and dynamic typing. For concurrency
    it follows the Actor model. It was designed by Ericsson to support
    distributed, fault-tolerant, soft-real-time, non-stop applications.

    Wow, that's a mouthful. Erlang is starting to become the new "hot" language among startups, especially where concurrency is a major requirement. Companies such as Amazon.com, Facebook, GitHub, Heroku, Mochi Media (of mochiweb), and Last.fm are using it internally. Let's dive into some code and see if we can take the title of "nerdiest post on the blog". Warning: really, really technical content ahead...

    First, some Erlangisms to make the code below easier to understand. Variables in Erlang start with a capital letter (or an underscore), and lowercase symbols are called atoms (think Ruby symbols). Square brackets denote lists and curly brackets denote tuples (ordered, fixed-size sequences).

      -module(twitter_search).
    
      -export([run/0, search/1]).
    
      -include_lib("xmerl/include/xmerl.hrl").
    
      -define(SEARCH_URL,
        "http://search.twitter.com/search.atom?rpp=50&q=").
    
      -define(SEARCH_STORAGE, "results-file").
    
      run() ->
        inets:start(),
        open_table(?SEARCH_STORAGE),
        SearchTerms = ["sparkart", "ufc"],
        [ store_result( Term, search_for( Term ) ) || Term <- SearchTerms ],
        timer:sleep(2000),
        close_table().
    
      search_for(Query) ->
        Xml = search(Query),
        retrieve_names_from(Xml).
    
      search(Query) ->
        URL = search_url_for(Query),
        % On newer OTP releases this call is httpc:request/1.
        { ok, {_Status, _Headers, Body }} = http:request(URL),
        Body.
    
      search_url_for(Query) ->
        ?SEARCH_URL ++ Query.
    
      retrieve_names_from(Xml) ->
        { Body, _Rest } = xmerl_scan:string(Xml),
        Names = xmerl_xpath:string("//name/text()", Body),
        UserNames = [ strip_out_full_name(Author) || {_xmlText,[{name, _},{author, _},{entry, _},{feed, _}],
                                                      _,[],Author,text} <- Names],
        lists:usort(UserNames).
    
      strip_out_full_name(Author) ->
        [Name | _Rest ] = string:tokens(Author, " "),
        Name.
    
      store_result(Key, Results) -> dets:insert(?MODULE, {Key, Results}).
    
      open_table(File) ->
        io:format("dets opened:  ~p~n", [File]),
        io:format("dets name:  ~p~n", [?MODULE]),
        case dets:open_file(?MODULE, [{file, File},{type, bag}]) of
          {ok, ?MODULE} ->
            true;
          {error, _Reason} ->
            io:format("cannot open dets table~n"),
            exit(eDetsOpen)
        end.
    
      close_table() -> dets:close(?MODULE).
    

    Let's start breaking this bad boy down so it makes sense to more people than the person who wrote it.

    -module(twitter_search).
    
    -export([run/0, search/1]). 
    
    -include_lib("xmerl/include/xmerl.hrl"). 
    
    -define(SEARCH_URL, 
      "http://search.twitter.com/search.atom?rpp=50&q="). 
    
    -define(SEARCH_STORAGE, "results-file").
    

    The first line defines the namespace for the functions in our bot. The module name needs to match the file name, just like model files and classes in Rails. All functions in Erlang are namespaced, so run becomes twitter_search:run(), preventing any similarly named functions from colliding with each other. The export directive tells the runtime which functions, and at which arity, are available outside of the module. The arity of a function is the number of parameters it takes, so run/0 takes none and search/1 takes one. In other words, the twitter_search module exports the run and search functions. The include_lib line is like a Ruby require; we need xmerl to do an XPath search later. The two define lines create constants: one for the Twitter search API URL and one for the path to a file for persistent storage of the results.

    run() ->
      inets:start(),
      open_table(?SEARCH_STORAGE),
      SearchTerms = ["sparkart", "ufc"],
      [ store_result( Term, search_for( Term ) ) || Term <- SearchTerms ],
      timer:sleep(2000),
      close_table().
    
    The run function is the entry point into our system. It starts inets so we can make HTTP requests, opens our persistent backing store, performs the searches, and finally closes the backing store. The line [ store_result( Term, search_for( Term ) ) || Term <- SearchTerms ] is an Erlang list comprehension. In English: for every item in the search terms list, it binds that element to the variable Term and then runs the search_for and store_result functions.
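
For readers coming from Ruby, the comprehension in run() behaves much like Array#map. Here is a rough sketch of that idea. The search_for and store_result methods are hypothetical stubs standing in for the Erlang functions of the same name (no network and no dets involved), and STORE is an invented in-memory stand-in for the backing store.

```ruby
# Hypothetical in-memory stand-in for the dets table.
STORE = {}

# Hypothetical stub: returns made-up data, not real Twitter results.
def search_for(term)
  ["#{term}_fan"]
end

# Hypothetical stub: records the results and reports success.
def store_result(key, results)
  STORE[key] = results
  :ok
end

search_terms = ["sparkart", "ufc"]

# Erlang: [ store_result(Term, search_for(Term)) || Term <- SearchTerms ]
outcomes = search_terms.map { |term| store_result(term, search_for(term)) }

p outcomes
p STORE
```

As in the Erlang version, the list that comes back is not used for anything; the comprehension is run for its side effect of storing each result.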

    search(Query) ->
      URL = search_url_for(Query),
      { ok, {_Status, _Headers, Body }} = http:request(URL),
      Body.
    

    Search builds the URL for our search, including the query we are looking for. Let's explore the line { ok, {_Status, _Headers, Body }} = http:request(URL), since it makes use of pattern matching, which is a fundamental aspect of Erlang programming. First, throw out any notion you may have that the result of http:request(URL) gets assigned to that mess on the left-hand side of the equals sign. What's really happening is that http:request(URL) is called and the result is checked to see if it matches the tuple on the left-hand side. The tuple has an atom (think Ruby symbol) as its first element, followed by another tuple holding the HTTP status, HTTP headers, and body of the response. The underscore before a variable name tells the runtime that we don't really care about that value. If the result matches the tuple, the variable Body is bound to the HTTP response body.
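
Ruby 2.7 and later have a similar feature in case/in, which can help make the idea concrete. The sketch below is only an analogy: arrays stand in for Erlang tuples, and match_body and the sample values are invented for illustration. In Erlang a failed match raises a badmatch error; here the else branch raises an ArgumentError to mimic that.

```ruby
# Arrays stand in for Erlang tuples; match_body is a hypothetical helper.
def match_body(result)
  case result
  in [:ok, [_status, _headers, body]]
    body   # the shape matched, so body is now bound
  else
    raise ArgumentError, "no match for #{result.inspect}"
  end
end

# Invented sample values shaped like a success and a failure.
ok_result    = [:ok, ["HTTP/1.1 200 OK", {}, "<feed/>"]]
error_result = [:error, :timeout]

puts match_body(ok_result)

begin
  match_body(error_result)
rescue ArgumentError => e
  puts e.message
end
```

The underscore-prefixed names play the same role as in the Erlang line: the values are matched structurally but otherwise ignored.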

    retrieve_names_from(Xml) ->
      { Body, _Rest } = xmerl_scan:string(Xml),
      Names = xmerl_xpath:string("//name/text()", Body),
      UserNames = [ strip_out_full_name(Author) || {_xmlText,[{name, _},{author, _},{entry, _},{feed, _}],
                                                    _,[],Author,text} <- Names],
      lists:usort(UserNames).
    

    The retrieve_names_from function parses the XML returned from our search and uses an XPath expression to find all of the name nodes. We use another list comprehension to match the inner text of the name nodes and create a list of Twitter usernames. Finally, the list of usernames is sorted and duplicates are removed. This list comprehension is a little more advanced than the one in run(). Instead of executing the strip_out_full_name function for every element in the Names list, we use Erlang pattern matching so that only text nodes sitting under feed/entry/author/name are processed; elements that don't match the pattern are skipped. The format of the tuple is based on the internal parser representation of the node defined in the xmerl header file.
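
In Ruby terms, a comprehension with a pattern in its generator is roughly a select followed by a map, and lists:usort is roughly sort plus uniq. The sketch below uses hypothetical, heavily simplified hashes in place of xmerl's text-node tuples. The sample values, including the "username (Full Name)" shape of the name text, are assumptions made for illustration.

```ruby
# Hypothetical stand-ins for xmerl text nodes: the chain of parent
# elements (innermost first) plus the text value.
nodes = [
  { parents: [:name, :author, :entry, :feed], value: "alice (Alice Example)" },
  { parents: [:name, :author, :entry, :feed], value: "bob (Bob Example)" },
  { parents: [:name, :author, :entry, :feed], value: "alice (Alice Example)" },
  { parents: [:name, :author, :feed],         value: "feed-level author" }
]

# Keep only the first whitespace-separated token, like the Erlang helper.
def strip_out_full_name(author)
  author.split(" ").first
end

# Only nodes whose parent chain matches are kept; the rest are skipped.
user_names = nodes
  .select { |node| node[:parents] == [:name, :author, :entry, :feed] }
  .map    { |node| strip_out_full_name(node[:value]) }
  .sort
  .uniq

p user_names
```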

    Simple, right? One bright and shiny new bot in 56 lines of code. Hopefully this served as a good introduction to our first bot and to Erlang, the language it was written in. Feel free to drop questions, comments, kudos, and criticisms in the comments.

    Note: Erlang list comprehensions use a double pipe (||) between the expression and the generator. Double pipes are reserved in markdown, so wherever a single pipe appears in a comprehension in the listings above, read it as a double pipe.

  • WiiSpray

    April 6, 2009 Posted by EricL 0 Comments

    Spraypainting with a Wiimote, ridiculous! The stencil is especially awesome.

    (via daringfireball)

  • "You're sitting on a chair in the sky"

    March 15, 2009 Posted by EricL 0 Comments

  • Be Uniq

    January 26, 2009 Posted by JamesL 0 Comments

    The Ruby programming language's built-in uniq function for Arrays offers a quick and easy way to remove duplicate values.

      a = [1, 1, 2, 2, 2, 3, 4, 4, 4, 4]
      a.uniq   #=> [1, 2, 3, 4]
    

    But, what happens if the elements in our array are more complex than integers, such as Rails' ActiveRecord objects? To implement an Address Book feature, we needed to fetch users' previously entered billing and shipping addresses, and then remove any duplicates. An interesting problem arose, since a person's billing and shipping addresses might be exactly the same, but since they are stored separately, they correspond to separate ActiveRecord objects.

    Say Address.find(1) is a user's billing address and Address.find(2) is their shipping address:

    >> address1 = Address.find(1)
      {"id"=>"1",
       "street_address"=>"5655 College Avenue",
       "city"=>"Oakland",
       "state"=>"CA",
       "postal_code"=>"94618",
       "address_type_id"=>"1"}
    >> address2 = Address.find(2)
      {"id"=>"2",
       "street_address"=>"5655 College Avenue",
       "city"=>"Oakland",
       "state"=>"CA",
       "postal_code"=>"94618",
       "address_type_id"=>"2"}
    

    Calling

    >> [address1, address2].uniq.size
    2
    

    keeps both addresses because Ruby treats them as separate and unique objects (they have different ids and address_type_ids, after all). According to our rules, however, we want to treat the two addresses as identical if they have the same street_address, city, state, country, and postal_code, regardless of whether they are separate objects or have different ids. In order to make uniq use our rules, we need to implement Address#eql? and Address#hash:

    class Address
      def eql?(other)
        [street_address, city, state, country, postal_code].eql?([other.street_address, other.city, other.state, other.country, other.postal_code])
      end
    
      def hash
        [street_address, city, state, country, postal_code].hash
      end
    end
    

    eql? and hash are the two underlying methods that uniq uses to determine uniqueness. With these two definitions in place, calling

    >> [address1, address2].uniq.size
    1
    

    returns only one of the addresses, instead of both. Allowing uniq, eql?, and hash to do the work for us is certainly easier and more elegant than the alternative of manually looping through the addresses and comparing them all by hand.
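
To try this outside of Rails, here is a minimal self-contained sketch. This Address is a hypothetical plain Ruby class standing in for the ActiveRecord model, postal_fields is an invented helper, and the sample data (including the "US" country value and the third address) is made up for illustration.

```ruby
# Hypothetical plain-Ruby stand-in for the ActiveRecord Address model.
class Address
  attr_reader :id, :street_address, :city, :state, :country, :postal_code

  def initialize(id:, street_address:, city:, state:, country:, postal_code:)
    @id = id
    @street_address = street_address
    @city = city
    @state = state
    @country = country
    @postal_code = postal_code
  end

  # Two addresses count as the same when their postal fields agree.
  def eql?(other)
    other.is_a?(Address) && postal_fields.eql?(other.postal_fields)
  end

  # Must agree with eql?: equal addresses produce equal hashes.
  def hash
    postal_fields.hash
  end

  protected

  def postal_fields
    [street_address, city, state, country, postal_code]
  end
end

billing  = Address.new(id: 1, street_address: "5655 College Avenue", city: "Oakland",
                       state: "CA", country: "US", postal_code: "94618")
shipping = Address.new(id: 2, street_address: "5655 College Avenue", city: "Oakland",
                       state: "CA", country: "US", postal_code: "94618")
other    = Address.new(id: 3, street_address: "1 Example Street", city: "Oakland",
                       state: "CA", country: "US", postal_code: "94618")

unique = [billing, shipping, other].uniq
p unique.map(&:id)

# Alternative: pass the comparison key to uniq as a block, leaving eql?/hash alone.
by_block = [billing, shipping, other].uniq do |a|
  [a.street_address, a.city, a.state, a.country, a.postal_code]
end
p by_block.map(&:id)
```

The block form may be preferable when the "same address" rule should apply in only one place, since redefining eql? and hash also changes how addresses behave as Hash keys and in other Array methods that rely on them.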

    References