
twitter search
Find Links

Note: Unfortunately this no longer works, probably due to changes in the Twitter API.

When following hashtags on Twitter during conferences or chats, there are often a lot of links being thrown about. With a stream of tweets going past, it's normally necessary to save these for later. You can favourite the tweets, although this relies on catching each tweet at the time. Or you can search Twitter later, although that can turn up loads of links with nothing to summarise them, and the tweets don't always give enough context to show what a link contains.

The web being as clever as it is though, it should be possible to list all the links shared through a hashtag, alongside additional information about them - gained directly from the web pages themselves, such as the page title and description.

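Yahoo Query Language (YQL) allows for directly searching the html of web pages, and also provides access to searching Twitter. This post uses both to do the following (an example is at the top of the page): search Twitter for tweets containing links, get the metadata for the pages those links point to, and use JavaScript/jQuery to put it all together.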

It sounds like quite a lot of programming, but by using YQL, and jQuery to shorten the amount of JavaScript, it can be done in a very small amount of code. If you are completely unfamiliar with YQL, it was partly covered in this example of collecting global library stats. There is also an excellent developer resource.

Step 1. Search Twitter for tweets and links

YQL comes with some 'community tables'. In the YQL console you can tick 'show community tables' on the left-hand side to get access to all kinds of data: Wikipedia, Twitter, Amazon, BBC, etc. If you select the twitter tables and then 'twitter.search.tweets', an example Twitter search is generated. This appears as:

SELECT * FROM twitter.search.tweets
WHERE q="yahoo"
AND consumer_key="08ZNcNfdoCgYTzR7qcW1HQ"
AND consumer_secret="PTMIdmhxAavwarH3r4aTnVF7iYbX6BRfykNBHIaB8"
AND access_token="1181240586-JIgvJe4ev3NHdHnAqnovHINWfpo0qB2S2kZtVRI"
AND access_token_secret="1nodv0LBsi7jS93e38KiW8cHOA5iUc6FT4L6De7kgk"

In a production system those consumer keys and access tokens would be your own, provided by Twitter when you create a developer account. The following article gives instructions on how to create those keys and hide them when using YQL within a web page: Twitter API v1.1 Front-end Access.

YQL makes querying Twitter quite straightforward, and after designing a query in the console it provides a direct link to access the data returned. Taking a library-related query as an example, a query on savelibraries could be run by accessing the following URL, set to return JSON data:

YQL Query

In one go that returns tweets matching that search term, any linked URLs, user data for whoever sent each tweet, and location if attached, etc.

Step 2. Get the metadata for the links within tweets

The data returned includes links from within tweets that are already separated from the text of the tweet, so there's no need to mess around detecting them or manipulating the text data to extract them.
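To illustrate, here is a minimal sketch of reading those links from a single tweet. It assumes each tweet carries an entities.urls value holding expanded_url, and that YQL's JSON output gives a single object when there is one link and an array when there are several. The function name linksFromTweet and the sample tweets are made up for the example.

```javascript
// Return the expanded URLs attached to one tweet (empty array if there are none).
function linksFromTweet(status) {
    if (!status || !status.entities || !status.entities.urls) { return []; }
    // [].concat() wraps a single object in an array and leaves an array as it is
    var urls = [].concat(status.entities.urls);
    var links = [];
    for (var i = 0; i < urls.length; i++) {
        if (urls[i] && urls[i].expanded_url) { links.push(urls[i].expanded_url); }
    }
    return links;
}

// Made-up sample data in the assumed shape.
var oneLink = {
    text: "Worth a read",
    entities: { urls: { expanded_url: "http://www.librarieshacked.org" } }
};
var twoLinks = {
    text: "Two for later",
    entities: { urls: [
        { expanded_url: "http://www.librarieshacked.org" },
        { expanded_url: "http://www.librarieshacked.org/tutorials" }
    ] }
};

console.log(linksFromTweet(oneLink));
console.log(linksFromTweet(twoLinks));
```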

The next task is then to provide additional information for those links by querying YQL for meta tags within the html of pages. A YQL query to return the meta and title tags from a web link would be:

SELECT *
FROM html
WHERE url = 'http://www.librarieshacked.org'
AND xpath='//head/meta|//head/title'

The xpath query there defines which parts of the html are queried. In this case, the query will return data from any meta tags that appear in the head section of the html, and the title tag. YQL also allows for combining many queries at a time, by specifying these in the following way:

SELECT *
FROM yql.multi
WHERE queries = "
    SELECT * FROM html WHERE url = 'http://www.librarieshacked.org' AND xpath='//head/meta|//head/title';SELECT * FROM html WHERE url = 'http://www.librarieshacked.org/tutorials' AND xpath='//head/meta|//head/title'"
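Building that combined statement by hand gets tedious with more than a couple of links, so it can be generated from a list. The following is a sketch only: buildMultiQuery is a made-up name, and skipping any URL that contains a quote character is a simplification to keep the generated statement well-formed.

```javascript
// Build one yql.multi statement that fetches the meta and title tags for every URL given.
function buildMultiQuery(urls) {
    var selects = [];
    for (var i = 0; i < urls.length; i++) {
        // a quote inside a url would end the surrounding YQL string early, so skip it
        if (/['"]/.test(urls[i])) { continue; }
        selects.push("SELECT * FROM html WHERE url='" + urls[i]
            + "' AND xpath='//head/meta|//head/title'");
    }
    return 'SELECT * FROM yql.multi WHERE queries="' + selects.join(";") + '"';
}

console.log(buildMultiQuery([
    "http://www.librarieshacked.org",
    "http://www.librarieshacked.org/tutorials"
]));
```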

Step 3. Use JavaScript/jQuery to put it all together

To effectively use those two data sources, a tool needs to be created which will take a search term, find all the tweets that include links, use these to retrieve the webpage meta tags, and display this. Using the jQuery library, an example JavaScript function is:

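function GetResults(search) {
    // empty out the results
    // divUrlResults is the 'output' container
    $('#divUrlResults').empty();

    // NOTE: the middle of this function was lost from the original listing.
    // The link cap and the second YQL call are reconstructed from steps 1 and 2;
    // maxLinks = 10 is an assumed value, not necessarily the original one.
    var maxLinks = 10;
    var yqlBase = "https://query.yahooapis.com/v1/public/yql?q=";
    var yqlOptions = "&format=json&env=store://datatables.org/alltableswithkeys";

    // construct url to get tweet data
    var yqlGetTweets = "SELECT statuses FROM twitter.search.tweets WHERE q='"
            + search
            + "' AND consumer_key='08ZNcNfdoCgYTzR7qcW1HQ' "
            + "AND consumer_secret='PTMIdmhxAavwarH3r4aTnVF7iYbX6BRfykNBHIaB8' "
            + "AND access_token='1181240586-JIgvJe4ev3NHdHnAqnovHINWfpo0qB2S2kZtVRI' "
            + "AND access_token_secret='1nodv0LBsi7jS93e38KiW8cHOA5iUc6FT4L6De7kgk'";
    // encodeURIComponent, unlike encodeURI, also escapes '#', '&' and '='
    var yqlTweetsUrl = yqlBase + encodeURIComponent(yqlGetTweets) + yqlOptions;

    // make the call to yql and wait for it to return (success:...)
    $.ajax({
        url: yqlTweetsUrl,
        dataType: "jsonp",
        success: function (data) {
            // on success set up an array of data to store each url urlData[]
            var urlData = [];
            // and a set of select statements for later call to yql
            var yqlSelects = "";

            if (data && data.query && data.query.count > 0
                    && data.query.results && data.query.results.json) {
                // for each tweet returned process it
                $.each(data.query.results.json, function () {
                    var status = this.statuses;
                    if (status && status.entities && status.entities.urls) {
                        // a single url comes back as an object, several as an array
                        var urls = status.entities.urls;
                        var first = $.isArray(urls) ? urls[0] : urls;
                        // create an object with relevant data from the tweet
                        var tweet = {
                            text: status.text, username: status.user.name,
                            usertag: status.user.screen_name,
                            link: (first && first.expanded_url) || '',
                            tweetdate: status.created_at
                        };
                        // checking if url has already been added (not adding duplicates)
                        // and skipping links with quotes, which would break the query
                        if ($.grep(urlData, function (e) { return e.link == tweet.link; }).length == 0
                            && tweet.link.length > 0 && !/['"]/.test(tweet.link)
                            && urlData.length < maxLinks) {
                            urlData.push(tweet);
                            yqlSelects += "SELECT * FROM html WHERE url='" + tweet.link
                                    + "' AND xpath='//head/meta|//head/title';";
                        }
                    }
                });
            }
            if (urlData.length == 0) { return; }

            // second call: fetch the meta and title tags for every link in one go
            var yqlGetMeta = 'SELECT * FROM yql.multi WHERE queries="' + yqlSelects + '"';
            $.ajax({
                url: yqlBase + encodeURIComponent(yqlGetMeta) + yqlOptions,
                dataType: "jsonp",
                success: function (data) {
                    if (data && data.query && data.query.count > 0 && data.query.results) {
                        // for each item returned add a new section on the web page to show it.
                        $.each([].concat(data.query.results.results), function (i, page) {
                            if (!page || !urlData[i]) { return; }
                            // assumed shape: meta tags arrive as objects with name and content
                            var description = "";
                            $.each([].concat(page.meta || []), function (j, meta) {
                                if (meta && meta.name && meta.name.toLowerCase() == "description") {
                                    description = meta.content || "";
                                }
                            });
                            // construct the html for each item; .text() escapes the page's own text
                            $('#divUrlResults').append(
                                $("<h4>").text(page.title || urlData[i].link),
                                $("<h5>").text(description),
                                $("<a>").attr("href", urlData[i].link).text(urlData[i].link),
                                $("<p>").text("tweeted by " + urlData[i].username
                                        + " at " + urlData[i].tweetdate),
                                "<hr>");
                        });
                    }
                }
            });
        }
    });
}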

That JavaScript will then search Twitter, find some additional information about the shared links, and populate a div on the page with that data. It could be used to populate a Twitter-style widget, showing links shared rather than tweets, with summary data.
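The step of turning one page's result into a title and description can also be looked at on its own. The sketch below assumes each result looks roughly like { title: "...", meta: [ { name: "...", content: "..." } ] }, with a lone meta tag arriving as a single object rather than an array; that shape is an assumption about YQL's JSON output rather than something confirmed here, and summarisePage is a made-up name.

```javascript
// Pick the title and description out of one page's result (assumed shape, see above).
function summarisePage(result) {
    var summary = { title: "", description: "" };
    if (!result) { return summary; }
    if (typeof result.title === "string") { summary.title = result.title; }
    // [].concat() copes with both a single meta object and an array of them
    var metas = [].concat(result.meta || []);
    for (var i = 0; i < metas.length; i++) {
        var meta = metas[i] || {};
        // meta names are matched case-insensitively, e.g. "Description"
        if ((meta.name || "").toLowerCase() === "description") {
            summary.description = meta.content || "";
            break;
        }
    }
    return summary;
}

// Made-up sample result.
console.log(summarisePage({
    title: "Example page",
    meta: [
        { name: "viewport", content: "width=device-width" },
        { name: "description", content: "An example description." }
    ]
}));
```

Pages without a description tag simply come back with an empty description, so the display code can decide whether to fall back to the tweet text.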

