How to Download Twitter Data in JSON: Twitter API Python Examples

In this tutorial, I will use Python scripts to download Twitter data in JSON format from the Twitter REST, Streaming and Search APIs. The scripts in the examples are complete and can be run right away. If you are coming from a different programming language, I have attached the JSON output files so that you can understand the structure of the tweet object.

The tutorial is divided into three parts:

  • An example using the REST API to download a user’s tweets
  • An example using the Streaming API to download tweets for a certain keyword or hashtag
  • An example using the Search API to download tweets for a certain search keyword or hashtag

Let’s get started.

1. Install Python and the Tweepy library

To follow the examples in this tutorial, you need Python 2.7.x (the scripts might work on Python 3, but I haven’t tested that yet), Atom (or any other code editor) and the Tweepy library. Tweepy is a Python library for accessing the Twitter API. If you are new to Python, below are resources to help you get started.

If you are using Windows, you might encounter the error ‘pip’ is not recognized as an internal or external command when you try to run pip install tweepy. To fix it, open PowerShell (the command below uses PowerShell syntax and will not work in CMD), copy and paste the command, then close the window and reopen it. Then run pip install tweepy again. If you still get the error, please refer to this discussion.

[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27\Scripts", "User")
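Once the install finishes, a quick way to confirm that Python can actually see the library is to look it up by name. The snippet below is only an illustrative sketch and assumes Python 3; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether tweepy is importable from the interpreter running this script.
    print("tweepy installed:", is_installed("tweepy"))
```

If this prints False even though pip reported success, pip probably installed the package for a different Python installation than the one running the script.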

2. Get authorization credentials to access the Twitter API

In order to access and download data from the Twitter API, we need credentials: consumer keys and access tokens. You get them by creating an app with Twitter.

Follow these steps:

  • Go to Twitter Application Management and log in with your Twitter account
  • Click the “Create New App” button
  • Fill in the required fields, then read and agree to the Twitter Developer Agreement
  • Submit the form

If everything went fine, you will see a window similar to the one below, with your keys and access tokens under the Keys and Access Tokens tab. We will use these credentials in our example code.

[Screenshot: Twitter API consumer key and consumer secret]
[Screenshot: Twitter API access token and access token secret]

Now that we have everything set up, let us do some interesting stuff in the coming steps.
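The example scripts in this tutorial paste the keys straight into the source file. If you would rather keep them out of your code, one option is to read them from environment variables. The sketch below assumes Python 3; the variable names (TWITTER_CONSUMER_KEY and so on) and the helpers load_credentials and make_api are invented for this illustration, and make_api assumes Tweepy’s OAuthHandler, set_access_token and API calls.

```python
import os

# Hypothetical environment variable names; pick whatever names suit your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=None):
    """Read the four credentials from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(sorted(missing)))
    return {field: env[name] for field, name in ENV_NAMES.items()}


def make_api(creds):
    """Build a Tweepy API client from the credentials dict."""
    import tweepy  # imported here so the rest of the module works without Tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_key"], creds["access_secret"])
    return tweepy.API(auth)
```

With the four variables set in your shell, make_api(load_credentials()) would give you a client object without any secret appearing in the script.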

3. An example using the REST API to download a user’s tweets

The REST APIs provide programmatic access to read and write Twitter data: author a new Tweet, read author profile and follower data, and more. The REST API identifies Twitter applications and users using OAuth; responses are available in JSON.

We will use the tweet.py script to download a user’s recent tweets, up to a maximum of 3200. The response is returned in JSON format and saved as tweet.json in the same folder as the script. If you are new to Python, you run the code from the command line by typing python tweet.py. Make sure you first navigate, in the CMD window, to the folder where your script is saved (mine is in the scraping folder).

[Screenshot: running Python on the command line]

[Screenshot: saving tweets to JSON]

Before running the code you need to edit the code to include the credentials provided by the Twitter App Management interface in the previous step.

#Twitter API credentials
consumer_key = "Consumer key goes here"
consumer_secret = "Consumer secret goes here"
access_key = "access key goes here"
access_secret = "access secret goes here"

Also, enter the Twitter username you want to download tweets from.

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("user name goes here")  #Example:@realDonaldTrump
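The full tweet.py script is not reproduced in this post, so the following is only a sketch of how paging backwards through a timeline until the 3200-tweet limit can work. It assumes Python 3; collect_timeline and fetch_page are names invented for this illustration, and the Tweepy wiring in the trailing comment is an assumption about how one might connect it, not the original code.

```python
def collect_timeline(fetch_page, limit=3200):
    """Collect tweets newest-first by paging backwards with max_id.

    fetch_page(max_id) must return a list of tweet dicts (each with an "id"),
    newest first, none newer than max_id; max_id=None means "start at the top".
    """
    all_tweets = []
    max_id = None
    while len(all_tweets) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        all_tweets.extend(page)
        # Ask only for tweets strictly older than the last one we have,
        # so the next page does not repeat it.
        max_id = all_tweets[-1]["id"] - 1
    return all_tweets[:limit]


# Possible wiring with Tweepy (an assumption, not the original script):
#   def fetch_page(max_id):
#       kwargs = {"screen_name": "user name goes here", "count": 200}
#       if max_id is not None:
#           kwargs["max_id"] = max_id
#       return [status._json for status in api.user_timeline(**kwargs)]
```

Passing the page fetcher in as a function keeps the paging logic separate from the network call, which makes it easy to try out with fake data before spending any API requests.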

Below is the structure of a tweet object, with its attributes, as returned in JSON format (two consecutive tweets from the output file are shown). Here is the complete JSON file: tweet.json (14.8MB)

{
    "contributors": null, 
    "coordinates": null, 
    "created_at": "Sat Aug 20 01:00:12 +0000 2016", 
    "entities": {
        "hashtags": [
            {
                "indices": [
                    97, 
                    116
                ], 
                "text": "StandWithLouisiana"
            }
        ], 
        "media": [
            {
                "display_url": "pic.twitter.com/Ob7J2oBWhq", 
                "expanded_url": "https://twitter.com/realDonaldTrump/status/766801978085117952/video/1", 
                "id": 766801621007294464, 
                "id_str": "766801621007294464", 
                "indices": [
                    117, 
                    140
                ], 
                "media_url": "https://pbs.twimg.com/ext_tw_video_thumb/766801621007294464/pu/img/0utktWvDSyGamM4m.jpg", 
                "media_url_https": "https://pbs.twimg.com/ext_tw_video_thumb/766801621007294464/pu/img/0utktWvDSyGamM4m.jpg", 
                "sizes": {
                    "large": {
                        "h": 576, 
                        "resize": "fit", 
                        "w": 1024
                    }, 
                    "medium": {
                        "h": 338, 
                        "resize": "fit", 
                        "w": 600
                    }, 
                    "small": {
                        "h": 191, 
                        "resize": "fit", 
                        "w": 340
                    }, 
                    "thumb": {
                        "h": 150, 
                        "resize": "crop", 
                        "w": 150
                    }
                }, 
                "type": "photo", 
                "url": "https://t.co/Ob7J2oBWhq"
            }
        ], 
        "symbols": [], 
        "urls": [], 
        "user_mentions": []
    }, 
    "extended_entities": {
        "media": [
            {
                "additional_media_info": {
                    "monetizable": false
                }, 
                "display_url": "pic.twitter.com/Ob7J2oBWhq", 
                "expanded_url": "https://twitter.com/realDonaldTrump/status/766801978085117952/video/1", 
                "id": 766801621007294464, 
                "id_str": "766801621007294464", 
                "indices": [
                    117, 
                    140
                ], 
                "media_url": "https://pbs.twimg.com/ext_tw_video_thumb/766801621007294464/pu/img/0utktWvDSyGamM4m.jpg", 
                "media_url_https": "https://pbs.twimg.com/ext_tw_video_thumb/766801621007294464/pu/img/0utktWvDSyGamM4m.jpg", 
                "sizes": {
                    "large": {
                        "h": 576, 
                        "resize": "fit", 
                        "w": 1024
                    }, 
                    "medium": {
                        "h": 338, 
                        "resize": "fit", 
                        "w": 600
                    }, 
                    "small": {
                        "h": 191, 
                        "resize": "fit", 
                        "w": 340
                    }, 
                    "thumb": {
                        "h": 150, 
                        "resize": "crop", 
                        "w": 150
                    }
                }, 
                "type": "video", 
                "url": "https://t.co/Ob7J2oBWhq", 
                "video_info": {
                    "aspect_ratio": [
                        16, 
                        9
                    ], 
                    "duration_millis": 140000, 
                    "variants": [
                        {
                            "bitrate": 320000, 
                            "content_type": "video/mp4", 
                            "url": "https://video.twimg.com/ext_tw_video/766801621007294464/pu/vid/320x180/0t4SNNy1YU1rHCYo.mp4"
                        }, 
                        {
                            "content_type": "application/dash+xml", 
                            "url": "https://video.twimg.com/ext_tw_video/766801621007294464/pu/pl/QiF_xbP1ARIdGp-F.mpd"
                        }, 
                        {
                            "content_type": "application/x-mpegURL", 
                            "url": "https://video.twimg.com/ext_tw_video/766801621007294464/pu/pl/QiF_xbP1ARIdGp-F.m3u8"
                        }, 
                        {
                            "bitrate": 2176000, 
                            "content_type": "video/mp4", 
                            "url": "https://video.twimg.com/ext_tw_video/766801621007294464/pu/vid/1280x720/8zc8PRPYNM4KmCXd.mp4"
                        }, 
                        {
                            "bitrate": 832000, 
                            "content_type": "video/mp4", 
                            "url": "https://video.twimg.com/ext_tw_video/766801621007294464/pu/vid/640x360/q_ClmD0bzudWewVn.mp4"
                        }
                    ]
                }
            }
        ]
    }, 
    "favorite_count": 42550, 
    "favorited": false, 
    "geo": null, 
    "id": 766801978085117952, 
    "id_str": "766801978085117952", 
    "in_reply_to_screen_name": null, 
    "in_reply_to_status_id": null, 
    "in_reply_to_status_id_str": null, 
    "in_reply_to_user_id": null, 
    "in_reply_to_user_id_str": null, 
    "is_quote_status": false, 
    "lang": "en", 
    "place": null, 
    "possibly_sensitive": false, 
    "retweet_count": 16977, 
    "retweeted": false, 
    "source": "<a href=\"https://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", 
    "text": "We are one nation. When one hurts, we all hurt. We must all work together-to lift each other up.\n#StandWithLouisiana https://t.co/Ob7J2oBWhq", 
    "truncated": false, 
    "user": {
        "contributors_enabled": false, 
        "created_at": "Wed Mar 18 13:46:38 +0000 2009", 
        "default_profile": false, 
        "default_profile_image": false, 
        "description": "#TrumpPence16", 
        "entities": {
            "description": {
                "urls": []
            }, 
            "url": {
                "urls": [
                    {
                        "display_url": "DonaldJTrump.com", 
                        "expanded_url": "https://www.DonaldJTrump.com", 
                        "indices": [
                            0, 
                            23
                        ], 
                        "url": "https://t.co/mZB2hymxC9"
                    }
                ]
            }
        }, 
        "favourites_count": 35, 
        "follow_request_sent": false, 
        "followers_count": 11087586, 
        "following": true, 
        "friends_count": 42, 
        "geo_enabled": true, 
        "has_extended_profile": false, 
        "id": 25073877, 
        "id_str": "25073877", 
        "is_translation_enabled": true, 
        "is_translator": false, 
        "lang": "en", 
        "listed_count": 37773, 
        "location": "New York, NY", 
        "name": "Donald J. Trump", 
        "notifications": false, 
        "profile_background_color": "6D5C18", 
        "profile_background_image_url": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg", 
        "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg", 
        "profile_background_tile": true, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/25073877/1468988952", 
        "profile_image_url": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg", 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg", 
        "profile_link_color": "0D5B73", 
        "profile_sidebar_border_color": "BDDCAD", 
        "profile_sidebar_fill_color": "C5CEC0", 
        "profile_text_color": "333333", 
        "profile_use_background_image": true, 
        "protected": false, 
        "screen_name": "realDonaldTrump", 
        "statuses_count": 32979, 
        "time_zone": "Eastern Time (US & Canada)", 
        "url": "https://t.co/mZB2hymxC9", 
        "utc_offset": -14400, 
        "verified": true
    }
}{
    "contributors": null, 
    "coordinates": null, 
    "created_at": "Sat Aug 20 00:17:09 +0000 2016", 
    "entities": {
        "hashtags": [
            {
                "indices": [
                    0, 
                    14
                ], 
                "text": "WheresHillary"
            }
        ], 
        "symbols": [], 
        "urls": [], 
        "user_mentions": []
    }, 
    "favorite_count": 59882, 
    "favorited": false, 
    "geo": null, 
    "id": 766791143291916288, 
    "id_str": "766791143291916288", 
    "in_reply_to_screen_name": null, 
    "in_reply_to_status_id": null, 
    "in_reply_to_status_id_str": null, 
    "in_reply_to_user_id": null, 
    "in_reply_to_user_id_str": null, 
    "is_quote_status": false, 
    "lang": "en", 
    "place": null, 
    "retweet_count": 27272, 
    "retweeted": false, 
    "source": "<a href=\"https://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", 
    "text": "#WheresHillary? Sleeping!!!!!", 
    "truncated": false, 
    "user": {
        "contributors_enabled": false, 
        "created_at": "Wed Mar 18 13:46:38 +0000 2009", 
        "default_profile": false, 
        "default_profile_image": false, 
        "description": "#TrumpPence16", 
        "entities": {
            "description": {
                "urls": []
            }, 
            "url": {
                "urls": [
                    {
                        "display_url": "DonaldJTrump.com", 
                        "expanded_url": "https://www.DonaldJTrump.com", 
                        "indices": [
                            0, 
                            23
                        ], 
                        "url": "https://t.co/mZB2hymxC9"
                    }
                ]
            }
        }, 
        "favourites_count": 35, 
        "follow_request_sent": false, 
        "followers_count": 11087586, 
        "following": true, 
        "friends_count": 42, 
        "geo_enabled": true, 
        "has_extended_profile": false, 
        "id": 25073877, 
        "id_str": "25073877", 
        "is_translation_enabled": true, 
        "is_translator": false, 
        "lang": "en", 
        "listed_count": 37773, 
        "location": "New York, NY", 
        "name": "Donald J. Trump", 
        "notifications": false, 
        "profile_background_color": "6D5C18", 
        "profile_background_image_url": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg", 
        "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg", 
        "profile_background_tile": true, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/25073877/1468988952", 
        "profile_image_url": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg", 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg", 
        "profile_link_color": "0D5B73", 
        "profile_sidebar_border_color": "BDDCAD", 
        "profile_sidebar_fill_color": "C5CEC0", 
        "profile_text_color": "333333", 
        "profile_use_background_image": true, 
        "protected": false, 
        "screen_name": "realDonaldTrump", 
        "statuses_count": 32979, 
        "time_zone": "Eastern Time (US & Canada)", 
        "url": "https://t.co/mZB2hymxC9", 
        "utc_offset": -14400, 
        "verified": true
    }
}
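Note that the two objects above are written back to back ("}{"), so the file as a whole is not a single valid JSON document and json.load will reject it. One way to read such a file is to decode one object at a time. This is a minimal sketch, assuming Python 3 and assuming the file contains nothing but JSON objects separated by optional whitespace; iter_json_objects is a name made up for this example.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of values written back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values; raw_decode does not accept leading whitespace.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

For example, reading the whole of tweet.json into a string and passing it to list(iter_json_objects(...)) should give a list of tweet dicts, which can then be written out again as one proper JSON array with json.dump if another tool needs valid JSON.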

4. An example using the Streaming API to download tweets for a certain keyword or hashtag

The Streaming API provides programmatic access to monitor or process Tweets in real time. Connecting to the Streaming API requires keeping a persistent HTTP connection open.

In this example, I will use the Streaming API to download all the tweets related to #love and save the JSON responses in a file. The stream.py script runs continuously, listening for any real-time tweets containing #love.

You execute the script the same way as last time: on your command line or terminal, type python stream.py.

Before running this script, make sure you have modified it to include your API credentials and your favourite hashtag or search query.

# Authentication details. To obtain these visit dev.twitter.com
access_token = "your access token goes here"
access_token_secret = "your access token secret goes here"
consumer_key = "your consumer key goes here"
consumer_secret = "your consumer key secret goes here"
    #Hashtag to stream
    stream.filter(track=["#love"])  #Replace with your favorite hashtag or query

The JSON response is similar to the one returned by the previous example. Here is an attached JSON response file (13.9KB).
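Because a streaming script keeps receiving tweets for as long as it runs, each one has to be added to the end of the output file rather than overwriting it. The sketch below shows that one step in isolation. It assumes Python 3; append_tweet is a hypothetical helper, and writing one object per line is a choice made for this illustration, not necessarily the layout of the attached file.

```python
import json


def append_tweet(raw_data, path):
    """Append one tweet to path as a single line of JSON.

    raw_data is the JSON text of one tweet. Returns the parsed dict.
    """
    tweet = json.loads(raw_data)  # raises ValueError if the payload is not JSON
    # Mode "a" adds to the end of the file; mode "w" would wipe it on every call.
    with open(path, "a") as handle:
        handle.write(json.dumps(tweet) + "\n")
    return tweet
```

A file written this way can be read back line by line with json.loads, so it never needs to be loaded into memory all at once.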

5. An example using the Search API to download tweets for a certain search keyword or hashtag

The Twitter Search API is part of Twitter’s REST API. It allows queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like, the Search feature available in Twitter mobile or web clients, such as Twitter.com search. The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.

Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.
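To check whether a tweet you have already saved falls inside the 7-day search window mentioned above, you can parse its created_at field. This is a small sketch assuming Python 3 and the created_at layout seen in the sample tweets earlier ("Sat Aug 20 01:00:12 +0000 2016"); the function names are invented for this example.

```python
from datetime import datetime, timedelta, timezone

# Format of the created_at field, e.g. "Sat Aug 20 01:00:12 +0000 2016".
# %a and %b expect English day and month names (the default C locale).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Turn a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def within_search_window(created_at, now=None, days=7):
    """Return True if the tweet is at most `days` days older than `now`."""
    now = datetime.now(timezone.utc) if now is None else now
    return now - parse_created_at(created_at) <= timedelta(days=days)
```

The now parameter exists only so the check can be tried against a fixed date; leaving it out compares against the current time in UTC.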

This example will use the Search API to download JSON data from Twitter. The script for this example is search.py.

You execute the script the same way as last time: on your command line or terminal, type python search.py.

Before running this script, make sure you have modified it to include your API credentials and your favourite search query.

#Twitter API credentials
consumer_key = "your consumer key goes here"
consumer_secret = "your consumer key secret goes here"
access_key = "your access token goes here"
access_secret = "your access token secret goes here"
#Put your search term
searchquery = "love"

Below are the query operators you can use in the search term.

Operator                            Finds tweets…
watching now                        containing both “watching” and “now”. This is the default operator.
"happy hour"                        containing the exact phrase “happy hour”.
love OR hate                        containing either “love” or “hate” (or both).
beer -root                          containing “beer” but not “root”.
#haiku                              containing the hashtag “haiku”.
from:interior                       sent from Twitter account “interior”.
list:NASA/astronauts-in-space-now   sent from a Twitter account in the NASA list astronauts-in-space-now.
to:NASA                             a Tweet authored in reply to Twitter account “NASA”.
@NASA                               mentioning Twitter account “NASA”.
politics filter:safe                containing “politics” with Tweets marked as potentially sensitive removed.
puppy filter:media                  containing “puppy” and an image or video.
puppy filter:native_video           containing “puppy” and an uploaded video, Amplify video, Periscope, or Vine.
puppy filter:periscope              containing “puppy” and a Periscope video URL.
puppy filter:vine                   containing “puppy” and a Vine.
puppy filter:images                 containing “puppy” and links identified as photos, including third parties such as Instagram.
puppy filter:twimg                  containing “puppy” and a pic.twitter.com link representing one or more photos.
hilarious filter:links              containing “hilarious” and linking to a URL.
superhero since:2015-12-21          containing “superhero” and sent since date “2015-12-21” (year-month-day).
puppy until:2015-12-21              containing “puppy” and sent before the date “2015-12-21”.
movie -scary :)                     containing “movie”, but not “scary”, and with a positive attitude.
flight :(                           containing “flight” and with a negative attitude.
traffic ?                           containing “traffic” and asking a question.

Here (1.92MB) is the JSON response file returned by this query.
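The operators in the table can be combined into a single search term. As a rough illustration, the helper below assembles a query string from a few of them. It assumes Python 3; build_query and its parameter names are made up for this sketch and cover only some of the operators listed.

```python
def build_query(words=(), phrase=None, exclude=(), hashtag=None,
                from_user=None, since=None, until=None):
    """Join search operators from the table into one query string."""
    parts = list(words)
    if phrase:
        parts.append('"%s"' % phrase)  # exact phrase goes in double quotes
    parts.extend("-" + word for word in exclude)  # a leading minus excludes a word
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if since:
        parts.append("since:" + since)  # dates are year-month-day
    if until:
        parts.append("until:" + until)
    return " ".join(parts)
```

The resulting string would take the place of the plain "love" assigned to searchquery in the snippet above.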

Conclusion

The JSON files returned by the scripts contain a lot of information that can be consumed by other code or programs for further data analysis. You can get a tweet’s text, images, videos, retweet count, favourite count and much more.
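As an illustration of pulling those fields out, here is a minimal sketch that reduces one tweet dict to a handful of values. It assumes Python 3 and the field names seen in the sample tweet object earlier in this post; summarize_tweet is a name invented for this example.

```python
def summarize_tweet(tweet):
    """Pull a few commonly used fields out of one tweet dict."""
    # Prefer extended_entities, which carries the video details when present.
    entities = tweet.get("extended_entities") or tweet.get("entities") or {}
    media = entities.get("media", [])
    video_urls = []
    for item in media:
        variants = item.get("video_info", {}).get("variants", [])
        # In the sample tweet only the MP4 variants carry a bitrate;
        # keep the highest-bitrate MP4 for each media item.
        mp4 = [v for v in variants if v.get("content_type") == "video/mp4"]
        if mp4:
            best = max(mp4, key=lambda v: v.get("bitrate", 0))
            video_urls.append(best["url"])
    return {
        "id": tweet["id_str"],
        "text": tweet["text"],
        "retweet_count": tweet["retweet_count"],
        "favorite_count": tweet["favorite_count"],
        # For videos, media_url_https points at the thumbnail image.
        "image_urls": [item["media_url_https"] for item in media],
        "video_urls": video_urls,
    }
```

A list of such summaries is flat enough to hand to a CSV writer or a spreadsheet tool.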

If you have any questions feel free to comment below.

12 comments
ray - September 9, 2016

Thanks so much for the scripts – unfortunately the JSON one seems to have some problems (the third one). Everything works fine and the data is all there but it seems to be writing invalid JSON. Is there a fix? Every time I try and use a JSON to CSV/XLS converter it says the JSON is invalid.

Thanks.

daniel - September 19, 2016

Any ideas why the script bogs out at about 14KB and seems to write over itself? It pulls a few tweets, the file size freezes, reduces to a couple KB, bounces back up to about 14KB, then repeats. Basically it captures a couple tweets and that’s it. Like its looping over itself. Regardless thank you very much. I’ll keep looking into it. Sincerely – Daniel

    daniel - September 20, 2016

    The script wasn’t in “append” mode. Works great! Thanks!

    Paulo - October 2, 2016

    Hi Daniel,

    Which script is having issues?

daniel - October 26, 2016

Hi again! Again I love working with your scripts. Thank you. On the Hashtag to JSON script, time wasn’t defined, so I used import time and got through that problem, but I still have a problem with the code wanting to send my console to sleep (“sleeping….”). It sleeps continuously and I have not been able to get through this. In other words, when I execute the script it immediately goes to “sleeping…” and stays there. Any ideas on how to get around it? Thanks again….

Gathu - November 21, 2016

What exactly does this line do;

oldest = all_tweets[-1].id - 1

What is this .id method? Is it in Tweepy?

    Paulo - November 24, 2016

    Hi Gathu,
    Tweets returned by Tweepy’s user_timeline method have “id” as one of their attributes.
    The line of code makes sure we don’t get duplicate tweets. Read this documentation to understand why we need that line of code.

Jean - January 17, 2017

Hi,

Can I first echo the messages of thanks from the other guys! These scripts really are great.

I started off with the scripts from your other page, scraping data into Excel, but I wanted to get them in JSON form so I tried to run these scripts. I am having a problem with the second script on this page – stream.py.

When I run the script it returns ‘401’ over and over – almost as if it generates a 401 as each tweet comes in. I assume this is a 401 error, however it doesn’t actually say this, just 401.

The other scripts work fine, and I had the stream script working into Excel. Any idea what might be going wrong?

Thanks.
