
Poll User Data daily #30

Open: tdraebing wants to merge 6 commits into Data4Democracy:master from tdraebing:b-poll_users

@tdraebing

Implementing a daily poll of Twitter user data as described in issue #16.

Running the Docker instance will now fetch user data once a day, in addition to indexing tweets on the specified topics.
The timing is achieved by extending the original crontab script.
The index_user_profiles.py script takes a file listing the IDs of the users whose data should be pulled, one ID per line. It uses tweepy's lookup_users() method to fetch the data from Twitter's REST API and hands it to Elasticsearch for indexing.
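
A minimal sketch of the input handling, assuming the ID file holds one numeric user ID per line (blank lines ignored) and assuming requests are split into batches of 100, since Twitter's users/lookup endpoint accepts at most 100 users per call. The helper names read_user_ids and batches are hypothetical and not taken from the PR; the tweepy and Elasticsearch calls are left out so the sketch runs with the standard library alone.

```python
from typing import Iterator, List


def read_user_ids(path: str) -> List[str]:
    """Read one user ID per line, stripping whitespace and skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs.

    The default of 100 matches the per-request cap of Twitter's users/lookup endpoint.
    """
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each batch would then be handed to tweepy's lookup_users(); the exact keyword argument for the ID list differs between tweepy versions, so check the documentation for the version pinned in the Docker image.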

The following features are extracted from the full set of user features provided by the Twitter API:

  • 'timestamp'
  • 'id'
  • 'name'
  • 'screen_name'
  • 'followers_count'
  • 'friends_count'
  • 'location'
  • 'description'
  • 'favourites_count'
  • 'statuses_count'
  • 'listed_count'
  • 'profile_background_image_url'
  • 'profile_image_url'
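
A sketch of how a raw user object could be reduced to the fields listed above. It assumes the user arrives as a plain dict (for example the raw JSON that tweepy keeps on a User object's _json attribute) and assumes 'timestamp' is the time of the poll rather than a Twitter field, stored here as an ISO-8601 UTC string; the PR does not say how the timestamp is produced. USER_FIELDS and extract_user_features are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Fields kept from the full Twitter user object ('timestamp' is added separately).
USER_FIELDS = (
    "id", "name", "screen_name", "followers_count", "friends_count",
    "location", "description", "favourites_count", "statuses_count",
    "listed_count", "profile_background_image_url", "profile_image_url",
)


def extract_user_features(user: Dict[str, Any],
                          polled_at: Optional[datetime] = None) -> Dict[str, Any]:
    """Return a flat document with the poll time plus the selected user fields.

    Fields missing from `user` are stored as None instead of raising KeyError;
    every other attribute of the user object is dropped.
    """
    polled_at = polled_at or datetime.now(timezone.utc)
    doc: Dict[str, Any] = {"timestamp": polled_at.isoformat()}
    for field in USER_FIELDS:
        doc[field] = user.get(field)
    return doc
```

Keeping the poll time in each document means the daily runs could accumulate one snapshot per user per day in Elasticsearch, so changes in counts such as followers_count can be tracked over time.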

If you have further suggestions or find any bugs, I would be happy to deal with those as well.

Cheers,
Thomas

@hadoopjax (Contributor)

Thanks @tdraebing! Let me pull this down and play with it over the next couple of days, but thanks a ton for the work! I'll get back with a review no later than Thursday.

hadoopjax self-requested a review on March 1, 2017, 01:48.