Tweets in JMP: Duplicate tweet.id numbers that are not duplicate tweets
I have produced a file of some 60,000 tweets mentioning both "Hawaii" and "Covid."
If I use the Row Selection command to identify duplicates by Tweet.id, a presumably unique number, I get some 23,000 putative duplicates. This is a snippet:
What is clear is that the records grouped together under the same tweets.id are not the same records, judging by the author id, and most importantly, by the text. I stored all of the ID variables as text upon reading in with the jstor application.
Is it conceivable that the id numbers have been truncated? Tweet ids presumably are built in part from a timestamp, so that they are not likely to be consecutive numbers.
Accepted Solutions
Re: Tweets in JMP: Duplicate tweet.id numbers that are not duplicate tweets
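Maybe this: https://developer.twitter.com/en/docs/twitter-ids
If you keep the ID in a numeric variable, you will lose part of the 64-bit integer, because a double has only about 53 bits of precision. Integers above 2^53 are rounded to the nearest representable value, so distinct IDs can end up stored as the same number.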
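To illustrate the rounding described above, here is a minimal Python sketch (not JSL). The two IDs are made-up values of roughly the right size, not real tweets, and the helper name `collapses_as_double` is hypothetical. It shows two different 64-bit integers becoming equal once converted to a double, while their text forms stay distinct.

```python
# Two hypothetical 64-bit IDs (invented for illustration, not real tweets).
id_a = 1300000000000000001
id_b = 1300000000000000002


def collapses_as_double(a: int, b: int) -> bool:
    """Return True if two different integers map to the same double."""
    return a != b and float(a) == float(b)


# As exact integers the IDs differ; as doubles they are rounded to the same value.
print(id_a == id_b)                     # False
print(collapses_as_double(id_a, id_b))  # True
print(int(float(id_a)))                 # 1300000000000000000

# The first integer a double cannot hold exactly is 2**53 + 1.
print(float(2**53 + 1) == float(2**53))  # True

# Kept as text, the IDs remain distinct.
print(str(id_a) == str(id_b))           # False
```

If duplicates like these show up, it may be worth checking whether the ID passed through a numeric column at any step before being stored as text, since the digits lost at that point cannot be recovered afterwards.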