Tweets in JMP: Duplicate tweet.id numbers that are not duplicate tweets

LNitz — Fri, 09 Jun 2023 00:42:32 GMT

I have produced a file of some 60,000 tweets mentioning both "Hawaii" and "Covid."

If I use the Row Selection command to identify duplicates by tweet.id, presumably a unique number, I get some 23,000 putative duplicates. This is a snippet:

What is clear is that the records grouped together under the same tweet.id are not the same records, judging by the author id and, most importantly, by the text. I stored all of the ID variables as text upon reading in with the jstor application.

Is it conceivable that the id numbers have been truncated? Tweet ids presumably are built in part from a timestamp, so that they are not likely to be consecutive numbers.

Re: Tweets in JMP: Duplicate tweet.id numbers that are not duplicate tweets

Craige_Hales — Sun, 28 Nov 2021 04:41:50 GMT

maybe this: https://developer.twitter.com/en/docs/twitter-ids

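If you keep the ID in a numeric variable, you will lose some of the 64-bit integer data, because a double has only 53 bits of significand precision. Integers above 2^53 (about 9.0e15) cannot all be represented exactly, so distinct IDs of that size can round to the same stored value.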
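Here is a rough Python sketch (not JSL) of that effect, in case it helps to see it. The ID values are made up for illustration and are not from your file, and the helper names `through_double` and `count_distinct` are hypothetical. It assumes the numeric column is an IEEE 754 double, where values near 1.5e18 are spaced 256 apart.

```python
# Hypothetical IDs for illustration only; not taken from the poster's file.
# BASE is a multiple of 256, so it is exactly representable as a double.
BASE = (1_500_000_000_000_000_000 // 256) * 256
ID_A = BASE + 1
ID_B = BASE + 2


def through_double(tweet_id: int) -> int:
    """Round-trip an integer ID through a 64-bit float."""
    return int(float(tweet_id))


def count_distinct(ids, as_text: bool) -> int:
    """Count distinct IDs when kept as text versus converted to doubles."""
    if as_text:
        return len({str(i) for i in ids})
    return len({through_double(i) for i in ids})


if __name__ == "__main__":
    ids = [ID_A, ID_B]
    print(count_distinct(ids, as_text=True))   # IDs compared as text
    print(count_distinct(ids, as_text=False))  # IDs compared after conversion to double
    # 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
    print(float(2**53) == float(2**53 + 1))
```

If the IDs passed through a numeric type at any stage before being stored as text (for example during export or import), the low digits may already have been rounded away, so it may be worth checking the raw source for the full-length IDs.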
