<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Tweets in JMP:  Duplicate tweet.id numbers that are not duplicate tweets in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Tweets-in-JMP-Duplicate-tweet-id-numbers-that-are-not-duplicate/m-p/440237#M68862</link>
    <description>&lt;P&gt;maybe this: &lt;A href="https://developer.twitter.com/en/docs/twitter-ids" target="_self"&gt;https://developer.twitter.com/en/docs/twitter-ids&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
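&lt;P&gt;If you keep the ID in a numeric variable, you will lose part of the 64-bit integer, because a double has only 53 bits of significand precision. Integers above 2^53 (about 9.0e15) cannot all be represented exactly, so distinct IDs that differ only in their low-order digits can round to the same value and look like duplicates.&lt;/P&gt;</description>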
    <pubDate>Sun, 28 Nov 2021 04:41:50 GMT</pubDate>
    <dc:creator>Craige_Hales</dc:creator>
    <dc:date>2021-11-28T04:41:50Z</dc:date>
    <item>
      <title>Tweets in JMP:  Duplicate tweet.id numbers that are not duplicate tweets</title>
      <link>https://community.jmp.com/t5/Discussions/Tweets-in-JMP-Duplicate-tweet-id-numbers-that-are-not-duplicate/m-p/440184#M68859</link>
      <description>&lt;P&gt;I have produced a file of some 60,000 tweets mentioning both "Hawaii" and "Covid".&lt;/P&gt;
&lt;P&gt;If I use the Row Selection command to identify duplicates by Tweet.id, a presumably unique number, I get some 23,000 putative duplicates. This is a snippet:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="LNitz_0-1638066135279.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/37930i3BD221F24793FA40/image-size/medium?v=v2&amp;amp;px=400" role="button" title="LNitz_0-1638066135279.png" alt="LNitz_0-1638066135279.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;What is clear is that the records grouped together under the same Tweet.id are not the same records, judging by the author id and, most importantly, by the text. I stored all of the ID variables as text upon reading in with the jstor application.&lt;/P&gt;
&lt;P&gt;Is it conceivable that the id numbers have been truncated? Tweet ids are presumably built in part from a timestamp, so they are unlikely to be consecutive numbers.&lt;/P&gt;
</description>
      <pubDate>Fri, 09 Jun 2023 00:42:32 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Tweets-in-JMP-Duplicate-tweet-id-numbers-that-are-not-duplicate/m-p/440184#M68859</guid>
      <dc:creator>LNitz</dc:creator>
      <dc:date>2023-06-09T00:42:32Z</dc:date>
    </item>
    <item>
      <title>Re: Tweets in JMP:  Duplicate tweet.id numbers that are not duplicate tweets</title>
      <link>https://community.jmp.com/t5/Discussions/Tweets-in-JMP-Duplicate-tweet-id-numbers-that-are-not-duplicate/m-p/440237#M68862</link>
      <description>&lt;P&gt;maybe this: &lt;A href="https://developer.twitter.com/en/docs/twitter-ids" target="_self"&gt;https://developer.twitter.com/en/docs/twitter-ids&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
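&lt;P&gt;As a rough illustration of how two different 64-bit IDs can end up looking identical once they are read as numbers, here is a minimal Python sketch. The two IDs are made up for the example (they are not from the tweet file in this thread), and Python stands in for whatever tool does the numeric conversion. The variable names are hypothetical.&lt;/P&gt;

```python
# Hypothetical 19-digit IDs, invented for illustration (not from the thread's data).
id_a = 1464812345678901234
id_b = 1464812345678901235

# A double carries a 53-bit significand, so integers above 2**53
# are not all exactly representable.
limit = 2 ** 53

# Converting to a double rounds each ID to the nearest representable value.
as_float_a = float(id_a)
as_float_b = float(id_b)
collide_as_numbers = (as_float_a == as_float_b)

# Kept as text, the two IDs remain distinct.
distinct_as_text = (str(id_a) != str(id_b))

print(limit)
print(collide_as_numbers)
print(distinct_as_text)
```

&lt;P&gt;If the IDs pass through a numeric type at any step before being saved as text, the damage may already be done, so it can help to check that every step of the import treats them as strings.&lt;/P&gt;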
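&lt;P&gt;If you keep the ID in a numeric variable, you will lose part of the 64-bit integer, because a double has only 53 bits of significand precision. Integers above 2^53 (about 9.0e15) cannot all be represented exactly, so distinct IDs that differ only in their low-order digits can round to the same value and look like duplicates.&lt;/P&gt;</description>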
      <pubDate>Sun, 28 Nov 2021 04:41:50 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Tweets-in-JMP-Duplicate-tweet-id-numbers-that-are-not-duplicate/m-p/440237#M68862</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2021-11-28T04:41:50Z</dc:date>
    </item>
  </channel>
</rss>

