[Tutorial] How to scrape and analyze social media data

I will now search for the ‘Trade War’ issue on Twitter. If you click Latest, tweets about the ‘Trade War’ appear in order from most recent. Now I will collect and analyze the Twitter data that was just shown.

You can run the Social Media Collectors from Extension. After running the Twitter Collector, you can start collecting data. Once you log in to your account, you're all set. Simply enter the collection criteria, the keywords to collect, and the maximum number of tweets. The Twitter Collector can collect up to 100 tweets in 1 to 2 seconds. The collection of 2,000 tweets is now finished. If you double-click the collected dataset, you can view the collected data, including the time each tweet was written, its hashtags, its number of retweets, and the original text.

Using the same method, you can collect data from other social media. As of July 2019, you can collect data from Facebook, Twitter, YouTube, and Instagram.

Now I will collect data from Facebook, where you can collect data from fan pages. Click the Log in button to log in to your Facebook account; this step is needed to get authorization to collect data from Facebook. Once you have logged in, the area for entering the collection conditions is activated. First, enter the name of the page. The name of a Facebook page can be found under the profile picture, next to the @ sign. Here I will collect data from the Disney page. You can select the time period from which the data will be collected: either enter a specific period, or keep the default, which is the most recent 30 days. Next, enter the number of posts and comments to collect. You can also collect posts made by visitors to the page. Click Start to begin the collection. Once the collection is finished, a new dataset is added to the left-side panel.
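To make the collection-period option concrete, here is a minimal sketch of the default "most recent 30 days" window, applied to posts that have already been retrieved. The `Post` record, its fields, and the helpers `collection_window` and `posts_in_window` are hypothetical illustrations; they are not the Facebook Collector's internals or NetMiner's data format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Post:
    # Hypothetical record; field names are illustrative, not NetMiner's schema.
    post_id: str
    created: datetime
    text: str


def collection_window(now: datetime, days: int = 30) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the most recent `days` days up to `now`."""
    return now - timedelta(days=days), now


def posts_in_window(posts: List[Post], start: datetime, end: datetime,
                    max_posts: int) -> List[Post]:
    """Keep posts created within [start, end], newest first, capped at max_posts."""
    selected = [p for p in posts if start <= p.created <= end]
    selected.sort(key=lambda p: p.created, reverse=True)
    return selected[:max_posts]
```

For example, with `now = datetime(2019, 7, 31)`, `collection_window(now)` returns a window starting on 1 July 2019, and `posts_in_window(...)` keeps only the newest `max_posts` posts inside it, loosely mirroring the maximum-count setting.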
You can check details such as the amount of data in the dataset, the collection time, whether the text data has been processed, and whether you are an admin of the collected page. If you double-click the dataset, you can view the collected data, such as the content of each post, the date it was created, its writer, likes, and comments. If you click the Comment tab, you can view the information collected about the comments: the content of each comment, the post it was made on, and the time it was written.

Now I will collect data from YouTube. As before, all you have to do to start collecting data is log in to your Google account. Enter a search word to collect data. The YouTube Collector uses this word to look for videos that contain it in the video title, description, or uploader's name. I will now collect videos about BTS. Enter the collection period and the number of videos to collect, then select a criterion for sorting the collected data, such as relevance or view count. The collection is now complete. If you double-click the collected dataset, you can view information about the uploaded videos and their comments. For each video, you can check its title, upload date, uploader, views, number of likes, number of comments, and description. For each comment, you can check when it was made, its author, its number of likes, the title of the video it was made on, and its content.

Lastly, I will collect data from Instagram. Using the Instagram Collector, you can collect data by hashtag, username, or location. I will now collect data from the hashtag #blacklivesmatter. Enter the search keyword and click Related Terms to select recommended words to search for. The collection is complete.
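Both the Twitter and Instagram datasets record which hashtags each post uses. If you want to tally hashtags from raw post text yourself, a minimal sketch might look like the following. It assumes the post texts are plain Python strings and treats a hashtag as `#` followed by word characters; that pattern and the helper names are illustrative approximations, not the rule the collectors use.

```python
import re
from collections import Counter

# Approximate hashtag pattern: '#' followed by letters, digits or underscores
# (\w is Unicode-aware for str patterns in Python 3).
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in one post, lowercased so case variants merge."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(texts):
    """Count hashtag occurrences across many posts."""
    counts = Counter()
    for text in texts:
        counts.update(extract_hashtags(text))
    return counts
```

Lowercasing is a design choice: it merges #BlackLivesMatter and #blacklivesmatter into one tag, which is usually what you want when counting, but it does discard the original capitalization.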
If you double-click the dataset, you can view information about the posts and comments: the post ID, creation date, uploader, content, hashtags used, number of likes, and number of comments. If you double-click a collected post, a pop-up shows the picture that was uploaded with it. If you click the Comment tab, you can view the author and content of each comment.

Now I will go back to the Twitter data that I collected first. If you select the collected dataset and click Text Process (Preprocess), you can process the collected data. In other words, you can extract the words used in the text by tagging their parts of speech and calculating each word's frequency and TF-IDF. You can configure the processing by selecting options such as the language of the text and which parts of speech to keep, and you can define specific words with the user dictionary. For more detailed information on text processing, see the second video of the NetMiner Tutorials. Once text processing is finished, Text Proc shows ‘Yes’. If you click the processed dataset, you can see that a Word tab has been added. In the Word tab, you can check the words used in the tweets, the frequency of each word, and the part of speech of each word.

If you click Import into NetMiner, you can import the collected data into NetMiner, where you can analyze the user network or run text mining analysis. I have now brought the collected data into NetMiner. In the bottom-left corner, new Workfiles are created: one for user-based data and one for text-based data. The user-based Workfile contains information about the users and the network between them, so you can easily check the ID, name, and number of followers of each Twitter user.

Now I will find the topics in the tweets about the Trade War. The plug-in that I'm currently running automatically analyzes the topics and visualizes the result.
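The Text Process step described above calculates each word's frequency and TF-IDF. NetMiner's exact weighting isn't specified here, so the following is a minimal sketch of one common variant, tf(t, d) × log(N / df(t)), where tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. It assumes each tweet has already been tokenized and filtered by part of speech; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for tokenized documents.

    docs: list of token lists (one per tweet).
    Returns one {term: weight} dict per document, using
    raw count * log(N / df). Other TF-IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights
```

A word that appears in every tweet (often the case for the search keyword itself) gets a weight of 0, while words concentrated in a few tweets score higher. This is why TF-IDF is often preferred over raw frequency for spotting distinctive terms.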
You can download this plug-in for free from the NetMiner homepage, and you can find more information about it in the description of this video. Here I'm using a topic modeling method called LDA (Latent Dirichlet Allocation) to extract the topics. The plug-in also creates a word cloud to show which words are used most frequently. Once the plug-in finishes, the word groups appear. Each blue dot represents a topic, and the keywords connected to it are used to interpret that topic.

Topic-2 is connected to the words Napa, wine, and tax, which indicates that it concerns the Trade War's effect on wine exports from Napa Valley, California. Topic-5 is connected to the words Donald Trump, hat, and flag, which indicates that it is about Donald Trump's supporters. Topic-4 is connected to the words Trump, acre, and farmer, which indicates that it is about news of support for farmers affected by the Trade War. Topic-3 is connected to the words South Korea and Japan, from which we can tell that it is about the trade conflict between Korea and Japan. Lastly, Topic-1 is connected to India and benefit, which indicates that it is about news that India is seeking to profit from the US Trade War.
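The plug-in's internal implementation isn't described in this video. To show roughly what LDA does, here is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under stated assumptions: the input is already tokenized (for example, the words from the Word tab), and the hyperparameters `alpha`, `beta`, and `n_iter`, as well as the function names `lda_gibbs` and `top_words`, are illustrative choices, not the plug-in's settings or API. The plug-in may use a different inference method. In this sketch, a topic's "keywords" are its most probable words, which is one plausible reading of the words connected to each blue dot.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, topic_word, doc_topic), where topic_word[k][w] estimates
    P(word w | topic k) and doc_topic[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    n_dk = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    n_k = [0] * K                        # tokens assigned to topic k overall
    z = []                               # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[word]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = index[word], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
                  for k in range(K)]
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n most probable words of each topic (its 'keywords')."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in topic_word]
```

For example, `vocab, tw, dt = lda_gibbs(docs, n_topics=5)` followed by `top_words(vocab, tw)` lists three candidate keywords per topic. Because Gibbs sampling is stochastic, topic numbering and exact keywords change with the seed, so a label like Topic-2 carries no meaning on its own; the interpretation comes from reading the keywords.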
