Next Steps
Reflection
Text analysis serves as a powerful tool for understanding market trends. Throughout April, a noticeable correlation emerged between the headlines on CNBC and the prevailing market trends. We noticed that certain terms coincided with upward and downward spikes in the market graph, and we studied the relationship between them to produce the data presented on our site.
Things to Improve:
Content
To gain better insights, we should collect more data from a wider range of sources. This could be done by collecting headlines from other financial news sites like MarketWatch, The Motley Fool UK, and Yahoo Finance, to name a few.
WebScraper
Our webscraper assigned questionable sentiment to a couple of words. In the future, we would double-check the sentiment of registered words by hand.
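One way to make that hand check repeatable is to keep manual corrections in a separate table that takes priority over the automatic scores. The sketch below is only an illustration: the words, the scores, and the names `AUTO_SCORES`, `MANUAL_OVERRIDES`, `word_sentiment`, and `headline_sentiment` are all hypothetical and are not taken from our actual scraper.

```python
# Hypothetical automatic scores; note that "bull" is scored as neutral here.
AUTO_SCORES = {"surge": 1.0, "plunge": -1.0, "bull": 0.0, "crash": -1.0}

# Hand-checked corrections (assumed values) that override the automatic scores.
MANUAL_OVERRIDES = {"bull": 1.0}


def word_sentiment(word):
    """Return the hand-checked score if one exists, else the automatic score."""
    key = word.strip(".,!?\"'").lower()
    if key in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[key]
    return AUTO_SCORES.get(key, 0.0)  # unknown words count as neutral


def headline_sentiment(headline):
    """Average the word scores across a whitespace-split headline."""
    words = headline.split()
    if not words:
        return 0.0
    return sum(word_sentiment(w) for w in words) / len(words)


score = headline_sentiment("Bull market surge")
print(score)
```

Keeping the overrides in their own table means the hand-checked words stay visible and easy to review, instead of being mixed into the automatic results.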
Formatting
Our file formats got in our way for a bit in the beginning, because our webscraper output the collected data as a JSON file. In order to use our data, we had to convert our corpus to XML using a converter plugin. This gave us more flexibility in processing our collected text.
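We did the conversion with a plugin, but the same step can be sketched in a few lines of Python using only the standard library. The JSON layout below (a list of records with `date` and `text` fields), the element names `corpus` and `headline`, and the sample headline are assumptions made for illustration, not our real data.

```python
import json
import xml.etree.ElementTree as ET


def headlines_json_to_xml(json_text):
    """Convert a JSON list of headline records into one XML corpus string."""
    records = json.loads(json_text)
    root = ET.Element("corpus")  # assumed root element name
    for record in records:
        # Each record becomes a <headline> element with the date as an attribute.
        item = ET.SubElement(root, "headline", attrib={"date": record["date"]})
        item.text = record["text"]
    return ET.tostring(root, encoding="unicode")


# Made-up sample record, for illustration only.
sample = '[{"date": "April 1", "text": "Stocks rally as inflation cools"}]'
xml_output = headlines_json_to_xml(sample)
print(xml_output)
```

Doing the conversion in a script rather than a plugin would also let the scraper write XML directly, so the format change would not have to be a separate manual step.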
Sentiment Scope
We could also develop the project further by training our natural language processing (NLP) models on a more comprehensive dataset. Doing so would improve the accuracy and depth of our text analysis and help us better understand and predict market trends. The need for this first became apparent when our first grouping of most frequent words listed "said" as the #1 most used term, a word that tells us little about market sentiment.
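A simpler, complementary fix for the "said" problem is to filter out common filler words before counting frequencies. The sketch below assumes a small hand-written stop word set and made-up headlines. The names `STOP_WORDS` and `top_terms` are hypothetical, and a real project might use a fuller stop word list from an NLP library instead.

```python
import re
from collections import Counter

# Small hand-written stop word set (an assumption, not a complete list).
STOP_WORDS = {"said", "the", "a", "an", "to", "of", "in", "on",
              "as", "and", "for", "after"}


def top_terms(headlines, n=3):
    """Count the words in all headlines, skipping stop words such as 'said'."""
    words = []
    for headline in headlines:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words.extend(re.findall(r"[a-z']+", headline.lower()))
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up headlines, for illustration only.
sample_headlines = [
    "Fed said rates to hold as inflation cools",
    "Analysts said inflation data lifted stocks",
    "Stocks slide after Fed said cuts may wait",
]
top = top_terms(sample_headlines)
print(top)
```

In this sample, "said" appears in every headline, so without the filter it would be the most frequent word. With the filter, the top terms are words that actually relate to the market.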
What's Next?
If we continue our work on this project, we would be interested in adding other finance sites to analyze. The sites mentioned above would be a good place to start. This would let us compare terms across sites and rate the differences in their sentiment toward key stocks mentioned on specific dates.