This place holds (ongoing/archived) project ideas.
Last update: 2020-10
A great amount of the information flowing around us is in the form of natural language (i.e., text). And many real-world scenarios are essentially information-matching problems: there are "suppliers" and "demanders" in a pool, and each side's interests can be described in text. To give some examples,
A university student who is seeking a private tutor on a specific subject; a graduate student who is proficient in his/her major and is willing to teach as a private tutor;
A job seeker who wants to find a job that best fits his/her experience, skill set, and salary expectations; an employer (company or hiring manager) who wants to find the talent that is the best fit.
In such scenarios, both sides can post information about their needs. However, in the real world, there is no direct bridge connecting the two sides; often an intermediary agent is involved to make the connection. If an NLP model/product can handle the information pool of both the "suppliers" and the "demanders", and provide optimized recommendations to each of them, it will be the bridge that brings efficiency and effectiveness. Indeed, this NLP model can be thought of as a matching system.
1. How can we make sure the NLP model provides relevant/optimized recommendations?
2. Related to question 1, we don't have ground-truth data in the beginning. How can the model provide recommendations from the very start (the cold-start problem)?
One idea: use the "attention" mechanism from a transformer-based language model.
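Before a transformer-based matcher is trained, the cold-start problem above can be handled with a simple similarity baseline: embed both sides' posts and rank suppliers by similarity to each demander. The sketch below uses TF-IDF vectors as a stand-in for transformer embeddings; all posts are invented placeholder data.

```python
# Cold-start matching baseline: rank "supplier" posts for each "demander" post
# by text similarity. TF-IDF cosine similarity stands in here for
# transformer-based embeddings; the posts are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

suppliers = [
    "Graduate student in mathematics, offering private tutoring in calculus and linear algebra",
    "Experienced software engineer available for Python and machine learning mentoring",
]
demanders = [
    "University student looking for a private tutor in linear algebra",
]

# Fit one vocabulary over both sides so the vectors are comparable.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(suppliers + demanders)
supplier_vecs = matrix[: len(suppliers)]
demander_vecs = matrix[len(suppliers):]

# Similarity of each demander post to each supplier post; higher = better match.
scores = cosine_similarity(demander_vecs, supplier_vecs)
best = scores[0].argmax()
print(f"Best supplier for demander 0: #{best} (score={scores[0, best]:.2f})")
```

Once user feedback (clicks, accepted matches) accumulates, it becomes the ground-truth signal for training a stronger, learned matcher.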
Last update: 2020-10
This is a project about the application of NLP techniques. The goal is to find interesting results in the field of Machine Learning (ML) and Artificial Intelligence (AI) by analyzing text from different sources. To be more specific, there are two big components, as follows.
Twitter data analyses (ML and AI in the public world)
Collect Twitter data over a period of time (it could also be streaming data), filtered to the topics of our interest; in this case, the hashtags #MachineLearning and #AI;
Use LDA (topic modeling) to analyze the tweets; extract the topics and their associated words;
Run sentiment analysis on the tweets about ML and AI;
Do further analyses to mine more useful and interesting results. For example:
What are the topics and associated words for tweets discussing ML and AI?
What are the sentiments (Negative/Neutral/Positive) of tweets discussing ML and AI?
How do the topics and sentiments change over time? Is there any pattern linking topics and sentiments? If so, what is it?
More...
Code repository: https://github.com/Chancylin/twitter_data_analysis
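For the sentiment step, an off-the-shelf analyzer such as NLTK's VADER is a natural fit. The snippet below is a deliberately tiny lexicon-based stand-in that only shows the idea of scoring tweets as Negative/Neutral/Positive; the word lists are illustrative, not a real lexicon.

```python
# Minimal lexicon-based sentiment scoring: a toy stand-in for a real
# analyzer (e.g., NLTK's VADER). The word lists below are illustrative only.
POSITIVE = {"great", "amazing", "love", "exciting", "useful"}
NEGATIVE = {"scary", "worried", "bad", "dangerous", "overhyped"}

def label_sentiment(text: str) -> str:
    """Classify a tweet as Positive/Negative/Neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(label_sentiment("AI is amazing and exciting"))   # Positive
print(label_sentiment("worried about dangerous AI"))   # Negative
```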
Text data analyses (ML and AI in academia)
Find the best/most popular papers from top conferences in ML and AI;
Use web scraping to get the abstracts for those papers;
Use LDA (topic modeling) to analyze the paper abstracts; extract the topics and their associated words;
Do further analyses to mine more useful and interesting results. For example:
What are the trends in research interests over the last decade?
More...
Code repository: https://github.com/Chancylin/text_topic_analysis
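The scraping step would typically fetch pages with `requests` and then parse out the abstracts. To keep this sketch self-contained (no network access), it parses an inline HTML snippet with the standard-library `html.parser`, and it assumes abstracts sit in `<div class="abstract">` elements, which is a hypothetical page layout; a real conference site will need its own selectors.

```python
# Extracting paper abstracts from an HTML page. In a real pipeline the HTML
# would come from requests.get(...); here an inline snippet stands in, and
# the <div class="abstract"> layout is a hypothetical assumption.
from html.parser import HTMLParser

class AbstractParser(HTMLParser):
    """Collects the text content of every <div class="abstract"> element."""

    def __init__(self):
        super().__init__()
        self.in_abstract = False
        self.abstracts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "abstract") in attrs:
            self.in_abstract = True
            self.abstracts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.in_abstract:
            self.in_abstract = False

    def handle_data(self, data):
        if self.in_abstract:
            self.abstracts[-1] += data.strip()

html_page = """
<html><body>
  <div class="abstract">We study topic trends in machine learning research.</div>
  <div class="abstract">A survey of attention mechanisms in NLP.</div>
</body></html>
"""

parser = AbstractParser()
parser.feed(html_page)
print(parser.abstracts)
```

The collected abstracts can then be fed straight into the same LDA pipeline used for the tweets.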
And of course, we can combine the NLP results from the public and academia and compare them. This will help answer some interesting and important questions, for example:
Is the public's understanding of ML and AI somehow synchronized with academic research? Or is there still a huge gap between the public and academia?
Has the flourishing of ML and AI research had a positive effect on the public? For example, are people more willing to welcome the fact that ML and AI have impacted their daily lives to an unprecedented degree?