An interview with William Wang – KSJ Award Winner 2022

William Wang - Winner of the Karen Spärck Jones Award 2022
A very worthy winner: William Wang answering questions following his Karen Spärck Jones Award keynote talk

I asked William Wang, who at ECIR 2023 was presented with the Karen Spärck Jones Award for 2022, if he could respond to a series of questions about his background and career. I am most grateful to William for the care he put into his replies.

What were your aspirations at high school?

I have been interested in computers since my father bought me an Intel-586 desktop in elementary school. During my junior high and high school years, I was passionate about writing HTML, PHP, and ASP to build websites that provided knowledge for online games. Creating online resources for gamers was a great way to share knowledge and expertise, and it helped fellow gamers improve their skills and enjoy the game even more, though it required a lot of hard work and dedication. Since then, I have been interested in building better technology to provide people with better access to knowledge.

What led you towards your choice of university/course?

My passion for computing led me to choose Computer Science as a major in college. In my third year, I had an opportunity to intern at the Chinese Academy of Sciences, which was the beginning of my research career. After finishing my undergrad, I became a Master’s student working with Kathy McKeown and Julia Hirschberg, and at Columbia, I learned the basics of Natural Language Processing. After Columbia, I moved to Carnegie Mellon University to pursue my Doctoral degree with William Cohen. It was my passion for AI research that took me to Carnegie Mellon.

When did you first become aware of the contribution that Karen Spärck Jones had made to IR?

It started in my MS course with Kathy at Columbia, where I needed to implement the TF-IDF algorithm for a question answering project. I was amazed that such a simple and elegant approach could be used to rank documents, and that it generalizes well to unseen domains because no training is needed. It is inspiring to read Karen’s work and to see her long-lasting impact on IR and NLP: to this day, TF-IDF remains an industry standard for IR.
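For readers unfamiliar with the method, a minimal sketch of TF-IDF document ranking (purely illustrative, not the implementation from that course project) looks something like this: terms that are frequent in a document but rare across the collection receive the highest weight, and no training data is involved.

```python
# Minimal TF-IDF ranking sketch (illustrative only).
# Scores each document against a query by summing TF-IDF weights
# of the query terms -- no training step is required.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank(query, documents):
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    # classic inverse document frequency: log(N / df)
    idf = {term: math.log(n / count) for term, count in df.items()}
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum((tf[t] / len(doc)) * idf.get(t, 0.0)
                    for t in tokenize(query))
        scores.append((score, i))
    return sorted(scores, reverse=True)

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "information retrieval ranks documents"]
print(rank("cat on mat", docs))  # highest score for the first document
```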

Were there any light-bulb moments during your time at university which helped you define the road you wanted to take?

I decided to pursue research in NLP because the combination of Information Retrieval (IR) and Natural Language Processing (NLP) has tremendous potential to revolutionize AI research. With IR, machines can effectively sift through vast amounts of data to find relevant information. At the same time, NLP allows machines to understand and interpret human language, enabling more nuanced and sophisticated user interactions. By leveraging the strengths of both fields, we can create powerful AI applications that automate complex tasks, provide personalized recommendations, and even mimic human-like conversations. As we continue to develop and refine these technologies, the future of AI research undoubtedly lies in the integration of IR and NLP.

Were there academics whom you found especially inspiring?

Kathy McKeown and Lise Getoor are inspiring academics because of their groundbreaking research and contributions to their respective fields. McKeown, a professor of Computer Science at Columbia University, is a pioneer in natural language processing and has made significant contributions to the development of summarization and natural language generation. Getoor, a professor of Computer Science at the University of California, Santa Cruz, is a leading expert in machine learning and statistical relational learning, particularly in probabilistic reasoning. Both have received numerous awards and honors and have mentored many Ph.D. students throughout their careers, cementing their status as influential figures in academia. Through my personal interactions with them, I found that they inspire and shape the future of their fields not only through their work but also through their generous and dedicated mentoring of students and junior researchers.

You seem to have a significant interest in teaching – what was the catalyst for that and what do you get out of teaching?

Teaching is an effective way to enhance one’s learning experience. When I teach a subject, I am forced to break it down into smaller, more manageable parts, and then explain it in a clear and concise manner. This process allows me to identify gaps in my understanding and deepen my knowledge of the subject matter. For example, my Ph.D. thesis was not in neural networks, but I learned everything about deep learning by preparing my teaching materials at UCSB.  Teaching others exposes us to different perspectives and questions, which can challenge our assumptions and expand our thinking. Overall, teaching can be a valuable tool for self-improvement and personal growth.

How do you cope with juggling research, teaching and the management of the Centre? 

Identifying priorities is my secret to effective time management. Without knowing the most important and urgent tasks, I would waste time on less significant activities, leading to stress, burnout, and missed deadlines. By prioritizing tasks, I can focus my energy and time on completing the most crucial ones, ensuring that important goals are met on time. It also helps me avoid multitasking, which decreases productivity and increases errors. I always find that identifying priorities helps me make informed decisions, allocate resources, and achieve my objectives efficiently, leading to better outcomes for my team.

What are some of your objectives (both research and personal) over the next few years?

The field of large language models (LLMs) has seen tremendous growth in recent years, with models like GPT-3 exhibiting impressive language generation capabilities. This is a unique opportunity for AI and Computing research.  I am interested in improving model interpretability, enhancing their ability to understand and use external knowledge, and addressing issues of factuality. Additionally, exploring ways to integrate multimodal information like images and videos into LLMs could be a promising direction.

Finally, I also see that addressing the issues of privacy, security, and the impact of LLM-generated content on society would be critical.

Personally, I hope my students can all be successful in this time of economic recession.
