Academia and the Enterprise – Steve Zimmerman

It is an honour to be asked by a highly respected contributor to the enterprise search community to share my journey from academia into the enterprise. Admittedly, it has been an unusual journey, so perhaps it's best to say a bit about where things stand at this moment before diving into the details.

Now

Currently, I am a Senior Data Scientist in the NLP team at a large multinational, and there has never been a more interesting time to work in search and NLP. This is a strong statement, given that my journey into search and NLP, which began 10 years ago, has always been fascinating. So what makes this journey even more fascinating now? It will probably come as no surprise that the latest generation of large language models (LLMs) is what has made the work even more interesting. A former colleague told me about ChatGPT on December 1st and said it would be as big as Google.

It's now just over 4 months later, and I tend to agree with my former colleague's assessment that the release of ChatGPT is indeed just as big as the release of Google (more specifically, the release of PageRank). The initial impact of ChatGPT is so large that South Park recently aired an entire episode about its powers and the related dangers (co-written with ChatGPT, no less), and I note there has yet to be an episode devoted to the release of Google.

In actuality, there is nothing especially new about ChatGPT; it builds upon an existing body of research in generative AI, so it isn't that revolutionary. Yes, there has been some buzz over the past few years around models like DALL-E, as well as around deep fakes. And in recent years, this flavour of model has greatly simplified the development of solutions to many difficult problems. But this is the first generative LLM to capture the attention of the masses and to let them interact with it easily.

Personally, it blew my mind, as it was the first AI-based interactive dialogue system that felt "real". Very quickly, though, my antennae began to wiggle a bit, as I found big holes in many of the legitimate-sounding responses it gave. Of course, those in the business of AI and NLP refer to these legitimate-sounding holes as "hallucination".

Most of us, I assume, were taught at quite a young age about the inherent risks of hallucination, whether in ourselves or in those around us. So one might ask: should we place so much faith in a capability whose own designers caution that it will "hallucinate" from time to time? I leave that to you to decide.

For me, this question ties directly back to my academic research, which was centred in the relatively young field of interactive information retrieval (IIR) and focused on mitigating harms on the Web. In my view, because of this latest technology, there has never been a greater potential for harm and, paradoxically, never a greater potential for benefit. As a result, there has never been a more important time for IIR to play a role in developing methods and evaluation approaches for the safe use of this capability. Yes, I might be biased given my research background in IIR, but it is that background that explains a huge part of the excitement I currently feel. ChatGPT opens the door to many new research avenues; the research possibilities on the Web and in the enterprise are not only massive, but highly important.

Before Now

Perhaps of interest to some, I come from a family background of computer scientists who worked at some large players in the tech industry. Admittedly, I was reluctant to go down this path, as my parents worked insane hours well into my adulthood. I thought they were mad! So I stayed away from this area for a while. And yet here I find myself today, working insane (but not quite as insane) hours in computing.

My first dive into computing was in the post-9/11 era, when I finished my undergraduate degree and found jobs in short supply. While working various menial jobs as a contractor, I took a few computing courses at Northeastern in Boston, and in a very short time found myself working full-time as a programmer at a large financial company.

After 5 years in technology, I took a pause to explore the possibility of graduate studies in atmospheric physics through coursework at Cornell. After a couple of years building up the fundamentals of atmospheric science, I found myself much more interested in the computing aspects and much less interested in deriving the fluid dynamics of the atmosphere (though that is still very interesting too). Though the rigour and demands of this university developed my ability to solve difficult problems independently (a necessary skill for a PhD), I no longer felt excited about an academic career in atmospheric sciences.

It was around this time in 2013 that I first heard about NLP and the emerging career of data science (via a well-known article on the topic), and it immediately sparked a flame in me. Lo and behold, a well-timed life event led me to relocate to England, with a simultaneous opportunity to join a newly created MSc programme focused on NLP and search. I remember telling my classmates in the atmospheric sciences lab about my plans, to which one said "It sounds like you are going to work for SkyNet".

Well, I don't work for SkyNet (yet), but the main takeaway here is that the events of early 2013 set off the chain of events that led to this article.

And now I have a PhD in the field of IIR, so what happened?

Sure, maybe I have some aptitude for work and research in modern computing, but I also admit that a lot of it comes down to good timing and lucky connections. For instance, my NLP course was taught by Udo Kruschwitz, and his enthusiastic presentation of topics in NLP and IR was infectious, which made the area even more interesting. He encouraged our class to attend the London Text Analytics Meetup, which he co-ran with other well-known folks in NLP/IR such as Tony Russell-Rose. And it was at these events that I connected with many different companies that were building interesting products and were all hiring!

It was through the Text Analytics Meetup that I connected with Miguel Martinez, and this connection led to my first job in NLP, as an intern between the first and second years of my MSc. What a fun time! The internship was at a small startup in a garage in Belsize Park during its seed funding round; it has since grown into a much larger company called Signal AI. After completing my MSc, I found full-time work in the data science team of a large newspaper, developing document classification pipelines and prototype recommender engines.

Here again, timing played an important role. Udo Kruschwitz contacted me about an ESRC-funded research grant on human rights in the digital age, and the timing was perfect given my concerns about the world. In particular, I was very concerned about unmitigated online misinformation campaigns on various topics. For example, the debate surrounding the case for Brexit featured many false claims that spread like wildfire. It would be dishonest to say that June 23rd 2016 (the day of the EU referendum) did not help me sharpen my PhD application's focus on harm mitigation on the Web.

Initially, my research focused on hate speech mitigation, but I very quickly realised the research problem was potentially intractable due to free-speech concerns. Nonetheless, my initial research was published at LREC, which got me off on a good footing. A key lesson from this experience was that my ideological belief in AI solving the world's ugly problems would not be realised without taking human psychology into account.

Around the time I submitted my paper to LREC for review, two things happened. First, my application to the Autumn School for Information Retrieval and Foraging (ASIRF) at Dagstuhl was accepted. Second, a fellow PhD student in the Psychology department, researching judgement and decision-making in medicine, lent me his copy of Daniel Kahneman's "Thinking, Fast and Slow". Attending ASIRF introduced me to many great researchers, most notably David Elsweiler, who lectured on the fundamentals of IIR studies. The book and the autumn school prompted a rapid revision of my PhD research plan to account for the human in the system. This shift in research led to co-authored papers with David Elsweiler and the aforementioned PhD student (Alistair Thorpe).

Alongside my PhD research, my PhD advisor (who has many industry links) encouraged me to explore avenues in the private sector. He connected me with an enterprise search expert at a large energy company based in London, which led to an internship during my PhD. This internship transitioned into my current full-time role as a search and NLP researcher in the private sector. At the moment, my research is predominantly in the private sector and heavily focused on enterprise search. Applications of NLP and search have interested me since the first day I set foot in the field, and I find them more interesting now than at any point in my career.

I close with some key lessons from my experience.

For those considering an advanced degree in Search/NLP

Consider an interdisciplinary approach to your research. Though my research was at its core in computer science, it drew on work from a broad set of fields. In the modern age, my view is that we cannot afford to take a narrow view of the problems we face.
PhDs are a huge commitment, and I strongly recommend against self-funding.
Ideology is a great motivator for research, but be prepared to let it go. My experiences with hate speech research taught me a lot about this matter.

For those in a PhD (or recently signed up for one)

Get your hands dirty early in your PhD. Build some experiments and try to publish your findings as soon as you can.
Apply for a doctoral consortium. SIGIR kindly accepted my application and also covered my travel expenses to and from the event. This is an experience that you should not miss!
Apply for and attend "summer" schools. In addition to ASIRF, I attended the summer school for Bounded Rationality at the Max Planck Institute for Human Development. Both of these experiences provided strong foundational knowledge for my PhD, as well as connections to several co-authors.
Take a pause and do an internship or placement at a company. It's important to get a feel for whether you want to be in academia, the private sector, or a bit of both.

Academia and Industry – It's a spectrum, so find what's right for you after your PhD. Some considerations and possibilities:

Evaluation is much more straightforward in academia than in the private sector. Academia is contained and offers great experimental control, whereas industry has many moving parts and many people to work with.
Pure industry will pay a lot more, but pure academia will give you a lot more freedom (though that freedom has eroded greatly in recent years).
Just as in academia, industry offers the opportunity to investigate interesting research problems in search and NLP. However, in industry the problem is typically business-driven, and thus much easier to define.
Some private-sector companies offer research positions that allocate some time for academic work outside the company.
It is quite common for folks with full-time academic appointments to do side research in the private sector.
You can work in the private sector and still keep an academic affiliation to continue research you want to pursue on the side.
If you really feel the pull of a full-time academic appointment, talk to people in those roles and understand fully what is involved; it is much more than research. You will also be responsible for creating course syllabuses, preparing teaching slides, marking assignments, and doing administrative work, which is very different from a PhD or post-doc.

NOTE: A ChatGPT-cleansed version of this article can be found here.
