Research: The Art of Throwing Cakes at The Wall
I am working on the preliminary research for an NSF proposal. Sounds fancy enough! In reality, this is what most research looks like at the beginning: I am grinding line by line through a 17,000-line Excel file, trying to reconcile it with a different 4,000-line Excel file. The reason is simple. People enter institution names however they like. For example, my alma mater appears under the following names: University of Arkansas, University of Arkansas Main Campus, University of Arkansas at Fayetteville, and “University of Arkansas, Fayetteville”. I have seen them all used within a single federal dataset. To combine data from standard educational sources (IPEDS or Carnegie Classifications), we have to match these naming variants.
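For variants that are already known, a deterministic alias lookup could be sketched as follows. This is only a minimal illustration, not the manual process described in this essay: the alias table, the choice of "University of Arkansas" as the canonical name, and the function names are all assumptions made for the example. Anything the table does not recognize comes back as None, so it still gets reviewed by hand.

```python
import re

# Hypothetical alias table: one canonical name mapped to the variants
# listed above. The choice of canonical name is an assumption.
ALIASES = {
    "University of Arkansas": [
        "University of Arkansas",
        "University of Arkansas Main Campus",
        "University of Arkansas at Fayetteville",
        "University of Arkansas, Fayetteville",
    ],
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return re.sub(r"\s+", " ", name).strip()


# Reverse lookup from each normalized variant to its canonical name.
LOOKUP = {
    normalize(variant): canonical
    for canonical, variants in ALIASES.items()
    for variant in variants
}


def match_institution(raw: str):
    """Return the canonical name, or None so the row can be checked manually."""
    return LOOKUP.get(normalize(raw))
```

Because the lookup is an exact match on the normalized string, the same input always gives the same answer, and every match can be traced back to one row of the alias table.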
Did I try AI? Yes, I did. The results were inconsistent and hard to validate. In this case, I think the manual process might take longer, but in a research context it is certainly safer and more deterministic than using AI for data merging. Anyway, I was about 6,000 lines into my source data when I realized that there is nothing fancy about doing research at all. Still, there is a reason why my students are hesitant to try research work. They perceive research as something mystical, a high-level cognitive activity that requires significant brain power. In this essay, I want to say that this is not the case: research is mainly about grunt work, perseverance, and the patience to throw cakes at the wall until some stick.
Cakes, so many cakes
Modern media tends to glamorize research work by focusing mainly on the end, when the research results come to fruition. No one talks about the sheer amount of initial grunt work needed to get to that point, though. I remember that in one of the research projects I did during the last year of my PhD, we had to collect yearly publication data from the top 200 national universities in the U.S. On average, these universities produce approximately 60,000 to 70,000 publications per year, which are indexed in Web of Science (WoS). Without a special (and expensive) license to mine the data, you have to download these publications manually. Since WoS allows 500 entries to be saved at a time, this is what I did for three weeks:
- Search for all publications from a university using its unique name
- Create a directory with the university’s name
- Download results 1-500 and save to a file named 1-500.txt
- Download results 501-1000 and save to a file named 501-1000.txt
- …
- Rinse and repeat
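The file-naming pattern in the steps above is simple enough to sketch in a few lines. The snippet below is only an illustration under stated assumptions: it does not download anything from WoS, the function names are hypothetical, and the total result count has to be supplied by whoever runs the search. It generates the expected batch file names and reports which ones are still missing from a university's directory, which is a handy sanity check after hours of clicking.

```python
from pathlib import Path

BATCH_SIZE = 500  # number of entries that can be saved per download, as noted above


def batch_filenames(total_results: int, batch_size: int = BATCH_SIZE) -> list[str]:
    """Return file names such as '1-500.txt' covering results 1..total_results."""
    names = []
    for start in range(1, total_results + 1, batch_size):
        end = min(start + batch_size - 1, total_results)
        names.append(f"{start}-{end}.txt")
    return names


def missing_batches(directory: Path, total_results: int) -> list[str]:
    """List the expected batch files that are not yet saved in the directory."""
    return [
        name
        for name in batch_filenames(total_results)
        if not (directory / name).exists()
    ]
```

For example, `batch_filenames(1200)` gives three names, with the last batch covering only results 1001 to 1200, and a search with 60,000 results needs 120 separate downloads.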
It was mind-numbingly boring work, but it was also the only way to trust the data. Without that data, we could not have written the paper that later led to a $600,000 NSF award to study the impact of high-performance computing resources on the research productivity of U.S. institutions.
The majority of research work is just like that. I have listened to colleagues who are doing award-winning research in cancer treatment talk about injecting trial drugs into hundreds of laboratory mice for experimentation purposes. I have seen friends run hundreds or thousands of computational experiments for days, only to adjust some tiny parameters and restart everything. There are so many cakes lying around for you to pick up and throw at the wall just to see them fall off.
Throw faster, or keep throwing
Everybody can come up with some ideas. I think, with the exception of geniuses (aka the Einsteins of the world), we all have roughly similar degrees of creativity. Therefore, what sets more accomplished researchers apart is their ability to quickly try out different things until something is discovered (the cake is stuck on the wall!). Going back to my example of saving data, someone with a very strong web development background could write a Selenium/Beautiful Soup script that automates the saving process and reduces the three weeks down to days. I knew that at the time too, but after considering the time it would take to become reasonably proficient with Selenium/Beautiful Soup, I figured that I would keep clicking and saving.
In the end, whether it is fast or slow, you have to keep doing the research work. Only by running massive amounts of experiments, poring over thousands of lines of data, or poking hundreds of rat tails can you finally see some patterns emerge. Hey, something stuck to the wall. Perhaps research is less about being smart (intelligence helps, but it is rarely decisive) and more about being stubborn.
Something stuck, now what
Throwing things at the wall is a surprisingly addictive activity. Once something sticks, a whole host of questions arises. Is it because of my throwing technique? Is it because of the composition of the stuck cakes? Is it the milk, the egg, or the flour? Or is it because this wall is suspiciously porous and it is simply easier for things to stick to it? This is where hypotheses actually begin. Maybe this will lead to a special brand of cake that never falls off a plate onto the floor!
Now that you have gotten something stuck to the wall, you might end up pursuing a career specializing in throwing things besides cakes at the wall. But at the very least, your arm and shoulder will have become very strong indeed!
This essay is the product of a procrastination process. I am now at line 8,000 out of 17,000.