Life in Estonia, part 8: Research

One sighs. One groans angrily. And a third voice spits out swear words. That’s us. That’s our office. And usually, when it happens, it is followed by my boss commenting: “The sounds of making science.” We may be staring at a computer screen all day, but this is indeed hard work. Thinking. Trying. Failing. Trying again.

When I procrastinate, I spend a lot of time looking at memes about doing a PhD or working in academia. One of my favorites is: “I didn’t want a 9 to 5 job, so I started a PhD, now I’m working 24/7”. There is definitely some truth in it. As a PhD student, you try to find a work-life balance. You are usually quite flexible in your work schedule: while you are doing literature research or statistical analysis, you can easily work from home, and nowadays all the classes are online, too. And while it is easy to start every day at the same time, when do you stop? Some days I end my work day at four thirty, because I just can’t concentrate anymore and feel like I have accomplished enough for one day. On other days I really get into a flow, solve problems and get thrown out of the building by the night guard (yes, this happened). And on some days (especially Fridays), my supervisor reminds me not to overwork, that I shouldn’t stay so long just because he does (well, I do start three hours earlier, so I think he has a point), and that I should take my weekends off.

After 7 months, my name is finally on the door, too

The thing is, when you are doing research, there is only one thing that everything else depends on: how much progress you make. The faster you get the analysis done, the faster you can start writing the paper. The faster you get the writing done, the sooner your peers can comment on it. The sooner you submit the article, the sooner you’ll get published. So every day you take off, every break you allow yourself, and every time you leave work early, you will feel bad for not making progress. Every weekend that I don’t work at all, I beat myself up for not working hard enough. Every day I don’t make a big step forward, I am scared that I will not manage this within four years (and I am only funded for four years, so after that, I will need to get money for rent and food from some other source).

A very clear research plan

But I also know that without enough sleep, without days of rest, my brain won’t function to its full potential. And my brain is the thing that all of this depends on (and Stack Overflow of course, so other people’s brains).

There are Saturdays on which I sleep in, take time to cook healthy food and meet friends, and feel good afterwards. There are Sundays on which I sleep in and do only one hour of reading or coding, and I beat myself up for it. And then there are the best days: when I find a problem, solve it, and go home psyched about how much I learned and got done that day. I eat dinner, I sit down on my yoga mat for my daily stretching routine and suddenly have an idea. So I reach over to my laptop, and right there on the ground, at half past eight in the evening, I open my statistics program and create a new variable that will tell me how long the interval between the first and the second calving of each cow was.

I go to the laboratory on weekends, because that means we can turn up the music and work without anyone walking by, distracting us or needing any of the equipment that we are working with. Sometimes I take another day off instead, but most of the time I do not, because I would feel too bad doing nothing on a Monday. I also find the lab work relaxing, so it is absolutely not the same as doing my statistics.

our famous whiteboard

It is hard to get your research completely out of your head. And on other days it is hard to even get up and do any work at all. One day it is exciting and the next it is frustrating. You will spend months researching something that will ultimately make two sentences in your paper. You will spend hours reading scientific articles so that you can put a reference behind one sentence in your introduction. Every word in a research paper has been turned over three times.

So this is the life of a researcher in general. But many of you probably don’t really know what I am doing specifically.

So let me describe last Thursday, as it was a very typical day. Now, as a PhD student, I also have some classes, which take up around 6-7 hours per week plus some homework.

I have adapted to Estonian working hours and arrive at the university at around nine in the morning. Due to the pandemic, our building is currently closed for students and visitors, so I have to enter a door code to get access. Inside I find a ghost town. Not many people still actually come in, and those who do wear masks and usually don’t start before nine. I switch on my computer and open STATA, the statistics program that I use. On the big screen, I have the window that shows the output and the table with all of my data; on the smaller screen I have my script. That is where I type in the commands and press Ctrl+D, and they are executed in the other window.

Today I am looking at the relationship between the concentration of the protein haptoglobin, as measured in the first three weeks of a cow’s life, and the weight gain of the cows until they are around 15 months old. My dataset includes the birth date and birth weight for each cow (144 of them were included in the study) and their haptoglobin measurements in weeks one, two and three of life. As not all calves were exactly the same age when the blood was taken, there is also a variable that says, for example, “this calf was 12 days old for the two-week measurement”. Then I have the weight that was taken at approximately 15 months and the exact date on which it was recorded, whether the calf was the first-born of its mother or not, whether it had diarrhea at the time the blood sample was taken, and whether it was treated correctly for that. There is actually much more data in my dataset, but these are the things that I decide I will need.

First I write a line of code that tells the program to calculate the average weight gain by dividing the weight difference between 15 months and birth by the number of days that have passed. Then I think about my model: the outcome is the average weight gain per day per cow from birth to 15 months. The factor that I want to test as a predictor of this outcome is haptoglobin. Factors that can influence this relationship are birth weight, the exact age at blood sampling and at weighing, whether the calf had been treated, how it was treated and maybe also the calf’s mother. But of course I cannot include three haptoglobin measurements per cow; that would mess up the accuracy of the relationship. So I decide to look at the second week for now. The model that I write into my script window looks something like this and creates a so-called linear regression:

xi: regress weight_gain15months hapto birth_weight i.diarrhea i.mother_heifer i.treat_group age age_15months if age > 7 & age < 15

The program then tells me that there are only 65 cows that have data for each of these factors and are thus included in the model. There are different values that I can look at to see if my hypothesis (that haptoglobin is correlated with weight gain) is correct and, if there is an effect, how big it is.
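For the curious, here is a minimal sketch of what the weight gain step before the model could look like. Only weight_gain15months and birth_weight appear in the model above; the names weight_15months, birth_str, weigh_str, birth_date, date_15months and days_elapsed, as well as the two rows of data, are invented purely for illustration.

```stata
* Toy data, invented for illustration only (not from the study)
clear
input cow_id birth_weight weight_15months str10 birth_str str10 weigh_str
1 40 400 "2019-01-10" "2020-04-10"
2 38 410 "2019-02-01" "2020-05-05"
end

* Turn the text dates into Stata daily dates
generate birth_date = date(birth_str, "YMD")
generate date_15months = date(weigh_str, "YMD")
format birth_date date_15months %td

* Days between birth and the 15-month weighing
generate days_elapsed = date_15months - birth_date

* Average daily weight gain: weight difference divided by days passed
generate weight_gain15months = (weight_15months - birth_weight) / days_elapsed

list cow_id days_elapsed weight_gain15months
```

Because the weighing date differs from cow to cow, dividing by the actual number of days rather than by a fixed 15 months keeps the gains comparable.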

Sadly, these values (coefficient, adjusted R², P-value and confidence interval) are not as I want them to be. So my brain goes to work. Ah yes, ideally, dairy cows should be inseminated at the age of 14 months. This means that some of these 65 cows in my model will already be pregnant. That is another factor that heavily influences weight gain, of course, and it can “mask” the haptoglobin effect.

I have tried a couple of times to explain what this masking means. Let’s see. Imagine I want to fill a pool (this is my outcome). There is a toddler with a little bucket putting water into the pool (my predictor variable), and a big hose directing water into the pool. I look only at the toddler and ask: Is he filling the pool? The answer is no. There is a lot of water filling the pool, but it is not coming from the toddler with the bucket. So I look around and find the hose. Now I ask: Is the toddler adding water to the pool if we turn off the hose? Bucket by bucket he adds water to the pool and I can measure how much it is if I want to (effect size or coefficient). It turns out that, in my model, the hose is a so-called confounder.

Adding the effect of the pregnancy to the model means turning the hose off so I can see if the toddler (haptoglobin) is having an effect, too.
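The pool picture can also be played through with simulated numbers. What follows is only a sketch with made-up data, not the analysis from the study: the variables hose, toddler and pool and all coefficients are invented, and it assumes that the hose is related to both the toddler and the pool, which is what makes it a confounder.

```stata
* Simulated data only: every number here is invented to illustrate the idea
clear
set seed 2021
set obs 500

* The hose: a strong influence on how full the pool gets
generate hose = rnormal()

* The toddler: partly related to the hose, partly independent of it
generate toddler = 0.8 * hose + rnormal()

* The pool: true toddler effect is -0.5, true hose effect is 2
generate pool = -0.5 * toddler + 2 * hose + rnormal()

* Model 1: the hose is ignored
regress pool toddler

* Model 2: the hose is included, so its influence is accounted for
regress pool toddler hose
```

In the first regression, the coefficient for toddler should come out far from the true -0.5 (with these made-up numbers it tends to be positive), while in the second one, with the hose in the model, it should land close to -0.5.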

So I need to create a variable for “pregnant at 15 months” and, preferably, also one for how many days pregnant. For this, I need to get the dates of inseminations, pregnancy check-ups and possible abortions for the cows. This is quite complicated and I don’t know how to do it. So I do what all programmers do: I check on a page called Stack Overflow whether anyone else has had a similar problem before. Sometimes it is enough to google “how to calculate …in STATA” to find code that somebody else has written, then copy, paste and adjust it to my own data.
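A much simplified sketch of such a variable is shown here. It assumes the hard part is already done, namely that every cow has a single conception date (missing if she did not conceive), and it ignores abortions entirely. The names conception_date, date_15months, pregnant_15months and days_pregnant and the dates themselves are hypothetical.

```stata
* Toy example with invented dates, for illustration only
clear
set obs 2
generate cow_id = _n

* Both cows were weighed on the same (made-up) day
generate date_15months = mdy(4, 10, 2020)

* Cow 1 conceived 40 days before weighing; cow 2 has no conception date
generate conception_date = date_15months - 40 if cow_id == 1
format date_15months conception_date %td

* 1 if the cow conceived on or before the weighing date, 0 otherwise
generate pregnant_15months = !missing(conception_date) & conception_date <= date_15months

* Days pregnant at weighing, 0 for cows that were not pregnant
generate days_pregnant = cond(pregnant_15months, date_15months - conception_date, 0)

list
```

The !missing() check matters because Stata treats a missing value as larger than any number, so a comparison on its own could misclassify cows without a conception date.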

After over an hour of working on this, I can finally include this new variable in my model, and really, I get perfect results: The R², which measures the accuracy of the model (how much of the pool filling is explained by all the factors I am looking at), is at around 80%. The P-value, which tells me if my result is significant, is less than 0.001 for haptoglobin, and the lower it is, the more we scientists like it. The coefficient is -0.04. This means that if the haptoglobin of one two-week-old calf is 1 mg/dl higher than that of another, she will gain 40 grams less per day than the other one. Over time, this is actually quite a lot.
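To put “quite a lot” into numbers, here is a small back-of-the-envelope calculation. It assumes that the weight gain is measured in kilograms per day (which is what makes 0.04 equal to 40 grams) and that 15 months are roughly 456 days; the day count is an approximation of mine, not a value from the dataset.

```stata
* Size of the coefficient per 1 mg/dl haptoglobin, in kg per day (sign dropped)
scalar effect_kg_per_day = 0.04

* Approximate number of days in 15 months (assumption)
scalar days_15months = 456

* The same effect expressed in grams per day
display effect_kg_per_day * 1000

* Total difference in kg accumulated over about 15 months
display effect_kg_per_day * days_15months
```

Under these assumptions, a calf with 1 mg/dl more haptoglobin would end up roughly 18 kg lighter at 15 months than an otherwise comparable one.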

Before I get to do any more models, it is time for my weekly meeting with my second supervisor. He is a genius when it comes to STATA, so I get to ask him all the questions on how to best create the variables that will help me, how to fix mistakes in my code, and so on, and tell him about my progress. He also looks at the scientific poster that I have prepared for the conference at the end of the month and gives me some tips on how to improve it.

No time to rest; my wonderful colleague Elisabeth has arrived and we put on our lab coats. We need to measure the haptoglobin of more cows. This is done by a so-called colorimetric method: We use different substances that basically color-mark the haptoglobin in the blood serum, and our machine measures how strongly the sample absorbs light of a specific wavelength, from which the computer calculates how much haptoglobin the blood contains. As the substances need time for their reactions, and we also have a lot of samples, this takes up a few hours.

in the lab

As I get back to the office, my first supervisor, the head of our chair of Clinical Veterinary Medicine and obviously head of our research team, has arrived. “Did you see the paper by Goetz et al.?” he asks. I have, actually; it talks about weight gain in veal calves and also looks at the effect of haptoglobin. “I cannot believe they haven’t cited Leena’s article!” Leena is one of his former PhD students. I quickly open the PDF of the article he is referring to. It basically deals with the same topic, and I have been meaning to read it thoroughly for months, but never took the time. So now I do, and add a sentence to the draft of my own paper: “The negative association between haptoglobin and short-term weight gain has been shown by Seppä-Lassila et al. (2018), and our current study suggests that the association is still visible after 15 months.”

Yes, a whole day of work for one sentence. I tell the boss what I discovered and he excitedly writes it on his whiteboard. “Interesting that with the calves who had diarrhea, the effect stays, but in those from Leena’s study, who had respiratory disease, the difference in weight gain between high and low haptoglobin was gone after a few months!”

He sits down and I can see that his brain is working on an explanation for that.

We also need to discuss the exact plan for my next project, where I investigate the same things in sheep, the lectures I will start giving in September, and the master’s thesis that I have to supervise next year.

But it is now already seven, I am hungry and tired, and it’s dark outside, so I pack up my stuff and head to my bike. We can continue this tomorrow.

