Microsoft Excel is Screwing Up Science
Automated features in productivity software are supposed to save us time and effort. But sometimes these features insert errors into our work. That's the case for an alarming number of academic papers about genetics.
Here's the problem: many academic papers come with supplemental files filled with charts, tables and other data. Ideally, those files support the paper and provide data for other researchers in the future. But an automated feature in Excel converts some gene names into other types of information, such as dates or floating-point numbers. That causes confusion and inserts errors into scientific publications.
Here's an example. There's a gene called Membrane-Associated Ring Finger (C3HC4), E3 Ubiquitin Protein Ligase. That's a bit of a mouthful, so the accepted gene symbol is MARCH1.
Unfortunately, Excel sees MARCH1 and assumes it's a date, changing it to 1-Mar. And this sort of accidental conversion happens all the time! Scientists first pointed out the problem back in 2004 and it has persisted since then. Based on the research of Mark Ziemann, Yotam Eren and Assam El-Osta, gene name conversion errors appear in about 20 percent of papers with supplemental Excel gene lists. The researchers looked at more than 35,000 supplemental Excel files attached to papers related to genetic research. They used automated software to search for anything that looked like lists of genes and narrowed the field to 3,597 papers with supplemental gene lists. They accounted for 10 known false-positive cases and found conversion errors in files attached to 704 published papers. That's 19.6 percent of the 3,597 papers they screened. And while we've all experienced autocorrect changing the meaning of text messages with hilarious results, in this case it's no laughing matter. As other scientists use these files to perform further research, the errors slow things down and make research more difficult. This could delay significant scientific advances. It's a big deal.
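To get a feel for what that kind of automated screening involves, here is a minimal Python sketch. It is not the researchers' actual software. The patterns, the function names and the sample value "2.31E+13" are all assumptions made for illustration. It flags cells in a gene-symbol column that look like a date (such as 1-Mar) or a number in scientific notation, and a real screen would need more care to avoid false positives.

```python
import re

# Month abbreviations as they appear in converted values such as "1-Mar".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Date-like damage: one or two digits, a hyphen, a month abbreviation.
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)

# Number-like damage: scientific notation, e.g. the illustrative "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def classify(value: str) -> str:
    """Return 'date', 'float' or 'ok' for one cell of a gene-symbol column."""
    cell = value.strip()
    if DATE_LIKE.match(cell):
        return "date"
    if FLOAT_LIKE.match(cell):
        return "float"
    return "ok"


def suspect_cells(column):
    """Yield (row_index, value, kind) for every cell that looks converted."""
    for index, value in enumerate(column):
        kind = classify(value)
        if kind != "ok":
            yield index, value, kind


if __name__ == "__main__":
    # Hypothetical column: one intact symbol and two damaged cells.
    for hit in suspect_cells(["MARCH1", "1-Mar", "2.31E+13"]):
        print(hit)
```

A script like this can only point at cells worth a second look. It cannot tell you which gene a damaged cell originally named.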
To make matters more frustrating, there's no way to turn off the automated conversion feature permanently. However, the researchers noted that Google Sheets doesn't perform these automated conversions, and copying from Google Sheets into another spreadsheet program preserved the formatting. Maybe productivity software publishers will build in options to allow people to permanently disable these conversion features. Until then, it's going to fall to some poor research assistant to double-check massive lists of gene names. Or just switch to Google Sheets.
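For anyone who handles gene lists with scripts rather than spreadsheets, the sketch below shows why plain-text handling sidesteps the problem. It assumes the list is stored as CSV, and the column name gene_symbol and the function round_trip are hypothetical. Python's standard csv module returns every field as a string, so MARCH1 comes back exactly as it was written. This does not protect the file if someone later opens it in a spreadsheet program that converts on import.

```python
import csv
import io


def round_trip(symbols):
    """Write gene symbols to CSV text, read them back, and return the list."""
    # newline="" lets the csv module control line endings itself.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["gene_symbol"])  # hypothetical header name
    for symbol in symbols:
        writer.writerow([symbol])

    buffer.seek(0)
    reader = csv.DictReader(buffer)
    # Every field comes back as a plain string; nothing is reinterpreted.
    return [row["gene_symbol"] for row in reader]


if __name__ == "__main__":
    print(round_trip(["MARCH1"]))
```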