«

»

Beware this scary thing Excel can do to your data!

[1] Heh, heh. I’ve always wanted to write a clickbait title. That was fun. Excel. Love it. Hate it. Most ecologists I know use it at least a little, including me. Now I know there are plenty of people who abhor the idea of using Excel for science. But Excel is a tool, just like any other. If you were to use a screwdriver to try to hammer a nail, then my dad – who taught me to respect tools and use them for their proper purposes – would be horrified. And if you were to use Excel to manage and analyze your dataset of tens of thousands of data points, I would be horrified. You could perhaps eventually manage to struggle through. But there are better tools.

Excel is probably best for things like manual data entry, especially if you constrain the format of your cells to only allow valid entries and save in a non-proprietary format. Excel can be fine for quick exploratory analysis of small datasets: means, standard deviations, sums… that sort of thing. It’s fast at making interpretable graphs. It’s a good exploratory viewer for datasets that you get from others – most of the time.

I say most of the time because Excel tries to think for you a little too much. When you open a file that is in CSV format (the best format to store your data in most of the time), Excel converts all the values to its best guess of the format you want to view them in. This is normally fine. But when it comes to dates and times, things can go wrong. Terribly wrong. For one thing, people store their dates and times in different formats according to culture. In the US, we (weirdly) prefer the month-day-year format. In Europe (and other places), the preferred format is the more logical day-month-year. So the value 03-04-2016 is ambiguous; it could be March 4, 2016 or April 3, 2016, depending on your cultural bias. (Times can be 12-hour or 24-hour, causing other problems.)

Each version of Excel has a setting as to which way to read dates, which it originally gets from your operating system, though you can switch it manually if you want to. So if you’ve got your computer set up with US defaults, Excel will display (and save) dates in month-day-year format. If you’ve got it set up with European defaults, Excel will use day-month-year format. This can be a problem if you’re collaborating with someone who uses a different format system than you and you’re using Excel spreadsheets to share data. (Been there. Done that.)

But it’s worse than that. Excel can actually change the format of the dates in a non-Excel file (e.g. CSV file) without your permission. Don’t believe me? Try it yourself:

  1. Start with a CSV file that contains a date in a year-month-day format – the type of format we scientists prefer because it’s unambiguous across cultures. (And, if you’re a data wrangler, because any type of sorting – numeric, date-based, alphabetical – puts the dates in proper order.) You can use one of my files if you like. Make sure it is saved to your computer. excel-before
  2. Open your file in Excel. Excel will automatically reformat your dates into something it prefers.
  3. Click “Save”.
  4. Excel will prompt you with an “Are you sure?” dialog. After all, you may lose your amazing formatting, graphs, and the like if you try to save as a CSV file instead of an Excel file.
  5. Click “No” because you don’t want to save (and overwrite) your CSV file.
  6. Excel will redirect you to the “Save As” dialog. But by now, the damage has been done! Click cancel or back (depending on your version of Excel) so that you don’t save anything.
  7. Close your file. When Excel asks if you want to save it, say no.
  8. Open your CSV file with a text editor and cringe as you see that Excel has changed the format of all your dates to month-day-year (or day-month-year) without your permission. excel-after

How do you avoid this? Some suggestions:

  1. Don’t use Excel’s “Save”. Only ever use “Save As”. Of course, this only works if you’re not an obsessive saver like me who clicks Ctrl-S every few minutes without even thinking about it. [2] Once upon a time, before a developer had ever dreamed up auto-save, it was quite easy to lose huge amounts of work because you forgot to save early and often. One day an eight-year-old girl was traumatized when her home lost power, deleting a story she had spent all afternoon writing. Never again, she vowed.
  2. Store dates in a format Excel doesn’t recognize. You can use an eight-digit string, such as YYYYMMDD. So that March 4, 2016 becomes 20160304. That’s not super easy to read as a human and it can get confused with integers, so I prefer the underscored version: YYYY_MM_DD. A date might look like 2016_03_04, but Excel has no idea it’s a date and so won’t try to auto-format it. The only downside is if you need interoperability along a data management pipeline. Packages for various programming languages will almost invariably recognize YYYY-MM-DD as a date, but you’ll have to write conversion routines if you want to use underscores in your dates instead of hyphens.
  3. Never open an original copy of a file in Excel. I do this frequently, too. Simply make a copy of a CSV file and open the copy in Excel. Store your originals somewhere where you’re not tempted to open them in Excel by mistake.
  4. Explicitly tell Excel not to convert the formats on your dates and times. This is especially useful for large files. You can import your CSV file rather than opening it. If you do this, you can manually tell Excel how to read each column. So for your dates and times, tell Excel that it’s a text field instead of a date or time one. The problem with this method is that it’s tedious and there’s no way to tell Excel to remember what you did for next time or set defaults. So you’ll have to go through the tedium every time you open a file.
  5. [UPDATED] Use separate columns for month, day, and year. [Thanks to Emily McKinnon and Kara Woo (in the comments), and Kristina Riemer (on Twitter) for pointing out this simple and useful workaround!]

Or you could just not use Excel. But since you probably will, just be aware that Excel can change the format of the data in your files (without you knowing) and take precautions as needed.

Permanent link to this article: http://ecologybits.com/index.php/2016/07/06/beware-this-scary-thing-excel-can-do-to-your-data/

11 pings

Skip to comment form

  1. Friday links: Most. Awkward. Department. Meeting. Ever. And more. | Dynamic Ecology

    […] Margaret Kosmala had a post warning about ways Excel can mess up data (and how to avoid the problem). (Jeremy adds: Hey, that was gonna be my link! Stop, link thief! Or I’ll say “stop” again!) […]

Leave a Reply to Margaret Kosmala Cancel reply

Your email address will not be published. Required fields are marked *