Making science products Open: an informal guide to copyright and licensing

I grew up a hacker (in the original sense) and thus a True Believer in open knowledge. And so, when it came time to start publishing science, I figured I’d make all my products Open. But it turns out that there’s a bewildering array of things to think about if you want to do so. More recently, I’ve been wanting to incorporate other people’s creations in my own, and have encountered various difficulties in using Open products. I’m writing this post, in part, so I have notes I can easily reference in the future. But I figure if it helps me, it can help others, so here you go.

I should note here that I am not a lawyer, so this is not legal advice. This is just my good-faith understanding of the intersection of U.S. copyright law, licensing, and academic products.

What is copyright, and why do I care?

When you make a Thing, you get to decide how it’s used and how it’s distributed to other people. That’s copyright. The sorts of Things we’re concerned with here are scientific writing (journal papers, reports, dissertations, etc.), other media (photos, video, audio, etc.), scientific data, and software. You’ll see these Things referred to as “creative works” if you read a lot about copyright. Copyright is a type of intellectual property, and is different from patents, which cover inventions [1] (specifically, a physical thing or a process), and trademarks, which distinguish products and services from similar ones. And most likely, if you make a scientific Thing, you are automatically granted copyright. [2] There are exceptions, though. If you work for the U.S. government, the Things you make as part of your job will automatically be in the public domain. And if you are an employee of a university or other institute, you may have signed away your rights in that flurry of paperwork you got when you were hired; in other words, your institution may own the copyright on Things you make, not you.

What do I do with my copyright?

Whatever you want.

The historical use of copyright goes something like this… I wrote a scientific paper and now Journal of Things (JoT) wants to publish it. I assign a license to JoT saying that they can use my writing to make a new Thing (a journal article) and that this journal article can be disseminated as JoT sees fit. Note that I retain the copyright to my actual writing, but JoT has copyright to the formatted, spiffed-up, published version. Now, let’s say someone else wants to use a figure from the published article: they now need JoT to assign a license to them for the use of that figure.

This model of assignment can work fine if the Thing you make is just used once or twice by others, or if you feel strongly about how your Thing is used and distributed. But otherwise, it can get cumbersome. Instead of (or in addition to) assigning licenses on a case-by-case basis, you can assign a general non-exclusive license that automatically allows people to use and disseminate your Things.

How do I assign one of these general non-exclusive licenses?

The first thing you have to do is pick one. And sadly, there are a lot of options for you out there. I really like the Choose A License site to get a sense of what the possibilities are. But if you just have time for a single blog post, here’s a quick run-down. Answer these questions:

  • Are you willing to let your Thing be distributed to anyone who wants it, free of charge?
  • Are you willing to let your Thing be modified into some other Thing by others? (e.g. If you take a picture that someone else wants to use, is it okay if they crop it differently or change the lighting or include it in a collage?)
  • Are you willing to let your Thing and its modifications be distributed by someone else for commercial purposes? (i.e. They might make money off of it.)
  • Do you require attribution? (i.e. You require that your name be attached to your Thing.)
  • Do you want to make sure everyone who uses or distributes your Thing (or modifications of it) uses the same set of answers to these questions as you do?
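The five questions above map fairly directly onto the standard Creative Commons license elements (BY, NC, ND, SA) plus the CC0 public-domain dedication. As a rough sketch of that mapping, here is a small Python function; the name `suggest_cc_license` and its parameters are hypothetical, made up for illustration, and it ignores most of the nuances discussed in the rest of this post.

```python
from typing import Optional


def suggest_cc_license(distribute: bool, modify: bool, commercial: bool,
                       attribution: bool, share_alike: bool) -> Optional[str]:
    """Map yes/no answers to the five questions onto a Creative Commons choice.

    Hypothetical helper for illustration only; it simplifies a lot.
    """
    # Not willing to let anyone distribute it freely: keep default copyright.
    if not distribute:
        return "All rights reserved"

    # No attribution required: CC0 is the only standard CC option, and it
    # waives every other restriction too (no NC, ND, or SA is possible).
    if not attribution:
        if modify and commercial and not share_alike:
            return "CC0"
        return None  # no standard CC option matches this combination

    elements = ["BY"]  # Attribution
    if not commercial:
        elements.append("NC")  # NonCommercial
    if not modify:
        elements.append("ND")  # NoDerivatives; share-alike is moot without modifications
    elif share_alike:
        elements.append("SA")  # ShareAlike applies to modified versions
    return "CC " + "-".join(elements)


# "Yes" to everything except share-alike:
print(suggest_cc_license(True, True, True, True, False))  # CC BY
# Non-commercial plus share-alike:
print(suggest_cc_license(True, True, False, True, True))  # CC BY-NC-SA
```

CC0 is, strictly speaking, a public-domain dedication rather than a license, and wanting share-alike (or non-commercial) without attribution doesn’t correspond to any standard Creative Commons option, which is why the sketch returns `None` there.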

This seems straightforward enough until you realize that your answers to these questions might have complicated ramifications. For example, if you decide you do not want your beautiful photo of a rail to be used for commercial purposes without your explicit permission, I would totally understand that. But what that means is that when I want to use it in my Ecology article, I probably still need to contact you for explicit permission. That’s because Ecology, although a publication of the non-profit Ecological Society of America, is published by Wiley, a for-profit publisher. This is, of course, a murky area, but none of us are lawyers, right? So I should ask permission. Now, if you had put an open license on that image that didn’t curtail commercial use, then I could have used it in my article without asking. Even within the Open Source community, there are arguments about which licenses are the best to use. (That’s why there are so many of them.)

Ugh, this all sounds like a lot of effort. What if I just don’t do anything?

If you don’t do anything, you retain the strictest copyright allowable under law. In other words, if you don’t assign a general license to your Thing, then legally, it can’t be used, modified, or disseminated by anyone else without getting explicit permission from you.

Well, huh. I’d like to be more Open than that. What do you suggest?

Here’s where I’m at in my thinking of open licenses, though my thoughts may continue to evolve. For creative things I write, such as blog posts, scientific articles, and so forth, I usually retain full copyright, and don’t assign an open license.

For other media, such as photos, videos, and audio, I typically assign the Creative Commons license CC BY. I used to care more about commercial use, so some of my stuff is licensed CC BY-NC. But after being stymied by the NC (“non-commercial”) designation when trying to use something for not-for-profit purposes (“non-commercial” covers an awful gray area), I’ve given it up. If there is something that I think might have actual commercial value (such as our Snapshot Serengeti photos), I am more conservative with licensing and will slap on an NC. If anyone does want to use it for a commercial purpose, they can ask, and I can issue a separate non-exclusive commercial license that provides me with some slice of the income (as royalties or a one-time payment).

I also used to be a fan of Creative Commons’ “share alike” (SA) restriction, e.g. CC BY-NC-SA, which requires people who distribute modified versions of your Thing to use the same license as you. But I’ve found that such “copylefts” are severely limiting for reuse of material. For example, I am never going to be able to persuade a publisher (even a clearly non-profit one) to make a journal article CC BY-NC-SA, so if you give that license to your rail photo, I’m going to have to ask you for explicit permission if I want to use it in an article. Every. Single. Time. So for me, CC BY is where it’s at, unless I think my Thing has actual commercial value. It essentially mirrors what we do in academia already: reuse and distribute work with attribution.

For data, I make it truly Open. I assign it to the public domain, meaning that anyone can use it for any purpose, without attribution. I do this both because it aligns with standard academic practice and because I don’t want anything to get in the way of anyone using my data. [3] Note: please use my data! (Of course, there are potential ramifications of doing so.)

I divide code into two types: “end code” that is very specific to a particular scientific study, and “general code” that might reasonably be expected to be built upon by others. An example of the former is the specific agent-based model I used for a paper on disease dynamics. For this sort of code, I tend towards a CC BY license because it’s simple and easy and I don’t have much expectation of reuse. An example of the latter is an R package. For this sort of code, I lean towards GPL-compatible licenses to make sure that my code license meshes easily with the code licenses of others. And since I’m no longer a fan of copyleft, the MIT license works just fine most of the time. It essentially says, “go ahead and use my code as you like, but I’m not providing any guarantees that it’s any good.”

Still seems complicated. Any other thoughts?

I have read a convincing argument [4] (which I can’t find now, despite lots of searching; if you know it, can you send me the link?) that as academics we might reasonably put everything in the public domain or under an MIT license (which limits liability). The reasoning is essentially that (1) academic culture already provides for attribution by default; (2) there are lots of murky gray areas in copyright law, so definitions may vary between people (e.g. my definition of “commercial” may be different from yours), meaning that it’s hard to know what people’s real intentions are when they choose an Open license; and (3) we aren’t prone to go around suing each other over copyright infringement. After all, copyright only really matters if you’re willing to enforce it. And that takes time and money and effort.

I’m still chewing on this argument.

And I’m happy to hear others. How do you license your scientific Things?

Permanent link to this article: http://ecologybits.com/index.php/2016/10/19/making-science-products-open-an-informal-guide-to-copyright-and-licensing/



  1. Stephen Heard

    Nice explanation, Margaret!

    Something you mention that surprises people, and thus needs emphasis, is that copyright (at least in the US and at least since 1976) “appears” when you create your Thing, not when you publish or register it. This means that unpublished letters, etc. are copyright the writer. One place this matters is that people who think they should make science more open by posting the peer reviews of their papers are likely in copyright violation (the “likely” recognizes that I’m not a copyright lawyer either). I have a blog post about this coming up in a few days.

    Like you, I generally copyright blog posts, except for SciComm ones that I make CC-BY, because their whole point is to be read as widely by the public as possible. (But I’m not aware of anyone taking me up on this license offer.) Papers I don’t stress about; I may have the copyright or the journal may, but I don’t lose sleep over it. When I collaborate with Canadian federal scientists, the paper ends up “Copyright Her Majesty the Queen in Right of Canada”, which irrespective of your thoughts on the monarchy, has a lovely ring to it, doesn’t it?

  2. Koen Hufkens

    I release most of my software under the AGPLv3, to make sure that derivatives are open as well. For data I would suggest CC BY-NC-SA; especially if the data were gathered using tax money, I’d rather insist on the NC part. The reason for this is that businesses often “borrow” data to validate their work, with little of this money making it back to the scientific community. In particular, I’m thinking of the financial markets (banks), which, for example, make extensive use of free MODIS data and products for modeling food commodity prices but dodge most taxes.

    1. Margaret Kosmala

      So, I’ve read some reasonable articles about how if you make your software require that derivatives also be open, then you essentially stifle people building on your code. For example, many people work at companies where they might reasonably contribute code, but may decide not to because at the time they don’t know yet whether their code will be open (or be allowed to be open) or not. So they’ll just end up rewriting your code.

      Data as NC is equally a problem. Suppose that some company who builds scientific equipment wants to use your data for validation or testing or whatever. They’re unlikely to contact you and negotiate a license unless your data is *that* unique and *that* valuable. As a result, the scientific equipment might not work as well for your system. Think about weather forecasts. TV and radio wouldn’t be able to have weather forecasts without (really) public data. I’d rather have phenology data public so it can be used for the common good — say to help allergy sufferers. And I have to add that any data collected under NSF (and other federal) grants must be made freely available. You can’t actually license it NC (as I understand it). And I dislike SA for the reasons I outlined in the post.

  3. Florian Hartig

    I’ve been pondering / discussing the same question a few times and am leaning towards Margaret’s view.

    The sentiment underlying NC seems to be that it’s “unjust” if companies profit from publicly / personally donated public goods. I can’t see the argument for that. If we extend access to a public good (software, data) to companies, they can produce a cheaper or better product, and assuming a competitive market, this should make everyone better off. As long as no costs are incurred, the moral imperative seems to be to extend access to public goods as widely as possible.

    The only reason I see for NC is situations where commercial and non-commercial actors are in competition. This may be the case for certain software products, but not ecological data / software.

  4. Peter Erwin

    For what it’s worth, under US law data (“scientific” or otherwise) is generally *not* copyrightable at all. You can have copyright over a “compilation” of data — which might extend to things like the specific formatting of your data tables — but the data itself cannot be copyrighted. This video by a librarian at the University of Minnesota gives a nice overview, including brief references to the relevant Supreme Court decisions and bits of US copyright law (copyright is discussed in the first 2 1/2 minutes):


    1. Margaret Kosmala

      Really important point! Thanks.
