People who work in natural language processing get stuff done - it's a very task-oriented field. That's not to say the theory isn't there, or even that it's easy to understand - it's just that most people seem to care more about how a method performs at some well-defined task, like named entity recognition or part-of-speech tagging, than about how neat it is. There's no shortage of things to do with natural language, and each task tends to have a standard dataset to test against and a well-defined, well-known state of the art.

I like this.

Network science, on the other hand, doesn't seem to place as high a premium on performance. I'd argue that, in many cases, it's not even clear what good performance would look like. The only well-defined, agreed-upon task that I can think of is community detection, and the standard datasets (like the 34-member karate club) are waaaaaaayyyyy smaller than what would be interesting to practitioners like, say, Facebook. It's also not a terribly well-defined thing - what makes a community, exactly?
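
One common (though contested) answer to "what makes a community?" is Newman's modularity: the fraction of edges that fall inside communities, minus the fraction you'd expect if edges were wired at random while preserving node degrees. Here's a minimal sketch on an invented toy graph - the node names, edges, and partition are mine for illustration, not any standard benchmark:

```python
# Toy graph: two triangles joined by one bridging edge, a miniature
# stand-in for something like the karate club. Invented for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edges, partition):
    """Newman's modularity: fraction of edges within communities, minus
    the fraction expected under random wiring with the same degrees."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for u, v in edges:
        if partition[u] == partition[v]:
            q += 1 / m  # observed within-community edge fraction
    for c in set(partition.values()):
        d_c = sum(deg for node, deg in degree.items() if partition[node] == c)
        q -= (d_c / (2 * m)) ** 2  # expected fraction for community c
    return q

print(round(modularity(edges, partition), 3))  # 0.357
```

Splitting the graph at the bridge scores well; lumping every node into one community scores exactly zero, which is the sense in which modularity "defines" a community. But it's a quality function, not a task with a held-out answer key - which is rather the point.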

This makes it hard to justify the field to people who aren't just naturally interested in networks (which, to be honest, is a surprisingly large number of people). On the other hand, just about everyone can agree that a tool that recognizes that 'JFK' and 'The 35th President of the United States' are coreferent is useful, regardless of whether they're interested in linguistics.

I guess this comes down to a science-vs-engineering argument - the claim being that network science is, well, a science. There are plenty of fields, though, like information retrieval and natural language processing, that really succeed in being both an engineering discipline and a science, and netsci could, too.

Here are some tasks that we could nail down:

  • Predicting the presence of unobserved edges.
  • Identifying influential individuals in networks.
  • Preserving user privacy in online social networks.
  • Tractably improving node classification using network structure (shameless plug).
  • Recommending new, useful connections between users.
  • Predicting network cascade size and character.
  • Predicting network growth and decay.
  • Node and edge reification across distinct networks.
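
To make the first of these concrete: the simplest baseline for predicting unobserved edges is the common-neighbors heuristic - score each non-adjacent pair by how many neighbors they share, and predict the highest-scoring pairs. A minimal sketch, on a made-up toy graph (the node names are invented here):

```python
from itertools import combinations

# Toy undirected graph as adjacency sets; invented for illustration.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbor_scores(graph):
    """Score each non-edge by the number of shared neighbors.

    Higher scores suggest the pair is more likely to be an
    unobserved (or future) edge."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:  # only score pairs that aren't already edges
            scores[(u, v)] = len(graph[u] & graph[v])
    return scores

ranked = sorted(common_neighbor_scores(graph).items(), key=lambda kv: -kv[1])
print(ranked[0])  # (('a', 'd'), 2) - the non-edge with the most shared neighbors
```

The point isn't that this baseline is any good - it's that a task this easy to state, with a precision-at-k number falling out of it, is exactly the kind of thing a standard benchmark could be built around.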

To be clear - I know that there are published papers that deal with each of these, but I don't think there's really a standard benchmark to compare a new approach with in the same sense as, say, the CORA dataset for document classification.

And, to be fair, there's a shining counterexample to all of this to be found in Jure Leskovec's SNAP group - they tackle well-defined problems, write clear papers, and release their data to the world, giving a nice point of comparison.

Still, I'd love to see network science earn the same 'get it done' cred that NLP enjoys. Here's to hoping.