Reproducibility and the Philosophy of Data with Clare Gollnick

This Week in Machine Learning & AI

In this episode, I’m joined by Clare Gollnick, CTO of Terbium Labs, to discuss her thoughts on the “reproducibility crisis” currently haunting the scientific landscape.

For a little background, a “Nature” survey in 2016 showed that more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. Clare gives us her take on the situation and how it applies to data science, along with some great nuggets about the philosophy of data and a few interesting use cases as well. We also cover her thoughts on Bayesian vs. frequentist techniques and, while we’re at it, the Vim vs. Emacs debate. No, actually, I’m just kidding on that last one. But this was indeed a very fun conversation that I think you’ll enjoy!

Conference Update

You all know I travel to a ton of events each year, and event season is just getting underway for me. One of the events I’m most excited about is my very own AI Summit, the successor to the awesome Future of Data Summit event I produced last year. This year’s event takes place April 30th and May 1st, and is once again being held in Las Vegas, in conjunction with the Interop ITX conference.

This year’s event is much more AI focused, and is targeting enterprise line-of-business and IT managers and leaders who want to get smart on AI very quickly. Think of it as a two-day, no-fluff, Technical MBA in machine learning & AI. I’ll be presenting an ML & AI bootcamp, and I’ll have experts coming in to present mini workshops on computer vision, natural language processing and conversational applications, ML and AI for IoT and industrial applications, data management for AI, building an AI-first culture in your organization, and operationalizing ML and AI. For more information on the program visit twimlai.com/aisummit-interop-2018/.

About Clare

Mentioned in the Interview

“More On That Later” by Lee Rosevere licensed under CC By 4.0

  • Charles Brewer

    This may be the most interesting TWiML episode I have yet listened to. Having a background in philosophy and a career in IT, lately in the ML area, may have something to do with it!

    Clare’s insight into the relationship between a phenomenon defined by rules and one where our descriptions are married to reality is fascinating. I am currently the CEO of a company where we are using Wittgenstein’s Picture Theory of Meaning as the basis for a model of regulation-controlled enterprises (we don’t tell clients about it, though). Essentially, what we are doing is creating a well-defined set of types of objects and building visual models of reality, which are used to promote understanding and communication between the human agents who interact in this domain.

    Subject matter expertise is the key link between our modeling environment and any given actual enterprise.

    I found Clare’s analysis of the relationship between hackers and an enterprise to be both clear and enlightening, a rare combination in my view.

    As a follow-up, Clare might wish to examine the Wittgenstein-Turing disagreement about how rules work in mathematics. As a fully signed-up Wittgensteinian, I think this is one place where W got it wrong, but the nature of the “wrongness” is, I think, just what Clare has identified.

    A large part of the scholarship around Wittgenstein’s later work concerns “following rules”. It’s very subtle stuff (hidden in very straightforward language), and I would recommend it as a follow-up on the philosophy of data.

    Best regards

    Charles Brewer
    CEO Ariadne Regtech

      • Charles Brewer

        Hi Sam,

        Wittgenstein is, unfortunately, not one of the more accessible philosophers, since he doesn’t reference previous philosophers or their questions, and he often makes points where it takes some digging to work out what he is really addressing.

        His earlier work is at once more austere and more comprehensible. You can read the Tractatus Logico-Philosophicus in a couple of hours (even going fairly slowly) but there will be some parts which will be quite incomprehensible (I still don’t know what some of the stuff in section 5 is about or what he is addressing).

        The best place to start may be Ray Monk’s “Wittgenstein: The Duty of Genius”. It’s a longish biography, but it is very readable, and Monk weaves Wittgenstein’s philosophy into his personal development, which is very relevant.

  • John Morris

    This was the first TWiML podcast I’ve played. Clare Gollnick’s comments were excellent – exactly what is needed in a time of inflated expectations for “AI”. Thanks for presenting, Sam.

  • Tim h

    The reproducibility crisis should be one of the greatest scandals of our times. As a materials scientist, I have many examples from my field of unreproducible work, and many scientists I’ve spoken with from other fields agree that reproducibility is a major issue in their fields.

    I like Popper’s falsifiability construction of science but have been more convinced by the Kuhn framework in which science advances when consensus is overturned by the accumulated weight of many experiments that seem to contradict the dominant paradigm. It seems to me to be a more accurate depiction of what happens, and a more Bayesian notion, which appeals to me.
    https://en.wikipedia.org/wiki/Thomas_Kuhn?wprov=sfla1

    See point #2 here
    https://plato.stanford.edu/entries/popper/#CritEval

    • sam

      This podcast and topic seem to have really resonated with folks. I’m looking forward to digging in more and appreciate the pointers!

      Thanks!
      Sam

  • Rick Payne

    Great episode.

    I have been thinking about the idea that “all data is about the past”. There is a metaphor often trotted out about how you should not manage a company by “looking in the rear-view mirror.” When we drive a car and look into the distance, we are in a sense getting data about the future. So I’m contemplating whether such data points are somehow different.

  • patrick

    I listen to and enjoy these podcasts regularly.
    This interview with Clare Gollnick was among the best to which I have listened.
    I really enjoyed Ms. Gollnick’s clarity and her obvious enjoyment in discussing some of the philosophical aspects of inferring from data.

    Thanks much!

  • John Bestevaar

    The reproducibility crisis is an emergent property of the complexity of our knowledge systems. Our systems are valid only for human beings, because we consist of biological systems on planet Earth. The predictions that follow from our knowledge are also only for humans. So the notion of reproducibility is directly related to human needs: if we assume the sun will rise tomorrow, we also assume we have a good chance of being able to find breakfast in the fridge. In other words, reality is interesting to us humans only insofar as it serves our needs within notions such as science and reproducibility.
