An argument that Jupyter Notebooks are flawed and the world needs a successor.
MIT License
An argument that Jupyter Notebooks are conceptually flawed just as spreadsheets are - the (python) world needs a successor.
See this discussion on reddit and notbook and alternatives.
To try and convince you of that I've built a quick demo of an alternative. TL;DR here's an example of the kind of document that's generated.
But before you look at that, I should convince you notebooks have fundamental problems...
True, that's very annoying. Both .xlsx and .ipynb show as blobs, it's impossible to clearly see what's changed in either between versions.
But it's not the real problem, try again.
Another good try. It's virtually impossible to have clearly visible logic in either a notebook or an excel sheet which can also be used in production. You end up copying the logic out, rewriting it and adding it to your production stack.
Problems come when you want to modify the logic and share it with someone again: now you have two (somewhat different) implementations to keep track of and keep identical.
But still not the fundamental common problem, try again...
Remember ctrl + alt + shift + F9 in Excel? Go for coffee and wait for it to update, hope nothing crashes, then search through all your sheets to see if anything has gone wrong.
Notebooks are no better - sections don't automatically update when an earlier section is modified, so you end up running "Run All Cells" the whole time.
But even that's not the whole problem. Because notebooks reuse a single python process, you can have more subtle bugs: declare a function in one cell, then use it in the next - all works well. Now delete the function from the first cell: the function object still exists in globals, so the notebook continues to work.
Now you send that notebook to someone else and of course everything fails!
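The stale-function bug can be reproduced without Jupyter at all. The sketch below is only a simulation: it assumes a "kernel" is nothing more than one shared globals dict that every cell is exec'd into, and the names run_cell, kernel_globals and double are made up for illustration.

```python
# Simulate a notebook kernel: every "cell" is exec'd into one shared namespace.
# run_cell, kernel_globals and double are hypothetical names for this sketch.
kernel_globals = {}


def run_cell(source, namespace):
    """Execute a cell's source in the given namespace, roughly as a kernel would."""
    exec(source, namespace)


cell_1 = "def double(x):\n    return x * 2\n"
cell_2 = "result = double(21)\n"

run_cell(cell_1, kernel_globals)
run_cell(cell_2, kernel_globals)

# "Delete" the function from cell 1, then re-run both cells in the same kernel.
cell_1 = "# double() has been removed from this cell\n"
run_cell(cell_1, kernel_globals)
run_cell(cell_2, kernel_globals)  # still works: double() is left over in globals
stale_result = kernel_globals["result"]

# A fresh process (what the other person gets) has no leftover function object.
fresh_globals = {}
run_cell(cell_1, fresh_globals)
try:
    run_cell(cell_2, fresh_globals)
    fresh_error = None
except NameError as exc:
    fresh_error = exc

print("same kernel:", stale_result)
print("fresh kernel:", repr(fresh_error))
```

The same two cells give a result in the long-lived namespace and a NameError in the fresh one, which is exactly the "works for me" trap described above.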
But that's not all: neither excel nor notebooks make it obvious when an error has happened. You could have an exception in a cell that's offscreen (or a sheet you're not looking at) and you wouldn't be aware of it.
That's a big problem, but really it's just an implementation mistake, not the root problem, keep trying...
True, just as you can't reuse logic in a production environment, you can't easily access logic outside the main document to write unit tests.
Mad, but still not THE problem, keep trying...
So annoying, anyone who writes code for more than a few hours a month becomes very at-home in their editor of choice. Having to leave it to use either excel or notebooks is really painful, plus both lack many of the advanced features of a modern IDE.
Still not the answer I'm looking for, have another try...
Excel is awful at displaying complex results, particularly with a narrative. Notebooks are better at describing a narrative, but they're still ugly - you have to show lots of code that the casual reader doesn't care about (like imports and utility functions), there's also no really pretty way of displaying a notebook that I know of.
Still not the right answer, have one more guess...
YES! That's the fundamental problem, and it has led to many very serious (heisen)bugs. I personally have fallen into this bear pit more than once.
Three conceptually very different things:
Are stored in the same file.
Imagine if python automatically appended the output of a script to that script every time you ran it!
Of course this didn't seem like a big problem when spreadsheets were conceived in the 60s (the word "spreadsheet" or "spread-sheet" comes from two facing pieces of paper used as a ledger). They were designed as just a clever table: just as you had the inputs and outputs on the same piece of paper when you did manual accounts on lined paper, it seemed sensible to keep everything in one file when building the computerised equivalent. But why on earth did anyone think this was still a good idea when inventing notebooks?
You would never think of storing your customer data in the same file as the logic to generate their invoices (unless you're still using excel) - so why would you store the results of your machine learning model alongside its definition?
This is the fundamental problem with both excel and notebooks: it's the root cause of many of the issues described above and the reason most experienced developers eschew both.
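To make the "everything in one file" point concrete, here is a minimal sketch of what a single code cell looks like in the .ipynb JSON format (nbformat 4), followed by the separation you have to do by hand to get the logic back out. The cell contents are invented for illustration.

```python
import json

# A minimal notebook with one code cell, laid out as in the nbformat 4 JSON schema.
# The cell's source code and its captured output sit side by side in the same object.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(6 * 7)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# This single string is what ends up on disk (and in version control).
serialized = json.dumps(notebook, indent=1)

# Pulling the logic and the results apart again has to be done by hand.
logic = []
results = []
for cell in json.loads(serialized)["cells"]:
    if cell["cell_type"] != "code":
        continue
    logic.append("".join(cell["source"]))
    for output in cell["outputs"]:
        if output["output_type"] == "stream":
            results.append("".join(output["text"]))

print("logic:", logic)
print("results:", results)
```

Re-running the cell changes the outputs and execution_count fields even when the source is untouched, so the file changes even though the logic has not.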
(Here my MVP is called "notbook", but if anyone actually wants to use it, it should be renamed to avoid confusion with Jupyter Notebooks.)
A program that executes a python script, and renders an HTML document with:
That document can be built either using:
notbook build my-logic.py - where the HTML document is built once and the process exits; if execution raises an exception, no document is built and the process exits with code 1.
To view the document generated with notbook build demo-script.py, see samuelcolvin.github.io/notbook/.
notbook watch my-logic.py - where the file is watched and a web-server is started showing the document; when the file changes the HTML document is updated and the page automatically updates, giving almost instant feedback.
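The build behaviour described above (document only on success, exit code 1 on an exception) can be sketched in a few lines. This is not notbook's actual implementation: the build function, the HTML layout and the use of runpy are all assumptions made for illustration.

```python
import contextlib
import html
import io
import runpy
from pathlib import Path


def build(script_path: Path, output_path: Path) -> int:
    """Run the script and write an HTML document only if it succeeds.

    Returns the exit code a CLI wrapper could pass to sys.exit().
    Hypothetical sketch, not notbook's real code.
    """
    captured = io.StringIO()
    try:
        # Execute the script in a fresh namespace, capturing anything it prints.
        with contextlib.redirect_stdout(captured):
            runpy.run_path(str(script_path), run_name="__main__")
    except Exception:
        # Execution failed: no document is written and the caller exits with 1.
        return 1

    source = script_path.read_text()
    document = (
        "<html><body>"
        f"<pre>{html.escape(source)}</pre>"
        f"<pre>{html.escape(captured.getvalue())}</pre>"
        "</body></html>"
    )
    output_path.write_text(document)
    return 0
```

Because each build starts from a fresh namespace, there is no leftover state between runs, and the output file is always derived from the script rather than stored inside it.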
Watch mode in action:
The python script(s) containing all logic:
python
CLI
This might not sound like much (it's basically just another static site generator which works on python files, not markdown etc.), but I think it could dramatically improve the workflow for data scientists and anyone python-literate currently using notebooks or excel.
There's much more this could do:
devtools.debug is used to ... str, int, float), this should be replaced with an interactive tree-view