Hacker News

Zen and the Art of Machine Learning Research

191 points by jxmorris12 last Tuesday at 12:45 AM | 63 comments

Comments

jdw64 today at 7:30 AM

I feel that the Zen used in the West and the Zen in East Asia are quite different. I think the Western Zen is probably the one from the 1970s book Zen and the Art of Motorcycle Maintenance. It usually carries a sense of equanimity and beginner's mind. But in East Asia, Zen actually emphasizes aimlessness or non‑purposefulness.

The point where I really feel the difference is that Western Zen seems to be about how to train the self to become stronger, whereas actual Seon (Zen) in East Asia is about going with nature, letting go of the self, and allowing things to flow. In the actual practice of Seon, it's about doubting the self, letting go of attachments, and realizing that achievement, comparison, and the desire for control are all just fleeting. There's a famous phrase, 'Banghachak (放下著)': let it all go.

If anything, I think ancient Roman Stoicism feels more like Zen than Western Zen does.

So that's fascinating. When I saw this article, I was expecting it to be about whether we should give up the desire for success, but instead it took a completely different direction, which was surprising.

rented_mule today at 7:58 AM

Around 2015, I found myself managing back end and machine learning engineers (not researchers) at the same time. Many of the back end engineers wanted to do more ML. Some of them did well when given a chance, but others wanted to revert to back end within a few months. At the same time, one of the ML leaders wanted to step away from ML and only do back end work to support ML.

As I studied these dynamics, something occurred to me... Different people need to see signs of success at different frequencies. Because of the nature of our product, measuring the performance of a new/updated model required the model to be live for at least a full calendar month. So, between initial work and final analysis, it was often a 2-month wait or more. For many back end tasks, you can build a quick prototype, run it to see if it works, and be on your way - the signals come all day long. The varying frequency needs of different people went a long way toward determining which of them liked working on ML.

This is sort of a manager's version of feature engineering. ;-) The people on that team taught me a lot!

mrmarket today at 11:06 AM

excellent essay. what a great read.

like the author said, so much of 'success' or 'progress' (in research but of course also across disciplines) depends upon temperament. just straight up having a good attitude about things. the skills that make a good researcher could not be more transferable: patience, innate curiosity, and a resilience against failure.

that said, these skills are increasingly rare/at a premium given our culture of minimizing discomfort tolerance via hyperconvenience. people have a harder and harder time waiting or failing.

ms_by_pd today at 5:56 PM

Good learning

Scene_Cast2 today at 10:46 AM

I think this also stems from ML being more like biology or alchemy and less like math or programming (where you can get down to the first principles, abstractions are rock solid, and non-determinism is limited in scope).

HarHarVeryFunny today at 1:27 PM

> on days we find insight, we sit.

> on days we do not find insight, we sit.

This reminds me of Ed Witten (greatest living physicist?) in an interview by Brian Greene. Greene asked Witten what his day-to-day was like at the Institute for Advanced Study ...

Witten's reply: "I sit at my desk".

almarcher today at 12:37 PM

Stepping away from the work is necessary: it lets you find inspiration and gives the subconscious time to process everything and present ideas to your conscious mind. I try to pick a wild or almost outlandish idea from time to time, because if I only try what I think will work, then I'm not doing my job.

aputsiak today at 2:38 PM

You can in fact take courses taught by the greatest in the field. The one does not exclude the other.

sdfsefsdf today at 7:37 AM

Perhaps I've been deep in my own issues for too long, but it seems to me that the author is trying to say "don't trust the current evaluation suites too much"; scores only reflect a small part of the problem. What's interesting is discovering a new, stable evaluation metric, doing something new based on it, and having that new thing yield some unexpected intelligent results.

WithinReason today at 9:27 AM

> If you want to solve a problem, the tried-and-true path to success is to attempt a solution, try it, reach a bottleneck, try to solve it, and only reach for literature when you’ve run out of ideas yourself.

I've found this to be the right balance between using your creativity and getting stuck too long.

jessinra98 today at 1:53 PM

Would either of you have a recommendation on where to start learning about either?

lostdog today at 6:41 AM

I have some coworkers that are similar in everything--education, work ethic, and intelligence--but some of them tick out ML ideas that work like clockwork, while others get hits rarely if ever. I cannot tell what makes it work for some and not others. Their ideas both sound equally good.

Sometimes a coworker will be an ML star for a year or two, but then suddenly run out of steam. It's brutal to watch.

I used to think most smart people had similar distributions of good ideas, and it was just that the hardest working tried out all 50 of their ideas to pick out the 2 good ones. But I've seen smart and hardworking people have a hit rate of 0.

misiti3780 today at 5:24 PM

why is SVD so important? i know it's important in general ML but seems minor for LLMs (LoRA?)

stared today at 8:30 AM

It revolves around the sentiment of "go deeper" - but I think it is a double-edged sword. Sure, entropy, tensors and gradients are important - and yes, they are pretty much requirements.

But from what I see, it is the opposite - a lot of (if not virtually all) progress in the last decade of deep learning was not because of a fundamental idea, but incremental, experimentally-verified practice. Even though I think there is good intuition for why ReLU is better than sigmoid (tl;dr: last layer is log(sigmoid) ~ ReLU, putting anything different inside kills the gradient), the original paper by Hinton himself was more or less "because it trains 3x faster".
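
One way to read the "log(sigmoid) ~ ReLU" remark is through the identity -log(sigmoid(-x)) = log(1 + e^x), i.e. softplus, which is a smooth approximation of ReLU. The sketch below is an illustration under that assumption, not the commenter's own code; the helper names (`sigmoid`, `softplus`, `relu`, `sigmoid_grad`, `relu_grad`) are made up for this example. It also compares the two derivatives, since the sigmoid's derivative shrinks toward zero for large inputs while ReLU's stays at 1 for positive inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """log(1 + e^x), computed in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x: float) -> float:
    return max(x, 0.0)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taking 0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        # -log(sigmoid(-x)) should match softplus(x) up to rounding.
        identity_lhs = -math.log(sigmoid(-x))
        print(
            f"x={x:6.1f}  -log(sigmoid(-x))={identity_lhs:.6f}  "
            f"softplus={softplus(x):.6f}  relu={relu(x):.6f}  "
            f"sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}"
        )
```

Running it shows the gap between softplus and ReLU shrinking as |x| grows, and the sigmoid derivative never exceeding 0.25.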

Re-thinking fundamentals might help, but "let's change the fundamentals" is rarely how progress happens. Even the most seminal papers, e.g. AlexNet and "Attention Is All You Need", are refinements of existing ideas, and show how they help.

Machine learning is an experimental science. Many mathematically cool ideas do not work. Many engineering ones do.

> I've tweeted before that one of the most important traits in a researcher is healthy paranoia. Be paranoid!

I have seen so many PhDs burned out to cinders; I don't think it is any better a piece of advice than "depression is good for philosophers". Sure, be a relentless explorer.

> In short, holding on to ideas for too long can actually be counterproductive. Stay open-minded and refuse to let ego cloud your judgement.

Which I think is true.

nathaah3 today at 7:47 AM

This is gold!!!!
