Why does it matter? If you want to read the Linux scheduling code, no one forces you to read the 5M+ lines of driver code. If Linux were just a microkernel and distros shipped the drivers, you'd still have to read roughly the same number of lines to understand Linux scheduling. And if you want to read the entire Linux codebase, you still can; just skip the driver code if it doesn't interest you. Otherwise, in your scenario, you'd be reading the driver code shipped by the distros as well.
We literally read basically all of the Linux scheduler/syscall code in the first couple of weeks of my computer engineering classes; the rest of the course was reimplementing a subset of it as a custom (RT)OS on some Cortex-M3 microcontroller dev boards.
The first week or so of the process was learning how to go from a fresh Debian machine (with the expectation that we'd only used Windows or Macs before), through "git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git" (not on GitHub at the time), to finding the function called by "man 2 fork" and navigating the source tree, unmasking the preprocessor stuff to reveal the actual implementation. Yes, you can easily take a wrong turn if you're trying to do that on your own. But the actual linux/kernel directory, which has most of the parts you'd want, isn't that much larger, and a lot of the difference is modern requirements like power saving and security.
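To make the "man 2 fork" detective work concrete: syscalls in the kernel aren't plain functions named `sys_fork()`, they're generated by the `SYSCALL_DEFINEn()` macros, which is exactly the preprocessor unmasking step. A sketch of the grep you'd run, using a small stand-in excerpt rather than the full tree (in a real checkout, the definition lives in kernel/fork.c and the excerpt below only approximates its shape):

```shell
# In a real tree you would run:
#   grep -rn 'SYSCALL_DEFINE0(fork)' kernel/
# Here, a stand-in file approximating the relevant lines of kernel/fork.c,
# where the macro expands to the syscall entry point that (in current trees)
# wraps kernel_clone():
cat > fork_excerpt.c <<'EOF'
SYSCALL_DEFINE0(fork)
{
	return kernel_clone(&args);
}
EOF
grep -n 'SYSCALL_DEFINE0' fork_excerpt.c   # finds the definition line
```

Grepping for the macro instead of a function name is the trick; `man 2 fork` gives you the userspace view, and `SYSCALL_DEFINE` is how you cross into the kernel side.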
> Why can't distros be a package of microkernel + drivers?
Performance. Basically, the sheer number of context switches required (since all drivers run in user mode) hurts performance and cache use compared to a monolithic kernel. Whether this is still a significant issue with modern designs, I don't know, but that was the argument back in the day (i.e. the early 90s, when Linux came about).
Because a monolith is easier to get working on a quick timeline, and it will generally outperform code that makes compromises for the sake of its developers' sanity. At least as long as you can keep the insane developers from eating each other's eyes.
I feel obligated to point out that there is a notable lack of eyeless developers in the Linux kernel developer community, even after many, many years of development.
It's not even guaranteed to _be_ of high quality, is the thing. The same arguments play out all over CS: monolith vs. microservices, distributed vs. centralized. Pick your poison and do it well; there's no 100% argument that says one side is absolutely better than the other here.
The other comments here have already mentioned the drawbacks of a microkernel, but I do wish Linux were more modular --- for example, like Windows, where drivers are separate loadable modules and Microsoft sure doesn't maintain the majority of them either. The Linux/Unix approach of having one huge kernel binary just doesn't seem all that efficient, especially when it contains drivers that will never be used.
The Linux kernel does use loadable modules, and indeed that is how most drivers are used. My relatively boring laptop setup is using 219 modules at the moment.
Most drivers in Linux are loadable modules; see the output of `lsmod`.
They are usually built at the same time as the kernel, yes, due to the lack of in-kernel ABI stability guarantees, but modules for drivers you don't use won't take up any RAM.
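A quick way to see this on a running system, as a sketch: `lsmod` is essentially a pretty-printer over `/proc/modules`, so you can count and inspect loaded modules even without the kmod tools installed.

```shell
# /proc/modules lists every loaded module; lsmod formats the same data.
if [ -r /proc/modules ]; then
    echo "loaded modules: $(wc -l < /proc/modules)"
    # Each line: name, memory size, refcount, dependent modules, state, address
    head -n 3 /proc/modules
else
    echo "no /proc/modules here (non-Linux, or kernel built with CONFIG_MODULES=n)"
fi
```

Comparing the memory sizes in that output against the size of a kernel built with everything compiled in is a decent way to convince yourself that unused drivers really do stay out of RAM.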