Lessons learned from 15 years of SumatraPDF, an open source Windows app
The app
SumatraPDF is a multi-format (PDF, ePub, Mobi, comic book, DjVu, XPS, CHM) viewer for Windows and currently looks like this:
The code
SumatraPDF is an open-source document reader for Windows. It started as a PDF reader, hence the name. Over time I’ve added support for e-book formats (epub, mobi), comic books (cbz, cbr), DjVu, XPS, image formats etc.
It is about 127k lines of C++ (not counting libraries written by others).
It is written against the Win32 API, not using GUI abstraction libraries like Qt. This contributes to making it as small and fast as possible.
Nearly all of it was written by 2 people, with occasional contributions from others.
The amount of code written is actually larger. It’s the nature of long-running code bases that the code gets written and re-written. We delete, add, change.
It is a side project, done after hours, not a full-time effort. What does the daily grind of working on an app look like?
It looks like this:
Why I created SumatraPDF
SumatraPDF is what I call an accidental success.
I never wanted to write a PDF reader for Windows.
At the time I did not know that PDF is popular but Palm management did, which is why they decided that a PDF reader is a must-have application. I ended up being the (sole) dev on the project.
My job was to write a basic PDF viewer that used Poppler to render PDF pages into a bitmap in memory and blit those bitmaps on screen.
PDF is a complex format and rendering of some PDFs is slow. I wanted to improve the speed because Jeff Bezos told me that speed is something that customers will always care about.
Accidental app
The way to improve speed is to profile the code and look at the result.
Unfortunately, the toolchain for unreleased ARM hardware wasn’t great. Forget about a profiler, kid, be grateful you have a C++ compiler and do not have to enter assembly by typing hex, like Steve Wozniak.
Windows had decent profilers, so I compiled Poppler for Windows.
Once I had the library working on Windows, I wrote the simplest GUI app that would show the pages and allow navigating between pages.
What do you know: I had a simple PDF reader for Windows.
I released it on my website. It could not do much so I tagged it as version 0.1.
If you’re not embarrassed by your app then you’ve waited too long to release it
I did not come up with this nugget of wisdom but I agree with it.
Getting early users and learning what features they want the most beats toiling for months or years and implementing lots of features before you know anyone even cares.
Profiling, performance optimization and contributing to open source
Back to profiling: my plan worked.
I profiled the documents that took the longest to render and made a few surprisingly simple and surprisingly effective optimizations.
If memory serves, 2 optimizations had the biggest effect:
- optimizing the string class to use what’s known as “small string optimization” i.e. adding a small buffer inside the string class to hold small strings inline (as opposed to always allocating memory for the string). Strings were used frequently and most of them were small
- fixing byte-at-a-time I/O by changing it to bulk reads. The way the code was structured, in some code paths it would do a virtual C++ call and a call to the C read() function for each byte. These are extremely cheap but not when you do it 5 million times
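To make the first optimization concrete, here is a minimal sketch of a string class with an inline buffer. It is not Poppler's actual string class: the name `SmallStr`, the 32-byte capacity and the `isInline()` helper are assumptions made up for illustration, and copying is disabled to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical SmallStr: strings shorter than kInlineCap live inside the
// object itself; longer ones fall back to a heap allocation.
class SmallStr {
  public:
    static constexpr std::size_t kInlineCap = 32; // assumed size, tune by profiling

    explicit SmallStr(const char* s) {
        assert(s != nullptr);
        len_ = std::strlen(s);
        if (len_ < kInlineCap) {
            data_ = inline_; // small string: no allocation at all
        } else {
            data_ = static_cast<char*>(std::malloc(len_ + 1));
            assert(data_ != nullptr);
        }
        std::memcpy(data_, s, len_ + 1); // also copies the terminating zero
    }

    ~SmallStr() {
        if (data_ != inline_) {
            std::free(data_); // only heap-backed strings own memory
        }
    }

    // Copying is disabled to keep the sketch short.
    SmallStr(const SmallStr&) = delete;
    SmallStr& operator=(const SmallStr&) = delete;

    const char* c_str() const { return data_; }
    std::size_t size() const { return len_; }
    bool isInline() const { return data_ == inline_; }

  private:
    char* data_ = nullptr;
    std::size_t len_ = 0;
    char inline_[kInlineCap];
};
```

Constructing `SmallStr("hello")` never touches the allocator, while a string of 32 or more characters takes the `malloc` path. The trade-off is a larger object in exchange for fewer allocations, which pays off when most strings are short.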
As a good boy I did submit my changes to Poppler.
As is my experience with contributing to open source projects, it was more of a miss than a hit.
Yes, I got 13 commits in but the project wasn’t very active and the maintainers weren’t eager to accept anything beyond small changes. Forget any major refactors.
I am not one to voluntarily bash my head against the wall so I stopped trying.
(As you can see, I am a fantastic team player).
Code quality
I want it and you should want it too.
How do you maintain high code quality while working mostly solo, with no one doing code reviews and no dedicated QA team?
Here is how:
- test the code yourself. Step through newly added code in the debugger, verify the newly added functionality works as expected and in general use the app a lot
- automated crash reporting. Unfortunately it is a pain to build but it is the single most important thing you can do to improve the quality of your software. In short: set up exception handlers to catch crashes in the app, in the crash handler download symbols from the server to get a readable callstack, create a crash report that includes callstacks of all threads, program and OS info and logs, and submit that to a server. On the server, process those files and generate web pages for easy viewing of the crashes. Like I said: it is a pain to build. Once you have crashes, look at them occasionally, try to figure out what went wrong and fix it
- assert(). Asserts are a well established practice in C++ code: additional code only executed in debug builds that verifies some conditions are true. If they are not, something went wrong and you should investigate. I wrote my own assert-like function which I enable in non-debug pre-release builds so that I automatically get bug reports from people hitting those conditions. Trust me: there is no amount of testing you can do yourself that can match all the different things that a thousand people will do just by using the app.
- logging. When investigating issues it helps to know what sequence of events led to a crash. My tiny logging module logs to a block of memory. That gets sent along with the crash report. I also have an option to log to a file and I’ve recently added logging to a separate logging app via a named pipe. This is nice because most of the time I do not care about the logs but when I do, I do not want to restart the app to enable logging. With a separate logging app, SumatraPDF is logging all the time and when it detects that the logging app is running, it will also log to it. Implementation was trivial: the logging app creates a named pipe, the logger opens the pipe (like a file) and if the open succeeds, it means the logging app is running and will read the logs we write to the pipe
- static code analysis: max level of warnings in the C++ compiler, make warnings into errors, Visual Studio’s `/analyze` option, cppcheck, clang-tidy, GitHub’s CodeQL. Run those occasionally and fix the errors and warnings
- ASAN (Address Sanitizer) is fantastic. It was added in some point release of Visual Studio 2019. At a very small performance cost it can detect if you over-write memory or use memory after it has been freed. I have a configuration with ASAN enabled. It is fast enough to be used as a regular build.
- stress testing. Sumatra’s job is mostly to render complex document formats. There are often crashes on specific files due to the complexity of the formats. To ensure lack of crashes I wrote stress test code that reads and renders all files in a directory. I typically run it before a release on a large collection of test files I amassed over the years
- unit testing. I do not have a lot of them, they are mostly for testing edge cases for low-level functionality like string formatting. They occasionally find bugs.
- memory leaks. It is surprisingly hard to find an easy to use memory leak detection tool. I am working on a very simple built-in leak detector. In the meantime I am using Dr. Memory. It works but it is super slow.
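The assert and logging items above can be combined in a small sketch: a check that stays active in release builds and records failures in an in-memory log that a crash reporter could later attach. This is not SumatraPDF's actual code. The names `REPORT_IF_NOT`, `ReportFailedCheck`, `LogLine`, `gLog` and `gFailedChecks` are hypothetical, and the log is a plain `std::string` rather than a fixed block of memory.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical in-memory log; a crash handler could attach gLog to a report.
static std::string gLog;
static int gFailedChecks = 0;

inline void LogLine(const std::string& line) {
    gLog += line;
    gLog += '\n';
}

// Called when a checked condition fails: records it and keeps running.
inline void ReportFailedCheck(const char* cond, const char* file, int line) {
    assert(cond != nullptr && file != nullptr);
    char buf[512];
    std::snprintf(buf, sizeof(buf), "check failed: %s at %s:%d", cond, file, line);
    LogLine(buf);
    ++gFailedChecks;
    // A real app might submit a bug report here (assumption, not shown).
}

// Unlike the standard assert(), this macro is not compiled out when NDEBUG
// is defined, so it also fires in pre-release builds.
#define REPORT_IF_NOT(cond)                                  \
    do {                                                     \
        if (!(cond)) {                                       \
            ReportFailedCheck(#cond, __FILE__, __LINE__);    \
        }                                                    \
    } while (0)
```

A call such as `REPORT_IF_NOT(pageCount > 0);` costs one comparison when the condition holds. When it fails, the failed expression and its location are written to the log instead of aborting the program.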
Frequent releases
When you do not have many features, improving the app is fast and easy. It does not take much effort to implement a “Go to” dialog (implemented in v 0.2).
On one hand I do not want to release too often but I also do want the users to get new features as quickly as possible.
My policy for new releases is: release when there is at least one notable, user-visible improvement.
Web apps take it to the extreme (some companies deploy to production multiple times a day).
In desktop software it is a bit more involved and I had to build functionality to make it easy i.e. add a check for new releases, write an installer that can update the program.
BTW: I mean “frequent in proportion to amount of new code written”. SumatraPDF releases are not frequent in absolute terms but frequent if you consider that it is a part-time, after hours project.
Treat open source projects like commercial software
The majority of open source projects probably do not fall into this category, but if you want your open source project to be as successful as possible, act as if it was a commercial product from a software company.
What does it mean in practice?
From day one I created a website for the app. It had screenshots, it had documentation, it was easy to download and install. Granted, a kind soul on Reddit called it “a website made by a 6-year old”. The lesson here is two-fold:
- ignore haters and assholes
- a website built by a 6-year old is better than no website. It does not have to be pretty, it has to be functional
I did basic SEO. Nothing beyond Google’s “SEO 101” docs: just pay attention to URLs, put the right meta-data, use the right keywords.
I had a forum for users to ask questions, submit feature requests and occasionally help each other.
I made the installation process as easy as possible.
Everything that’s a good idea for promoting commercial software is also a good idea for an open source project.
Switching the engine while the car is running
Changing the app to use a completely different library is not something you can do in a day.
It is demoralizing to work for a long time on code that does not even compile.
To keep things compiling while also working towards supporting an alternative rendering engine I developed an abstraction for the rendering engine.
The engine would provide the functionality the UI needed: getting the number of pages in the document, the size of each page (to calculate layout), rendering a page as a bitmap etc.
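An abstraction of this kind might look roughly like the sketch below. It is not SumatraPDF's real interface. `Engine`, `PageSize`, `Bitmap`, `SizeOfPage`, `RenderPage`, the stand-in `BlankEngine` backend and the `TotalHeight` helper are all names invented here to show the shape of the idea: UI code talks to an interface and a backend can be swapped underneath it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PageSize {
    float dx;
    float dy;
};

struct Bitmap {
    int dx = 0;
    int dy = 0;
    std::vector<std::uint32_t> pixels; // one 32-bit value per pixel
};

// What the UI needs from any rendering backend. Page numbers are 1-based.
class Engine {
  public:
    virtual ~Engine() = default;
    virtual int PageCount() const = 0;
    virtual PageSize SizeOfPage(int pageNo) const = 0;
    virtual Bitmap RenderPage(int pageNo, float zoom) const = 0;
};

// Stand-in backend that "renders" blank white pages of a fixed size.
// A real backend would wrap a library such as Poppler or mupdf.
class BlankEngine : public Engine {
  public:
    BlankEngine(int nPages, PageSize size) : nPages_(nPages), size_(size) {}

    int PageCount() const override { return nPages_; }

    PageSize SizeOfPage(int pageNo) const override {
        assert(pageNo >= 1 && pageNo <= nPages_);
        return size_;
    }

    Bitmap RenderPage(int pageNo, float zoom) const override {
        PageSize s = SizeOfPage(pageNo);
        Bitmap bmp;
        bmp.dx = static_cast<int>(s.dx * zoom);
        bmp.dy = static_cast<int>(s.dy * zoom);
        bmp.pixels.assign(static_cast<std::size_t>(bmp.dx) * bmp.dy, 0xFFFFFFFFu);
        return bmp;
    }

  private:
    int nPages_;
    PageSize size_;
};

// UI-side layout code depends only on the Engine interface.
float TotalHeight(const Engine& engine) {
    float total = 0;
    for (int i = 1; i <= engine.PageCount(); i++) {
        total += engine.SizeOfPage(i).dy;
    }
    return total;
}
```

Because `TotalHeight` only sees `Engine`, a second backend can be added and tested side by side with the first while the rest of the program keeps compiling.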
I am much less enthusiastic about abstractions than most programmers (at least those who like to opine on Hacker News) but in this case it served me well.
I was able to incrementally convert the program from using the Poppler API to using Poppler via the engine abstraction to using mupdf via the engine abstraction.
For a while I supported both engines at the same time but eventually I switched to just mupdf, to keep the app small.
This opened the door for supporting other formats via the same abstraction.
Simplicity vs. customizability
Simplicity sells.
I learned that from the history of Mozilla Firefox.
Before Firefox there was Netscape Navigator. It was a beast of an app, combining a web browser with an e-mail client.
Netscape could not help themselves and kept adding features upon features, leading to a very complex UI.
A small group of renegades within Mozilla forked the code and focused on a simple UI.
Simple Firefox was much more popular than the complex Navigator and eventually ate it completely.
From the beginning my goal was to keep the UI of SumatraPDF as simple as possible. An 80/20 app: 80% of functionality with 20% of the UI.
This requires resolve. I constantly get requests to add more icons to the toolbar and I constantly have to say “no” because adding 2 more icons to the toolbar to satisfy 10% of users makes the app slightly worse for 100% of the users.
Another trap is the siren song of more settings. Often people suggest that instead of doing X, the program should do Y. Not willing to remove X, they suggest adding a new UI setting “[ ] Do Y instead of X”.
Having a settings dialog with 100 settings is not a good solution. It makes the app worse for everyone by overwhelming them with choices and hiding important options in a sea of unimportant options.
Not to mention that every conditional behavior requires more code, more potential bugs and more testing.
That being said, I also believe customizability is important. I believe that a big reason for Winamp being such a dominant music player (at the time) was its ability to skin the whole UI.
Some advanced features might only be used by 20% of users but those users are most likely power users who will evangelize the app more than the other 80% of the users.
My solution to UI simplicity vs. customizability: an advanced settings file.
I did not bother to write UI for changing those advanced settings. I just launch notepad.exe with the file. When the user changes the settings and saves the file, I reload it and apply the changes.
Be water, my friend
Change is the only constant. We must adapt to the changes in the world.
I cannot believe how many popular projects still use craptastic Sourceforge for source repository or mailing list.
Actually, I can believe it: changing things takes effort and the path of least resistance is to do nothing.
I began with Sourceforge, switched to code.google.com after which to github.com.
I switched forum software three times.
I’ve added a browser plugin and then removed it when browsers stopped supporting such plugins.
I changed the format for storing preferences from binary to human-readable text.
Windows XP went from being the OS used by the majority of users to not being supported (long after Microsoft stopped supporting it).
At first I only had a 32-bit build and now I have both but emphasize 64-bit builds.
Think outside of the box
Thinking outside of the box is hard because the box is invisible.
SumatraPDF wasn’t the first PDF reader application ever written.
But most PDF readers don’t become multi-format readers.
In hindsight it is an obvious idea to support as many document formats as possible but it took me 5 years to realize it.
Most readers are still single format and I do believe being multi-format helped SumatraPDF become popular.
I cannot say it’s a very unique idea. There were multi-format image viewers long before SumatraPDF and I probably was inspired by them.
Small and fast – pick both
By today’s standards SumatraPDF is tiny (installer smaller than 10 MB) and starts up instantly.
I believe being small and seemingly fast was a big reason for adoption.
This comes back to Jeff Bezos’ wisdom: there will never be a time when users want bloated and slow apps so being small and fast is a permanent advantage.
How do I keep SumatraPDF small?
I avoid unnecessary abstractions. Windows’ system of controls is a huge pain in the ass to program against. I could use wrappers like Qt, WxWidgets or Gtk. They are easier to use but cause instant, massive bloat.
I am not afraid to write my own implementation of things. I have my own JSON, HTML / XML parsers which are a fraction of the size of the popular libraries for those tasks.
I aggressively take advantage of the rich functionality included in Windows.
For example, I need to do a network request. I could include a monster library like curl or I could write 300 lines of code using win32 APIs. I wrote 300 lines of code.
A lack of bloat is hard to notice because it is not there.
My pet peeve is over-using XML for storing data.
When I worked at Palm I was at a design meeting for an auto-update system for a phone. Part of it was storing information about the current version in the image, downloading information about the latest version and comparing them.
The developer decided to use XML for storing that information. That seemed like a lot of bloat for storing simple information like a version number. A compliant XML parser alone is a lot of code. Surely a simple binary format would be easier to implement, I suggested and was ignored.
If you do not have the power to fire someone, your ideas will be ignored.
(As you can see, I am a great team player.)
For storing advanced settings I designed and implemented a file format that is smaller than XML, readable and writeable by humans and can be implemented in a few hundred lines of code. It is as powerful as JSON and much more readable.
It is so simple that after implementing it I had the time to implement a serialization system for C++ objects and a Go code generator. To add more settings I do not have to write more C++ code. I just add a data definition to the Go generator, re-run it and get data-driven C++ parsing auto-generated.
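To illustrate what “data-driven parsing” can mean, here is a minimal sketch that fills a struct from a table of field descriptors. It is not SumatraPDF's real settings format or its generated code. The flat `Key = Value` syntax with `#` comments, the `Settings` fields and the `kFields` table are all assumptions invented for this example. In a generator-based setup, the table is the part a tool would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical settings struct; the field names are made up for illustration.
struct Settings {
    bool showToolbar = true;
    int zoomPercent = 100;
    std::string uiLanguage = "en";
};

enum class FieldType { Bool, Int, String };

// One descriptor per setting. Only the member pointer matching `type` is set.
struct FieldInfo {
    const char* name;
    FieldType type;
    bool Settings::*b;
    int Settings::*i;
    std::string Settings::*s;
};

// The table a code generator could emit from a data definition.
static const FieldInfo kFields[] = {
    {"ShowToolbar", FieldType::Bool, &Settings::showToolbar, nullptr, nullptr},
    {"ZoomPercent", FieldType::Int, nullptr, &Settings::zoomPercent, nullptr},
    {"UiLanguage", FieldType::String, nullptr, nullptr, &Settings::uiLanguage},
};

static std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Generic parser: it knows nothing about individual settings, only the table.
// Unknown keys and malformed lines are skipped; missing keys keep defaults.
Settings ParseSettings(const std::string& text) {
    Settings out;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = Trim(line);
        if (line.empty() || line[0] == '#') continue;
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::string key = Trim(line.substr(0, eq));
        std::string val = Trim(line.substr(eq + 1));
        for (const FieldInfo& f : kFields) {
            if (key != f.name) continue;
            switch (f.type) {
                case FieldType::Bool:   out.*(f.b) = (val == "true"); break;
                case FieldType::Int:    out.*(f.i) = std::atoi(val.c_str()); break;
                case FieldType::String: out.*(f.s) = val; break;
            }
        }
    }
    return out;
}
```

Adding a setting in this sketch means adding one struct field and one table row. `ParseSettings` itself does not change, which is the property that makes generating the table from a definition attractive.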
It is my project and I act like it
When someone pays you to write code you have to do it the way they like it.
A big attraction of working on code you are not paid for is that there is no one who can tell you what to do or how to do it.
My code would not pass a code review at Google and not because it is bad but because it is often unorthodox. Outside of accepted dogma.
(As you can see, I am a great team player.)
I always used SumatraPDF as my playground for testing crazy ideas.
Lower the code size by not using STL? That is crazy but I did it. Granted, in 2006 STL wasn’t great.
I learned about how Plan 9 C code had a non-traditional scheme of #include files where they do not put #ifdef wrappers in each .h file to allow multiple inclusion and .h files do not include other .h files. This means that .c files have to include every .h file they need and in the right order. It is a bit of a pain and no other modern C++ codebase I know of maintains such discipline.
But it is my project so I did it and I keep doing it. It prevents circular dependencies between .h files and does not inflate C++ build times by carelessly including the same files over and over.
I implemented a CSS-inspired UI system. Not great, but mine. And I plan to replace it with a different one.
Because I can.
Because no one can tell me not to.
Cross-platform is over-rated
Supporting other platforms (Linux, Mac, Android) is one of the most frequent requests. A request that I have to decline.
First, there is a pragmatic reason: I just do not have the bandwidth to write code for 3 platforms.
Second, I believe a great app for one platform can become more popular than a mediocre app for 3 platforms.
Coming back to the first reason: I do not have the bandwidth to write 3 excellent apps. Part of the reason SumatraPDF is small is my use of win32 APIs for the UI.
The only way for one person to even attempt a cross-platform app is to use a UI abstraction layer like Qt, WxWidgets or Gtk.
The issue is that Gtk is ugly, Qt is extraordinarily bloated and WxWidgets barely works.
Tests are not necessary, neither are code reviews
I am not saying tests are bad or that you should not write tests or do code reviews.
I am saying that they are not necessary.
Dogma is powerful. Often in my corporate life I felt like writing tests was just going through the motions. Maybe we should spend more time writing code instead, I thought?
But try to make a nuanced point about more tests vs. more code to your fellow developers and you will be burned at the stake and your smoldering carcass will be thrown to wild dogs. Village children will use your severed head to play soccer.
(As you can see, I am a great team player.)
And yet I know that you can write complex, relatively bug free code without tests, because I did it.
I know that you can write complex, relatively bug free code without anyone looking over your code, because I did it.
If no one uses your app then who cares if it crashes.
If many people use your app and it crashes, they’ll tell you and then you’ll fix it.
Overnight success takes a decade
SumatraPDF is relatively popular. Not Facebook popular or DOOM popular, but more popular than most apps. A respectable level of popular.
It began with v 0.1 and a trickle of downloads. It remained a trickle for a lot of, many months.
I am not sure there is a lesson here.
Success often takes a long time.
Unfortunately, at that stage it is indistinguishable from (eventual) failure so this wisdom does not help you if you are working on a not-yet-successful project and debating if you should continue or abandon it.
The money
Open source is not a good business model.
If you want to make money do literally anything else: try to sell software, do consulting, build a SaaS and charge monthly for it, rob a bank.
I did experiment with making money and made some.
There was a time AdSense would pay decent CPM so I put AdSense ads on the website and it made some money. I no longer do because the rates plummeted and it is not worth annoying people. My soul has a price and AdSense cannot afford it.
Now I am experimenting with Patreon and PayPal donations. It makes more than $100 a month but not much more than that.
Like I said: do not start an open source project with the intent to make money.
Rarely can you have both freedom to do whatever you want and good pay, so pick what is more important to you. Open source gives you freedom but not money.
On to the future
I need to continue being like water.
For years I resisted adding editing features. “It is just a reader” I said. But why not add editing? If people want it, give it to them.
The future of all software is as a web app. Why not bring the spirit of SumatraPDF to the web?
These are just a few ideas I have today.
Being like water means that in 5 years I will have other ideas, informed by what is happening at the time.