dw_dev | RFC: Config Yak Shaving / Bikeshedding

Summary

The present config system is ... non-ideal. It should be better.

Problems / Pain Points

The existing config system is all-or-nothing: if you want to tweak one thing in config.pl—$USE_ACCT_CODES or @SCHEMES, say—you have to copy the entire thing, and will no longer get tweaks made to the base config.

As you might imagine, this makes upgrading painful. "Oh, they added an option and ... now it's all broken because I don't set it."

The existing config system also fails to compose. While it loads three config files—config-private.pl, config-local.pl, and config.pl—it only loads one of each. If you clone, say, dw-nonfree into ext/, it will load config-local.pl from that. Unless you already have your own config-local.pl in ext/local, in which case it won't use nonfree's at all. So even though you only wanted to make one or two changes over the baseline, now you're stuck merging all three config-local.pl files manually.

And just forget about adding bobs-awesome-dw-plugin. I don't know if anything beyond dw-nonfree exists, but hey, maybe at some point.

It also brings up the question of what config variable goes into which file? @SCHEMES and @LANGS are pretty darned site-specific, but they're in config.pl. $HOME is set to the LJHOME env variable in config-local.pl, but when are you ever going to change that? (In fact, things are likely to break if you ever did!)

Good Things

One of the nice things about the existing config system is that it is pure Perl, bringing with it all the flexibility that provides. (Though some might argue that a Turing-complete configuration file is also a drawback.)

Proposal

Summary

Let's move to a Debian-style conf-available/conf-enabled split system, where all config files are loaded, and then merged.

Technical Details

Directory Structure

Similar to the existing structure, except the provided config files would be placed into directories called something like "conf-available" or "conf.d". Config files would be loaded, in lexicographical order, from a single "conf-enabled" directory, which is populated with symlinks to the actual config files.

etc/conf-available/
- private.example.pl
- local.example.pl
- defaults.pl
- down-for-maintenance.pl
- schemes.pl
ext/local/conf-available/
- private.pl
- local.pl
ext/dw-nonfree/conf-available/
- nonfree.pl
- schemes.pl
etc/conf-enabled/
- 00-private.pl → $LJHOME/ext/local/conf-available/private.pl
- 10-local.pl → $LJHOME/ext/local/conf-available/local.pl
- 50-nonfree.pl → $LJHOME/ext/dw-nonfree/conf-available/nonfree.pl
- 50-free-schemes.pl → $LJHOME/etc/conf-available/schemes.pl
- 50-nonfree-schemes.pl → $LJHOME/ext/dw-nonfree/conf-available/schemes.pl
- 90-defaults.pl → $LJHOME/etc/conf-available/defaults.pl
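
To make the load order concrete, here is a minimal sketch of how the enabled files might be discovered. The helper name enabled_conf_files is made up for illustration, and skipping dangling symlinks is an assumption rather than part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper: return the enabled config files in load order.
sub enabled_conf_files {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";

    # A plain string sort gives lexicographical order, so
    # "00-private.pl" comes before "90-defaults.pl". The -f test
    # follows symlinks, so dangling links are silently skipped.
    my @names = sort grep { /\.pl\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;

    return map { "$dir/$_" } @names;
}
```

Calling enabled_conf_files("$ENV{LJHOME}/etc/conf-enabled") would then give the list of files to load, in order.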

Config Files

Config files would each return a hash of values. It would be the responsibility of the config system to merge them all together appropriately. Essentially, this would be done in the same manner as used by Config::FromHash. However, because a number of config values are defaulted using prior values, it would be necessary to provide a dynamically-scoped variable containing the config-as-loaded-thus-far, to support that (hopefully that will be made clear by the examples).

# ext/local/conf-available/private.pl
{
    DOMAIN => "example.dreamhack.net",
    DBINFO => {
        # ...
    },
    # ...
};

# ext/local/conf-available/local.pl
{
    IS_DEV_SERVER => 1,
    SITENAME => "My Awesome DW Site",
    # ...
};

# ext/dw-nonfree/conf-available/schemes.pl

use DW::Config::FromHash qw( $CONF );

# Manually append schemes
$CONF->{SCHEMES} = [
    @{$CONF->{SCHEMES}},
    { scheme => 'neato', 'title' => 'Neato' },
    # ...
];

{
    # ...
    # Or, a potentially abstracted way to add things
    add_SCHEMES => [
        { scheme => 'neato', 'title' => 'Neato' },
    ],
    # ...
};

# etc/conf-available/defaults.pl

use DW::Config::FromHash qw( $CONF );

my $www = "www.$CONF->{DOMAIN}";
{
    DOMAIN_WEB => $www,
    SITEROOT => "http://$www",
    # ...
};

Merging, and Post-Merge

As mentioned, the config files would be loaded and merged in a manner similar—if not entirely identical—to Config::FromHash.
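
A rough sketch of what that merge could look like follows. The policy here is my assumption, not something Config::FromHash dictates: the first file to set a key wins (so 00-private.pl beats 90-defaults.pl), nested hashes merge recursively, and an add_FOO key appends to the list FOO. The names merge_into and load_conf_files are hypothetical.

```perl
use strict;
use warnings;

our $CONF = {};    # config-as-loaded-thus-far, visible to each file

# Fold one file's returned hash into the running config.
sub merge_into {
    my ( $conf, $new ) = @_;
    for my $key ( sort keys %$new ) {
        my $val = $new->{$key};
        if ( $key =~ /\Aadd_(.+)\z/ ) {
            # add_FOO => [...] appends to the list FOO
            push @{ $conf->{$1} ||= [] }, @$val;
        }
        elsif ( !exists $conf->{$key} ) {
            $conf->{$key} = $val;
        }
        elsif ( ref $conf->{$key} eq 'HASH' && ref $val eq 'HASH' ) {
            merge_into( $conf->{$key}, $val );
        }
        # otherwise the earlier file's value is kept
    }
    return $conf;
}

# Load each file in order and merge whatever hash it returns.
sub load_conf_files {
    my (@files) = @_;
    local $CONF = {};    # dynamically scoped for the duration of the load
    for my $file (@files) {
        my $ret = do $file;
        die "Could not load $file: " . ( $@ || $! ) unless defined $ret;
        merge_into( $CONF, $ret ) if ref $ret eq 'HASH';
    }
    return $CONF;
}
```

Pass absolute paths to load_conf_files, since do FILE does not search the current directory on newer Perls.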

After the config files are loaded and merged, it would be the responsibility of the Config system to populate all of the appropriate variables in the LJ and DW packages.
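
As a sketch of that last step, something like the following could copy the merged hash into package variables. The function name populate_package and the rule for picking a sigil (array refs become arrays, hash refs become hashes, everything else a scalar) are assumptions; a real implementation would need to know which variables the existing code expects in which form.

```perl
use strict;
use warnings;

# Copy merged values into package variables, e.g. $LJ::SITENAME or
# @LJ::SCHEMES, choosing the sigil from the type of the value.
sub populate_package {
    my ( $pkg, $conf ) = @_;
    no strict 'refs';    # symbolic references are the point here
    for my $name ( keys %$conf ) {
        my $val = $conf->{$name};
        if    ( ref $val eq 'ARRAY' ) { @{"${pkg}::$name"} = @$val }
        elsif ( ref $val eq 'HASH' )  { %{"${pkg}::$name"} = %$val }
        else                          { ${"${pkg}::$name"} = $val }
    }
    return;
}
```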

Pros

Much simpler to use! You can create a file to override a single value and re-use the existing configuration for everything else. And it works more like you'd expect if you're used to conf.d directories.

Easier manual testing. While automated tests are obviously best, if you need to test that something works with and without X service, you can add and remove a symlink to a conf file enabling that service, restart apache, and poke at things.

Paves the way for better support of plopping things into ext/ and having it work. No more "copy these config values into your existing file", just symlink and go.

Cons

Harder to fully comprehend. It's more files floating around, more state to keep track of (-available, -enabled, symlinks galore!).

It's dramatically different, and converting an existing installation would be a pain. (But see below.)

Config reloading (see start_request_reload in Config.pm) is more involved. Far more files to stat, and a lot more work to reload them all. One possibility would be to drop config reloading: how often are config changes made that don't involve a code change that would necessitate restarting apache anyway? ($SERVER_DOWN maybe? But that could easily be converted into a flag file check.)
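
For what it's worth, the flag file check could be as small as this sketch. The path var/server-down and the function name server_down are invented for illustration; nothing like them exists in the codebase today as far as this proposal is concerned.

```perl
use strict;
use warnings;

# Hypothetical flag-file location; the proposal does not name one.
my $DOWN_FLAG = ( $ENV{LJHOME} // '.' ) . '/var/server-down';

# True when the flag file exists. This costs one stat per call
# instead of re-reading and re-merging every config file.
sub server_down {
    my ($flag) = @_;
    $flag //= $DOWN_FLAG;
    return -e $flag ? 1 : 0;
}
```

Taking the site down would then be a matter of touching the file, and bringing it back up a matter of removing it.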

Not Breaking Existing Installations

It would be preferable to avoid breaking existing installations. I think this is possible: after performing the above, follow that up by running the current config system. While that does mean having multiple config systems running, it gives people time to migrate at their leisure, rather than breaking things immediately.

After some time with the new system, we could then add deprecation notices in the event the old system is still in use. After sufficient time has passed, we could then eliminate it entirely.

Or, we could treat it like ripping off a band-aid and break things and be all "hey, sorry about the one-time pain but we're eliminating the smaller pains you almost never have anyway because really how often do we change these things?".

Why Not Just `//=` Everything?

One thing you might be thinking is "Well, why not just //= everything in the default config*.pl files?"

That's definitely much easier to implement! And it brings with it many of the same benefits—people could simply add their overrides to config-local.pl or config-private.pl and never need to create or edit a config.pl (4f8258a does that for @LANGS, it's totally viable).

One of the drawbacks of that approach is that it requires a developer updating the default config*.pl files to never make a mistake. Accidentally used = instead? Tried to use //= to set a list, even though //= only works for scalars? Well, now things are quietly broken in other people's installs. By automating the merge behavior, we can largely avoid that. Whether the additional complication of a split config brings enough benefits over conditional sets to be worthwhile is another matter.
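
To illustrate what the conditional-set approach asks of whoever edits the defaults, here is a small sketch. The values are made up, and everything is shown in one file for brevity where in practice the overrides and defaults would live in separate config files.

```perl
use strict;
use warnings;

our ( $SITENAME, @LANGS );

# Site overrides, as they might appear in config-local.pl
$SITENAME = "My Awesome DW Site";
@LANGS    = ('en_DW');

# Defaults, as they might appear in config.pl
$SITENAME //= "Dreamwidth";       # scalar: the override survives
@LANGS = ('en') unless @LANGS;    # list: needs an explicit emptiness check

# A plain "$SITENAME = ..." in the defaults would clobber the override.
```

Every default has to be written in the right one of those two forms, by hand, every time.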

Thoughts? Questions? Alternative proposals?

I can't really comment on the pros and cons of doing this, but this method of config management is fairly familiar to me from configuring Apache, and I find that pretty easy to use. We could work around "harder to comprehend" by writing good documentation!

having been tripped up by config shenanigans in the past, i am fully in support of this proposal. i don't have technical suggestions or commentary to offer, but will cheerfully support whatever comes of this.

Arch Linux uses this configuration style for the whole system, and one of the things I'm never clear on is how the system that is building configuration files figures out who has priority in conflicts (where config.pl says foo = x, and config-local.pl says foo = y) and whether or not the system then merges and builds back a configuration file with everything in the order the program to be configured expects to see it all, so that if something did break, you could look at the final compiled config and see that somehow something is referenced before it is defined.

I do like the idea of being able to just build your own config file on top of what's already there, so that you can be assured that there's always a stable system underneath your tweaks. And it does make developing bobs-awesome-plugin easier, too.

(Not that I've touched Perl or set up a Dreamhack ever and likely won't, but I'm for things that make it easier for beginning developers to experiment.)

You bring up an excellent point I hadn't considered in the proposal, but definitely should address in an implementation: it should be possible to inspect the resulting configuration. E.g., by doing something like ./bin/lj-conf.pl value VAR to see the resultant value of a variable and where that value came from, maybe even ./bin/lj-conf.pl trace VAR to follow changes to a value through the loading process (most things shouldn't change once set, but appends to something like @SCHEMES are totally plausible). And possibly something like ./bin/lj-conf.pl (enable|disable) FILE [WEIGHT] to add/remove the appropriate symlink, because who can ever remember the argument order to ln, anyway?
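
A rough sketch of the bookkeeping that value and trace would need is below. All of the names are hypothetical, and it assumes the first file to set a key wins. The idea is just to remember, per key, which file supplied which value.

```perl
use strict;
use warnings;

# Record every assignment so a tool can later report its origin.
# %$trace maps each key to a list of [ file, value ] pairs in load order.
sub record_file {
    my ( $trace, $file, $values ) = @_;
    push @{ $trace->{$_} }, [ $file, $values->{$_} ] for sort keys %$values;
    return $trace;
}

# "value VAR": the effective ( file, value ) pair for a key.
sub value_of {
    my ( $trace, $var ) = @_;
    my $hits = $trace->{$var} or return;
    return @{ $hits->[0] };
}

# "trace VAR": one line per file that touched the key, in load order.
sub trace_of {
    my ( $trace, $var ) = @_;
    return map { "$_->[0]: $_->[1]" } @{ $trace->{$var} || [] };
}
```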

who can ever remember the argument order to ln, anyway?

Oh Lord I thought it was only me who got it wrong the first time every goddamn time.

Thanks for posting this!

The current configuration system is a total hodgepodge and quite terrible. I'm all in favor of doing something better and more sane, and this is definitely both of those things.

I'm not sure that Apache configuration is a model of simplicity, but it's probably better than what we have. It's also flexible and familiar to people who do this stuff, and that's probably reasonable as a target level for "deploying a Dreamwidth installation". I.e., we don't have to target total systems novices (good docs + consistent, predictable system without crazy sharp edges means any novice can understand it with enough effort, I think).

So, consider this a tentative +1 unless there are strong, well thought out objections that manage to sway me in follow-up comments.

ETA: The thing I'd mostly want to make sure of is a safe transition that actually finishes. I'm fine with us doing manual labor to port the DW production configurations, but I do want to make sure we can rip out the old code pretty quickly. DW is already way too complicated a system for new developers, we need to be killing and simplifying.

Edited 2016-05-16 05:19 (UTC)

I'm perfectly happy with ripping out the old code and having either a flag day, or a very abbreviated transition period, if you are. Obviously that would warrant even more stringent testing, but it shouldn't be a big deal to whip up some instructions for people on dreamhacks or otherwise testing code to pull in the branch to their local and try it out to provide feedback before it got merged into develop.

Yeah, I generally am leaning that way these days. We've got lots of in-flight transitions so I'm trying to be very careful about starting any more. A hard cutover is probably better in the long run.

I am in favour of fixing the horribleness of the current system.

Since DW is not something that is deployed in many many places, with many optional plugins that come and go, I am not sure whether the Debian-style approach is really necessary, or whether a simpler system would work equally well: simply have two config files, one of which is part of the DW repo and the other of which isn't (call the latter -local if you like), where anything in the second one overrides anything in the first on a line-by-line basis. I guess this is similar to your //= option, except that I'd prefer moving to config files that are not Perl, where this checking is handled by whatever parses them.

I actually considered proposing YAML for the config files (there's something to be said for the simplicity of a guaranteed-static config!), but I just can't resist the allure of the Turing tarpit.

Srsly: my favorite mailer daemon is Exim, whose configuration basically involves writing code for how to handle things; my lighttpd configs rely on include_shell; back when I used sphinx for full-text search, I relied on the ability to have the config start with a shebang and built my own Debian-style conf generator for it. Many of the daemons that don't provide that sort of thing as an option end up with config generators (rsyslog, nsd, etc.) and mildly annoy me because I forget to regenerate the config when something relevant changes instead of it being automagical.

Possibly that just means my programmer side holds a little too much sway over my sysadmin side, but code-based configs are just ... so ... handy. But hey, if there's a consensus on the configs being not-Perl, I'm happy to go that route, too.

Well, I am definitely not a consensus. But it just seems to me that unlike the things that you mention with config file generators, DW simply isn't something that is configured hundreds of times a day around the world... A few sites use it, and they probably don't change their config very often... So I'm just not convinced that additional complexity is warranted. I'd rather know that all the non-default options are in one specific file than be hunting through a config.d directory.

But, as I said, I'm not somebody who has ever configured a DW install from scratch and I'm probably not somebody who ever will, so my hypothetical preferences are not out of experience.
