00:00:26  * lloydde joined
00:03:17  * trentm joined
00:04:50  * lloydde quit (Ping timeout: 250 seconds)
00:16:45  * lloydde joined
00:21:21  * lloydde quit (Ping timeout: 255 seconds)
00:35:36  * fredk joined
00:37:12  * nfitch quit (Quit: Leaving.)
01:01:52  * lloydde joined
01:06:22  * lloydde quit (Ping timeout: 250 seconds)
01:07:41  * trentm quit (Quit: Leaving.)
01:08:39  * pmooney quit (Ping timeout: 272 seconds)
01:17:26  * abraxas_ joined
01:17:28  * fredk quit (Quit: Leaving.)
01:20:00  * ed209 quit (Remote host closed the connection)
01:20:18  * ed209 joined
01:21:52  * abraxas_ quit (Ping timeout: 258 seconds)
01:28:03  * abraxas_ joined
01:34:48  * pmooney joined
01:39:30  * pmooney quit (Ping timeout: 258 seconds)
01:47:16  * dap_1 quit (Quit: Leaving.)
02:02:34  * lloydde joined
02:07:12  * lloydde quit (Ping timeout: 265 seconds)
03:02:35  <swills>anyone knowledgable about manatee around?
03:03:28  * lloydde joined
03:08:29  * lloydde quit (Ping timeout: 264 seconds)
03:41:44  * pmooney joined
03:46:23  * pmooney quit (Ping timeout: 240 seconds)
04:04:11  * lloydde joined
04:05:54  <swills> var pgUrl = 'tcp://postgres@' + config.ip + ':' + config.postgresPort +
04:06:00  <swills>lines like this seem to make a bad assumption
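For context, the change swills is hinting at would make the database role configurable instead of hard-coding postgres. A minimal sketch, assuming a hypothetical config.dbUser key (not a real manatee option) and that the URL ends in the usual /postgres database name:

    // Sketch only: take the role from config, defaulting to 'postgres' to keep
    // the current behaviour. config.dbUser is an invented key for illustration.
    var dbUser = config.dbUser || 'postgres';
    var pgUrl = 'tcp://' + dbUser + '@' + config.ip + ':' + config.postgresPort +
        '/postgres';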
04:09:00  * lloydde quit (Ping timeout: 258 seconds)
05:05:06  * lloydde joined
05:09:41  * lloydde quit (Ping timeout: 264 seconds)
06:02:32  * marsell quit (Ping timeout: 244 seconds)
06:02:50  * marsell joined
06:05:50  * lloydde joined
06:10:42  * lloydde quit (Ping timeout: 256 seconds)
07:01:25  * lloydde joined
07:06:06  * lloydde quit (Ping timeout: 258 seconds)
07:41:32  * marsell quit (Quit: marsell)
07:42:18  * marsell joined
08:02:13  * lloydde joined
08:06:17  * lloydde quit (Ping timeout: 240 seconds)
08:14:12  * marsell quit (Quit: marsell)
08:39:33  * bixu_ quit (Read error: Connection reset by peer)
08:40:04  * bixu_ joined
08:44:05  * pmooney joined
08:48:37  * pmooney quit (Ping timeout: 240 seconds)
09:02:49  * pgale joined
09:03:00  * lloydde joined
09:07:17  * lloydde quit (Ping timeout: 240 seconds)
10:03:50  * lloydde joined
10:08:35  * lloydde quit (Ping timeout: 255 seconds)
10:16:10  * marsell joined
10:20:01  * ed209 quit (Remote host closed the connection)
10:20:18  * ed209 joined
10:27:22  * bixu_ quit (Ping timeout: 245 seconds)
10:55:46  * abraxas_ quit (Remote host closed the connection)
11:00:38  * mamash joined
12:05:09  * lloydde joined
12:10:05  * lloydde quit (Ping timeout: 258 seconds)
12:44:47  * abraxas_ joined
12:48:32  * pgale quit (Quit: Leaving.)
12:49:57  * abraxas_ quit (Ping timeout: 258 seconds)
12:52:03  * pgale joined
13:06:10  * lloydde joined
13:10:46  * lloydde quit (Ping timeout: 255 seconds)
13:32:57  * bixu_ joined
13:41:52  * chorrell joined
14:06:59  * lloydde joined
14:11:53  * lloydde quit (Ping timeout: 264 seconds)
14:21:11  * manytrees quit (Ping timeout: 258 seconds)
14:23:01  * manytrees joined
14:23:23  * mamash part
14:34:11  * abraxas_ joined
14:38:47  * abraxas_ quit (Ping timeout: 265 seconds)
14:56:26  * pmooney joined
15:08:02  * lloydde joined
15:12:22  * lloydde quit (Ping timeout: 245 seconds)
15:33:47  * pmooney quit (Ping timeout: 272 seconds)
15:34:33  <nahamu>swills: what's the bad assumption?
15:47:17  * pgale quit (Ping timeout: 240 seconds)
16:00:17  * pmooney joined
16:07:29  * nfitch joined
16:08:39  * lloydde joined
16:09:15  <swills>nahamu: that the username is postgres
16:09:19  <swills>in my case, it's pgsql
16:09:48  <swills>anyway, after hacking around some things, i think i have the first 2 of 3 nodes up and running
16:13:14  * lloydde quit (Ping timeout: 244 seconds)
16:14:49  <nfitch>swills: After logging on this morning, I only see your last 3 comments about postgres and 2 of 3 nodes… I'm really curious what you're up to.
16:17:21  <nfitch>Ah, just got a ping from another Joyent engineer. Looks like you're trying to set up a manatee?
16:20:04  <swills>yes
16:20:13  <swills>i'm trying to make it work on FreeBSD
16:20:45  <swills>we lack smf, but have everything else (zfs, dtrace, node, pgsql, etc.) so it shouldn't be too bad realy
16:22:47  <swills>really that is, sheesh, i can spell, i swear
16:23:04  * abraxas_ joined
16:23:55  <nfitch>Ok… did you fork it? FWIW, there are cases where Manatee goes into 'error' mode which we're actively fixing right now.
16:23:56  <swills>well, i forgot about zones vs jails
16:24:03  <swills>yeah
16:24:08  <swills>i wonder what the zoneid is used for
16:25:20  <nfitch>So there are some pretty major changes coming down the pipeline. Configuration should remain the same, though.
16:25:38  * trentm joined
16:25:56  <swills>ok
16:26:08  <nfitch>You can follow the progress on this branch of Manatee: https://github.com/joyent/manatee/tree/MANATEE-188
16:26:10  <swills>are those changes in a branch somewhere?
16:26:16  <swills>ah, thanks
16:26:50  <swills>so my primary node seems OK
16:26:54  <swills>i think
16:27:02  <swills>my sync node is up, but something strange happened
16:27:16  <swills>Filesystem Size Used Avail Capacity Mounted on
16:27:19  <swills>zfs8/[email protected]/dev/gpt/rootfs 9.7G 1.6G 7.3G 18% /
16:28:00  <swills>that's not what it was before manatee did its thing
16:28:05  * abraxas_ quit (Ping timeout: 264 seconds)
16:28:09  <swills>should be
16:28:09  <swills>Filesystem Size Used Avail Capacity Mounted on
16:28:10  <swills>/dev/gpt/rootfs 9.7G 1.5G 7.4G 16% /
16:28:22  <swills>(root is on ufs, only the other stuff is on zfs)
16:28:32  <swills>i could put it all on zfs, just didn't
16:28:44  <swills>but i think it might have kinda mucked up the node
16:31:30  <nfitch>Manatee relies on everything postgres related to be in a zfs snapshot. That's how it restores from the primary to the sync… via a zfs send/receive. See starting here: https://github.com/joyent/manatee/blob/MANATEE-188/lib/postgresMgr.js#L921
16:31:46  <nfitch>The old code didn't change there.
16:32:44  <nfitch>I believe it's all controlled via this config parameter: https://gist.github.com/nfitch/675e6ca06b78ed63f485#file-sitter-json-L23
16:32:54  <nfitch>But I haven't looked deeply at that code.
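For readers who haven't seen the pattern, the snapshot-and-stream idea nfitch describes boils down to zfs send piped into zfs receive. A rough sketch of that flow in Node (dataset and snapshot names are invented, and manatee's real implementation in the postgresMgr.js/zfsClient.js files linked above streams this over the network rather than locally):

    // Illustration of the zfs send/receive restore idea -- not manatee's code.
    var spawn = require('child_process').spawn;

    // Stream a snapshot of the postgres dataset...
    var send = spawn('zfs', ['send', 'zfs8/manatee@example-snapshot']);
    // ...into a fresh dataset, forcing a rollback if it already exists.
    var recv = spawn('zfs', ['receive', '-F', 'zfs8/manatee-restore']);

    send.stdout.pipe(recv.stdin);
    recv.on('close', function (code) {
        console.log('zfs receive exited with code', code);
    });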
16:34:39  <swills>all the pg stuff is in a zfs snapshot
16:34:46  <swills>but / isn't.
16:35:47  <swills>yes, i have this:
16:35:47  <swills> "snapShotterCfg": {
16:35:47  <swills> "// ": "The manatee ZFS datset.",
16:35:47  <swills> "dataset": "zfs8/manatee"
16:35:48  <swills> },
16:36:23  <swills>so that getting mounted in / is really confusing to me
16:36:26  <nfitch>Oh sorry, I mean the "zfsClientCfg" one: https://gist.github.com/nfitch/675e6ca06b78ed63f485#file-sitter-json-L35
16:37:15  <swills> "dataset": "zfs8/manatee",
16:37:18  <swills>is what i have there too
16:40:54  <nfitch>This may be where that is coming from: https://github.com/joyent/manatee/blob/MANATEE-188/lib/zfsClient.js#L457
16:41:08  <nfitch>But it's strange that the zoneid is in there.
16:41:27  <swills>yes
16:41:36  <swills>and it's strange that it was mounted on / too
16:41:40  <nfitch>If you have logs you may be able to trace back what ZFS actions it took by grepping for ZfsClient
16:42:11  <nfitch>I see it logs everything at info level with that msg prefix.
16:43:02  <nfitch>What is your full zfsClientCfg?
16:44:28  <swills>the logs have likely scrolled away by now
16:45:34  <swills>http://pastebin.com/raw.php?i=gz0WCC5U
16:45:37  <swills>is my full sitter.json
16:46:41  <swills>i left the zoneid alone because i don't have one and i didn't know what it's for
16:46:53  <swills>note also that the uuid in that df output above isn't the zoneid listed in the config
16:51:37  <swills>man this is weird
16:51:43  <swills>so df shows what i pasted above
16:51:53  <swills>zfs list shows:
16:51:54  <swills>NAME USED AVAIL REFER MOUNTPOINT
16:51:57  <swills>zfs8 81.8M 193G 96K /zfs8
16:52:00  <swills>zfs8/2faf0a64-eb3b-4f19-99fe-112856350de5 168K 193G 96K /var/manatee
16:52:04  <swills>but then:
16:52:04  <swills>$ sudo zfs unmount zfs8/2faf0a64-eb3b-4f19-99fe-112856350de5
16:52:04  <swills>cannot unmount 'zfs8/2faf0a64-eb3b-4f19-99fe-112856350de5': not currently mounted
16:52:15  <swills>and also:
16:52:16  <swills>$ sudo mount
16:52:16  <swills>zfs8/[email protected]/dev/gpt/rootfs on / (ufs, local, journaled soft-updates)
16:52:22  <swills>i'm confused
16:52:27  <rmustacc>In genearl, can you please use pastebin, gist, etc. rather than dropping the raw bits into the channel.
16:52:30  <rmustacc>*genera
16:52:33  <rmustacc>**general
16:52:38  <swills>sure
16:52:45  <rmustacc>Thanks
16:53:25  <rmustacc>swills: In these cases though, what's the automount property of the dataset?
16:53:38  <rmustacc>eg. is it in a form that's actually controlled by ZFS?
16:53:45  <rmustacc>Not sure if that applies to FBSD.
16:53:59  * pgale joined
16:54:36  <swills>there is an automount property
16:54:41  <swills>http://pastebin.com/raw.php?i=2ATD7ZWa
16:54:50  <swills>that's the output of zfs get all
16:55:02  <swills>wait, maybe there isn't automount
16:55:05  <rmustacc>Looks like noauto is turned on.
16:55:31  <swills>i didn't set that
16:55:44  <swills>heck, i didn't even create the zfs8/2faf0a64-eb3b-4f19-99fe-112856350de5 fs
16:55:56  <swills>it got created when it replicated it seems
16:56:08  <nfitch>https://github.com/joyent/manatee/blob/MANATEE-188/lib/zfsClient.js#L163
16:56:33  <rmustacc>Right, manatee will create datasets and set properties on them.
16:56:45  <nfitch>Manatee does all sorts of things with zfs datasets.
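To make "create datasets and set properties" concrete, here is a hedged example of the sort of operation involved; the dataset name and mountpoint are invented, and canmount=noauto is simply the property value that showed up in the zfs get output above:

    // Sketch: create a dataset with an explicit mountpoint, left unmounted by
    // default (canmount=noauto). Names are invented for the example.
    var execFile = require('child_process').execFile;

    execFile('zfs', [
        'create',
        '-o', 'canmount=noauto',
        '-o', 'mountpoint=/var/manatee',
        'zfs8/example'
    ], function (err, stdout, stderr) {
        if (err) {
            console.error('zfs create failed:', stderr);
            return;
        }
        console.log('dataset created');
    });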
16:57:31  <swills>going to lunch, brb
17:09:24  * lloydde joined
17:14:17  * lloydde quit (Ping timeout: 264 seconds)
17:38:56  <yunong_>@swills Do you have manatee-sitter logs?
17:39:47  <nfitch>yunong_: He said they scrolled out. Debugging is going to be harder with no logs.
17:40:03  <yunong_>what's the problem they're having?
17:40:28  <swills>yunong_: no logs, sorry
17:40:42  <yunong_>@swills what's the symptom you're seeing?
17:40:42  <nfitch>Filesystem                                                                Size    Used   Avail Capacity  Mounted on
17:40:42  <nfitch>zfs8/[email protected]/dev/gpt/rootfs    9.7G    1.6G    7.3G    18%    /
17:40:43  <swills>well, nothing major right now except that weird mount
17:42:16  <swills>i have the primary and sync nodes up, that weird mount is on the sync node
17:42:26  <swills>let me setup the async node and see what happens... and this time i'll log it all
17:42:32  <nfitch>swills: Did you ever figure out where that uuid is coming from (2faf0a64-eb3b-4f19-99fe-112856350de5)?
17:42:46  <yunong_>@swills is that dataset created by manatee?
17:43:29  <swills>nfitch: nope
17:43:32  <swills>yunong_: yes
17:44:11  <yunong_>manatee does create temporary datasets when a standby is syncing from the primary.
17:44:22  <yunong_>and those are prefixed by a uuid
17:44:33  <yunong_>but I don't know where /dev/gpt/rootfs is coming from.
17:44:53  <swills>that's the root device for my / fs
17:45:05  <swills>or however you want to say it
17:45:09  <yunong_>swills: can you do a `zfs list`
17:45:37  <swills>http://pastebin.com/bNdGXGUh
17:46:09  <swills>http://pastebin.com/yj2pAfST is what it normally should look like
17:46:30  <swills>http://pastebin.com/C8YpHZrz is what it looks like right now after manatee did its thing
17:46:46  <yunong_>swills: yeah, so those uuids are most likely temporary backups manatee made
17:47:00  <yunong_>can you also do a `zfs get mounted`
17:48:11  <swills>http://pastebin.com/xfQzqKuh
17:48:32  <swills>so it's not mounted, but df shows it, which is really weird
17:48:52  <yunong_>swills: it's not mounted because they were backups.
17:49:07  <swills>ok
17:49:32  <yunong_>so what happens when a manatee standby rebuilds from the primary is that it first makes a backup of the current zfs dataset and moves it to a new dataset (the ones with the uuids)
17:49:41  <swills>ah, ok
17:49:47  <swills>so i wonder why df looks so weird
17:49:49  <yunong_>when the rebuild finishes successfully the temporary datasets are deleted.
17:50:14  <swills>so they are still there -- does that mean the rebuild didn't finish successfully?
17:50:15  <yunong_>however -- if something interrupts the rebuild -- say a manatee crash, then the backups are left orphaned on the fs
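Putting yunong_'s description together, the rebuild is roughly: rename the current dataset aside under a uuid, receive a fresh copy from the primary, and destroy the uuid dataset only on success. A loose sketch under those assumptions (rebuildFromPrimary is a hypothetical helper, not a real manatee function):

    // Loose sketch of the rebuild flow described above -- not manatee's code.
    var execFileSync = require('child_process').execFileSync;
    var crypto = require('crypto');

    var dataset = 'zfs8/manatee';
    var backup = 'zfs8/' + crypto.randomUUID(); // e.g. zfs8/2faf0a64-...

    // 1. Move the current dataset aside as a uuid-named backup.
    execFileSync('zfs', ['rename', dataset, backup]);
    try {
        // 2. Rebuild the dataset from the primary (zfs send/receive, as above).
        rebuildFromPrimary(dataset); // hypothetical helper
        // 3. On success the backup is no longer needed.
        execFileSync('zfs', ['destroy', '-r', backup]);
    } catch (err) {
        // 4. If the rebuild is interrupted, the uuid-named backup is left
        //    behind -- the orphaned dataset seen in the zfs list paste.
        console.error('rebuild failed, keeping backup', backup, err);
    }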
17:50:28  * pgale quit (Quit: Leaving.)
17:50:46  <swills>i wonder if i should delete them
17:50:53  <yunong_>not necessarily. It means that some rebuilds failed -- but the fact that you have a mounted dataset under /zfs/manatee means there was one that was successful
17:50:56  <yunong_>what does `manatee-stat` return?
17:51:32  <swills>don't have that command
17:52:21  <yunong_>sorry it should be '/$path_to_manatee/bin/manatee-adm status'
17:52:44  <swills>returns usage info
17:52:54  <yunong_>can you pass it the right configs?
17:53:04  <yunong_>IIRC that should be the ZK IP and the shard name
17:53:11  <swills>not sure what it wants actually
17:53:14  <yunong_>both of which are in the manatee configs
17:54:33  <swills>so i have the zk ip
17:54:42  <swills>but what is the shard name? the node name?
17:54:52  <swills>is that the same as the shard path?
17:55:17  <yunong_>swills: from your config it should just be "1"
17:55:27  <yunong_>although you don't have to pass a shard name to status
17:55:34  <yunong_>it'll print out all shards if the shard name is empty
17:56:08  <swills>http://pastebin.com/dD0SbvR5
17:56:51  <swills>i'm ignoring the lack of dtrace support in node for now...
17:57:08  * bixu_ quit
17:57:52  <yunong_>swills: https://github.com/joyent/manatee/blob/master/docs/trouble-shooting.md#manatee-adm
17:57:56  <yunong_>so your cluster looks good for now
17:58:04  <yunong_>you can safely delete those orphaned datasets
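If it helps, removing such an orphan is an ordinary recursive destroy; a cautious sketch using the uuid dataset from the zfs list paste (double-check the name before destroying anything):

    // Sketch: destroy an orphaned uuid-named backup once you're sure it's unused.
    var execFile = require('child_process').execFile;

    execFile('zfs',
        ['destroy', '-r', 'zfs8/2faf0a64-eb3b-4f19-99fe-112856350de5'],
        function (err, stdout, stderr) {
            if (err) {
                console.error('zfs destroy failed:', stderr);
                return;
            }
            console.log('orphaned dataset removed');
        });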
17:58:31  <swills>yunong_: awesome, thanks!
17:59:00  <yunong_>swills: not a problem -- let me know if you have any further issues. I'm ecstatic to see someone else running manatee.
17:59:06  <swills>df is still confused, but i'm going to chalk this up to a potential freebsd bug
17:59:20  <swills>yunong_: oh, i will bug you! :)
17:59:46  <yunong_>swills: the trouble-shooting docs are usually a good start too. I'm sure there are bugs in the docs -- so feel free to file them
17:59:53  <swills>we have bhyve which could serve as a replacement for kvm i think so i have long term big dreams of running all of sdc on FreeBSD! :)
18:00:05  <swills>yunong_: will do, thanks. i need to read the docs in more detail...
18:01:20  <swills>so now i have the primary and sync nodes working, time to setup the async node
18:01:38  <yunong_>one more unto the breach
18:02:10  <swills>this is all running in google cloud, ironicly
18:02:25  <swills>ironically even
18:10:16  * lloydde_ joined
18:11:05  * trentm1 joined
18:11:43  <swills>one thing i just noticed, in the sample sitter.json config, the ip of the node is not the same as the ip in the connstr in the zkCfg section
18:11:50  * abraxas_ joined
18:11:58  * trentm quit (Ping timeout: 245 seconds)
18:12:02  <swills>does this mean they should all talk to the central zk rather than the one running on the local node?
18:12:20  <swills>or should zk run on a different set of nodes than manatee/pgsql?
18:12:43  <swills>i guess i was expecting to run zk on each node and have each manatee talk to the local zk
18:13:18  <yunong_>swills: they can talk to any zk in the cluster
18:13:33  <yunong_>though you should provide manatee with all the zk ips -- the zk client will randomly pick one
18:13:43  <swills>oh, ok
18:13:52  <swills>the sample only shows one
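A ZooKeeper connect string normally lists every ensemble member, comma-separated, so the zkCfg connStr can name all three servers even though the sample shows one. A hedged example (IP addresses invented; other zkCfg options omitted):

    // Hypothetical zkCfg excerpt listing all three ZooKeeper IPs; the client
    // picks one of them. Addresses are invented for the example.
    var zkCfg = {
        connStr: '10.0.0.11:2181,10.0.0.12:2181,10.0.0.13:2181'
    };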
18:14:58  * lloydde_ quit (Ping timeout: 255 seconds)
18:16:54  * abraxas_ quit (Ping timeout: 264 seconds)
18:17:05  <swills>yunong_: is the zoneid used anywhere except the status?
18:18:38  <yunong_>swills: let me check the source
18:19:40  <swills>sure, sorry, i should have just done that rather than bother you
18:20:01  <yunong_>swills: looks like it's used as an identifier for a manatee node.
18:20:09  <yunong_>you can name it to whatever you like.
18:20:57  <swills>oh, ok
18:21:09  <swills>i suppose they should at least be unique, eh? :)
18:25:15  <yunong_>yeah -- for your own sanity.
18:26:17  <swills>probably not totally insane to just stick the hostname in there for now
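As a trivial illustration of that idea, the per-node identifier could be derived from the hostname when generating the config (purely a sketch; the actual key name is whatever sitter.json uses for the zoneid):

    // Sketch: use the machine's hostname as the per-node identifier, since a
    // FreeBSD host has no SmartOS zone uuid to put there.
    var os = require('os');
    var nodeId = os.hostname(); // e.g. 'manatee-sync-1' (invented)
    console.log('using node identifier:', nodeId);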
18:29:57  * marsell quit (Ping timeout: 240 seconds)
18:39:11  <swills>http://pastebin.com/4chBw2Gn
18:39:13  <swills>boom!
18:40:22  * dap_ joined
18:42:39  * marsell joined
18:42:57  <yunong_>nice!
18:44:53  <swills>so right now i have started the sitter/snapshotter/backupserver manually as root because we don't have smf
18:45:03  <swills>i can create init scripts, but i wonder about what user they are running as
18:45:26  <swills>that and a few other minor issues and i should be able to package this up
18:46:28  <swills>interesting, on the async node i didn't have a leftover fs
18:46:33  <swills>tho df is confused again
18:47:18  <dap_>swills: Cool! FYI, we're making some substantial changes to some of the guts of manatee, but they're not ready to integrate yet. Design details are at github.com/joyent/manatee-design/, and we're working in branch MANATEE-188 of the manatee repo.
18:48:34  <nahamu>sticking with ZK rather than switching to e.g. raft...
18:48:44  <dap_>nahamu: one thing at a time ;)
18:49:19  <swills>dap_: yeah, someone mentioned the branch, thanks
18:49:27  <dap_>Cool
18:49:31  <swills>i'm not sure what the changes will mean for me yet, haven't had time to look too closely yet
18:50:49  <swills>i wonder if you guys have looked at consul or how it might fit in?
18:51:11  <yunong_>it runs on go right?
18:51:28  <swills>well, it's written in go
18:51:36  <swills>the go runtime is just standard binaries
18:53:09  <nahamu>if it uses any C libraries, cgo doesn't work on illumos yet...
18:53:30  <nahamu>("it" being consul... if it's pure go, it's probably fine)
18:53:39  <swills>i don't think it uses any external libs
18:54:11  <nfitch>nahamu: You're a little behind the times. We're really, really, close (and perhaps done) getting cgo working in illumos.
18:54:17  <nfitch>:)
18:54:17  <swills>oh good
18:54:41  <nfitch>That said, it didn't work at the time of Manta being written.
18:54:45  <nfitch>So… zookeeper.
18:55:04  <swills>*nod* fair enough
18:55:12  <swills>i just found out about consul myself, but i kinda like it
18:55:29  <swills>but then i tend to like Go stuff better than Java
18:56:18  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
19:02:51  <nahamu>nfitch: poor Keith. ;)
19:03:05  <nahamu>good for the rest of us, though...
19:03:42  <nfitch>nahamu: Indeed, watching it was painful.
19:04:53  * Aram joined
19:11:06  * lloydde_ joined
19:11:17  * pgale joined
19:15:16  * lloydde_ quit (Ping timeout: 255 seconds)
19:20:54  * pgale quit (Ping timeout: 240 seconds)
19:29:52  * pgale joined
19:33:43  * chorrell joined
19:38:44  * pgale quit (Quit: Leaving.)
20:00:37  * abraxas_ joined
20:05:31  * abraxas_ quit (Ping timeout: 265 seconds)
20:12:09  * lloydde joined
20:16:25  * lloydde quit (Ping timeout: 240 seconds)
20:20:01  * ed209 quit (Remote host closed the connection)
20:20:19  * ed209 joined
20:28:57  * chorrell_ joined
20:28:58  * chorrell quit (Read error: Connection reset by peer)
20:29:55  * chorrell_ quit (Client Quit)
20:30:12  * chorrell joined
20:32:23  * axisys quit (Remote host closed the connection)
20:38:18  * axisys joined
20:39:19  * axisys quit (Changing host)
20:39:19  * axisys joined
20:49:11  * bahamas10 quit (Ping timeout: 272 seconds)
20:51:06  * bahamas10 joined
20:55:31  * pmooney quit (Ping timeout: 272 seconds)
21:07:33  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
21:09:27  * bahamas10 quit (Ping timeout: 272 seconds)
21:11:31  * bahamas10 joined
21:12:40  * lloydde joined
21:16:57  * lloydde quit (Ping timeout: 240 seconds)
21:21:14  * pmooney joined
21:43:20  * chorrell joined
22:13:26  * lloydde joined
22:17:58  * lloydde quit (Ping timeout: 255 seconds)
22:21:38  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
23:14:17  * lloydde joined
23:18:43  * lloydde quit (Ping timeout: 255 seconds)
23:38:28  * abraxas_ joined
23:43:18  * abraxas_ quit (Ping timeout: 264 seconds)
23:53:35  * bahamas10 quit (Ping timeout: 244 seconds)