Shrinking Archive

After the Muskification of Twitter, large Mastodon servers going down, the Mastodon Migration and discussions in and around Indieweb, I’ve concluded:

1. Not everything has to stay online forever.
2. If something is on my own site *and* a social network, the main site is the canonical one.
3. I don’t need to keep extra copies of things I already have.

So, new approach to this site (and the networks it archives):

* I’m removing a lot of duplicates from Twitter etc. (Reading Les Mis is almost all bits that I incorporated into articles.)
* I don’t need to keep duplicates here. This includes crappy phone pics posted on Instagram/Twitter before I posted the good pics on Flickr or my blog.
* Canonical pages on my site can link directly to the remote copies or plugs (potentially picking up replies via bridgy), in which case I don’t need to keep a copy here.
* There’s a lot of stuff I don’t really care about keeping, or that no longer matters because it’s a fragment of a conversation with someone whose account has closed.

In short: This site only needs to keep copies of things I actually want to keep, and then only if I don’t have them somewhere else more “permanent” like hyperborea.org or kvibber.com.

Update (1/26): I’ve been deactivating and/or majorly clearing out my Twitter accounts, currently planning to keep only KelsonV and SpeedForce, and those only ephemeral. I’ll continue to clear up the duplicates and no-longer-interesting/useful posts from here, and move some of them over to K2R. As for SpeedForce, I’ll probably do the same with a few of the older posts and bring some others in here.

Recent Updates: Prismo, Facebook, Reddit, Forums

Prismo (a federated link sharing site like Reddit) development is stalled, and the flagship instance is offline. I never did get around to setting up an auto-archive, partly because it doesn’t expose an RSS feed. One of these days I’ll write an ActivityPub subscriber that can just follow a channel and generate posts from it. Anyway, I copied and pasted the profile as seen in Mastodon and manually looked up the original URLs. Not sure why I bothered, but it felt like something was missing here.

I finally got back to my Facebook archive converter script that produces a WordPress XML file. I’ve been testing it on batches of simple posts, refining it as I find problems or more complex examples. I still need to decide what to do about comments, for instance, and figure out the best way to import photos automatically. I’m also skipping some of the duplicates (I cross-posted from Twitter for a while back in the day, and Instagram later on) and items that either didn’t export well (most cases of sharing someone else’s post only come through with whatever commentary I added, if any) or aren’t useful (auto-generated notes from Pinterest, etc.).
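For illustration, here is a minimal Python sketch of what a converter like that could look like. It is not the actual script: the input shape (a list of dicts with a Unix `timestamp` and a `text` field), the `build_wxr` name, and the `archive` author are all assumptions, and the output is a stripped-down WXR 1.2 file that hasn't been verified against the WordPress importer. Comments and photos are left out entirely.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Namespaces used by WordPress export (WXR 1.2) files.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def build_wxr(posts, author="archive"):
    """Turn simple post dicts (hypothetical shape) into a WXR string."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Facebook archive"
    ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"
    for post in posts:
        when = datetime.fromtimestamp(post["timestamp"], tz=timezone.utc)
        stamp = when.strftime("%Y-%m-%d %H:%M:%S")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["text"][:40]
        ET.SubElement(item, "pubDate").text = format_datetime(when)
        ET.SubElement(item, q("dc", "creator")).text = author
        ET.SubElement(item, q("content", "encoded")).text = post["text"]
        # Both dates are set to UTC here; adjust post_date if the site
        # is not configured for UTC.
        ET.SubElement(item, q("wp", "post_date")).text = stamp
        ET.SubElement(item, q("wp", "post_date_gmt")).text = stamp
        ET.SubElement(item, q("wp", "post_type")).text = "post"
        # Import as drafts so each batch can be reviewed before publishing.
        ET.SubElement(item, q("wp", "status")).text = "draft"
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_wxr([{"timestamp": 1500000000, "text": "A simple status update"}]))
```

Building the file with an XML library rather than string templates means characters like `&` and `<` in post text get escaped automatically, which is one of the easier ways for a hand-rolled export to break.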

A couple of months ago I took a look through my Reddit comments and copied a few dozen comments I thought were worth hanging onto, mostly comics-related. Same with the ComicBloc forums, though I only grabbed a few comments from the last year I was active on the site (2011). I’d like to go back further, but it’s not a huge priority right now.

Image auto-imports and Mastodon Boost links

While fine-tuning the iNaturalist import, I started auto-importing the image from each feed item into the blog, using media_sideload_image(). Since Pixelfed has started including an embedded image in its feed, I did the same there.

And I finally bit the bullet and wrote a simple feed proxy to read my Mastodon Atom feeds and rearrange the elements that IFTTT doesn’t know to look for: images attached as enclosures, and the post URL for boosted posts. (IFTTT was picking up the author’s URL because both links are rel="alternate". Instead of actually looking at the feed structure, like the fact that the author’s link was inside an <author> element, it was just pulling the first one it found.)
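The structural check described above can be sketched in a few lines of Python. This is only an illustration, not the proxy itself: the sample feed, the example.social URLs, and the `summarize_entries` name are made up, and a real proxy would go on to rewrite the feed rather than return a list.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed fragment: the author's profile link comes first,
# which is what trips up a "first rel=alternate wins" reader.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>someone</name>
      <link rel="alternate" type="text/html" href="https://example.social/@someone"/>
    </author>
    <link rel="alternate" type="text/html" href="https://example.social/@someone/12345"/>
    <link rel="enclosure" type="image/jpeg" href="https://example.social/media/photo.jpg"/>
    <content type="html">A boosted post</content>
  </entry>
</feed>"""


def summarize_entries(feed_xml):
    """Return the post URL and image enclosures for each entry."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(ATOM + "entry"):
        # findall() with a plain tag name only matches direct children,
        # so the link nested inside <author> is never considered.
        links = entry.findall(ATOM + "link")
        post_url = next(
            (link.get("href") for link in links if link.get("rel") == "alternate"),
            None,
        )
        images = [
            link.get("href")
            for link in links
            if link.get("rel") == "enclosure"
            and (link.get("type") or "").startswith("image/")
        ]
        results.append({"url": post_url, "images": images})
    return results


if __name__ == "__main__":
    for summary in summarize_entries(SAMPLE_FEED):
        print(summary)
```

The whole fix comes down to searching only the entry's direct children instead of every descendant, so the nesting of the author's link is respected.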

So in theory, I should be able to let this run and it will automatically import not only my Instagram photos, but Pixelfed, Mastodon, and iNaturalist as well. Twitter’s a bit more complex because it doesn’t link directly to the photo URL, so I’ll have to retrieve the link and parse the HTML to find it. And multi-photo posts are still an issue on both Pixelfed (because the feed only includes the first image) and Instagram (because the importer doesn’t handle it quite right). But that’s still a big improvement for my usual use case, and a lot less manual adjusting I need to do!

Adding iNaturalist

I’ve been using iNaturalist for a couple of weeks, submitting photos of wild (or at least feral) animals and plants as observations. I decided to see if I could import that and record it here too (partly to get a sense of what I end up cross-posting), and I can at least get a rudimentary archive from the Atom feed: Photo, current identification, URL. So I’ve started importing that, and backporting the first 30 or so items manually since there aren’t too many.

Updates: Google+, Tumblr, Quora & Yelp

With an official target date for the Google+ shutdown, I’m going through the site looking for unique posts and expanded posts (ex: places where I wrote a full paragraph on Google+ and then trimmed it down for Twitter) and either importing or combining them here. I figure if I’m going to manually review everything anyway, I may as well do it as one pass. (Update: Finished Feb. 10)

I’ve merged most of the duplicates from Tumblr, though there are still some that need cleaning up. I have been posting the occasional new item or reblog on the site since the import, and I’m thinking I may pull in those new posts at some point.

I’ve found links to Yelp reviews and Quora answers, and decided to add an Other category. I’m not archiving everything, though, just the better ones. I’ve only posted a handful of items on each site anyway.

Adding Tumblr to the Archive

Tumblr’s jettisoning a major part of their user base, which doesn’t give me much confidence in its future. So I’m importing everything up to now, even though very little of my Tumblr activity is unique to Tumblr. Just about all of it is cross-posted or notified from my blog, or Flickr, or somewhere else, or is a reblog of someone else’s Tumblr where I didn’t add anything except maybe new tags.

I guess cleaning up the duplicates will give me another ongoing project for when I’m bored.

Which reminds me, I should decide what I’m going to do with the rest of the Google Plus archive. I’d been manually importing the original stuff, and then ran into a wall when it came to deciding how to handle posts with interesting comments: Do I import the post without comments? Do I copy the comments into the post body? Do I finally take the time to write that import/converter script I thought about doing a year ago?

Archive Updates: PixelFed, Automation, GooglePlus, AltBrowser

I’ve automated a few more things: Splitting retweets/boosts/etc to a separate “repost” author, for instance, adding tags, and setting formats where there’s enough information.

PixelFed.Social is being archived now, though it’s got the same problem as Mastodon where the media link doesn’t appear in the section IFTTT knows about, so I still have to manually add the image later on.

The Instagram/Twitter merge is done.

I keep changing my mind about how I want to handle cases where I posted the same photo to several sites, but cropped or filtered it differently. For now I’m grouping by version, so if I post a square crop to Instagram (which goes to Facebook and Twitter) and a rectangular crop to PixelFed and Photog.Social, I’m merging the Instagram/Facebook/Twitter posts into one and the PixelFed/Photog.Social posts into another.

I pulled in the AltBrowser Twitter archive, fixing truncated items and timezone in the spreadsheet before importing it.

I still haven’t gotten around to writing that Google Plus Archive to WordPress XML converter, but I’ve started going through and manually importing just the original posts at Google Plus. Not the auto-links to my blogs, except in a few cases where there’s a comment thread I want to be able to find again. I’ve found a few photos that I haven’t posted anywhere else, and at least one post that I later cross-posted to K2R.

Lately I’ve been phasing out Facebook and Twitter, and moved my primary Mastodon presence from Mastodon.Social to Wandering.Shop. The last few months, most of my Facebook posts have been cross-posts from other services, and now that Facebook has shut down all of those cross-posting connections (except Instagram, of course, since they own it), I basically haven’t been there at all.

Added Speed Force’s Instagram (most of it)

I originally intended for the SpeedForceOrg Instagram account to represent the Speed Force blog, but 90% of it has ended up being me posting photos from comic cons. Another 5% is me posting something that’s halfway between my comics-fan persona and the blog’s editorial voice.

So I’ve imported it here using the same tool that I’m using for my personal Instagram. I pulled the copies of the handful of photos that belong to another member of the site.

At this point I’m not planning on importing the SpeedForceOrg Twitter account because it’s more of an editorial voice. Well, it is these days…maybe I should bring in some of the early stuff from when it was also my comics fan persona.

Added Tumblr: First Pass

I imported one of my Tumblr blogs using WordPress’ Tumblr importer. Re-Reading Les Mis started out as just a mirror of the corresponding blog, but I did the occasional image post or repost-with-commentary, and after finishing the original series covering the entire book, I started posting excerpts & follow-up commentary, some of which made it back to the source.

I’ve updated my What’s in your archive? post. The importer does a good job of transferring your blog directly from Tumblr to a WordPress blog. It even imports images (though sometimes it imports a single-image post as a gallery for some reason). The original URL is stored in a custom field, and you can leave it connected and import new items when you want to bring them in.

Some gotchas: It can only map to one author, but you get to choose which one. It puts everything in the default category. Videos don’t get imported, even if you’ve just embedded a YouTube video.

There are only ~150 posts in the Les Mis Tumblr, and not much overlap with other material (though again, a lot of the early stuff is just mirroring). There are closer to 1700 on my main Tumblr, and the vast majority of it is automatic shares from K2R, Instagram, or Flickr.

Added: Twitter Archive & Google Buzz

I exported my entire Twitter archive, and imported the CSV here using WP All Import. Over 7000 posts going back to 2008. I pulled them in as drafts, rather than publishing them directly, to make sure that the import worked properly, which means I’m going to have to look through all 7000 (maybe not the best idea).

Retweets are a little weird, and sometimes cut off a bit. Timestamps are in UTC, but the oldest timestamps are only datestamps, which means there’s no order info within a day except for the post IDs.
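One way to get a stable order out of mixed timestamps and datestamps is to sort by day first and post ID second, since IDs increase over time. The Python sketch below assumes column names `tweet_id`, `timestamp`, and `text` and a `YYYY-MM-DD` date prefix; the real export may differ, and the sample rows are invented.

```python
import csv
import io
from datetime import datetime

# Invented sample: three date-only rows and one full UTC timestamp.
SAMPLE_CSV = """tweet_id,timestamp,text
300,2008-05-02,third
100,2008-05-01,first
200,2008-05-02,second
900,2012-03-04 18:30:00 +0000,later
"""


def sort_key(row):
    """Order by calendar day, then by post ID within the day."""
    # Only the first 10 characters (the date) are parsed, so full
    # timestamps and bare datestamps are handled the same way.
    day = datetime.strptime(row["timestamp"].strip()[:10], "%Y-%m-%d").date()
    return (day, int(row["tweet_id"]))


def ordered_rows(csv_text):
    """Read the CSV text and return its rows in posting order."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=sort_key)


if __name__ == "__main__":
    for row in ordered_rows(SAMPLE_CSV):
        print(row["timestamp"], row["tweet_id"], row["text"])
```

Converting the ID to an integer matters: compared as strings, a 10-digit ID would sort before a 9-digit one that starts with a higher digit.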

I’m not sure what I want to do about “New blog post” or “Flashback of the day” plugs. The actual content is already on another site that I own, so an extra copy of the links doesn’t really make much difference unless there’s a comment thread.

While I was at it, I wondered whether I had an archive of Google Buzz. It turns out I do: Google automatically put it on my drive when they shut down the site. It’s in the form of a bunch of PDFs, but you can’t import them easily. Why? The text is scrambled, with a custom font that unscrambles it. So I had to retype entries instead of copying and pasting. Fortunately there were only a handful of native posts — the vast majority of the pages and pages and pages of posts were imported from Twitter, Flickr, or one of my blogs — but I had to search for “Dw¦¦” instead of “Buzz” to find them.

New Archive: Facebook, Twitter, Mastodon, Instagram

I’ve set up this site to archive my third-party social networking posts on a site that I control and can easily search. For now I’m setting up the following networks to archive here using IFTTT:

  • Facebook (public posts)
  • Twitter
  • Instagram
  • Mastodon.Social
  • Photog.Social (a Mastodon instance dedicated to photography)

I’m less concerned with keeping everything in its original form, and more concerned with being able to find it (and work my way back to comment threads), so I plan on removing/combining duplicates as I find them, cleaning up links, etc. but I don’t want to get too complicated with it.