Development on Prismo (a federated link-sharing site like Reddit) has stalled, and the flagship instance is offline. I never did get around to setting up an auto-archive, partly because it doesn’t expose an RSS feed. One of these days I’ll write an ActivityPub subscriber that can just follow a channel and generate posts from it. Anyway, I copied and pasted the profile as seen in Mastodon & manually looked for the original URLs. Not sure why I bothered, but it felt like something was missing here.
I finally got back to my Facebook archive converter script that produces a WordPress XML file. I’ve been testing it on batches of simple posts, refining it as I find problems or more complex examples. I still need to decide what to do about comments, for instance, and figure out the best way to import photos automatically. I’m also skipping some of the duplicates (I cross-posted from Twitter for a while back in the day, and Instagram later on) and items that either didn’t export well (most shares of someone else’s post only come through with whatever commentary I added – if any) or aren’t useful (auto-generated notes from Pinterest, etc.).
A couple of months ago I took a look through my Reddit comments and copied a few dozen comments I thought were worth hanging onto, mostly comics-related. Same with the ComicBloc forums, though I only grabbed a few comments from the last year I was active on the site (2011). I’d like to go back further, but it’s not a huge priority right now.
While fine-tuning the iNaturalist import, I started auto-importing images to the blog from the image in the feed, using media_sideload_image(). Since Pixelfed has started including an embedded image in its feed, I did the same there.
And I finally bit the bullet and wrote a simple feed proxy to read my Mastodon Atom feeds and rearrange the elements that IFTTT doesn’t know to look for: images attached as enclosures, and the post URL for boosted posts. (It was picking up the author’s URL because both are link rel="alternate", and instead of actually looking at the feed structure – like the fact that the author’s link was inside an <author> element – it was just pulling the first one it found.)
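Here is a minimal sketch of the link-picking part of that idea, in Python. It is not the actual proxy: the sample feed, the example.social URLs, and the entry_links() helper are all made up for illustration. The point is just that reading only the links that are direct children of the entry avoids the author link entirely.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed fragment: a boosted post whose <author> block
# carries its own rel="alternate" link ahead of the post's link.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>Someone Else</name>
      <link rel="alternate" type="text/html" href="https://example.social/@someone"/>
    </author>
    <link rel="alternate" type="text/html" href="https://example.social/@someone/12345"/>
    <link rel="enclosure" type="image/jpeg" href="https://example.social/media/photo.jpg"/>
  </entry>
</feed>"""


def entry_links(entry):
    """Return (post_url, image_urls) from links directly under <entry>."""
    post_url = None
    images = []
    # findall() with a bare tag name matches direct children only,
    # so the <link> nested inside <author> is never considered.
    for link in entry.findall(ATOM + "link"):
        rel = link.get("rel", "alternate")  # Atom treats a missing rel as "alternate"
        if rel == "alternate" and post_url is None:
            post_url = link.get("href")
        elif rel == "enclosure" and link.get("type", "").startswith("image/"):
            images.append(link.get("href"))
    return post_url, images


root = ET.fromstring(SAMPLE)
for entry in root.findall(ATOM + "entry"):
    print(entry_links(entry))
```

A real proxy would then write those values back out in whatever elements the consuming service reads, which depends on that service and isn't shown here.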
So in theory, I should be able to let this run and it will automatically import not only my Instagram photos, but Pixelfed, Mastodon, and iNaturalist as well. Twitter’s a bit more complex because it doesn’t link directly to the photo URL, so I’ll have to retrieve the link and parse the HTML to find it. And multi-photo posts are still an issue on both Pixelfed (because the feed only includes the first image) and Instagram (because the importer doesn’t handle it quite right). But that’s still a big improvement for my usual use case, and a lot less manual adjusting I need to do!
I’ve been using iNaturalist for a couple of weeks, submitting photos of wild (or at least feral) animals and plants as observations. I decided to see if I could import that and record it here too (partly to get a sense of what I end up cross-posting), and I can at least get a rudimentary archive from the Atom feed: Photo, current identification, URL. So I’ve started importing that, and backporting the first 30 or so items manually since there aren’t too many.
With an official target date for the Google+ shutdown, I’m going through the site looking for unique posts and expanded posts (ex: places where I wrote a full paragraph on Google+ and then trimmed it down for Twitter) and either importing or combining them here. I figure if I’m going to manually review everything anyway, I may as well do it as one pass. (Update: Finished Feb. 10)
I’ve merged most of the duplicates from Tumblr, though there are still some that need cleaning up. I have been posting the occasional new item or reblog on the site since the import, and I’m thinking I may pull in those new posts at some point.
I’ve found links to Yelp reviews and Quora answers, and decided to add an Other category. I’m not archiving everything, though, just the better ones. I’ve only posted a handful of items on each site anyway.
Tumblr’s jettisoning a major part of their user base, which doesn’t give me much confidence in its future. So I’m importing everything up to now, even though very little of my Tumblr activity is unique to Tumblr. Just about all of it is cross-posted or notified from my blog, or Flickr, or somewhere else, or is a reblog of someone else’s Tumblr where I didn’t add anything except maybe new tags.
I guess cleaning up the duplicates will give me another ongoing project for when I’m bored.
Which reminds me, I should decide what I’m going to do with the rest of the Google Plus archive. I’d been manually importing the original stuff, and then ran into a wall when it came to deciding how to handle posts with interesting comments: Do I import the post without comments? Do I copy the comments into the post body? Do I finally take the time to write that import/converter script I thought about doing a year ago?
I’ve automated a few more things: Splitting retweets/boosts/etc to a separate “repost” author, for instance, adding tags, and setting formats where there’s enough information.
PixelFed.Social is being archived now, though it’s got the same problem as Mastodon where the media link doesn’t appear in the section IFTTT knows about, so I still have to manually add the image later on.
The Instagram/Twitter merge is done.
I keep changing my mind about how I want to handle cases where I posted the same photo to several sites, but cropped or filtered it differently. For now I’m grouping by version, so if I post a square crop to Instagram (which goes to Facebook and Twitter) and a rectangular crop to PixelFed and Photog.Social, I’m merging the Instagram/Facebook/Twitter posts into one and the PixelFed/Photog.Social posts into another.
I pulled in the AltBrowser Twitter archive, fixing truncated items and timezone in the spreadsheet before importing it.
I still haven’t gotten around to writing that Google Plus Archive to WordPress XML converter, but I’ve started going through and manually importing just the original posts at Google Plus. Not the auto-links to my blogs, except in a few cases where there’s a comment thread I want to be able to find again. I’ve found a few photos that I haven’t posted anywhere else, and at least one post that I later cross-posted to K2R.
Lately I’ve been phasing out Facebook and Twitter, and moved my primary Mastodon presence from Mastodon.Social to Wandering.Shop. The last few months, most of my Facebook posts have been cross-posts from other services, and now that they’ve shut down all of those (except Instagram, of course, since they own it), I basically haven’t been there at all.
Currently in place:
- Primary: Twitter, Instagram, Mastodon.Social, Photog.Social, Google Buzz (past only), LinkedIn (past only), Facebook (forward only, moderated).
- ReadingLesMis: Twitter, Tumblr (past only).
- SpeedForce: Instagram.
I’ve been slowly going through & merging the Twitter/Instagram posts, because I feel like there’s no point in keeping a link to the image/description separate from the actual image/description. This messes up the times a little – I’m going to have to fix the Twitter timestamps to adjust them to the right zone at some point.
I’ve manually copied in a few Google Plus posts based on topic. I keep telling myself I need to stop that and actually write the G+-archive-to-WordPress XML converter I’ve been meaning to do, since that’ll take care of all the rest at once, including comments.
I can probably pull in the AltBrowser Twitter archive at any time. Fix the timezone in the spreadsheet, then import it using the processing filters I’ve set up.
Still undecided on LOL Spam (Twitter & Tumblr). Thinking I might import it to a dedicated site & make a bot to auto-post old stuff & add new stuff as I see it.
Personal Tumblr…honestly I’m not sure how much original stuff is on there. Most of it is blog autoposting, with the occasional reblog of something I wanted to boost, and the very occasional reply. I suppose I could import it & find out. It’s a heck of a lot easier to search my own stuff on WordPress than Tumblr (which is half of why I’m doing this).
I originally intended for the SpeedForceOrg Instagram account to represent the Speed Force blog, but 90% of it has ended up being me posting photos from comic cons. Another 5% is me posting something that’s halfway between my comics-fan persona and the blog’s editorial voice.
So I’ve imported it here using the same tool that I’m using for my personal Instagram. I pulled the copies of the handful of photos that belong to another member of the site.
At this point I’m not planning on importing the SpeedForceOrg Twitter account because it’s more of an editorial voice. Well, it is these days…maybe I should bring in some of the early stuff from when it was also my comics fan persona.
I imported one of my Tumblr blogs using WordPress’ Tumblr importer. Re-Reading Les Mis started out as just a mirror of the corresponding blog, but I did the occasional image post or repost-with-commentary, and after finishing the original series covering the entire book, I started posting excerpts & follow-up commentary, some of which made it back to the source.
I’ve updated my What’s in your archive? post. The importer does a good job of transferring your blog directly from Tumblr to a WordPress blog. It even imports images (though sometimes it imports a single-image post as a gallery for some reason). The original URL is stored in a custom field, and you can leave it connected and import new items when you want to bring them in.
Some gotchas: It can only map to one author, but you get to choose which one. It puts everything in the default category. Videos don’t get imported, even if you’ve just embedded a YouTube video.
There are only ~150 posts in the Les Mis Tumblr, and not much overlap with other material (though again, a lot of the early stuff is just mirroring). There are closer to 1700 on my main Tumblr, and the vast majority of it is automatic shares from K2R, Instagram, or Flickr.
After the Twitter export, I decided to find out what various networks offer in their archives and see what else I could import. LinkedIn turned out to be surprisingly thorough, and since I only had about 80 shares (mostly linkblogging), I imported them all. I’ve merged some of them with Twitter posts if they linked to the same source on the same day (or near enough).
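The "same source, same day or near enough" check could be sketched like this in Python. The field names (id, url, date) and the one-day window are assumptions for illustration, and it presumes the links have already been unwrapped so that both networks point at the same URL string.

```python
from datetime import date


def find_merge_candidates(linkedin, twitter, max_days=1):
    """Pair LinkedIn shares with tweets that link to the same URL
    within max_days of each other. Returns (linkedin_id, tweet_id) pairs."""
    # Index tweets by the URL they link to for quick lookup.
    by_url = {}
    for tweet in twitter:
        by_url.setdefault(tweet["url"], []).append(tweet)

    pairs = []
    for share in linkedin:
        for tweet in by_url.get(share["url"], []):
            if abs((share["date"] - tweet["date"]).days) <= max_days:
                pairs.append((share["id"], tweet["id"]))
    return pairs


# Hypothetical sample data.
linkedin_shares = [
    {"id": "li-1", "url": "https://example.com/article", "date": date(2018, 3, 5)},
    {"id": "li-2", "url": "https://example.com/other", "date": date(2018, 4, 1)},
]
tweets = [
    {"id": "tw-9", "url": "https://example.com/article", "date": date(2018, 3, 6)},
    {"id": "tw-10", "url": "https://example.com/other", "date": date(2018, 6, 1)},
]
print(find_merge_candidates(linkedin_shares, tweets))
```

This only lists candidates. Deciding whether two posts really belong together is still a judgment call.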
There doesn’t seem to be a good way to archive LinkedIn going forward, though, since there’s no RSS support anymore and IFTTT only has actions, no triggers for it. But then I made my first post in three years last week, so it’s not exactly something I use frequently.
I did an initial import of my Google+ posts, but it didn’t complete, and the formatting ended up being waaay too messy to clean up manually on any sort of scale. My plan is to write a script to convert entries to WordPress’ XML format, which will not only give me more flexibility in which pieces to include & how (such as using the same blockquote format for link excerpts that I’m using elsewhere), it will let me import the comments as comments instead of including them inline in the post.
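As a rough idea of what the output side of such a script might look like, here is a Python sketch that writes posts with their comments as wp:comment elements. The input dictionaries are a made-up shape, not the real Google+ archive format, and a real import will likely need more fields than this (author records, post IDs, categories and so on), so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WordPress export (WXR) files.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def build_item(post):
    """Turn one post dict (hypothetical shape) into an <item> element."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = post["title"]
    ET.SubElement(item, q("dc", "creator")).text = post["author"]
    ET.SubElement(item, q("content", "encoded")).text = post["html"]
    ET.SubElement(item, q("wp", "post_date")).text = post["date"]
    ET.SubElement(item, q("wp", "status")).text = "draft"
    ET.SubElement(item, q("wp", "post_type")).text = "post"
    # Comments become child elements instead of inline text.
    # IDs are numbered per post here, which is a simplification.
    for number, c in enumerate(post["comments"], start=1):
        comment = ET.SubElement(item, q("wp", "comment"))
        ET.SubElement(comment, q("wp", "comment_id")).text = str(number)
        ET.SubElement(comment, q("wp", "comment_author")).text = c["author"]
        ET.SubElement(comment, q("wp", "comment_date")).text = c["date"]
        ET.SubElement(comment, q("wp", "comment_content")).text = c["text"]
        ET.SubElement(comment, q("wp", "comment_approved")).text = "1"
    return item


def build_wxr(posts):
    """Wrap the items in a minimal rss/channel skeleton and serialize."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"
    for post in posts:
        channel.append(build_item(post))
    return ET.tostring(rss, encoding="unicode")


# Hypothetical sample post.
SAMPLE_POST = {
    "title": "Sunset at the pier",
    "author": "me",
    "html": "<p>Caught this on the way home.</p>",
    "date": "2014-07-01 19:45:00",
    "comments": [
        {"author": "A Friend", "date": "2014-07-01 20:10:00", "text": "Nice shot!"},
        {"author": "me", "date": "2014-07-01 20:15:00", "text": "Thanks!"},
    ],
}
print(build_wxr([SAMPLE_POST]))
```

Importing as drafts, as in the sample, leaves room to review the results before anything goes public.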
While looking for overlaps with the LinkedIn posts, I found a few Google+ entries in the incomplete import, and I figured I might as well merge them now. So I’ve got 8 out of about 800.
The Twitter backlog continues, slowly. I’m wondering if merging threads isn’t such a good idea after all. I definitely should’ve done some pre-processing on the CSV to at least unwrap shortened URLs and make a copy of the text with no URLs for the titles. I should still be able to automate it inside WP, I just need to brush up on PHP & WordPress programming again.
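The title half of that pre-processing is easy to sketch in Python. This assumes the CSV has a "text" column, and the "tweet_id" column, the 60-character limit, and the sample row are only illustrative. Unwrapping shortened URLs is left out because it needs network requests.

```python
import csv
import io
import re

URL_RE = re.compile(r"https?://\S+")


def title_from_text(text, max_len=60):
    """Strip URLs, collapse whitespace, and truncate to get a short title."""
    stripped = URL_RE.sub("", text)
    title = " ".join(stripped.split())
    if len(title) > max_len:
        title = title[:max_len].rstrip() + "…"
    return title


def add_titles(csv_text):
    """Return a copy of the CSV with an extra 'title' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["title"])
    writer.writeheader()
    for row in reader:
        row["title"] = title_from_text(row["text"])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample row.
SAMPLE_CSV = 'tweet_id,text\n1,"Reading about feeds http://t.co/abc123 today"\n'
print(add_titles(SAMPLE_CSV))
```

The original text column is left untouched, so the full post body still has its links.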
I haven’t even tried to import my Facebook archive yet. Too much pre-processing to do.
I exported my entire Twitter archive, and imported the CSV here using WP All Import. Over 7000 posts going back to 2008. I pulled them in as drafts, rather than publishing them directly, to make sure that the import worked properly, which means I’m going to have to look through all 7000 (maybe not the best idea).
Retweets are a little weird, and sometimes cut off a bit. Timestamps are in UTC, but the oldest timestamps are only datestamps, which means there’s no order info within a day except for the post IDs.
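One possible way to keep those date-only posts in order after import, sketched in Python: sort each day's posts by ID and give them placeholder times one second apart. The input shape and the one-second spacing are assumptions, and the resulting times are invented for sorting purposes only. They are not when the posts were actually made.

```python
from datetime import datetime, timedelta
from itertools import groupby


def spread_within_day(posts):
    """posts: list of (post_id, 'YYYY-MM-DD') tuples with no time of day.
    Returns (post_id, datetime) tuples, ordered by day and then by ID,
    with placeholder times one second apart inside each day."""
    ordered = sorted(posts, key=lambda p: (p[1], int(p[0])))
    result = []
    for day, group in groupby(ordered, key=lambda p: p[1]):
        start = datetime.strptime(day, "%Y-%m-%d")
        for offset, (post_id, _) in enumerate(group):
            result.append((post_id, start + timedelta(seconds=offset)))
    return result


# Hypothetical sample: IDs are compared as numbers, not strings.
sample = [("905", "2008-06-02"), ("1001", "2008-06-02"), ("850", "2008-06-01")]
for post_id, stamp in spread_within_day(sample):
    print(post_id, stamp.isoformat(sep=" "))
```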
I’m not sure what I want to do about “New blog post” or “Flashback of the day” plugs. The actual content is already on another site that I own, so an extra copy of the links doesn’t really make much difference unless there’s a comment thread.
While I was at it, I wondered whether I had an archive of Google Buzz. It turns out I do: Google automatically put it on my drive when they shut down the site. It’s in the form of a bunch of PDFs, but you can’t import them easily. Why? The text is scrambled, with a custom font that unscrambles it. So I had to retype entries instead of copying and pasting. Fortunately there were only a handful of native posts — the vast majority of the pages and pages and pages of posts were imported from Twitter, Flickr, or one of my blogs — but I had to search for “Dw¦¦” instead of “Buzz” to find them.
Something I’ve wanted for a while is the ability to quickly post to social networks without getting onto the site or app where I can be sucked in by the timeline. Sure, you can log into Twitter just long enough to post and then close the site, but it takes effort.
So I set up a Broadcast category on here, this time using IFTTT to post to various sites. I’m starting with Twitter, Facebook, and Mastodon.social (using the webhooks method I tested with Pocket-based linkblogging).
Of course they end up coming back here, but the broadcast rules I set up on IFTTT only run on the specific category, so it boomerangs, but doesn’t loop.
Not sure how much I’ll actually use this, come to think of it.
I’ve set up this site to archive my third-party social networking posts on a site that I control and can easily search. For now I’m setting up the following networks to archive here using IFTTT:
- Facebook (public posts)
- Photog.Social (a Mastodon instance dedicated to photography)
I’m less concerned with keeping everything in its original form, and more concerned with being able to find it (and work my way back to comment threads), so I plan on removing/combining duplicates as I find them, cleaning up links, etc. but I don’t want to get too complicated with it.