taupe 1.2.0

Creator: bradpython12

Taupe
A simple program to extract the URLs of your tweets, retweets, replies, quote tweets, and "likes" from a personal Twitter archive.


Table of contents

Introduction
Installation
Usage
Known issues and limitations
Getting help
Contributing
License
Acknowledgments

Introduction
When you download your personal Twitter archive, you receive a ZIP file. Its contents are not necessarily in a format that is convenient for further processing. For example, you may want to send the URLs of your tweets to the Wayback Machine at the Internet Archive, or feed them to some other tool. For tasks like that, you first need to extract the URLs from your Twitter archive. That's the purpose of Taupe.
Taupe (a loose acronym of Twitter archive URL parser) takes a Twitter archive ZIP file, extracts the URLs corresponding to your tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a comma-separated values (CSV) format that you can easily use with other software tools. Once you have installed it, using taupe is easy:
# Extract tweets, retweets, replies, and quote tweets:
taupe /path/to/your/twitter-archive.zip

# Extract likes:
taupe --extract likes /path/to/your/twitter-archive.zip

# Learn more:
taupe --help

Installation
There are multiple ways of installing Taupe. Please choose the alternative that suits you.
Alternative 1: installing Taupe using pipx
Pipx lets you install Python programs in a way that isolates Python dependencies, and yet the resulting taupe command can be run from any shell and directory – like any normal program on your computer. If you use pipx on your system, you can install Taupe with the following command:
pipx install taupe

Pipx also lets you run Taupe directly using pipx run taupe, although in that case you must prefix every Taupe command with pipx run. Consult the documentation for pipx run for more information.
Alternative 2: installing Taupe using pip
You should be able to install taupe with pip for Python 3. To install taupe from the Python package repository (PyPI), run the following command:
python3 -m pip install taupe

As an alternative to getting it from PyPI, you can use pip to install taupe directly from GitHub:
python3 -m pip install git+https://github.com/mhucka/taupe.git

If you already installed Taupe once before, and want to update to the latest version, add --upgrade to the end of either command line above.
Alternative 3: installing Taupe from sources
If you prefer to install Taupe directly from the source code, you can do that too. To get a copy of the files, you can clone the GitHub repository:
git clone https://github.com/mhucka/taupe

Alternatively, you can download the software source files as a ZIP archive directly from your browser using this link: https://github.com/mhucka/taupe/archive/refs/heads/main.zip
Next, after getting a copy of the files, run setup.py inside the code directory:
cd taupe
python3 setup.py install

Usage
If the installation process described above is successful, you should end up with a program named taupe in a location where software is normally installed on your computer. Running taupe should be as simple as running any other command-line program. For example, the following command should print a helpful message to your terminal:
taupe --help

If not given the option --help or --version, this program expects to be given a personal Twitter archive file, either on the command line (as an argument) or on standard input (from a pipe or file redirection). Here's an example (and note this path is fake – substitute a real path on your computer when you do this!):
taupe /path/to/twitter-archive.zip

The URLs produced by taupe will be, by default, as they appear in the archive. If you want to normalize the URLs into the canonical form https://twitter.com/twitter/status/TWEETID, use the option --canonical-urls (-c for short):
taupe -c /path/to/twitter-archive.zip
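To make the canonical form concrete, here is a minimal Python sketch of this kind of normalization. It is not Taupe's own implementation: the function name canonical_tweet_url and the rule "the tweet ID is the run of digits after /status/" are assumptions made for illustration only.

```python
import re

# Assumption (not taken from Taupe's source): the tweet ID is the run of
# digits that follows "/status/" in the URL.
_STATUS_ID = re.compile(r"/status/(\d+)")


def canonical_tweet_url(url: str) -> str:
    """Rewrite a tweet URL as https://twitter.com/twitter/status/TWEETID.

    URLs that contain no recognizable tweet ID are returned unchanged.
    """
    match = _STATUS_ID.search(url)
    if match is None:
        return url
    return f"https://twitter.com/twitter/status/{match.group(1)}"


print(canonical_tweet_url("https://twitter.com/mhucka/status/1572716422857658368"))
```

The canonical form drops the account name, so two URLs that point at the same tweet compare equal after normalization.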

The structure of the output
The option --extract controls both the content and the format of the output. The following values are recognized:



| Value          | Synonym      | Output                                          |
|----------------|--------------|-------------------------------------------------|
| all-tweets     | tweets       | CSV table with all tweets and details (default) |
| my-tweets      |              | list of URLs of only your original tweets       |
| retweets       |              | list of URLs of tweets that are retweets        |
| quoted-tweets  | quote-tweets | list of URLs of other tweets you quoted         |
| replied-tweets | reply-tweets | list of URLs of other tweets you replied to     |
| liked          | likes        | list of URLs of tweets you "liked"              |



all-tweets
When using --extract all-tweets (the default), taupe produces a table with four columns. Each row of the table corresponds to a type of event in the Twitter timeline: a tweet, a retweet, a reply to another tweet, or a quote tweet. The values in the columns provide details about the event. The following is a summary of the structure:



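| Column 1                      | Column 2             | Column 3                                         | Column 4                                                           |
|-------------------------------|----------------------|--------------------------------------------------|--------------------------------------------------------------------|
| Tweet timestamp in ISO format | The URL of the tweet | The type; one of tweet, reply, retweet, or quote | (For type reply or quote.) The URL of the original or source tweet |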



The last column only has a value for replies and quote-tweets; in those cases, the URL in the column refers to the tweet being replied to or the tweet being quoted. The fourth column does not have a value for retweets even though it would be desirable, because the Twitter archive – strangely – does not provide the URLs of retweeted tweets.
Here is an example of the output:
2022-09-21T22:36:29+00:00,https://twitter.com/mhucka/status/1572716422857658368,quote,https://twitter.com/poppy_northcutt/status/1572714310077673472
2022-10-10T22:04:20+00:00,https://twitter.com/mhucka/status/1579593701965582336,reply,https://twitter.com/arfon/status/1579572453726355456
2022-10-14T04:17:01+00:00,https://twitter.com/mhucka/status/1580774654217625600,tweet
2022-10-25T14:49:06+00:00,https://twitter.com/mhucka/status/1584919989307715586,retweet
...
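Because rows of type tweet and retweet have only three fields, code that reads this output should not assume four values per line. The following Python sketch reads the table with the standard csv module. The names TimelineEvent and read_events are illustrative and are not part of Taupe.

```python
import csv
import io
from datetime import datetime
from typing import List, NamedTuple, Optional


class TimelineEvent(NamedTuple):
    """One row of the all-tweets output (illustrative type, not part of Taupe)."""
    timestamp: datetime
    url: str
    kind: str                  # tweet, reply, retweet, or quote
    source_url: Optional[str]  # present only for replies and quote tweets


def read_events(stream) -> List[TimelineEvent]:
    """Parse all-tweets CSV rows, tolerating a missing fourth column."""
    events = []
    for row in csv.reader(stream):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        source = row[3] if len(row) > 3 and row[3] else None
        events.append(
            TimelineEvent(datetime.fromisoformat(row[0]), row[1], row[2], source)
        )
    return events


sample = io.StringIO(
    "2022-09-21T22:36:29+00:00,https://twitter.com/mhucka/status/1572716422857658368,"
    "quote,https://twitter.com/poppy_northcutt/status/1572714310077673472\n"
    "2022-10-14T04:17:01+00:00,https://twitter.com/mhucka/status/1580774654217625600,tweet\n"
)
events = read_events(sample)
for event in events:
    print(event.kind, event.url, event.source_url)
```

To read a real file, pass an open file object (for example, one opened with newline="") instead of the in-memory sample.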

my-tweets
When using --extract my-tweets, the output is just a single column (a list) of URLs, one per line, of just your original tweets. This list corresponds exactly to column 2 in the --extract all-tweets case above.
retweets
When using --extract retweets, the output is a single column (a list) of URLs, one per line, of tweets that are retweets of other tweets. This list corresponds to the values of column 2 above when the type is retweet. Important: the Twitter archive does not contain the original tweet's URL, only the URL of your retweet. Consequently, the output for --extract retweets is your retweet's URL, not the URL of the source tweet.
quoted-tweets
When using --extract quoted-tweets, the output is a list of the URLs of other tweets that you have quoted. It corresponds to the subset of column 4 values above when the type is "quote". Note that these are the source tweet URLs, not the URLs of your tweets.
replied-tweets
When using --extract replied-tweets, the output is a list of the URLs of other tweets that you have replied to. It corresponds to the subset of column 4 values above when the type is "reply". Note that these are the source tweet URLs, not the URLs of your tweets.
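The relationship between these lists and the all-tweets table can be sketched in Python as follows. This only illustrates the correspondence described above, assuming the four-column layout shown earlier. The helper name source_urls is made up, and this is not how Taupe itself computes the lists.

```python
import csv
import io


def source_urls(rows, kind):
    """Return the column 4 values of rows whose column 3 equals `kind`."""
    return [row[3] for row in rows if len(row) > 3 and row[2] == kind and row[3]]


sample = io.StringIO(
    "2022-09-21T22:36:29+00:00,https://twitter.com/mhucka/status/1572716422857658368,"
    "quote,https://twitter.com/poppy_northcutt/status/1572714310077673472\n"
    "2022-10-10T22:04:20+00:00,https://twitter.com/mhucka/status/1579593701965582336,"
    "reply,https://twitter.com/arfon/status/1579572453726355456\n"
    "2022-10-14T04:17:01+00:00,https://twitter.com/mhucka/status/1580774654217625600,tweet\n"
    "2022-10-25T14:49:06+00:00,https://twitter.com/mhucka/status/1584919989307715586,retweet\n"
)
rows = list(csv.reader(sample))

quoted = source_urls(rows, "quote")   # comparable to --extract quoted-tweets
replied = source_urls(rows, "reply")  # comparable to --extract replied-tweets
print(quoted)
print(replied)
```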
likes
When using the option --extract likes, the output will only contain one column: the URLs of the "liked" tweets. taupe cannot provide more detail because the Twitter archive format does not contain date/time information for "likes". (This is also why "likes" are not part of the output when --extract all-tweets is used – there is no possible value for column 1.)
Here is an example of the output when using --extract likes in combination with --canonical-urls:
https://twitter.com/twitter/status/1588146224376463365
https://twitter.com/twitter/status/1588349144803905536
https://twitter.com/twitter/status/1590475356976578560
...
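A one-column list like this is easy to feed to other tools. As one possible illustration of the Wayback Machine use case mentioned in the introduction, the Python sketch below turns each URL into a request URL for the Internet Archive's "Save Page Now" feature by prefixing it with https://web.archive.org/save/. The function name wayback_save_urls is hypothetical, the sketch performs no network access, and whether and how quickly the Internet Archive accepts such requests is outside the scope of this example.

```python
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def wayback_save_urls(lines):
    """Build one Save Page Now request URL per tweet URL.

    Blank lines and anything that does not look like a URL are skipped.
    """
    urls = []
    for line in lines:
        url = line.strip()
        if url.startswith(("http://", "https://")):
            urls.append(WAYBACK_SAVE_PREFIX + url)
    return urls


likes = [
    "https://twitter.com/twitter/status/1588146224376463365",
    "https://twitter.com/twitter/status/1588349144803905536",
    "",
]
for request_url in wayback_save_urls(likes):
    print(request_url)
```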

Other options recognized by taupe
Running taupe with the option --help will make it print help text and exit without doing anything else.
The option --output controls where taupe writes the output. If the value given to --output is - (a single dash), the output is written to the terminal (stdout). Otherwise, the value must be the path of a file to write.
If given the --version option, this program will print its version and other information, and exit without doing anything else.
If given the --debug argument, taupe will output a detailed trace of what it is doing. The debug trace will be sent to the given destination, which can be - to indicate console output, or a file path to send the debug output to a file.
Summary of command-line options
The following table summarizes all the command line options available.



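| Short  | Long form opt    | Meaning                      | Default     |   |
|--------|------------------|------------------------------|-------------|---|
| -c     | --canonical-urls | Normalize Twitter URLs       | Leave as-is |   |
| -h     | --help           | Print help info and exit     |             |   |
| -e E   | --extract E      | Extract URL type E           | all-tweets  | ⚑ |
| -o O   | --output O       | Write output to file O       | Terminal    | ✦ |
| -V     | --version        | Print program version & exit |             |   |
| -@ OUT | --debug OUT      | Write debug output to OUT    |             | ⚐ |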





⚑   Recognized values: all-tweets, tweets, my-tweets, retweets, quoted-tweets, replied-tweets, and likes. See section above for more information.
✦   To write to the console, you can also use the character - as the value of O; otherwise, O must be the name of a file where the output should be written.
⚐   To write to the console, use the character - as the value of OUT; otherwise, OUT must be the name of a file where the output should be written.
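If you want to call taupe from a Python script, the options above can be assembled into an argument list and passed to subprocess.run. The sketch below uses only the options documented in this table. The helper names taupe_command and run_taupe are hypothetical, and the sketch assumes that taupe is on your PATH and writes nothing but the results to standard output.

```python
import subprocess

# Values documented for --extract in the summary above.
EXTRACT_VALUES = {
    "all-tweets", "tweets", "my-tweets", "retweets",
    "quoted-tweets", "replied-tweets", "likes",
}


def taupe_command(archive, extract="all-tweets", canonical=False, output="-"):
    """Build the argument list for one taupe invocation."""
    if extract not in EXTRACT_VALUES:
        raise ValueError(f"unrecognized --extract value: {extract}")
    command = ["taupe", "--extract", extract, "--output", output]
    if canonical:
        command.append("--canonical-urls")
    command.append(str(archive))
    return command


def run_taupe(archive, **options):
    """Run taupe and return its standard output as a list of lines."""
    result = subprocess.run(
        taupe_command(archive, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


print(taupe_command("/path/to/twitter-archive.zip", extract="likes", canonical=True))
```

With a real archive path, run_taupe("/path/to/twitter-archive.zip", extract="likes") would return the lines that taupe prints.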
Known issues and limitations
This program assumes that the Twitter archive ZIP file is in the format that Twitter produced in mid-November 2022. Twitter probably used a different format in the past, and may change the format again in the future, so taupe may or may not work on Twitter archives obtained at other times.
The Twitter archive format for "likes" contains only the tweet identifier and the text of the tweet; consequently, taupe cannot provide date/time information for this case.
This program does all its work in memory, which means that taupe's ability to process a given archive depends on its size and how much RAM the computer has. It has only been tested with modest-sized archives. It is unknown how it will behave with exceptionally large archives.
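If you are unsure whether an archive is "modest-sized", one rough check is to total the uncompressed sizes of the files inside the ZIP before running taupe. The Python sketch below does this with the standard zipfile module. The total is only a loose proxy, since how much memory taupe actually needs for a given archive is not documented here. The function name uncompressed_size and the member name used in the demonstration are made up.

```python
import io
import zipfile


def uncompressed_size(archive) -> int:
    """Return the total uncompressed size, in bytes, of all members of a ZIP file.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


# Demonstration with a small in-memory ZIP; the member name is invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data/example.js", "x" * 1000)

print(uncompressed_size(buffer), "bytes uncompressed")
```

For a real archive, pass its path instead, as in uncompressed_size("/path/to/twitter-archive.zip").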
Getting help
If you find a problem or have a request or suggestion, please submit it in the GitHub issue tracker for this repository.
Contributing
I would be happy to receive your help and participation if you are interested. Everyone is asked to read and respect the code of conduct when participating in this project. Please feel free to report issues or submit a pull request to fix bugs or add new features.
License
This software is Copyright (C) 2022, by Michael Hucka. This software is freely distributed under the MIT license. Please see the LICENSE file for more information.
Acknowledgments
This work is a personal project developed by the author, using computing equipment owned by the California Institute of Technology Library.
The vector artwork of a bird, used as the icon for this repository, was created by Noe Araujo from the Noun Project. It is licensed under the Creative Commons CC-BY 3.0 license. I manually changed the color to be a shade of taupe.
Taupe uses multiple other open-source packages, without which it would have taken much longer to write the software. I want to acknowledge this debt. In alphabetical order, the packages are:

Aenum – Python package for advanced enumerations
CommonPy – a collection of commonly-useful Python functions
Plac – a command line argument parser
Rich – library for writing styled text to the terminal
Sidetrack – simple debug logging/tracing package
Twine – utilities for publishing Python packages on PyPI

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
