# YouTube subscriptions to RSS feeds

I spend *waaaaaaaaaaaay* too much time on YouTube, 
mainly because of the recommendations (which are freakishly accurate).  
I usually go to my subscriptions page, then get baited into watching garbage content from the recommendations.
Having RSS feeds for all my subscriptions gives me more control over the content I consume.  
The videos will be streamed directly to my computer, so I never have to visit the website.

> Fun fact: I'm such a zoomer that I only discovered RSS recently, even though it's older than me (its roots go back to 1995).

## Get all subscriptions

You could do some fancy stuff with the [YouTube API](https://developers.google.com/youtube/v3/docs/subscriptions/list) but it would be overkill.  
Instead, go to <https://www.youtube.com/feed/channels> and scroll to the bottom 
(good luck if you have 1000+ subscriptions, your scrolling finger might get cramped).  
Right-click on something that is not a link or an image, select `Save as`, 
give the file a name (`channels.html` in my case) and save it.

### Parsing

The following command retrieves all links to YouTube channels in the file we just saved.  
Replace `channels.html` with the name of the HTML file you saved.

```sh
grep -o -E 'href="https://www\.youtube\.com/(c|channel|user)/[a-zA-Z0-9._-]+"' \
        channels.html |
    sort -u |
    sed 's/href="\(.*\)"/\1/' > channel_urls
```

Some channels aren't prefixed with `/c/`, `/channel/` or `/user/` in the URL, 
so you'll either have to add them manually or change the `grep` regex to accept all URLs 
that begin with `https://www.youtube.com/`, then remove the non-channel links by hand.

> Those channels are pretty rare though: out of my 300+ subscriptions, only 2 or 3 fell into this category.

## Get each channel feed/name

Maybe some RSS readers understand the HTML tag `<link rel="alternate" type="application/rss+xml" .../>`, 
but [newsboat](https://newsboat.org/) (the one I use) unfortunately doesn't.  
This **pipeline of hell** fetches each channel's name and feed URL, 
and formats them as lines you can directly add to your `urls` file.

```sh
# stdbuf -oL forces line buffering instead of the output being block-buffered
#     in the pipe, which lets you watch the progress in real time
# xargs - pass each channel URL to curl
# grep  - extract the channel name and feed URL
# awk   - uniq without sorted lines
# sed   - join every 2 lines
# sed   - put the feed URL first, then the channel name in a comment after it
# sed   - trim surrounding whitespace
# tee   - output to the terminal and to a urls file
# cat   - print line numbers

xargs -a channel_urls curl -s |
    stdbuf -oL grep -o -E \
        -e '<title>.* - YouTube</title>' \
        -e 'https://www\.youtube\.com/feeds/videos\.xml\?channel_id=[a-zA-Z0-9_-]+' |
    stdbuf -oL awk '!seen[$0]++' |
    stdbuf -oL sed 'N; s/\n/ /' |
    stdbuf -oL sed 's_\(.*\)<title>\(.*\) - YouTube<\/title>\(.*\)_\1\3  # \2_' |
    stdbuf -oL sed 's/^ *//; s/ *$//' |
    stdbuf -oL tee /dev/stderr 2> urls |
    cat -n
```
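To make the middle of the pipeline less magical, here is a toy run of its two trickiest steps (the channel name and feed line are made up):

```sh
# awk '!seen[$0]++' keeps only the first occurrence of each line,
# without sorting; sed 'N; s/\n/ /' then joins each pair onto one line.
printf '%s\n' '<title>Chan A - YouTube</title>' 'feed_url_A' 'feed_url_A' |
    awk '!seen[$0]++' |
    sed 'N; s/\n/ /'
# -> <title>Chan A - YouTube</title> feed_url_A
```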

The `urls` file should look something like this:

```
...
https://www.youtube.com/feeds/videos.xml?channel_id=UCS0N5baNlQWJCUrhCEo8WlA  # Ben Eater
https://www.youtube.com/feeds/videos.xml?channel_id=UCld68syR8Wi-GY_n4CaoJGA  # Brodie Robertson
https://www.youtube.com/feeds/videos.xml?channel_id=UCrv269YwJzuZL3dH5PCgxUw  # CodeParade
https://www.youtube.com/feeds/videos.xml?channel_id=UCSX3MR0gnKDxyXAyljWzm0Q  # Computer Science
...
```
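Before handing the file to newsboat, you can sanity-check it with a quick one-liner (my own addition, assuming the `urls` format shown above): it flags any line that doesn't look like a feed URL followed by a `# name` comment, and prints nothing when the file is clean.

```sh
# Print every line of urls that does NOT match "feed-url  # channel name";
# the "|| true" keeps the command from failing when nothing is flagged.
grep -v -E '^https://www\.youtube\.com/feeds/videos\.xml\?channel_id=[A-Za-z0-9_-]+  # ' urls || true
```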

### Choose the channels to add to your feeds

Now comes the tedious and cringe part where you need to go through all your old and obscure subscriptions,
filtering the bad ones out.  

> Pro tip: If you want to automate this part you can ask Google to do it for you, 
> they know you better than yourself by now.

## Stream videos to your computer

### Dependencies

* [youtube-dl](https://youtube-dl.org/) - downloads videos from YouTube and other websites
* [mpv](https://mpv.io/) - a simple video player

### Stream

Replace `<link>` in the command below and you should have the video streaming in `mpv`.

```
youtube-dl -o - <link> | mpv -
```

> You can use any video player you want, just read its man page to find out how to play a video from standard input
> (e.g. for `vlc` it's `vlc -`).

### Newsboat macro

Add the following to `~/.newsboat/config`:

```
macro v set browser "youtube-dl -o - %u | mpv -"; open-in-browser; set browser chromium
```

Now you can press `,` and `v` on a video feed entry to stream it to your player.

## Sources

* [Luke Smith video on RSS feeds](https://www.youtube.com/watch?v=hMH9w6pyzvU)
* `man stdbuf`
* [Line buffering with pipes](https://stackoverflow.com/questions/293278)
* [Remove duplicate lines without sorting?](https://stackoverflow.com/questions/11532157)
* [How to merge every two lines into one from the command line?](https://stackoverflow.com/questions/9605232)
* [RSS Wikipedia page](https://wikipedia.org/wiki/RSS)
* [Arch wiki - newsboat - pass article to external command](https://wiki.archlinux.org/index.php/Newsboat#Pass_article_URL_to_external_command)