Now Reading
How one can learn Hacker Information threads with most up-to-date feedback first

How one can learn Hacker Information threads with most up-to-date feedback first

2023-03-23 18:12:08

Hacker News shows feedback in a tree. This may be irritating if you wish to preserve observe of a selected dialog, as you consistently have to hunt by way of the tree to search out the most recent remark.

I solved this downside in three other ways at the moment. I am going to element them in rising orders of complexity (which frustratingly is the reverse order to how I figured them out!)

The simplest approach: Algolia search

The official Hacker Information search makes use of Algolia, with a consistently up to date index.

When you seek for story:35111646, filter for feedback after which order by date you may get the feedback for that specific story, most up-to-date on the prime. Downside solved!

Here is the URL – edit the story ID in that URL (or on the web page) to view feedback for a special story.

Screenshot of those search results

The Algolia search_by_date API

The Algolia Hacker Information API is documented right here:

Notice that it is a separate system from the official Hacker Information API, which is powered by Firebase, would not present any search or filtering endpoints and is documented at

To retrieve all feedback on a selected story ordered by date, most up-to-date first as JSON you may hit this endpoint:,story_35111646&hitsPerPage=1000

The tags= parameter does the entire work right here – we’re asking it for gadgets of kind remark which were tagged with story_35111646.

This returns 20 outcomes by default. I’ve added &hitsPerPage=1000 to get again the utmost of 1,000.

This provides you again JSON, however how can we flip that into one thing that is extra readable in our browser?

Loading that into Datasette Lite

I’ll use Datasette Lite, my construct of Datasette working immediately within the browser utilizing Python compiled to WebAssembly. Extra on how that works here.

That is attainable as a result of the Algolia API returns JSON with a access-control-allow-origin: * CORS header, permitting that knowledge to be loaded by different net purposes working on totally different domains.

When you go Datasette Lite a ?json= parameter with the URL to a JSON file that returns a listing of objects, it can use sqlite-utils to load that JSON right into a SQLite desk with a column for every of the keys in these objects.

This URL masses that knowledge:

We are able to navigate to the search_by_date desk to browse and filter the feedback.

The search_by_date table in Datasette Lite

Let’s make some enhancements to the way it’s displayed utilizing a customized SQL question and a few Datasette plugins.

Datasette Lite helps some Datasette plugins – you may add ?set up=name-of-plugin to the URL to put in them immediately from PyPI.

I’ll load two plugins:

Here is a URL that masses these two plugins:

Now this is a customized SQL question that makes the feedback a bit nicer to learn when they’re displayed by Datasette:

  ) as textual content,
order by
  created_at desc

This link will execute that SQL question towards the info in Datasette Lite.

One final trick: it could be neat if we might click on by way of from the outcomes to the touch upon Hacker Information. Here is easy methods to add that, utilizing a trick enabled by datasette-json-html:

    'label', objectID,
    'href', '' || objectID
  ) as hyperlink,
  ) as textual content,
order by
  created_at desc

See the results of that here.

This provides hyperlink to every remark as the primary column within the desk.

It really works by constructing a JSON string {"label": "35123521", "href": ""} – the plugin then renders that as a hyperlink when the desk is displayed, utilizing Datasette’s render_cell() plugin hook.

SQL query results with a link in the first column

Probably the most difficult resolution, with json_tree()

My first try at fixing this was by far essentially the most complicated.

Earlier than I explored the search_by_date API I noticed that Algolia affords a gadgets API, which returns ALL of the content material for a thread in a large nested JSON object:

Strive that now and you will see that the highest degree object has this form {"id": ..., "youngsters": [...]} – with that `”youngsters” array containing an additional nested array of objects representing the entire thread.

Datasette Lite’s ?json= parameter expects an array of objects. However… for those who give it a top-level object which has a key that’s itself an array of objects, it can load the objects from that array as an alternative.

Which suggests passing it the above URL ends in a desk with a row for every of the top-level feedback on that merchandise… plus a youngsters column with the JSON string of every of their descendents.

You possibly can attempt that right here:

See Also

SQLite has a robust suite of JSON functions, plus the flexibility to execute recursive CTEs – certainly it could be attainable to put in writing a question that flattens that nested construction right into a desk with a row for every remark?

I spent fairly a little bit of time on this. Ultimately I spotted that you do not even want a recursive CTE for this – you should use the json_tree() operate offered by SQLite as an alternative.

Here is the question I got here up with:

with gadgets as (choose * from [35111646]),
outcomes as (
  json_extract(worth, '$.id') as id,
  json_extract(worth, '$.created_at') as created_at,
  json_extract(worth, '$.creator') as creator,
  html_strip_tags(html_unescape(json_extract(worth, '$.textual content'))) as textual content,
  json_extract(worth, '$.parent_id') as parent_id
  gadgets, json_tree(gadgets.youngsters) tree
the place
  tree.kind = 'object'
union all
choose id, created_at, creator, html_strip_tags(html_unescape(textual content)) as textual content, parent_id
from gadgets
  json_object('label', id, 'href', '' || id) as hyperlink,
from outcomes order by created_at desc

Try that out here.

That SQL query in Datasette Lite, returning a table of recent comments

The important thing to understanding the above is to know how json_tree() works. Given a JSON worth it returns an enormous digital desk representing each node in that tree as a flat checklist.

Here is a easy instance:

choose * from json_tree('[
    "id": 1,
    "name": "A",
    "children": [
        "id": 2,
        "name": "B"
    "id": 3,
    "title": "C"

Try that against Datasette. The output appears like this:

key worth kind atom id father or mother fullkey path
[{“id”:1,”name”:”A”,”children”:[{“id”:2,”name”:”B”}]},{“id”:3,”title”:”C”}] array 0 $ $
0 {“id”:1,”title”:”A”,”youngsters”:[{“id”:2,”name”:”B”}]} object 1 0 $[0] $
id 1 integer 1 3 1 $[0].id $[0]
title A textual content A 5 1 $[0].title $[0]
youngsters [{“id”:2,”name”:”B”}] array 7 1 $[0].youngsters $[0]
0 {“id”:2,”title”:”B”} object 8 7 $[0].youngsters[0] $[0].youngsters
id 2 integer 2 10 8 $[0].youngsters[0].id $[0].youngsters[0]
title B textual content B 12 8 $[0].youngsters[0].title $[0].youngsters[0]
1 {“id”:3,”title”:”C”} object 13 0 $[1] $
id 3 integer 3 15 13 $[1].id $[1]
title C textual content C 17 13 $[1].title $[1]

That is fairly helpful! The complicated nested object has been flattened for us. Most of those rows aren’t related… but when we filter for kind="object" we are able to pay money for simply the nested gadgets inside that construction which might be full JSON objects.

So that is what my greater question does. I name json_tree() on the youngsters column for every of these prime degree objects, then filter for object inside that to get out the nested feedback.

Then on the finish I do a union all towards the highest degree rows, to make sure they’re included within the ensuing desk.

I used to be actually proud of this question! After which I learn a bit extra of the Algolia API documentation and realized that it was completely unneccessary for fixing this downside. However I did at the least get to discover ways to use json_tree().

My unique resolution: hacker-news-to-sqlite

Previous to at the moment I’ve solved this downside utilizing my hacker-news-to-sqlite CLI instrument as an alternative.

This may suck the entire feedback for a thread right into a SQLite database, so you may type them chronologically utilizing Datasette.

To run that command:

pipx set up hacker-news-to-sqlite
hacker-news-to-sqlite timber feedback.db 35111646

Then open it in Datasette (or Datasette Desktop):

datasette feedback.db

hacker-news-to-sqlite makes use of the unique Hacker Information API, which suggests it has to fetch every remark in flip after which fetch any baby feedback as separate requests – so it takes some time to run!

I’ll follow the Algolia resolution sooner or later.


Created 2023-03-12T13:41:22-07:00, up to date 2023-03-12T14:40:11-07:00 · History · Edit

Source Link

What's Your Reaction?
In Love
Not Sure
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top