Bogleheads talk:Imported Wikipedia modules

Table update
I intend to update the table using the MediaWiki API from a Python script. --LadyGeek 08:45, 11 January 2023 (UTC)
 * ✅ (Addendum: added the signature above, per the revision history)--LadyGeek 12:49, 20 February 2023 (UTC)

Python implementation
The sections below describe an automated approach implemented in python. --LadyGeek 13:35, 20 February 2023 (UTC)

Approach

 * Generate an API query string.
 * Extract the page title and timestamp (latest revision date).
 * Build the wikitext table:
    * mw-datatable class
    * Page title formatted as an internal link, e.g. [[Page title]]
    * Timestamp formatted appropriately
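A minimal sketch of the table-building step, assuming the extracted data is a list of (title, timestamp) pairs with ISO 8601 UTC timestamps as the API returns them (e.g. "2023-02-20T12:49:00Z"). The "wikitable" class, the column headings, and the YYYY-MM-DD date format are assumptions for illustration; only the mw-datatable class and the internal-link formatting come from the approach above.

```python
from datetime import datetime


def format_timestamp(ts: str) -> str:
    """Convert an API timestamp like '2023-02-20T12:49:00Z' to 'YYYY-MM-DD'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d")


def build_wikitext_table(rows) -> str:
    """Build a wikitext table from (title, timestamp) pairs, sorted by title."""
    lines = [
        '{| class="wikitable mw-datatable"',  # "wikitable" is an assumed addition
        "! Module !! Last revision",          # hypothetical column headings
    ]
    for title, ts in sorted(rows):
        lines.append("|-")
        # [[...]] renders the page title as an internal link
        lines.append(f"| [[{title}]] || {format_timestamp(ts)}")
    lines.append("|}")
    return "\n".join(lines)
```

The output is plain wikitext, so it can be pasted directly into the page's edit box.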

API query
The query is defined through the wiki's built-in API sandbox. Special pages --> Version --> /w/api.php --> Special:ApiSandbox

An initial prototype was developed at MediaWiki.org using a full-text search (API sandbox - MediaWiki). However, the same query produced no results here, likely because MediaWiki.org runs a later version customized for Wikipedia.

The "allpages" list cannot be customized to add a revision property, so I used an "allrevisions" list query instead.
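A sketch of the "generate an API query string" step for an allrevisions list query, using the parameter names documented on API:Allrevisions. The endpoint URL, the Module namespace number 828 (Scribunto's default), and the chosen properties are assumptions; the working parameters are whatever the sandbox's Request JSON shows.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the wiki's own /w/api.php URL.
API_URL = "https://www.bogleheads.org/w/api.php"


def allrevisions_params(limit: int = 500) -> dict:
    """Parameters for an allrevisions list query over the Module namespace."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",    # booleans such as batchcomplete come back as true
        "list": "allrevisions",
        "arvnamespace": "828",   # assumption: Scribunto's default Module namespace ID
        "arvprop": "timestamp",  # the table only needs each revision's timestamp
        "arvlimit": str(limit),
    }


def query_url(params: dict) -> str:
    """Build the full GET URL (the API query string) from the parameters."""
    return f"{API_URL}?{urlencode(params)}"
```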

The working API sandbox query: API sandbox - Bogleheads

Click "Make request", then go to the Results section on the left-side menu.


 * Show request data as: JSON
 * "Request JSON:" shows the parameters for the Python request.
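A minimal sketch of the extraction step, assuming the result has the usual allrevisions shape: query.allrevisions is a list of page objects, each with a title and a revisions list whose entries carry a timestamp (requested via arvprop=timestamp). The function names are hypothetical.

```python
import json


def extract_titles_and_timestamps(response: dict) -> list:
    """Return (title, timestamp) pairs from one allrevisions API response."""
    pairs = []
    for page in response.get("query", {}).get("allrevisions", []):
        for rev in page.get("revisions", []):
            pairs.append((page["title"], rev["timestamp"]))
    return pairs


def extract_from_text(text: str) -> list:
    """Same as above, for a result copied from the sandbox as raw JSON text."""
    return extract_titles_and_timestamps(json.loads(text))
```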

Documentation: API:Allrevisions - MediaWiki (scroll down to the sample code for the Python implementation). There's also a ton of help available via Google.

Discussion
This effort is open to all wiki editors. Feel free to work on your own approach. Comments on my approach are also welcome. --LadyGeek 13:45, 11 January 2023 (UTC)
 * At first glance there are a number of entries with  that have no corresponding Module. Digging further, the script seems to be missing many modules. The current page shows 192 entries while  shows 296 entries. --Peculiar Investor 14:26, 20 February 2023 (UTC)
 * At second glance, my debugger shows 192 entries returned from the API sandbox. I'll start looking there. --LadyGeek 15:04, 20 February 2023 (UTC)
 * Looking at the API sandbox directly, the discrepancy may be due to the python code. Under investigation. --LadyGeek 15:24, 20 February 2023 (UTC)
 * Have you also considered that the API will append an additional element in some cases? --Peculiar Investor 20:48, 20 February 2023 (UTC)
 * Yes. The data is returned with the boolean element  set to.
 * I also get a  element containing   The API sandbox documentation for this parameter says "5000" is the upper limit, so there is a typo somewhere. I changed the query to remove the warning and still get the same number of elements returned.
 * In my test environment, the discrepancy is between 190 occurrences of "Module:" on the API sandbox results page and 296 occurrences of "Module:" in the All pages namespace search results. I'll experiment with the API further. --LadyGeek 22:25, 20 February 2023 (UTC)

I have completed my API experiment. The MediaWiki documentation is unclear to me about the definition of "batch" regarding the  element, which is always set to. I thought it meant that all of the data had been fetched; that wasn't the case.

In my testing, the  sub-elements were also present. I ignored the  element and continued the API requests as documented. The  sub-elements did disappear after all the data was fetched. The API is working as documented.

There may also be a difference between testing in the web browser (cookies?) and testing in Python. In some cases all of the data was returned (all 296 results); in others, only partial data (190 results).

Regardless, I'll use the existence of the  sub-elements as the exit criterion in a for loop. The example Python code in API:Continue alludes to this approach. --LadyGeek 01:26, 21 February 2023 (UTC)
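The continuation loop described above can be sketched as follows, modeled on the API:Continue sample: merge the response's "continue" object into the next request's parameters, and stop once a response no longer contains "continue". The function name is hypothetical, and the session is assumed to be any object with a requests-style get(url, params=...) method, such as requests.Session().

```python
def fetch_all_batches(session, api_url: str, params: dict):
    """Yield each JSON response, following 'continue' until the API omits it.

    session: assumed to behave like requests.Session (get() returning an
    object with a .json() method).
    """
    last_continue = {}
    while True:
        # Base parameters plus the continuation values from the previous batch
        request = {**params, **last_continue}
        result = session.get(api_url, params=request).json()
        if "error" in result:
            raise RuntimeError(result["error"])
        yield result
        if "continue" not in result:
            break  # no continue element: all of the data has been fetched
        # Carry the continuation values (e.g. arvcontinue) into the next request
        last_continue = result["continue"]
```

With the requests library installed, a call would look like `for batch in fetch_all_batches(requests.Session(), api_url, params): ...`. Passing the session in keeps cookies consistent across batches and makes the loop easy to test offline.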
 * The script was modified per the example Python code in API:Continue. The API is returning multiple entries for the same page with different revisions (the descending timestamp order was preserved). This is not the expected behavior, and the number of duplicate entries increased as the  parameter decreased. No clue why.
 * The number of modules in the table matches the search results in All pages (Module namespace) (296). --LadyGeek 15:54, 22 February 2023 (UTC)
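One possible explanation, offered tentatively: list=allrevisions enumerates revisions rather than pages, so a page edited more than once can show up in several entries, and in several batches when the limit is smaller. Whatever the cause, the table only needs the newest revision per module. A minimal sketch that keeps the latest timestamp for each title, assuming (title, timestamp) pairs with ISO 8601 UTC timestamps (the function name is hypothetical):

```python
def latest_revision_per_page(pairs) -> list:
    """Collapse (title, timestamp) pairs to one entry per title, keeping the newest."""
    latest = {}
    for title, ts in pairs:
        # 'YYYY-MM-DDTHH:MM:SSZ' strings sort chronologically as plain text
        if title not in latest or ts > latest[title]:
            latest[title] = ts
    return sorted(latest.items())
```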

Source
The python script was developed in Linux.

The  in the MediaWiki documentation states that the upper limit is 500 revisions, but the API sandbox says 5000. I kept the 5000 sandbox limit to avoid going over 500 in the near future. A warning is printed every time API data is fetched; use it as a loop counter. Each fetch returns a different number of entries (the results are not evenly divided).
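A sketch of counting fetches directly rather than relying on the warning, while still collecting any warning text the API returns. It assumes the documented 500-revision limit applies to an ordinary (non-bot) account; note that MediaWiki limit parameters also accept the special value "max", which uses the highest limit the account is allowed. The helper names are hypothetical.

```python
USER_LIMIT = 500  # assumption: documented limit for accounts without higher-limit rights


def effective_limit(requested: int, user_limit: int = USER_LIMIT) -> int:
    """Return the limit the API would apply for an ordinary account."""
    return min(requested, user_limit)


def summarize_batches(batches):
    """Count fetched batches and collect warning text from each one."""
    count = 0
    messages = []
    for count, batch in enumerate(batches, start=1):
        # 'warnings' maps a module name (e.g. 'allrevisions') to a dict of text;
        # the inner key differs between formatversion 1 and 2, so read every value.
        for module, info in batch.get("warnings", {}).items():
            for text in info.values():
                messages.append(f"batch {count}: {module}: {text}")
    return count, messages
```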