Documentation

Welcome to the SakeSaySo documentation, where we host resources for the thirsty learner. Documentation is currently written as the need arises, in response to the most frequent questions.

SakeSaySo caters to learners favoring pragmatic language learning over traditional textbook methods. The initial focus is primarily on enhancing reading experiences with texts drawn from everyday situations and news. While it’s not tailored for standardized tests like the JLPT or TOEIC, it may offer a differentiated approach to language learning.

At its core, SakeSaySo is a dictionary with example sentences. Entries, including sentences, can be added to named “decks” that you create for review. Decks can be studied with flash cards using spaced repetition, and they can also be imported and exported. Stories, articles, and dialogues build on these basic tools.

1 - Import and Export Learning Decks

SakeSaySo is designed with sharing, collaboration and open formats in mind. We support plaintext exchange formats for importing and exporting flash-card decks, compatible with both Anki and the “Japanese” app.

We’ve used flash cards in various other apps over the years. Our default plaintext format is similar to that of the “Japanese” app, which we support for both import and export. Example:

乾杯(かんぱい)
cheers, bottoms-up, prosit

宴会(えんかい)
party, banquet, reception

飲み会(のみかい)
drinking party, get-together

酔っ払い(よっぱらい)
drunkard

悪酔い(わるよい)
drunken sickness, getting sick from drinking, drunken frenzy

失言(しつげん)
verbal gaffe, verbal slip, slip of the tongue

迷走(めいそう)
straying, wandering (off course)

二次会(にじかい)
after-party, second party (of the night), second meeting

空き瓶(あきびん)
empty bottle

二日酔い(ふつかよい)
hangover

SakeSaySo supports a variety of plaintext and Anki-specific formats for import and export. Examples include:

"乾杯";"cheers, bottoms-up, prosit"
"宴会";"party, banquet, reception"
"飲み会";"drinking party, get-together"
"酔っ払い";"drunkard"
"悪酔い";"drunken sickness, getting sick from drinking, drunken frenzy"
"失言";"verbal gaffe, verbal slip, slip of the tongue"
"迷走";"straying, wandering (off course)"
"二次会";"after-party, second party (of the night), second meeting"
"空き瓶";"empty bottle"
"二日酔い";"hangover"

This feature employs best-effort matching rules to auto-detect various import formats. If more features or formats are desired, please request them.
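To illustrate what such best-effort detection could look like, here is a minimal Python sketch that tells the two layouts shown above apart and parses them. It is not the app's actual matching logic; the names parse_deck and HEADWORD, and the acceptance of full-width parentheses, are assumptions made for this example.

```python
import csv
import io
import re

# Matches a headword followed by its reading in parentheses, e.g. 乾杯(かんぱい).
# Accepting full-width parentheses as well is an assumption of this sketch.
HEADWORD = re.compile(r"^(?P<term>[^((]+)[((](?P<reading>[^))]+)[))]$")


def parse_deck(text):
    """Parse a deck in either of the two plaintext layouts shown above.

    Returns a list of (term, reading, meaning) tuples; reading is None
    when the source does not provide one.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Semicolon-delimited layout: "term";"meaning" on every line.
    if lines and all(line.startswith('"') and '";"' in line for line in lines):
        rows = csv.reader(io.StringIO("\n".join(lines)), delimiter=";", quotechar='"')
        return [(row[0], None, row[1]) for row in rows]

    # Two-line layout: a headword line followed by a meaning line.
    cards = []
    for head, meaning in zip(lines[0::2], lines[1::2]):
        match = HEADWORD.match(head)
        if match:
            cards.append((match["term"], match["reading"], meaning))
        else:
            cards.append((head, None, meaning))
    return cards
```

Calling parse_deck on either example above yields one tuple per card, which could then be matched against a dictionary or written back out in the other layout.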

Exporting Decks from Anki

Exporting a deck from Anki is straightforward. Simply navigate to your desired deck and select Export.

Anki Deck Export

From the export options, select Cards in Plain Text (.txt) as the format and click Export.

Anki Deck Export Format

The exported file can be imported into SakeSaySo from the clipboard or from a file, and may look like this:

Anki Deck Export Result

Importing Decks into SakeSaySo

To import a deck into SakeSaySo, either create a new deck or select an existing one. Then, tap the three dots in the upper right corner to access the import and export options.

Import Deck

Under Import Options, choose Only if found in dictionary to import only the entries that can be matched automatically against the dictionary. Entries that are not found can be compiled into a separate list by importing with Only if not found in dictionary. For unmatched entries, the sentence pages link to the dictionary, allowing you to add these entries individually to your matched list.

Import Deck Options

Working with Sentences

Sentences can be prepared in any text editor and imported into SakeSaySo. Note that sentences have no entries in the dictionary and therefore must be imported with the option Import All or Only if not found in dictionary. Example:

こちらで社会保険に加入する必要があると思います。教えてもらえますか?
I think I need to enroll in social insurance here. Could you tell me how?

こちらで雇用保険と失業保険に加入する必要があると思います。教えてもらえますか?
I think I need to enroll in employment insurance and unemployment insurance here. Could you tell me how?

The format requires each sentence on its own line. The order of the languages (English-Japanese or Japanese-English) is recognized automatically by the app and indexed accordingly. Sentences can be learned with flash cards, and their tokens can be inspected for individual words and phrases. From the sentence page, you can navigate to the dictionary entry for a word or phrase and add unknown vocabulary to your vocabulary decks.
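As a rough illustration of how the language order of a sentence pair might be recognized, the Python sketch below checks each line for Japanese characters. This is a heuristic written for this page, not the app's implementation; the function names and the Unicode ranges chosen are assumptions.

```python
import re

# Hiragana, katakana, and common CJK ideographs (a rough heuristic, not exhaustive).
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def is_japanese(line):
    """Return True if the line contains at least one Japanese character."""
    return JAPANESE_CHARS.search(line) is not None


def parse_sentence_pairs(text):
    """Return a list of {"ja": ..., "en": ...} dicts from consecutive line pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pairs = []
    for first, second in zip(lines[0::2], lines[1::2]):
        if is_japanese(first) and not is_japanese(second):
            pairs.append({"ja": first, "en": second})
        elif is_japanese(second) and not is_japanese(first):
            pairs.append({"ja": second, "en": first})
        else:
            # Both or neither line looks Japanese: refuse to guess.
            raise ValueError(f"cannot tell languages apart: {first!r} / {second!r}")
    return pairs
```

An English line that quotes Japanese text would be rejected by this sketch rather than misfiled, which is the safer failure mode for an import.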

Deck with Sentences

Flash-Cards with Sentences

Vocabulary and sentences can be mixed in the same deck, and both follow spaced repetition learning.

2 - SakeScript

How to contribute to the community repository

SakeSaySo’s iOS and Android applications are designed to support custom content creation. Our content, including articles, stories, and dialogues, is structured in the SakeScript format. SakeScript bundles a manifest, a JSON file for content, and any optional assets like images into a zip archive, simplifying content portability. The app indexes the manifest, thus creating a searchable and engaging user experience.

Import Options:

  • Community Repository: The simplest method to share your stories is via the community GitHub repository. This repository is pre-configured in the app, allowing your content to benefit from and contribute to the community. Merging a pull request into this repository triggers an automated GitHub action, updating the search index in the app.
  • Custom Repository: Add a personal story library under Settings -> Advanced Settings -> Add Repository. For details, visit Custom Repositories.
  • App Imports: In the app, navigate to the stories page, tap the + button at the top right, and select a SakeScript .zip file.

Export Method:

To share a story from the app, first open the story, then navigate to the story info page, and finally tap the share button.

2.1 - SakeScript Format Specification

SakeScript is a structured file format for the SakeSaySo language learning app. It facilitates the packaging and distribution of learning materials, such as stories and articles, in a portable manner.

Each SakeScript ZIP archive represents a single unit of learning content (e.g., a story, a news article, a lesson, or exercise). The archive includes:

  • Manifest File: A manifest.json file containing metadata about the learning content.
  • Content Files: main.json and various files (text, images, audio) constituting the learning material.

To create a SakeScript archive, you can use the zip CLI. Be sure to include all necessary files (JSON files, images, etc.) in the archive. For example:

zip my-story-name.zip manifest.json main.json images/*
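If you prefer scripting the packaging step, the same archive can be built with Python's standard zipfile module. The sketch below is one possible approach; build_archive is a hypothetical helper name, and the check for the two required files reflects the archive layout described above rather than any official tooling.

```python
import zipfile
from pathlib import Path


def build_archive(source_dir, archive_path):
    """Zip manifest.json, main.json, and any files under images/ from source_dir."""
    source = Path(source_dir)
    required = ["manifest.json", "main.json"]
    missing = [name for name in required if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {', '.join(missing)}")

    images_dir = source / "images"
    images = sorted(p for p in images_dir.iterdir() if p.is_file()) if images_dir.is_dir() else []

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in [source / name for name in required] + images:
            # Store paths relative to source_dir so main.json can reference images/...
            archive.write(path, path.relative_to(source).as_posix())
    return archive_path
```

For example, build_archive("my-story-name", "my-story-name.zip") packages a directory that holds the two JSON files and an optional images folder.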

main.json: Content File Format

The main.json file contains the main content of the learning material. SakeScript currently supports two types, ‘story’ and ‘article’. The format for each is described below.

  • title: A map of language codes to titles (e.g., “en”: “The Mountain Trail”).
  • cover: This field supports image files. The uri can be a URL pointing to an external image (e.g., “https://example.org/cover.jpg”) or a relative path to an image file within the archive (e.g., “images/cover.jpg”). For example:
    "cover": {
        "type": "image",
        "uri": "images/cover.jpg" // or "https://example.org/cover.jpg"
    }
  • type: Type of content (“story” or “article”).
  • chapters: List of chapters.
    • title (optional): Currently supported for ‘story’ type. A map of language codes to titles (e.g., “en”: “About Tokyo”).
    • sentences: List of sentences.
      • ja: Japanese sentence.
      • en: English sentence.

Example

{
    "title": {
        "en": "Journey Through Japan",
        "ja": "日本の旅"
    },
    "cover": {
        "type": "image",
        "uri": "https://www3.nhk.or.jp/news/html/20231111/K10014254991_2311111600_1111160953_01_02.jpg"
    },
    "type": "story",
    "chapters": [
        {
            "title": {
                "en": "About Tokyo",
                "ja": "東京について"
            },
            "sentences": [
                {
                    "ja": "東京は日本の首都です。",
                    "en": "Tokyo is the capital of Japan."
                },
                {
                    "ja": "新宿はにぎやかな場所です。",
                    "en": "Shinjuku is a bustling area."
                }
            ]
        }
    ]
}
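Before packaging, it can help to check that a main.json file has the fields listed above. The following Python sketch performs such a check on an already parsed document. The function name validate_main and the strictness of the rules (for instance, requiring both ja and en on every sentence and at least one chapter) are assumptions for illustration; the app may be more lenient.

```python
ALLOWED_TYPES = {"story", "article"}


def validate_main(document):
    """Return a list of problems found in a parsed main.json document."""
    problems = []
    title = document.get("title")
    if not isinstance(title, dict) or not title:
        problems.append("title must be a non-empty map of language codes to titles")
    if document.get("type") not in ALLOWED_TYPES:
        problems.append("type must be 'story' or 'article'")

    chapters = document.get("chapters")
    if not isinstance(chapters, list) or not chapters:
        problems.append("chapters must be a non-empty list")
        return problems

    for c, chapter in enumerate(chapters):
        sentences = chapter.get("sentences") if isinstance(chapter, dict) else None
        if not isinstance(sentences, list) or not sentences:
            problems.append(f"chapter {c}: sentences must be a non-empty list")
            continue
        for s, sentence in enumerate(sentences):
            for lang in ("ja", "en"):
                value = sentence.get(lang) if isinstance(sentence, dict) else None
                if not isinstance(value, str) or not value.strip():
                    problems.append(f"chapter {c}, sentence {s}: missing '{lang}' text")
    return problems
```

An empty result means the document passed these basic checks; load the file with json.load first and pass the resulting dict.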

manifest.json: Metadata File Format

The manifest.json file in each SakeScript archive contains these fields:

  • id: Unique script identifier for the content (e.g., UUID).
  • type: Type of content (e.g., “story”, “article”).
  • version: Format version (e.g., “1.0”).
  • title: A map of language codes to titles (e.g., “en”: “The Mountain Trail”).
  • created: Creation date, RFC3339 format (2020-12-29T12:00:00Z).
  • modified: Last modification date, RFC3339 format (2020-12-29T12:00:00Z).
  • author: Content author or creator.
  • language: Primary language of the content.
  • summary: A map of language codes to summaries (e.g., “en”: “A beginner-level story about a hike in the mountains.”).
  • license: License for the content (e.g., “Creative Commons”).
  • tags: List of tags for the content.

Optional fields:

  • teaserImage (optional): Teaser image for the content.
  • authorTwitter (optional): X/Twitter handle for the author.
  • authorNote (optional): Author’s note about the content.
  • origin (optional): Source URL for the content.

Example

{
    "id": "474007F8-F307-42F5-BA0E-E8B4547C7DAF",
    "type": "story",
    "version": "1.0",
    "title": {
        "en": "The Mountain Trail",
        "ja": "山道"
    },
    "author": "SakeSaySo",
    "authorTwitter": "sakesayso",
    "authorNote": "demo story",
    "teaserImage": "https://raw.githubusercontent.com/sakesayso/community/master/non-fiction/sci/2F98A92E-B14F-435F-B62E-2AD91FD0E862/cover.jpg",
    "created": "2020-12-13",
    "modified": "2023-12-13",
    "summary": {
        "en": "A beginner-level story about a hike in the mountains.",
        "ja": "初級者向けの山登りの話。"
    },
    "tags": [
        "BIZ",
        "N3"
    ],
    "license": "Creative Commons Attribution-ShareAlike",
    "origin": "https://www3.nhk.or.jp/news/easy/k10014288051000/k10014288051000.html"
}

Note: We recommend using uuidgen, https://www.uuidgenerator.net/, or a similar tool to generate a unique UUID.
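As an alternative to generating the identifier by hand, a manifest can be assembled in a short script. The Python sketch below fills in a fresh UUID and RFC 3339 timestamps in UTC. The helper names new_manifest and dump_manifest are hypothetical, and the value passed for language (for example "ja") is an assumption, since the expected notation for that field is not specified here.

```python
import json
import uuid
from datetime import datetime, timezone


def new_manifest(title, author, language, summary, tags, content_type="story"):
    """Build a manifest dict with a fresh UUID and RFC 3339 UTC timestamps."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "id": str(uuid.uuid4()).upper(),  # random (version 4) UUID
        "type": content_type,
        "version": "1.0",
        "title": title,
        "created": now,
        "modified": now,
        "author": author,
        "language": language,
        "summary": summary,
        "license": "Creative Commons Attribution-ShareAlike",
        "tags": tags,
    }


def dump_manifest(manifest):
    """Serialize the manifest as readable JSON, keeping Japanese text unescaped."""
    return json.dumps(manifest, ensure_ascii=False, indent=4)
```

Writing the result of dump_manifest to manifest.json with UTF-8 encoding gives a file in the same shape as the example above.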

If you include a cover image, we recommend JPEG format to minimize file size. To convert a PNG image (e.g., from DALL·E) to JPEG, you can use ImageMagick with the following command: convert cover.png -resize 1080x -quality 92 cover.jpg. On ImageMagick 7, magick replaces the legacy convert command.

Alongside JLPT levels (N1-N5), SakeScript supports arbitrary tags to categorize content. We recommend using one JLPT level tag and at least one content tag.

Non-fiction content should use the following tags:

  • AME - for arts, media, entertainment
  • TEC - for technology, internet
  • SCI - for science, environment
  • MED - for health, medical, fitness
  • SPO - for sports, esports
  • LIF - for lifestyle, leisure
  • POL - for politics, society
  • BIZ - for finance, business, economics, military

Fiction content should use the following tags:

  • ADV - for adventure, exploration
  • COM - for comedy, humor
  • DRA - for drama, relationships
  • DYS - for dystopia, social commentary
  • FAN - for fantasy, mythology
  • HIS - for historical, period
  • SFI - for science fiction, futurism
  • THR - for thriller, mystery

Repository Index File

An index.json file is maintained in the repository to catalog all available SakeScript materials. This index, auto-generated from each archive’s manifest file, includes:

  • path: Relative path to the SakeScript ZIP in the repository.
  • sha256: SHA-256 integrity hash of the ZIP archive.
  • manifest: Extracted manifest data.

Example

[
  {
    "path": "the-mountain-trail.zip",
    "sha256": "bf35415b1ee00fe56e6a8016848d7c7c35e392ca4732716dfce190a403b8303a",
    "manifest": {
      "id": "474007F8-F307-42F5-BA0E-E8B4547C7DAF",
      "version": "1.0",
      "title": {
        "en": "The Mountain Trail",
        "ja": "山道"
      },
      "author": "SakeSaySo",
      "authorTwitter": "sakesayso",
      "authorNote": "demo story",
      "created": "2020-12-13",
      "modified": "2023-12-13",
      "difficulty": "beginner",
      "summary": {
        "en": "A beginner-level story about a hike in the mountains.",
        "ja": "初級者向けの山登りの話。"
      },
      "tags": [
        "LIF",
        "N4"
      ],
      "license": "Creative Commons Attribution-ShareAlike"
    }
  }
  // ...
]
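For readers hosting their own repository, an index entry of this shape can be produced from an archive with a few lines of Python. This is a sketch of the idea only; the community repository's actual generator is not shown in this documentation, and index_entry is a hypothetical name.

```python
import hashlib
import json
import zipfile


def index_entry(zip_path, relative_path):
    """Build one index.json entry: path, SHA-256 of the archive, and its manifest."""
    # Hash the archive bytes exactly as they will be served.
    with open(zip_path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # Extract the manifest from inside the archive.
    with zipfile.ZipFile(zip_path) as archive:
        manifest = json.loads(archive.read("manifest.json").decode("utf-8"))
    return {"path": relative_path, "sha256": digest, "manifest": manifest}
```

Collecting one entry per archive into a list and writing it with json.dump yields an index.json in the format shown above.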

Contribution and Usage Guidelines

See the content repository for more information on contributing and using SakeScript materials at https://github.com/sakesayso/community.

Contributing to SakeScript

  • Prepare your content and package it in a SakeScript ZIP file.
  • Include a manifest.json file with accurate metadata.
  • Place the ZIP file in the appropriate directory within the repository.
  • Ensure the index.json is updated post-merge (typically automated).

Content Licensing

We encourage the use of the “Creative Commons Attribution-ShareAlike” license. This license allows for both commercial and non-commercial use, modification, and distribution of content, as long as the original author is credited and any derivative works are shared under the same terms. This promotes a collaborative and open learning environment while ensuring creators receive recognition for their work.

How to License Your Content?

Simply include the “Creative Commons Attribution-ShareAlike” license in your manifest.json file. For more details on how to apply this license, visit Creative Commons.

2.2 - Flash Card Deck Format

The SakeScript format for the SakeSaySo language learning app facilitates the packaging and distribution of learning materials, including flash cards for spaced repetition learning of vocabulary, sentences and phrases.

SakeScript flash card decks are much simpler than Anki decks. They contain only a list of vocabulary, sentences, or phrases in a plaintext format, which currently requires importing with or without dictionary matching, as outlined in the deck import and export section. If you’re converting an Anki deck, we recommend importing it into SakeSaySo first. You can then export the content to a cleaner, simplified txt file for sharing or for further modification within the app.

Beyond plain txt files, SakeScript decks can be shared as SakeScript ZIP archives through the community repository and other Git repositories. The archive includes:

  • Manifest File: A manifest.json file containing metadata about the learning content.
  • Content File: main.txt.

manifest.json: Metadata File Format

The manifest.json file in a deck archive follows the same format as described in the SakeScript Format Specification.

main.txt: Content File

A ‘deck’ is a collection of flash cards listed in a main.txt file inside the portable zip file. This file is expected to present the learning material in a simple text format. English and Japanese can appear in either order, and importing allows for optional dictionary matching, which enables cross-references and better integration with the app’s features. For example:

乾杯(かんぱい)
cheers, bottoms-up, prosit

宴会(えんかい)
party, banquet, reception

See the community repository for examples.

2.3 - Import and Export SakeScript Stories

SakeSaySo is designed with sharing, collaboration and open formats in mind. Our story, article and learning format SakeScript is open and welcomes contributors.

This page covers the exchange of stories and articles via SakeScript .zip files.

By default, the SakeSaySo app accesses both the community and daily news repositories. You can also configure custom HTTP-based repositories, which may be hosted on GitHub, as discussed on the custom repository page.

All online SakeScript content is available for offline use once downloaded. The app ensures content integrity by verifying the SHA-256 hash of each download and stores the zip files in the phone’s document directory.
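The integrity check described above amounts to hashing the downloaded bytes and comparing the result with the sha256 value published in the repository index. A minimal Python sketch of that comparison follows; it illustrates the principle and is not the app's code, and verify_download is a hypothetical name.

```python
import hashlib
import hmac


def verify_download(data, expected_sha256):
    """Return True if the SHA-256 of the downloaded bytes matches the index value."""
    actual = hashlib.sha256(data).hexdigest()
    # Compare case-insensitively; hexdigest() always returns lowercase hex.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A download that fails this check should be discarded rather than stored, since either the file or the index entry is out of date or corrupted.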

To export a story, download it first, then open the story info page via the (i) icon. From there, you can share the zip file via messengers, Google Drive, or other platforms.

2.4 - Custom Repositories

How to contribute to the community repository

The technical aspects of this guide assume you’re familiar with Git, GitHub, and JSON text formats.

SakeSaySo’s iOS and Android apps are specifically designed to support custom content through two main repository options:

  • Community Repository: Contributing to the community GitHub repository is straightforward. This repository is pre-set in the app, and an automated GitHub action updates the search index upon the merging of a pull request.
  • Custom Repository: For a personal, tailored learning experience, you may add a custom repository under Settings -> Advanced Settings -> Add Repository.

Add Private or Custom Repository

Setting Up a Custom Repository

The app requires custom repositories to be served over HTTP, with an index.json file that contains the manifests. Each entry should reference a story .zip file by relative path and include a valid SHA-256 checksum, which is used for integrity verification and update tracking.

Hosting on GitHub

GitHub can host your custom repositories, just like the community repository. Use the following format for GitHub-hosted repositories:

name: [repository name]
uri: https://api.github.com/repos/<username or organization>/<repository name>/contents/

Optionally, provide a weburl so that users see an info icon (i) on the repository page that links to the browsable GitHub repository, if it is public.

weburl: https://github.com/<username or organization>/<repository name>

For branch-specific content (e.g., for testing):

branch: [branch name]
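To make the settings above concrete, the Python sketch below assembles the uri for a GitHub-hosted repository. GitHub's REST contents endpoint accepts a ref query parameter naming a branch, tag, or commit; whether the app maps its branch setting to ref in exactly this way is an assumption, and contents_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def contents_url(owner, repo, path="", branch=None):
    """Build a GitHub contents API URL, optionally pinned to a branch via ?ref=."""
    base = f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}/contents/{quote(path)}"
    if branch:
        return f"{base}?{urlencode({'ref': branch})}"
    return base
```

For example, contents_url("sakesayso", "community") produces the uri form shown above, and passing path="index.json" with a branch name addresses the index file on that branch.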

Accessing Private Repositories

  • Basic Authentication: Include username and password in the repository URL for basic auth-protected servers.
https://username:password@<host>/my/custom/repository
  • GitHub PATs: For GitHub repositories, use a personal access token (PAT) for authentication. Add the token directly in the app.
github_pat_[your_token]

Note: The app currently supports basic auth and GitHub PATs for private repository access.
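For anyone testing a private repository outside the app, the two authentication schemes translate into HTTP Authorization headers as sketched below in Python. This shows the standard header formats only and makes no claim about how the app sends requests internally; split_basic_auth and github_pat_headers are hypothetical names, and example.com in the usage note is a placeholder host.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_basic_auth(url):
    """Separate credentials from a URL and return (clean_url, headers)."""
    parts = urlsplit(url)
    headers = {}
    if parts.username is not None:
        credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    # Rebuild the network location without the user:password@ prefix.
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, headers


def github_pat_headers(token):
    """Return the Authorization header for a GitHub personal access token."""
    return {"Authorization": f"Bearer {token}"}
```

Calling split_basic_auth("https://username:password@example.com/my/custom/repository") returns the URL without credentials together with a Basic header, which can be passed to any HTTP client.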

2.5 - GitHub Actions for Daily News Aggregation

Supporting the trend of content generation with LLMs, this page shows how we’re using GitHub Actions with our Go scripts to create daily translated news in one of our community repositories.

This guide demonstrates setting up custom scripts with GitHub Actions to automate tasks on a schedule.

Following the wave of LLM-based content generation since 2023, SakeSaySo uses Anthropic AI in its news aggregation process. We automate such tasks with GitHub Actions combined with Go-based tools and scripts, which we prefer over Python. The configuration below is an example setup, including a cron schedule, that you can adapt to your own needs.

name: Go Scheduled Newswriter

on:
  push:
    branches: [ master ]
  schedule:
    - cron: '0 21 * * *'  # Runs at 21:00 UTC (6 AM JST)

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
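    # Check out the repository that contains the newswriter scripts.
    - name: Checkout Current Repository
      uses: actions/checkout@v4

    # Check out the target news repository into ./news, using a PAT that can push to it.
    - name: Checkout sakesayso/news Repository
      uses: actions/checkout@v4
      with:
        repository: 'sakesayso/news'
        token: ${{ secrets.SAKESAYSO_WRITER_PAT }}
        path: 'news'

    - name: Set up Go
      uses: actions/setup-go@v5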
      with:
        go-version: '1.21'

    - name: Run Newswriter A
      env:
        ANTHROPIC_TOKEN: ${{ secrets.ANTHROPIC_TOKEN }}
      run: go run cmd/newswriter/main.go

    # Commit generated content; "|| true" keeps the step from failing when nothing changed.
    - name: Commit and Push Changes
      run: |
        cd news
        git config --global user.name 'sakebot'
        git config --global user.email '<sakebot email address>'
        git add .
        git commit -m "Update from newswriter" || true
        git push

This configuration details the steps from checking out repositories to executing the Go script and pushing updates.