SakeScript
How to contribute to the community repository
SakeSaySo’s iOS and Android applications are designed to support custom content creation. Our content, including articles, stories, and dialogues, is structured in the SakeScript format. SakeScript bundles a manifest, a JSON file for content, and any optional assets like images into a zip archive, simplifying content portability. The app indexes the manifest, thus creating a searchable and engaging user experience.
Import Options:
- Community Repository: The simplest method to share your stories is via the community GitHub repository. This repository is pre-configured in the app, allowing your content to benefit from and contribute to the community. Merging a pull request into this repository triggers an automated GitHub action, updating the search index in the app.
- Custom Repository: Add a personal story library under Settings -> Advanced Settings -> Add Repository. For details, visit Custom Repositories.
- App Imports: In the app, navigate to the stories page, tap the + button at the top right, and select a SakeScript .zip file.
Export Method:
To share a story from the app, first open the story, then navigate to the story info page, and finally tap the share button.
1 - SakeScript Format Specification
SakeScript is a structured file format for the SakeSaySo language learning app. It facilitates the packaging and distribution of learning materials, such as stories and articles, in a portable manner.
Each SakeScript ZIP archive represents a single unit of learning content (e.g., a story, a news article, a lesson, or exercise). The archive includes:
- Manifest File: A manifest.json file containing metadata about the learning content.
- Content Files: main.json and various files (text, images, audio) constituting the learning material.
To create a SakeScript archive, you can use the zip CLI. Make sure to include all necessary files (JSON files, images, etc.) in the archive. For example:
zip my-story-name.zip manifest.json main.json images/*
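As a rough sketch, the same archive could also be assembled with Python's standard library. The function name build_sakescript_archive and the assumption that images live in an images/ folder next to the JSON files are illustrative only and not part of the SakeScript specification.

```python
import zipfile
from pathlib import Path


def build_sakescript_archive(source_dir: str, archive_path: str) -> list[str]:
    """Zip manifest.json, main.json and everything under images/ (if present)."""
    source = Path(source_dir)
    required = ["manifest.json", "main.json"]
    missing = [name for name in required if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")

    members = [source / name for name in required]
    images = source / "images"
    if images.is_dir():
        members += sorted(p for p in images.rglob("*") if p.is_file())

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in members:
            # Store paths relative to the source directory, with forward slashes.
            archive.write(path, path.relative_to(source).as_posix())

    # Re-open the archive and report what was actually stored.
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

Calling build_sakescript_archive("my-story", "my-story-name.zip") returns the list of stored member names, which makes it easy to confirm that both JSON files ended up at the top level of the archive.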
main.json: Content File Format
The main.json file contains the main content of the learning material. SakeScript currently supports two types, ‘story’ and ‘article’. The format for each is described below.
- title: A map of language codes to titles (e.g., “en”: “The Mountain Trail”).
- cover: This field supports image files. The uri can be a URL pointing to an external image (e.g., “https://example.org/cover.jpg”) or a relative path to an image file within the archive (e.g., “images/cover.jpg”). For example:
"cover": {
"type": "image",
"uri": "images/cover.jpg" // or "https://example.org/cover.jpg"
}
- type: Type of content (“story” or “article”).
- chapters: List of chapters.
  - title (optional): Currently supported for ‘story’ type. A map of language codes to titles (e.g., “en”: “About Tokyo”).
  - sentences: List of sentences.
    - ja: Japanese sentence.
    - en: English sentence.
{
  "title": {
    "en": "Journey Through Japan",
    "ja": "日本の旅"
  },
  "cover": {
    "type": "image",
    "uri": "https://www3.nhk.or.jp/news/html/20231111/K10014254991_2311111600_1111160953_01_02.jpg"
  },
  "type": "story",
  "chapters": [
    {
      "title": {
        "en": "About Tokyo",
        "ja": "東京について"
      },
      "sentences": [
        {
          "ja": "東京は日本の首都です。",
          "en": "Tokyo is the capital of Japan."
        },
        {
          "ja": "新宿はにぎやかな場所です。",
          "en": "Shinjuku is a bustling area."
        }
      ]
    }
  ]
}
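Before packaging, it can help to check a main.json document against the field list above. The following is a minimal sketch, not the app's actual validation: the function name validate_main is hypothetical, and it assumes that cover is optional and that every sentence needs both a ja and an en entry.

```python
from urllib.parse import urlparse

SUPPORTED_TYPES = {"story", "article"}


def validate_main(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not isinstance(doc.get("title"), dict) or not doc.get("title"):
        problems.append("title must be a non-empty map of language codes to titles")
    if doc.get("type") not in SUPPORTED_TYPES:
        problems.append("type must be 'story' or 'article'")

    cover = doc.get("cover")
    if cover is not None:
        uri = cover.get("uri", "") if isinstance(cover, dict) else ""
        scheme = urlparse(uri).scheme
        # Accept either an http(s) URL or a relative path inside the archive.
        if not uri or (scheme and scheme not in ("http", "https")) or uri.startswith("/"):
            problems.append("cover.uri must be an http(s) URL or a relative path")

    chapters = doc.get("chapters")
    if not isinstance(chapters, list) or not chapters:
        problems.append("chapters must be a non-empty list")
        return problems
    for i, chapter in enumerate(chapters):
        sentences = chapter.get("sentences") if isinstance(chapter, dict) else None
        if not isinstance(sentences, list) or not sentences:
            problems.append(f"chapters[{i}].sentences must be a non-empty list")
            continue
        for j, sentence in enumerate(sentences):
            for lang in ("ja", "en"):
                if not isinstance(sentence, dict) or not sentence.get(lang):
                    problems.append(f"chapters[{i}].sentences[{j}] is missing '{lang}'")
    return problems
```

Loading the file with json.load and passing the result to validate_main yields an empty list for the example above.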
The manifest.json file in each SakeScript archive contains these fields:
- id: Unique script identifier for the content (e.g., UUID).
- type: Type of content (e.g., “story”, “article”).
- version: Format version (e.g., “1.0”).
- title: A map of language codes to titles (e.g., “en”: “The Mountain Trail”).
- created: Creation date, RFC3339 format (2020-12-29T12:00:00Z).
- modified: Last modification date, RFC3339 format (2020-12-29T12:00:00Z).
- author: Content author or creator.
- language: Primary language of the content.
- summary: A map of language codes to summaries (e.g., “en”: “A beginner-level story about a hike in the mountains.”).
- license: License for the content (e.g., “Creative Commons”).
- tags: List of tags for the content.
Optional fields:
- teaserImage (optional): Teaser image for the content.
- authorTwitter (optional): X/Twitter handle for the author.
- authorNote (optional): Author’s note about the content.
- origin (optional): Source URL for the content.
Example
{
  "id": "474007F8-F307-42F5-BA0E-E8B4547C7DAF",
  "type": "story",
  "version": "1.0",
  "title": {
    "en": "The Mountain Trail",
    "ja": "山道"
  },
  "author": "SakeSaySo",
  "authorTwitter": "sakesayso",
  "authorNote": "demo story",
  "teaserImage": "https://raw.githubusercontent.com/sakesayso/community/master/non-fiction/sci/2F98A92E-B14F-435F-B62E-2AD91FD0E862/cover.jpg",
  "created": "2020-12-13",
  "modified": "2023-12-13",
  "summary": {
    "en": "A beginner-level story about a hike in the mountains.",
    "ja": "初級者向けの山登りの話。"
  },
  "tags": [
    "BIZ",
    "N3"
  ],
  "license": "Creative Commons Attribution-ShareAlike",
  "origin": "https://www3.nhk.or.jp/news/easy/k10014288051000/k10014288051000.html"
}
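A manifest can be checked against the field list before it is packaged. The sketch below is an assumption about what a strict check might look like; the names REQUIRED_FIELDS, is_rfc3339_utc, and check_manifest are hypothetical, and the app itself may well be more lenient (for instance about date-only values).

```python
from datetime import datetime

# Fields listed above that are not marked optional.
REQUIRED_FIELDS = (
    "id", "type", "version", "title", "created", "modified",
    "author", "language", "summary", "license", "tags",
)


def is_rfc3339_utc(value: str) -> bool:
    """Accept timestamps shaped like 2020-12-29T12:00:00Z (UTC, whole seconds)."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except (TypeError, ValueError):
        return False


def check_manifest(manifest: dict) -> list[str]:
    """Report missing required fields and malformed timestamps."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in manifest]
    for name in ("created", "modified"):
        if name in manifest and not is_rfc3339_utc(manifest[name]):
            problems.append(f"{name} is not an RFC 3339 UTC timestamp")
    return problems
```

Run against the example manifest above, this strict sketch would report the absent language field and the date-only created and modified values, which is a useful reminder to write full timestamps such as 2020-12-13T00:00:00Z.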
Note: We recommend using uuidgen, https://www.uuidgenerator.net/, or a similar tool to generate a truly unique UUID.
If you include a cover image, we recommend using JPEG format to minimize file size. To convert a PNG image (e.g., from DALL·E) to JPEG, you can use ImageMagick with the following command: convert cover.png -resize 1080x -quality 92 cover.jpg
Recommended Content Tags
Alongside JLPT levels (N1-N5), SakeScript supports arbitrary tags to categorize content. We recommend using one JLPT level tag and at least one content tag.
Non-fiction content should use the following tags:
- AME - for arts, media, entertainment
- TEC - for technology, internet
- SCI - for science, environment
- MED - for health, medical, fitness
- SPO - for sports, esports
- LIF - for lifestyle, leisure
- POL - for politics, society
- BIZ - for finance, business, economics, military
Fiction content should use the following tags:
- ADV - for adventure, exploration
- COM - for comedy, humor
- DRA - for drama, relationships
- DYS - for dystopia, social commentary
- FAN - for fantasy, mythology
- HIS - for historical, period
- SFI - for science fiction, futurism
- THR - for thriller, mystery
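The tagging recommendation can be expressed as a small check. This is only a sketch: the function name tag_warnings is hypothetical, and because SakeScript accepts arbitrary tags, the result is a list of warnings rather than errors.

```python
JLPT_LEVELS = {"N1", "N2", "N3", "N4", "N5"}
NON_FICTION = {"AME", "TEC", "SCI", "MED", "SPO", "LIF", "POL", "BIZ"}
FICTION = {"ADV", "COM", "DRA", "DYS", "FAN", "HIS", "SFI", "THR"}


def tag_warnings(tags: list[str]) -> list[str]:
    """Return warnings for tag lists that depart from the recommendation."""
    warnings = []
    levels = [t for t in tags if t in JLPT_LEVELS]
    content = [t for t in tags if t in NON_FICTION | FICTION]
    if len(levels) != 1:
        warnings.append(f"expected exactly one JLPT level tag, found {len(levels)}")
    if not content:
        warnings.append("expected at least one recommended content tag")
    return warnings
```

For example, tag_warnings(["LIF", "N4"]) returns an empty list, while a list with no JLPT level or no recommended content tag produces one warning for each issue.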
Repository Index File
An index.json file is maintained in the repository to catalog all available SakeScript materials. This index, auto-generated from each archive’s manifest file, includes:
- path: Relative path to the SakeScript ZIP in the repository.
- sha256: SHA-256 integrity hash of the ZIP archive.
- manifest: Extracted manifest data.
Example
[
  {
    "path": "the-mountain-trail.zip",
    "sha256": "bf35415b1ee00fe56e6a8016848d7c7c35e392ca4732716dfce190a403b8303a",
    "manifest": {
      "id": "474007F8-F307-42F5-BA0E-E8B4547C7DAF",
      "version": "1.0",
      "title": {
        "en": "The Mountain Trail",
        "ja": "山道"
      },
      "author": "SakeSaySo",
      "authorTwitter": "sakesayso",
      "authorNote": "demo story",
      "created": "2020-12-13",
      "modified": "2023-12-13",
      "difficulty": "beginner",
      "summary": {
        "en": "A beginner-level story about a hike in the mountains.",
        "ja": "初級者向けの山登りの話。"
      },
      "tags": [
        "LIF",
        "N4"
      ],
      "license": "Creative Commons Attribution-ShareAlike"
    }
  }
  // ...
]
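To illustrate how one index entry relates to its archive, the sketch below computes the SHA-256 hash of a ZIP file and extracts its manifest. It is not the repository's actual generator; the function name index_entry is hypothetical, and it assumes manifest.json sits at the top level of the archive.

```python
import hashlib
import json
import zipfile


def index_entry(archive_path: str, relative_path: str) -> dict:
    """Build one index.json entry for a SakeScript ZIP archive."""
    # Hash the archive bytes exactly as they are stored in the repository.
    with open(archive_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    # Read the manifest straight out of the archive.
    with zipfile.ZipFile(archive_path) as archive:
        manifest = json.loads(archive.read("manifest.json").decode("utf-8"))
    return {"path": relative_path, "sha256": digest, "manifest": manifest}
```

Collecting index_entry results for every archive in a repository and writing them out with json.dump would produce a list shaped like the example above.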
Contribution and Usage Guidelines
For more information on contributing and using SakeScript materials, see the content repository at https://github.com/sakesayso/community.
Contributing to SakeScript
- Prepare your content and package it in a SakeScript ZIP file.
- Include a manifest.json file with accurate metadata.
- Place the ZIP file in the appropriate directory within the repository.
- Ensure the index.json is updated post-merge (typically automated).
Content Licensing
We encourage the use of the “Creative Commons Attribution-ShareAlike” license. This license allows for both commercial and non-commercial use, modification, and distribution of content, as long as the original author is credited and any derivative works are shared under the same terms. This promotes a collaborative and open learning environment while ensuring creators receive recognition for their work.
How to License Your Content?
Simply include the “Creative Commons Attribution-ShareAlike” license in your manifest.json file. For more details on how to apply this license, visit Creative Commons.
2 - Flash Card Deck Format
The SakeScript format for the SakeSaySo language learning app facilitates the packaging and distribution of learning materials, including flash cards for spaced repetition learning of vocabulary, sentences and phrases.
SakeScript flash card decks are much simpler than Anki decks. They contain only a list of vocabulary, sentences, or phrases in a plaintext format, which is currently imported with or without dictionary matching, as outlined in the deck import and export section. If you’re converting an Anki deck, we recommend first importing the deck into SakeSaySo. You can then export the content to a cleaner, simplified txt file for sharing or further modification within the app.
Beyond plain txt files, decks can be shared as SakeScript ZIP archives through the community repository and other Git repositories. The archive includes:
- Manifest File: A manifest.json file containing metadata about the learning content.
- Content File: main.txt.
The manifest.json file in a deck archive has the same format as for stories and articles, as described in the SakeScript Format Specification.
main.txt: Content File
A ‘deck’ is a compilation of flash cards listed in a main.txt file in the portable zip file. This file is expected to present learning resources in a simple text format. The English and Japanese entries can be in any order, and importing allows for optional dictionary matching, which provides cross-references and better integration with the app’s features. For example:
乾杯(かんぱい)
cheers, bottoms-up, prosit
宴会(えんかい)
party, banquet, reception
See the community repository for examples.
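For readers who want to process a deck outside the app, here is a minimal parsing sketch. It assumes, based only on the example above, that each card occupies two consecutive non-empty lines; the app's real import rules may differ. The names has_japanese and parse_deck are hypothetical.

```python
def has_japanese(text: str) -> bool:
    """True if the text contains hiragana, katakana, or common kanji."""
    return any("\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff" for ch in text)


def parse_deck(text: str) -> list[tuple[str, str]]:
    """Pair consecutive non-empty lines into (japanese, english) cards."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of non-empty lines")
    cards = []
    for first, second in zip(lines[0::2], lines[1::2]):
        # The two lines of a card may come in either order; put Japanese first.
        if has_japanese(second) and not has_japanese(first):
            first, second = second, first
        cards.append((first, second))
    return cards
```

Applied to the two-card example above, parse_deck returns two pairs with the Japanese entry first in each, regardless of the order in which the lines were written.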
3 - Import and Export SakeScript Stories
SakeSaySo is designed with sharing, collaboration and open formats in mind. Our story, article and learning format SakeScript is open and welcomes contributors.
This page covers the exchange of stories and articles via SakeScript .zip files.
By default, the SakeSaySo app accesses both the community and daily news repositories. You may also configure custom HTTP-based repositories, which can be hosted on GitHub, as discussed on the custom repository page.
All online content in SakeScript, once downloaded, is available for offline use. The app ensures content integrity by verifying the sha256 hash of each download and stores the zip files in the phone’s document directory.
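The integrity check described above amounts to hashing the downloaded bytes and comparing the result with the sha256 value from the index entry. A minimal sketch follows; the function name verify_download is hypothetical and this is not the app's code.

```python
import hashlib
import hmac


def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 of the downloaded bytes with the indexed hash."""
    actual = hashlib.sha256(data).hexdigest()
    # Hex digests are case-insensitive, so normalize before comparing.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A client would typically discard the download and report an error when verify_download returns False.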
To export a story after downloading it, navigate to the story info page and tap the (i) icon. You can then share the zip file via messengers, Google Drive, or other platforms.
4 - Custom Repositories
How to contribute to the community repository
The technical aspects of this guide assume you’re familiar with Git, GitHub, and JSON text formats.
SakeSaySo’s iOS and Android apps are specifically designed to support custom content through two main repository options:
- Community Repository: Contributing to the community GitHub repository is straightforward. This repository is pre-set in the app, and an automated GitHub action updates the search index upon the merging of a pull request.
- Custom Repository: For a personal, tailored learning experience, you may add a custom repository under Settings -> Advanced Settings -> Add Repository.
Setting Up a Custom Repository
The app requires custom repositories to be accessible via an HTTP server that hosts an index.json file containing manifests. The index should list story .zip files at relative paths, with valid sha256 checksums for integrity verification and update tracking.
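To show how relative paths in index.json might map to download URLs, here is a small sketch. It assumes paths are resolved relative to the location of index.json, which is our reading of the layout rather than documented app behavior; the function name archive_urls and the example.org host are placeholders.

```python
from urllib.parse import urljoin


def archive_urls(index_url: str, index: list[dict]) -> dict[str, str]:
    """Map each entry's manifest id to the absolute URL of its ZIP archive."""
    return {
        entry["manifest"]["id"]: urljoin(index_url, entry["path"])
        for entry in index
    }
```

With index_url set to https://example.org/repo/index.json, an entry whose path is the-mountain-trail.zip resolves to https://example.org/repo/the-mountain-trail.zip.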
Hosting on GitHub
GitHub can host your custom repositories, similar to the community repository. Use the following format for GitHub-hosted repositories:
name: [repository name]
uri: https://api.github.com/repos/<username or organization>/<repository name>/contents/
Optionally provide a weburl to have users see an info icon (i) on the repository page, linking to the browsable GitHub repository if it’s public.
weburl: https://github.com/<username or organization>/<repository name>
For branch-specific content (e.g., for testing):
branch: [branch name]
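The settings above follow a fixed pattern, so they can be derived from the owner and repository name. The helper below is purely illustrative (github_repository_settings is a hypothetical name) and simply fills in the documented URL formats.

```python
def github_repository_settings(owner, repo, name=None, branch=None):
    """Build the name, uri, weburl and optional branch values for a GitHub repository."""
    settings = {
        "name": name or repo,
        "uri": f"https://api.github.com/repos/{owner}/{repo}/contents/",
        "weburl": f"https://github.com/{owner}/{repo}",
    }
    if branch:
        # Only include a branch when testing content outside the default branch.
        settings["branch"] = branch
    return settings
```

For the community repository, github_repository_settings("sakesayso", "community") yields the uri https://api.github.com/repos/sakesayso/community/contents/ and the weburl https://github.com/sakesayso/community.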
Accessing Private Repositories
- Basic Authentication: Include username and password in the repository URL for basic auth-protected servers.
https://username:password@<host>/my/custom/repository
- GitHub Personal Access Tokens (PATs): For GitHub repositories, use a personal access token for authentication. Add the token directly in the app.
github_pat_[your_token]
Note: The app currently supports basic auth and GitHub PATs for private repository access.
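When testing a private repository from your own scripts, the two authentication styles translate into HTTP Authorization headers roughly as sketched below. How the app itself sends credentials is not documented here, so treat this as an assumption; the function names are hypothetical. The Bearer form is one of the schemes the GitHub REST API accepts for personal access tokens.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """HTTP Basic auth: base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def github_pat_header(pat: str) -> dict[str, str]:
    """Header for a GitHub personal access token."""
    return {"Authorization": f"Bearer {pat}"}
```

Either dictionary can be passed as request headers to an HTTP client when fetching index.json or an archive from a protected server.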
5 - GitHub Actions for Daily News Aggregation
Supporting the trend of content generation with LLMs, this page shows how we’re using GitHub Actions with our Go scripts to create daily translated news in one of our community repositories.
This guide demonstrates setting up custom scripts with GitHub Actions to automate tasks on a schedule.
Embracing the wave of content generation through LLMs since 2023, SakeSaySo leverages Anthropic AI in its news aggregation process. We use GitHub Actions, combined with Go-based tools and scripts that we prefer over Python, to automate such tasks. The configuration below is an example setup, including a cron schedule, showcasing how you can replicate this for your needs.
name: Go Scheduled Newswriter
on:
  push:
    branches: [ master ]
  schedule:
    - cron: '0 21 * * *' # Runs at 21:00 UTC (6 AM JST)
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Current Repository
        uses: actions/checkout@v2
      - name: Checkout sakesayso/news Repository
        uses: actions/checkout@v2
        with:
          repository: 'sakesayso/news'
          token: ${{ secrets.SAKESAYSO_WRITER_PAT }}
          path: 'news'
      - name: Set up Go
        uses: actions/setup-go@v2
        with:
          go-version: '1.21'
      - name: Run Newswriter A
        env:
          ANTHROPIC_TOKEN: ${{ secrets.ANTHROPIC_TOKEN }}
        run: go run cmd/newswriter/main.go
      - name: Commit and Push Changes
        run: |
          cd news
          git config --global user.name 'sakebot'
          git config --global user.email '[email protected]'
          git add .
          git commit -m "Update from newswriter" || true
          git push
This configuration details the steps from checking out repositories to executing the Go script and pushing updates.
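GitHub Actions evaluates cron schedules in UTC, so the hour in the cron expression has to be offset when you target a local time. The small sketch below (the function name utc_hour_to_jst is hypothetical) shows the arithmetic behind the 21:00 UTC setting used in this workflow.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is UTC+9 all year round (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")


def utc_hour_to_jst(hour_utc: int, minute: int = 0) -> str:
    """Convert a UTC time of day to the corresponding JST time of day."""
    moment = datetime(2024, 1, 1, hour_utc, minute, tzinfo=timezone.utc)
    return moment.astimezone(JST).strftime("%H:%M")
```

Here utc_hour_to_jst(21) returns "06:00", which falls on the following calendar day in Japan.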